We frequently help our customers implement data platforms on a grand scale: as a backend for user-facing applications, for business analytics or data science and machine learning projects. Two common trends across different business areas are (1) a growing amount of data arriving with high delivery speed, i.e. data streams of various formats and (2) the requirement to perform complex analytic slice-and-dice queries on recent and historic datasets.
One possible technical solution to address these needs is Apache Druid, a highly scalable distributed data store optimized for event-oriented data and real-time analytics. Being a kind of crossover between a timeseries database, a search index and an analytical/OLAP database, Druid enables large-scale analytics on streaming data at companies such as Airbnb, Lyft and Criteo.
We at inovex have successfully used Druid in various projects and gained experience in building productive applications with it—especially regarding the tradeoff between its enhanced capabilities and the complexity of a distributed system with different kinds of services. However, this blog post does not focus on Druid itself—feel free to contact us for an in-depth discussion, or refer to our case study or the official documentation.
Instead, the focus of this post is a crucial aspect of interactive exploration and analysis of data which we see in almost every project—namely which (graphical) frontend is used. This is typically the turf of "classic" Business Intelligence platforms like Tableau or PowerBI; some of these (such as Metabase) also offer connectivity to Druid. Apart from that, specialized frameworks have been developed which are more tightly integrated and more specifically tailored to the novel paradigms offered by Druid. One of the most recent options in this space is Turnilo. In this post, we introduce Turnilo, explain its configuration and usage, and share our evaluation outcome. For completeness, we also provide a list of current alternatives.
Introduction / History
Until November 2016, Pivot, a graphical interface mainly developed by Druid’s co-authors, was available as open source software. As Pivot became a commercial product and closed source, the Polish e-commerce platform Allegro adopted a fork of the latest open version. Now under the new name of Turnilo it is being developed further, openly available under the Apache License.
Technical Overview
Turnilo is a simple web application written in TypeScript that runs everywhere Node.js 8.x or 10.x (and npm) is installed. It is specifically tailored to Druid and does not connect to other databases as of now. However, it is possible to load static files and inspect them with Turnilo.
In Turnilo terminology, users inspect data cubes, which mirror Druid data sources (or a static file). Dimensions and measures correspond to Druid's dimensions and metrics. Dimensions can be split and filtered on, similar to GROUP BY and WHERE clauses in SQL.
Upon connecting to Druid, Turnilo automatically scans for data sources and their specifications, which helps with the initial setup. A single YAML file contains the entire configuration of the available data sources. The Plywood expression language can be used to create custom dimensions and metrics that are not already present in the underlying Druid data source. Even simple aggregations like average or min/max values for specific metrics need to be defined here.
With Turnilo, it is possible to explore data already stored in the connected Druid cluster; it cannot be leveraged to set up new data streams or batch uploads.
Turnilo doesn’t feature any user or access management. The frontend, and therefore all configured data sources, are openly and directly available.
It generates URLs that directly link to specific views, so sharing them with other users is quite simple. However, it is not possible to combine multiple views into custom dashboards.
The fast, tidy and self-explanatory frontend works mostly via point-and-click or drag-and-drop. It doesn’t allow for much customization other than the data drill-down. Aside from plain numbers (totals) and tables, only the most basic types of data visualizations are included: bar or line charts.
Installation and Configuration
The installation is very simple and well documented in the project’s README, so we’ll skip that part.
You can either start Turnilo with a configuration provided as a YAML file:
```shell
turnilo --config config.yaml
```
Or you can start it with just the broker hostname and port of your Druid cluster and it will automatically scan the available data sources:
```shell
turnilo --druid :8082
```
It is a good idea to leverage the automatic scan, save the resulting configuration file and adjust it to your needs. You can retrieve it like so:
```shell
turnilo --druid :8082 --print-config > config.yaml
```
The YAML file contains all the settings for both the server and the frontend (see the official documentation for an overview). Aside from some general connection parameters and default values, it defines the data cubes, their columns and aggregations that will be available to the user.
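To give a rough idea of the file's overall shape, here is a minimal configuration skeleton. Note that hostnames, ports and the data source name are placeholder values, and key names may vary between Turnilo versions—check the generated config from `--print-config` for your setup:

```yaml
# Minimal Turnilo configuration sketch (placeholder values).
port: 9090            # port the Turnilo web server listens on
serverHost: 0.0.0.0   # bind address

clusters:
  - name: druid
    host: localhost:8082   # Druid broker host:port

dataCubes:
  - name: wikipedia
    title: Wikipedia Edits
    clusterName: druid           # must match a cluster name above
    source: wikipedia            # Druid data source name
    introspection: autofill-all  # auto-detect dimensions and measures
```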
This is both a blessing and a curse. It means that every dimension and measure of a data source needs to be listed here, which in its most basic form looks like this:
```yaml
dataCubes:
  - name: wikipedia
    title: wikipedia
    clusterName: druid
    source: wikipedia
    refreshRule:
      rule: query
    defaultSortMeasure: added
    introspection: autofill-all
    attributeOverrides:
    dimensions:
      - name: __time
        title: Time
        kind: time
        formula: $__time
      - name: isAnonymous
        title: Is Anonymous
        formula: $isAnonymous
      - name: comment
        title: Comment
        formula: $comment
      - name: commentLength
        title: Comment Length
        formula: $commentLength
      - name: namespace
        title: Namespace
        formula: $namespace
      - name: page
        title: Page
        formula: $page
      - name: user
        title: User
        formula: $user
      # [...]
    measures:
      - name: added
        title: Added
        formula: $main.sum($added)
      - name: deleted
        title: Deleted
        formula: $main.sum($deleted)
      - name: delta
        title: Delta
        formula: $main.sum($delta)
```
Take a look at the formula parameters. These define how columns are calculated. For this purpose, Turnilo makes use of Plywood expressions. Plywood is a JavaScript library that acts as a middle layer between data visualizations and data stores. It simplifies data queries with its chainable and extensible expression language.

In the basic example above, all dimensions simply mirror the existing Druid dimensions, so the formulas are plain selectors. The measures, as assumed by default, are sum aggregations of the respective Druid metrics.
You may remove and add dimensions and measures according to your needs. In fact, you have to define every aggregation that the user may select in the frontend as a measure in the config file. As we describe in the next paragraph, the frontend doesn’t include custom ad hoc aggregations.
On the one hand, this means that you need to know ahead of time which information the users are going to be interested in. On the other hand, it enables you to define very specifically tailored measures and dimensions. The available Plywood expressions are already a powerful tool for this, plus, you can even add your own custom JavaScript aggregation functions.
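As a sketch of such tailored definitions, the following fragment extends the wikipedia data cube from above with a derived measure and a derived dimension. The names `avgCommentLength`, `maxDelta` and `userNamespace` are our own illustrative choices, not part of the original data source:

```yaml
# Hypothetical additions to the wikipedia data cube from above.
measures:
  # Average comment length: total length divided by event count.
  - name: avgCommentLength
    title: Avg. Comment Length
    formula: $main.sum($commentLength) / $main.count()
  # Largest single delta in the selected data (min/max must be
  # defined explicitly, just like any other aggregation).
  - name: maxDelta
    title: Max Delta
    formula: $main.max($delta)
dimensions:
  # Derived dimension concatenating two Druid columns
  # (++ is Plywood's string concatenation operator).
  - name: userNamespace
    title: User / Namespace
    formula: $user ++ ' / ' ++ $namespace
```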
Note that JavaScript needs to be enabled in Druid in order to make use of Plywood expressions and custom aggregations.
Usage
Since all of the configuration happens in the backend, Turnilo’s frontend is considerably minimalistic. Aside from a home screen where the available data cubes are listed, the only view is the analysis page.
To explore the data cube, drag and drop columns from the left side to the filter, split or measure section at the top. Filters and splits (comparable to WHERE and GROUP BY clauses) work on dimensions. Measures are the values that will be aggregated over the selected data. Remember that aggregation functions need to be pre-defined in the configuration file.
On the right side, the pinboard provides a convenient way to quickly access frequently used filter dimensions and toggle specific values. To make the most of this feature, define defaultPinnedDimensions in the config file, so that they are available when the user first opens the data cube. Pinned dimensions can always be added and removed in the frontend, though.
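In the data cube configuration, this could look as follows (using dimension names from the wikipedia example above):

```yaml
dataCubes:
  - name: wikipedia
    # Dimensions pinned to the pinboard when the cube is first opened.
    defaultPinnedDimensions: ["namespace", "user"]
```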
If you select multiple splits, an additional legend appears on the right side.
To zoom in on an interesting time frame, simply drag an interval across the chart.
A useful feature is the time shift, which enables comparison of a time frame with a previous one. You can define a shift interval using ISO 8601 duration expressions such as P1D for one day, P2Y for two years, and so on.
Turnilo then displays the previous value (e.g. one hour earlier) next to the current one, as well as the absolute and relative difference.
Because every view change generates a new URL (which you can use to share your findings with others), you can navigate between views using the browser's back and forward buttons.
You may set an auto-update interval between 5 seconds and 30 minutes, so that Turnilo regularly refreshes the charts with the latest Druid data. This is particularly useful when the time filter is set to a recent interval.
Finally, hidden behind the gear icon in the top right corner, you always have the option to display the raw data for the current selection, and export it as CSV or TSV file.
All in all, the interface is quite self-explanatory. If your current data selection does not fit the chosen chart type, Turnilo will let you know why.
At times, Turnilo seems a bit limited in its options. For instance, there are only four types of visualization: plain numbers, tables, line charts and bar charts. It is neither possible to adjust the axes nor the grouping or color scheme of bar charts with multiple splits.
However, its simplicity makes Turnilo very fast and therefore fun to use. During our evaluation, we didn’t see any inexplicable behaviour or bugs in the frontend.
Alternatives to Turnilo
Currently, Turnilo is the only interface that is specifically tailored to Druid, aside from the now closed-source Pivot. However, there are other data analytics solutions that also integrate with Druid. Below is a (non-exhaustive) list of alternatives with short descriptions to give you a quick overview.
Metabase
Freely available for AWS and Heroku, as a Docker image, plain .jar file or .dmg app for macOS. Also available under a commercial license for on-premises installation, with more features and support. Provides user and access management, and a very friendly, guiding interface. Data explorations are based on "questions" that the user can ask, no SQL needed. Documentation seems helpful, though a little odd to navigate. Integrates with Druid and a handful of other common databases. – metabase.com
Superset
Probably the most widely used option. Has a modern look and feel that may be typical for recent Apache Incubator projects, including an extensive but partially unclear documentation. Integrates with Druid and most relational databases, and provides a great variety of visualization options. Its feature-rich interface is quite complex and more suitable for fixed dashboards than for ad-hoc explorations. Access can be managed with a set of predefined user roles. Available as a Docker image or Python module for bottom-up installation. – superset.incubator.apache.org
Pivot (now part of Imply)
Similar to Turnilo but with more features. Includes more visualization options, the ability to create dashboards and access management with custom user roles. Only available as part of Imply, which can be used with an Imply Cloud account or installed on-premises, both fee-based.
Redash
The only one of the listed options that requires SQL for data selection. Queries can be saved and combined in dashboards, with a sufficient set of visualization types. Clean interface, helpful knowledge base. Integrates with a great set of data stores, provides user groups but relies on the database’s security model for access limitation. Open source, available as a hosted service with three different pricing models or for your own setup with Docker, AWS or Google Compute Engine. – redash.io.
Summary
To sum up our experience with Turnilo, we see the following advantages and disadvantages:
✅ Pro
- Little overhead, easy installation
- Easy to use (frontend)
- Powerful expression language for customization
- Open source, under active development
✖️ Con
- Limited visualization options
- Limited options for ad-hoc explorations
- Not suitable for real-time dashboards
- Missing user/access management
With this in mind, we think Turnilo may be a good fit …
- … to quickly get an overview of the data stored in Druid
- … for rather static reports with recurring questions
- … for analysts familiar with JavaScript and/or Plywood expressions
- … if you don’t need to integrate with other database technologies
- … if you don’t need dashboards with multiple visualizations
To wrap up: naturally, the choice of an appropriate real-time analytics frontend also depends strongly on the individual needs of each project. While other tools like Superset, Pivot and Metabase may be more widely known, Turnilo is still at an early stage, but in our opinion it is worth keeping an eye on Turnilo for promising future developments.
Read on
Have a look at our analytics/BI offering or consider joining us as a BI consultant.
Disclaimer: I’m a member of the Turnilo dev team 🙂
Thanks for the article, I really like your opinions! Feel free to create issues about missing features, but please keep in mind the Turnilo manifesto (high usability for non-technical users over sophisticated but rarely used features).
Could you also elaborate on "Not suitable for real-time dashboards"? For sure Turnilo is not suitable for dashboards, but it plays nicely with real-time data: visualisations are updated constantly as new data is ingested into Druid (e.g. from Kafka).