So, if you have worked with Tableau, this is a question you have either asked, been asked, or at least wondered about:
Why would you want to use a TDE (a.k.a. a Tableau Data Extract)?
If you are on the business side of things, you are probably wondering why you even need to make this decision. This is typically an IT decision, isn't it?
If you are on the IT end of the business (pun intended), you might say: what the heck? I have to maintain another source of data?
So, let's get some things straight today.
This article will help you understand the what, the why, and the how of TDEs.
Well, at least my perspective on it, anyway.
WTH is a TDE?
Very early in the development of Tableau, the team found that the software wasn't able to fulfil its potential in terms of performance, simply because of the data.
What do I mean by "the data"?
Well, it so happens that DBAs are very smart people, but it also happens that many of them run databases that are not tuned as well as they could be. Heck, I will go as far as to say that a lot of those databases are very poorly maintained.
So what happens when you connect to them? Well, you get used to taking long breaks.
This obviously didn't sit well with Tableau.
So, what did they do? They did what great innovative companies do:
they created their own format for storing the data.
Thus was born the TDE: a high-performance, columnar storage technology.
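To see why a columnar layout matters for the kind of aggregations a dashboard runs, here is a toy sketch (this is my own illustration, not Tableau's actual engine; the field names are made up):

```python
# Toy illustration: row store vs. column store for a single-field aggregate.

rows = [
    {"region": "East", "sales": 100},
    {"region": "West", "sales": 250},
    {"region": "East", "sales": 175},
]

# Row store: summing one field still means walking every whole row,
# pulling in fields the query never asked for.
total_row_store = sum(r["sales"] for r in rows)

# Column store: each field lives in its own contiguous array, so a
# SUM(sales) touches only the "sales" column. Less data read, and the
# values compress well because they all share one type.
columns = {
    "region": ["East", "West", "East"],
    "sales": [100, 250, 175],
}
total_col_store = sum(columns["sales"])

assert total_row_store == total_col_store == 525
```

Both layouts give the same answer, of course; the difference is how much data the engine has to touch to get there, which is exactly what a visual analytics tool cares about.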
So, what do we have here?
I am going to share some data about what a Tableau Data Extract may be able to do for you.
Consider a scenario:
You are a data worker. You have 25M rows with a bunch of dimensions (let's say about 10), and you would like to do some analysis on them.
Now, you are a little bit technical: not enough to be a DBA, but enough to read about databases, maybe install MySQL, and, guess what?
Maybe even load a MySQL table with those 25M rows that you had in (...... wait for it..... an Excel file).
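If you want a feel for that loading step, here is a minimal sketch. I'm using Python's built-in sqlite3 as a stand-in for MySQL so the snippet runs anywhere; the table name and columns are my own invention, and in the real scenario you'd export the Excel sheet to CSV first (or read the .xlsx with a library such as pandas):

```python
import sqlite3

# Stand-in for a MySQL load: an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_date TEXT, region TEXT, amount REAL)")

# A couple of invented sample rows standing in for the 25M real ones.
sample = [
    ("2016-01-05", "East", 120.0),
    ("2016-01-06", "West", 80.5),
]
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", sample)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
```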
So, doing what a typical user like that would do, I loaded these 25M rows into a bunch of databases. Below is my analysis of a few common queries you might run on this data, across four different databases, with some results for you to ponder.
I created 4 charts on this dataset.
As you can see above, I have a count of rows on the top left, a trend over time on the top right, a stacked chart on the bottom left, and a scatter plot on the bottom right.
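Under the hood, the four panels boil down to a handful of aggregate queries, and that's what the databases are really being timed on. Here is roughly what they look like, run against a tiny invented sample (table and column names are mine, not from the original dataset):

```python
import sqlite3

# Rough SQL behind each of the four dashboard panels.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales "
    "(order_date TEXT, region TEXT, category TEXT, amount REAL, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        ("2016-01-05", "East", "Office", 120.0, 3),
        ("2016-02-06", "West", "Tech", 80.5, 1),
        ("2016-02-09", "East", "Tech", 45.0, 2),
    ],
)

# Top left: count of rows.
n_rows = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]

# Top right: trend over time (rows per month).
trend = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, COUNT(*) "
    "FROM sales GROUP BY month ORDER BY month"
).fetchall()

# Bottom left: stacked chart (one measure broken down by two dimensions).
stacked = conn.execute(
    "SELECT region, category, SUM(amount) FROM sales GROUP BY region, category"
).fetchall()

# Bottom right: scatter plot (two measures per dimension member).
scatter = conn.execute(
    "SELECT region, SUM(amount), SUM(qty) FROM sales GROUP BY region"
).fetchall()
```

Every one of these is a scan-and-aggregate over a few columns of a wide table, which is precisely the workload a columnar format is built for.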
Then, I ran this dashboard on all 4 databases, and here are the results:
Now, as you can see, there are HUGE performance gains when you use a Tableau Data Extract in the scenario above. Quite frankly, you will probably get pretty awesome performance gains compared to most databases out there.
But please, please (I beg you), don't use TDEs as a silver bullet for your performance issues.
I have seen customers create extracts without giving any thought to the question: "how do we manage these data extracts?"
See, a TDE is a copy of your data, and once you have created a copy, you need to ensure you can trust it as much as you trust the original source.
This means understanding things such as:
- when (and when not) to create another copy of your data,
- how to validate this copy as "certified" so it can be trusted as a source of truth,
- how to refresh this copy,
- and about 10 more considerations.
In other words, don't create TDE chaos. Whichever way you go, extracts need to be managed just like any other data source.
I cannot emphasize this enough. Please don't create a mess that you can't manage and then hand it over to someone else to deal with.
(Oh, before I forget: I completely understand that I didn't put any effort into optimizing the tables I loaded into the databases, and I understand that makes my results look more favourable for the TDE.)
But consider this: I value time-to-market as an important variable when comparing any two tools or techniques, and in this case, the time it would take to perfectly tune a few tables for a combination of queries (which might require me to create 5 indexes) wasn't worth spending. So, I did what I could in the least amount of time, which was:
load the data and then start analysing.
Be safe out there.