So, you might have heard that we launched multiple table extract functionality in 2018.3
I wanted to share my experiences with the beta build so far. Who knows the actual build might make this experience even better 🙂
Data Setup
Let’s understand the tables first
- There are total of 2.7 million rows in the Security_SalesFact_Tall table (each row is an order)
- Each order has a zip-code attribute and an employee attribute
- There is an alignment table (Security_ZipEmp_Alignment)
- that has emp to zip-code assignments (2445 rows)
- The OrgMaster_Employees has list of all employees
- more than 300k+
- Out of these 300k+ rows, only 603 employees has order data
- The Ref_ZipCodes has list of all zip-codes
- more than 42k+
- Out of 42k+ zip-codes, only 579 zip-codes have orders
Prior to this functionality
Extracted this data using the “Single table”
- creates a total of 11.4 million rows
- create a .hyper file of 275 MB
- takes 500 seconds to create
With this functionality
Extracted this data using the “Multiple table”
- creates a total of 3 million rows (you see that during extract creation, the count goes up to 2.7m (fact table), then starts from 0 and goes up to 300k (emp dim)… you get the picture)
- when you do SUM([Number of Records]), you will see exact same number as in the denormalized extract (11.4 m)
- create a .hyper file of 30M
- takes 80 seconds to create
Quick Comparison
Faster extract creation – 84% reduction in this case
Smaller extract sizes – 90% reduction in this case
- The queries are returning the same results
- Examples below with
All Data
With One State
With One Employee
Summary
I am loving the first implementation of multiple tables so far, that exceeded the expectation I had of a beta release