OK, we come to highly highly highly (yes, its worth saying it 3 times) issue with this one.
COUNTD is an awesome function. This is a great way to get visibility into your business like its nobody’s business. COUNTD is typically done at run-time as well.
In business parlance, it means that if you want to understand your “number of unique customers for month A” and then compare it to “number of unique customers for month B” and also have “number of unique customers for both months A&B together”, you will want to use COUNTD
example, I have 100 unique customers in Jan, I also have 90 unique customers in Feb.
Now, if I have 180 unique customers across those 2 months, I have an issue. We don’t have a lot of repeat customers
However, if I have 110 unique customers across those 2 months, I don’t have a lot of new customers. For this type of analysis, COUNTD is typically not stored at the database layer as you can’t predict how you would want to analyze the data. So, COUNTD is typically done at the time of analysis.
BUT (yes, there is always a BUT), COUNTD is an expensive operation. This requires your application to basically sort the entire data and then COUNT at the time of analysis. My advice in this case is you understand your data and if you have too many unique values (let’s say over 100K), then stay away from COUNTD
Or, at least make your audience be aware that COUNTD can be sometimes slow but it can provide you with very useful information. You might be surprised how many business users will be happy just understanding the reason behind the possible slowness of this operation