Tuesday, 23 June 2020

1. Improve readability of your code by following a style guide

2. Avoid sub-queries where possible and create temporary tables, views or use CTEs (common table expressions) like "WITH" (TBH I've never had to use recursive CTE before so you should be fine!)

3. Choose the right primary index column(s) - use columns you will join on or use in a group by and make sure the column has low skew i.e. there is a good distribution of values - to check, simply run `select potential_column, count(*) from table group by 1 order by 2 desc`

4. Know the limitations for temp tables in different systems - e.g. in T-SQL you can collect statistics on temporary tables whilst in Redshift, you need to create the table using DDL then use "into" syntax to retain the original column encodings - learn the intricacies of the system to unlock maximum efficiency

5. Don't trust that the optimiser will automatically do the right thing when joining huge tables - manually filter only the data you need and put them into correctly partitioned temporary tables, then join them!

Thursday, 27 February 2020

Knowledge Discovery Overview


Hello All,

I tried to provide an overview of stages involved in knowledge discovery in below video.
This is intended for non-monetary and free knowledge sharing/community mentoring purpose.

https://nusbact.com/2020/02/27/2019-2020-semester-2---session-2/



Thanks to NUS BACT[Business Analytics Consulting Team] for hosting the video on knowledge discovery.

Your feedback is highly appreciated for further improvement/refinement.

Thank you,


Best Regards,
Shriranga Kulkarni.
https://www.linkedin.com/in/shrirangakulkarni/