Tuesday 23 June 2020

1. Improve readability of your code by following a style guide

2. Avoid sub-queries where possible and create temporary tables, views or use CTEs (common table expressions) like "WITH" (TBH I've never had to use recursive CTE before so you should be fine!)

3. Choose the right primary index column(s) - use columns you will join on or use in a group by and make sure the column has low skew i.e. there is a good distribution of values - to check, simply run `select potential_column, count(*) from table group by 1 order by 2 desc`

4. Know the limitations for temp tables in different systems - e.g. in T-SQL you can collect statistics on temporary tables whilst in Redshift, you need to create the table using DDL then use "into" syntax to retain the original column encodings - learn the intricacies of the system to unlock maximum efficiency

5. Don't trust that the optimiser will automatically do the right thing when joining huge tables - manually filter only the data you need and put them into correctly partitioned temporary tables, then join them!

No comments:

Post a Comment