Statistical Analysis using SQL.
The mighty SELECT statement is quite versatile. In addition to cleaning/wrangling data, below are some of the ways it helps in initial statistical analysis:
Measures of Central Tendency:
➡Mean using aggregate function AVG
➡Median using window function PERCENTILE_CONT
➡Mode (single) using TOP and ORDER BY clause
➡Mode (bi-modal or multi-modal) using TOP (with TIES) and ORDER BY clause
Measures of Variability:
➡Range using aggregate functions MIN and MAX
➡Standard Deviation using statistical function STDEV
➡Variance using statistical function VAR
➡Quartiles using window function PERCENTILE_CONT
➡IQR/Z-Score using combination of Quartiles and CTE
Distribution Type:
➡Uniform (mean = median = mode)
➡right skewed (mean > median > mode)
or
➡left skewed (mean < median < mode)
Distribution Pattern:
➡Creating a Histogram (aka Frequency Distribution table) using CASE WHEN statement and CTE
Note: Syntax/Function name differs slightly depending on the SQL flavor.