Pooja Chavan
Feb 1, 2024

Statistical Analysis using SQL.

The mighty SELECT statement is quite versatile. In addition to cleaning/wrangling data, below are some of the ways it helps in initial statistical analysis:

Measures of Central Tendency:
➡Mean using aggregate function AVG
➡Median using window function PERCENTILE_CONT
➡Mode (single) using TOP and ORDER BY clause
➡Mode (bi-modal or multi-modal) using TOP (with TIES) and ORDER BY clause

Measures of Variability:
➡Range using aggregate functions MIN and MAX
➡Standard Deviation using statistical function STDEV
➡Variance using statistical function VAR
➡Quartiles using window function PERCENTILE_CONT
➡IQR/Z-Score using combination of Quartiles and CTE

Distribution Type:
➡Uniform (mean = median = mode)
➡right skewed (mean > median > mode)
or
➡left skewed (mean < median < mode)

Distribution Pattern:
➡Creating a Histogram (aka Frequency Distribution table) using CASE WHEN statement and CTE

Note: Syntax/Function name differs slightly depending on the SQL flavor.

Pooja Chavan

Data Cleaning/Wrangling/Visualization | Excel | SQL |Tableau