5 real-world problems, explained step by step
All 5 problems at a glance
Universal PCA formula (same for every problem)
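As a minimal sketch, assuming the universal formula refers to the standard PCA pipeline (standardize, build the covariance matrix, eigendecompose, project onto the top k eigenvectors), the same steps can be written once and reused for every problem. The function name `pca` and the random 8-feature data are illustrative only.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    # 1. Standardize each feature to zero mean and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # 2. Covariance of the standardized data (this equals the correlation matrix).
    C = np.cov(Z, rowvar=False)
    # 3. Eigendecomposition; eigh is meant for symmetric matrices and
    #    returns eigenvalues in ascending order, so reorder to descending.
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # 4. Keep the top-k eigenvectors and project the data onto them.
    W = eigvecs[:, :k]
    scores = Z @ W
    # Fraction of total variance retained by the k components.
    retained = eigvals[:k].sum() / eigvals.sum()
    return scores, W, retained

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))  # synthetic stand-in for an 8-feature dataset
scores, W, retained = pca(X, k=3)
print(scores.shape, W.shape, round(float(retained), 3))
```

The columns of `W` are the component directions, and `retained` is the variance-retained figure that the comparisons in this guide refer to.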
Variance retained / accuracy — comparison
Original features (8 dimensions)
Step-by-step solution
What each component means
| Component | Driven by | Business meaning |
|---|---|---|
| PC1 | income, order_value, spending_score | Purchasing Power (wealth) |
| PC2 | page_views, email_open_rate | Digital Engagement |
| PC3 | age, days_since_last | Customer Lifecycle stage |
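
One way to arrive at a "driven by" column like the one above is to rank features by the absolute value of their loadings on each component. The sketch below assumes that approach. The loading values are invented for illustration and are not the page's actual numbers, only the seven features named in the table are included, and `top_features` and its `min_abs` cutoff are hypothetical.

```python
import numpy as np

features = ["income", "order_value", "spending_score",
            "page_views", "email_open_rate", "age", "days_since_last"]

# Hypothetical loadings (rows = features, columns = PC1..PC3).
# These values are made up to illustrate the ranking step.
loadings = np.array([
    [0.58, 0.05, 0.02],
    [0.55, 0.10, -0.04],
    [0.52, -0.08, 0.06],
    [0.07, 0.70, 0.03],
    [-0.03, 0.68, -0.05],
    [0.04, 0.02, 0.72],
    [-0.06, -0.04, 0.67],
])

def top_features(loadings, names, n=3, min_abs=0.3):
    """For each component, list up to n features whose |loading| >= min_abs."""
    result = {}
    for j in range(loadings.shape[1]):
        col = np.abs(loadings[:, j])
        order = np.argsort(col)[::-1][:n]  # strongest loadings first
        result[f"PC{j + 1}"] = [names[i] for i in order if col[i] >= min_abs]
    return result

drivers = top_features(loadings, features)
for component, names in drivers.items():
    print(component, "is driven by", ", ".join(names))
```

The business label (for example "Purchasing Power") is still a human interpretation of the feature list. PCA supplies only the loadings.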
Key numbers
Choosing k — quality vs compression tradeoff
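A common way to make this tradeoff concrete is to pick the smallest k whose cumulative explained-variance ratio reaches a target. The sketch below assumes that rule. The eigenvalues are illustrative, and `choose_k` and its `threshold` parameter are hypothetical names.

```python
import numpy as np

def choose_k(eigenvalues, threshold=0.95):
    """Smallest k whose cumulative explained-variance ratio reaches threshold."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    cumulative = np.cumsum(vals) / vals.sum()
    # Index of the first cumulative ratio >= threshold, converted to a count.
    k = int(np.searchsorted(cumulative, threshold) + 1)
    # Guard against floating-point round-off when threshold is 1.0.
    return min(k, len(vals)), cumulative

eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.3, 0.2]  # illustrative values only
k90, cumulative = choose_k(eigenvalues, 0.90)
k95, _ = choose_k(eigenvalues, 0.95)
print(k90, k95)  # 4 5
```

Raising the threshold keeps more components (higher quality, less compression), and lowering it does the opposite.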
Step-by-step process
Hidden correlations (why PCA helps)
| Feature pair | Correlation | What PCA does |
|---|---|---|
| total_cholesterol + hdl + ldl | r = 0.95 | Merges into 1 component |
| blood_pressure + heart_rate | r = 0.78 | Partially merges |
| bmi + waist_hip_ratio | r = 0.82 | Merges into 1 component |
| glucose + triglycerides | r = 0.69 | Partially merges |
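
For two standardized features with correlation r, the correlation matrix has eigenvalues 1 + r and 1 − r, so the first component carries (1 + r) / 2 of the pair's variance. The sketch below checks this numerically for the two-feature rows of the table. It treats each pair in isolation, which simplifies the full multi-feature problem, and `pc1_share` is a hypothetical helper.

```python
import numpy as np

def pc1_share(r):
    """Share of variance on PC1 for two standardized features with correlation r."""
    corr = np.array([[1.0, r], [r, 1.0]])
    eigvals = np.linalg.eigvalsh(corr)  # ascending order: 1 - r, 1 + r
    return eigvals[-1] / eigvals.sum()

# Correlations taken from the two-feature rows of the table above.
pairs = {
    "blood_pressure + heart_rate": 0.78,
    "bmi + waist_hip_ratio": 0.82,
    "glucose + triglycerides": 0.69,
}
shares = {name: pc1_share(r) for name, r in pairs.items()}
for name, share in shares.items():
    print(f"{name}: PC1 carries {share:.1%} of the pair's variance")
```

The higher the correlation, the less is lost by replacing the pair with a single component. That is the sense in which PCA "merges" correlated features.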
PCA components — each has a clinical meaning
Model performance: before vs after PCA
5 hidden factors PCA discovers
Factor 2 stock loadings (Tech vs Energy axis)
The scale of the problem
PCA discovers hidden spam topics
| Component | Top words | Spam topic |
|---|---|---|
| PC1 | free, win, prize, click, cash, winner | PRIZE_SPAM |
| PC2 | buy, cheap, discount, sale, deal, save | COMMERCIAL_SPAM |
| PC3 | viagra, pills, pharmacy, prescription, drug | PHARMA_SPAM |
| PC4 | bank, account, verify, password, login | PHISHING |
| PC5 | million, inheritance, nigeria, transfer, funds | SCAM_SPAM |
Accuracy vs number of components k