Data Science Track

Goal
Focus: Extracting truth from noise.
This phase emphasizes statistical reasoning and machine learning techniques used to distinguish real signals from randomness, uncertainty, and misleading patterns in data.
Curriculum
Inferential Statistics
Topics:
- Hypothesis Testing
- Two Population Parameters
- Population Variances
- ANOVA
Resources:
- Business Statistics: A Decision-Making Approach (10th Ed.)
This section covers the statistical tools used to draw conclusions about populations based on sampled data. Learners study how assumptions, variability, and uncertainty affect conclusions, and how to formally test claims using data.
Emphasis is placed on understanding:
- How hypotheses are formulated and tested
- How to compare groups and populations
- How variance affects the reliability of conclusions
- When observed differences are statistically meaningful
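As a rough illustration of how group comparison and variance fit together, the sketch below computes a one-way ANOVA F statistic by hand using only the Python standard library. The function name `one_way_anova_f` and the sample values are hypothetical and chosen for easy arithmetic; they do not come from the textbook.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)        # number of groups
    n = len(all_values)    # total number of observations

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Hypothetical data: three small samples with equal spread and shifted means.
samples = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
f_stat = one_way_anova_f(samples)
print(f"F = {f_stat:.2f} with df = (2, 6)")
```

For these made-up samples the between-group and within-group mean squares are 3 and 1, so F = 3.0. That is below the 5% critical value of about 5.14 for (2, 6) degrees of freedom, so the shift in means would not be judged statistically significant at that level despite looking systematic.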
Evaluation Metrics
Resources:
- Metrics for Machine Learning Model
Metrics covered:
- Precision
- Recall
- F1-Score
- Brier Score
- Matthews Correlation Coefficient (MCC)
- Additional classification and regression metrics as needed
This section deepens understanding of how model performance is measured, especially in noisy or imbalanced datasets. Learners compare metrics, interpret trade-offs, and analyze how different metrics highlight different types of model errors.
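To make the trade-offs concrete, here is a minimal sketch that computes precision, recall, F1, MCC, and the Brier score from scratch for a small imbalanced binary problem. The labels, predicted probabilities, the 0.5 threshold, and the helper names are all illustrative assumptions, not part of the course material.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute precision, recall, F1 and MCC for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # MCC uses all four cells of the confusion matrix.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical imbalanced data: 3 positives, 7 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
brier = brier_score(y_true, y_prob)
print(metrics, brier)
```

In this toy case accuracy is 0.8, while precision, recall, and F1 are each about 0.67 and MCC is about 0.52, which shows how metrics that account for the minority class can tell a less flattering story than accuracy alone. The Brier score evaluates the probabilities themselves rather than the thresholded labels.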
Unsupervised Learning – Clustering
Resources:
- An Introduction to Statistical Learning
- K-Means Clustering
- DBSCAN Clustering
Techniques:
- K-Means
- DBSCAN
This section introduces methods for discovering structure in unlabeled data. Learners study how clustering algorithms group data points based on similarity and density, and how parameter choices affect results.
Topics include:
- Distance measures and similarity
- Cluster shapes and density assumptions
- Strengths and limitations of each method
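The assign-then-update loop behind K-Means can be sketched in a few lines of standard-library Python. This is a simplified version under stated assumptions: Euclidean distance, random initialization from the data points, and a fixed seed. The function name `kmeans` and the sample points are hypothetical, and a library implementation would normally be preferred in practice.

```python
import math
import random


def kmeans(points, k, n_iters=100, seed=0):
    """Minimal Lloyd-style K-Means on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k random data points
    labels = [0] * len(points)

    for _ in range(n_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids

    return centroids, labels


# Hypothetical data: two compact, well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(centroids, labels)
```

Because K-Means assigns points by distance to a centroid, it favors compact, roughly spherical clusters and requires `k` up front. DBSCAN instead grows clusters from dense neighborhoods, so it can follow irregular shapes and mark sparse points as noise, at the cost of choosing a neighborhood radius and a minimum point count.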
Visualization
Resources:
- Python for Data Visualization
Visualization is used throughout the curriculum to support interpretation and communication of results. Learners practice using visual tools to explore data distributions, cluster structures, and model outputs, helping to surface patterns and anomalies that may not be obvious numerically.
Machine Learning
Resources:
- Stanford CS229: Machine Learning Course
- Dimensionality Reduction in Machine Learning
This section introduces core machine learning concepts at a theoretical level, including supervised and unsupervised methods, optimization objectives, and generalization. The focus is on understanding assumptions, trade-offs, and mathematical foundations rather than implementation details.
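Although the emphasis here is theoretical, a tiny worked example can make the idea of an optimization objective tangible. The sketch below minimizes mean squared error for a one-variable linear model with plain gradient descent. The function name `fit_line_gd`, the learning rate, the step count, and the noise-free data are all illustrative assumptions rather than anything prescribed by the course.

```python
def fit_line_gd(xs, ys, learning_rate=0.05, n_steps=2000):
    """Fit y ~ w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(n_steps):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of (1/n) * sum(residual ** 2) with respect to w and b.
        grad_w = 2 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2 / n * sum(residuals)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line_gd(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

On this data the iterates approach w = 2 and b = 1. With real, noisy data the training objective can always be driven lower by a more flexible model, which is where the question of generalization to unseen data comes in.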
Capstone
Apply your data science skills to a real-world problem provided by a mentor, integrating statistics, evaluation metrics, and machine learning techniques.