Data Science Track

Goal

Focus: Extracting truth from noise.

This phase emphasizes statistical reasoning and machine learning techniques used to distinguish real signals from randomness, uncertainty, and misleading patterns in data.

Curriculum

Inferential Statistics

Topics:

Hypothesis Testing
Comparing Two Population Parameters
Inference About Population Variances
Analysis of Variance (ANOVA)

Resource: Business Statistics: A Decision-Making Approach (10th Ed.)
Type: book | Level: advanced | Estimated time: 20-30 hours

This section covers the statistical tools used to draw conclusions about populations from sample data. Learners study how assumptions, variability, and uncertainty affect those conclusions, and how to test claims formally against data.

Emphasis is placed on understanding:

How hypotheses are formulated and tested
How to compare groups and populations
How variance affects the reliability of conclusions
When observed differences are statistically meaningful
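To make the tests above concrete, here is a minimal sketch that uses only the Python standard library. It computes a Welch t statistic for two independent samples, a two-sided permutation-test p-value, an F ratio of sample variances, and a one-way ANOVA F statistic. The permutation test is swapped in for a t-table lookup so the example needs no extra packages; in practice a library such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`, `scipy.stats.f_oneway`) would supply p-values from the t and F distributions. The store sales figures and function names are hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two independent samples."""
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / se


def permutation_p_value(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test of H0: both samples come from the same distribution.

    The p-value is the share of random relabelings whose absolute difference in
    means is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Counting the observed labeling itself (+1) keeps the p-value above zero.
    return (hits + 1) / (n_resamples + 1)


def variance_ratio(a, b):
    """F statistic for comparing two population variances (larger sample variance on top)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


def one_way_anova_f(*groups):
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    values = [x for g in groups for x in g]
    grand_mean = statistics.mean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical daily sales (in $1,000s) for two stores.
store_a = [12.1, 11.8, 12.6, 12.0, 12.4, 11.9]
store_b = [12.9, 13.1, 12.7, 13.4, 12.8, 13.0]
print("Welch t:", welch_t(store_a, store_b))
print("permutation p-value:", permutation_p_value(store_a, store_b))
print("variance ratio F:", variance_ratio(store_a, store_b))
print("ANOVA F:", one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```

A small p-value says the observed gap in means would be unusual if the two stores really shared one sales distribution; a large F in ANOVA says the group means differ by more than the within-group spread would explain.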

Evaluation Metrics

Resource: Metrics for Machine Learning Model
Type: article | Level: beginner | Estimated time: 30-45 minutes

Metrics covered:

Precision
Recall
F1-Score
Brier Score
Matthews Correlation Coefficient (MCC)
Additional classification and regression metrics as needed

This section deepens understanding of how model performance is measured, especially in noisy or imbalanced datasets. Learners compare metrics, interpret trade-offs, and analyze how different metrics highlight different types of model errors.
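The metrics listed above can be computed by hand from a confusion matrix. The sketch below is a minimal from-scratch version for binary labels (1 = positive), assuming a hypothetical imbalanced test set; scikit-learn's `sklearn.metrics` module provides library versions (`precision_score`, `recall_score`, `f1_score`, `brier_score_loss`, `matthews_corrcoef`). Returning 0.0 when a denominator is zero is a common convention, not the only one.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0  # of predicted positives, how many are real
    recall = tp / (tp + fn) if tp + fn else 0.0     # of real positives, how many were found
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # MCC uses all four cells, so it stays informative when classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "mcc": mcc}


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical imbalanced test set: 2 positives out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.2, 0.1, 0.1, 0.0, 0.3, 0.1, 0.2]
print(classification_metrics(y_true, y_pred))
print(classification_metrics(y_true, [0] * 10))  # "always negative" baseline: 80% accurate, yet F1 and MCC are 0
print("Brier:", brier_score(y_true, y_prob))
```

The "always negative" baseline shows why accuracy alone misleads on imbalanced data: it scores 80% accuracy while finding none of the positives. The Brier score, unlike the other four, evaluates predicted probabilities rather than hard labels.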

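Unsupervised Learning – Clustering

Resource: An Introduction to Statistical Learning

Resource: K-Means Clustering

Resource: DBSCAN Clustering

Techniques: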

K-Means
DBSCAN

This section introduces methods for discovering structure in unlabeled data. Learners study how clustering algorithms group data points based on similarity and density, and how parameter choices affect results.
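As a minimal sketch of similarity-based grouping, the code below implements Lloyd's algorithm, the standard K-Means procedure, on 2-D points stored as tuples with Euclidean distance. Initializing from k randomly chosen data points is a simplifying assumption; libraries such as scikit-learn's `KMeans` default to k-means++ seeding. The point coordinates are hypothetical.

```python
import math
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm for K-Means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # simple initialization: k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of the points assigned to it.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids


# Two hypothetical, well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, centroids = kmeans(points, k=2)
print(labels, sorted(centroids))
```

Because every point is assigned to its nearest mean, K-Means favors compact, roughly spherical clusters, and the result depends on the chosen k and on initialization.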

Topics include:

Distance measures and similarity
Cluster shapes and density assumptions
Strengths and limitations of each method
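The density assumption behind DBSCAN can be sketched the same way. The version below is a minimal, brute-force implementation (every neighbor query scans all points, so it is O(n²)); the parameter names `eps` and `min_samples` follow scikit-learn's `DBSCAN`, where `min_samples` also counts the point itself and noise is labeled -1. The data is hypothetical.

```python
import math


def dbscan(points, eps, min_samples):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise.

    A point is a core point if at least `min_samples` points (itself included)
    lie within distance `eps`. Clusters grow by chaining core points together;
    non-core points reachable from a core point become border points.
    """
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster_id
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_samples:
                queue.extend(j_neighbors)  # j is a core point: keep expanding
        cluster_id += 1
    return labels


# Two hypothetical dense groups plus one isolated outlier.
points = [(0, 0), (0, 0.5), (0.5, 0), (0.5, 0.5),
          (5, 5), (5, 5.5), (5.5, 5), (5.5, 5.5),
          (20, 20)]
print(dbscan(points, eps=1.0, min_samples=3))
```

Unlike K-Means, DBSCAN does not need the number of clusters in advance, can follow non-spherical shapes, and marks sparse points as noise; in exchange, its output is sensitive to `eps` and `min_samples`, and a single `eps` struggles when clusters have very different densities.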

Visualization

Resource: Python for Data Visualization
Type: course | Level: beginner | Estimated time: 4-6 weeks

Visualization is used throughout the curriculum to support interpretation and communication of results. Learners practice using visual tools to explore data distributions, cluster structures, and model outputs, helping to surface patterns and anomalies that may not be obvious numerically.
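A minimal sketch of the two most common exploratory views, assuming matplotlib is installed: a histogram for a variable's distribution and a scatter plot of 2-D points colored by cluster label. The data is synthetic and the helper name `explore_plots` is hypothetical; the non-interactive "Agg" backend is chosen so the figure can be rendered to a PNG without a display.

```python
import io
import random

import matplotlib
matplotlib.use("Agg")  # render without a display (e.g., on a server or in CI)
import matplotlib.pyplot as plt


def explore_plots(values, points, labels):
    """Draw a histogram of one variable and a scatter plot of 2-D points
    colored by cluster label; return the figure."""
    fig, (ax_hist, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
    ax_hist.hist(values, bins=20)
    ax_hist.set_title("Distribution")
    ax_hist.set_xlabel("value")
    ax_hist.set_ylabel("count")
    xs, ys = zip(*points)
    ax_scatter.scatter(xs, ys, c=labels, cmap="viridis")
    ax_scatter.set_title("Clusters")
    ax_scatter.set_xlabel("x")
    ax_scatter.set_ylabel("y")
    fig.tight_layout()
    return fig


# Synthetic data: one normally distributed variable and two 2-D blobs.
rng = random.Random(0)
values = [rng.gauss(0, 1) for _ in range(500)]
points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)) for cx, cy in [(0, 0), (4, 4)] for _ in range(50)]
labels = [0] * 50 + [1] * 50
fig = explore_plots(values, points, labels)
buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Skew, multiple peaks, or outliers that are easy to miss in summary statistics usually stand out immediately in the histogram, and the scatter plot shows at a glance whether cluster labels line up with visible groups.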

Machine Learning

Resource: Stanford CS229: Machine Learning Course
Type: course | Level: advanced | Estimated time: 20-30 hours

Resource: Dimensionality Reduction in Machine Learning
Type: article | Level: beginner | Estimated time: 15-20 minutes

This section introduces core machine learning concepts at a theoretical level, including supervised and unsupervised methods, optimization objectives, and generalization. The focus is on understanding assumptions, trade-offs, and mathematical foundations rather than implementation details.
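The ideas of an optimization objective and generalization can be made concrete with a small sketch: fit a line by batch gradient descent on mean squared error, then compare error on the training data with error on held-out data. The synthetic data (drawn from y = 3x + 2 plus Gaussian noise), the learning rate, and the step count are illustrative assumptions, not values from the course.

```python
import random


def mse(w, b, data):
    """Mean squared error of the line y = w*x + b on (x, y) pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit_gradient_descent(data, lr=0.01, n_steps=2000):
    """Minimize the MSE objective over (w, b) with batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(n_steps):
        # Partial derivatives of the MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


rng = random.Random(42)
# Synthetic data from y = 3x + 2 plus Gaussian noise (hypothetical example).
xs = [rng.uniform(0, 5) for _ in range(200)]
data = [(x, 3 * x + 2 + rng.gauss(0, 1)) for x in xs]
rng.shuffle(data)
train, test = data[:150], data[150:]  # hold out 50 points to estimate generalization

w, b = fit_gradient_descent(train)
print(f"w={w:.2f}, b={b:.2f}")
print(f"train MSE={mse(w, b, train):.3f}, test MSE={mse(w, b, test):.3f}")
```

A held-out error close to the training error suggests the model generalizes; a test error far above the training error is the usual signature of overfitting. The learning rate matters: too large and the updates diverge, too small and convergence takes many more steps.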

Capstone

Apply your data science skills to a real-world problem provided by a mentor, integrating statistics, evaluation, and machine learning techniques.
