Data Science Track

Goal

Focus: Extracting truth from noise.

This phase emphasizes statistical reasoning and machine learning techniques used to distinguish real signals from randomness, uncertainty, and misleading patterns in data.

Curriculum

Inferential Statistics

Topics:

  • Hypothesis Testing
  • Two Population Parameters
  • Population Variances
  • ANOVA

Resources:

  • Business Statistics: A Decision-Making Approach (10th Ed.) (book, advanced, 20-30 hours)

This section covers the statistical tools used to draw conclusions about populations based on sampled data. Learners study how assumptions, variability, and uncertainty affect conclusions, and how to formally test claims using data.

Emphasis is placed on understanding:

  • How hypotheses are formulated and tested
  • How to compare groups and populations
  • How variance impacts reliability of conclusions
  • When observed differences are statistically meaningful
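
As a rough illustration of how a claim about group means can be tested, the sketch below computes the one-way ANOVA F statistic by hand. The function name `one_way_anova_f` and the three sample groups are hypothetical, made up for this example; they are not taken from the textbook.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three groups (illustration only).
group_a = [4.0, 5.0, 6.0]
group_b = [6.0, 7.0, 8.0]
group_c = [8.0, 9.0, 10.0]

f_stat, df_b, df_w = one_way_anova_f([group_a, group_b, group_c])
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

With these numbers the group means are 5, 7, and 9, the between-group mean square is 12, and the within-group mean square is 1, so F = 12 with (2, 6) degrees of freedom. The decision step would then compare this value with the critical value of the F distribution at the chosen significance level.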

Evaluation Metrics

Resources:

  • Metrics for Machine Learning Model (article, beginner, 30-45 minutes)

Metrics covered:

  • Precision
  • Recall
  • F1-Score
  • Brier Score
  • Matthews Correlation Coefficient (MCC)
  • Additional classification and regression metrics as needed

This section deepens understanding of how model performance is measured, especially in noisy or imbalanced datasets. Learners compare metrics, interpret trade-offs, and analyze how different metrics highlight different types of model errors.
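
To make the trade-offs concrete, here is a minimal sketch that computes several of the listed metrics from scratch on a small imbalanced example. The helper names (`classification_metrics`, `brier_score`) and the labels and probabilities are assumptions chosen for illustration; in practice a library implementation would normally be used.

```python
import math


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, F1, and MCC for binary labels (0/1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against empty denominators; returning 0.0 is a convention, not a rule.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical imbalanced data: 2 positives, 8 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]

metrics = classification_metrics(y_true, y_pred)
brier = brier_score(y_true, y_prob)
print(metrics)
print(f"Brier score = {brier:.3f}")
```

On this data accuracy is 0.8 while precision, recall, and F1 are all 0.5 and MCC is 0.375, which shows how accuracy alone can look better than the model's handling of the rare class.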

Unsupervised Learning – Clustering

Resources:

  • An Introduction to Statistical Learning (book, intermediate, comprehensive)
  • K-Means Clustering (video, beginner, 10-15 minutes)
  • DBSCAN Clustering (video, beginner, 10-15 minutes)

Techniques:

  • K-Means
  • DBSCAN

This section introduces methods for discovering structure in unlabeled data. Learners study how clustering algorithms group data points based on similarity and density, and how parameter choices affect results.
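
A minimal sketch of K-Means (Lloyd's algorithm) may help show how similarity-based grouping works. It assumes Euclidean distance and caller-supplied starting centroids so the run is deterministic; the function name `kmeans`, the points, and the starting centroids are all hypothetical.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Cluster points by alternating assignment and centroid-update steps."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # Converged: centroids stopped moving.
        centroids = new_centroids
    return labels, centroids


# Hypothetical 2D data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (8.0, 9.0), (9.0, 8.0)]
# Deliberately poor start: both centroids begin inside the first group.
labels, centroids = kmeans(points, centroids=[(1.0, 1.0), (2.0, 1.0)])
print(labels)
print(centroids)
```

Even with both starting centroids placed in the same group, the two steps pull one centroid toward the second group within a few iterations. The number of clusters is fixed by how many starting centroids are passed in, which is the main parameter choice for this method.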

Topics include:

  • Distance measures and similarity
  • Cluster shapes and density assumptions
  • Strengths and limitations of each method
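
The density assumption can be illustrated with a small DBSCAN sketch. It is a simplified, brute-force version (every neighbor query scans all points), and the name `dbscan`, the sample points, and the `eps` and `min_pts` values are assumptions picked for this example.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster index, or NOISE if it is in no dense region."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # May be relabeled later as a border point.
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # Border point: joins but does not expand.
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # Core point: keep growing the cluster.
    return labels


# Hypothetical data: two dense squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 5)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

The isolated point at (5, 5) is labeled as noise rather than forced into a cluster. Shrinking `eps` or raising `min_pts` would eventually turn the dense squares into noise as well, which is one way parameter choices change the result.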

Visualization

Resources:

  • Python for Data Visualization (course, beginner, 4-6 weeks)

Visualization is used throughout the curriculum to support interpretation and communication of results. Learners practice using visual tools to explore data distributions, cluster structures, and model outputs, helping to surface patterns and anomalies that may not be obvious numerically.

Machine Learning

Resources:

  • Stanford CS229: Machine Learning Course (course, advanced, 20-30 hours)
  • Dimensionality Reduction in Machine Learning (article, beginner, 15-20 minutes)

This section introduces core machine learning concepts at a theoretical level, including supervised and unsupervised methods, optimization objectives, and generalization. The focus is on understanding assumptions, trade-offs, and mathematical foundations rather than implementation details.
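
Although the emphasis here is theoretical, a small sketch can make "optimization objective" and "generalization" less abstract. It minimizes a mean-squared-error objective for a one-variable linear model with batch gradient descent, then compares error on training and held-out points. The function names, learning rate, step count, and data are all hypothetical choices for illustration and are not taken from the course.

```python
def fit_linear_gd(xs, ys, lr=0.05, steps=2000):
    """Fit y ≈ w*x + b by gradient descent on J(w, b) = (1/2n) * sum((w*x + b - y)^2)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of J with respect to w and b.
        grad_w = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum((w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the fitted line on a dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical noisy observations of a roughly linear relationship.
x_train = [0.0, 1.0, 2.0, 3.0, 4.0]
y_train = [1.1, 2.9, 5.2, 6.8, 9.0]
# Held-out points the model never sees during fitting.
x_test = [5.0, 6.0]
y_test = [11.0, 13.1]

w, b = fit_linear_gd(x_train, y_train)
train_mse = mse(x_train, y_train, w, b)
test_mse = mse(x_test, y_test, w, b)
print(f"w = {w:.3f}, b = {b:.3f}")
print(f"train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

The fitted parameters converge to the ordinary least-squares solution for this data, and the held-out error comes out somewhat higher than the training error. That gap between training and held-out error is what generalization analysis tries to quantify.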

Capstone

Apply your data science skills to a real-world problem provided by a mentor, integrating statistics, evaluation, and machine learning techniques.