Natural Language Processing & Large Language Models Track

Goal
Focus: Master the complete lifecycle of Large Language Models, from foundational architectures and embeddings to building production-ready applications with RAG, agents, and fine-tuning.
Learn how language models understand, generate, and reason with text, and develop the skills to evaluate, optimize, and deploy LLM systems in real-world scenarios.
Curriculum
1. Text Representation & Classical NLP
Key Topics:
- Text preprocessing techniques: tokenization, normalization, stopword removal
- Bag-of-Words representation and TF-IDF weighting
- N-grams and basic statistical language modeling intuition
- Vocabulary construction, dimensionality growth, and sparsity issues
- Core limitations of classical NLP (loss of context, semantics, scalability)
Action Items:
- Implement a full preprocessing pipeline on a sample dataset
- Build BoW and TF-IDF vectors and compare feature distributions
- Experiment with different n-gram sizes and analyze performance impact
- Measure vocabulary size growth and sparsity levels
- Compare classical feature-based models with embedding-based approaches
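The BoW/TF-IDF action items above can be sketched from scratch to expose the mechanics (in practice scikit-learn's `TfidfVectorizer` handles this; the smoothed idf formula below mirrors its default, and the naive tokenizer is a stand-in for a real preprocessing pipeline):

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase + whitespace split; a real pipeline adds stopword
    # removal, punctuation stripping, and stemming/lemmatization.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in tokenized for term in set(doc))
    # Smoothed idf, the variant scikit-learn's TfidfVectorizer uses.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vectors

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats chase dogs",
]
vecs = tfidf_vectors(docs)
```

Note how the rarer term "cat" outweighs the more common "sat" in the first document despite identical term frequency; that down-weighting of common terms is the whole point of the idf factor.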
- Course (beginner): Stanford CS224N — NLP with Deep Learning (Lectures 1–2)
- Tutorial (beginner): Scikit-learn Text Feature Extraction Guide
2. Word Embeddings & Distributional Semantics
Key Topics:
- Distributional hypothesis and meaning-from-context principle
- Word2Vec architectures: CBOW vs Skip-gram
- Negative sampling and efficient training approximation
- GloVe model intuition and global co-occurrence learning
- Embedding geometry, cosine similarity, and semantic relationships
- Bias and social artifacts encoded in vector representations
Action Items:
- Train Word2Vec on a small corpus using CBOW and Skip-gram
- Compare embedding quality using similarity queries
- Visualize embeddings with PCA or t-SNE
- Experiment with negative sampling parameters
- Evaluate and detect bias patterns in trained embeddings
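Before training with Gensim, it helps to see what CBOW and Skip-gram actually predict. A minimal sketch of the two training-pair formats plus cosine similarity (the toy sentence and window size are arbitrary choices for illustration):

```python
import math

def skipgram_pairs(tokens, window=2):
    """(center, context) pairs — Skip-gram predicts each context word from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """(context tuple, center) pairs — CBOW predicts the center from its context."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j] for j in range(max(0, i - window),
                                            min(len(tokens), i + window + 1)) if j != i]
        pairs.append((tuple(context), center))
    return pairs

def cosine(u, v):
    """Cosine similarity, the standard metric for comparing embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

tokens = "the quick brown fox jumps".split()
sg = skipgram_pairs(tokens)   # many small (center, context) examples
cb = cbow_pairs(tokens)       # one averaged-context example per position
```

Skip-gram generates several training examples per position while CBOW generates one, which is part of why Skip-gram tends to do better on rare words at higher training cost.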
- Course (intermediate): Lecture 2: Word Vectors and Word Senses
- Paper (intermediate): Word2Vec Paper
- Paper (intermediate): GloVe Paper (Conceptual Overview)
- Tutorial (intermediate): Practical Word Embeddings with Gensim
3. Attention Mechanisms & Transformers
Key Topics:
- Core intuition behind attention and weighted context aggregation
- Self-attention and token-to-token interaction modeling
- Query, Key, Value (QKV) formulation and attention score computation
- Positional encoding and sequence order representation
- Transformer encoder and decoder architecture components
- Computational efficiency and scalability advantages over RNNs
Action Items:
- Implement a basic self-attention layer from scratch
- Visualize attention weight matrices for sample inputs
- Experiment with positional encodings and observe output changes
- Build a minimal Transformer encoder block
- Benchmark training speed and parallelism of a Transformer block against an RNN baseline
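The "self-attention from scratch" action item can be done in a few lines. A minimal single-head sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, using plain Python lists (identity projection matrices are chosen only to keep the example readable):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d_k) for kr in K]
              for qr in Q]
    weights = [softmax(row) for row in scores]  # each row is a distribution over tokens
    return matmul(weights, V), weights

I2 = [[1.0, 0.0], [0.0, 1.0]]              # identity Wq = Wk = Wv for clarity
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, d_model = 2
out, weights = self_attention(X, I2, I2, I2)
```

Each output row is a weighted average of the value vectors, and every token attends to every other token in one matrix multiply, which is exactly the parallelism advantage over sequential RNN steps.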
- Course (intermediate): Stanford CS231N Deep Learning I 2025 (Lecture 8)
- Paper (intermediate): Attention Is All You Need
- Tutorial (intermediate): Transformer Neural Networks, ChatGPT's foundation
4. Large Language Models (LLMs)
Key Topics:
- Language modeling objective and next-token prediction principle
- End-to-end LLM training pipeline (data, architecture, optimization)
- Pretraining vs fine-tuning roles and interaction
- Causal (GPT-style) vs masked (BERT-style) language models
- Scaling laws and performance vs compute/data trade-offs
- Emergent behaviors from scale (reasoning, in-context learning)
- Common limitations: hallucinations, bias, brittleness, context limits
Action Items:
- Implement a mini GPT-style model on a small dataset
- Compare causal vs masked model behavior on the same task
- Run scaling experiments with different model sizes
- Analyze failure cases and hallucinated outputs
- Benchmark fine-tuned vs base model performance
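The next-token prediction objective is easiest to see in its simplest possible form: a count-based bigram model of p(next | current). An LLM learns the same conditional distribution with a neural network over a much longer context; the average negative log-likelihood below is the loss that pretraining minimizes:

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count-based estimate of p(next | current) from a token stream."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return {cur: {w: c / sum(nxts.values()) for w, c in nxts.items()}
            for cur, nxts in counts.items()}

def avg_nll(model, tokens, eps=1e-9):
    """Average negative log-likelihood of each next token under the model.
    eps handles unseen transitions; real LMs smooth or back off instead."""
    losses = [-math.log(model.get(cur, {}).get(nxt, eps))
              for cur, nxt in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)

train = "a b a b a b a b".split()
model = train_bigram(train)
```

On this perfectly predictable toy stream the loss goes to zero; on real text it never does, and scaling laws describe how it falls as model size, data, and compute grow.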
- Course (intermediate): Building GPT from Scratch (Andrej Karpathy)
- Course (intermediate): Build a Large Language Model (From Scratch) (Book, Chapters 3–5)
5. Fine-Tuning Methods
Key Topics:
- Difference between pretraining, fine-tuning, and prompting
- Full fine-tuning vs parameter-efficient fine-tuning (PEFT) trade-offs
- LoRA fundamentals and low-rank adaptation mechanism
- QLoRA and quantized training for memory-efficient scaling
- GPU memory, compute, and precision optimization strategies
- Training stability, overfitting risks, and regularization techniques
- Deployment strategies for fine-tuned and adapter-based models
Action Items:
- Compare outputs of base, prompted, and fine-tuned models
- Train a small LoRA adapter and measure VRAM usage
- Run one QLoRA experiment on limited hardware
- Monitor training vs validation loss to detect overfitting
- Export and benchmark a deployment-ready fine-tuned model
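The LoRA mechanism itself fits in a few lines: the frozen weight W gets an additive low-rank update scaled by alpha / r, and at deployment the adapter can be merged back into W so inference cost is unchanged. A toy sketch with plain Python lists (shapes and values are illustrative only):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_merge(W, A, B, alpha):
    """Merged weight W' = W + (alpha / r) * B @ A.
    A is r x d_in (random init), B is d_out x r (zero init, so training starts at W)."""
    r = len(A)
    scale = alpha / r
    delta = matmul(B, A)  # d_out x d_in, but built from only r * (d_in + d_out) params
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

d, r = 4, 1
W = [[0.0] * d for _ in range(d)]   # frozen base weight (d_out x d_in)
A = [[1.0, 2.0, 3.0, 4.0]]          # r x d_in
B = [[1.0], [0.0], [0.0], [0.0]]    # d_out x r
merged = lora_merge(W, A, B, alpha=1.0)

full_params = d * d                 # trainable params under full fine-tuning
lora_params = r * d + d * r         # trainable params under LoRA
```

Even at this toy scale LoRA halves the trainable parameters; at d in the thousands and r of 8 or 16 the savings are orders of magnitude, which is what makes the VRAM measurements in the action items interesting.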
- Course (intermediate): Fundamentals of LLM Fine-Tuning
- Paper (advanced): LoRA: Low-Rank Adaptation of Large Language Models
- Paper (advanced): QLoRA: Efficient Finetuning of Quantized LLMs
- Docs (advanced): Practical Fine-Tuning with Unsloth
6. RAG & AI Agent Orchestration
Key Topics:
- Retrieval-Augmented Generation (RAG) pipelines
- Document chunking strategies and context window optimization
- Embeddings and vector database integration
- Tool calling, function execution, and agent-based workflows
- Context management, memory layers, and session handling
- Single-agent vs multi-agent system design
- Planning, execution loops, and coordination strategies
Action Items:
- Build a basic RAG pipeline using LangChain
- Experiment with different chunk sizes and overlap strategies
- Integrate a vector database and benchmark retrieval quality
- Implement tool calling for external API interaction
- Design a two-agent system with specialized roles
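The retrieval half of a RAG pipeline can be sketched without any framework: chunk the documents, embed chunks and query, and return the top-k most similar chunks to stuff into the LLM prompt. Here a bag-of-words Counter stands in for a real embedding model (e.g. a sentence-transformer) and a sorted list stands in for a vector database; LangChain wires up the production equivalents:

```python
import math
from collections import Counter

def chunk_text(text, size=60, overlap=20):
    """Fixed-size character chunks with overlap, the simplest chunking strategy."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(text):
    # Stand-in for a real embedding model: with word counts,
    # "similarity" reduces to weighted word overlap.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Top-k chunks by similarity — the 'R' in RAG; a generator would then
    receive these chunks in its prompt as grounding context."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

corpus = [
    "LoRA adds low rank adapters to frozen weights",
    "vector databases store embeddings for fast similarity search",
    "transformers use attention to mix token information",
]
pieces = chunk_text("abcdefgh", size=5, overlap=2)
top = retrieve("how do vector databases work", corpus, k=1)
```

Swapping chunk size and overlap here makes the trade-off from the action items concrete: larger chunks keep more context per hit but dilute the similarity signal and eat context-window budget.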
- Tutorial (advanced): Build a RAG agent with LangChain
- Course (advanced): AI Agent Orchestration with CrewAI
Capstone Project
Build an advanced LLM-powered system with RAG, vector databases, multi-agent orchestration, and a web interface for retrieval and intelligent interaction.