Natural Language Processing & Large Language Models Track

Goal
Focus: Master the complete lifecycle of Large Language Models, from foundational architectures and embeddings to building production-ready applications with RAG, agents, and fine-tuning.
Learn how language models understand, generate, and reason with text, and develop the skills to evaluate, optimize, and deploy LLM systems in real-world scenarios.
Curriculum
1. Text Representation & Classical NLP
Key Topics:
- Text preprocessing techniques: tokenization, normalization, stopword removal
- Bag-of-Words representation and TF-IDF weighting
- N-grams and basic statistical language modeling intuition
- Vocabulary construction, dimensionality growth, and sparsity issues
- Core limitations of classical NLP (loss of context, semantics, scalability)
Action Items:
- Implement a full preprocessing pipeline on a sample dataset
- Build BoW and TF-IDF vectors and compare feature distributions
- Experiment with different n-gram sizes and analyze performance impact
- Measure vocabulary size growth and sparsity levels
- Compare classical feature-based models with embedding-based approaches
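The BoW/TF-IDF action items above can be sketched from scratch to expose the mechanics (in practice scikit-learn's `TfidfVectorizer` handles this; the smoothed idf formula below mirrors its default, and the naive tokenizer is a stand-in for a real preprocessing pipeline):

```python
import math
from collections import Counter

def tokenize(text):
    # Naive lowercase + whitespace split; a real pipeline adds stopword
    # removal, punctuation stripping, and stemming/lemmatization.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in tokenized for term in set(doc))
    # Smoothed idf, the variant scikit-learn's TfidfVectorizer uses.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vectors

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats chase dogs",
]
vecs = tfidf_vectors(docs)
```

Note how the rarer term "cat" outweighs the more common "sat" in the first document despite identical term frequency; that down-weighting of common terms is the whole point of the idf factor.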
- Course (beginner): Stanford CS224N — NLP with Deep Learning (Lectures 1–2)
- Tutorial (beginner): Scikit-learn Text Feature Extraction Guide
2. Word Embeddings & Distributional Semantics
Key Topics:
- Distributional hypothesis and meaning-from-context principle
- Word2Vec architectures: CBOW vs Skip-gram
- Negative sampling and efficient training approximation
- GloVe model intuition and global co-occurrence learning
- Embedding geometry, cosine similarity, and semantic relationships
- Bias and social artifacts encoded in vector representations
Action Items:
- Train Word2Vec on a small corpus using CBOW and Skip-gram
- Compare embedding quality using similarity queries
- Visualize embeddings with PCA or t-SNE
- Experiment with negative sampling parameters
- Evaluate and detect bias patterns in trained embeddings
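Before training with Gensim, it helps to see what CBOW and Skip-gram actually predict. A minimal sketch of the two training-pair formats plus cosine similarity (the toy sentence and window size are arbitrary choices for illustration):

```python
import math

def skipgram_pairs(tokens, window=2):
    """(center, context) pairs — Skip-gram predicts each context word from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_pairs(tokens, window=2):
    """(context tuple, center) pairs — CBOW predicts the center from its context."""
    pairs = []
    for i, center in enumerate(tokens):
        context = [tokens[j] for j in range(max(0, i - window),
                                            min(len(tokens), i + window + 1)) if j != i]
        pairs.append((tuple(context), center))
    return pairs

def cosine(u, v):
    """Cosine similarity, the standard metric for comparing embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

tokens = "the quick brown fox jumps".split()
sg = skipgram_pairs(tokens)   # many small (center, context) examples
cb = cbow_pairs(tokens)       # one averaged-context example per position
```

Skip-gram generates several training examples per position while CBOW generates one, which is part of why Skip-gram tends to do better on rare words at higher training cost.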
- Course (intermediate): Lecture 2: Word Vectors and Word Senses
- Paper (intermediate): Word2Vec Paper
- Paper (intermediate): GloVe Paper (Conceptual Overview)
- Tutorial (intermediate): Practical Word Embeddings with Gensim
3. Attention Mechanisms & Transformers
Key Topics:
- Core intuition behind attention and weighted context aggregation
- Self-attention and token-to-token interaction modeling
- Query, Key, Value (QKV) formulation and attention score computation
- Positional encoding and sequence order representation
- Transformer encoder and decoder architecture components
- Computational efficiency and scalability advantages over RNNs
Action Items:
- Implement a basic self-attention layer from scratch
- Visualize attention weight matrices for sample inputs
- Experiment with positional encodings and observe output changes
- Build a minimal Transformer encoder block
- Benchmark training speed and parallelism of a Transformer block against an RNN baseline
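The "self-attention from scratch" action item can be done in a few lines. A minimal single-head sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, using plain Python lists (identity projection matrices are chosen only to keep the example readable):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention: softmax(Q K^T / sqrt(d_k)) V."""
    Q, K, V = matmul(X, Wq), matmul(X, Wk), matmul(X, Wv)
    d_k = len(K[0])
    scores = [[sum(q * k for q, k in zip(qr, kr)) / math.sqrt(d_k) for kr in K]
              for qr in Q]
    weights = [softmax(row) for row in scores]  # each row is a distribution over tokens
    return matmul(weights, V), weights

I2 = [[1.0, 0.0], [0.0, 1.0]]              # identity Wq = Wk = Wv for clarity
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, d_model = 2
out, weights = self_attention(X, I2, I2, I2)
```

Each output row is a weighted average of the value vectors, and every token attends to every other token in one matrix multiply, which is exactly the parallelism advantage over sequential RNN steps.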
- Course (intermediate): Stanford CS231N Deep Learning I 2025 (Lecture 8)
- Paper (intermediate): Attention Is All You Need
- Tutorial (intermediate): Transformer Neural Networks, ChatGPT's foundation
4. Large Language Models (LLMs)
Key Topics:
- Language modeling objective and next-token prediction principle
- End-to-end LLM training pipeline (data, architecture, optimization)
- Pretraining vs fine-tuning roles and interaction
- Causal (GPT-style) vs masked (BERT-style) language models
- Scaling laws and performance vs compute/data trade-offs
- Emergent behaviors from scale (reasoning, in-context learning)
- Common limitations: hallucinations, bias, brittleness, context limits
Action Items:
- Implement a mini GPT-style model on a small dataset
- Compare causal vs masked model behavior on the same task
- Run scaling experiments with different model sizes
- Analyze failure cases and hallucinated outputs
- Benchmark fine-tuned vs base model performance
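The next-token prediction objective is easiest to see in its simplest possible form: a count-based bigram model of p(next | current). An LLM learns the same conditional distribution with a neural network over a much longer context; the average negative log-likelihood below is the loss that pretraining minimizes:

```python
import math
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count-based estimate of p(next | current) from a token stream."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tokens, tokens[1:]):
        counts[cur][nxt] += 1
    return {cur: {w: c / sum(nxts.values()) for w, c in nxts.items()}
            for cur, nxts in counts.items()}

def avg_nll(model, tokens, eps=1e-9):
    """Average negative log-likelihood of each next token under the model.
    eps handles unseen transitions; real LMs smooth or back off instead."""
    losses = [-math.log(model.get(cur, {}).get(nxt, eps))
              for cur, nxt in zip(tokens, tokens[1:])]
    return sum(losses) / len(losses)

train = "a b a b a b a b".split()
model = train_bigram(train)
```

On this perfectly predictable toy stream the loss goes to zero; on real text it never does, and scaling laws describe how it falls as model size, data, and compute grow.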
- Course (intermediate): Building GPT from Scratch (Andrej Karpathy)
- Course (intermediate): Build a Large Language Model (From Scratch) (Book, Chapters 3–5)
5. Fine-Tuning Methods
Key Topics:
- Difference between pretraining, fine-tuning, and prompting
- Full fine-tuning vs parameter-efficient fine-tuning (PEFT) trade-offs
- LoRA fundamentals and low-rank adaptation mechanism
- QLoRA and quantized training for memory-efficient scaling
- GPU memory, compute, and precision optimization strategies
- Training stability, overfitting risks, and regularization techniques
- Deployment strategies for fine-tuned and adapter-based models
Action Items:
- Compare outputs of base, prompted, and fine-tuned models
- Train a small LoRA adapter and measure VRAM usage
- Run one QLoRA experiment on limited hardware
- Monitor training vs validation loss to detect overfitting
- Export and benchmark a deployment-ready fine-tuned model
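The LoRA mechanism itself fits in a few lines: the frozen weight W gets an additive low-rank update scaled by alpha / r, and at deployment the adapter can be merged back into W so inference cost is unchanged. A toy sketch with plain Python lists (shapes and values are illustrative only):

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_merge(W, A, B, alpha):
    """Merged weight W' = W + (alpha / r) * B @ A.
    A is r x d_in (random init), B is d_out x r (zero init, so training starts at W)."""
    r = len(A)
    scale = alpha / r
    delta = matmul(B, A)  # d_out x d_in, but built from only r * (d_in + d_out) params
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

d, r = 4, 1
W = [[0.0] * d for _ in range(d)]   # frozen base weight (d_out x d_in)
A = [[1.0, 2.0, 3.0, 4.0]]          # r x d_in
B = [[1.0], [0.0], [0.0], [0.0]]    # d_out x r
merged = lora_merge(W, A, B, alpha=1.0)

full_params = d * d                 # trainable params under full fine-tuning
lora_params = r * d + d * r         # trainable params under LoRA
```

Even at this toy scale LoRA halves the trainable parameters; at d in the thousands and r of 8 or 16 the savings are orders of magnitude, which is what makes the VRAM measurements in the action items interesting.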
- Course (intermediate): Fundamentals of LLM Fine-Tuning
- Paper (advanced): LoRA: Low-Rank Adaptation of Large Language Models
- Paper (advanced): QLoRA: Efficient Finetuning of Quantized LLMs
- Docs (advanced): Practical Fine-Tuning with Unsloth
6. RAG & AI Agent Orchestration
Key Topics:
- Retrieval-Augmented Generation (RAG) pipelines
- Document chunking strategies and context window optimization
- Embeddings and vector database integration
- Tool calling, function execution, and agent-based workflows
- Context management, memory layers, and session handling
- Single-agent vs multi-agent system design
- Planning, execution loops, and coordination strategies
Action Items:
- Build a basic RAG pipeline using LangChain
- Experiment with different chunk sizes and overlap strategies
- Integrate a vector database and benchmark retrieval quality
- Implement tool calling for external API interaction
- Design a two-agent system with specialized roles
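The retrieval half of a RAG pipeline can be sketched without any framework: chunk the documents, embed chunks and query, and return the top-k most similar chunks to stuff into the LLM prompt. Here a bag-of-words Counter stands in for a real embedding model (e.g. a sentence-transformer) and a sorted list stands in for a vector database; LangChain wires up the production equivalents:

```python
import math
from collections import Counter

def chunk_text(text, size=60, overlap=20):
    """Fixed-size character chunks with overlap, the simplest chunking strategy."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def embed(text):
    # Stand-in for a real embedding model: with word counts,
    # "similarity" reduces to weighted word overlap.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    """Top-k chunks by similarity — the 'R' in RAG; a generator would then
    receive these chunks in its prompt as grounding context."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

corpus = [
    "LoRA adds low rank adapters to frozen weights",
    "vector databases store embeddings for fast similarity search",
    "transformers use attention to mix token information",
]
pieces = chunk_text("abcdefgh", size=5, overlap=2)
top = retrieve("how do vector databases work", corpus, k=1)
```

Swapping chunk size and overlap here makes the trade-off from the action items concrete: larger chunks keep more context per hit but dilute the similarity signal and eat context-window budget.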
- Tutorial (advanced): Build a RAG agent with LangChain
- Course (advanced): AI Agent Orchestration with CrewAI
Capstone Project
Build an advanced LLM-powered system with RAG, vector databases, multi-agent orchestration, and a web interface for retrieval and intelligent interaction.