
Assignment 2: Attention Mechanisms & GPT-2 Implementation

🧠 Build the Foundation of ChatGPT!

Journey with AI pioneers to implement attention mechanisms and GPT-2 from scratch

🚀 Assignment Overview

This assignment has two main components:

  • Attention.py - Implement attention mechanisms from basic to multi-head (50 points)
  • gpt2_model.py - Build GPT-2 components with modern optimizations (50 points)

Complete all TODO sections in the starter code. The implementations should follow the hints and maintain the original function signatures.
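Before diving into the starter code, it may help to see the core computation in one place. Below is a minimal sketch of scaled dot-product attention with a causal mask, written in PyTorch. The names `causal_mask_sketch` and `scaled_attention_sketch` are illustrative only; your submission must keep the signatures defined in the starter code.

```python
import math
import torch
import torch.nn.functional as F

def causal_mask_sketch(seq_len: int) -> torch.Tensor:
    # Lower-triangular boolean mask: position i may attend to positions j <= i.
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def scaled_attention_sketch(q, k, v, mask=None):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)
    if mask is not None:
        # Masked positions get -inf so softmax assigns them zero weight.
        scores = scores.masked_fill(~mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights
```

The sqrt(d_k) scaling keeps the variance of the scores roughly constant as the key dimension grows, which prevents the softmax from saturating; this is the difference between the basic and scaled variants you are asked to implement.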

📤 Deliverable 1: Attention.py (max 5MB)
Complete implementation of BasicAttention, ScaledAttention, MultiHeadAttention, and create_causal_mask (50 points)
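For MultiHeadAttention, the main work beyond the single-head case is shape bookkeeping. Here is a hedged sketch assuming a fused QKV projection; the starter code may instead use separate Q, K, and V layers, so follow its hints rather than this layout.

```python
import math
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadAttentionSketch(nn.Module):
    """Illustrative layout only; keep the starter code's MultiHeadAttention interface."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0, "d_model must divide evenly across heads"
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)  # fused Q, K, V projection (an assumption)
        self.proj = nn.Linear(d_model, d_model)     # output projection

    def forward(self, x, mask=None):
        B, T, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Reshape (B, T, C) -> (B, n_heads, T, d_head) so each head attends independently.
        q = q.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        out = F.softmax(scores, dim=-1) @ v
        # Merge heads: (B, n_heads, T, d_head) -> (B, T, C), then project.
        out = out.transpose(1, 2).contiguous().view(B, T, C)
        return self.proj(out)
```

Note that scaling uses sqrt(d_head), not sqrt(d_model): each head attends in its own d_head-dimensional subspace.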
📤 Deliverable 2: gpt2_model.py (max 5MB)
Complete implementation of QuickNorm, SwiGLU, CausalSelfAttention, TransformerLayer, and EfficientGPT2 (50 points)
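Two of these components have well-known reference formulations, sketched below: a SwiGLU feed-forward (Shazeer, 2020) and a pre-norm transformer block. QuickNorm is specific to this assignment, so plain nn.LayerNorm stands in for it here, and the attention module is passed in as a placeholder; all class names and sizes in this sketch are assumptions, not the starter code's interfaces.

```python
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUSketch(nn.Module):
    # SwiGLU feed-forward (Shazeer, 2020): a SiLU-gated linear unit.
    # Hidden size and interface are assumptions; follow the starter code's SwiGLU.
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.w_gate = nn.Linear(d_model, d_hidden, bias=False)
        self.w_up = nn.Linear(d_model, d_hidden, bias=False)
        self.w_down = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x):
        # Gate path through SiLU, multiplied elementwise with the up projection.
        return self.w_down(F.silu(self.w_gate(x)) * self.w_up(x))

class TransformerLayerSketch(nn.Module):
    # Pre-norm residual block: x + Attn(Norm(x)), then x + FFN(Norm(x)).
    # nn.LayerNorm stands in for the assignment's QuickNorm, and `attn` for its
    # CausalSelfAttention; both are placeholders, not the starter code's API.
    def __init__(self, d_model: int, attn: nn.Module):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = attn
        self.norm2 = nn.LayerNorm(d_model)
        self.mlp = SwiGLUSketch(d_model, 4 * d_model)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x
```

When SwiGLU replaces a standard 4×d_model MLP, the hidden size is often reduced to roughly two thirds of 4×d_model to hold the parameter count constant; check the starter code's hints before fixing either choice.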
📋 Submission Requirements:
  • Use the exact file names shown above (case-sensitive)
  • Complete all TODO implementations in the starter code
  • Maintain original function signatures and class names
  • Each file should not exceed 5MB