
Assignment 1: Text Processing & Tokenization

Required Files for Submission

Upload all four required files below. Each file must follow the exact naming format specified.

simple_tokenizer.py • Max 5MB
Phase 1: Word-level tokenizer implementation (25 points)

regex_tokenizer.py • Max 5MB
Phase 2: Pattern-based tokenizer implementation (25 points)

bpe_tokenizer.py • Max 5MB
Phase 3: Byte-Pair Encoding implementation (25 points)

performance_analyzer.py • Max 5MB
Phase 4: Performance analysis and comparison (25 points)

File Naming Requirements:
  • Use the exact file names shown above
  • File names are case-sensitive
  • Each file must not exceed 5MB
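
The naming and size rules above can be checked locally before uploading. The following is a minimal sketch, not part of the course tooling: the function name check_submission is hypothetical, and it assumes "5MB" means 5 * 1024 * 1024 bytes, which the page does not specify.

```python
from pathlib import Path

# File names copied from the submission list above.
REQUIRED_FILES = (
    "simple_tokenizer.py",
    "regex_tokenizer.py",
    "bpe_tokenizer.py",
    "performance_analyzer.py",
)

# Assumption: "5MB" is read as 5 * 1024 * 1024 bytes; the page does not say.
MAX_BYTES = 5 * 1024 * 1024


def check_submission(folder):
    """Return a list of problems found in `folder` (an empty list means OK)."""
    # Compare against the directory listing rather than calling exists(),
    # so the name match stays case-sensitive on case-insensitive file systems.
    sizes = {
        p.name: p.stat().st_size
        for p in Path(folder).iterdir()
        if p.is_file()
    }
    problems = []
    for name in REQUIRED_FILES:
        if name not in sizes:
            problems.append(f"missing: {name}")
        elif sizes[name] > MAX_BYTES:
            problems.append(f"too large: {name}")
    return problems
```

Calling check_submission(".") from the folder that holds the four files returns an empty list when every name matches exactly and every file is within the assumed size limit.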