Scalable Data Mining (CS60021)

Instructor: Sourangshu Bhattacharya

Teaching Assistants: Kiran Purohit, Anurag Parvathgari

Class Schedule: Monday (8:00 - 9:55), Tuesday (12:00 - 12:55)

Classroom: CSE - 107

Last year's course website: https://cse.iitkgp.ac.in/~sourangshu/coursefiles/cs60021_2022a.html

Announcements:

Course Schedule:

Week | Dates | Topic / Activity | Links / Material
Week 0 | 1/8 | Introduction | Slides
Week 1 | 7/8, 8/8 | Introduction to DM, ML, Stochastic gradient descent | Slides - Intro to ML, Slides - SGD + Acceleration
Week 2+3 | 14/8, 21/8, 22/8 | SGD convergence rate, PyTorch | Slides - SGD Convergence, Slides - PyTorch
Week 4 | 28/8, 29/8 | Distributed Optimization, ADMM | Slides - ADMM
Week 5 | 4/9, 5/9 | Map-reduce framework, Hadoop | Slides - Hadoop
Week 6 | 11/9, 12/9 | Spark | Slides - Spark
Week 7+8 | 3/10, 9/10, 10/10 | Locality Sensitive Hashing | Slides - Shingling, Minhash, LSH, Gap - LSH, Multi-probe LSH
Week 9 | 16/10, 17/10 | ANNS - HNSW | Slides - HNSW
Week 10 | 30/10, 1/11 | Streaming - Sampling, Set Membership, Distinct Count | Slides - Sampling, Bloom Filter, Slides - Count distinct, Flajolet-Martin
Week 11 | 6/11, 7/11 | Streaming - Frequency Counting | Slides - Misra-Gries, Space saving, Count-min, Count sketch
Week 12 | 13/11, 14/11 | Subset Selection | Slides - Subset Selection, Submodular Optimization

Syllabus:

Software paradigms:

Optimization and Machine learning algorithms:

Algorithmic techniques:

References:

  1. Mining of Massive Datasets. 2nd edition. Jure Leskovec, Anand Rajaraman, Jeff Ullman. Cambridge University Press. http://www.mmds.org/
  2. TensorFlow for Machine Intelligence: A Hands-On Introduction to Learning Algorithms. Sam Abrahams et al. Bleeding Edge Press.
  3. Hadoop: The Definitive Guide. Tom White. O'Reilly Press.
  4. Recent literature.