Big data is data so large that it does not fit in the main memory of a single machine. The need to process big data by space-efficient algorithms arises in Internet search, machine learning, network traffic monitoring, scientific computing, signal processing, and other areas.
This course will cover mathematically rigorous models for developing such algorithms, as well as some provable limitations of algorithms operating in those models. Some topics covered include:
Dimensionality reduction. General techniques and impossibility results for reducing data dimension while still preserving geometric structure.
Numerical linear algebra. Algorithms for big matrices (e.g. a user/product rating matrix for Netflix or Amazon): regression, low-rank approximation, matrix completion, etc.
Compressed sensing. Recovery of (approximately) sparse signals based on few linear measurements.
Sparse Fourier Transform. Fast algorithms for (approximately) computing the Fourier Transform of signals that are (approximately) sparse in the frequency domain.
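As a small taste of the first topic, here is a minimal NumPy sketch of dimensionality reduction via a Johnson-Lindenstrauss-style Gaussian random projection. The dimensions and the specific construction are illustrative choices for the demo, not the particular schemes or parameters the course will cover:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 5000, 500          # n points in R^d, target dimension k << d (arbitrary demo sizes)
X = rng.normal(size=(n, d))      # synthetic data points

# A k x d matrix of i.i.d. N(0, 1/k) entries approximately preserves all
# pairwise Euclidean distances with high probability (Johnson-Lindenstrauss).
S = rng.normal(size=(k, d)) / np.sqrt(k)
Y = X @ S.T                      # projected points, shape (n, k)

def pairwise_sq_dists(A):
    """Squared Euclidean distances between all pairs of rows of A."""
    sq = np.einsum('ij,ij->i', A, A)
    return sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)

orig = pairwise_sq_dists(X)
proj = pairwise_sq_dists(Y)
mask = ~np.eye(n, dtype=bool)    # ignore zero self-distances
distortion = np.sqrt(np.maximum(proj[mask], 0) / orig[mask])
print(distortion.min(), distortion.max())   # each ratio should be close to 1
```

Note that the projected data occupies a factor d/k less memory while approximately preserving the geometry, which is the sense in which such maps "preserve geometric structure."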
This course is intended for both graduate students and advanced undergraduate students. Prerequisites: mathematical maturity and comfort with algorithms (e.g. CS 124 at Harvard, or 6.046 at MIT), discrete probability, and linear algebra.