Introduction to computational methods for identifying patterns and outliers in large data sets. Topics include the singular value and eigenvalue decompositions, independent component analysis, graph analysis, clustering, linear, regularized, sparse and non-linear model fitting, and deep, convolutional and recurrent neural networks. The computational textbook teaches the material step by step, learning by doing via autograded programming exercises and conceptual multiple-choice quizzes. Every codex contains an application that illustrates the ideas behind the algorithm and an exploration of why the algorithm works and when it fails (and can or cannot be fixed), as a way to understand, via mathematical principles, the strengths and weaknesses of the algorithms.
Raj is an Associate Professor of Electrical Engineering and Computer Science at the University of Michigan, Ann Arbor.
He received his Master's and PhD in Electrical Engineering and Computer Science at MIT as part of the MIT/WHOI Joint Program in Ocean Science and Engineering. His work is at the interface of statistical signal processing and random matrix theory, with applications such as sonar, radar, wireless communications and machine learning in mind. Developing this 'living' computational book and teaching EECS 505 with it has been a career highlight.
Over the past few years of teaching the Computational Machine Learning & Data Science course at the University of Michigan, we've conceived and iterated on over two dozen distinct codices. Each codex is like a chapter of the computational book; the codices have been organized into several units below. Select codices are available as a demo of the Pathbird platform.
Students will be introduced to the Julia programming language. By the end, students will be familiar with variables, arrays, functions, and everything else in Julia that they need to succeed in this course.
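For instance, a first taste of Julia syntax covering exactly those pieces (a minimal sketch; the variable and function names are purely illustrative):

```julia
# Variables
n = 5                      # an integer
x = 2.5                    # a floating-point number

# Arrays
v = [1.0, 2.0, 3.0]        # a length-3 vector
A = [1 2; 3 4]             # a 2×2 matrix

# Functions: long form and one-line form
function square(t)
    return t^2
end
double(t) = 2t

# The dot syntax broadcasts a function elementwise over an array
squares = square.(v)       # [1.0, 4.0, 9.0]
```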
An introduction to matrix math and linear algebra.
Students will learn about vectors, matrices, arrays, and various operations on these objects.
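A few of these operations as they appear with Julia's LinearAlgebra standard library (the matrices and vectors below are illustrative):

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
B = [0.0 1.0; 1.0 0.0]
x = [1.0, -1.0]

A + B          # matrix addition
A * B          # matrix-matrix product
A * x          # matrix-vector product
A'             # transpose (adjoint)
A .* B         # elementwise (Hadamard) product
dot(x, x)      # inner product
norm(x)        # Euclidean norm
```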
An introduction to the concept of convolution and expressing convolution as a matrix-vector product.
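One way to see the connection is to build the (Toeplitz) convolution matrix explicitly and multiply it by the signal; a minimal sketch, where the filter and signal are illustrative:

```julia
# Full (zero-padded) convolution of a length-n signal with a length-m filter,
# written as a matrix-vector product with an (n+m-1) × n Toeplitz matrix.
function conv_matrix(h::Vector, n::Integer)
    m = length(h)
    T = zeros(n + m - 1, n)
    for j in 1:n, i in 1:m
        T[i + j - 1, j] = h[i]    # column j holds a copy of h shifted down by j-1
    end
    return T
end

h = [1.0, -1.0]                    # a first-difference filter
x = [1.0, 2.0, 4.0, 7.0]
y = conv_matrix(h, length(x)) * x  # matches a direct convolution of h and x
```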
An introduction to normal and non-normal matrices and the spectral theorem for normal matrices, as well as the eigenvalue decomposition and the singular value decomposition and their variational characterizations via eigshow and svdshow.
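In Julia both decompositions are a single call; for a symmetric (hence normal) matrix the eigenvalues are real and the eigenvectors orthonormal, exactly as the spectral theorem promises (a small illustrative example):

```julia
using LinearAlgebra

A = [2.0 1.0; 1.0 3.0]          # symmetric, hence normal

E = eigen(A)                    # eigenvalue decomposition
λ, V = E.values, E.vectors
A ≈ V * Diagonal(λ) * V'        # true: A = V Λ V'

F = svd(A)                      # singular value decomposition
A ≈ F.U * Diagonal(F.S) * F.Vt  # true: A = U Σ V'
```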
An introduction to vector spaces, subspaces, and the four fundamental subspaces of a matrix, including a discussion of basis vectors for subspaces and how the SVD of a matrix reveals these bases, as well as orthogonal projection matrices and how to efficiently compute the projection of a vector onto a matrix subspace without first computing and storing the associated projection matrix.
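For example, to project a vector onto the column space of a tall matrix, an orthonormal basis from the thin SVD is enough; there is no need to form or store the large projection matrix (a sketch with illustrative sizes):

```julia
using LinearAlgebra

A = randn(1000, 10)
b = randn(1000)

U = svd(A).U             # thin SVD: orthonormal basis for the column space of A
b_proj = U * (U' * b)    # projection of b onto col(A), without ever forming U * U'
```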
First difference matrix construction and the role of the Kronecker product and sparse constructions thereof.
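A sketch of those sparse constructions, using the SparseArrays standard library (the grid size is illustrative):

```julia
using SparseArrays, LinearAlgebra

# (n-1) × n sparse first-difference matrix: (D * x)[i] = x[i+1] - x[i]
first_diff(n) = spdiagm(n - 1, n, 0 => -ones(n - 1), 1 => ones(n - 1))

n  = 8
D1 = first_diff(n)

# First differences along each direction of an n×n grid, via Kronecker products
Dx = kron(sparse(I, n, n), D1)   # differences along one grid direction
Dy = kron(D1, sparse(I, n, n))   # differences along the other direction
```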
How to set up and solve least squares problems of the form $\min_x \|Ax - b\|_2^2$. Applications include fitting data to a higher-order polynomial function and predicting search query time series results after an appropriate non-linear transformation.
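In Julia the backslash operator solves such least squares problems directly; a sketch of fitting a cubic polynomial to noisy synthetic data (the data and degree are illustrative):

```julia
using LinearAlgebra

# Noisy samples of an underlying cubic
t = range(-1, 1; length = 50)
y = 2 .+ 0.5 .* t .- 3 .* t.^3 .+ 0.1 .* randn(50)

# Vandermonde-style design matrix for a degree-3 polynomial fit
A = [t[i]^p for i in 1:length(t), p in 0:3]

# Least squares solution of min_x ‖Ax - y‖₂
x̂ = A \ y
```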
Stochastic gradient descent and Nesterov's accelerated method. Application: photometric stereo reconstruction using these algorithms.
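A minimal plain gradient descent sketch for the least squares objective, shown here for reference (the step size rule and iteration count are illustrative; the codex itself covers the stochastic and accelerated variants):

```julia
using LinearAlgebra

# Plain gradient descent on f(x) = ½‖Ax − b‖², whose gradient is A'(Ax − b)
function grad_descent(A, b; iters = 500)
    x = zeros(size(A, 2))
    η = 1 / opnorm(A)^2           # a safe step size: 1 / ‖A‖₂²
    for _ in 1:iters
        x -= η * (A' * (A * x - b))
    end
    return x
end

A = randn(200, 10)
b = randn(200)
x̂ = grad_descent(A, b)           # approaches the least squares solution A \ b
```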
How and why we need to regularize the solution of a system of equations of the form $Ax = b$. Applications include better fits of data to a higher-order polynomial function and image in-painting/graffiti removal with a first difference regularizer. Discussion of how the optimal regularization coefficient is selected.
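A sketch of the simplest such regularizer, Tikhonov (ridge) regularization, which solves $\min_x \|Ax - b\|_2^2 + \lambda \|x\|_2^2$ (the value of $\lambda$ below is purely illustrative; choosing it well is exactly the selection question discussed in the codex):

```julia
using LinearAlgebra

A = randn(100, 50)
b = randn(100)
λ = 0.1

# Ridge-regularized solution: x̂ = (A'A + λI)⁻¹ A'b
x̂ = (A' * A + λ * I) \ (A' * b)
```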
Learning to recognize sparsity in its canonical and transformed manifestations and seeing (computationally) how that helps regularize the solution of a system of equations of the form $Ax = b$ in a regime where minimum norm least squares does not work well. Applications include compressed sensing and image in-painting/graffiti removal with a first difference regularizer, along with a discussion of how the optimal regularization coefficient is selected.
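One standard computational route to the $\ell_1$-regularized problem $\min_x \tfrac{1}{2}\|Ax - b\|_2^2 + \lambda \|x\|_1$ is iterative soft-thresholding (ISTA); a minimal sketch on synthetic data (the choice of $\lambda$ and the iteration count are illustrative, and ISTA is only one of several algorithms one could use here):

```julia
using LinearAlgebra

soft(z, τ) = sign(z) * max(abs(z) - τ, 0)      # soft-thresholding operator

# ISTA for min_x ½‖Ax − b‖² + λ‖x‖₁
function ista(A, b, λ; iters = 1000)
    L = opnorm(A)^2                             # Lipschitz constant of the gradient
    x = zeros(size(A, 2))
    for _ in 1:iters
        x = soft.(x - (A' * (A * x - b)) / L, λ / L)
    end
    return x
end

# Recover a sparse vector from underdetermined measurements
A = randn(50, 200)
x_true = zeros(200)
x_true[rand(1:200, 5)] .= randn(5)
b = A * x_true
x̂ = ista(A, b, 0.01)
```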
Convolution plays a role, directly or indirectly, in many data science techniques. Many seemingly complex image filters, for example, may be expressed elegantly using convolution.
Application illustrating how we need to split matrices into blocks and write a matrix-matrix multiplication routine from scratch for settings where the matrices are too big to fit into memory.
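A sketch of such a blocked multiply; here the blocks are sliced from in-memory arrays purely for illustration, whereas in the application they would be read in one block at a time:

```julia
# Blocked C = A*B: accumulate one block of C at a time so that only small
# pieces of A and B ever need to be held in memory simultaneously.
function blocked_matmul(A, B; blocksize = 256)
    m, k = size(A)
    k2, n = size(B)
    @assert k == k2 "inner dimensions must match"
    C = zeros(m, n)
    for i in 1:blocksize:m, j in 1:blocksize:n, p in 1:blocksize:k
        ii = i:min(i + blocksize - 1, m)
        jj = j:min(j + blocksize - 1, n)
        pp = p:min(p + blocksize - 1, k)
        C[ii, jj] .+= A[ii, pp] * B[pp, jj]
    end
    return C
end

A, B = randn(600, 500), randn(500, 400)
maximum(abs.(blocked_matmul(A, B) - A * B))   # ≈ 0, up to floating-point error
```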
Learning to tell apart two classes.
Learning to tell apart multiple classes.
Learning to tell apart multiple classes with logistic regression. What logistic regression is about, why it is different from least squares, and what the underlying loss function captures better than the mean squared error loss function in least squares.
Learning to tell apart multiple classes with the nearest subspace classification algorithm.
Fisher's linear discriminant analysis, the generalized eigen-problem and the equivalent formulation.
First principles derivation of the support vector machine and how its loss function is different from least squares classification. Learning to tell apart multiple classes with the SVM.
False positives, false negatives, and how we can change the classification output to weight one against the other. How decision theory informs the choice of decision thresholds.
How to generate inputs that can cause a classifier to give erroneous outputs; how to train an algorithm to be robust to such errors.
What is unsupervised learning and how the k-means algorithm can be used for that.
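For instance, the Clustering.jl package provides a k-means routine; a minimal sketch on synthetic 2-D data (the package choice, the data and the number of clusters are all illustrative):

```julia
using Clustering

# Two synthetic 2-D point clouds, stored as the columns of X
X = hcat(randn(2, 100) .+ [3.0, 3.0], randn(2, 100) .- [3.0, 3.0])

result  = kmeans(X, 2)            # cluster the columns of X into k = 2 clusters
labels  = assignments(result)     # cluster label of each point
centers = result.centers          # d × k matrix of cluster centroids
```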
How to use the singular value decomposition (SVD) to isolate moving objects from the background of a video. Many real-world datasets consist of dynamic signals and noise on top of some static backdrop or baseline. The SVD can isolate these components, allowing us to focus on what we are interested in and disregard the rest.
How to use the SVD to fill in missing elements of a matrix with known rank. Random matrix theory to predict the sparsity and incoherence conditions under which near-perfect recovery is possible. Applications to low rank image inpainting and collaborative filtering.
The Eckart-Young theorem and its consequences. Applications include image compression and image denoising.
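Per Eckart-Young, the best rank-k approximation (in both the spectral and Frobenius norms) is obtained by truncating the SVD; a sketch of what that looks like in code (the rank is illustrative):

```julia
using LinearAlgebra

# Best rank-k approximation of A: keep only the k largest singular values
function rank_k_approx(A, k)
    F = svd(A)
    return F.U[:, 1:k] * Diagonal(F.S[1:k]) * F.Vt[1:k, :]
end

A  = randn(100, 80)
A5 = rank_k_approx(A, 5)
opnorm(A - A5)        # equals the 6th singular value of A, i.e. svdvals(A)[6]
```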
Implement Principal Component Analysis (PCA) and Independent Component Analysis (ICA). These methods reason about covariance and kurtosis to emphasize important data components.
Cumulant Theory of ICA
Viewing PCA and ICA as cumulant maximization problems. Understanding the limits of each, finding ways to characterize the error of higher-order ICA, and why kurtosis-based ICA is the one used in practice.
Learn2Embed
Spectral algorithms for finding communities in a graph via modularity maximization. Applications include finding communities in a karate club dataset and inferring baseball team division structure from a dataset of how frequently they play each other. Investigation of limits of finding community structure in the stochastic block model.
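A sketch of the simplest spectral step, splitting a graph into two communities using the leading eigenvector of the modularity matrix $B = A - dd^\top/(2m)$ (the toy graph below is illustrative, not one of the codex datasets):

```julia
using LinearAlgebra

# Two-community split from the leading eigenvector of the modularity matrix
function modularity_split(A)
    d = vec(sum(A; dims = 2))                 # degree of each node
    m = sum(d) / 2                            # number of edges
    B = A - d * d' / (2m)                     # modularity matrix
    v = eigen(Symmetric(B)).vectors[:, end]   # eigenvector of the largest eigenvalue
    return sign.(v)                           # ±1 community label for each node
end

# Toy graph: two triangles (nodes 1-3 and 4-6) joined by the single edge 3-4
A = [0 1 1 0 0 0;
     1 0 1 0 0 0;
     1 1 0 1 0 0;
     0 0 1 0 1 1;
     0 0 0 1 0 1;
     0 0 0 1 1 0.0]

modularity_split(A)   # separates nodes {1,2,3} from {4,5,6}
```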
Phase transitions in PCA and an examination of how to reason about unlearnability of structure due to outlier-induced phase transitions in the low-rank-plus-noise-plus-sparse-outliers data model.
Synchronized Waveforms
Use Flux.jl to quickly design and train single-input, single-output shallow neural networks for 1-D function approximation.
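A minimal Flux.jl sketch of that workflow (API details can shift between Flux versions, and the architecture and hyperparameters here are illustrative):

```julia
using Flux

# Target: approximate sin(x) on [-π, π] with a single-input, single-output shallow net
xs = reshape(collect(Float32, range(-π, π; length = 200)), 1, :)   # 1 × 200 input batch
ys = sin.(xs)

model = Chain(Dense(1 => 16, tanh), Dense(16 => 1))

opt_state = Flux.setup(Adam(0.01), model)
for epoch in 1:2000
    grads = Flux.gradient(m -> Flux.mse(m(xs), ys), model)
    Flux.update!(opt_state, model, grads[1])
end

model(xs)   # the network's approximation of sin on the sample grid
```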
Use Flux.jl to quickly design and train 1-D and 2-D deep feedforward neural nets. See the power of deep nets and critically examine what it means for conv. nets to be "shift-invariant". Applications include handwritten digit recognition and 1-D signal recognition.
Use Flux.jl to quickly design and train convolutional neural nets. Applications include handwritten digit recognition.
Use Flux.jl to quickly design and train recurrent neural nets. Learn why they are used for sequence modeling, what they can do that Conv and Deep Feedforward nets cannot. Applications include image captioning, predicting the Fibonacci sequence and time series.
How to use neural networks in the context of regression and function approximation
Using methods learned to classify red wine versus white wine
Using methods learned to classify varieties of fruit
Using methods learned to tell apart gestures for "rock", "paper" and "scissors"
How to train a network to be robust to adversarial attacks
Deep, denoising and variational autoencoders. Application includes auto-encoding of handwritten digits.
Theory and practice include the choice of the loss function. Application includes generating realistic handwritten digits.
There is so much to learn in machine learning and so much about it is fun and exciting! Instructors usually run out of time before they run out of topics. The codices are designed to augment your (the instructor's) voice. Machine learning is tricky because it links math with code. Students need the instructor's and the instructional staff's help when they get stuck -- and they will get stuck, because computational machine learning algorithms require mastery of math and programming, and of linking math to code and vice versa.
The codices are self-contained and in that sense they are like a textbook. The computational component makes them more than that. They can be used as homework assignments to reinforce concepts in an instructor's lecture/notes or another textbook. We use them at Michigan as a lab component of the course. The instructor has peace of mind knowing that the codices have been tested on over a thousand students -- the platform allows an instructor to scale their course to hundreds of students. The backend support for autograding programming assignments allows the instructional staff to engage a student at the moment they get stuck, when the learner is eager and ready to learn more. Codices amplify the instructor's voice by linking in-class theory to computational practice.
YES! The reason we built Pathbird was to scale our ability to effectively teach students. We've taught 260+ students working on a codex live in class (see below). The codices and platform help amplify an instructor's voice while helping solve the "assignment grading" problem for computational machine learning, so the class can scale easily into the hundreds.
We are actively recruiting instructors to try the platform and the codices in their own courses as a supplement for existing course material either as homeworks or as in-class computational labs. You can express your interest using this Google form and we will get back to you as soon as possible.
The Pathbird platform is currently in beta. The price for the book includes the cost of cloud-computing, hosting and storage resources, since every codex involves the learner learning by doing via computing on the platform.
The learner personalizes their book via their answers to the auto-graded programming assignments, the various free-form question prompts, and their ability to take notes within the platform. There is an "expert" mode for each codex which allows the learner to deepen their mastery of a codex that they have already solved. Thus learners can continue learning (for a nominal fee) even after the course is completed.
The codices for this course were initially developed and used to teach EECS 505 at the University of Michigan. They have also been used to teach classes at MIT and for private training camps across the country.
Special thanks to Travis DePrato for building Pathbird from scratch. Thanks to Don Winsor for the introduction and thinking (correctly) that Travis was capable of way more than he was doing then. This way of writing and publishing a computational book would not have been possible without him.
Thanks to Jonas Kersulis for editing and proof-reading the codices. Thanks to Brian Moore for helping create an early version of the autograder -- it is what gives the assignments in the codices an extra "oomph!". Thanks to Adrian Stoll for creating the API for the autograder -- this was a big step towards Pathbird.
This book would not be possible without them and the various graduate student instructors (Arvind Prasadan, David Hong, Hao Wu, Rishi Sonthalia, Dipak Narayan, Yash Sanjay Bhalgat and Raj Tejas Suryaprakash) who helped edit, test and refine the codices right before they went live to hundreds of students. Thanks to Simon Danisch for helping start this journey in 2017 by porting my MATLAB demos to Julia, and to David Sanders for the Gitter post on a Jupyter forum whose response, from Min Ragan-Kelley, I used to initiate the first conversation with Travis.
Thanks in particular to Gil Strang for his encouragement, feedback and support, and for inspiring the idea behind the codices during the very special semester of Spring 2017 when we launched and taught 18.065 at MIT. Thanks to Jeremy Kepner for making that semester happen and for the opportunity to teach the course at MIT Lincoln Laboratory -- seeing the excitement there for the ideas and math gave me the impetus to do more, and reach out to a more diverse audience than the "typical" grad students who had taken my class. Thanks to Muralidhar Rangaswamy for the opportunity to reprise the course at AFRL Dayton and to the many scientists and engineers there whose enthusiasm for the material in the codex format gave me hope that this format could succeed beyond just Michigan.
Many thanks to Alan Edelman for years of encouragement and inspiration and for teaching me so much (including Julia). A learner experiencing this book by doing/coding might sometimes recognize their voice in the way I write and speak about the underlying math and code. That's no accident. This book is infused with their DNA and years of me soaking in their thoughts and ideas on so many matters, particularly on how elegant math produces elegant code and vice versa. Everything they taught me about how to see math and linear algebra makes me love it, and want to share it with you in the codex way, even more.
There are several additional resources that we recommend. These resources may be used as a companion book or simply to supplement the concepts presented here.
There is so much to learn and we are delighted that there are so many resources that present the material in slightly different ways -- all come together to help a learner form a more complete picture of the material. One can never really stop learning, given how much there is to learn! (That's part of the fun for this author!)