Researchers Crack Open Machine Learning's Black Box to Shine a Light on Generalization
Flying in the face of recent rival studies, these scientists point to generalization as key to order-of-magnitude performance gains.
Researchers from the Massachusetts Institute of Technology (MIT) and Brown University have taken steps to open up the "black box" of machine learning β and say that the key to success may lie in generalization.
"This study provides one of the first theoretical analyses covering optimization, generalization, and approximation in deep networks and offers new insights into the properties that emerge during training," explains co-author Tomaso Poggio, the Eugene McDermott Professor at MIT. "Our results have the potential to advance our understanding of why deep learning works as well as it does."
Machine learning has proven outstanding at a range of tasks, from surprisingly convincing chat bots to autonomous vehicles. It comes, however, with a big caveat: it's not always clear how or why a machine learning system comes to its outputs for a given input. Many networks operate as a black box, performing unknowable tasks on incoming data β and but the researchers' work is helping to open that box and shine a light within.
The team's work focused on two network types: fully-connected deep networks and convolutional neural networks (CNNs). A key part of their study involved investigating exactly what factors contribute to the state of "neural collapse," when a networks' training maps multiple class examples to a single template.
"Our analysis shows that neural collapse emerges from the minimization of the square loss with highly expressive deep neural networks," explains co-author and post-doctoral researcher Akshay Rangamani. "It also highlights the key roles played by weight decay regularization and stochastic gradient descent in driving solutions towards neural collapse."
That understanding led to another finding, which flips recent studies of generalization on their head. "[Our study] validates the classical theory of generalization showing that traditional bounds are meaningful," explains postdoc Tomer Galanti, of findings which proved new norm-based generalization bounds for convolutional neural networks. "It also provides a theoretical explanation for the superior performance in many tasks of sparse networks, such as CNNs, with respect to dense networks."
The study found that generalization could offer a performance orders of magnitude better than densely-connected networks β something the researchers claim has been "almost completely ignored by machine learning theory."
The team's work has been published in the journal Research under open-access terms.