Hi all,

Our next Information Theory and Applications seminar of the academic year will take place on Monday, November 18, at 10:00 in room A500.

The speaker this week is Shachar Shayovitz, who will tell us about the role of information theory in active learning. See title and abstract below.

See you there,
Or, Oron, Yuval and Alex

---------------------------------------------

Title: Information Theoretic Active Learning

Abstract: Active learning is a machine learning paradigm aimed at improving model efficiency by strategically selecting the most informative data points for labeling, thereby reducing reliance on large annotated datasets. This is particularly relevant in privacy-sensitive applications where user data cannot be easily or fully annotated. Unlike traditional active learning approaches, which assume that the training and test sets share the same distribution, our research introduces novel methodologies that function effectively without this assumption, both in stochastic (probabilistic) and individual (non-probabilistic) settings. 

In the first part of our research, we address active learning within the stochastic setting, where data is governed by a probability distribution from a known hypothesis class. We propose a novel information-theoretic criterion for active learning, grounded in the Redundancy-Capacity theorem from universal source coding. This criterion naturally balances the exploration-exploitation trade-off in feature selection and offers a more robust alternative to heuristic-based methods. Both theoretical analysis and empirical validation demonstrate that our approach outperforms conventional methods in various tasks.
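
As a rough illustration of this style of criterion (an informal gloss in generic notation, not necessarily the exact objective presented in the talk), a mutual-information acquisition rule motivated by the redundancy-capacity viewpoint can be written as

\[
x^{*} = \arg\max_{x \in \mathcal{U}} I\left(\theta ; Y_x \mid \mathcal{D}\right)
      = \arg\max_{x \in \mathcal{U}} \left[ H\left(Y_x \mid \mathcal{D}\right) - \mathbb{E}_{\theta \sim p(\theta \mid \mathcal{D})} H\left(Y_x \mid \theta\right) \right],
\]

where \(\mathcal{U}\) is the unlabeled pool, \(\mathcal{D}\) the labels acquired so far, \(Y_x\) the unknown label of candidate \(x\), and \(\theta\) the unknown hypothesis. The Redundancy-Capacity theorem identifies worst-case coding redundancy with the capacity of the "channel" from \(\theta\) to the observed data, which is what motivates a capacity-style selection rule and its built-in exploration-exploitation balance.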

The second part of our research focuses on active learning in the individual setting, where no probabilistic relationship between the training and test sets is assumed. Drawing on universal source coding principles, we introduce a new criterion for selecting data points that minimizes the min-max regret on the test set. For tasks such as binary classification and linear regression, our criterion coincides with established active learning strategies, offering a unified framework for general hypothesis classes.
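
In generic notation (a sketch with assumed symbols, not the precise formulation from the talk), a min-max regret selection objective of this kind takes the form

\[
S^{*} = \arg\min_{S \subseteq \mathcal{U},\, |S| \le b} \; \max_{y_{1}^{N}} \left[ L\left(\hat{q}_{S},\, y_{1}^{N}\right) - \min_{h \in \mathcal{H}} L\left(h,\, y_{1}^{N}\right) \right],
\]

where \(S\) is the set of queried points under a labeling budget \(b\), \(\hat{q}_{S}\) is the predictor learned from the labels of \(S\), \(\mathcal{H}\) is the hypothesis class, and \(L\) is a cumulative loss (e.g., log-loss) over the test labels \(y_{1}^{N}\). The inner maximization runs over all individual label sequences, so no probabilistic assumption relates the training and test data.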

However, applying active learning to deep learning models presents a unique challenge due to the complexity and high dimensionality of modern neural networks. In deep learning scenarios, existing active learning techniques often struggle with computational efficiency and accuracy, as neural networks require large amounts of labeled data to generalize effectively. To address these challenges, we extend our criterion by incorporating variational inference to approximate the posterior distribution of the model parameters. This approach allows us to develop a highly efficient, low-complexity algorithm tailored to deep learning applications. Our algorithm not only simplifies the task of selecting informative data points but also enhances the performance of neural networks in scenarios where labeled data is scarce. Experimental results demonstrate that our approach outperforms state-of-the-art active learning methods for deep learning. In particular, we achieve a reduction of 15.4%, 11%, and 35.1% in the required labeled data for the CIFAR10, EMNIST, and MNIST datasets, respectively, even in the presence of out-of-distribution data. These results highlight the robustness and practical applicability of our method in reducing the labeling effort for deep learning tasks, making it a promising solution for real-world applications where large-scale data annotation is prohibitively expensive.
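
For a concrete feel for how a variational acquisition score of this kind can be computed in practice, here is a minimal sketch in PyTorch, assuming MC dropout as the variational approximation of the parameter posterior and a mutual-information (BALD-style) score. The function names, the loader interface, and the specific score are illustrative assumptions, not the algorithm presented in the talk.

# Minimal illustrative sketch of a mutual-information acquisition score for deep
# active learning, using MC dropout as a stand-in variational posterior.
# NOTE: this is a generic BALD-style example, not the speaker's algorithm.
import torch
import torch.nn.functional as F

def mc_predictions(model, x, n_samples=20):
    """Draw n_samples stochastic forward passes with dropout kept active."""
    model.train()  # keep dropout layers stochastic
    with torch.no_grad():
        probs = torch.stack(
            [F.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )  # shape: (n_samples, batch, n_classes)
    return probs

def mutual_information_score(probs, eps=1e-12):
    """Estimate I(theta; y | x) = H(E[p]) - E[H(p)] from the MC samples."""
    mean_p = probs.mean(dim=0)                                             # (batch, n_classes)
    entropy_of_mean = -(mean_p * (mean_p + eps).log()).sum(dim=-1)         # predictive entropy
    mean_entropy = -(probs * (probs + eps).log()).sum(dim=-1).mean(dim=0)  # expected entropy
    return entropy_of_mean - mean_entropy                                  # higher = more informative

def select_queries(model, pool_loader, budget):
    """Rank the unlabeled pool by the score and return the top-`budget` indices."""
    scores = []
    for x, _ in pool_loader:  # assumes the loader yields (inputs, _) pairs; labels are unused
        probs = mc_predictions(model, x)
        scores.append(mutual_information_score(probs))
    scores = torch.cat(scores)
    return scores.topk(budget).indices  # indices are relative to the pool ordering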

Joint work with Meir Feder