Hi all,
Our next Information Theory and Applications seminar for the academic year
will take place on Monday, November 18 at 10:00, in room A500.
The speaker this week is Shachar Shayovitz, who will tell us about the role
of information theory in active learning. See title and abstract below.
See you there,
Or, Oron, Yuval and Alex
---------------------------------------------
Title: Information Theoretic Active Learning
Abstract: Active learning is a machine learning paradigm aimed at improving
model efficiency by strategically selecting the most informative data
points for labeling, thereby reducing reliance on large annotated datasets.
This is particularly relevant in privacy-sensitive applications where user
data cannot be easily or fully annotated. Unlike traditional active
learning approaches, which assume that the training and test sets share the
same distribution, our research introduces novel methodologies that
function effectively without this assumption, both in stochastic
(probabilistic) and individual (non-probabilistic) settings.
In the first part of our research, we address active learning within the
stochastic setting, where data is governed by a probability distribution
from a known hypothesis class. We propose a novel information-theoretic
criterion for active learning, grounded in the Redundancy-Capacity theorem
from universal source coding. This criterion naturally balances the
exploration-exploitation trade-off in feature selection and offers a more
robust alternative to heuristic-based methods. Both theoretical analysis
and empirical validation demonstrate that our approach outperforms
conventional methods in various tasks.
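As a rough illustration only (our notation, not necessarily the criterion used in the talk): the Redundancy-Capacity theorem identifies the minimax coding redundancy over a hypothesis class with the capacity of the "channel" from the parameter \theta to the data, C = \max_{w(\theta)} I(\theta; Y). A capacity-flavored acquisition rule in that spirit queries the unlabeled point whose label is most informative about the parameter,

    x^* = \arg\max_{x \in \mathcal{U}} I(\theta; Y \mid x, D_n)
        = \arg\max_{x \in \mathcal{U}} \big[ H(Y \mid x, D_n) - \mathbb{E}_{\theta \sim p(\theta \mid D_n)} H(Y \mid x, \theta) \big],

where D_n denotes the points labeled so far and \mathcal{U} the unlabeled pool; the entropy gap makes the exploration-exploitation balance explicit.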
The second part of our research focuses on active learning in the
individual setting, where no probabilistic relationship between the
training and test sets is assumed. Drawing on universal source coding
principles, we introduce a new criterion that selects data points so as to
minimize the min-max regret on the test set. For tasks such as binary
classification and linear regression, our criterion coincides with
established active learning strategies, offering a unified framework for
general hypothesis classes.
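For context (again in our notation, with the caveat that the talk's criterion may differ): one well-known individual-setting min-max regret is the one attained by the predictive normalized maximum likelihood (pNML). For a labeled set D, a test point x, and hypothesis class \Theta, the min-max log-loss regret against a reference that knows the test label is

    \Gamma(x; D) = \log \sum_{y} p_{\hat\theta(D \cup \{(x, y)\})}(y \mid x),

where \hat\theta(D \cup \{(x, y)\}) is the hypothesis in \Theta that best fits the labeled data together with the hypothesized test pair. An active learner in this spirit would choose which points to label so as to shrink the worst-case regret over the test set.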
However, applying active learning to deep learning models presents unique
challenges due to the complexity and high dimensionality of modern neural
networks. In deep learning scenarios, existing active learning techniques
often struggle with computational efficiency and accuracy, as neural
networks require a large amount of labeled data to generalize effectively.
To address these challenges, we extend our criterion by incorporating
variational inference to approximate the posterior distribution of the model
parameters. This approximation yields a highly efficient, low-complexity
algorithm tailored to deep learning applications. Our
algorithm not only simplifies the task of selecting informative data points
but also enhances the performance of neural networks in scenarios where
labeled data is scarce. Experimental results demonstrate that our approach
outperforms state-of-the-art active learning methods for deep learning. In
particular, our method reduces the amount of labeled data required by
15.4%, 11%, and 35.1% on the CIFAR10, EMNIST, and MNIST datasets,
respectively, even in the presence of out-of-distribution data. These
results highlight the robustness and practical applicability of our method
in reducing the labeling effort for deep learning tasks, making it a
promising solution for real-world applications where large-scale data
annotation is prohibitively expensive.
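To make the variational-inference step tangible, here is a generic sketch (ours, not the speaker's algorithm) of how a mutual-information acquisition rule is commonly approximated for a neural network, using Monte Carlo dropout as the approximate posterior over the weights; function and variable names are illustrative only.

import torch
import torch.nn.functional as F

def acquisition_scores(model, pool_x, n_samples=20):
    """Approximate I(theta; y | x, D) for each unlabeled pool point via MC dropout."""
    model.train()  # keep dropout active so each forward pass samples from the approximate posterior
    with torch.no_grad():
        # probs has shape (n_samples, pool_size, n_classes)
        probs = torch.stack([F.softmax(model(pool_x), dim=-1) for _ in range(n_samples)])
    mean_probs = probs.mean(dim=0)  # predictive distribution averaged over sampled weights
    predictive_entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
    expected_entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean(dim=0)
    return predictive_entropy - expected_entropy  # mutual information between weights and label

def select_batch(model, pool_x, batch_size=10):
    """Return the indices of the batch_size most informative pool points."""
    return torch.topk(acquisition_scores(model, pool_x), batch_size).indices

The algorithm presented in the talk presumably differs in both the criterion and the posterior approximation; the sketch is only meant to convey the overall shape of such a selection loop.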
Joint work with Meir Feder