Zero-shot learning (ZSL) aims to recognize unseen categories, for which no training examples are available, from class-level descriptions. To improve the discriminative power of zero-shot learning, we model the visual learning process of unseen categories, drawing inspiration from the psychology of human creativity in producing novel art. We relate ZSL to human creativity by observing that zero-shot learning is about recognizing the unseen, while creativity is about creating a likable unseen. We introduce a learning signal, inspired by the creativity literature, that explores the unseen space with hallucinated class descriptions and encourages the generated visual features to deviate carefully from seen classes while still allowing knowledge transfer from seen to unseen classes.
With hundreds of thousands of object categories in the real world and countless undiscovered species, it is infeasible to maintain hundreds of examples per class to fuel the training needs of most existing recognition systems.
Zipf’s law, named after George Zipf (1902-1950), suggests that for the vast majority of world-scale classes only a few training examples are available, a pattern validated earlier in language and later in vision. The problem becomes even more severe when we target recognition at the fine-grained level. For example, there exist tens of thousands of bird and flower species, yet the largest available benchmarks have only a few hundred classes, motivating a large body of research on classifying instances of unseen classes, known as Zero-Shot Learning (ZSL).
People have a great capability to identify unseen visual classes from text descriptions like: “the crested auklet is a subspecies of birds with dark-gray bodies, tails, and wings, and an orange-yellow bill. It is known for its forehead crests, made of black forward-curving feathers.” (see Fig. 1). We may imagine the appearance of the “crested auklet” in different ways, all of which may be correct and may collectively help us understand it better. This notion of imagination has been modeled in recent ZSL approaches that successfully adopt deep generative models to synthesize visual examples of an unseen object given its semantic description. After training, the model generates imaginary data for each unseen class, transforming ZSL into a standard classification task on the generated data.
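To make this recipe concrete, here is a minimal sketch (not the authors’ code) of the generative ZSL idea described above: a conditional generator maps a class-description embedding plus noise to a visual feature vector, and once trained it synthesizes features for unseen classes so a standard classifier can be fit on them. The dimensions and names (`TEXT_DIM`, `FEAT_DIM`, etc.) are illustrative assumptions.

```python
import torch
import torch.nn as nn

TEXT_DIM, NOISE_DIM, FEAT_DIM = 300, 100, 2048  # assumed embedding sizes

class FeatureGenerator(nn.Module):
    """G(text, z) -> synthetic visual feature (e.g., a CNN feature vector)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(TEXT_DIM + NOISE_DIM, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, FEAT_DIM),
            nn.ReLU(),
        )

    def forward(self, text_emb, noise):
        return self.net(torch.cat([text_emb, noise], dim=1))

def synthesize_unseen_dataset(generator, unseen_text_embs, per_class=200):
    """Generate imaginary features for each unseen class description."""
    feats, labels = [], []
    with torch.no_grad():
        for label, text in enumerate(unseen_text_embs):
            text_batch = text.unsqueeze(0).expand(per_class, -1)
            noise = torch.randn(per_class, NOISE_DIM)
            feats.append(generator(text_batch, noise))
            labels.append(torch.full((per_class,), label, dtype=torch.long))
    return torch.cat(feats), torch.cat(labels)

# Usage: after adversarial training of `generator` on seen classes,
# the synthesized (features, labels) pairs turn ZSL into an ordinary
# supervised classification problem over the unseen classes.
generator = FeatureGenerator()
unseen_texts = torch.randn(5, TEXT_DIM)   # stand-in for 5 unseen class embeddings
X, y = synthesize_unseen_dataset(generator, unseen_texts)
print(X.shape, y.shape)                   # torch.Size([1000, 2048]) torch.Size([1000])
```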
Fig. 1. Generalizing the learning of zero-shot models requires a deviation from seen classes to accommodate recognizing unseen classes.
Challenges
However, these generative ZSL methods do not guarantee discrimination between seen and unseen classes, since the generations are not driven by a learning signal that encourages deviation from seen classes. For example, “Parakeet Auklet”, a seen class in Fig. 1, has a visual text description that is very similar to the “Crested Auklet” description, yet one can identify the Crested Auklet’s unique “black forward-curving feathers” against the Parakeet Auklet’s appearance from the text alone.
We carefully modeled a learning signal that inductively encourages unseen-class generations to deviate from seen classes, yet does not push so far that the generations fall into the negative hedonic, unrealistic range and lose knowledge transfer from seen classes. Interestingly, this trade-off mirrors the famous Wundt Curve in the human creativity literature (Martindale, 1990).
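A hedged sketch of the kind of deviation signal described above (not the paper’s exact loss) looks as follows: features generated from a hallucinated description should be hard to assign to any single seen class (high entropy under the seen-class classifier), while still looking real to the discriminator so that they stay in the “likable” range of the Wundt curve. `seen_classifier`, `discriminator`, and `lam` are assumed names for illustration.

```python
import torch
import torch.nn.functional as F

def creativity_loss(fake_feats, seen_classifier, discriminator, lam=0.1):
    # Deviation: maximize entropy of seen-class predictions on the generated
    # features, i.e., minimize the negative entropy.
    probs = F.softmax(seen_classifier(fake_feats), dim=1)
    entropy = -(probs * torch.log(probs + 1e-8)).sum(dim=1).mean()

    # Realism: the discriminator should still score the generations as
    # realistic, which preserves knowledge transfer from seen classes.
    realism = discriminator(fake_feats).mean()

    # Minimized by the generator: encourages high entropy and high realism.
    return -entropy - lam * realism
```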
Key Features
1) We propose a zero-shot learning approach that explicitly models generating unseen classes by learning to carefully deviate from seen classes. We examine a parametrized entropy measure to facilitate learning how to deviate from seen classes (a sketch of such a measure follows this list). Our approach is inspired by the psychology of human creativity, and thus we name it Creativity Inspired Zero-shot Learning (CIZSL).
2) Our creativity-inspired loss is unsupervised and orthogonal to any generative ZSL approach. Thus it can be integrated with any such model while adding no extra parameters and requiring no additional labels.
3) By means of extensive experiments on seven benchmarks encompassing Wikipedia-based and attribute-based descriptions, our approach consistently outperformed state-of-the-art methods on zero-shot recognition, zero-shot retrieval, and generalized zero-shot learning using several evaluation metrics.
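As an illustration of what a parametrized entropy measure can look like, here is a sketch using the Rényi family, which recovers the Shannon entropy as alpha approaches 1. The paper’s specific choice of parametrized entropy may differ; this only shows how a single knob `alpha` shapes the deviation-from-seen-classes signal.

```python
import torch

def renyi_entropy(probs, alpha=2.0, eps=1e-8):
    """Rényi entropy H_alpha(p) = 1/(1 - alpha) * log(sum_i p_i^alpha)."""
    if abs(alpha - 1.0) < 1e-6:  # Shannon limit as alpha -> 1
        return -(probs * torch.log(probs + eps)).sum(dim=-1)
    return torch.log((probs ** alpha).sum(dim=-1) + eps) / (1.0 - alpha)

# A near-uniform prediction over seen classes has high entropy (strong
# deviation), while a confident prediction has low entropy.
uniform = torch.full((1, 10), 0.1)
confident = torch.tensor([[0.91] + [0.01] * 9])
print(renyi_entropy(uniform), renyi_entropy(confident))
```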
Broad Research
Imagination is one of the key properties of human intelligence that enables us not only to generate creative products like art and music but also to understand the visual world. My research focuses mostly on developing imagination-inspired techniques that empower AI machines to see (computer vision) or to create (e.g., fashion and art): “Imagine to See” and “Imagine to Create.” This work falls under the “Imagine to See” part of my research.
My group is named Vision-CAIR, which stands for Computer Vision-“C” Artificial Intelligence Research. The “C” stands for Content or Creative, since our lab covers both Vision-Content AI research and Vision-Creative AI research.
Next steps
In our environment, we sometimes encounter a bird that suddenly flies away, leaving us no chance to take a photo that would help us learn more about it and which subspecies of bird it is. In another situation, it is we who flee, because otherwise we could become a meal for a hungry bear. When it comes to maintaining biodiversity and giving back to mother nature, we care about both the bird and the bear. AI powered with this technology could help report sightings of these creatures at the fine-grained level without any pictures, just with language descriptions.
AI can be an additional arm to help mother nature, and it could be pushed further in an interactive direction in the future. We may ask a robot, representing AI that loves our environment, for more information about a bird that just flew away. The robot can ask us further questions to become more and more certain about what we really mean, suggest answers that satisfy us, and help us report the presence of this parakeet auklet at a certain location. I aim to see AI help give more and more care to our environment.
We draw inspiration from the psychology of human creativity to improve the capability of unseen-class imagination for zero-shot recognition. We adopt GANs to discriminatively imagine visual features given a hallucinated text describing an unseen visual class; thus, our generator learns to synthesize unseen classes from hallucinated texts. Our loss encourages the generations of unseen classes to deviate from seen classes by enforcing high entropy on seen-class classification while remaining realistic. At the same time, we ensure the realism of features generated from hallucinated text by keeping them similar to seen-class features, which preserves knowledge transfer to unseen classes. A comprehensive evaluation on seven benchmarks shows a consistent improvement over the state of the art for both zero-shot learning and retrieval, with class descriptions defined by Wikipedia articles and attributes.
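As a hedged illustration of how descriptions of “unseen” classes could be hallucinated during training (the paper’s sampling scheme may differ), one simple option is to interpolate between the text embeddings of two different seen classes:

```python
import torch

def hallucinate_text(seen_text_embs, alpha_range=(0.2, 0.8)):
    """Return one hallucinated text embedding built from two random seen classes."""
    i, j = torch.randperm(seen_text_embs.size(0))[:2]
    a = torch.empty(1).uniform_(*alpha_range)
    return a * seen_text_embs[i] + (1 - a) * seen_text_embs[j]

seen_texts = torch.randn(20, 300)          # e.g., 20 seen classes, 300-d embeddings
print(hallucinate_text(seen_texts).shape)  # torch.Size([300])
```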
Mohamed Elhoseiny, PhD
Dr. Mohamed Elhoseiny is an Assistant Professor of Computer Science at the Visual Computing Center at KAUST (King Abdullah University of Science and Technology) and Visiting Faculty at Stanford University. He received his PhD from Rutgers University under Prof. Ahmed Elgammal in October 2016, then spent more than two years at Facebook AI Research (FAIR) as a Postdoc Researcher until January 2019. He later was an AI Research consultant at Baidu Research’s Silicon Valley AI Lab (SVAIL) until September 2019. His primary research interests are in computer vision, especially learning about the unseen or the least seen by recognition (e.g., zero-shot learning) or by generation (creative art and fashion generation). Under the umbrella of how AI may benefit biodiversity, Dr. Elhoseiny’s six-year-long development of the zero-shot learning problem was featured at the United Nations biodiversity conference in November 2018 (an audience of ~10,000 from >192 countries). His creative AI research projects were recognized at the ECCV18 Workshop on Fashion and Art with the best paper award, media coverage in New Scientist magazine and MIT Technology Review (2017, 2018), a 20-minute talk at the Facebook F8 conference (2018), the official FAIR video (2018), and coverage on the HBO Silicon Valley TV series (2018).