Small Data Challenges in Big Data Era: Unsupervised and Semi-Supervised Methods

Guo-Jun Qi and Jiebo Luo

Email: guojunq@gmail.com

A tutorial to be presented at IJCAI 2019, Macau, China

Download tutorial slides

Part I: unsupervised and semi-supervised [pdf]

Part II: few-shot learning [pdf]

1. Abstract

In this tutorial, we will review recent progress on the small data challenge: training deep neural networks when only a limited amount of well-annotated data is available. We will review the literature on both unsupervised and semi-supervised methods, including the underlying principles, criteria, considerations, and network designs. We hope to shed some light on how to effectively leverage a large amount of unlabeled data to facilitate model training and inference in both the unsupervised and semi-supervised settings.

The small data challenges have emerged in many learning problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address these challenges, many efforts have been made to train complex models with small data in an unsupervised or semi-supervised fashion. In this tutorial, we will review the recent progress in these two major categories of methods. A wide spectrum of small data models will be organized into a big picture, where we will show how they interplay with each other to motivate explorations of new ideas. Specifically, we will review the criteria for learning transformation-equivariant, disentangled, self-supervised and semi-supervised representations, which underpin recent developments. For example, many instantiations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs) and other deep networks by exploring the distribution of unlabeled data for more powerful representations. While we focus on unsupervised and semi-supervised methods, we will also provide a broader overview of other emerging topics, from unsupervised and semi-supervised domain adaptation to zero-shot and few-shot learning. It is impossible for us to prepare an encyclopedia of all related works, but we seek to cover this research frontier by revealing where we are on the journey towards overcoming the small data challenges.

2. Outline

1. Overview: A big picture of the small data methods

2. Unsupervised Methods

2.1. Transformation-Equivariant Representations

2.1.1. Group-Equivariant Convolutions

2.1.2. Auto-Encoding Transformations

2.2. Generative Representations

2.2.1. Auto-Encoders: Variational auto-encoders, denoising auto-encoders, contractive auto-encoders

2.2.2. GAN-based Representations: DCGAN, BiGAN, ALI, IntroVAE, VEEGAN

2.2.3. Disentangled Representations: InfoGAN, beta-VAE, FactorVAE

2.2.4. More Generative Models: GLOW, self-attention and Transformer models

2.3. Self-Supervised Methods: autoregressive models and other image/video representations

3. Semi-Supervised Methods

3.1. Semi-Supervised Generative Models

3.1.1. Semi-Supervised Auto-Encoders

3.1.2. Semi-supervised GANs, Local GANs

3.1.3. Semi-Supervised Disentangled Representations

3.2. Teacher-Student Models

3.2.1. Noisy Teachers: Γ and Π Models

3.2.2. Teacher Ensemble: Temporal Ensembling, Mean Teacher

3.2.3. Adversarial Teachers: Virtual Adversarial Training

4. Domain Adaptation (will cover if time allows)

4.1. Unsupervised Domain Adaptation: Adversarial Discriminative Domain Adaptation, Gradient Reversal Layer

4.2. Semi-Supervised Domain Adaptation

3. Biography of Tutorial Speakers

Guo-Jun Qi (M'14-SM'18) has been the Chief Scientist at Huawei Cloud since August 2018, leading and overseeing an international R&D team working on multiple intelligent cloud services, including smart cities, visual computing service, medical intelligent service, and connected vehicle service. He was a faculty member in the Department of Computer Science and the director of the MAchine Perception and LEarning (MAPLE) Lab at the University of Central Florida, beginning in August 2014. Prior to that, he was a Research Staff Member at the IBM T.J. Watson Research Center, Yorktown Heights, NY. His research interests include machine learning and knowledge discovery from multi-modal data sources (e.g., images, videos, texts, and sensors) in order to build smart and reliable information and decision-making systems. His research has been sponsored by grants and projects from government agencies and industry collaborators, including NSF, IARPA, Microsoft, IBM, and Adobe.

Dr. Qi has published more than 100 papers in a broad range of venues, such as Proceedings of the IEEE, IEEE T PAMI, IEEE T KDE, IEEE T Image Processing, ICML, NIPS, CVPR, ECCV, ACM MM, SIGKDD, WWW, ICDM, SDM, ICDE and AAAI. Among them are the best student paper of ICDM 2014, "the best ICDE 2013 paper" by IEEE Transactions on Knowledge and Data Engineering, the best paper of ACM Multimedia 2007, and a best paper finalist of ACM Multimedia 2015.

Dr. Qi has served or will serve as a technical program co-chair for ACM Multimedia 2020, ICIMCS 2018 and MMM 2016, and as an area chair (a senior program committee member) for ICCV, ICPR, ICIP, ACM SIGKDD, ACM CIKM, AAAI, IJCAI and ACM Multimedia. He is also serving or has served on the program committees of several major academic conferences, including CVPR, ICCV, ICML, NIPS, KDD, ECCV, BMVC, WSDM, CIKM, IJCAI, ICMR, ACM Multimedia, ACM/IEEE ASONAM, ICDM, ICIP, and ACL. He is a member of the steering committee for the International Conference on MultiMedia Modeling. Dr. Qi is an associate editor for IEEE Transactions on Circuits and Systems for Video Technology (CSVT) and ACM Transactions on Knowledge Discovery from Data (TKDD). He was also a panelist for the NSF and the United States Department of Energy.

Jiebo Luo joined the University of Rochester in Fall 2011 after over fifteen prolific years at Kodak Research Laboratories, where he was a Senior Principal Scientist leading research and advanced development. He has been involved in numerous technical conferences, including serving as the program co-chair of ACM Multimedia 2010, IEEE CVPR 2012, ACM ICMR 2016, and IEEE ICIP 2017. He has served on the editorial boards of the IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE Transactions on Multimedia, IEEE Transactions on Circuits and Systems for Video Technology, IEEE Transactions on Big Data, ACM Transactions on Intelligent Systems and Technology, Pattern Recognition, Machine Vision and Applications, Knowledge and Information Systems, and Journal of Electronic Imaging. Dr. Luo is a Fellow of the SPIE, IAPR, IEEE, ACM, and AAAI.

Our Survey

[1] Guo-Jun Qi, Jiebo Luo. Small Data Challenges in Big Data Era: A Survey of Recent Progress on Unsupervised and Semi-Supervised Methods, arXiv:1903.11260. [pdf]

August 10, 2019