2020 will be remembered as a year chock full of significant challenges, but for data science, specifically AI, machine learning, and deep learning, the march forward continued unabated. We saw excellent progress with enterprise acceptance of machine learning across a wide swath of industries and problem domains. In terms of pure research, I had a good time tracking the accelerating pace of progress in machine learning; research innovations propagated quickly throughout 2020. In this article, we’ll take a tour of my top picks of papers that I found intriguing and useful. The directions represented here reflect my attempt to stay current with the field’s research progress, and I find them very promising. I hope you enjoy the results as much as I have. (Check my list from last year HERE.)
batchboost: regularization for stabilizing training with resistance to underfitting & overfitting
Overfitting, underfitting, and training stability are important challenges in machine learning. Current approaches to these issues include mixup, SamplePairing, and BC learning. This paper hypothesizes that mixing many images together can be more effective than mixing just two. The batchboost pipeline has three stages: (a) pairing: a method for selecting two samples; (b) mixing: how to create a new sample from two samples; and (c) feeding: combining mixed samples with new ones from the dataset into a batch (with ratio γ). The batchboost method is slightly better than the SamplePairing technique on small data sets (up to 5%). Batchboost provides stable training with untuned parameters (like weight decay), making it a good method for testing the performance of different architectures. The PyTorch code for this paper is available HERE.
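To make the three-stage idea concrete, here is a minimal PyTorch sketch (my own illustration, not the authors’ implementation) of the mixing step: blending more than two samples, and their labels, with convex weights.

```python
# Minimal sketch of mixing k samples into one (not the authors' batchboost code).
import torch

def mix_many(samples, labels, num_classes, weights=None):
    """Blend k samples (and their one-hot labels) with convex weights."""
    k = samples.shape[0]
    if weights is None:
        weights = torch.distributions.Dirichlet(torch.ones(k)).sample()
    one_hot = torch.nn.functional.one_hot(labels, num_classes).float()
    mixed_x = (weights.view(-1, 1, 1, 1) * samples).sum(dim=0)
    mixed_y = (weights.view(-1, 1) * one_hot).sum(dim=0)
    return mixed_x, mixed_y

# Example: mix three 3x32x32 images into a single training sample.
x = torch.randn(3, 3, 32, 32)
y = torch.tensor([0, 4, 7])
mixed_image, soft_label = mix_many(x, y, num_classes=10)
```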
DeeperForensics-1.0: A Large-Scale Dataset for Real-World Face Forgery Detection
This paper presents an ongoing effort to construct a large-scale benchmark, DeeperForensics-1.0, for face forgery detection. The benchmark represents the largest face forgery detection data set by far, with 60,000 videos comprising a total of 17.6 million frames, 10 times larger than existing data sets of the same kind. Extensive real-world perturbations are applied to obtain a more challenging benchmark of larger scale and higher diversity. All source videos in DeeperForensics-1.0 are carefully collected, and fake videos are generated by a newly proposed end-to-end face-swapping framework. The quality of the generated videos surpasses that of existing data sets, as validated by user studies. The benchmark features a hidden test set, which contains manipulated videos that achieve high deceptive scores in human evaluations. The code and data for this machine learning research paper are available HERE.
A Primer in BERTology: What we know about how BERT works
Transformer-based models are now widely used in NLP, but we still do not understand much about their inner workings. This machine learning research paper describes what is known to date about the famous BERT model (Devlin et al. 2019), synthesizing over 40 analysis studies. It also provides an overview of proposed modifications to the model and its training regime, and outlines directions for further research.
Gradient Boosting Neural Networks: GrowNet
A novel gradient boosting framework is proposed in which shallow neural networks are employed as “weak learners.” General loss functions are considered under this unified framework, with specific examples presented for classification, regression, and learning to rank. A fully corrective step is incorporated to remedy the pitfall of greedy function approximation in classic gradient boosting decision trees. The proposed model achieved state-of-the-art results in all three tasks on multiple data sets. An ablation study is performed to shed light on the effect of each model component and model hyperparameters.
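The core boosting idea, stripped of the paper’s corrective step and of the feature propagation between stages, can be sketched as follows (a simplified illustration, not GrowNet’s released code): each shallow network is fit to the residual of the current ensemble and then added with a shrinkage factor.

```python
# Simplified boosting-with-shallow-networks sketch (not the GrowNet implementation).
import torch
import torch.nn as nn

def make_weak_learner(in_dim, hidden=16):
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

def grow_ensemble(X, y, num_stages=5, shrinkage=0.1, epochs=200):
    ensemble, prediction = [], torch.zeros_like(y)
    for _ in range(num_stages):
        residual = y - prediction                      # negative gradient of squared loss
        learner = make_weak_learner(X.shape[1])
        opt = torch.optim.Adam(learner.parameters(), lr=1e-2)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.mse_loss(learner(X), residual)
            loss.backward()
            opt.step()
        with torch.no_grad():
            prediction = prediction + shrinkage * learner(X)   # shrinkage step
        ensemble.append(learner)
    return ensemble

X, y = torch.randn(256, 8), torch.randn(256, 1)
ensemble = grow_ensemble(X, y)
```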
The Deep Learning Compiler: A Comprehensive Survey
The difficulty of deploying various DL models on diverse DL hardware has boosted the research and development of DL compilers in the community. Several DL compilers have been proposed by both industry and academia, such as TensorFlow XLA and TVM. These DL compilers take the DL models described in different DL frameworks as input and generate optimized code for diverse DL hardware as output. However, none of the existing surveys has comprehensively analyzed the unique design of DL compilers. This paper performs a comprehensive survey of existing DL compilers by dissecting their commonly adopted design in detail, with emphasis on DL-oriented multi-level IRs and frontend/backend optimizations. This is the first machine learning research survey paper focusing on the unique design of the DL compiler, and it is hoped it can pave the way for future research on DL compilers.
Cross-Iteration Batch Normalization
A well-known issue of Batch Normalization is its significantly reduced effectiveness in the case of small mini-batch sizes. When a mini-batch contains few examples, the statistics upon which the normalization is defined cannot be reliably estimated from it during a training iteration. To address this problem, this paper presents Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality. On object detection and image classification with small mini-batch sizes, CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique. The PyTorch code for this paper can be found HERE.
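As a rough illustration of pooling statistics across iterations, here is a naive sketch that simply averages the first and second moments of the last few mini-batches. Note that it deliberately omits the paper’s key contribution, the Taylor-polynomial compensation for network weight changes between iterations, so it is not CBN itself.

```python
# Naive cross-iteration normalization sketch (NOT the paper's CBN, which
# compensates for changing network weights between iterations).
import torch
from collections import deque

class NaiveCrossIterNorm(torch.nn.Module):
    def __init__(self, num_features, window=4, eps=1e-5):
        super().__init__()
        self.stats = deque(maxlen=window - 1)   # moments from previous iterations
        self.eps = eps
        self.gamma = torch.nn.Parameter(torch.ones(num_features))
        self.beta = torch.nn.Parameter(torch.zeros(num_features))

    def forward(self, x):                        # x: (N, C, H, W)
        mean = x.mean(dim=(0, 2, 3))
        sq_mean = (x * x).mean(dim=(0, 2, 3))
        # Pool first and second moments with (detached) stats from recent iterations.
        m = torch.stack([mean] + [s[0] for s in self.stats]).mean(dim=0)
        v = (torch.stack([sq_mean] + [s[1] for s in self.stats]).mean(dim=0) - m * m).clamp(min=0.0)
        self.stats.append((mean.detach(), sq_mean.detach()))
        x_hat = (x - m.view(1, -1, 1, 1)) / torch.sqrt(v.view(1, -1, 1, 1) + self.eps)
        return self.gamma.view(1, -1, 1, 1) * x_hat + self.beta.view(1, -1, 1, 1)

# Tiny mini-batches: statistics are stabilized by the pooled window.
cbn = NaiveCrossIterNorm(16)
for _ in range(3):
    out = cbn(torch.randn(2, 16, 8, 8))
```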
Julia Language in Machine Learning: Algorithms, Applications, and Open Issues
Machine learning is driving development across many fields in science and engineering. A simple and efficient programming language could accelerate applications of machine learning in various fields. Currently, the programming languages most commonly used to develop machine learning algorithms include Python, MATLAB, and C/C++. However, none of these languages balances efficiency and simplicity well. The Julia language is a fast, easy-to-use, open-source programming language originally designed for high-performance computing, and it balances efficiency and simplicity well. This paper summarizes the related research work and developments in the application of the Julia language in machine learning. It first surveys the popular machine learning algorithms that are developed in the Julia language. Then, it investigates applications of the machine learning algorithms implemented with the Julia language. Finally, it discusses the open issues and potential future directions that arise in the use of the Julia language in machine learning.
Masked Face Recognition Dataset and Application
In order to effectively prevent the spread of the COVID-19 virus, almost everyone wears a mask during the coronavirus epidemic. This renders conventional facial recognition technology ineffective in many cases, such as community access control, face access control, facial attendance, and facial security checks at train stations. Therefore, it is very urgent to improve the recognition performance of existing face recognition technology on masked faces. Most current advanced face recognition approaches are based on deep learning and depend on a large number of face samples. However, at present, there are no publicly available masked face recognition data sets. To this end, this paper proposes three types of masked face data sets: the Masked Face Detection Dataset (MFDD), the Real-world Masked Face Recognition Dataset (RMFRD), and the Simulated Masked Face Recognition Dataset (SMFRD). The GitHub repo associated with this paper can be found HERE.
TensorFlow Quantum: A Software Framework for Quantum Machine Learning
This paper introduces TensorFlow Quantum (TFQ), an open-source library for the rapid prototyping of hybrid quantum-classical models for classical or quantum data. This framework offers high-level abstractions for the design and training of both discriminative and generative quantum models under TensorFlow and supports high-performance quantum circuit simulators. The paper provides an overview of the software architecture and building blocks through several examples and reviews the theory of hybrid quantum-classical neural networks. The TFQ functionalities are illustrated via several basic applications including supervised learning for quantum classification, quantum control, and quantum approximate optimization. Moreover, the authors demonstrate how one can apply TFQ to tackle advanced quantum learning tasks including meta-learning, Hamiltonian learning, and sampling thermal states. The authors hope this framework provides the necessary tools for the quantum computing and machine learning research communities to explore models of both natural and artificial quantum systems, and ultimately discover new quantum algorithms that could potentially yield a quantum advantage.
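Along the lines of the TFQ tutorials, a minimal hybrid model can be assembled from a parameterized Cirq circuit wrapped in a Keras layer. This is a small sketch; consult the TFQ documentation for exact, version-specific APIs.

```python
# Minimal hybrid quantum-classical model sketch in the spirit of the TFQ tutorials.
import cirq
import sympy
import tensorflow as tf
import tensorflow_quantum as tfq

qubit = cirq.GridQubit(0, 0)
theta = sympy.Symbol('theta')
model_circuit = cirq.Circuit(cirq.rx(theta)(qubit))   # trainable single-qubit rotation

# Quantum data enters as serialized circuits; here, a single empty input circuit.
quantum_data = tfq.convert_to_tensor([cirq.Circuit()])

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(), dtype=tf.string),
    tfq.layers.PQC(model_circuit, cirq.Z(qubit)),      # expectation of Z as output
])
print(model(quantum_data))
```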
What is the State of Neural Network Pruning?
Neural network pruning—the task of reducing the size of a network by removing parameters—has been the subject of a great deal of work in recent years. This paper provides a meta-analysis of the literature, including an overview of approaches to pruning and consistent findings in the literature. After aggregating results across 81 papers and pruning hundreds of models in controlled conditions, the clearest finding is that the community suffers from a lack of standardized benchmarks and metrics. This deficiency is substantial enough that it is hard to compare pruning techniques to one another or determine how much progress the field has made over the past three decades. To address this situation, the authors identify issues with current practices, suggest concrete remedies, and introduce ShrinkBench, an open-source framework to facilitate standardized evaluations of pruning methods. ShrinkBench is used to compare various pruning techniques and show that its comprehensive evaluation can prevent common pitfalls when comparing pruning methods. The PyTorch code for this paper is available HERE.
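For readers who want to try a simple baseline of the kind the benchmark standardizes, here is a generic magnitude-pruning sketch using PyTorch’s built-in pruning utilities (this is not ShrinkBench itself).

```python
# Generic unstructured magnitude pruning with PyTorch utilities (not ShrinkBench).
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Remove the 60% of weights with the smallest L1 magnitude in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.6)
        prune.remove(module, "weight")   # make the pruning permanent

sparsity = (model[0].weight == 0).float().mean().item()
print(f"Layer 0 sparsity: {sparsity:.2%}")
```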
YOLOv4: Optimal Speed and Accuracy of Object Detection
There are a huge number of features that are said to improve Convolutional Neural Network (CNN) accuracy. Practical testing of combinations of such features on large data sets, and theoretical justification of the results, is required. Some features operate on certain models exclusively and for certain problems exclusively, or only for small-scale data sets; while some features, such as batch normalization and residual connections, are applicable to the majority of models, tasks, and data sets. This paper’s authors assume that such universal features include Weighted-Residual-Connections (WRC), Cross-Stage-Partial-connections (CSP), Cross mini-Batch Normalization (CmBN), Self-adversarial-training (SAT), and Mish activation. They combine these with Mosaic data augmentation, DropBlock regularization, and CIoU loss to achieve state-of-the-art results: 43.5% AP (65.7% AP50) on the MS COCO data set at a real-time speed of ~65 FPS on a Tesla V100. The PyTorch code associated with this paper can be found HERE.
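Of the listed components, the Mish activation is easy to state exactly: mish(x) = x · tanh(softplus(x)). A small PyTorch version looks like this.

```python
# The Mish activation, mish(x) = x * tanh(softplus(x)), as a small PyTorch module.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

x = torch.linspace(-3, 3, 7)
print(Mish()(x))
```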
BERTweet: A pre-trained language model for English Tweets
This paper presents BERTweet, the first public large-scale pre-trained language model for English Tweets. BERTweet is trained using the RoBERTa pre-training procedure (Liu et al., 2019), with the same model configuration as BERT-base (Devlin et al., 2019). Experiments show that BERTweet outperforms strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks: part-of-speech tagging, named-entity recognition, and text classification. BERTweet is released to facilitate future research and downstream applications on Tweet data. PyTorch BERTweet code is available HERE.
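If the released checkpoint is the one published on the Hugging Face hub (assumed here to be "vinai/bertweet-base"; check the repo for the exact model id and tokenizer options), extracting tweet embeddings takes only a few lines with the transformers library.

```python
# Usage sketch assuming the "vinai/bertweet-base" checkpoint on the Hugging Face hub.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=False)
model = AutoModel.from_pretrained("vinai/bertweet-base")

inputs = tokenizer("SC has first two presumptive cases of coronavirus", return_tensors="pt")
with torch.no_grad():
    features = model(**inputs).last_hidden_state   # contextual embeddings per token
print(features.shape)
```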
One of the most significant challenges in statistical signal processing and machine learning is how to obtain a generative model that can produce samples from large-scale data distributions, such as images and speech. The Generative Adversarial Network (GAN) is an effective method to address this problem. GANs provide an appropriate way to learn deep representations without widespread use of labeled training data. This approach has attracted the attention of many researchers in computer vision since it can generate a large amount of data without precise modeling of the probability density function (PDF). In GANs, the generative model is estimated via a competitive process in which the generator and discriminator networks are trained simultaneously. The generator learns to generate plausible data, and the discriminator learns to distinguish fake data created by the generator from real data samples. Given the rapid growth of GANs over the last few years and their application in various fields, it is necessary to investigate these networks accurately. After introducing the main concepts and theory of GANs, this paper compares two new deep generative models, explains the evaluation metrics used in the literature, and discusses the challenges of GANs. Moreover, the most remarkable GAN architectures are categorized and discussed. Finally, the essential applications in computer vision are examined.
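The competitive training process described above can be condensed into a short, generic PyTorch sketch on toy 1-D data (an illustration of the standard GAN objective, not tied to any particular paper in this list).

```python
# Minimal GAN training loop on toy 1-D data (generic illustration of the GAN objective).
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))   # generator
D = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))   # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0          # target distribution N(3, 0.5)
    fake = G(torch.randn(64, 8))

    # Discriminator: push real samples toward label 1, generated samples toward 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator: fool the discriminator into predicting 1 for generated samples.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

print(G(torch.randn(1000, 8)).mean().item())        # should drift toward ~3.0
```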
Comparing BERT against traditional machine learning text classification
The BERT model has arisen in recent years as a popular state-of-the-art machine learning model that is able to cope with multiple NLP tasks, such as supervised text classification, without human supervision. Its flexibility to cope with any type of corpus while delivering great results has made this approach very popular not only in academia but also in industry. Still, many other approaches have been used successfully over the years. This paper first presents BERT and includes a short review of classical NLP approaches. Then, the researchers empirically test, with a suite of experiments covering different scenarios, the behavior of BERT against traditional TF-IDF features fed to machine learning algorithms. The purpose of this work is to add empirical evidence to support or refute the use of BERT as a default for NLP tasks. The experiments show the superiority of BERT and its independence from features of the NLP problem, such as the language of the text, adding empirical evidence for using BERT as a default technique in NLP problems.
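The traditional side of that comparison is easy to reproduce: TF-IDF features fed to a classical classifier. Below is a generic scikit-learn sketch (toy data, not the authors’ exact experimental setup).

```python
# TF-IDF + logistic regression baseline sketch (toy data, not the paper's setup).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great product, works well", "terrible, broke after a day",
         "excellent value", "would not recommend"]
labels = [1, 0, 1, 0]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["works great", "broke immediately"]))
```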
AutoML-Zero: Evolving Machine Learning Algorithms From Scratch
Machine learning research has advanced in multiple aspects, including model structures and learning methods. The effort to automate such research, known as AutoML, has also made significant progress. However, this progress has largely focused on the architecture of neural networks, where it has relied on sophisticated expert-designed layers as building blocks—or similarly restrictive search spaces. The goal of this paper is to show that AutoML can go further: it is possible today to automatically discover complete machine learning algorithms just using basic mathematical operations as building blocks. This is demonstrated by introducing a novel framework that significantly reduces human bias through a generic search space. Despite the vastness of this space, an evolutionary search can still discover two-layer neural networks trained by backpropagation. These simple neural networks can then be surpassed by evolving directly on tasks of interest, e.g. CIFAR-10 variants, where modern techniques emerge in the top algorithms, such as bilinear interactions, normalized gradients, and weight averaging. Moreover, evolution adapts algorithms to different task types: e.g., dropout-like techniques appear when little data is available.
On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice
Machine learning algorithms have been used widely in various applications and areas. To fit a machine learning model into different problems, its hyper-parameters must be tuned. Selecting the best hyper-parameter configuration for machine learning models has a direct impact on the model’s performance. It often requires deep knowledge of machine learning algorithms and appropriate hyper-parameter optimization techniques. Although several automatic optimization techniques exist, they have different strengths and drawbacks when applied to different types of problems. In this paper, optimizing the hyper-parameters of common machine learning models is studied. Several state-of-the-art optimization techniques are introduced in terms of how to apply them to machine learning algorithms. Many available libraries and frameworks developed for hyper-parameter optimization problems are provided, and some open challenges of hyper-parameter optimization research are also discussed in this paper. Moreover, experiments are conducted on benchmark datasets to compare the performance of different optimization methods and provide practical examples of hyper-parameter optimization. This survey paper will help industrial users, data analysts, and researchers to better develop machine learning models by identifying the proper hyper-parameter configurations effectively.
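As a small practical taste of automated hyper-parameter optimization, here is a grid search over SVM hyper-parameters with scikit-learn’s GridSearchCV, one of the simpler approaches such surveys cover.

```python
# Exhaustive grid search over SVM hyper-parameters with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1], "kernel": ["rbf"]}
search = GridSearchCV(SVC(), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```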
Visualizing classification results
Classification is a major tool of statistics and machine learning. A classification method first processes a training set of objects with given classes (labels), with the goal of afterward assigning new objects to one of these classes. When running the resulting prediction method on the training data or on test data, it can happen that an object is predicted to lie in a class that differs from its given label. This is sometimes called label bias, and raises the question of whether the object was mislabeled. The goal of this paper is to visualize aspects of the data classification to obtain insight. The proposed display reflects to what extent each object’s label is (dis)similar to its prediction, how far each object lies from the other objects in its class, and whether some objects lie far from all classes. The display is constructed for discriminant analysis, the k-nearest neighbor classifier, support vector machines, logistic regression, and majority voting. It is illustrated on several benchmark datasets containing images and texts.
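The paper’s displays are more sophisticated than this, but the underlying quantity is intuitive: how much probability a classifier assigns to each object’s given label. The rough sketch below (my own illustration, not the paper’s class-map construction) plots that quantity so poorly fitting objects stand out.

```python
# Rough illustration: plot each object's predicted probability of its given label.
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
proba = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)
p_given = proba[np.arange(len(y)), y]          # probability assigned to the given label

plt.scatter(np.arange(len(y)), p_given, c=y, s=12)
plt.xlabel("object index")
plt.ylabel("P(given label)")
plt.show()
```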
Phishing Detection Using Machine Learning Techniques
The Internet has become an indispensable part of our lives. However, it has also provided opportunities to anonymously perform malicious activities like phishing. Phishers try to deceive their victims by social engineering or by creating mock-up websites to steal information such as account IDs, usernames, and passwords from individuals and organizations. Although many methods have been proposed to detect phishing websites, phishers have evolved their methods to escape these detection methods. One of the most successful methods for detecting these malicious activities is machine learning. This is because most phishing attacks have some common characteristics that can be identified by machine learning methods. This paper compares the results of multiple machine learning methods for predicting phishing websites.
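A comparison of this kind typically boils down to cross-validating several classifiers on a table of website features. The sketch below uses synthetic stand-in features (the paper uses real phishing datasets and its own feature set).

```python
# Generic classifier comparison sketch with synthetic stand-in features
# (the paper uses real phishing-website datasets).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical stand-in for URL/page features (e.g. URL length, has_ip, num_dots).
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

for name, clf in [("logreg", LogisticRegression(max_iter=1000)),
                  ("knn", KNeighborsClassifier()),
                  ("random_forest", RandomForestClassifier())]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```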
Memory Optimization for Deep Networks
Deep learning is slowly, but steadily, hitting a memory bottleneck. While the tensor computation in top-of-the-line GPUs increased by 32x over the last five years, the total available memory only grew by 2.5x. This prevents researchers from exploring larger architectures, as training large networks requires more memory for storing intermediate outputs. This paper presents MONeT, an automatic framework that minimizes both the memory footprint and computational overhead of deep networks. MONeT jointly optimizes the checkpointing schedule and the implementation of various operators. MONeT is able to outperform all prior hand-tuned operations as well as automated checkpointing. MONeT reduces the overall memory requirement by 3x for various PyTorch models, with a 9-16% overhead in computation. For the same computation cost, MONeT requires 1.2-1.8x less memory than current state-of-the-art automated checkpointing frameworks. The PyTorch code associated with this machine learning research paper is available HERE.
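MONeT automates and jointly optimizes decisions that practitioners otherwise make by hand, such as gradient checkpointing. For context, here is the manual, generic version of that memory/compute trade-off using torch.utils.checkpoint (this is not MONeT).

```python
# Generic gradient checkpointing sketch with torch.utils.checkpoint (not MONeT).
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU()) for _ in range(8)])
x = torch.randn(32, 1024, requires_grad=True)

# Split the network into 4 segments; activations inside each segment are
# recomputed during the backward pass instead of being stored.
out = checkpoint_sequential(model, 4, x)
out.sum().backward()
```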
Teaching a GAN What Not to Learn
Generative adversarial networks (GANs) were originally envisioned as unsupervised generative models that learn to follow a target distribution. Variants such as conditional GANs and auxiliary-classifier GANs (ACGANs) project GANs onto supervised and semi-supervised learning frameworks by providing labeled data and using multi-class discriminators. This paper approaches the supervised GAN problem from a different perspective, one that is motivated by the philosophy of the famous Persian poet Rumi, who said, “The art of knowing is knowing what to ignore.”
When Machine Learning Meets Privacy: A Survey and Outlook
The newly emerged machine learning (e.g. deep learning) methods have become a strong driving force in revolutionizing a wide range of industries, such as smart healthcare, financial technology, and surveillance systems. Meanwhile, privacy has emerged as a big concern in this machine learning-based artificial intelligence era. It is important to note that the problem of privacy preservation in the context of machine learning is quite different from that in traditional data privacy protection, as machine learning can act as both friend and foe. Currently, work on the preservation of privacy and ML is still in its infancy, as most existing solutions only focus on privacy problems during the machine learning process. Therefore, a comprehensive study of privacy preservation problems and machine learning is required. This paper surveys the state of the art in privacy issues and solutions for machine learning. The survey covers three categories of interactions between privacy and machine learning: (i) private machine learning, (ii) machine learning-aided privacy protection, and (iii) machine learning-based privacy attacks and corresponding protection schemes. The current research progress in each category is reviewed and the key challenges are identified. Finally, based on an in-depth analysis of the area of privacy and machine learning, the paper points out future research directions in this field.
Interested in learning more about machine learning? Check out these Ai+ training sessions:
Machine Learning Foundations: Linear Algebra
This first installment in the Machine Learning Foundations series covers linear algebra, the topic at the heart of most machine learning approaches. Through the combination of theory and interactive examples, you’ll develop an understanding of how linear algebra is used to solve for unknown values in high-dimensional spaces, thereby enabling machines to recognize patterns and make predictions.
Supervised Machine Learning Series
Data Annotation at Scale: Active and Semi-Supervised Learning in Python
Explaining and Interpreting Gradient Boosting Models in Machine Learning
ODSC West 2020: Intelligibility Throughout the Machine Learning Lifecycle