Have you ever stood in a packed crowd, seen heatwaves rippling off the pavement and felt the roar of cars flying past you? No? I’m guessing that the overlap of ODSC blog readers and car racing fans might be small. However, those roaring cars do present some interesting data science challenges.
During a race, high-resolution images of the cars are taken and shared in a common Dropbox folder for all teams to use. Because the images are needed mid-race, most crews have an employee sort through this folder to identify photos of their car. Selecting photos that contain a specific car is easy to automate: by detecting car objects with a pre-trained MobileNet-SSD, then training an ensemble of neural networks to identify a specific car, the task can be done with 96 percent accuracy. Looking for a challenge, our team of data scientists wanted to take this idea a step further and see if we could use GANs to generate realistic images of race cars.
Overview of Steps
For those of you not that familiar with GANs, a GAN consists of two neural networks: a generator and a discriminator. The generator produces fake data while the discriminator tries to differentiate between the fake and real data.
The generator improves by producing data which can fool the discriminator, in theory eventually producing fake data which is indistinguishable from the real data.
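To make the tug-of-war between the two networks concrete, here is a minimal sketch of the losses each side typically minimizes. It assumes the common binary cross-entropy formulation with a non-saturating generator loss; the article does not state which loss the team used, and the discriminator outputs below are made-up numbers.

```python
import math

def bce(prediction, target):
    """Binary cross-entropy for one probability in (0, 1)."""
    eps = 1e-12  # guard against log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def discriminator_loss(d_real, d_fake):
    """Mean loss when real samples are labeled 1 and fake samples 0."""
    losses = [bce(p, 1.0) for p in d_real] + [bce(p, 0.0) for p in d_fake]
    return sum(losses) / len(losses)

def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are scored as real."""
    return sum(bce(p, 1.0) for p in d_fake) / len(d_fake)

# Hypothetical discriminator outputs (probability that a sample is real).
confident_d = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.1, 0.2])
fooled_d = discriminator_loss(d_real=[0.9, 0.8], d_fake=[0.7, 0.9])
```

The discriminator's loss rises when fakes are scored as real (fooled_d is larger than confident_d), while the generator's loss falls in exactly that situation, which is what drives the two networks against each other.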
Although we experimented briefly with a Deep Convolutional GAN, we settled on using a Conditional GAN (CGAN) because it would allow us to produce multi-class images (images of different cars) at one time. A CGAN is a GAN which takes some extra information, y, into both the generator and the discriminator. Since we wanted to generate images of four cars, the extra information we added was a one-hot vector of length four, indicating which car we wanted produced in the image.
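As an illustration of the conditioning step, the sketch below builds a generator input by concatenating random noise with the one-hot vector y. Concatenation is one common way to feed y into a network, not necessarily the one used in this project, and the noise size of 100 is an assumed value that the article does not give.

```python
import numpy as np

NUM_CLASSES = 4   # four cars, as in the article
NOISE_DIM = 100   # assumed latent size, not stated in the article

def one_hot(labels, num_classes=NUM_CLASSES):
    """Turn integer class labels into one-hot row vectors."""
    encoded = np.zeros((len(labels), num_classes), dtype=np.float32)
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def generator_input(labels, rng):
    """Concatenate random noise z with the condition y for each sample."""
    z = rng.standard_normal((len(labels), NOISE_DIM)).astype(np.float32)
    y = one_hot(labels)
    return np.concatenate([z, y], axis=1)

rng = np.random.default_rng(0)
batch = generator_input(np.array([0, 1, 2, 3]), rng)
print(batch.shape)  # (4, 104)
```

The discriminator would receive the same one-hot vector alongside each image, so it can judge whether the image is a realistic example of the requested car rather than of any car.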
We trained this model on 64×64×3 images (64×64 pixels with three color channels), using a batch size of 32 for 60 epochs. Although the images did come out looking like cars, the model generated images with the same color composition for all four classes.
We diagnosed this problem as mode collapse, a common failure with GANs. Mode collapse occurs when the generator produces the same or nearly the same images every time while still managing to fool the discriminator.
We were not discouraged, however, and implemented experience replay to combat this. Experience replay appends a randomly chosen generated image to the sample array after each epoch, and it is seen as an effective way to prevent mode collapse. After retraining with experience replay, the results showed more than one color composition and clearer distinctions between classes.
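One way to picture experience replay is as a bounded pool of older fakes that gets mixed back into the discriminator's fake batch. The sketch below is an assumption about how such a buffer might look, not the team's code: the ReplayBuffer class, its capacity of 50, and the choice of 8 replayed samples are all hypothetical, and strings stand in for images.

```python
import random

class ReplayBuffer:
    """Keeps a bounded pool of previously generated samples."""

    def __init__(self, capacity, seed=None):
        self.capacity = capacity
        self.samples = []
        self.rng = random.Random(seed)

    def add(self, sample):
        """Store a sample, overwriting a random old one when full."""
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
        else:
            self.samples[self.rng.randrange(self.capacity)] = sample

    def mix(self, fresh_batch, num_replayed):
        """Return the fresh fakes plus up to num_replayed older fakes."""
        k = min(num_replayed, len(self.samples))
        return list(fresh_batch) + self.rng.sample(self.samples, k)

buffer = ReplayBuffer(capacity=50, seed=0)
for epoch in range(60):  # 60 epochs, as in the article
    fresh = [f"fake_e{epoch}_{i}" for i in range(32)]  # stand-ins for images
    discriminator_fakes = buffer.mix(fresh, num_replayed=8)
    # ... train the discriminator on discriminator_fakes here ...
    buffer.add(buffer.rng.choice(fresh))  # keep one random fake per epoch
```

Because the discriminator keeps seeing fakes from earlier epochs, the generator cannot win simply by cycling back to an output that used to work.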
Our last modification to the GAN was adding spectral normalization, a weight normalization technique used to stabilize the training of the discriminator. It rescales each layer's weight matrix by its spectral norm (its largest singular value), which limits how sharply the discriminator's output can change. With spectral normalization, the results continued to improve.
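For readers who want to see the idea in code, here is a rough sketch of spectral normalization applied to a single weight matrix, using power iteration to estimate the largest singular value. The function name, the iteration count, and the random matrix standing in for discriminator weights are illustrative assumptions; deep learning frameworks provide their own implementations.

```python
import numpy as np

def spectral_normalize(weight, num_iters=50, seed=0):
    """Divide a 2-D weight matrix by an estimate of its largest singular value."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(weight.shape[0])
    for _ in range(num_iters):
        v = weight.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = weight @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ weight @ v  # estimated spectral norm
    return weight / sigma, sigma

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 128))  # hypothetical discriminator layer weights
w_sn, sigma = spectral_normalize(w)
```

After normalization the largest singular value of w_sn is approximately 1, so the layer cannot stretch its input by much more than a factor of one in any direction.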
If you are interested in learning more about spectral normalization, check out this in-depth article.
Results
With this new and improved Spectral-Normalized Conditional GAN using experience replay, we looked at the results in two ways. First, we got creative with the Structural Similarity Index Measure (SSIM). This may not be standard practice for evaluating GANs, but SSIM is a similarity score between two images. For each fake image, we therefore averaged its SSIM score against all the real images. This would allow you to select the best fake images by looking for a minimal average SSIM score. Below is the generated set of images for the four different classes of cars.
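The averaging step can be sketched as follows. This is a simplified, single-window version of SSIM computed over the whole image, chosen to keep the example self-contained; standard SSIM is computed over local sliding windows (as in scikit-image's structural_similarity), so the numbers will differ. Random arrays stand in for real and generated images.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over whole images (no sliding window)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

def mean_ssim_to_real(fake, real_images):
    """Average SSIM between one fake image and every real image."""
    return float(np.mean([global_ssim(fake, real) for real in real_images]))

rng = np.random.default_rng(0)
# Random arrays standing in for 64x64x3 real and generated images.
real_images = rng.integers(0, 256, size=(10, 64, 64, 3))
fake_images = rng.integers(0, 256, size=(5, 64, 64, 3))
scores = [mean_ssim_to_real(fake, real_images) for fake in fake_images]
ranking = np.argsort(scores)  # fake image indices, lowest average SSIM first
```

An identical pair of images scores 1, and the ranking array orders the fake images from lowest to highest average score, so a selection rule can be applied on top of it.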
A second, more standard evaluation was to train a classifier on the generated data and test it on the original data. Training on 4,000 generated images from each class (16,000 total), we were able to predict the original images’ classes with the accuracies shown in the confusion matrix below.
Overall, the model averaged 89.6% accuracy.
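An average accuracy can be read off a confusion matrix in more than one way, and the article does not say which was used. The sketch below shows two common definitions, assuming rows are true classes and columns are predictions. The counts are made up for illustration and are not the project's results.

```python
import numpy as np

def overall_accuracy(cm):
    """Fraction of all samples on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()

def mean_per_class_accuracy(cm):
    """Average of per-class recall; rows are true classes."""
    return float(np.mean(np.diag(cm) / cm.sum(axis=1)))

# Made-up counts for four classes, NOT the article's results.
cm = np.array([
    [90,  5,  3,  2],
    [ 4, 80, 10,  6],
    [ 2,  8, 85,  5],
    [ 1,  4,  5, 40],
])
overall = overall_accuracy(cm)
per_class = mean_per_class_accuracy(cm)
```

The two numbers coincide when every class has the same number of test images and can drift apart when the classes are imbalanced, as they do slightly in this toy matrix.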
Clearly, it is possible to generate accurate enough images of race cars in order to train a model to identify them. To read more about this project and details around how we trained the models, read our white paper: Privately Training an AI Model Using Fake Images Generated by Generative Adversarial Networks.
You can also learn more about other AI applications for businesses in all industries: https://www.wwt.com/article/uncovering-ai-research-development.
Yoni Malchi:
Yoni Malchi is a Senior Engagement Manager in the Business and Analytics Advisory (BAA) practice at WWT. He leads AI engagements with key customers bridging the gap between the business and technology teams. Yoni also leads the AI R&D efforts for the BAA team which researches cutting-edge AI techniques, tools, and platforms to provide differentiated recommendations to our clients.
Achal Sharma:
Achal Sharma has 5 months of experience as a Data Analyst at WWT and more than 4 years of experience overall. He holds a bachelor's degree in electronics and communication engineering. Achal has deep expertise in building end-to-end data ingestion and analytics pipelines and in creating text analytics and predictive analytics solutions for various industries.
Ajay Dadheech:
Ajay Dadheech is a Data Science Manager for the Business and Analytics Advisory (BAA) practice at World Wide Technology (WWT). He has experience in Business Intelligence for Retail Chain, Logistics and Life Sciences.
Mohammad Faisal:
Mohammad Faisal is a Data Engineer for the Business and Analytics Advisory (BAA) practice at WWT. He enjoys Databases, Qualtrics, Process Automation, and Big Data.
Jimmy Lemkemeier:
Jimmy Lemkemeier is an Analyst for the Business and Analytics Advisory (BAA) practice at WWT. He is interested in machine learning, especially when it involves computer vision, and is currently focused on the problem of explaining convolutional neural networks’ decisions.