This blog post introduces automated image annotation with MAX (Model Asset Exchange). To learn more about how our deep learning models are created, containerized, and deployed to production, come join our training at ODSC West 2019: Deploying Deep Learning Models as Microservices.
Introduction
The Model Asset Exchange (MAX) is an open source collection of 30+ free, ready-to-use deep learning models. Each model is wrapped in an easy-to-use API so that any developer can use it. Additionally, the MAX model APIs are available as Docker images, which makes deploying and scaling them an easy task. No deep learning knowledge required!
We will illustrate the power of two MAX models for social media image processing. More specifically, we will use the image caption generation and image style transfer models from Python.
Generating a caption for an image
Here, we will use a MAX model deployed to a public instance to generate a fitting caption for our input image. The appropriate model here is the MAX Image Caption Generator.
Note: If you want to make a lot of queries to the model, or if you want to use this model offline (e.g. as part of an application), it’s usually a good idea to run the model locally as a Docker container. If you have Docker installed, it only takes one command!
this model’s instruction page | more info on Medium
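For reference, deploying the caption model locally generally looks like the single command below. This is a sketch: the image name is assumed from the model’s instruction page linked above, so check there for the exact command.

```bash
# Pull and run the MAX Image Caption Generator locally on port 5000
# (image name assumed from the model's instruction page)
docker run -it -p 5000:5000 codait/max-image-caption-generator
```

Once the container is running, the same API is served at http://localhost:5000 instead of the public instance used below.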
Below, you can find an illustration of the neural network’s architecture for the Image Caption Generator. Although this may look complex at first, using this model with MAX is extremely easy!
This image is sourced from the [Show and Tell](https://github.com/tensorflow/models/tree/master/research/im2txt) publication, which forms the backbone of the MAX Image Caption Generator.
Querying the model from Python usually takes three steps:
- Specify the Model URL
- Upload the input image to the model
- Parse the output of the model
Let’s explore the public instance of the model at the URL below. Clicking on this URL takes us to the Swagger API documentation of the model, which already provides a lot of information about the model.
http://max-image-caption-generator.max.us-south.containers.appdomain.cloud/
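Before posting any images, you can quickly verify from Python that the service is up. The snippet below is a minimal sketch that queries the model’s metadata endpoint, a standard route (`GET /model/metadata`) that MAX models expose alongside `POST /model/predict`; the exact fields returned may vary per model.

```python
import requests

# Query the metadata endpoint of the deployed model
base_url = 'http://max-image-caption-generator.max.us-south.containers.appdomain.cloud/'
metadata = requests.get(base_url + 'model/metadata').json()
print(metadata)  # e.g. model id, name, description, license
```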
Next, we can access the model with Python as follows.
```python
# Load in the required Python libraries
import requests

# Path to the input image (defined here so the example is self-contained)
my_image = 'my_image.jpg'

# 1. Send an image through the network:

# The served model: MAX-Image-Caption-Generator
model_endpoint = 'http://max-image-caption-generator.max.us-south.containers.appdomain.cloud/' + 'model/predict'

# Upload the image to the MAX model's REST API
# (note: set 'jpeg' to 'png' if working with a png image)
with open(my_image, 'rb') as file:
    file_form = {'image': (my_image, file, 'image/jpeg')}

    # Post the image to the REST API using the requests library
    r = requests.post(url=model_endpoint, files=file_form)

# Parse the JSON response
response = r.json()

# Show the output
print('----OUTPUT CAPTIONS----\n')
for i, x in enumerate(response['predictions']):
    print(str(i+1)+'.', x['caption'])

# 2. Extract the caption from the output
my_caption = response['predictions'][0]['caption']
print(my_caption)
```
Feeding the model the image below results in the following caption:
“a man riding a wave on top of a surfboard”
(image source: pexels.com)
Now that we have a caption, we can use the TextBlob and NLTK libraries for Python to remove stopwords (such as ‘and’, ‘the’, ‘it’, ‘a’, ‘on’, etc.) from the sentence. The remaining words are keywords that could be used as potential social media hashtags.
```python
# Load in the required Python libraries
import nltk
from textblob import TextBlob
from nltk.corpus import stopwords
nltk.download('stopwords')

def remove_stopwords(sentence):
    """Remove stopwords from a sentence and return the list of words."""
    blob = TextBlob(sentence)
    return [word for word in blob.words
            if word not in stopwords.words('english') and len(word) > 2]

tags = remove_stopwords(my_caption)
```
In this case, this would result in the following hashtags:
[‘man’, ‘riding’, ‘wave’, ‘top’, ‘surfboard’]
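Turning these keywords into ready-to-post hashtags is then a one-liner. A minimal sketch, using the `tags` list produced above:

```python
# Prefix each keyword with '#' to form hashtags
hashtags = ['#' + tag for tag in tags]
print(' '.join(hashtags))  # '#man #riding #wave #top #surfboard'
```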
Restyling an image
Next, we will apply an image style transfer to our chosen input image. The code is very similar to the caption generation code. The difference is that the model returns an image instead of a JSON-formatted string, so we will use the Pillow and io libraries for Python.
```python
# Load in the required Python libraries
import requests
from PIL import Image
from io import BytesIO

# Specify the model endpoint URL. This is the API to which we will send the input data.
model_endpoint = 'http://max-fast-neural-style-transfer.max.us-south.containers.appdomain.cloud/' + 'model/predict'

# Choose the style as a parameter in the API URL (only pick one)
model_endpoint += '?model=mosaic'
# model_endpoint += '?model=candy'
# model_endpoint += '?model=rain_princess'
# model_endpoint += '?model=udnie'

# Upload the image to the model
with open(my_image, 'rb') as file:
    file_form = {'image': (my_image, file, 'image/jpeg')}

    # Post the image to the REST API using the requests library
    response = requests.post(url=model_endpoint, files=file_form)

# Load the output image into memory
output_image = Image.open(BytesIO(response.content))

# Show the output image
output_image.show()
```
Four styles are available; the results are shown below, and a snippet that generates all four at once follows the list.
- Mosaic
- Candy
- Rain Princess
- Udnie
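To compare all four styles side by side, you can loop over the style names and save each result. A minimal sketch, reusing `my_image` from earlier and the same endpoint:

```python
import requests
from PIL import Image
from io import BytesIO

base_endpoint = 'http://max-fast-neural-style-transfer.max.us-south.containers.appdomain.cloud/model/predict'

# Generate and save one styled output image per available style
for style in ['mosaic', 'candy', 'rain_princess', 'udnie']:
    with open(my_image, 'rb') as file:
        file_form = {'image': (my_image, file, 'image/jpeg')}
        response = requests.post(url=base_endpoint + '?model=' + style, files=file_form)
    Image.open(BytesIO(response.content)).save('output_' + style + '.jpg')
```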
Which style do you prefer? Meet us at ODSC West 2019 during our session “Deploying Deep Learning Models as Microservices”, and let us know!
Authors:
Gabriela de Queiroz
Gabriela de Queiroz is a Sr. Engineering & Data Science Manager / Sr. Developer Advocate at IBM, where she leads and manages a team of data scientists and software engineers contributing to open source and artificial intelligence projects. She works on different open source projects and is actively involved with several organizations to foster an inclusive community. She is the founder of R-Ladies, a worldwide organization promoting diversity in the R community, with more than 175 chapters in 45+ countries. She is now working to make AI more diverse and inclusive through her new organization, AI Inclusive. She has worked at several startups, where she built teams, developed statistical and machine learning models, and employed a variety of techniques to derive insights and drive data-centric decisions.
Website: https://k-roz.com/
Saishruthi Swaminathan
Saishruthi Swaminathan is a developer advocate and data scientist on the IBM CODAIT team, whose main focus is to democratize data and AI through open source technologies. Her passion is to dive deep into the ocean of data, extract insights, and use AI for social good. Previously, she worked as a software developer, and she is on a mission to spread the knowledge and experience she acquired along the way. She also leads an education initiative for rural children and organizes meetups focused on women empowerment. She has a master’s degree in electrical engineering, specializing in data science, and a bachelor’s degree in electronics and instrumentation. She can be found on [LinkedIn](https://www.linkedin.com/in/saishruthi-swaminathan/) and [Medium](https://medium.com/@saishruthi.tn).
Simon Plovyt
Simon is a Developer Advocate at the Center for Open-Source Data & AI Technologies. Previously, he worked as a machine learning consultant in Europe, and was with UC San Francisco before that. Simon holds a master’s degree in bioinformatics engineering and a bachelor’s degree in molecular biology.
LinkedIn: https://www.linkedin.com/in/splovyt/
Twitter: https://twitter.com/plovyts
Medium: https://medium.com/@splovyt/