Thursday, January 9, 2025

Automating Image Annotation with MAX

This blog post introduces automating image annotation with the Model Asset Exchange (MAX). To learn more about how our deep learning models are created, containerized, and deployed to production, come join our training at ODSC West 2019: “Deploying Deep Learning Models as Microservices”.

Introduction

The Model Asset Exchange (MAX) is an open source collection of 30+ free, ready-to-use deep learning models. Each model is wrapped in an easy-to-use REST API, so any developer can use it. The MAX model APIs are also available as Docker images, which makes deployment and scaling straightforward. No deep learning knowledge required!

We will illustrate the power of two MAX models for social media image processing: the Image Caption Generator and the Fast Neural Style Transfer model. We will query both from Python.

  1. Generating a caption for an image

Here, we will use the MAX Image Caption Generator, deployed on a public instance, to generate a fitting caption for our input image.

Note: If you want to send many queries to the model, or use it offline (e.g., as part of an application), it’s usually a good idea to run the model locally from its Docker image. If you have Docker installed, this takes only one command!
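If you run the model locally, the same API should be reachable under a different base URL. The sketch below switches between the public instance and a local container. `predict_url`, `choose_endpoint`, and `LOCAL_URL` are hypothetical names. The sketch assumes the container publishes its API on port 5000 of `localhost`; check the model’s instruction page for the exact `docker run` command.

```python
# Hypothetical helpers: build the /model/predict URL for either a local Docker
# container or the public instance used in this post.
PUBLIC_CAPTION_URL = 'http://max-image-caption-generator.max.us-south.containers.appdomain.cloud/'
# Assumption: a locally running MAX container exposes its API on port 5000.
LOCAL_URL = 'http://localhost:5000/'


def predict_url(base_url):
    """Append the MAX prediction route to a base URL, tolerating a missing trailing slash."""
    return base_url.rstrip('/') + '/model/predict'


def choose_endpoint(use_local):
    """Return the prediction endpoint of the local container or of the public instance."""
    return predict_url(LOCAL_URL if use_local else PUBLIC_CAPTION_URL)


print(choose_endpoint(use_local=True))  # http://localhost:5000/model/predict
```

Everything after the URL (the upload and the parsing) should stay the same whichever instance you query.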


Below, you can find an illustration of the neural network’s architecture for the Image Caption Generator. Although this may look complex at first, using this model with MAX is extremely easy!

This image comes from the [Show and Tell](https://github.com/tensorflow/models/tree/master/research/im2txt) model (TensorFlow’s im2txt implementation), which forms the backbone of the MAX Image Caption Generator.

Querying a MAX model from Python usually takes three steps:

  1. Specify the model URL
  2. Upload the input image to the model
  3. Parse the output of the model
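In step 2, the MIME type sent with the image should match its format. Rather than editing `'image/jpeg'` by hand, the sketch below guesses the type from the file extension using Python’s standard `mimetypes` module. `image_form` is a hypothetical helper. Restricting uploads to JPEG and PNG is an assumption based on the two formats mentioned in this post, not a documented MAX limit.

```python
import io
import mimetypes
import os


def image_form(path, fileobj):
    """Build the multipart form for the 'image' field, guessing the MIME type from the extension."""
    mime, _ = mimetypes.guess_type(path)
    # Assumption: only the JPEG and PNG formats mentioned in this post are allowed
    if mime not in ('image/jpeg', 'image/png'):
        raise ValueError(f'expected a .jpg/.jpeg or .png file, got {path!r}')
    return {'image': (os.path.basename(path), fileobj, mime)}


# Usage with an in-memory stand-in for an open image file
form = image_form('photos/surfer.png', io.BytesIO(b'fake bytes'))
print(form['image'][0], form['image'][2])  # surfer.png image/png
```

Inside the `with open(...)` block of the caption code, `files=image_form(my_image, file)` could then replace the hand-written `file_form`.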

Let’s explore the public instance of the model at the URL below. Opening this URL in a browser brings up the model’s Swagger UI, which documents the REST API: its endpoints, expected inputs, and outputs.

http://max-image-caption-generator.max.us-south.containers.appdomain.cloud/

Next, we can access the model with Python as follows. 

 

# Load in the required Python libraries
import requests

# Path to your input image (placeholder filename; replace with your own)
my_image = 'surfer.jpg'

# 1. Specify the model URL: the public MAX-Image-Caption-Generator instance
model_endpoint = 'http://max-image-caption-generator.max.us-south.containers.appdomain.cloud/' + 'model/predict'

# 2. Upload the image to the model's REST API
with open(my_image, 'rb') as file:
    # note: change 'image/jpeg' to 'image/png' if working with a PNG image
    file_form = {'image': (my_image, file, 'image/jpeg')}
    # Post the image to the REST API using the requests library
    r = requests.post(url=model_endpoint, files=file_form)
    r.raise_for_status()  # stop early on an HTTP error
    # Decode the JSON response
    response = r.json()

# 3. Parse the output: show every candidate caption
print('----OUTPUT CAPTIONS----\n')
for i, x in enumerate(response['predictions']):
    print(str(i + 1) + '.', x['caption'])

# Keep the first caption for the next steps
my_caption = response['predictions'][0]['caption']
print(my_caption)
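Step 3 can be made a little more defensive. The sketch below assumes the caption response has the shape `{'status': 'ok', 'predictions': [{'caption': ..., 'probability': ...}]}`. The `predictions` and `caption` keys appear in the code above. The `status` and `probability` keys are assumptions to check against the model’s Swagger page. `parse_captions` is a hypothetical name, and the sample response is hand-written for illustration, not real model output.

```python
def parse_captions(response):
    """Return (caption, probability) pairs from a MAX caption response, best first.

    Assumes the JSON shape {'status': 'ok', 'predictions': [{'caption': ..., 'probability': ...}]}.
    """
    if response.get('status') != 'ok':
        raise RuntimeError(f'model returned status {response.get("status")!r}')
    pairs = [(p['caption'], p.get('probability', 0.0)) for p in response['predictions']]
    # Sort explicitly instead of relying on the order in which the API returns captions
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Hand-written sample response (illustrative only, not real model output)
sample = {'status': 'ok', 'predictions': [
    {'caption': 'a surfer riding a wave', 'probability': 0.002},
    {'caption': 'a man riding a wave on top of a surfboard', 'probability': 0.01},
]}
print(parse_captions(sample)[0][0])  # a man riding a wave on top of a surfboard
```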

 

Feeding the model the image below results in the following caption:

“a man riding a wave on top of a surfboard”

 

(image source: pexels.com)

Now that we have a caption, we can use the TextBlob and NLTK libraries for Python to remove stopwords (such as ‘and’, ‘the’, ‘it’, ‘a’, ‘on’) from the sentence. The function below also drops words of two letters or fewer. The remaining words are keywords that could serve as social media hashtags.

 

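# Load in the required Python libraries
# (TextBlob's tokenizer relies on NLTK data; run `python -m textblob.download_corpora` once if needed)
import nltk
from textblob import TextBlob
from nltk.corpus import stopwords
nltk.download('stopwords')

# Build the stopword set once instead of re-reading the list for every word
STOPWORDS = set(stopwords.words('english'))

def remove_stopwords(sentence):
    """Remove stopwords and words of two letters or fewer; return the remaining words."""
    blob = TextBlob(sentence)
    return [word for word in blob.words
            if word.lower() not in STOPWORDS and len(word) > 2]

tags = remove_stopwords(my_caption)
print(tags)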

For our caption, this results in the following keywords (potential hashtags):
['man', 'riding', 'wave', 'top', 'surfboard']
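To post the keywords, each one only needs a leading `#`. The following is a small sketch that also lowercases and de-duplicates them. The `to_hashtags` name is hypothetical.

```python
def to_hashtags(keywords):
    """Turn keywords into lowercase, de-duplicated hashtags, keeping first-seen order."""
    seen = set()
    hashtags = []
    for word in keywords:
        tag = '#' + word.lower()
        if tag not in seen:
            seen.add(tag)
            hashtags.append(tag)
    return hashtags


example_keywords = ['man', 'riding', 'wave', 'top', 'surfboard']
print(' '.join(to_hashtags(example_keywords)))  # #man #riding #wave #top #surfboard
```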

  2. Restyling an image

Next, we will apply an image style transfer to our input image. The code is very similar to the caption generation code. The difference is that the model returns an image instead of a JSON-formatted string. We therefore use the Pillow library and Python’s built-in io module to load the returned bytes as an image.

# Load in the required Python libraries
import requests
from PIL import Image
from io import BytesIO

# Specify the model endpoint URL: the API to which we send the input data
model_endpoint = 'http://max-fast-neural-style-transfer.max.us-south.containers.appdomain.cloud/' + 'model/predict'
# Choose the style as a parameter in the API URL (pick only one)
model_endpoint += '?model=mosaic'
# model_endpoint += '?model=candy'
# model_endpoint += '?model=rain_princess'
# model_endpoint += '?model=udnie'

# Upload the image (my_image, defined earlier) to the model
with open(my_image, 'rb') as file:
    file_form = {'image': (my_image, file, 'image/jpeg')}
    # Post the image to the REST API using the requests library
    response = requests.post(url=model_endpoint, files=file_form)
    response.raise_for_status()  # stop early on an HTTP error
    # The response body is the restyled image itself: load its raw bytes with Pillow
    output_image = Image.open(BytesIO(response.content))

# Show the output image
output_image.show()
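The code above selects a style by commenting lines in and out. Alternatively, the style can be passed to a small URL builder that rejects unknown names before any request is sent. This is a sketch: `style_endpoint`, `STYLES`, and `STYLE_BASE_URL` are hypothetical names. The style list is simply the four styles named in this post. The query string is built with the standard library’s `urllib.parse.urlencode`.

```python
from urllib.parse import urlencode

# The four style names used in this post
STYLES = ('mosaic', 'candy', 'rain_princess', 'udnie')
STYLE_BASE_URL = 'http://max-fast-neural-style-transfer.max.us-south.containers.appdomain.cloud/'


def style_endpoint(style, base_url=STYLE_BASE_URL):
    """Build the prediction URL for one style, rejecting names outside the four listed styles."""
    if style not in STYLES:
        raise ValueError(f'unknown style {style!r}; choose one of {STYLES}')
    return base_url.rstrip('/') + '/model/predict?' + urlencode({'model': style})


# Print the endpoint for every style, e.g. against a local container
for style in STYLES:
    print(style_endpoint(style, 'http://localhost:5000'))
```

In the upload code, `model_endpoint = style_endpoint('candy')` would replace the string concatenation, and looping over `STYLES` produces all four variants. The requests library can also build the query string itself through the `params` argument of `requests.post`.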

The model offers four styles, and the results for our image are shown below.

  • Mosaic

  • Candy


  • Rain Princess


  • Udnie


Which style do you prefer? Meet us at ODSC West 2019 during our session “Deploying Deep Learning Models as Microservices”, and let us know!


Authors:

Gabriela de Queiroz

Gabriela de Queiroz is a Sr. Engineering & Data Science Manager/Sr. Developer Advocate at IBM, where she leads and manages a team of data scientists and software engineers who contribute to open source and artificial intelligence projects. She works on various open source projects and is actively involved with several organizations to foster an inclusive community. She is the founder of R-Ladies, a worldwide organization for promoting diversity in the R community with more than 175 chapters in 45+ countries. She is now working to make AI more diverse and inclusive in her new organization, AI Inclusive. She has worked at several startups, where she built teams, developed statistical and machine learning models, and employed a variety of techniques to derive insights and drive data-centric decisions.

Website: https://k-roz.com/

Saishruthi Swaminathan

Saishruthi Swaminathan is a developer advocate and data scientist in the IBM CODAIT team, whose main focus is to democratize data and AI through open source technologies. Her passion is to dive deep into the ocean of data, extract insights, and use AI for social good. Previously, she worked as a software developer. She is on a mission to share the knowledge and experience she acquired in her learning process. She also leads an education initiative for rural children and organizes meetups focused on women’s empowerment. She has a master’s degree in electrical engineering, specializing in data science, and a bachelor’s degree in electronics and instrumentation. She can be found on LinkedIn (https://www.linkedin.com/in/saishruthi-swaminathan/) and Medium (https://medium.com/@saishruthi.tn).

Simon Plovyt

Simon is a Developer Advocate at the Center for Open-Source Data & AI Technologies. Previously, he worked as a machine learning consultant in Europe, and before that he was with UC San Francisco. Simon holds a master’s degree in bioinformatics engineering and a bachelor’s degree in molecular biology.

LinkedIn: https://www.linkedin.com/in/splovyt/

Twitter: https://twitter.com/plovyts

Medium: https://medium.com/@splovyt/
