Prerequisite: Create a Virtual Machine and set up an API in Google Cloud
In the previous article, we saw how to use the Facial Detection, Logo Detection, Label Detection, and Landmark Detection features of the Vision API. Now let's look at a few more features, such as Optical Character Recognition (OCR), handwritten text detection, and Image Properties Detection.
Text Detection (Optical Character Recognition):
It detects and extracts text from within an image.
import os
import io

from google.cloud import vision
from matplotlib import pyplot as plt
from matplotlib import patches as pch

# Point the client library at the service account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(os.curdir, 'credentials.json')

client = vision.ImageAnnotatorClient()

f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()

image = vision.types.Image(content=content)

# Run text detection (OCR) on the image
response = client.text_detection(image=image)
texts = response.text_annotations

a = plt.imread(f)
fig, ax = plt.subplots(1)
ax.imshow(a)

for text in texts:
    print(text.description)

    vertices = [(vertex.x, vertex.y) for vertex in text.bounding_poly.vertices]
    print('Vertices covering text: {}\n\n'.format(vertices))

    # Draw a rectangle around each detected piece of text
    rect = pch.Rectangle(vertices[0],
                         (vertices[1][0] - vertices[0][0]),
                         (vertices[2][1] - vertices[0][1]),
                         linewidth=1, edgecolor='r', facecolor='none')
    ax.add_patch(rect)

plt.show()
The above code extracts text from a given image and also prints the coordinates of the vertices of the rectangle that bounds each piece of text.
For example, when the following image is given as input:
Output:
MY MORNING ROUTINE How Successful People Start Every Day Inspired BENJAMIN SPALL and MICHAEL XANDER
Vertices covering text: [(38, 71), (348, 71), (348, 602), (38, 602)]

MY
Vertices covering text: [(46, 71), (108, 82), (100, 128), (38, 117)]

MORNING
Vertices covering text: [(129, 79), (348, 118), (338, 170), (120, 131)]

ROUTINE
Vertices covering text: [(96, 135), (292, 170), (283, 219), (87, 184)]

How
Vertices covering text: [(68, 200), (101, 205), (98, 221), (65, 216)]

Successful
Vertices covering text: [(104, 207), (196, 222), (193, 238), (101, 224)]

People
Vertices covering text: [(202, 222), (257, 231), (254, 251), (199, 242)]

Start
Vertices covering text: [(265, 232), (311, 239), (309, 255), (262, 248)]

Every
Vertices covering text: [(112, 238), (155, 246), (152, 265), (109, 258)]

Day
Vertices covering text: [(160, 246), (189, 251), (185, 271), (157, 266)]

Inspired
Vertices covering text: [(194, 251), (262, 263), (258, 283), (191, 271)]

BENJAMIN
Vertices covering text: [(57, 534), (118, 546), (115, 561), (54, 549)]

SPALL
Vertices covering text: [(122, 550), (160, 558), (157, 572), (119, 564)]

and
Vertices covering text: [(165, 560), (185, 564), (182, 577), (162, 573)]

MICHAEL
Vertices covering text: [(190, 564), (250, 576), (247, 590), (187, 578)]

XANDER
Vertices covering text: [(254, 575), (311, 587), (308, 602), (251, 591)]
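Note that the first element of text_annotations is the entire detected block of text, while the subsequent elements are the individual words. If only the full transcription is needed, a minimal sketch (reusing the texts list from the code above) looks like this:

# The first annotation holds the full detected text; the rest are per-word results
if texts:
    full_text = texts[0].description
    print(full_text)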
Document/handwritten text detection:
This feature performs Optical Character Recognition on dense documents, including handwritten text.
import os
import io

from google.cloud import vision
from matplotlib import pyplot as plt

# Point the client library at the service account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(os.curdir, 'credentials.json')

client = vision.ImageAnnotatorClient()

f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()

image = vision.types.Image(content=content)

# Run dense-document / handwritten text detection on the image
response = client.document_text_detection(image=image)

a = plt.imread(f)
plt.imshow(a)

txt = []
# Walk the response hierarchy: pages -> blocks -> paragraphs -> words -> symbols
for page in response.full_text_annotation.pages:
    for block in page.blocks:
        print('\nBlock confidence: {}%\n'.format(block.confidence * 100))

        for paragraph in block.paragraphs:
            for word in paragraph.words:
                word_text = ''.join([symbol.text for symbol in word.symbols])
                txt.append(word_text)

print(txt)
The above code identifies handwritten text in an image, extracts it, and prints it.
For example, when we give the following image as input:
Output:
Block confidence: 97.00000286102295%

['Geeks', 'for', 'Geeks', 'A', 'computer', 'science', 'portal', 'for', 'Geeks', '.']
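Instead of reassembling the words manually, the full transcription is also available as a single string. A minimal sketch, reusing the response object from the code above:

# full_text_annotation.text contains the entire detected text in one string
print(response.full_text_annotation.text)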
Image Properties Detection:
This feature detects general attributes of an image, such as its dominant colors.
import os
import io

from google.cloud import vision
from matplotlib import pyplot as plt

# Point the client library at the service account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(os.curdir, 'credentials.json')

client = vision.ImageAnnotatorClient()

f = 'image_filename.jpeg'
with io.open(f, 'rb') as image:
    content = image.read()

image = vision.types.Image(content=content)

# Detect general image attributes such as dominant colors
response = client.image_properties(image=image)
properties = response.image_properties_annotation

a = plt.imread(f)
plt.imshow(a)

for color in properties.dominant_colors.colors:
    print('fraction: {}'.format(color.pixel_fraction))
    print('\tr: {}'.format(color.color.red))
    print('\tg: {}'.format(color.color.green))
    print('\tb: {}'.format(color.color.blue))
The code takes an image as input and returns its dominant color properties, i.e., the red, green, and blue components of each dominant color along with the fraction of pixels it covers. For example, when the following image is given as input:
Output:
fraction: 0.036332178860902786
    r: 5.0
    g: 185.0
    b: 6.0
fraction: 0.03337658569216728
    r: 131.0
    g: 207.0
    b: 13.0
fraction: 0.029988465830683708
    r: 253.0
    g: 169.0
    b: 5.0
fraction: 0.0262399073690176
    r: 254.0
    g: 123.0
    b: 5.0
fraction: 0.03553921729326248
    r: 253.0
    g: 248.0
    b: 12.0
fraction: 0.02104959636926651
    r: 249.0
    g: 36.0
    b: 6.0
fraction: 0.024581892415881157
    r: 3.0
    g: 35.0
    b: 188.0
fraction: 0.03424163907766342
    r: 6.0
    g: 122.0
    b: 200.0
fraction: 0.027032872661948204
    r: 140.0
    g: 32.0
    b: 185.0
fraction: 0.029411764815449715
    r: 10.0
    g: 177.0
    b: 217.0
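Each entry in dominant_colors.colors also carries a score field. As a rough illustration (a sketch reusing the properties object from the code above), the highest-scoring dominant color can be converted to a hex string like this:

# Pick the dominant color with the highest score and format it as a hex string
top = max(properties.dominant_colors.colors, key=lambda c: c.score)
hex_color = '#{:02x}{:02x}{:02x}'.format(int(top.color.red),
                                         int(top.color.green),
                                         int(top.color.blue))
print('Top dominant color:', hex_color)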
Safe Search Properties Detection:
Detects explicit content such as adult content or violent content within an image. This feature uses five categories (“adult”, “spoof”, “medical”, “violence”, and “racy”) and returns the likelihood that each is present in a given image.
import os
import io

from google.cloud import vision
from matplotlib import pyplot as plt

# Point the client library at the service account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(os.curdir, 'credentials.json')

client = vision.ImageAnnotatorClient()

f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()

image = vision.types.Image(content=content)

a = plt.imread(f)
plt.imshow(a)

# Run Safe Search detection on the image
response = client.safe_search_detection(image=image)
safe = response.safe_search_annotation

# Likelihood values come back as enum indices; map them to readable names
likelihood_name = ('UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY',
                   'POSSIBLE', 'LIKELY', 'VERY_LIKELY')

print('Adult: {}'.format(likelihood_name[safe.adult]))
print('Medical: {}'.format(likelihood_name[safe.medical]))
print('Spoofed: {}'.format(likelihood_name[safe.spoof]))
print('Violence: {}'.format(likelihood_name[safe.violence]))
print('Racy: {}'.format(likelihood_name[safe.racy]))
Given an image, the code determines the likelihood that it contains explicit content such as adult, violent, or racy material.
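A common use of this result is to flag or reject an image automatically. A minimal sketch, reusing the safe annotation and the likelihood_name tuple from the code above (the categories and threshold chosen here are just an example):

# Treat LIKELY or VERY_LIKELY adult/violence/racy content as a reason to flag the image
flagged = any(likelihood_name[value] in ('LIKELY', 'VERY_LIKELY')
              for value in (safe.adult, safe.violence, safe.racy))
print('Image flagged as explicit:', flagged)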
Object Detection:
Detects and extracts multiple objects from an image. It localizes each object and returns its label, a confidence score, and the coordinates of its bounding polygon.
import os
import io

from google.cloud import vision
from matplotlib import pyplot as plt

# Point the client library at the service account key file
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(os.curdir, 'credentials.json')

client = vision.ImageAnnotatorClient()

f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()

image = vision.types.Image(content=content)

a = plt.imread(f)
plt.imshow(a)

# Localize objects in the image
response = client.object_localization(image=image)
objects = response.localized_object_annotations

print('Number of objects found: ', len(objects))
for object_ in objects:
    print('Object: ', object_.name)
    print('Confidence: ', object_.score)
For example, when we input the following image:
Output:
Number of objects found: 1
Object: Scissors
Confidence: 0.540185272693634
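Each localized object also carries a bounding polygon whose vertices are normalized to the [0, 1] range. As a rough sketch (extending the code above, with matplotlib.patches imported the same way as in the text detection example), the boxes can be drawn on the image like this:

from matplotlib import patches as pch

fig, ax = plt.subplots(1)
ax.imshow(a)
height, width = a.shape[0], a.shape[1]

for object_ in objects:
    # normalized_vertices lie in [0, 1]; scale them to pixel coordinates
    verts = [(v.x * width, v.y * height)
             for v in object_.bounding_poly.normalized_vertices]
    rect = pch.Rectangle(verts[0],
                         verts[1][0] - verts[0][0],
                         verts[2][1] - verts[0][1],
                         linewidth=1, edgecolor='r', facecolor='none')
    ax.add_patch(rect)

plt.show()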
For more information, visit the Cloud Vision API documentation: https://cloud.google.com/vision/docs