Wednesday, January 15, 2025
Google search engine
HomeLanguagesHow to use Vision API from Google Cloud | Set-2

How to use Vision API from Google Cloud | Set-2

Prerequisite: Create a Virtual Machine and set up an API in Google Cloud

In the previous article we have seen how to use Facial Detection, Logo Detection, Label Detection and Landmark Detection features of Vision using Vision API, now let’s see few more features like Optical Character Recognition, handwritten text detection, Image Properties Detection etc.

Text Detection (Optical Character Recognition):

It detects and extracts text from within an image.




import os
import io
from google.cloud import vision
from matplotlib import pyplot as plt
from matplotlib import patches as pch
  
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 
     os.path.join(os.curdir, 'credentials.json')
  
client = vision.ImageAnnotatorClient()
  
f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()
      
image = vision.types.Image(content = content)
response = client.text_detection(image = image)
texts = response.text_annotations
  
a = plt.imread(f)
fig, ax = plt.subplots(1)
ax.imshow(a)
  
for text in texts:
    print(text.description)
      
    vertices = ([(vertex.x, vertex.y) 
                 for vertex in text.bounding_poly.vertices]) 
    
    print('Vertices covering text: {}\n\n'.format(vertices))
    rect = pch.Rectangle(vertices[0], (vertices[1][0] - vertices[0][0]), 
                        (vertices[2][1] - vertices[0][1]), linewidth = 1
                                       edgecolor ='r', facecolor ='none'
    ax.add_patch(rect)
      
plt.show()


The above code extracts text from a given image and also prints the coordinates of the vertices of the rectangle containing the text.
For example, when the following image is given as input:

Output:

MY MORNING
ROUTINE
How Successful People Start
Every Day Inspired
BENJAMIN SPALL and MICHAEL XANDER

Vertices covering text: [(38, 71), (348, 71), (348, 602), (38, 602)]


MY
Vertices covering text: [(46, 71), (108, 82), (100, 128), (38, 117)]


MORNING
Vertices covering text: [(129, 79), (348, 118), (338, 170), (120, 131)]


ROUTINE
Vertices covering text: [(96, 135), (292, 170), (283, 219), (87, 184)]


How
Vertices covering text: [(68, 200), (101, 205), (98, 221), (65, 216)]


Successful
Vertices covering text: [(104, 207), (196, 222), (193, 238), (101, 224)]


People
Vertices covering text: [(202, 222), (257, 231), (254, 251), (199, 242)]


Start
Vertices covering text: [(265, 232), (311, 239), (309, 255), (262, 248)]


Every
Vertices covering text: [(112, 238), (155, 246), (152, 265), (109, 258)]


Day
Vertices covering text: [(160, 246), (189, 251), (185, 271), (157, 266)]


Inspired
Vertices covering text: [(194, 251), (262, 263), (258, 283), (191, 271)]


BENJAMIN
Vertices covering text: [(57, 534), (118, 546), (115, 561), (54, 549)]


SPALL
Vertices covering text: [(122, 550), (160, 558), (157, 572), (119, 564)]


and
Vertices covering text: [(165, 560), (185, 564), (182, 577), (162, 573)]


MICHAEL
Vertices covering text: [(190, 564), (250, 576), (247, 590), (187, 578)]


XANDER
Vertices covering text: [(254, 575), (311, 587), (308, 602), (251, 591)]

Document/handwritten text detection:

This feature also performs Optical Character Recognition on dense documents, including handwritings.




import os
import io
from google.cloud import vision
from matplotlib import pyplot as plt
  
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 
     os.path.join(os.curdir, 'credentials.json')
  
client = vision.ImageAnnotatorClient()
  
f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()
      
image = vision.types.Image(content = content)
response = client.document_text_detection(image = image)
  
a = plt.imread(f)
plt.imshow(a)
  
txt = []
for page in response.full_text_annotation.pages:
        for block in page.blocks:
            print('\nConfidence: {}%\n'.format(block.confidence * 100))
            for paragraph in block.paragraphs:
  
                for word in paragraph.words:
                    word_text = ''.join([symbol.text for symbol in word.symbols])
                    txt.append(word_text)
                      
print(txt)


The above code identifies and extracts handwritten texts from an image and outputs it.
For example, when we give the following image as input:

Output:

Block confidence: 97.00000286102295%

['Geeks', 'for', 'Geeks', 'A', 'computer', 'science', 'portal', 'for', 'Geeks', '.']

Image Properties Detection:

This feature detects the general attributes of an image, like dominant color.




import os
import io
from google.cloud import vision
from matplotlib import pyplot as plt
  
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] =
      os.path.join(os.curdir, 'credentials.json')
  
client = vision.ImageAnnotatorClient()
  
f = 'image_filename.jpeg'
with io.open(f, 'rb') as image:
    content = image.read()
      
image = vision.types.Image(content = content)
  
response = client.image_properties(image = image)
properties = response.image_properties_annotation
  
a = plt.imread(f)
plt.imshow(a)
  
for color in properties.dominant_colors.colors:
        print('fraction: {}'.format(color.pixel_fraction))
        print('\tr: {}'.format(color.color.red))
        print('\tg: {}'.format(color.color.green))
        print('\tb: {}'.format(color.color.blue))


The code takes an image as in input and returns its color properties i.e. amount of red, green and blue colors. For example, when the following image is given as input:

Output:

fraction: 0.036332178860902786
        r: 5.0
        g: 185.0
        b: 6.0
fraction: 0.03337658569216728
        r: 131.0
        g: 207.0
        b: 13.0
fraction: 0.029988465830683708
        r: 253.0
        g: 169.0
        b: 5.0
fraction: 0.0262399073690176
        r: 254.0
        g: 123.0
        b: 5.0
fraction: 0.03553921729326248
        r: 253.0
        g: 248.0
        b: 12.0
fraction: 0.02104959636926651
        r: 249.0
        g: 36.0
        b: 6.0
fraction: 0.024581892415881157
        r: 3.0
        g: 35.0
        b: 188.0
fraction: 0.03424163907766342
        r: 6.0
        g: 122.0
        b: 200.0
fraction: 0.027032872661948204
        r: 140.0
        g: 32.0
        b: 185.0
fraction: 0.029411764815449715
        r: 10.0
        g: 177.0
        b: 217.0

Safe Search Properties Detection:

Detects explicit content such as adult content or violent content within an image. This feature uses five categories (“adult”, “spoof”, “medical”, “violence”, and “racy”) and returns the likelihood that each is present in a given image.




import os
import io
from google.cloud import vision
from matplotlib import pyplot as plt
  
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 
    os.path.join(os.curdir, 'credentials.json')
  
client = vision.ImageAnnotatorClient()
  
f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()
      
image = vision.types.Image(content = content)
  
a = plt.imread(f)
plt.imshow(a)
  
response = client.safe_search_detection(image = image)
safe = response.safe_search_annotation
  
likelihood_name = ('UNKNOWN', 'VERY_UNLIKELY', 'UNLIKELY',
                   'POSSIBLE', 'LIKELY', 'VERY_LIKELY')
  
print('Adult: {}'.format(likelihood_name[safe.adult]))
print('Medical: {}'.format(likelihood_name[safe.medical]))
print('Spoofed: {}'.format(likelihood_name[safe.spoof]))
print('Violence: {}'.format(likelihood_name[safe.violence]))
print('Racy: {}'.format(likelihood_name[safe.racy]))


Given an image, the code will determine the probability of it being an image with graphics or adult content.

Object Detection:

Detects and extracts multiple objects from an image. It localizes multiple objects and returns their coordinates.




import os
import io
from google.cloud import vision
from matplotlib import pyplot as plt
  
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 
      os.path.join(os.curdir, 'credentials.json')
  
client = vision.ImageAnnotatorClient()
  
f = 'image_filename.jpg'
with io.open(f, 'rb') as image:
    content = image.read()
      
image = vision.types.Image(content = content)
  
a = plt.imread(f)
plt.imshow(a)
  
response = client.object_localization(image = image)
objects = response.localized_object_annotations
  
print('Number of objects found: ', len(objects))
for object_ in objects:
    print('Object: ', object_.name)
    print('Confidence: ', object_.score)


For example, when we input the following image:

Output:

Number of objects found:  1
Object:  Scissors
Confidence:  0.540185272693634

For more information, visit Cloud Vision API documentation here.

Dominic Rubhabha-Wardslaus
Dominic Rubhabha-Wardslaushttp://wardslaus.com
infosec,malicious & dos attacks generator, boot rom exploit philanthropist , wild hacker , game developer,
RELATED ARTICLES

Most Popular

Recent Comments