Saturday, November 16, 2024

Finding chairs the data scientist way! (Hint: using Deep Learning) – Part I

Introduction

I have been going through the deep learning literature for quite some time now. I have also participated in a few challenges to get my hands dirty.

But what I enjoy most is applying deep learning to a real-life problem, one that is part of my daily life. This is partly why I picked the problem of counting chairs: to finally solve a problem that had gone unsolved until now!

In this article, I will cover how I defined the problem and the steps I took to solve it. Consider it a raw, uncut version of my experience as I tried to solve the problem 🙂

 

Table of Contents

  • Why did I choose to solve the problem?
  • Simplifying the problem: Chair Recognition in clear image
  • Taking a step further: Detecting chair location
    • Overview of YOLO: a state-of-the-art Object Detection technique
    • Applying YOLO for Chair Detection
  • Challenges and Future Steps

 

Why did I choose to solve this problem?

Let me provide a bit of background about why I wanted to count chairs in a photograph.

At Analytics Vidhya, we usually have 10-15 people in the office. But in summer, interns crowd our habitat. So whenever we hold an all-team meeting in summer, we end up pulling chairs from all the other rooms.

Given my laziness, I thought: what if there were an algorithm that could tell us which room has an unoccupied chair? This would save us the hassle of going from one room to another in search of a chair.

This seemed a simple and mundane enough problem, but I saw it as a chance to try out my newly acquired prowess! Can deep learning be used to solve this problem? Honestly, I didn't know how much it could help, but there's no harm in trying, right?

 

Tasks in the problem

Now that you know the problem, let me explain my thought process in solving it. We can break the problem down into four tasks:

  • If we have a video feed of the room, is there a chair present in the room?
  • If there is a chair present, where can we find the chair in the room?
  • Is the chair occupied? If there are unoccupied chairs, how many of them are present in the room?
  • From which room should we bring an unoccupied chair?

I decided to work from comparatively easy problems toward more complex ones to reach my goal, which is why I divided the problem into these four specific tasks. In this article, I will cover how I attempted the first two tasks; in the subsequent article, I will show you my attempts at the next two.

 

Simplifying the problem: Chair Recognition in clear image

The first and simplest task is to find out whether there is a chair in a picture taken in a room. For now, I simplified the problem by skipping the video feed and taking pictures of the room manually.

For example, if I give you two images, can you tell which one shows a chair?


If you guessed correctly, it's the first one. So how did you know?

You have probably seen chairs so many times that it is easy for you to tell whether there is one in an image. In short, you have prior knowledge of what a chair looks like. Similarly, a trained artificial neural network can do exactly this for us.

By the way, we chose an artificial neural network over other algorithms because neural networks are currently the most powerful, state-of-the-art technique for image recognition problems.

So I took an off-the-shelf, pre-trained neural network and applied it to these images. The network had been trained on the ImageNet dataset, which spans a wide assortment of everyday object classes.

But there was an issue: the model could not correctly classify the object in the image. For example, here is its output for one of the chair images (the top three predictions, each given as an ImageNet WordNet ID, a label and a score):

[[('n03179701', 'desk', 0.56483036), ('n03337140', 'file', 0.14689149), ('n04550184', 'wardrobe', 0.03918023)]]

Instead, it predicted that the image contained a desk rather than a chair. This was disheartening, because a desk and a chair have very few similarities: a desk is much broader in shape than a chair.
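The raw output is a ranked list of labels, so answering our actual question ("is there a chair?") takes one more step. Below is a minimal sketch, assuming the predictions arrive in the format shown above: one list of (WordNet ID, label, score) tuples per image, which is what Keras' decode_predictions returns. The helper name contains_chair, the set of chair labels and the 0.2 score cut-off are illustrative assumptions, not the author's code.

```python
# Hypothetical helper: decide whether a chair was recognised, given
# decode_predictions-style output (one list of (wnid, label, score) per image).

# Assumed set of ImageNet chair classes; check it against the class index
# your model actually uses before relying on it.
CHAIR_LABELS = {"barber_chair", "folding_chair", "rocking_chair", "throne"}


def contains_chair(decoded, min_score=0.2):
    """Return one boolean per image: True if any of its top-k labels is a
    chair class scored at least min_score."""
    results = []
    for image_preds in decoded:
        found = any(
            label in CHAIR_LABELS and score >= min_score
            for _wnid, label, score in image_preds
        )
        results.append(found)
    return results


# The misclassified output quoted above
desk_output = [[("n03179701", "desk", 0.56483036),
                ("n03337140", "file", 0.14689149),
                ("n04550184", "wardrobe", 0.03918023)]]
print(contains_chair(desk_output))  # [False]
```

Checking the whole top-k list rather than only the first label makes the answer a little more forgiving when a chair is ranked second behind some other furniture.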

 

 

Solving the chair – desk problem

As mentioned in my previous article, whenever I encounter a problem while building neural networks, I follow a stepwise approach to tackle it. Here are the steps:

Step 1: Check the architecture
Step 2: Check the hyper-parameters of the neural network
Step 3: Check the complexity of the network
Step 4: Check the structure of the input data
Step 5: Check the distribution of the data

After going through these steps, I found that the input image I was feeding to the model was incorrect: I was not handling its aspect ratio properly. To fix this, I added some custom code mentioned in one of the Keras issues on GitHub.

After taking care of the issue, the model started working correctly and gave the right result:

[[('n02791124', 'barber_chair', 0.77817303), ('n03179701', 'desk', 0.090379775), ('n03337140', 'file', 0.033129346)]]
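The post does not reproduce the custom code from the Keras issue, so the following is not that code. It is a minimal sketch of one common way to respect the aspect ratio, often called letterboxing: scale the image so its longer side fits the model's square input, then pad the shorter side instead of stretching it. The function name letterbox_geometry is hypothetical, and the 224-pixel default is an assumption (the default input size of several ImageNet models in keras.applications).

```python
# Hypothetical sketch: compute how to fit an image into a square model input
# without distorting it; the leftover area is meant to be padded.

def letterbox_geometry(width, height, target=224):
    """Return (new_w, new_h, pad_left, pad_top) for fitting a width x height
    image into a target x target square while keeping its aspect ratio."""
    scale = min(target / width, target / height)  # the limiting side decides
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))
    pad_left = (target - new_w) // 2  # centre the resized image horizontally
    pad_top = (target - new_h) // 2   # ...and vertically
    return new_w, new_h, pad_left, pad_top


# A 4:3 photo becomes 224x168, with 28 px of padding above (and below)
print(letterbox_geometry(4032, 3024))  # (224, 168, 0, 28)
```

With Pillow, for example, these numbers would drive `img.resize((new_w, new_h))` followed by pasting the result onto a blank `Image.new("RGB", (224, 224))` canvas at `(pad_left, pad_top)`. Center-cropping is a common alternative; which one works better for room photos is worth testing.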

 

Taking a step further: Detecting chair location

Now that we have recognized that our image contains a chair, the next step is to identify where in the image the chair is. Along with the chair, we also have to detect any person in the image, because that is how we can tell whether the chair is occupied. Together, these two tasks (task 2 and task 3 respectively) will help us solve the bigger task of finding out whether a chair is occupied or not.
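The occupancy logic itself is left for the next part. Purely as an illustration of how chair and person detections could be combined, here is one simple, hypothetical heuristic: treat a chair as occupied if some person box covers a large share of the chair box. The (x1, y1, x2, y2) pixel box format, the function names and the 0.3 threshold are all assumptions made for this sketch.

```python
# Hypothetical heuristic: a chair counts as occupied when a detected person
# box covers more than `threshold` of its area. Boxes are (x1, y1, x2, y2).

def overlap_fraction(chair, person):
    """Fraction of the chair box's area that lies inside the person box."""
    ix1, iy1 = max(chair[0], person[0]), max(chair[1], person[1])
    ix2, iy2 = min(chair[2], person[2]), min(chair[3], person[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    chair_area = (chair[2] - chair[0]) * (chair[3] - chair[1])
    return inter / chair_area if chair_area > 0 else 0.0


def count_unoccupied(chairs, people, threshold=0.3):
    """Count chairs that no person box overlaps by more than `threshold`."""
    return sum(
        1 for c in chairs
        if all(overlap_fraction(c, p) <= threshold for p in people)
    )


chairs = [(10, 50, 60, 120), (200, 60, 250, 130)]
people = [(5, 0, 70, 110)]  # someone sitting on the first chair
print(count_unoccupied(chairs, people))  # 1
```

A 2D overlap test like this cannot tell a person sitting on a chair from one standing in front of it, so it is only a starting point.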

For this, as with the previous task, we will use a pre-trained network that gives acceptable results out of the box. For object detection, YOLO is currently one of the best models, and it performs well in real time. I have covered a bit about YOLO and how it works in this article. Let us look at how we can leverage it to solve our problem.

 

Applying YOLO for Chair Detection

To set up YOLO on your system, follow these simple steps:

Step 1:

git clone https://github.com/pjreddie/darknet
cd darknet
make

Step 2:

wget https://pjreddie.com/media/files/yolo.weights

Now, to run it on our problem, type the command below, giving the location of your own image as the last argument:

./darknet detect cfg/yolo.cfg yolo.weights ../../data/image.jpg
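Since the end goal is a chair count, it helps to turn darknet's console output into numbers. The sketch below assumes darknet prints each detection on its own line as "label: NN%" (the format shown in the darknet/YOLO examples); please check this against your own build. The helper names, the min_conf parameter and the sample string are hypothetical, and the sample is made up for illustration, not a real result.

```python
# Hypothetical helpers: run the detect command above and count detections per
# class from darknet's stdout, assuming lines of the form "label: NN%".
import re
import subprocess
from collections import Counter

DETECTION_LINE = re.compile(r"^(?P<label>[^:]+): (?P<conf>\d+)%$")


def parse_detections(stdout, min_conf=0):
    """Return a Counter mapping class label -> number of detection lines."""
    counts = Counter()
    for line in stdout.splitlines():
        m = DETECTION_LINE.match(line.strip())
        if m and int(m.group("conf")) >= min_conf:
            counts[m.group("label")] += 1
    return counts


def run_darknet(image_path, darknet_dir="."):
    """Run the same detect command as above and return its stdout."""
    cmd = ["./darknet", "detect", "cfg/yolo.cfg", "yolo.weights", image_path]
    result = subprocess.run(cmd, cwd=darknet_dir, capture_output=True,
                            text=True, check=True)
    return result.stdout


# Made-up output in the assumed format
sample = ("../../data/image.jpg: Predicted in 0.02 seconds.\n"
          "chair: 81%\nperson: 64%\nchair: 57%\n")
counts = parse_detections(sample)
print(counts["chair"], counts["person"])  # 2 1
```

Raising min_conf (say, to 60) is a cheap way to drop low-confidence detections before counting; the right value would need tuning on real photos of the rooms.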

After applying YOLO to our images, I saw that it gave pretty good results.


 

Challenges and Future Steps

Although we have made a decent start, there are still some issues that would hinder deploying the project as a full-fledged product. Here are a few of them:

Issue 1

The YOLO model still made some mistakes; it is not 100% accurate. For example, in one of our images, even a dustbin was categorized as a person!

 

 

Issue 2

What if, in an image, one chair obstructs the view of another? Would our algorithm be able to identify the hidden chair? This is a point to ponder.
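One concrete reason this is hard: detectors such as YOLO post-process their raw boxes with non-maximum suppression (NMS), which discards any box that overlaps a higher-scoring box of the same class too strongly. A chair partly hidden behind another chair can be dropped this way even if the network did "see" it. The sketch below is a generic greedy NMS, not darknet's own implementation; the (x1, y1, x2, y2, score) box format and the 0.45 IoU threshold are assumed values.

```python
# Generic greedy non-maximum suppression over boxes of a single class.
# Boxes are (x1, y1, x2, y2, score); the 0.45 threshold is an assumed value.

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, iou_threshold=0.45):
    """Keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    remaining = sorted(boxes, key=lambda b: b[4], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        remaining = [b for b in remaining if iou(best, b) <= iou_threshold]
    return kept


# Two chairs, the second mostly hidden behind the first
front = (100, 100, 200, 220, 0.90)
behind = (120, 110, 215, 225, 0.60)
print(len(nms([front, behind])))  # 1: the hidden chair is suppressed
```

Raising the threshold keeps more overlapping chairs but also lets duplicate boxes for the same chair through, so it trades one counting error for another.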

 

Along with these issues, there are some more practical implementation details, such as how long our algorithm takes to recommend a solution and what kind of hardware it needs to run. All of these must be considered before selling our algorithm as a product!

Also, as I said earlier, we have only considered the first two tasks and haven't yet touched the next two. Our next steps would be to identify the count of chairs in the room and then build an end-to-end product.

 

End Notes

In this article, I described my personal experience of solving a real-life problem. The article covers object recognition and detection in an image, the object specifically being a chair. For recognition, we used a simple pre-trained model to predict the object in an image. For detection, we used YOLO, a state-of-the-art real-time object detection technique.

I will continue with chair counting in the next part of the article, where we will cover how to calculate the number of chairs. I hope this will help you solve your own problem someday. Good luck!
