This article was published as a part of the Data Science Blogathon.
Overview of Deep Learning for Emojis
We now use a wide range of emojis and avatars to show our moods and feelings. They act as nonverbal cues in human communication and have become a crucial part of emotion recognition, online chat, brand sentiment, product reviews, and much more. Data science research on emoji-driven storytelling is growing steadily.
Detecting human emotions from images is a popular task, made possible by technical advances in computer vision and deep learning. In this deep learning project, we classify human facial expressions in order to select the matching emoji or avatar.
About the Dataset
The image dataset used in this project is FER2013 (Facial Expression Recognition 2013). It contains 48×48-pixel grayscale face images. The faces are roughly centered and occupy about the same amount of space in each image. The dataset contains the following facial expression categories:
- 0: angry
- 1: disgust
- 2: fear
- 3: happy
- 4: sad
- 5: surprise
- 6: neutral
Dataset: Facial Expression Recognition Dataset
Approach: First, we build a deep learning model that classifies facial expressions from images. Then we map the classified emotion to an avatar or emoji.
CNN to Recognize Facial Emotion
Now we will build a convolutional neural network (CNN) architecture and feed the FER2013 dataset to the model so that it can recognize emotions from images. We build the CNN model step by step using Keras layers. You can see each layer in the diagram below.
The network uses four Conv2D layers, one Flatten layer, and two Dense layers. The softmax function generates the model output.
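As a rough illustration of the softmax step, the sketch below turns a list of raw scores into probabilities that sum to 1. The function name `softmax` and the example scores are made up for illustration; in the model itself, Keras applies the same idea in the final `Dense(7, activation='softmax')` layer.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    # Subtract the maximum score first for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the seven emotion classes.
example_scores = [1.2, -0.5, 0.3, 3.1, 0.8, 0.1, -1.0]
probabilities = softmax(example_scores)
print([round(p, 3) for p in probabilities])
print("most likely class index:", probabilities.index(max(probabilities)))
```

The class with the largest raw score always ends up with the largest probability, which is why the prediction code later only needs the index of the maximum.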
Prerequisites: Download the FER2013 dataset from the provided link and extract it into a folder named data with separate train and test directories.
Write the Python code below in your Jupyter notebook and save it as train.py:
Import the required libraries
import os  # used below to count the images per class
import numpy as np
import cv2
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.optimizers import Adam
from keras.layers import MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator
Initialize the training and validation generators:
train_dir = 'data/train'
val_dir = 'data/test'
train_datagen = ImageDataGenerator(rescale=1./255)
val_datagen = ImageDataGenerator(rescale=1./255)
# training generator for the CNN
train_generator = train_datagen.flow_from_directory(
    train_dir,
    target_size=(48, 48),
    batch_size=64,
    color_mode="grayscale",
    class_mode='categorical')
# validation generator for the CNN
validation_generator = val_datagen.flow_from_directory(
    val_dir,
    target_size=(48, 48),
    batch_size=64,
    color_mode="grayscale",
    class_mode='categorical')
Output
Found 28709 images belonging to 7 classes.
Found 7178 images belonging to 7 classes.
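Note that `flow_from_directory` infers class indices from the subdirectory names in alphanumeric order, so the indices the model learns differ from the FER2013 label codes listed earlier (for example, index 4 is neutral rather than sad). The sketch below reproduces that ordering in plain Python. The folder names are assumed to match the class folders of the extracted dataset; on a real run, `train_generator.class_indices` reports the actual mapping.

```python
# Folder names under data/train, assumed to match the dataset's class folders.
class_folders = ["happy", "sad", "angry", "neutral", "fear", "surprise", "disgust"]

# Keras assigns indices by sorting the folder names alphanumerically.
class_indices = {name: index for index, name in enumerate(sorted(class_folders))}
print(class_indices)

# Invert the mapping to look up a class name from a predicted index.
index_to_name = {index: name for name, index in class_indices.items()}
print(index_to_name[4])
```

This is the ordering that the emotion dictionaries in the prediction code rely on.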
To display the number of training images per class:
for i in sorted(os.listdir(train_dir)):
    print(str(len(os.listdir(os.path.join(train_dir, i)))) + " " + i + " images")
Output
3995 angry images
436 disgust images
4097 fear images
7215 happy images
4965 neutral images
4830 sad images
3171 surprise images
To display the number of test images per class:
for i in sorted(os.listdir(val_dir)):
    print(str(len(os.listdir(os.path.join(val_dir, i)))) + " " + i + " images")
Output
958 angry images
111 disgust images
1024 fear images
1774 happy images
1233 neutral images
1247 sad images
831 surprise images
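As a quick sanity check, the per-class counts can be added up and compared with the totals reported by the generators. The sketch below simply hard-codes the numbers printed above and also shows each class's share of the training set; the variable names are illustrative.

```python
train_counts = {"angry": 3995, "disgust": 436, "fear": 4097, "happy": 7215,
                "neutral": 4965, "sad": 4830, "surprise": 3171}
test_counts = {"angry": 958, "disgust": 111, "fear": 1024, "happy": 1774,
               "neutral": 1233, "sad": 1247, "surprise": 831}

train_total = sum(train_counts.values())
test_total = sum(test_counts.values())
print(train_total, test_total)

# Share of each class in the training set, as a percentage.
for name, count in train_counts.items():
    print(f"{name:>8}: {100 * count / train_total:5.1f}%")
```

The totals come to 28709 and 7178, matching the generator output. The shares also show how unevenly the classes are represented, with happy the largest class and disgust by far the smallest.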
Build the convolution network architecture:
from keras.utils import plot_model  # needs pydot and graphviz installed

emotion_model = Sequential()
emotion_model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48, 48, 1)))  # output: (48-3+0)/1+1 = 46
emotion_model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))  # output: (46-3+0)/1+1 = 44
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))  # halves the input: 22 x 22 x 64
emotion_model.add(Dropout(0.25))  # randomly drops 25% of the activations during training
emotion_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))  # (22-3+0)/1+1 = 20
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))  # 10
emotion_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))  # (10-3+0)/1+1 = 8
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))  # 4
emotion_model.add(Dropout(0.25))  # shape unchanged
emotion_model.add(Flatten())  # flattens the feature maps for the dense layers: 4*4*128 = 2048
emotion_model.add(Dense(1024, activation='relu'))  # hidden layer with 1024 neurons
emotion_model.add(Dropout(0.5))
emotion_model.add(Dense(7, activation='softmax'))  # output layer, one neuron per emotion class
plot_model(emotion_model, to_file='model_plot.png', show_shapes=True, show_layer_names=True)  # save the layer diagram as model_plot.png
emotion_model.summary()
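The shape comments in the model code follow the usual output-size rule for a convolution, (input - kernel + 2 * padding) / stride + 1, with the size halved at each 2x2 max-pooling layer. A small sketch that traces the spatial size through the network, assuming the Keras defaults of no padding and stride 1 for `Conv2D`; the helper names `conv_out` and `pool_out` are invented for this example:

```python
def conv_out(size, kernel=3, padding=0, stride=1):
    """Spatial output size of a convolution layer."""
    return (size - kernel + 2 * padding) // stride + 1

def pool_out(size, pool=2):
    """Spatial output size of max pooling with stride equal to the pool size."""
    return size // pool

size = 48
size = conv_out(size)   # Conv2D(32)
size = conv_out(size)   # Conv2D(64)
size = pool_out(size)   # MaxPooling2D
size = conv_out(size)   # Conv2D(128)
size = pool_out(size)   # MaxPooling2D
size = conv_out(size)   # Conv2D(128)
size = pool_out(size)   # MaxPooling2D

# The last convolution has 128 filters, so Flatten sees size * size * 128 values.
flattened = size * size * 128
print(size, flattened)
```

The trace ends at a 4 x 4 feature map and 2048 flattened values, which should agree with the shapes printed by `emotion_model.summary()`.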
Output
Compile and train the model
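emotion_model.compile(
    loss='categorical_crossentropy',
    optimizer=Adam(lr=0.0001, decay=1e-6),  # newer Keras releases name the first argument learning_rate
    metrics=['accuracy'])
# train on the training generator and validate on the validation generator after each epoch
# (fit_generator is the older Keras API; recent releases accept generators directly in fit)
emotion_model_info = emotion_model.fit_generator(
    train_generator,
    steps_per_epoch=28709 // 64,
    epochs=50,
    validation_data=validation_generator,
    validation_steps=7178 // 64)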
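The `steps_per_epoch` and `validation_steps` values are the number of full batches of 64 that fit into each split. A quick sketch of that arithmetic, using the image counts reported by the generators (the variable names are illustrative):

```python
batch_size = 64
train_images = 28709
val_images = 7178

steps_per_epoch = train_images // batch_size
validation_steps = val_images // batch_size

# Images left over after the last full batch in each split.
train_leftover = train_images - steps_per_epoch * batch_size
val_leftover = val_images - validation_steps * batch_size

print(steps_per_epoch, validation_steps)
print(train_leftover, val_leftover)
```

This gives 448 training steps and 112 validation steps. Because of the floor division, the final partial batch (37 training and 10 validation images) is not counted in the step totals.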
Output
Save the model weights:
emotion_model.save_weights('model.h5')  # save the trained weights only, not the full model
To detect the bounding boxes of faces in the webcam feed and predict their emotions, we use the OpenCV Haar cascade XML file:
cv2.ocl.setUseOpenCL(False)
# emotion dictionary; indices follow the alphabetical order of the class folders
em_dict = {0: "Angry", 1: "Disgusted", 2: "Fearful", 3: "Happy", 4: "Neutral", 5: "Sad", 6: "Surprised"}
# load the Haar cascade once; cv2.data.haarcascades is the cascade folder bundled with the opencv-python package
bounding_box = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    gray_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detect all faces in the frame and handle each one separately
    n_faces = bounding_box.detectMultiScale(gray_frame, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in n_faces:
        cv2.rectangle(frame, (x, y-50), (x+w, y+h+10), (255, 0, 0), 2)
        roi_frame = gray_frame[y:y + h, x:x + w]
        # resize to 48x48, add batch and channel axes, and scale to [0, 1] as in training
        crop_img = np.expand_dims(np.expand_dims(cv2.resize(roi_frame, (48, 48)), -1), 0) / 255.0
        emotion_prediction = emotion_model.predict(crop_img)
        maxindex = int(np.argmax(emotion_prediction))
        cv2.putText(frame, em_dict[maxindex], (x+20, y-60), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
    cv2.imshow('Video', cv2.resize(frame, (1200, 860), interpolation=cv2.INTER_CUBIC))
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
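The last step of the loop, turning the model's output into a label, can be sketched without a camera. Here `fake_prediction` is a made-up probability vector standing in for one row of `emotion_model.predict(...)`, and the pure-Python `argmax` helper mirrors what `np.argmax` does on a one-dimensional array.

```python
em_dict = {0: "Angry", 1: "Disgusted", 2: "Fearful", 3: "Happy",
           4: "Neutral", 5: "Sad", 6: "Surprised"}

def argmax(values):
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical softmax output for a single face crop.
fake_prediction = [0.02, 0.01, 0.05, 0.70, 0.12, 0.07, 0.03]

maxindex = argmax(fake_prediction)
print(maxindex, em_dict[maxindex])
```

The label text drawn with `cv2.putText` is just this dictionary lookup applied to each detected face.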
Code for GUI and mapping with emojis
First, create a folder named emojis and save in it cartoon-style images for each of the seven emotions present in the dataset.
Create a Jupyter notebook named gui.py and run the file.
- Import the libraries
import tkinter as tk
from tkinter import *
import cv2
from PIL import Image
from PIL import ImageTk
import os
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import Dropout
from keras.layers import Flatten
from keras.layers import Conv2D
from keras.optimizers import Adam
from keras.layers import MaxPooling2D
from keras.preprocessing.image import ImageDataGenerator
- Model creation: We add the same Keras layers as in train.py to rebuild the deep learning model shown in the image below, then load the saved weights.
emotion_model = Sequential()  # convolutional layers extract the features
emotion_model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(48, 48, 1)))
emotion_model.add(Conv2D(64, kernel_size=(3, 3), activation='relu'))
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))
emotion_model.add(Dropout(0.25))
emotion_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))
emotion_model.add(Conv2D(128, kernel_size=(3, 3), activation='relu'))
emotion_model.add(MaxPooling2D(pool_size=(2, 2)))
emotion_model.add(Dropout(0.25))
emotion_model.add(Flatten())
emotion_model.add(Dense(1024, activation='relu'))
emotion_model.add(Dropout(0.5))
emotion_model.add(Dense(7, activation='softmax'))
emotion_model.load_weights('model.h5')
cv2.ocl.setUseOpenCL(False)
- Mapping of facial emotion to avatar
# emotion dictionary contains the emotions present in the dataset
em_dict = {0: " Angry ", 1: "Disgusted", 2: " Fearful ", 3: " Happy ", 4: " Neutral ", 5: " Sad ", 6: "Surprised"}
# emoji dictionary maps every emotion in the dataset to an image
emoji_dist = {0: "./emojis/angry.png", 1: "./emojis/disgusted.png", 2: "./emojis/fearful.png", 3: "./emojis/happy.png", 4: "./emojis/neutral.png", 5: "./emojis/sad.png", 6: "./emojis/surpriced.png"}
last_frame1 = np.zeros((480, 640, 3), dtype=np.uint8)
show_text = [0]
# open the camera and load the face detector once, not on every frame
cap1 = cv2.VideoCapture(0)
# cv2.data.haarcascades is the cascade folder bundled with the opencv-python package
bound_box = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def show_vid():
    # grab one frame, classify the faces in it, and show it in the left-hand label
    global last_frame1
    if not cap1.isOpened():  # the camera could not be opened
        print("cant open the camera1")
        return
    flag1, frame1 = cap1.read()
    if not flag1:  # no frame was returned, e.g. the webcam is disabled
        print("Major error!")
        return
    frame1 = cv2.resize(frame1, (600, 500))  # resize the image frame
    gray_frame = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)  # convert the frame to grayscale
    n_faces = bound_box.detectMultiScale(gray_frame, scaleFactor=1.3, minNeighbors=5)
    for (x, y, w, h) in n_faces:  # for each face found in the frame
        cv2.rectangle(frame1, (x, y-50), (x+w, y+h+10), (255, 0, 0), 2)
        roi_frame = gray_frame[y:y + h, x:x + w]
        # crop the face, resize it to 48x48, and scale to [0, 1] as in training
        crop_img = np.expand_dims(np.expand_dims(cv2.resize(roi_frame, (48, 48)), -1), 0) / 255.0
        prediction = emotion_model.predict(crop_img)  # predict the emotion from the cropped image
        maxindex = int(np.argmax(prediction))
        cv2.putText(frame1, em_dict[maxindex], (x+20, y-60), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2, cv2.LINE_AA)
        show_text[0] = maxindex  # remember the index of the latest emotion
    last_frame1 = frame1.copy()
    pic = cv2.cvtColor(last_frame1, cv2.COLOR_BGR2RGB)  # Pillow expects RGB
    img = Image.fromarray(pic)
    imgtk = ImageTk.PhotoImage(image=img)
    lmain.imgtk = imgtk  # keep a reference so the image is not garbage-collected
    lmain.configure(image=imgtk)
    lmain.after(10, show_vid)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        exit()

def show_vid2():
    frame2 = cv2.imread(emoji_dist[show_text[0]])  # load the emoji for the current emotion
    pic2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2RGB)
    img2 = Image.fromarray(pic2)
    imgtk2 = ImageTk.PhotoImage(image=img2)
    lmain2.imgtk2 = imgtk2
    lmain3.configure(text=em_dict[show_text[0]], font=('arial', 45, 'bold'))  # show the emotion name
    lmain2.configure(image=imgtk2)
    lmain2.after(10, show_vid2)

if __name__ == '__main__':
    root = tk.Tk()
    img = ImageTk.PhotoImage(Image.open("logo.png"))
    heading = Label(root, image=img, bg='black')
    heading.pack()
    heading2 = Label(root, text="Photo to Emoji", pady=20, font=('arial', 45, 'bold'), bg='black', fg='#CDCDCD')  # title label
    heading2.pack()
    lmain = tk.Label(master=root, padx=50, bd=10)
    lmain2 = tk.Label(master=root, bd=10)
    lmain3 = tk.Label(master=root, bd=10, fg="#CDCDCD", bg='black')
    lmain.pack(side=LEFT)
    lmain.place(x=50, y=250)
    lmain3.pack()
    lmain3.place(x=960, y=250)
    lmain2.pack(side=RIGHT)
    lmain2.place(x=900, y=350)
    root.title("Photo To Emoji")
    root.geometry("1400x900+100+10")
    root['bg'] = 'black'
    exitbutton = Button(root, text='Quit', fg="red", command=root.destroy, font=('arial', 25, 'bold'))
    exitbutton.pack(side=BOTTOM)
    show_vid()  # start the video loop
    show_vid2()  # start the emoji loop
    root.mainloop()
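Because the label dictionary and the emoji-path dictionary are indexed by the same predicted class, a mismatch in their keys only surfaces at runtime as a `KeyError`. A small hypothetical helper, `check_mappings`, can catch that before the GUI starts. The file paths below are the ones assumed by this article's emojis folder, and one key is left out on purpose to show what the check reports.

```python
import os

def check_mappings(labels, emoji_paths, num_classes=7):
    """Return a list of problems found in the two class-index mappings."""
    problems = []
    expected = set(range(num_classes))
    for name, mapping in (("labels", labels), ("emoji_paths", emoji_paths)):
        missing = sorted(expected - set(mapping))
        if missing:
            problems.append(f"{name} is missing class indices {missing}")
    for index, path in sorted(emoji_paths.items()):
        if not os.path.exists(path):
            problems.append(f"image for class {index} not found: {path}")
    return problems

labels = {0: "Angry", 1: "Disgusted", 2: "Fearful", 3: "Happy",
          4: "Neutral", 5: "Sad", 6: "Surprised"}
# Key 1 is deliberately left out to show what the check reports.
broken_paths = {0: "./emojis/angry.png", 2: "./emojis/fearful.png",
                3: "./emojis/happy.png", 4: "./emojis/neutral.png",
                5: "./emojis/sad.png", 6: "./emojis/surpriced.png"}

for problem in check_mappings(labels, broken_paths):
    print(problem)
```

Run from the project folder, the same call also lists any emoji image that has not been saved yet.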
Output
Summary
This project is built with the Keras deep learning library. To recognize facial emotions, we built a convolutional neural network and trained it on the FER2013 dataset. Finally, we mapped each facial emotion to its corresponding emoji or avatar.
To detect the bounding boxes of faces in the webcam feed, we use OpenCV's Haar cascade XML file. We then pass the face crops from these boxes to the trained model for classification.