< Deep Learning > Use a CNN and an RNN to detect blinks in a video
Let’s do it!
If you use your face to pay with Alipay, you may find that it sometimes asks you to perform some facial movement to check whether you are a real person.
As you can see, facial motion detection is already used in many circumstances, so you may want to figure out how to detect facial motion yourself.
In this post, I will take blinking as an example and demonstrate how to implement it.
So let's do it.
How to ‘make’ data if you do not have any
Of course we could use thousands of blinking video sequences to train a deep learning classifier, but collecting enough blink videos is difficult and inefficient. So can we simulate blinking data from single eye images?
As we know, a blink sequence consists of three eye states: eyes open, eyes closed, and eyes open again.
We can model a blink as a sequence of open-eye and closed-eye frames. Even if you have just 100 open-eye images and 100 closed-eye images, you can still create millions of blinking sequences. Quite surprising!
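Here is a minimal sketch of how such sequences could be synthesized. It assumes you have two folders of single-eye crops (the paths and the sequence length are placeholders, not part of the original post): positive sequences follow the open → closed → open pattern, negative sequences keep the same state throughout.

```python
import os
import random

import cv2
import numpy as np

# Hypothetical directories of single-eye crops; adjust to your own data.
OPEN_DIR, CLOSED_DIR = "data/eyes_open", "data/eyes_closed"
SEQ_LEN = 10  # frames per synthetic sequence


def load_images(folder, size=(24, 24)):
    """Load and resize all grayscale eye crops from a folder."""
    images = []
    for name in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, name), cv2.IMREAD_GRAYSCALE)
        if img is not None:
            images.append(cv2.resize(img, size))
    return images


def make_blink_sequence(open_imgs, closed_imgs, seq_len=SEQ_LEN):
    """Simulate one blink: open -> closed -> open, with random phase lengths."""
    closed_len = random.randint(1, seq_len - 2)
    open_before = random.randint(1, seq_len - closed_len - 1)
    open_after = seq_len - closed_len - open_before
    frames = (
        [random.choice(open_imgs) for _ in range(open_before)]
        + [random.choice(closed_imgs) for _ in range(closed_len)]
        + [random.choice(open_imgs) for _ in range(open_after)]
    )
    return np.stack(frames)  # (seq_len, H, W)


def make_non_blink_sequence(open_imgs, closed_imgs, seq_len=SEQ_LEN):
    """Negative example: the eye stays open (or stays closed) the whole time."""
    pool = open_imgs if random.random() < 0.5 else closed_imgs
    return np.stack([random.choice(pool) for _ in range(seq_len)])


if __name__ == "__main__":
    open_imgs = load_images(OPEN_DIR)
    closed_imgs = load_images(CLOSED_DIR)
    # With 100 images per class and 10 frames per sequence, the number of
    # distinct combinations is astronomically large -- hence "millions".
    pos = make_blink_sequence(open_imgs, closed_imgs)
    neg = make_non_blink_sequence(open_imgs, closed_imgs)
    print(pos.shape, neg.shape)
```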
Build your blink detection system
It seems we are almost done. The last step is to train a blink detection model (maybe a 3D CNN) on the synthetic training sequences above. If you have read my former post, you may already understand how a GRU works in TensorFlow.
So we will start from there and demonstrate how to detect blinks using a GRU. Of course, you could also use a Conv-GRU directly on raw eye image sequences to get an end-to-end blink detection system.
However, if you want to keep the task simpler, first train a CNN binary classifier that decides whether the eye is closed or not. Then use that classifier as a feature extractor: extract a feature vector for the eye in each frame and stack the vectors into a sequence that becomes the input of your RNN, as sketched below.
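The following Keras sketch illustrates this two-stage setup. The eye-crop size, feature dimension and layer sizes are assumptions for illustration; the CNN is trained on single open/closed images, its penultimate layer is reused as the per-frame feature, and a small GRU consumes the resulting feature sequences.

```python
import numpy as np
from tensorflow.keras import layers, models

IMG_SHAPE = (24, 24, 1)   # assumed eye-crop size
SEQ_LEN = 10              # frames per sequence
FEAT_DIM = 64             # size of the per-frame eye feature


def build_eye_cnn():
    """Binary classifier: is the eye closed (1) or open (0)?"""
    inputs = layers.Input(IMG_SHAPE)
    x = layers.Conv2D(16, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(32, 3, activation="relu")(x)
    x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    feat = layers.Dense(FEAT_DIM, activation="relu", name="eye_feature")(x)
    out = layers.Dense(1, activation="sigmoid")(feat)
    model = models.Model(inputs, out)
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def build_blink_gru():
    """GRU that reads a sequence of eye features and predicts blink / no-blink."""
    model = models.Sequential([
        layers.Input((SEQ_LEN, FEAT_DIM)),
        layers.GRU(32),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


# 1) Train the CNN on single open/closed eye images (x_img, y_img not shown).
eye_cnn = build_eye_cnn()
# eye_cnn.fit(x_img, y_img, epochs=10)

# 2) Reuse its penultimate layer as a feature extractor.
feature_extractor = models.Model(
    eye_cnn.input, eye_cnn.get_layer("eye_feature").output
)

# 3) Turn each image sequence (SEQ_LEN, 24, 24) into a feature sequence.
def sequence_to_features(image_seq):
    imgs = image_seq.astype("float32")[..., np.newaxis] / 255.0
    return feature_extractor.predict(imgs, verbose=0)  # (SEQ_LEN, FEAT_DIM)

# 4) Train the GRU on the feature sequences (x_seq, y_seq not shown).
blink_gru = build_blink_gru()
# blink_gru.fit(x_seq, y_seq, epochs=10)
```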
To make the system more accurate
To make your blink system more accurate, it is very important to give the RNN input features more diversity. So first make sure you have trained a good eye feature extractor.
You can use facial landmarks (from any open-source library) to locate the landmarks of the face and extract the eye region from the face image. One thing worth mentioning: you can also use the landmark points themselves as part of the feature (combined with the feature extracted from your CNN classifier) to make the feature more robust; see the sketch below.
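One possible way to do this uses dlib's 68-point landmark predictor (the .dat model file must be downloaded separately; indices 36-41 and 42-47 are the eye points in that convention). The `cnn_feature_fn` argument below is a placeholder for whatever per-crop feature extractor you trained, e.g. the CNN above.

```python
import cv2
import dlib
import numpy as np

# dlib face detector and 68-point landmark predictor; the .dat file is not
# bundled with dlib and must be downloaded separately.
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

LEFT_EYE = list(range(36, 42))   # landmark indices in the 68-point convention
RIGHT_EYE = list(range(42, 48))


def eye_crop_and_points(gray, landmarks, idxs, size=(24, 24), margin=5):
    """Crop one eye region and return it together with its landmark points."""
    pts = np.array([(landmarks.part(i).x, landmarks.part(i).y) for i in idxs])
    x, y, w, h = cv2.boundingRect(pts)
    crop = gray[max(y - margin, 0): y + h + margin,
                max(x - margin, 0): x + w + margin]
    return cv2.resize(crop, size), pts


def extract_frame_feature(frame, cnn_feature_fn):
    """Per-frame feature: CNN eye feature concatenated with normalized landmarks."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None
    landmarks = predictor(gray, faces[0])
    crop, pts = eye_crop_and_points(gray, landmarks, LEFT_EYE)
    cnn_feat = cnn_feature_fn(crop)                       # your trained extractor
    pts_feat = (pts - pts.mean(axis=0)).flatten() / 24.0  # crude normalization
    return np.concatenate([cnn_feat, pts_feat])
```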
Next, you have to figure out how to simulate good feature sequences to train your RNN model. Create plenty of hard cases and do not neglect any blinking circumstance. One thing to remember: the more diverse your training sequences are, the more accurate your blink system will be. A sketch of such a generator follows this paragraph.
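Below is a sketch of what "diverse" might mean in practice: fast blinks, slow blinks, and hard negatives such as squinting. It assumes you have pools of per-frame feature vectors extracted from open, closed and half-open eye images (the "half-open" pool and the sequence kinds are illustrative choices, not something prescribed by the original post).

```python
import random
import numpy as np

SEQ_LEN = 10


def synth_sequence(open_feats, closed_feats, half_feats, kind):
    """Build one synthetic feature sequence of a given kind.

    open_feats / closed_feats / half_feats are lists of per-frame feature
    vectors extracted from open, closed and half-open eye images.
    """
    if kind == "fast_blink":          # eyes closed for a single frame
        t = random.randint(1, SEQ_LEN - 2)
        seq = [random.choice(open_feats) for _ in range(SEQ_LEN)]
        seq[t] = random.choice(closed_feats)
        label = 1
    elif kind == "slow_blink":        # eyes stay closed for several frames
        closed_len = random.randint(3, SEQ_LEN - 2)
        start = random.randint(1, SEQ_LEN - closed_len - 1)
        seq = [random.choice(open_feats) for _ in range(SEQ_LEN)]
        for t in range(start, start + closed_len):
            seq[t] = random.choice(closed_feats)
        label = 1
    elif kind == "squint":            # hard negative: half-open eyes, no real blink
        seq = [random.choice(half_feats) for _ in range(SEQ_LEN)]
        label = 0
    else:                             # plain negative: eyes stay open
        seq = [random.choice(open_feats) for _ in range(SEQ_LEN)]
        label = 0
    return np.stack(seq), label
```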
Is it over yet?
So now you have a simple eye blink detection system, but there are still other questions to solve.
The first question is how many frames should be used to detect a blink. The answer is quite simple: as many as you like. To support this, you can train an RNN on variable-length sequences (you still have to define a maximum sequence length), as sketched below.
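One common way to handle variable-length sequences in Keras is to pad them up to a chosen maximum length and let a Masking layer tell the GRU to ignore the padded timesteps. The maximum length and feature dimension below are assumptions.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_LEN = 30    # the maximum sequence length you decide to support
FEAT_DIM = 64   # per-frame feature size (must match your extractor)


def pad_batch(variable_length_seqs):
    """Zero-pad a list of (len_i, FEAT_DIM) arrays up to MAX_LEN timesteps."""
    return pad_sequences(variable_length_seqs, maxlen=MAX_LEN,
                         dtype="float32", padding="post", value=0.0)


# The Masking layer skips timesteps whose features are all equal to mask_value.
model = models.Sequential([
    layers.Input((MAX_LEN, FEAT_DIM)),
    layers.Masking(mask_value=0.0),
    layers.GRU(32),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```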
Another question is how to design the actual system logic when you apply your model to a real-time video stream. One possible design is sketched below; the rest you can work out yourself.
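As one possible design (not the only one), you can slide a fixed-size window of per-frame features over the live stream and add a short cooldown so one blink is not counted twice. The `extract_frame_feature` and `blink_model` arguments are placeholders for the extractor and RNN sketched earlier; the window size and thresholds are assumptions.

```python
from collections import deque

import cv2
import numpy as np

WINDOW = 10          # number of recent frames fed to the RNN
THRESHOLD = 0.5      # blink probability threshold
COOLDOWN = 5         # frames to wait after a detection to avoid double counting


def run_realtime(extract_frame_feature, blink_model, camera_index=0):
    """Slide a fixed-size window of per-frame features over a live video stream."""
    cap = cv2.VideoCapture(camera_index)
    buffer = deque(maxlen=WINDOW)
    cooldown, blink_count = 0, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        feat = extract_frame_feature(frame)   # None if no face is found
        if feat is not None:
            buffer.append(feat)
        if len(buffer) == WINDOW and cooldown == 0:
            prob = float(blink_model.predict(np.stack(buffer)[None], verbose=0)[0, 0])
            if prob > THRESHOLD:
                blink_count += 1
                cooldown = COOLDOWN            # suppress immediate re-detections
        cooldown = max(cooldown - 1, 0)
        cv2.putText(frame, f"blinks: {blink_count}", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("blink", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```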