School of Science and Technology 科技學院
Electronic and Computer Engineering 電子工程學系

Applying the smart city concept to home safety

Student Lai Hing Lun
Programme Bachelor of Science with Honours in Cyber and Computer Security
Supervisor Dr. Farah Yan
Year 2021/22


With the popularization of artificial intelligence in our daily lives, smart monitoring equipment has become common safety equipment in the family. Because most parents are busy with work, they often neglect or forget to take care of their children, which can lead to danger. Smart monitoring uses artificial intelligence and image processing technology to identify children's situations and watches them automatically 24 hours a day. When an accident occurs, an alert is issued immediately and the user is notified. This enables parents to concentrate on work and take care of their children at the same time, thereby preventing the children from being in danger.

Most home smart monitors on the market only provide object tracking and a basic detection system. These functions are too limited for the disabled, or for children whose safety needs to be secured. Those smart monitors also suffer from light-source effects and low recognition accuracy. In this project, my smart monitor solves the problems above.

Demonstration Video

Methodologies and Technologies used

For the hardware, we need a night vision camera to ensure that the recognition functions work normally at night. Secondly, we need a microcontroller or computer to connect the camera and the display, so that the night vision camera can transmit the captured image data to the computer and the analyzed results can be shown on the monitor. For the software, we use OpenCV to write the code for face detection, face recognition and the other functions.

For face recognition, I first use the Haar algorithm to find the human face in the image. Second, I use Local Binary Patterns Histograms (LBPH) to perform the calculation that lets the smart monitor clearly identify the target's face. Before we start face recognition, we have to create a dataset that has already been trained with LBPH.

For detecting moving people, we use object motion detection to detect and scan the face of the moving target. We also use eye blink detection to check whether people are awake or asleep.

For action recognition, we use MediaPipe. MediaPipe is an open-source computer vision framework released by Google. It helps us collect body landmarks, and after training the data with an LSTM model we can use those landmarks to recognize actions.

To implement object recognition, a large database is needed to compare the data and recognize the target. So I use an open-source object DNN model from the internet.

To implement speech recognition, we use pyttsx3, a Python text-to-speech tool, together with the Google speech recognizer for converting voice to text.

HAAR Algorithm

Figure 1. Haar characteristics 4 categories

Figure 2. Human Face Haar characteristics

Haar characteristics form an algorithm used in face detection to find facial features; it reflects the gray-level changes in certain areas of the image. It calculates the grayscale difference between different pixels in an area to determine whether it is a human face. Whether the light is bright or not, the pixel difference between different areas always follows the same pattern. The features separate into 4 categories, edge features, linear features, central features and diagonal features, shown in figure 1. A human face has fixed Haar characteristics, shown in figure 2; for example, the eyes are darker than the cheeks. So I can use these characteristics to find the human face and eyes with the detections.
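The grey-level differences behind these Haar features are evaluated efficiently through an integral image, which is how the Viola–Jones detector behind OpenCV's Haar cascades can test thousands of features quickly. A minimal sketch, where the tiny image and the rectangle layout are illustrative, mimicking the dark-eyes/bright-cheeks "edge" feature described above:

```python
import numpy as np

def integral_image(img):
    """Cumulative sum over rows then columns, so the sum of any
    rectangle can be read off with at most four lookups."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    """Sum of pixels in the h-by-w rectangle with top-left corner (top, left)."""
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

# Illustrative 4x4 image: dark upper half ("eyes"), bright lower half ("cheeks").
img = np.array([[10, 10, 10, 10],
                [10, 10, 10, 10],
                [200, 200, 200, 200],
                [200, 200, 200, 200]], dtype=np.int64)
ii = integral_image(img)

# Two-rectangle edge feature: upper-half sum minus lower-half sum.
# A strongly negative value matches the dark-over-bright eye pattern.
feature = rect_sum(ii, 0, 0, 2, 4) - rect_sum(ii, 2, 0, 2, 4)
print(feature)  # 80 - 1600 = -1520
```

Because the feature is a difference between region sums, uniformly brighter or darker lighting shifts both terms together, which is why the pattern stays regular under lighting changes.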

Face recognition

Before we start face recognition, we must finish face detection first. Face detection is achieved by continuously looping to obtain camera pictures for background processing. In computer graphics, a grayscale image ranges between black and white, with many levels of gray in between representing different values. So we need to convert the captured image to grayscale before passing it to the face detector. The face detector returns the detection data and draws a real-time box marking the position of the face in the original picture. After that, we must create a face database to save those data for face recognition. OpenCV provides 3 non-deep-learning methods for face recognition: LBPH, EigenFace and FisherFace. During face recognition, features are extracted from the human face and these methods determine the result and its reliability.
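In OpenCV the grayscale conversion is a single call, `cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)`; underneath it is a weighted sum of the colour channels. A small NumPy sketch of that step (the 1x2 test image is illustrative):

```python
import numpy as np

def to_grayscale(bgr):
    """Weighted sum of the B, G, R channels using the ITU-R BT.601
    luma weights (the coefficients cv2.cvtColor applies for BGR2GRAY)."""
    b, g, r = bgr[..., 0], bgr[..., 1], bgr[..., 2]
    return np.rint(0.114 * b + 0.587 * g + 0.299 * r).astype(np.uint8)

# A 1x2 BGR test image: a pure-blue pixel and a white pixel.
frame = np.array([[[255, 0, 0], [255, 255, 255]]], dtype=np.uint8)
print(to_grayscale(frame))  # [[ 29 255]]
```

Blue contributes the least to perceived brightness (weight 0.114), so a pure-blue pixel maps to a dark gray value, while white maps to 255 as expected.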

Local Binary Patterns Histograms (LBPH)

It is a face recognition algorithm that uses histograms of local binary patterns. It relies mostly on integer calculations and is more efficient than the other calculation methods. The original LBP operator is defined on a 3 by 3 block of pixels, with the center pixel as the threshold. If a surrounding pixel is greater than or equal to the threshold, its position is marked as 1; otherwise it is marked as 0 (OpenCV, n.d.). The LBP formulas are shown in figure 3, which means an LBP code can take 256 different values. The extended LBP operator uses a circle with a chosen radius to encode any number of neighboring pixels. LBP features are robust to lighting changes, so the distance of the light source will not cause a large difference in the values.
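The basic 3-by-3 LBP operator described above can be sketched directly (the sample patch is illustrative; the full LBPH recognizer then histograms these codes over cells of the face image):

```python
import numpy as np

def lbp_code(patch):
    """LBP code of a 3x3 patch: threshold the 8 neighbours against the
    centre pixel and read them off as an 8-bit number, clockwise from
    the top-left neighbour."""
    center = patch[1, 1]
    neighbours = [patch[0, 0], patch[0, 1], patch[0, 2], patch[1, 2],
                  patch[2, 2], patch[2, 1], patch[2, 0], patch[1, 0]]
    code = 0
    for bit, value in enumerate(neighbours):
        if value >= center:          # neighbour at least as bright -> 1
            code |= 1 << (7 - bit)
    return code

patch = np.array([[90, 40, 220],
                  [50, 100, 180],
                  [120, 130, 110]], dtype=int)
print(lbp_code(patch))  # bits 00111110 -> 62
```

Adding the same offset to every pixel leaves the code unchanged (all comparisons against the centre shift together), which is exactly the lighting robustness mentioned above.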

Object motion detection

When an object moves, the gray-level pixel values differ between adjacent frames. Object motion detection relies on the pixel differences between frames of the video and applies a threshold to them. If there are only static objects, the pixel differences between frames stay below the threshold and the result is zero. When new pixel differences larger than the threshold appear, the result is non-zero and we know an object is moving. Through the frame difference we can detect any moving object. After each detection, the reference frame used for the comparison is also refreshed.
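A minimal NumPy sketch of this frame-differencing idea (the frames and the threshold value are illustrative):

```python
import numpy as np

def motion_pixels(prev_gray, curr_gray, threshold=25):
    """Count pixels whose grey-level change between two frames exceeds
    the threshold; a non-zero count signals motion."""
    diff = np.abs(curr_gray.astype(int) - prev_gray.astype(int))
    return int(np.count_nonzero(diff > threshold))

static = np.full((4, 4), 100, dtype=np.uint8)   # unchanging background
moved = static.copy()
moved[1:3, 1:3] = 180                           # an object entered this region

print(motion_pixels(static, static))  # 0 -> only static objects
print(motion_pixels(static, moved))   # 4 -> motion detected
```

Casting to `int` before subtracting avoids unsigned-integer wrap-around when the current frame is darker than the previous one.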

Eye Blink Detection

Figure 3. eye blink detection formula

Eye blink detection uses the frequency of blinking to check whether people are awake or asleep. During face detection, a bounding box shows the position of the eyes. By marking six positions on each eye, eye blink detection can use a formula to calculate the distance between the two eyelids, called the EAR value (Soukupova, 2016). From the eye blink detection formula shown in figure 3, we know that when the eyes close, the EAR value decreases; conversely, the EAR value increases when the eye opens. With this calculation, the computer knows how to decide whether people are awake or asleep.
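The EAR calculation itself is short enough to sketch with the standard library; the landmark coordinates below are made up to illustrate an open and a nearly closed eye:

```python
from math import dist

def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|): the two vertical
    eyelid distances over the horizontal eye width (Soukupova, 2016)."""
    return (dist(p2, p6) + dist(p3, p5)) / (2.0 * dist(p1, p4))

# Illustrative six-landmark layouts: p1/p4 are the eye corners,
# p2/p3 the upper eyelid, p6/p5 the lower eyelid.
open_eye = [(0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (3, 0.1), (4, 0), (3, -0.1), (1, -0.1)]

print(eye_aspect_ratio(*open_eye))    # 0.5   -> eye open
print(eye_aspect_ratio(*closed_eye))  # ~0.05 -> eye closed
```

Only the vertical distances change during a blink, so the EAR drops sharply when the eyelids meet; comparing it against a fixed cutoff over consecutive frames gives the blink frequency.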

LSTM model

LSTM is a machine learning model that helps a machine predict the next action from a large amount of data. For example, the machine can record your dinner every day; with enough of those records, it can predict your dinner tomorrow.

Action detection

Action detection is used to detect the current movement and collect the key points. We have installed MediaPipe to detect the face mesh, hand and pose landmarks. It helps us extract the key point values that represent an action. Each action has its own key point values, so we can save those values as our action data to identify actions.

Action recognition

To do action recognition for the first time, I need to create my own dataset to collect the key point values for later training. The action data are collected from action detection and separated into different folders for different actions. Before we use the dataset, we must train it using a Long Short-Term Memory (LSTM) deep learning model. After building the LSTM model, we can use MediaPipe to detect our current action. The machine keeps collecting our actions, predicts our body movement and matches it to the closest action in the dataset. Finally, the machine detects our action and returns the action value to us.

Deep neural network model

It is a complex neural network that has at least two hidden layers. The model uses machine learning to predict the target object. To use it for our object recognition, we need to collect a huge dataset designed for learning the features that distinguish different objects. Each object's features need to be trained over many passes. For example, given one hundred images of cars, deep learning learns their features together with a similarity percentage for the prediction. Even when the percentage is low, it keeps all the wrong prediction data to build a more accurate model.

Object recognition

Object recognition is a computer vision technology that uses deep learning and machine learning to help machines identify objects the way humans do. To implement object recognition, I used deep neural network models downloaded from the internet that are already trained as an object database. The smart monitor keeps refreshing the video frames and compares the data inside each image with the DNN models. If a data value matches an object, it returns the object name and the similarity rate together with a box. When multiple objects appear, it can still locate and recognize them.

Speech recognition

For speech recognition, we need to import pyttsx3 and the Google speech recognizer and combine the two tools. We use the Google recognizer to transform the human voice into a text value. Then we write a script so that if the text value is similar to certain words, it returns a response. pyttsx3 can turn the returned text value back into a voice, which means the monitor can speak out the returned data.
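Capturing the microphone audio and speaking the reply are jobs for the speech recognition and pyttsx3 libraries; the "similar words" step in between can be sketched with the standard library's difflib. The command table below is hypothetical:

```python
import difflib

# Hypothetical command table: recognized phrase -> spoken reply.
COMMANDS = {
    "start monitor": "Monitoring started.",
    "stop monitor": "Monitoring stopped.",
    "call for help": "Alerting your family now.",
}

def respond(recognized_text, cutoff=0.6):
    """Match the recognized phrase against the known commands and return
    the reply for the closest one, or a fallback when nothing is close."""
    matches = difflib.get_close_matches(recognized_text.lower(),
                                        COMMANDS, n=1, cutoff=cutoff)
    return COMMANDS[matches[0]] if matches else "Sorry, I did not catch that."

print(respond("start monitors"))   # close enough to "start monitor"
print(respond("play some music"))  # no close command -> fallback reply
```

The fuzzy match tolerates the small recognition errors a speech-to-text engine produces; the reply string would then be handed to pyttsx3 to be spoken aloud.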


Experiment Architecture

Figure 4. Smart monitor architecture

Figure 5. Smart monitor work table

To experiment with the smart monitor, I designed a diagram and divided it into three parts to explain the experiment, as shown in figure 4. In the functions part, the smart monitor has four base functions in total and five advanced functions built on the base functions. In the dataset part, I build a gray-image dataset that is trained by LBPH for face recognition. Another one is the action dataset, trained by the LSTM model for action recognition. For object recognition, I use a publicly available DNN model. The last part dissects the different imported models used in the smart monitor.

According to figure 5, the smart monitor keeps running object recognition, action recognition and eye blink detection. Object and action recognition keep using MediaPipe and the TensorFlow model to analyze the captured video information. When action recognition detects the help action saved in the action dataset, it reports to the alarm system. When object recognition detects a dangerous thing such as a knife, it also reports to the alarm system. Face detection is only used after object recognition has detected a person. If the smart monitor finds a face in the image, it runs face recognition against the gray-image dataset to find who the person is. If the face recognition result is unknown, it reports to the alarm system. Speech recognition is imported into all of these functions.
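The routing to the alarm system described above can be summarised as a small dispatcher. This sketch is an illustrative simplification of the work table in figure 5; the object labels and the action name are assumptions:

```python
DANGEROUS_OBJECTS = {"knife", "scissors"}  # illustrative danger list

def needs_alarm(objects, action, face_result):
    """Return the reasons for triggering the alarm, following the
    work-table logic: a dangerous object, the trained help action,
    or an unrecognized face on a detected person."""
    reasons = []
    if DANGEROUS_OBJECTS & set(objects):
        reasons.append("dangerous object")
    if action == "help":
        reasons.append("help action")
    # Face recognition only runs when object recognition saw a person.
    if "person" in objects and face_result == "unknown":
        reasons.append("unknown person")
    return reasons

print(needs_alarm(["person", "knife"], "wave", "unknown"))
# ['dangerous object', 'unknown person']
print(needs_alarm(["person"], "help", "dad"))
# ['help action']
```

Keeping the rules in one function makes it easy to add further triggers (for example new dangerous objects) without touching the recognition code.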

Build Dataset

In this project, I have built two datasets, one for face recognition and one for action recognition.

Creating the image dataset for face recognition is divided into several steps:

  1. We must import the OpenCV module first so that we can write the following script.
  2. To get the images, we set the camera as our OpenCV video capture target, connect them together and test that the capture object can be created.
  3. Because different situations produce different captured images, we need to build several folders to divide these images.
  4. Before saving a captured image, we convert it to grayscale using OpenCV, because the recognizer works on gray levels only.
  5. Write a script to combine the above functions and test it.

Creating the action dataset for action recognition is divided into several steps:

  1. We need to import MediaPipe to get the face mesh, hand and pose landmarks.
  2. We create a folder to save each frame of those landmark data in separate files.
  3. For accuracy, we need to create more data for the same action. For example, each action needs thirty videos and each video needs thirty frames.
  4. Use the Long Short-Term Memory (LSTM) deep learning model to train the data.
  5. Write the script to detect the action by comparing against the dataset.
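The layout from steps 2 and 3 maps onto fixed-shape training arrays: 30 videos per action, 30 frames per video, one flattened key-point vector per frame. A NumPy sketch of the array shapes before the LSTM training step; the action names are illustrative, and the feature length 1662 assumes MediaPipe's pose, face and two-hand landmarks flattened together:

```python
import numpy as np

ACTIONS = ["hello", "thanks", "help"]   # illustrative action labels
VIDEOS_PER_ACTION = 30
FRAMES_PER_VIDEO = 30
FEATURES = 1662  # flattened pose, face and two-hand landmark values

# Stand-in for the saved landmark files: random key-point vectors.
rng = np.random.default_rng(0)
X = rng.random((len(ACTIONS) * VIDEOS_PER_ACTION, FRAMES_PER_VIDEO, FEATURES))

# One integer label per video: 0 = "hello", 1 = "thanks", 2 = "help".
y = np.repeat(np.arange(len(ACTIONS)), VIDEOS_PER_ACTION)

print(X.shape, y.shape)  # (90, 30, 1662) (90,)
```

The LSTM consumes one (frames, features) sequence per video, so getting these shapes right up front is what makes the later training step straightforward.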

Comparison of methods for face recognition

In this part, I will compare different face recognition methods and explain why I finally chose this method for my smart monitor.

Face recognizers have three common methods: EigenFaces, FisherFaces and LBPH.

Figure 6. EigenFaces characteristics

Figure 7. FisherFaces characteristics

  • EigenFaces distinguishes different faces by capturing the areas with the largest variation between faces; for example, there are obvious changes from the eyes to the nose, shown in figure 6. Compared with the other two schemes, it uses fewer resources but has the worst recognition rate and is also affected by light.
  • FisherFaces looks at everyone's training faces at the same time and finds the main characteristics that distinguish one person from the others, shown in figure 7. This method can effectively distinguish each person's characteristics. Its recognition rate is also the highest among the three, but the disadvantage is that it is easily affected by light and darkness, which greatly reduces its recognition rate.
  • LBPH features are robust to lighting changes, so the distance of the light source will not cause a large difference in the values. It solves the lighting problem of the other two methods.

In real life, we cannot guarantee perfect lighting conditions, so LBPH is the best method for my smart monitor.

Results and Discussion

Face Recognition

Figure 8. Face recognition demonstration 1

Figure 9. Face recognition demonstration 2

Face recognition detects an image that contains a human face, then uses LBPH, trained on the database I created, to pair it with the most similar human face in the database. Finally, if the similarity rate is higher than fifty percent, it shows the detected person, as in figure 8. On the contrary, if the similarity rate is lower than fifty percent, it shows unknown, as in figure 9. Because LBPH is robust to lighting changes, it solves the light-source effect, one of the problems of smart monitors on the market, and it also achieves higher recognition accuracy.

However, when I used the newest OpenCV version, some commands were different or removed, and some bugs appeared in the face recognition function. The newest version also carries copyright restrictions if you want to sell the script. For the face recognition dataset, each image must contain only one person's face for better accuracy. Fortunately, I created my own dataset successfully, and the accuracy would be much higher if my dataset were large enough. Because of resource limits, I could not build a huge dataset, which means the accuracy is not that high.


Figure 10. Face and eye detection demonstration 1

Figure 11. Face and eye detection demonstration 2

For face detection and eye blink detection, I use the OpenCV Haar cascades. They return the Haar characteristics and then mark the target, as shown in figure 10. Face and eye detection are shown with boxes of different colors: the face uses a red box and the eyes use green boxes. It can analyze human faces in a crowd successfully; in figure 11 you can see that only the non-human face has no red box.

Because recognition is based on detection, lower detection accuracy directly impacts the recognition. For example, if the detection only catches half of a human face, the recognition cannot match any human face, because no stored face contains only half a face. To increase the accuracy of detecting human faces correctly, I analyzed many methods. Finally, I selected the one with medium effectiveness that needs fewer resources for my project.

The detections are easy to create with the basic logic, and they combine successfully into the recognition function. However, the image resolution seriously affects the judgment. If multiple human faces are close to each other in a low-resolution image, as shown in figure 11, the eye detection fails. So we have to make sure the camera has a high enough resolution to prevent this problem.


In this report, I use machine learning to extend the functions of the smart monitor. The smart monitor combines different machine learning functions for diversified responses to various crises. Those functions can be assimilated into our life and bring significant benefits for home safety. Through further exploration and study of machine learning, I discovered different types of logic that can be used in my smart monitor. I analyzed the limitations and required resources of the different options to decide on the solutions I use, and I found more feasible functions that can be applied to the smart monitor.

Those functions are a bit complicated in practice, so to solve the problem, some functions had to be simplified. In addition, due to the lack of expanded databases and resources, the accuracy of the face, action and object recognition is slightly biased. I had to find the errors between the different scripts and fix them. Finally, the smart monitor combines those functions and runs successfully.

Finally, this smart monitor is totally different from the other smart monitors on the market. It provides a basic detection system and four recognition systems, which is more functionality than the market offers. By using machine learning and different algorithms, it solves the common problems of smart monitors and the usability problems of disabled people. Thereby, it can be a new, unique home safety device for disabled people.


