School of Science and Technology 科技學院
Computing Programmes 電腦學系

Visually Impaired Assistant With Neural Network And Image Recognition

Arthur On Chun LIU, Ki Shun LI, LI Qi YAN

Programme: Bachelor of Computing with Honours in Internet Technology
Supervisor: Prof. Vanessa Ng
Areas: Intelligent Applications
Year of Completion: 2020
Award: Champion, IEEE (HK) Computational Intelligence Chapter 17th FYP Competition

Objectives

The aim of this project is to implement a visually impaired assistant with neural networks and image recognition to help visually impaired people go outside efficiently and safely. The assistant acts like a companion accompanying the user: it uses image recognition to describe the nearby environment to the user in real time, letting them know what is going on in front of them, while the obstacle detector warns them of obstacles ahead so that they can prepare early.

To allow visually impaired people to reach their destinations, navigation is also a must. Several existing apps provide navigation, but this system adapts navigation specifically to assist visually impaired users on their way to a destination.

The main aim of the project is to implement an application that helps visually impaired people go outside. The project also defines several sub-objectives:

  • To design and develop a scene description system that describes the scene, letting users know what is happening in front of them.
  • To design and develop an obstacle prompt system that describes the obstacles ahead so that users can prepare early.
  • To design and develop a voice navigation system that guides the user to the destination without requiring the user to watch the phone screen.
  • To design and develop a mobile application (Android only) that handles the voice output of the three systems above and initiates the systems and speech recognition.


Techniques and technologies used

Image Caption

To provide scene descriptions, we use image captioning technology. We chose "Show, Attend and Tell" (Xu et al., 2015) as the captioning model for our project, since our target users are Hong Kong people.

Figure 1: Principle of Show, Attend and Tell, retrieved from Xu et al. (2015)

Figure 2: Predicted caption
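Show, Attend and Tell pairs a CNN encoder with an attention-based LSTM decoder: at each word step, the decoder softmax-weights the CNN's spatial features and conditions on the weighted sum (the context vector). The NumPy sketch below illustrates just that attention step; all dimensions, weights, and variable names are illustrative stand-ins, not the project's code.

```python
import numpy as np

# Soft attention as in Show, Attend and Tell (Xu et al., 2015):
# a_i are the L spatial CNN features, h is the previous decoder LSTM state.
L_regions, D, H = 196, 512, 256          # 14x14 feature map, feature dim, LSTM dim
a = np.random.randn(L_regions, D)        # annotation vectors from the CNN encoder
h = np.random.randn(H)                   # previous decoder hidden state
W_a, W_h = np.random.randn(D, 1), np.random.randn(H, 1)

# Simplified linear attention score (the paper uses a small MLP here).
e = a @ W_a + (h @ W_h)                  # attention scores e_i, shape (L, 1)
alpha = np.exp(e) / np.exp(e).sum()      # softmax -> attention weights, sum to 1
z = (alpha * a).sum(axis=0)              # context vector fed to the LSTM decoder
```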

Speech Recognition

For speech recognition, Graves and Jaitly (2014) proposed a model that combines recurrent neural networks with a connectionist temporal classification (CTC) layer, using the RNN-CTC network as the acoustic and pronunciation model.
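To illustrate the RNN-CTC idea concretely, the sketch below wires a bidirectional LSTM's per-frame log-probabilities into a CTC loss. It uses PyTorch's nn.CTCLoss and random data purely for illustration; the project's actual framework, model sizes, and label set may differ.

```python
import torch
import torch.nn as nn

# Toy RNN-CTC acoustic model: a BiLSTM over audio features with CTC loss on top.
# All shapes and hyperparameters here are illustrative, not the project's.
T, N, F, C = 100, 4, 40, 30          # time steps, batch, feature dim, classes (blank=0)
rnn = nn.LSTM(F, 128, bidirectional=True)
proj = nn.Linear(256, C)

x = torch.randn(T, N, F)             # a batch of acoustic feature sequences
h, _ = rnn(x)
log_probs = proj(h).log_softmax(-1)  # (T, N, C), as required by CTCLoss

targets = torch.randint(1, C, (N, 12))   # dummy label sequences (no blanks)
input_lengths = torch.full((N,), T)
target_lengths = torch.full((N,), 12)
loss = nn.CTCLoss(blank=0)(log_probs, targets, input_lengths, target_lengths)
```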

Object Detection

To provide obstacle prompts, we use object detection technology. A single-shot multi-box detector (SSD) with a MobileNetV2 backbone is lightweight and suitable for most mobile and embedded devices. It is used as the backend of our obstacle prompt system, since real-time performance matters most for obstacle prompts, when the user may not know an obstacle is there.

Figure 3: Example output of object detection, retrieved from TensorFlow
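A minimal sketch of running an SSD-MobileNetV2 model with the TensorFlow Lite interpreter, roughly the kind of inference the obstacle prompt backend performs. The model file name, confidence threshold, and the left/ahead/right split are assumptions, not the project's exact values.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter  # or tf.lite.Interpreter

interpreter = Interpreter(model_path="ssd_mobilenet_v2.tflite")  # assumed file name
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()

frame = np.zeros(inp["shape"], dtype=np.uint8)   # stand-in frame (quantized model assumed)
interpreter.set_tensor(inp["index"], frame)
interpreter.invoke()

# Output order follows the standard TFLite SSD postprocess: boxes, classes, scores, count.
boxes   = interpreter.get_tensor(out[0]["index"])[0]  # [ymin, xmin, ymax, xmax], 0..1
classes = interpreter.get_tensor(out[1]["index"])[0]
scores  = interpreter.get_tensor(out[2]["index"])[0]
for box, cls, score in zip(boxes, classes, scores):
    if score > 0.5:                                   # confidence threshold (assumed)
        xc = (box[1] + box[3]) / 2                    # horizontal centre of the box
        direction = "left" if xc < 0.4 else "right" if xc > 0.6 else "ahead"
        print(int(cls), direction, float(score))
```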

Depth Estimation

To provide more accurate obstacle prompts, we use depth estimation technology to sort out the obstacle most dangerous to the user. High Quality Monocular Depth Estimation via Transfer Learning (DenseDepth) is an encoder-decoder model that produces a high-resolution depth map from a single RGB image.

Figure 4: Original Color Image

Figure 5: Depth map
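Given a DenseDepth-style depth map and the detector's bounding boxes, the most dangerous obstacle can be chosen by comparing depth statistics inside each box. A minimal sketch, with made-up boxes and the assumption that smaller depth values mean nearer:

```python
import numpy as np

# depth: HxW map where smaller values mean nearer (convention varies by model).
depth = np.random.rand(480, 640).astype(np.float32)

# (label, x0, y0, x1, y1) boxes from the object detector; values are made up.
boxes = [("lamppost", 300, 100, 360, 400), ("person", 50, 120, 150, 420)]

def box_depth(depth, box):
    _, x0, y0, x1, y1 = box
    region = depth[y0:y1, x0:x1]
    return float(np.median(region))     # median is robust to depth outliers

# Prompt the nearest (smallest median depth) obstacle first.
nearest = min(boxes, key=lambda b: box_depth(depth, b))
print("most dangerous obstacle:", nearest[0])
```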

System architecture

The system consists of three components: an Android client, a Raspberry Pi, and a machine learning server. The responsibilities of the components are shown below.

Figure 6: Major Components

Figure 7: Components of the system

Android Client

  • Acts as a voice interface to the user
  • Returns navigation information upon user request
  • Uses an MQTT client to connect to the Raspberry Pi and prompts the user when danger information or a scene description is received (sketched below)
  • Opens a Wi-Fi hotspot for the Raspberry Pi to connect to
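The Android client itself is an Android app, but the MQTT flow it implements can be sketched with the Python paho-mqtt client. The broker address and topic names below are assumptions for illustration:

```python
import paho.mqtt.client as mqtt

BROKER = "192.168.43.2"   # Raspberry Pi's address on the phone's hotspot (example)

def on_message(client, userdata, msg):
    if msg.topic == "assistant/obstacle":
        print("danger:", msg.payload.decode())      # would trigger a TTS warning
    elif msg.topic == "assistant/caption":
        print("scene:", msg.payload.decode())       # would trigger a TTS description

client = mqtt.Client()    # paho-mqtt 1.x style constructor
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe([("assistant/obstacle", 0), ("assistant/caption", 0)])
client.loop_forever()
```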

Raspberry Pi

  • Connects to the Android client's Wi-Fi hotspot
  • Hosts an MQTT broker for transferring data to and from the Android client
  • Acts as a hands-free camera device that can easily be mounted on a walking stick or worn around the neck, since the user's hands may be occupied by the walking stick
  • Hosts a video streaming server that performs motion detection and serves images to clients via MJPEG
  • When motion is detected, the video streaming server on the Raspberry Pi broadcasts an MQTT message. Once notified, the object detection client and the image caption client each send an HTTP request to the machine learning server and forward the response data to the MQTT broker (see the sketch after this list)
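A minimal sketch of the Pi-side loop: frame-difference motion detection with OpenCV, followed by the MQTT broadcast that wakes the object detection and image caption clients. The thresholds and topic name are illustrative assumptions:

```python
import cv2
import paho.mqtt.client as mqtt

client = mqtt.Client()                     # paho-mqtt 1.x style constructor
client.connect("localhost", 1883)          # the broker runs on the Pi itself

cap = cv2.VideoCapture(0)
_, prev = cap.read()
prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(gray, prev)         # per-pixel change since the last frame
    prev = gray
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    if cv2.countNonZero(mask) > 5000:      # enough changed pixels = motion (assumed)
        client.publish("assistant/motion", "detected")
```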

Machine Learning Server

  • Serves requests through Flask, a Python web application framework (a minimal sketch follows this list)
  • Since the inference time of deep neural networks (DNNs) is relatively long, object detection and image caption requests are processed separately
  • The object detection server returns the object name and relative direction when the confidence for an object in the image is high and the object may be dangerous to the user
  • The image caption server returns descriptive sentences
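A minimal sketch of the machine learning server's HTTP interface with Flask. The route name, response fields, and the run_detector() helper are assumptions standing in for the real inference code:

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def run_detector(image_bytes):
    # ... decode the image, run SSD-MobileNetV2, filter by confidence ...
    # Hypothetical stand-in result; the real server returns model output.
    return [{"name": "lamppost", "direction": "left", "score": 0.91}]

@app.route("/detect", methods=["POST"])
def detect():
    results = run_detector(request.data)   # the client POSTs the JPEG frame
    return jsonify(results)

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```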

System Design and Implementation

Figure 8: Opening "Visually Impaired Assistant"

Figure 9: Voice recognition function

Figure 10: Voice recognition function

Figure 11: Voice navigation screen

Figure 12: Navigation screen

Figure 13: Scene description screen

The voice recognition interface is displayed when the user taps the voice recognition button on the main screen or the voice navigation item on the bottom navigation bar. The user can say where they want to go, and the application automatically opens the navigation screen.

The navigation screen shows the route to the destination, and the route information is read out by the text-to-speech service.

The scene description interface shows the image from the external camera and the scene description of the picture, which is read aloud by the text-to-speech service according to its priority, as sketched below.
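Reading messages "according to the priority" can be modeled with a priority queue in which obstacle warnings preempt scene descriptions. A minimal sketch; the priority levels and message texts are assumptions:

```python
import queue

# Lower number = higher priority; obstacle warnings beat scene descriptions.
OBSTACLE, SCENE = 0, 1
tts_queue = queue.PriorityQueue()

tts_queue.put((SCENE, "a man is walking down the street"))
tts_queue.put((OBSTACLE, "lamppost on the left"))

while not tts_queue.empty():
    _, text = tts_queue.get()
    print("speak:", text)    # the app would hand this to the TTS service
# The obstacle warning is spoken first, then the scene description.
```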

Figure 14: Setting screen

Figure 15: A user holds the devices with a walking stick

Figure 16: A user hangs the devices around the neck

On the settings screen, the user can turn 3D direction prompts on or off and set the related prompt intervals.

The user can hang the Raspberry Pi and the external camera (the devices) around the neck or mount them on a walking stick. After that, the user puts on the earpiece and microphone, and then opens the app on their mobile phone.

The user hears about the nearby environment through the earpiece; if there are obstacles ahead, such as lampposts, stairs, or signs, the application prompts the user by voice.

Evaluation

In our observations, testers using our equipment and application collided with objects and people less often than when not using them. The number of object collisions fell by about one-third, and the number of collisions with people dropped by more than one-half. The data indicate that using our devices and application can reduce the chance of some accidents and thus improve user safety. Users also reached their destinations more efficiently when using our solution with 3D direction prompts, as seen from the time taken to find the correct direction and the time taken to travel to the destination.

Table 1: User Evaluation Average Result

Conclusion and Future Development

This project successfully implemented a navigation system that uses the Mapbox API and converts its voice output into Cantonese so that the system works for Hong Kong users. On top of the navigation system, a 3D direction prompt system was implemented, allowing users to tell whether they are facing the correct direction through the left-right channels and louder or softer beep sounds. In addition, the voice recognition system filters the user's speech using Dialogflow and outputs the destination for the Mapbox API to search for a suitable route.
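The 3D direction prompt maps the difference between the user's heading and the route bearing to left/right channel volumes. A minimal sketch of one possible mapping; the exact curve used by the project is not specified here:

```python
def channel_volumes(heading_deg, bearing_deg):
    """Map the heading error to (left, right) beep volumes in 0..1."""
    diff = (bearing_deg - heading_deg + 180) % 360 - 180   # signed error, -180..180
    pan = max(-1.0, min(1.0, diff / 90))                   # full pan at 90 degrees off
    left = (1 - pan) / 2
    right = (1 + pan) / 2
    return left, right

# Facing 10 degrees left of the route: the right channel is slightly louder.
print(channel_volumes(0, 10))    # -> (0.444..., 0.555...)
```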

The environment description system realizes real-time environment description: the user learns what is in front of the camera through the voice of the text-to-speech service. Environment description lets the visually impaired obtain nearby information without having to rely on other people to tell them.

The obstacle prompt system has also been successfully implemented. It uses image recognition technology to recognize objects in the image and prompts the user by voice when an obstacle is found.

Future work

The system can be improved in both hardware and software.

Hardware

  • Integrate the Raspberry Pi and its batteries to reduce volume
  • Add a microphone and buttons to provide basic services without a phone
  • Reduce dependence on network connectivity by using a Jetson Nano or another single-board computer with a neural network acceleration chip
  • Add an infrared lens or an ultrasonic sensor to sense the distance of objects in front more accurately

Software

  • Improve the accuracy and the number of recognizable obstacles by changing the image recognition model
  • Improve the language and accuracy of scene descriptions by changing the image captioning model and training it on our own dataset
  • Combine the image caption model and the object detection model by sharing the same encoder/backbone