School of Science and Technology 科技學院
Computing Programmes 電腦學系

Subtitle Glasses for the Hearing Impaired and Tourists

Chui Tsz Ching, Lee Chak Yin, Wu Long Fung, Yip On Tik

Programme: Bachelor of Computing with Honours in Internet Technology; Bachelor of Science with Honours in Computer Science
Supervisor: Dr. Jeff Au Yeung Siu Kei
Areas: Intelligent Applications
Year of Completion: 2025

Objectives

Project Aim

The aim of the project has remained unchanged: our target audiences experience significant inconvenience from communication barriers and network limitations. To address these challenges, we aim to develop an application for smart glasses. The application will work offline, focusing on Cantonese speech-to-text conversion and text-to-text translation of spoken Cantonese into multiple languages, with everything processed in real time.

Project Objectives

The objectives of the project have remained unchanged. To achieve the project aim, we need to attain the following objectives:

  1. Research offline speech-to-text and text-to-text models available on the internet and evaluate their performance (accuracy and response time).
  2. Set up a server hosting the speech-to-text and text-to-text models in order to lower the development cost.
  3. Test and evaluate the system using the server.
  4. Investigate technical problems, such as filtering out the wearer's own speech so that only other speakers' speech is output, and the effective distance at which the microphone can pick up speech.
  5. Design the user interface and implement user-friendly features, such as choosing the output language (if applicable) and resizing the font.

Videos

Demonstration Video

Presentation Video

Methodologies and Technologies Used

Overview of the Solution 

Early Prototype (Cloud-Based) 

  • Leveraged Azure Speech-to-Text + Google Cloud Translation for real-time captions 
  • Proved real-time transcription and multilingual translation feasibility 
  • Faced issues: third-party latency, black-box model behavior, and escalating API costs 
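
A minimal sketch of how such a cloud pipeline can be wired together, assuming the official Python client libraries for both services; the credential environment variables and the single-utterance recognize_once flow are illustrative choices, not details taken from the report:

import os

import azure.cognitiveservices.speech as speechsdk  # pip install azure-cognitiveservices-speech
from google.cloud import translate_v2 as translate  # pip install google-cloud-translate

# Hypothetical environment variables; the report does not list credentials.
speech_config = speechsdk.SpeechConfig(
    subscription=os.environ["SPEECH_KEY"], region=os.environ["SPEECH_REGION"]
)
speech_config.speech_recognition_language = "zh-HK"  # Cantonese (Hong Kong)

# Uses the default microphone; blocks until one utterance is recognized.
recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
result = recognizer.recognize_once()

if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    client = translate.Client()  # reads GOOGLE_APPLICATION_CREDENTIALS
    translated = client.translate(result.text, target_language="en")
    print(result.text, "->", translated["translatedText"])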

Shift to Local Server Architecture 

  • Migrated STT workloads from cloud APIs to self-hosted, lightweight models 
  • Goals: lower & predictable latency, full control over models, better privacy, and cost savings 

Parallel Development Streams 

  • UI & Display Team: Built smart-glasses interface, customization for tourists vs. hearing-impaired users 
  • STT & Backend Team: Evaluated/deployed on-premise speech models; designed server with: 
    • Asynchronous task queues 
    • WebSocket for real-time audio/text exchange 
    • Secure data handling 
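
A condensed sketch of this server design, assuming Python with asyncio and the websockets package (version 10.1 or later, which accepts single-argument handlers); transcribe_chunk is a hypothetical stand-in for the self-hosted STT model, which is not named here:

import asyncio

import websockets  # pip install websockets


async def transcribe_chunk(audio: bytes) -> str:
    # Hypothetical stand-in for the self-hosted STT model; kept off the
    # event loop so inference cannot block other connections.
    return await asyncio.to_thread(lambda: "<transcript of %d bytes>" % len(audio))


async def handle_client(ws):
    queue = asyncio.Queue()  # per-connection asynchronous task queue

    async def worker():
        while True:
            chunk = await queue.get()
            text = await transcribe_chunk(chunk)
            await ws.send(text)  # push the transcript back in real time

    task = asyncio.create_task(worker())
    try:
        async for message in ws:  # binary frames carry raw audio chunks
            await queue.put(message)
    finally:
        task.cancel()  # connection closed; stop this client's worker


async def main():
    # Serve on the LAN; TLS/auth (the secure data handling above) omitted.
    async with websockets.serve(handle_client, "0.0.0.0", 8765):
        await asyncio.Future()  # run forever


if __name__ == "__main__":
    asyncio.run(main())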

Final System Design 

  • Client-Server Model: 
    • Client (Glasses): Captures audio, streams to server, renders transcriptions/translations instantly 
    • Server: Processes audio with local STT (and optional translation), returns text via WebSocket 
  • Outcome: A self-contained real-time captioning solution offering predictable performance, privacy, and cost efficiency.
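
For illustration, a desktop stand-in for the glasses client, assuming the websockets and sounddevice packages; the real client is the Android app running on the glasses, and the server address below is hypothetical:

import asyncio

import sounddevice as sd  # pip install sounddevice
import websockets

SERVER_URI = "ws://192.168.0.10:8765"  # hypothetical local-server address
SAMPLE_RATE = 16_000  # 16 kHz mono PCM is a common STT input format


async def stream_microphone(ws):
    loop = asyncio.get_running_loop()
    audio_q = asyncio.Queue()

    def on_audio(indata, frames, time_info, status):
        # Runs on the audio thread; hand each chunk to the event loop.
        loop.call_soon_threadsafe(audio_q.put_nowait, bytes(indata))

    with sd.RawInputStream(samplerate=SAMPLE_RATE, channels=1,
                           dtype="int16", callback=on_audio):
        while True:
            await ws.send(await audio_q.get())  # one binary frame per chunk


async def receive_captions(ws):
    async for text in ws:  # the server pushes transcripts/translations
        print("CAPTION:", text)


async def main():
    async with websockets.connect(SERVER_URI) as ws:
        await asyncio.gather(stream_microphone(ws), receive_captions(ws))


if __name__ == "__main__":
    asyncio.run(main())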

Hardware Components (Smart Glasses) 

INMO Air2 glasses 

  • Battery: 500 mAh
  • Processor and Memory: RAM: 2GB, ROM: 32GB, Chip: ZiGuang ZhanRui AI Chip (quad-core, 1.8GHz)
  • Connectivity Options: Wi-Fi with 2.4GHz/5GHz
  • Audio: 2 microphones
  • Display: Micro-OLED; FOV: 26°; sRGB 100%; Resolution: 640×400
  • Operating System: IMOS 2.0 (similar to Android)
  • Controller: A ring controller and touch pads built into the glasses

Figure 1: System Block Diagram 1  

Figure 2: System Block Diagram 2

Results (Prototype System Design)

Prototype Architecture & Cloud Service Selection 

Cloud-Based Proof-of-Concept 

  • Deployed both STT and T2T in the cloud to rapidly validate real-time captioning on smart glasses 
  • Audio streamed from the glasses' mic → cloud APIs → text returned to OLED display 

Comparative STT Testing 

  • Services: Azure Speech Service vs. Google Speech-to-Text 
  • Test conditions: scripted/unscripted dialog, indoor quiet, outdoor noise 
  • Findings: Azure outperformed Google on mixed Cantonese-English input and in noisy settings
  • Glasses' built-in microphone proved reliable across environments 
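
The report does not state which accuracy metric these comparisons used; a common choice for STT evaluation is the word error rate (character error rate for Cantonese), which can be computed from the Levenshtein edit distance, as in this sketch:

def error_rate(reference, hypothesis):
    # Levenshtein edit distance between token sequences, normalised by the
    # reference length. Pass lists of words for WER or of characters for CER.
    m, n = len(reference), len(hypothesis)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[m][n] / max(m, 1)


# Character error rate for a short Cantonese utterance (one missing character):
print(error_rate(list("今日天氣好好"), list("今日天氣好")))  # 1/6 ≈ 0.167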

Comparative Translation Testing 

  • Services: Google Translate API vs. Azure Translation 
  • Test material: informal Cantonese utterances 
  • Findings: Google produced more natural, idiomatic translations 

Final Prototype Stack 

  • Transcription: Azure Speech-to-Text API 
  • Translation: Google Translate API 
  • Client: Android mobile app streaming audio/text to/from the cloud 
  • Display: Instant captions on smart glasses' OLED 

Core Features Implemented 

Real-Time STT 

  • Glasses capture audio and stream it to Azure Speech-to-Text. 
  • Supports mixed Cantonese/English input with 1–2.5 s latency. 
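
The 1–2.5 s figure is an end-to-end number; one simple way to measure it is to timestamp each utterance when its audio is sent and compare against the arrival time of the matching caption. The probe below is a hypothetical sketch; the report does not describe its measurement method:

import time


class LatencyProbe:
    # Collects end-to-end delays between sending audio and receiving text.
    def __init__(self):
        self._sent_at = None
        self.samples = []

    def mark_sent(self):
        # Call when an utterance's audio has been streamed to the service.
        self._sent_at = time.monotonic()

    def mark_received(self):
        # Call when the matching caption text arrives back.
        if self._sent_at is not None:
            self.samples.append(time.monotonic() - self._sent_at)
            self._sent_at = None


probe = LatencyProbe()
probe.mark_sent()
time.sleep(0.1)  # stand-in for the round trip
probe.mark_received()
print(f"latency: {probe.samples[0]:.2f} s")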

On-the-Fly Translation 

  • Transcribed text is fed into Google Translate. 
  • Delivers immediate translations in the user's chosen language—ideal for tourists. 

Wearable Subtitle Display

  • OLED shows two lines: original transcript + translated text.
  • Minimal UI keeps captions legible without blocking vision.
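
One way such a two-line layout might be enforced is to trim each line to a character budget derived from the chosen font size. The helper below is a hypothetical sketch, not taken from the app's code, and the 24-character budget is an assumption:

def fit_line(text, max_chars):
    # Keep the tail: in live captioning the newest words matter most.
    return text if len(text) <= max_chars else "…" + text[-(max_chars - 1):]


def two_line_caption(original, translation, max_chars=24):
    # Line 1: original transcript; line 2: its translation.
    return fit_line(original, max_chars) + "\n" + fit_line(translation, max_chars)


print(two_line_caption("今日我哋去參觀香港都會大學", "Today we are visiting HKMU"))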

Basic Controls

  • Start/Stop toggle for speech capture.
  • Dual-line display for quick performance checks.

End-to-End Cloud Pipeline

  • Audio sent over HTTPS to the cloud back end.
  • Validated the seamless flow: capture → transcribe → translate → display.

Figure 3: Prototype System Architecture


Figure 4: Design Evolution Overview

Version 4 (Current Design): 

Our final prototype UI refinement focused on intuitive iconography and improved visual hierarchy: 

  • Text-based buttons were replaced with universally recognizable icons 
  • Clear visual instruction (“Tap to begin listening”) with a prominent play button 
  • Simplified main screen showing only essential information during conversations 
  • Enhanced contrast and optimized text size for readability in various lighting conditions 

The interface consisted of four basic buttons: 

  • Start Recording – Begins capturing and transmitting audio to the local server. 
  • Stop Recording – Ends the current audio session. 
  • Connect to Server – Establishes a connection to the server.
  • Disconnect – Closes the connection to the server.

Figure 5: Developer UI deployed on INMO Air2 for Server-based testing 

Implementation

Deployment Hardware 

  • Device: INMO Air2 Smart Glasses 
  • OS: IMOS 2.0 (Android-based) 
  • Processor: ZiGuang ZhanRui AI Chip (Quad-core, 1.8GHz) 
  • Memory: 2GB RAM, 32GB ROM 
  • Display: Micro-OLED, 640×400 resolution, sRGB 100%, FOV 26° 
  • Input: 2 built-in microphones 
  • Connectivity: Dual-band Wi-Fi (2.4GHz / 5GHz) 
  • Control: Touch pad or ring controller 

Prototype System Workflow 

  • Wi-Fi Connection: Connect to a Wi-Fi network or hotspot so the app can reach the cloud services.
  • Audio Input: User speaks directly to the glasses. 
  • Cloud Transcription: Audio is streamed to Azure Speech-to-Text API for real-time transcription. 
  • Translation (Optional): Transcribed text is sent to Google Translate API. 
  • Subtitle Display: Output is displayed on the glasses’ OLED screen with adjustable font size. 

Figure 6: Script 1 (simulating a tour guide at HKMU)

Figure 8: Script 2 (simulating food ordering in a restaurant)



