School of Science and Technology 科技學院
Computing Programmes 電腦學系

Scanaract: CV-Powered Cataract Detector

SHAH Pooja Zenit, FU Yuhao, KE Yankai, LI Xilin

Programme: Bachelor of Science with Honours in Computer Science
Supervisor: Dr. Dani Samer Assi
Areas: Intelligent Applications
Year of Completion: 2026

Objectives

Project Aim

Scanaract seeks to address the core pain points of traditional cataract screening, which relies
heavily on specialised equipment and physicians and is constrained by the uneven global
distribution of medical resources. The project combines an inexpensive app, cloud-based artificial
intelligence analysis built on a self-trained Computer Vision model (Vision Transformer), and custom
snow-goggle-inspired hardware into one integrated system. This AI-powered technology uses deep
learning to detect the presence of cataract, enabling timely intervention and improved patient
outcomes.

Project Objectives

The objectives of this project include:

  1. Design lightweight and comfortable eyewear, including the outer case and electronic components
  2. Implement the hardware prototype with a custom 3D-printed case
  3. Set up a local Wi-Fi network on the Raspberry Pi 4B so the app can transfer data through a Flask server using cURL-style HTTP requests
  4. Design and develop a simple, user-friendly application compatible with iOS, Android and web browsers using Flutter
  5. Select the hyperparameters for fine-tuning ResNet50 and compare it with other models
  6. Design, refine and evaluate the AI model for cataract detection using accuracy, precision and recall
  7. Set up the cloud service to deploy the model on Hugging Face
  8. Obtain the predicted label from image features through cURL requests to Hugging Face
  9. Set up database storage of user accounts and history using Supabase from Flutter
  10. Collect data from open-source platforms to build the training and testing datasets

Videos

Demonstration Video

Presentation Video

Methodologies and Technologies used

Overview of the Solution
  • Scanaract requires 4 components: Application, Hardware, Cloud and Database
  • The application connects smart glasses to the cloud, processes user requests, and serves as the user interface
  • Smart glasses (snow-goggle inspired) use a high-definition macro camera to capture pupil images
  • Images are stored in Supabase, then sent to HuggingFace cloud for cataract analysis
  • Results are returned to the app, which generates a human-readable report and stores them in the database
High-Level Architecture
  • System flowchart shows end-to-end architecture
  • Data flow among hardware acquisition, Supabase back-end, and Hugging Face cloud inference
Component Design
  • Three main components: hardware, application, and cloud AI
  • Each component has interlinked subcomponents
  • Two models explored for evaluation and optimal results
Hardware Design
  • Snow-goggle inspired outer case with macro camera (HXY-12 216 model)
  • Camera connected to Raspberry Pi 4B via USB for power and capture commands
  • LED strip inserted for controlled lighting, adjusted to minimize pupil reflections
  • Raspberry Pi powered by external source (e.g., power bank)
  • Pi's dual-band Wi-Fi acts as access point for mobile app
  • Flask Server hosted locally to accept HTTP requests for image capture
Mobile Application
  • Compatible with Android and iOS
  • Features: login/register, password reset, health trend view, camera connection
  • Allows photo capture, result analysis, history display, and report sharing with professionals
  • Supports account personalization and team information view
ResNet50 Model
  • ResNet50 chosen for medical image analysis with residual learning
  • 50-layer depth captures subtle cataract features efficiently
  • Pre-processing with Hough Circle Transform isolates pupil/iris regions
  • Transfer learning from ImageNet, fine-tuned on cataract dataset
  • Hyperparameters: Input 224×224, Batch size 16, Optimizer AdamW, Loss BCEWithLogitsLoss, LR 5e-5/5e-6, Epochs 10/20
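As a rough illustration of the loss named in the hyperparameters above, the sketch below computes binary cross-entropy directly on a raw logit in plain Python, using the overflow-safe form of the standard definition. The function names and the sample logits are hypothetical; the project's actual training code is not shown in the source.

```python
import math

def bce_with_logits(logit: float, target: float) -> float:
    """Binary cross-entropy computed directly on a raw logit.

    Equivalent to -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))], written in
    the overflow-safe form max(x, 0) - x*t + log(1 + exp(-|x|)).
    """
    return max(logit, 0.0) - logit * target + math.log1p(math.exp(-abs(logit)))

def mean_bce_with_logits(logits, targets):
    """Average the per-sample loss over a batch (a 'mean' reduction)."""
    losses = [bce_with_logits(x, t) for x, t in zip(logits, targets)]
    return sum(losses) / len(losses)

# Hypothetical batch: target 1.0 = cataract, 0.0 = normal.
batch_logits = [2.0, -1.5, 0.0, 3.0]
batch_targets = [1.0, 0.0, 1.0, 0.0]
print(round(mean_bce_with_logits(batch_logits, batch_targets), 4))
```

Working on logits rather than probabilities avoids taking the logarithm of a value that has already saturated to 0 or 1, which is the usual reason for pairing a single-output classifier with this loss.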
ViT Hybrid Model
  • Hybrid ViT with ResNet50 backbone for reflection tolerance
  • YOLOv11 used to refine dataset by cropping pupil/iris regions
  • Data augmentation with flips and rotations
  • Two-stage training: classifier head initialization (10 epochs), full fine-tuning with attention regularization (15 epochs)
  • Hyperparameters: Input 224×224, Batch size 16, Optimizer Adam, Loss CrossEntropy, LR 5e-5/5e-6, λ_atten=0.2, margin=0.05
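The cropping step mentioned above can be pictured with a small helper that turns a detector's bounding box into a padded square crop, which then resizes to 224×224 without distortion. This is only a sketch: the detector call itself is omitted, and the function name, the padding fraction, and the sample coordinates are assumptions rather than values from the project.

```python
def square_crop_box(x1, y1, x2, y2, img_w, img_h, pad=0.25):
    """Expand a detector box into a padded square clamped to the image.

    (x1, y1, x2, y2) is a hypothetical pixel-coordinate detection of the
    pupil/iris region; pad is the fraction of extra context on each side.
    Returns integer (left, top, right, bottom).
    """
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    side = max(x2 - x1, y2 - y1) * (1.0 + 2.0 * pad)
    side = min(side, img_w, img_h)            # the square cannot exceed the image
    half = side / 2.0
    # Shift the centre so the square stays fully inside the image.
    cx = min(max(cx, half), img_w - half)
    cy = min(max(cy, half), img_h - half)
    # Truncate (never round up) so the box cannot spill past the border.
    left, top, size = int(cx - half), int(cy - half), int(side)
    return left, top, left + size, top + size

# Hypothetical detection in a 640x480 frame.
print(square_crop_box(300, 200, 420, 300, img_w=640, img_h=480))
```

Keeping the crop square means the later resize to the model's input size does not stretch the iris, and the clamping handles detections close to the frame edge.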
Database Schema
  • Tables: auth.users, profiles, examinations
  • Profiles store patient info, linked 1:1 with auth.users
  • Examinations linked 1:* with profiles, store health index, image URL, date, notes
  • Storage buckets: Avatars (profile photos), Inspection Results
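To make the relationships above concrete, the following sketch recreates them in an in-memory SQLite database from Python. It is a stand-in only: Supabase runs PostgreSQL, the `users` table here merely imitates auth.users, and all column names are assumptions based on the fields listed above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces cascades only when enabled
conn.executescript("""
CREATE TABLE users (id TEXT PRIMARY KEY);  -- stands in for auth.users
CREATE TABLE profiles (
    -- Sharing the primary key with users makes the link 1:1.
    id TEXT PRIMARY KEY REFERENCES users(id) ON DELETE CASCADE,
    full_name TEXT
);
CREATE TABLE examinations (
    id INTEGER PRIMARY KEY,
    -- Many examinations may point at one profile (1:*).
    profile_id TEXT NOT NULL REFERENCES profiles(id) ON DELETE CASCADE,
    health_index REAL,
    image_url TEXT,
    exam_date TEXT,
    notes TEXT
);
""")

# Hypothetical sample rows.
conn.execute("INSERT INTO users VALUES ('u1')")
conn.execute("INSERT INTO profiles VALUES ('u1', 'Test Patient')")
conn.executemany(
    "INSERT INTO examinations (profile_id, health_index, image_url, exam_date, notes) "
    "VALUES (?, ?, ?, ?, ?)",
    [("u1", 0.91, "https://example.invalid/a.jpg", "2026-01-05", ""),
     ("u1", 0.88, "https://example.invalid/b.jpg", "2026-02-05", "")],
)

count_before = conn.execute("SELECT COUNT(*) FROM examinations").fetchone()[0]
conn.execute("DELETE FROM users WHERE id = 'u1'")   # cascades to profile and exams
count_after = conn.execute("SELECT COUNT(*) FROM examinations").fetchone()[0]
profiles_after = conn.execute("SELECT COUNT(*) FROM profiles").fetchone()[0]
print(count_before, count_after, profiles_after)
```

Deleting the user removes the profile and, through it, every examination, so no orphaned history rows remain.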

Figure 1. High-Level Architecture Diagram

Figure 2. Database Tables

Results (Prototype & Final System Design)

Prototype System Results
Hardware Implementation
  • Snow-goggle inspired outer case with foam paddings for comfort and adjustable secure fit
  • Macro camera connected to Raspberry Pi 4B via USB for power and capture commands
  • Raspberry Pi powered by external source (e.g., power bank)
  • Dual-band Wi-Fi configured via Scanaract_Wifi.conf to act as access point
  • Flask Server hosted locally, auto-run via camera-server.service
  • API endpoints:
    • /capture for a high-resolution image (higher latency)
    • /capture_fast for near-instant capture, served from a background thread that grabs frames at 30 fps
    • /video_feed for live preview stream (JPEG-encoded, multipart/x-mixed-replace)
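The background-thread capture and the multipart preview stream listed above can be sketched with the standard library alone. The code below is an illustration under assumptions, not the project's server: `LatestFrameBuffer`, `mjpeg_part`, and the fake camera are hypothetical names, and the Flask route wiring is left out.

```python
import threading
import time

class LatestFrameBuffer:
    """Keep only the newest frame produced by a background capture loop."""

    def __init__(self, read_frame, fps=30):
        self._read_frame = read_frame        # hypothetical callable returning JPEG bytes
        self._interval = 1.0 / fps
        self._lock = threading.Lock()
        self._frame = None
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        while not self._stop.is_set():
            frame = self._read_frame()
            with self._lock:
                self._frame = frame
            self._stop.wait(self._interval)  # pause between reads, wake early on stop()

    def latest(self):
        """Return the most recent frame immediately (None before the first read)."""
        with self._lock:
            return self._frame

def mjpeg_part(jpeg_bytes, boundary=b"frame"):
    """Wrap one JPEG as a single part of a multipart/x-mixed-replace body."""
    return (b"--" + boundary + b"\r\n"
            b"Content-Type: image/jpeg\r\n\r\n" + jpeg_bytes + b"\r\n")

# Fake camera standing in for the USB macro camera.
counter = {"n": 0}
def fake_camera():
    counter["n"] += 1
    return b"JPEG-%d" % counter["n"]

buf = LatestFrameBuffer(fake_camera, fps=30)
buf.start()
time.sleep(0.2)
snapshot = buf.latest()
buf.stop()
print(snapshot is not None, mjpeg_part(snapshot)[:7])
```

A fast-capture handler built this way only reads the buffered frame, so its response time does not depend on the camera's exposure and transfer time. A preview handler would emit one `mjpeg_part` per frame under a `multipart/x-mixed-replace; boundary=frame` content type.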
Application Implementation
  • Developed using Flutter framework with Material 3 design system
  • Single-page architecture with Navigator, MaterialPageRoute, and IndexedStack
  • UI components: Card, ListTile, ListView.builder, CircleAvatar, TextFormField, ElevatedButton, IconButton
  • Unified blue theme defined in ThemeData, global styles via InputDecorationTheme and ElevatedButtonThemeData
  • Rich animations: rising bar chart, pulse effects, staggered list entrances, animated eye icon in navigation bar
  • Backend integration with Raspberry Pi:
    • GET requests via CURL to trigger image capture
    • Uploads captured image to Supabase Storage
    • Public URL forwarded to HuggingFace AI model (Gradio-hosted) for analysis
    • Results displayed in app and saved in database

Figure 3. ResNet50 x System Flowchart

Figure 4. ResNet50 Results

Testing Results
Hardware Testing

The hardware goggles were tested for comfort, safety, clarity, stability, control, and punctuality. Comfort trials confirmed the weight was manageable, with adjustments made for different facial structures. Safety checks ensured attachments were secure and lighting was non-glare. Clarity tests showed the macro camera captured clear pupil images under supplemental lighting. Stability tests confirmed reliable connections between the Raspberry Pi and the app. Control tests verified that power consumption and heat were manageable with proper ventilation. Punctuality tests showed data transmission times under one second. Overall, the goggles broadcast a stable Wi-Fi signal, processed commands rapidly, and were safe and comfortable to use.

Application Testing

The Flutter-based app was tested across Android, iOS, and desktop browsers. System-specific permissions were verified to ensure functions worked without crashes or freezes. User interface testing confirmed smooth transitions, consistent fonts, correct image ratios, and responsive feedback. Backend testing optimized response speed. Communication protocols were validated for LAN and cloud interactions, with delay tolerances set for different devices. Cloud response times were tested along the full workflow path and remained within tolerance. Illegal input handling, gesture animations, reconnection mechanisms, program interruption resilience, API call monitoring, data consistency, encryption, and server crash recovery were all tested successfully. The app demonstrated secure data handling, seamless cross-platform performance, and reliable communication protocols.

Database Testing

Supabase PostgreSQL and Storage were tested for connection stability, data structure integrity, and multi-user requests. The database consistently added information quickly, maintained correct structures, and handled simultaneous user requests seamlessly. UUID keys, cascade deletes, and security policies ensured reliable and secure data management.

Computer Vision Model Testing

The CV model was tested for accuracy, precision, recall, and deployment. Fine-tuned models achieved expected accuracy, precision, and recall on unseen test photos. Deployment to HuggingFace was successful, with responses returned within 45 seconds. The model performed well with hardware conditions and correctly predicted cataracts in a real patient test (without goggles). Overall, the CV model demonstrated reliable diagnostic performance.
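For reference, the three metrics named above follow directly from confusion-matrix counts. The sketch below uses made-up counts purely to show the arithmetic; they are not the project's results, and the function name is hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision and recall from binary confusion-matrix counts.

    tp/fn count cataract images classified correctly/incorrectly;
    tn/fp count normal images classified correctly/incorrectly.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        # Of all images flagged as cataract, how many really were.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of all real cataract images, how many were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In screening, a false negative means a missed cataract, which is why recall is usually watched most closely alongside the other two figures.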

AI Accuracy Evaluation
ViT Hybrid Model

Initial evaluations showed ViT outperforming DenseNet121 on open-source datasets. However, hardware-acquired images revealed poor robustness in the baseline ViT. The team improved the ViT while also enhancing ResNet50. The hybrid ViT achieved higher accuracy than the baseline, with hyperparameters including input size 224×224, batch size 16, optimizer Adam, CrossEntropyLoss, learning rates 5e-5/5e-6, attention regularization λ=0.2, margin=0.05, and staged training (10 + 15 epochs).

ResNet50 Model

ResNet50 achieved high sensitivity and specificity, with AUC-ROC > 0.98 and strong confusion matrix results. Grad-CAM visualizations showed that the model focused on the crystalline lens, which supports clinical interpretability. Statistical metrics demonstrated excellent true positive rates and reliable generalization across validation sets.

Model Comparison

Final comparison showed ViT Hybrid (Accuracy 99.7%, Precision 99.67%, Recall 99.67%) versus ResNet50 (Accuracy 100%, Precision 100%, Recall 95.54%). Given the priority of recall in medicine, ResNet50 was selected as the final model for Scanaract.

Figure 5. User Interface Design

Figure 6. Grad-CAM Evaluation ResNet50

Figure 7. Models Comparison

Implementation

Deployment Hardware (Raspberry Pi & Goggles)
  • Device: Raspberry Pi 4B integrated into custom snow-goggle inspired housing.
  • Camera: High-definition macro camera connected via USB for power and capture commands.
  • Lighting: LED strip installed for controlled illumination and reduced glare.
  • Power: External power bank supplies stable energy to Raspberry Pi and camera.
  • Networking: Raspberry Pi configured as a wireless access point via hostapd and dnsmasq.
  • Server: Flask Server hosted locally on Raspberry Pi to accept HTTP requests from the mobile app.
  • Scalability: Supports multiple users through Supabase database and HuggingFace cloud integration.
Final System Workflow
  • Connection: Mobile app connects to Raspberry Pi's local Wi-Fi network.
  • Image Capture: App sends GET requests to Flask Server endpoints (/capture, /capture_fast, /video_feed).
  • Data Transfer: Captured images uploaded to Supabase Storage and linked to user profiles.
  • Cloud Processing: HuggingFace-hosted ResNet50/ViT hybrid models analyze images for cataract detection.
  • Database Integration: Results stored in Supabase examinations table with health index and confidence scores.
  • Output: Mobile app displays diagnostic results, health trends, and scan history to the user.
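The workflow above is essentially a four-step pipeline. The sketch below expresses it in Python with each step injected as a callable, so the order of operations is visible without real network calls. All names (`run_scan`, `ScanResult`, the stub functions) are hypothetical; the actual app is written in Flutter.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ScanResult:
    image_url: str
    label: str
    confidence: float

def run_scan(capture: Callable[[], bytes],
             upload: Callable[[bytes], str],
             classify: Callable[[str], Tuple[str, float]],
             save: Callable[[ScanResult], None]) -> ScanResult:
    """Capture an image, upload it, classify it by URL, then persist the result."""
    image = capture()                      # e.g. GET to the goggles' capture endpoint
    if not image:
        raise RuntimeError("camera returned no data")
    url = upload(image)                    # e.g. object storage returning a public URL
    label, confidence = classify(url)      # e.g. cloud inference on that URL
    result = ScanResult(url, label, confidence)
    save(result)                           # e.g. insert into the examinations table
    return result

# Stubs standing in for the real services.
history: List[ScanResult] = []
result = run_scan(
    capture=lambda: b"\xff\xd8fake-jpeg",
    upload=lambda data: "https://example.invalid/scan.jpg",
    classify=lambda url: ("normal", 0.97),
    save=history.append,
)
print(result)
```

Injecting the steps keeps the ordering logic testable on its own and makes it clear that classification depends on the upload having produced a URL first.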

Conclusion

Summary of the Project
  • Scanaract addresses the lack of accessible, affordable cataract screening tools in underserved regions.
  • WHO reports cataract affects 94 million people, with highest prevalence in low-income areas lacking ophthalmologists and slit-lamp equipment.
  • Developed a low-cost CV-powered ocular screening ecosystem using custom hardware, cross-platform mobile app, and cloud-based AI database.
  • Showed that functional screening equipment can be assembled for less than HK$2000 (about USD 250) with off-the-shelf parts.
  • Supports offline image capture in remote villages via Raspberry Pi's wireless access point, with cloud upload and AI inference when internet is available.
  • Project phases included component selection, hardware assembly, firmware, app development, database implementation, AI model optimization, and testing.
  • Provides a replicable blueprint for researchers and healthcare providers in resource-limited settings, embodying “prevention first” in public health.
Eyewear Design Implementation
  • Snow-goggle inspired outer case for aesthetics, comfort, and controlled lighting.
  • Lightweight design tested as comfortable for sessions up to 10 minutes.
  • Adjustable macro camera with LED strip illumination for consistent image quality.
  • Design improvements: nose ridge padding, anti-glare lighting, ventilation holes to reduce fogging and claustrophobia.
Hardware Prototype Implementation
  • Custom 3D-printed bracket designed in Blender and printed with black PLA filament.
  • Bracket secures Raspberry Pi on top and provides attachment points for camera and LED strip inside housing.
Local Wi-Fi & Server Setup
  • Raspberry Pi 4B configured as wireless access point via hostapd and dnsmasq.
  • Scanaract_Wifi.conf defines SSID and DHCP settings.
  • Flask Server implements API endpoints accessible via Flutter app.
  • Wi-Fi and Flask server configured to auto-launch on boot.
Application Development
  • Flutter app developed for Android, iOS, and web browsers.
  • Tested on Samsung Galaxy S7, iOS simulator, and Google Chrome.
  • UI features: consistent blue theme, rounded corners, responsive feedback on user actions.
AI Model Optimization
  • Hyperparameters selected for fine-tuning ResNet50 and compared with ViT hybrid model.
  • ResNet50 refined through iterative training and evaluation for accuracy, precision, and recall.
Cloud Deployment
  • Trained ResNet50 model deployed on Hugging Face using the Gradio SDK.
  • Application sends HTTP POST requests with image URLs to HuggingFace API.
  • Model returns predictions and confidence scores for cataract detection.
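A request of the kind described above might be assembled as follows. This is a sketch under stated assumptions: the endpoint URL is a placeholder, the JSON body follows the `{"data": [...]}` convention used by Gradio's HTTP interface, and the response layout (label first, confidence second) is assumed rather than taken from the project. Nothing is sent over the network here.

```python
import json
import urllib.request

def build_inference_request(endpoint_url: str, image_url: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST carrying the image URL."""
    body = json.dumps({"data": [image_url]}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def parse_prediction(response_body: bytes):
    """Extract (label, confidence) from an assumed {"data": [label, score]} reply."""
    payload = json.loads(response_body)
    label, confidence = payload["data"][0], payload["data"][1]
    return label, float(confidence)

# Placeholder endpoint and image URL; the real route depends on the deployment.
req = build_inference_request(
    "https://example.invalid/predict",
    "https://example.invalid/scan.jpg",
)
print(req.get_method(), req.data.decode("utf-8"))
print(parse_prediction(b'{"data": ["cataract", 0.93]}'))
```

Sending the storage URL instead of the image bytes keeps the request small, at the cost of requiring the image to be reachable from the inference service.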
Database Implementation
  • Supabase PostgreSQL configured with two custom tables and storage buckets.
  • Row Level Security (RLS) policies implemented for secure user account and history management.
Dataset Collection
  • Data collected from multiple open-source platforms.
  • Constructed training, validation, and test sets for model development and evaluation.
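One simple way to build the three sets is a seeded shuffle followed by slicing, sketched below. The 70/15/15 ratio, the seed, and the file names are assumptions for illustration; the source does not state the proportions used, and this version does not stratify by class.

```python
import random

def split_dataset(items, train=0.70, val=0.15, seed=42):
    """Shuffle reproducibly, then slice into train/validation/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)     # fixed seed gives the same split every run
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])       # remainder becomes the test set

# Hypothetical image file names.
files = [f"eye_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(files)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed keeps the test set stable between experiments, so metrics from different training runs stay comparable.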

Future Development

Hardware Enhancements
  • Redesign and model a new housing and internal support parts to improve durability.
  • Shorten wires for the camera and LED strip, and 3D-print an integrated housing with stronger component connections.
  • Customize a macro camera with autofocus to enhance image quality.
  • Add heat dissipation functionality to reduce overheating during prolonged use.
Application & Testing Expansion
  • Acquire an official Apple Developer certificate to enable complete iOS physical device testing.
  • Collaborate with medical institutions or government agencies to obtain permits for patient testing and construct a new cataract grading dataset.
AI Model Improvements
  • Address generalization concerns by diversifying datasets across ethnicities, genders, ages, and comorbidities.
System Optimization
  • Develop resource management scripts to balance performance and battery efficiency across devices.
  • Ensure smoother and more reliable operation by reducing overheating and improving energy efficiency.
  • Eliminate reliance on third-party APIs to reduce recurring costs and allow greater customization.