Development June 3, 2025

Singapore Sign Language Detection

Deep learning model for real-time Singapore Sign Language recognition using computer vision and LSTM neural networks

Task

Build a real-time sign language detection system for Singapore Sign Language using deep learning

  • Role

    Machine Learning Engineer

  • Tech

    Python, OpenCV, MediaPipe, TensorFlow, LSTM

  • Status

    Completed

The Challenge

Problem Statement

The Deaf community in Singapore faces significant communication barriers in daily interactions. While there are approximately 5,000 Deaf individuals in Singapore using Singapore Sign Language (SgSL), the general public has limited ability to communicate with them. Existing sign language recognition solutions are often:

  • Not optimized for SgSL (Singapore’s native sign language)
  • Limited in real-time performance on consumer hardware
  • Unable to handle masked faces (relevant post-COVID)
  • Requiring expensive specialized equipment

The goal was to build an accessible, real-time sign language detection system that works on standard webcams and handles both masked and unmasked scenarios.

Implementation

Technical Architecture

Computer Vision Pipeline:

  • MediaPipe Holistic: Real-time pose detection extracting keypoints from face, hands, and body landmarks
  • OpenCV: Video capture and preprocessing for frame extraction
  • Feature Engineering: Normalized landmark coordinates for position-invariant recognition

Deep Learning Model:

  • Architecture: LSTM (Long Short-Term Memory) neural network for sequential gesture recognition
  • Input: 30-frame sequences of landmark keypoints
  • Optimization: Keras Tuner for hyperparameter optimization
  • Performance: Optimized for both CPU and GPU inference
Methodology

Development Process

Data Collection: Captured diverse sign language gestures with both masked and unmasked subjects to ensure model robustness across real-world scenarios.

Feature Engineering: Extracted 468+ keypoints from face, pose, and hand landmarks using MediaPipe. Applied normalization to ensure position and scale invariance.

Model Training:

  • Designed LSTM architecture for temporal sequence recognition
  • Trained on 30-frame sequences to capture gesture dynamics
  • Used Keras Tuner to optimize layer sizes and dropout rates
  • Evaluated performance on CPU vs GPU environments
Key Obstacles

Challenges

Frame Rate Optimization: Implemented efficient processing pipeline to maintain real-time performance (30 FPS) while running pose estimation and inference simultaneously.

Mask Compatibility: Enhanced model robustness for masked scenarios by focusing on hand and upper body landmarks when facial features are obscured.

Resource Constraints: Optimized model for CPU performance to ensure accessibility on standard laptops without requiring GPU hardware.

Sequence Recognition: Developed custom algorithms for continuous gesture detection, distinguishing between intentional signs and transitional movements.

Reflection

Impact & Learning

This project provided valuable insights into real-world applications of computer vision in accessibility. Key learnings include:

  • Real-time ML constraints: Balancing model complexity with inference speed for live applications
  • Deep learning optimization: Techniques for model compression and hardware-specific tuning
  • Inclusive design: Importance of considering diverse user scenarios (masked/unmasked, different lighting)
  • Accessibility tech: Understanding how technology can bridge communication gaps for underserved communities

While the project was a personal learning exercise, it highlighted the potential for technology to create more inclusive environments for the Deaf community in Singapore.

Back