Development June 3, 2025

Singapore Sign Language Detection

Deep learning model for real-time Singapore Sign Language recognition using computer vision and LSTM neural networks

Task

Build a real-time sign language detection system for Singapore Sign Language using deep learning

Role

Machine Learning Engineer

Tech

Python, OpenCV, MediaPipe, TensorFlow, LSTM

Status

Completed

Problem Statement

The Deaf community in Singapore faces significant communication barriers in daily interactions. Although approximately 5,000 Deaf individuals in Singapore use Singapore Sign Language (SgSL), the general public has limited ability to communicate with them. Existing sign language recognition solutions often fall short in one or more ways:

Not optimized for SgSL (Singapore’s native sign language)
Limited real-time performance on consumer hardware
Unable to handle masked faces (relevant post-COVID)
Dependent on expensive specialized equipment

The goal was to build an accessible, real-time sign language detection system that works on standard webcams and handles both masked and unmasked scenarios.

Technical Architecture

Computer Vision Pipeline:

MediaPipe Holistic: Real-time landmark detection that extracts keypoints from the face, hands, and body pose
OpenCV: Video capture and preprocessing for frame extraction
Feature Engineering: Normalized landmark coordinates for position-invariant recognition

Deep Learning Model:

Architecture: LSTM (Long Short-Term Memory) neural network for sequential gesture recognition
Input: 30-frame sequences of landmark keypoints
Optimization: Keras Tuner for hyperparameter optimization
Performance: Optimized for both CPU and GPU inference
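
To make the input shape concrete, the sketch below works out the per-frame feature length and the resulting sequence shape. It uses the landmark counts MediaPipe Holistic produces with default settings (33 pose, 468 face, 21 per hand). The values stored per landmark (x, y, z, plus a visibility score for pose) are an assumption, since the project's exact feature layout is not stated, so treat the numbers as illustrative.

```python
# Landmark counts produced by MediaPipe Holistic with default settings.
LANDMARK_COUNTS = {"pose": 33, "face": 468, "left_hand": 21, "right_hand": 21}

# ASSUMPTION: values kept per landmark (x, y, z, plus visibility for pose).
VALUES_PER_LANDMARK = {"pose": 4, "face": 3, "left_hand": 3, "right_hand": 3}

SEQUENCE_LENGTH = 30  # frames per sequence, as stated in the architecture


def features_per_frame() -> int:
    """Length of the flattened keypoint vector for a single frame."""
    return sum(
        count * VALUES_PER_LANDMARK[part]
        for part, count in LANDMARK_COUNTS.items()
    )


def sequence_shape() -> tuple[int, int]:
    """Shape of one LSTM input sample: (timesteps, features)."""
    return (SEQUENCE_LENGTH, features_per_frame())


if __name__ == "__main__":
    print(sequence_shape())  # (30, 1662)
```

Under these assumptions each training sample is a 30 x 1662 array, and a batch fed to the LSTM has shape (batch_size, 30, 1662).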

Development Process

Data Collection: Captured diverse sign language gestures with both masked and unmasked subjects to ensure model robustness across real-world scenarios.

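Feature Engineering: Extracted 468+ keypoints per frame from face, pose, and hand landmarks using MediaPipe (Holistic provides 468 face, 33 pose, and 21 per-hand landmarks). Normalized the coordinates so that recognition does not depend on where the signer stands in the frame or how far they are from the camera.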
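
The normalization step is not spelled out in detail, so the following is only a minimal sketch of one common approach: translate the landmarks so that the midpoint between two reference landmarks sits at the origin, then divide by the distance between those references. Using the shoulders as the references (MediaPipe pose landmarks 11 and 12) is an assumption, and `normalize_landmarks` is a hypothetical helper name.

```python
import math
from typing import Sequence

Point = tuple[float, float]


def normalize_landmarks(
    points: Sequence[Point], left_ref: Point, right_ref: Point
) -> list[Point]:
    """Center points on the midpoint of two reference landmarks and
    divide by the distance between them (position and scale invariance)."""
    center_x = (left_ref[0] + right_ref[0]) / 2
    center_y = (left_ref[1] + right_ref[1]) / 2
    scale = math.hypot(left_ref[0] - right_ref[0], left_ref[1] - right_ref[1])
    if scale == 0:
        raise ValueError("reference landmarks coincide; cannot normalize")
    return [((x - center_x) / scale, (y - center_y) / scale) for x, y in points]


if __name__ == "__main__":
    # ASSUMPTION: the first two points stand in for the shoulders.
    near = [(0.4, 0.5), (0.6, 0.5), (0.5, 0.7)]
    # Same pose, twice as large and shifted within the frame.
    far = [(2 * x + 0.1, 2 * y + 0.1) for x, y in near]
    print(normalize_landmarks(near, near[0], near[1]))
    print(normalize_landmarks(far, far[0], far[1]))
```

Both calls yield the same coordinates up to floating-point rounding, which is the property the model relies on: the same sign looks the same regardless of the signer's position or distance from the webcam.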

Model Training:

Designed LSTM architecture for temporal sequence recognition
Trained on 30-frame sequences to capture gesture dynamics
Used Keras Tuner to optimize layer sizes and dropout rates
Evaluated performance on CPU vs GPU environments

Challenges

Frame Rate Optimization: Implemented an efficient processing pipeline to maintain real-time performance (30 FPS) while running pose estimation and inference simultaneously.
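
One common way to keep a loop like this near real time is to keep a rolling window of the latest 30 frames and run the model only every few frames instead of on every frame. The sketch below illustrates that idea; it is not necessarily how this project did it. `SlidingWindowPredictor`, `stride`, and the `predict` callable are hypothetical names, and `predict` stands in for whatever wraps the trained model.

```python
from collections import deque
from typing import Any, Callable, Optional, Sequence

Features = Sequence[float]


class SlidingWindowPredictor:
    """Buffers the latest `window` frames and calls `predict` every `stride` frames."""

    def __init__(
        self,
        predict: Callable[[list[Features]], Any],
        window: int = 30,
        stride: int = 5,
    ) -> None:
        self._predict = predict
        self._window = window
        self._stride = stride
        self._frames: deque[Features] = deque(maxlen=window)
        self._frames_since_prediction = 0

    def push(self, features: Features) -> Optional[Any]:
        """Add one frame; return a prediction when one is due, else None."""
        self._frames.append(features)  # oldest frame drops out automatically
        self._frames_since_prediction += 1
        if len(self._frames) < self._window:
            return None  # not enough history yet
        if self._frames_since_prediction < self._stride:
            return None  # skip inference on this frame to save time
        self._frames_since_prediction = 0
        return self._predict(list(self._frames))


if __name__ == "__main__":
    # Placeholder model: reports how many frames it was given.
    predictor = SlidingWindowPredictor(predict=len, window=30, stride=5)
    results = [predictor.push([0.0]) for _ in range(60)]
    print(sum(r is not None for r in results))  # 7
```

With a window of 30 and a stride of 5, inference runs on frames 30, 35, 40 and so on, so the per-frame cost is dominated by pose estimation rather than by the LSTM.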

Mask Compatibility: Enhanced model robustness for masked scenarios by focusing on hand and upper body landmarks when facial features are obscured.
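
As an illustration of relying on hands and body when the face is obscured, the sketch below builds a fixed-length feature vector from per-part landmark lists, zero-fills any part that was not detected, and can leave the face block out entirely. The layout (pose, face, left hand, right hand, with four values per pose landmark and three elsewhere) is an assumption, and `flatten_frame` is a hypothetical helper, not the project's confirmed code.

```python
from typing import Optional, Sequence

# ASSUMED layout: part name -> (landmark count, values per landmark).
LAYOUT = {
    "pose": (33, 4),
    "face": (468, 3),
    "left_hand": (21, 3),
    "right_hand": (21, 3),
}

Landmarks = Optional[Sequence[Sequence[float]]]


def flatten_frame(parts: dict[str, Landmarks], use_face: bool = True) -> list[float]:
    """Flatten one frame into a fixed-length vector.

    Parts that are missing or None are zero-filled so the vector length
    stays constant; with use_face=False the face block is omitted.
    """
    vector: list[float] = []
    for name, (count, width) in LAYOUT.items():
        if name == "face" and not use_face:
            continue
        landmarks = parts.get(name)
        if landmarks is None:
            vector.extend([0.0] * (count * width))
            continue
        if len(landmarks) != count:
            raise ValueError(f"{name}: expected {count} landmarks, got {len(landmarks)}")
        for point in landmarks:
            if len(point) != width:
                raise ValueError(f"{name}: expected {width} values per landmark")
            vector.extend(float(value) for value in point)
    return vector


if __name__ == "__main__":
    # A masked signer: no face landmarks, left hand out of view.
    frame = {
        "pose": [(0.5, 0.5, 0.0, 1.0)] * 33,
        "face": None,
        "left_hand": None,
        "right_hand": [(0.1, 0.2, 0.3)] * 21,
    }
    print(len(flatten_frame(frame)))                  # 1662
    print(len(flatten_frame(frame, use_face=False)))  # 258
```

Dropping the face block shrinks the input considerably under these assumptions, which may also help CPU inference, though a model trained without face features cannot use facial expressions that carry meaning in signing.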

Resource Constraints: Optimized model for CPU performance to ensure accessibility on standard laptops without requiring GPU hardware.

Sequence Recognition: Developed custom algorithms for continuous gesture detection, distinguishing between intentional signs and transitional movements.
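
The custom algorithm itself is not described here. A simple technique often used for this problem is to report a sign only after the same class has been the confident top prediction for several consecutive windows, so that brief transitional movements are ignored. The sketch below shows that idea under stated assumptions: `SignDebouncer`, `threshold`, and `stable_windows` are hypothetical names and values.

```python
from collections import deque
from typing import Optional, Sequence


class SignDebouncer:
    """Emits a class index once it has been the confident top prediction
    for `stable_windows` consecutive updates."""

    def __init__(self, threshold: float = 0.8, stable_windows: int = 5) -> None:
        self._threshold = threshold
        self._recent: deque[Optional[int]] = deque(maxlen=stable_windows)
        self._last_emitted: Optional[int] = None

    def update(self, probabilities: Sequence[float]) -> Optional[int]:
        """Feed one softmax output; return a class index or None."""
        best = max(range(len(probabilities)), key=lambda i: probabilities[i])
        # Low-confidence predictions are treated as "no sign" (transition).
        label = best if probabilities[best] >= self._threshold else None
        self._recent.append(label)

        if label is None:
            self._last_emitted = None  # a gap allows the same sign to repeat
            return None
        window_full = len(self._recent) == self._recent.maxlen
        if not window_full or any(item != label for item in self._recent):
            return None  # not stable for long enough yet
        if label == self._last_emitted:
            return None  # sign is still being held; do not report it again
        self._last_emitted = label
        return label


if __name__ == "__main__":
    debouncer = SignDebouncer(threshold=0.8, stable_windows=3)
    stream = [[0.9, 0.1]] * 4 + [[0.5, 0.5]] + [[0.1, 0.9]] * 3
    print([debouncer.update(p) for p in stream])
    # [None, None, 0, None, None, None, None, 1]
```

Raising `stable_windows` reduces false detections during transitions at the cost of added latency before a sign is reported.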

Impact & Learning

This project provided valuable insights into real-world applications of computer vision in accessibility. Key learnings include:

Real-time ML constraints: Balancing model complexity with inference speed for live applications
Deep learning optimization: Techniques for model compression and hardware-specific tuning
Inclusive design: Importance of considering diverse user scenarios (masked/unmasked, different lighting)
Accessibility tech: Understanding how technology can bridge communication gaps for underserved communities

While the project was a personal learning exercise, it highlighted the potential for technology to create more inclusive environments for the Deaf community in Singapore.
