SignesTrad: Real-Time French Sign Language to Text Interpreter

SignesTrad is an edge AI solution that translates French Sign Language (LSF) to text in real-time. Using the STM32N6 board's camera and Neural-ART NPU, our device captures hand gestures, processes them with optimized neural networks, and displays translated French text instantly. This portable solution works offline, making LSF communication accessible anywhere without internet dependency.
- Data Acquisition System: We utilize the MIPI camera interface of the STM32N6 board to capture high-quality video input at 30fps, with the camera positioned to clearly view the signer's hands and upper body movements.
- AI Processing Pipeline: The core of our solution involves a two-stage deep learning approach:
  - A modified MobileNetV3 model for hand and pose detection that identifies and tracks key points on the hands, arms, and face
  - A temporal GRU (Gated Recurrent Unit) network that analyzes sequences of movements to recognize grammatical structures specific to LSF
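The two-stage idea above can be sketched in plain NumPy: stage one emits a keypoint vector per frame, and stage two folds the frame sequence into a single embedding with a GRU cell. This is a simplified illustration only; the dimensions, weights, and the MobileNetV3 front end are assumptions, not the deployed model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MinimalGRU:
    """Single-layer GRU cell over a sequence of per-frame keypoint vectors.

    Dimensions are illustrative: the real sizes depend on the number of
    tracked keypoints and the sign vocabulary.
    """
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate: update (z), reset (r), candidate (n).
        self.Wz = rng.normal(0, 0.1, (hidden_dim, input_dim + hidden_dim))
        self.Wr = rng.normal(0, 0.1, (hidden_dim, input_dim + hidden_dim))
        self.Wn = rng.normal(0, 0.1, (hidden_dim, input_dim + hidden_dim))
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                           # update gate
        r = sigmoid(self.Wr @ xh)                           # reset gate
        n = np.tanh(self.Wn @ np.concatenate([x, r * h]))   # candidate state
        return (1 - z) * n + z * h

    def run(self, sequence):
        h = np.zeros(self.hidden_dim)
        for x in sequence:
            h = self.step(x, h)
        return h

# Example: 42 keypoint coordinates per frame (e.g. 21 hand landmarks, x/y),
# 15 frames of gesture, 64-dim embedding fed to a sign classifier.
gru = MinimalGRU(input_dim=42, hidden_dim=64)
frames = np.random.default_rng(1).normal(size=(15, 42))
embedding = gru.run(frames)
```

Feeding keypoints rather than raw pixels into the temporal model is what keeps the recurrent stage small enough for a microcontroller.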
- Optimization for Edge Deployment: We plan to employ several techniques to ensure optimal performance on the STM32N6:
  - Model quantization to 8-bit precision
  - Layer fusion to reduce memory transfers
  - Custom activation functions optimized for the Neural-ART architecture
  - Hardware-specific memory allocation to minimize data transfer bottlenecks
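To make the 8-bit quantization step concrete, here is a minimal symmetric per-tensor quantizer in NumPy. It is only a sketch of the underlying arithmetic; in practice the ST Edge AI toolchain performs this conversion (with calibration data) when targeting the Neural-ART NPU.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor 8-bit quantization.

    Maps float weights to int8 plus a single scale factor, the basic
    transformation applied when preparing a float model for an integer
    NPU. Illustrative only; production tools also calibrate activations.
    """
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, size=(64, 42)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.max(np.abs(w - w_hat))  # bounded by half a quantization step
```

The payoff is a 4x reduction in weight storage versus float32 and integer-only inference on the NPU, at the cost of a small, bounded rounding error per weight.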
- User Interface: A clean, intuitive interface displays the translated text on the integrated LCD screen. The system includes:
  - Real-time text display with low latency (target under 250 ms from gesture to text)
  - Confidence indicators for ambiguous interpretations
  - Simple controls for adjusting sensitivity and language preferences
  - Battery status and system diagnostics
- The powerful STM32N6 microcontroller serves as the system's brain, coordinating all operations
- The Neural-ART NPU accelerates neural network inference by up to 30x compared to CPU-only execution
- The MIPI connector interfaces with our custom camera module for high-quality video input
- The 32MB HexaRAM provides sufficient memory for our model's activation maps and intermediate results
- The onboard LCD display presents translated text to the user
- The SD card slot stores model weights and, optionally, recordings used to improve the system
- Camera Interface Module: Will handle video capture, preprocessing, and frame management
- AI Inference Engine: Will coordinate the execution of our neural networks on the Neural-ART NPU
- Sign Language Processing: Will post-process network outputs to handle linguistic features of LSF
- User Interface Manager: Will control display output and user input
- System Management: Will handle power, connectivity, and resource allocation
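The module boundaries above can be sketched as a simple staged pipeline. The real firmware would be written in C on the STM32N6; this Python sketch (with stubbed, hypothetical class and method names) only illustrates how the planned modules hand data to each other.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    keypoints: list  # preprocessed hand/pose keypoints for one frame

class CameraModule:
    """Camera Interface Module: capture and preprocessing (stubbed)."""
    def __init__(self, frames):
        self._frames = iter(frames)
    def next_frame(self):
        return Frame(keypoints=next(self._frames))

class InferenceEngine:
    """AI Inference Engine: network execution (stubbed as a threshold).

    In the real system this stage runs MobileNetV3 + GRU on the NPU.
    """
    def infer(self, frame):
        return "bonjour" if sum(frame.keypoints) > 0 else None

class DisplayManager:
    """User Interface Manager: collects text destined for the LCD."""
    def __init__(self):
        self.lines = []
    def show(self, text):
        if text:  # only display confident, non-empty translations
            self.lines.append(text)

# Wire the stages together, mirroring the module list above.
camera = CameraModule([[0.2, 0.3], [0.0, 0.0], [0.5, 0.1]])
engine = InferenceEngine()
display = DisplayManager()
for _ in range(3):
    display.show(engine.infer(camera.next_frame()))
```

Keeping the stages behind narrow interfaces like this is what lets the camera driver, NPU runtime, and display code be developed and tested independently.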
| Metric | Target | Notes |
|---|---|---|
| Sign recognition accuracy | 85-90% | For isolated signs |
| Sentence comprehension | 75-80% | For simple sentences |
| Processing latency | <250 ms | From gesture to display |
| Frame rate | 12-15 FPS | Sufficient for fluid interpretation |
| Power consumption | ~1.5 W | Estimated during active use |
| Initial vocabulary size | 500-600 words | First implementation |
| Boot time | ~5 s | From power-on to ready state |
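A quick back-of-envelope check shows the frame-rate and latency targets above are mutually consistent. The smoothing-window length and display cost below are assumptions for illustration, not measured figures.

```python
# Back-of-envelope budget check for the targets listed above.
target_fps = 12                      # lower bound of the 12-15 FPS target
frame_budget_ms = 1000 / target_fps  # time available per frame (~83 ms)
latency_target_ms = 250              # gesture-to-display target

# If detection + GRU inference fit inside one frame budget, and the
# recognizer needs a short window of frames before emitting a sign,
# end-to-end latency is roughly window length x frame interval + display.
window_frames = 2                    # assumed smoothing window
display_ms = 20                      # assumed LCD update cost
estimated_latency_ms = window_frames * frame_budget_ms + display_ms
```

Even at the low end of the frame-rate target, a two-frame decision window leaves comfortable headroom under the 250 ms latency budget.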
- Months 1-2: Dataset collection and model training
- Month 3: Initial algorithm implementation and optimization
- Month 4: Hardware integration and testing
- Month 5: User interface development and performance tuning
- Month 6: Final testing, validation, and documentation
- Social Impact: Our project addresses a real-world accessibility challenge for the roughly 300,000 deaf people in France, many of whom rely on LSF as their primary means of communication.
- Technical Innovation: We push the boundaries of what's possible with edge AI on microcontrollers, demonstrating how complex computer vision and natural language processing can be optimized for embedded systems.
- Complete Utilization of STM32N6 Capabilities: Our solution leverages virtually all the key features of the STM32N6 Discovery Kit, from its Neural-ART NPU to its camera interface, memory resources, and display capabilities.
- Practical Implementation: SignesTrad is designed to be usable in everyday situations, with careful attention to user experience, battery life, and real-world performance.
- Future Potential: The project establishes a foundation for expanded capabilities and could lead to commercial applications that bring tangible benefits to the deaf community.