SignesTrad: Real-Time French Sign Language to Text Interpreter

SignesTrad is an edge AI solution that translates French Sign Language (LSF) to text in real-time. Using the STM32N6 board's camera and Neural-ART NPU, our device captures hand gestures, processes them with optimized neural networks, and displays translated French text instantly. This portable solution works offline, making LSF communication accessible anywhere without internet dependency.
- Data Acquisition System: We utilize the MIPI camera interface of the STM32N6 board to capture high-quality video input at 30fps, with the camera positioned to clearly view the signer's hands and upper body movements.
- AI Processing Pipeline: The core of our solution involves a two-stage deep learning approach:
  - A modified MobileNetV3 model for hand and pose detection that identifies and tracks key points on the hands, arms, and face
  - A temporal GRU (Gated Recurrent Unit) network that analyzes sequences of movements to recognize grammatical structures specific to LSF
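The two-stage idea above can be sketched in plain NumPy: stage one emits a keypoint vector per frame, and stage two folds the frame sequence into a single embedding with a GRU cell. This is a simplified illustration only; the dimensions, weights, and the MobileNetV3 front end are assumptions, not the deployed model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MinimalGRU:
    """Single-layer GRU cell over a sequence of per-frame keypoint vectors.

    Dimensions are illustrative: the real sizes depend on the number of
    tracked keypoints and the sign vocabulary.
    """
    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One weight matrix per gate: update (z), reset (r), candidate (n).
        self.Wz = rng.normal(0, 0.1, (hidden_dim, input_dim + hidden_dim))
        self.Wr = rng.normal(0, 0.1, (hidden_dim, input_dim + hidden_dim))
        self.Wn = rng.normal(0, 0.1, (hidden_dim, input_dim + hidden_dim))
        self.hidden_dim = hidden_dim

    def step(self, x, h):
        xh = np.concatenate([x, h])
        z = sigmoid(self.Wz @ xh)                           # update gate
        r = sigmoid(self.Wr @ xh)                           # reset gate
        n = np.tanh(self.Wn @ np.concatenate([x, r * h]))   # candidate state
        return (1 - z) * n + z * h

    def run(self, sequence):
        h = np.zeros(self.hidden_dim)
        for x in sequence:
            h = self.step(x, h)
        return h

# Example: 42 keypoint coordinates per frame (e.g. 21 hand landmarks, x/y),
# 15 frames of gesture, 64-dim embedding fed to a sign classifier.
gru = MinimalGRU(input_dim=42, hidden_dim=64)
frames = np.random.default_rng(1).normal(size=(15, 42))
embedding = gru.run(frames)
```

Feeding keypoints rather than raw pixels into the temporal model is what keeps the recurrent stage small enough for a microcontroller.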
- Optimization for Edge Deployment: We plan to employ several techniques to ensure optimal performance on the STM32N6:
  - Model quantization to 8-bit precision
  - Layer fusion to reduce memory transfers
  - Custom activation functions optimized for the Neural-ART architecture
  - Hardware-specific memory allocation to minimize data transfer bottlenecks
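To make the 8-bit quantization step concrete, here is a minimal symmetric per-tensor quantizer in NumPy. It is only a sketch of the underlying arithmetic; in practice the ST Edge AI toolchain performs this conversion (with calibration data) when targeting the Neural-ART NPU.

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric per-tensor 8-bit quantization.

    Maps float weights to int8 plus a single scale factor, the basic
    transformation applied when preparing a float model for an integer
    NPU. Illustrative only; production tools also calibrate activations.
    """
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.05, size=(64, 42)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.max(np.abs(w - w_hat))  # bounded by half a quantization step
```

The payoff is a 4x reduction in weight storage versus float32 and integer-only inference on the NPU, at the cost of a small, bounded rounding error per weight.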
- User Interface: A clean, intuitive interface displays the translated text on the integrated LCD screen. The system includes:
  - Real-time text display with low latency (target under 250 ms from gesture to text)
  - Confidence indicators for ambiguous interpretations
  - Simple controls for adjusting sensitivity and language preferences
  - Battery status and system diagnostics
- The powerful STM32N6 microcontroller serves as the system's brain, coordinating all operations
- The Neural-ART NPU accelerates neural network inference by up to 30x compared to CPU-only execution
- The MIPI connector interfaces with our custom camera module for high-quality video input
- The 32MB HexaRAM provides sufficient memory for our model's activation maps and intermediate results
- The onboard LCD display presents translated text to the user
- The SD card slot stores model weights and, optionally, recordings used to improve the system
- Camera Interface Module: Will handle video capture, preprocessing, and frame management
- AI Inference Engine: Will coordinate the execution of our neural networks on the Neural-ART NPU
- Sign Language Processing: Will post-process network outputs to handle linguistic features of LSF
- User Interface Manager: Will control display output and user input
- System Management: Will handle power, connectivity, and resource allocation
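The module boundaries above can be sketched as a simple staged pipeline. The real firmware would be written in C on the STM32N6; this Python sketch (with stubbed, hypothetical class and method names) only illustrates how the planned modules hand data to each other.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    keypoints: list  # preprocessed hand/pose keypoints for one frame

class CameraModule:
    """Camera Interface Module: capture and preprocessing (stubbed)."""
    def __init__(self, frames):
        self._frames = iter(frames)
    def next_frame(self):
        return Frame(keypoints=next(self._frames))

class InferenceEngine:
    """AI Inference Engine: network execution (stubbed as a threshold).

    In the real system this stage runs MobileNetV3 + GRU on the NPU.
    """
    def infer(self, frame):
        return "bonjour" if sum(frame.keypoints) > 0 else None

class DisplayManager:
    """User Interface Manager: collects text destined for the LCD."""
    def __init__(self):
        self.lines = []
    def show(self, text):
        if text:  # only display confident, non-empty translations
            self.lines.append(text)

# Wire the stages together, mirroring the module list above.
camera = CameraModule([[0.2, 0.3], [0.0, 0.0], [0.5, 0.1]])
engine = InferenceEngine()
display = DisplayManager()
for _ in range(3):
    display.show(engine.infer(camera.next_frame()))
```

Keeping the stages behind narrow interfaces like this is what lets the camera driver, NPU runtime, and display code be developed and tested independently.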
| Metric | Target | Notes |
|---|---|---|
| Sign recognition accuracy | 85-90% | For isolated signs |
| Sentence comprehension | 75-80% | For simple sentences |
| Processing latency | <250 ms | From gesture to display |
| Frame rate | 12-15 FPS | Sufficient for fluid interpretation |
| Power consumption | ~1.5 W | Estimated during active use |
| Initial vocabulary size | 500-600 words | First implementation |
| Boot time | ~5 s | From power-on to ready state |
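A quick back-of-envelope check shows the frame-rate and latency targets above are mutually consistent. The smoothing-window length and display cost below are assumptions for illustration, not measured figures.

```python
# Back-of-envelope budget check for the targets listed above.
target_fps = 12                      # lower bound of the 12-15 FPS target
frame_budget_ms = 1000 / target_fps  # time available per frame (~83 ms)
latency_target_ms = 250              # gesture-to-display target

# If detection + GRU inference fit inside one frame budget, and the
# recognizer needs a short window of frames before emitting a sign,
# end-to-end latency is roughly window length x frame interval + display.
window_frames = 2                    # assumed smoothing window
display_ms = 20                      # assumed LCD update cost
estimated_latency_ms = window_frames * frame_budget_ms + display_ms
```

Even at the low end of the frame-rate target, a two-frame decision window leaves comfortable headroom under the 250 ms latency budget.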
- Months 1-2: Dataset collection and model training
- Month 3: Initial algorithm implementation and optimization
- Month 4: Hardware integration and testing
- Month 5: User interface development and performance tuning
- Month 6: Final testing, validation, and documentation
- Social Impact: Our project addresses a real-world accessibility challenge for the roughly 300,000 deaf people in France, many of whom rely on LSF as their primary means of communication.
- Technical Innovation: We push the boundaries of what's possible with edge AI on microcontrollers, demonstrating how complex computer vision and natural language processing can be optimized for embedded systems.
- Complete Utilization of STM32N6 Capabilities: Our solution leverages virtually all the key features of the STM32N6 Discovery Kit, from its Neural-ART NPU to its camera interface, memory resources, and display capabilities.
- Practical Implementation: SignesTrad is designed to be usable in everyday situations, with careful attention to user experience, battery life, and real-world performance.
- Future Potential: The project establishes a foundation for expanded capabilities and could lead to commercial applications that bring tangible benefits to the deaf community.