Real-time, low-audio-latency, AI-powered application architecture design
DOI: https://doi.org/10.32968/psaie.2025.1.5

Keywords: OpenAI, low-latency audio, parallel processing, WebSocket

Abstract
This paper presents the design and implementation of a mobile application that provides users with an interactive conversational experience powered by OpenAI's language model. A key feature of the application is real-time text response streaming, coupled with synchronized audio synthesis using Azure's text-to-speech (TTS) services. The architecture includes a Node.js backend server that handles OpenAI communication in streaming mode, performs sentence segmentation for response buffering, and delegates TTS conversion to a dedicated, multithreaded audio service. Parallelized WebSocket communication enables high-throughput, real-time coordination between the backend and the audio service. The paper explores the system's architecture, implementation challenges, performance evaluation, and potential applications in education, accessibility, and virtual assistants.
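To illustrate the sentence-segmentation step mentioned in the abstract, the sketch below shows one plausible way a Node.js backend could buffer streamed text deltas from OpenAI and flush complete sentences to a TTS service as soon as they close. The class name, callback, and splitting heuristic are assumptions for illustration, not the paper's actual implementation.

```javascript
// Hypothetical sentence buffer: accumulates streamed text deltas and
// invokes a callback (e.g. "send this sentence to the audio service")
// whenever a complete sentence becomes available.
class SentenceBuffer {
  constructor(onSentence) {
    this.buffer = "";
    this.onSentence = onSentence;
  }

  // Called for each streamed text chunk from the language model.
  push(delta) {
    this.buffer += delta;
    // Heuristic: a sentence ends at ., !, or ? followed by whitespace.
    let match;
    while ((match = this.buffer.match(/^(.*?[.!?])\s+/s)) !== null) {
      this.onSentence(match[1].trim());
      this.buffer = this.buffer.slice(match[0].length);
    }
  }

  // Called when the stream closes, to emit any trailing partial sentence.
  flush() {
    if (this.buffer.trim()) this.onSentence(this.buffer.trim());
    this.buffer = "";
  }
}

// Usage: chunks arrive incrementally, sentences are emitted as they complete.
const sentences = [];
const buf = new SentenceBuffer((s) => sentences.push(s));
buf.push("Hello there. How ");
buf.push("are you? I am");
buf.flush();
// sentences → ["Hello there.", "How are you?", "I am"]
```

Flushing per sentence rather than per token lets TTS synthesis start while the model is still generating, which is one way the streaming and audio pipelines described in the abstract could overlap.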