Real-Time, Low-Audio-Latency AI-Powered Application Architecture Design

Authors

  • Péter Mileff

DOI:

https://doi.org/10.32968/psaie.2025.1.5

Keywords:

OpenAI, low-latency audio, parallel processing, WebSocket

Abstract

This paper presents the design and implementation of a mobile application that provides users with an interactive conversational experience powered by OpenAI's language model. A key feature of this application is its real-time text response streaming, coupled with synchronized audio synthesis using Azure's text-to-speech (TTS) services. The architecture includes a Node.js backend server that handles OpenAI communication in streaming mode, sentence segmentation for response buffering, and a dedicated, multithreaded audio service for efficient TTS conversion. Parallelized WebSocket communication enables high-throughput, real-time coordination between the backend and the audio service. This paper explores the system's architecture, implementation challenges, performance evaluation, and potential applications in education, accessibility, and virtual assistants.
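Illustrative sketch (not taken from the paper): the TypeScript fragment below approximates the backend flow summarized in the abstract, streaming an OpenAI chat completion, segmenting the text into sentences, and forwarding each sentence over a WebSocket to a TTS audio service. The package choices (openai, ws), the model name, the sentence-boundary regex, and the audio-service endpoint are all assumptions for illustration only.

// Hypothetical sketch of the streaming + sentence-segmentation flow described above.
// Assumptions: the openai Node SDK, the ws package, a placeholder model name,
// and an audio-service WebSocket endpoint at ws://localhost:8081/tts.
import OpenAI from "openai";
import WebSocket from "ws";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const audioSocket = new WebSocket("ws://localhost:8081/tts"); // assumed audio-service endpoint

// Very rough sentence boundary: ., ! or ? followed by whitespace.
const SENTENCE_END = /([.!?])\s/;

async function streamAnswer(question: string): Promise<void> {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini", // placeholder model name
    messages: [{ role: "user", content: question }],
    stream: true, // token-by-token streaming, as described in the abstract
  });

  let buffer = "";
  for await (const chunk of stream) {
    buffer += chunk.choices[0]?.delta?.content ?? "";

    // Flush each complete sentence to the audio service as soon as it appears,
    // so TTS synthesis can begin before the full answer has arrived.
    let match: RegExpExecArray | null;
    while ((match = SENTENCE_END.exec(buffer)) !== null) {
      const sentence = buffer.slice(0, match.index + 1).trim();
      buffer = buffer.slice(match.index + 2);
      if (sentence) audioSocket.send(JSON.stringify({ type: "sentence", text: sentence }));
    }
  }

  // Forward any trailing text left after the stream ends.
  if (buffer.trim()) audioSocket.send(JSON.stringify({ type: "sentence", text: buffer.trim() }));
}

audioSocket.on("open", () => void streamAnswer("Explain low-latency audio streaming."));

Flushing at sentence boundaries rather than waiting for the complete response is what allows audio synthesis and playback to start while the model is still generating, which is the basis of the latency reduction the abstract describes.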

Published

2025-12-10