Closed
Description
Given that the processing time is considerably shorter than the length of the speech, is it possible to feed the models real-time microphone output? Or does inference run only on a complete audio clip, rather than sample by sample?
This would greatly reduce latency for voice assistants and the like, since the audio would not need to be fully captured before being fed to the models. Basically the same as what I did here with SODA: https://github.com/biemster/gasr, but with an open-source and multilingual model.
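One way to approximate streaming without sample-by-sample inference is to buffer the microphone stream and rerun inference on a sliding window of recent audio. This is only a sketch of that buffering logic, not anything from the model's actual API; the function name, window/step sizes, and the 16 kHz sample rate are all assumptions:

```python
import numpy as np

def stream_windows(chunks, window_s=5.0, step_s=1.0, rate=16000):
    """Accumulate microphone chunks and yield an audio window every
    `step_s` seconds, covering at most the last `window_s` seconds.
    Each yielded window could then be passed to the model, with the
    transcript of the overlap region reconciled between runs."""
    window, step = int(window_s * rate), int(step_s * rate)
    buf = np.zeros(0, dtype=np.float32)
    next_emit = step  # sample index at which to emit the next window
    for chunk in chunks:
        buf = np.concatenate([buf, np.asarray(chunk, dtype=np.float32)])
        while len(buf) >= next_emit:
            start = max(0, next_emit - window)
            yield buf[start:next_emit]
            next_emit += step

# Example: ten 0.1 s chunks of silence, emitting every 0.5 s
chunks = [np.zeros(1600, dtype=np.float32) for _ in range(10)]
windows = list(stream_windows(chunks, window_s=2.0, step_s=0.5))
```

The trade-off is that latency is bounded by the step size while each inference call still sees enough left context to stay accurate, at the cost of redundant computation on the overlapping audio.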