Providers
Our goal is to make starting a broadcast channel as easy as possible. Successful frameworks such as ai16z's Eliza have shown that this sparks innovation and creativity within the ecosystem. While we do not have a public repository yet, we think it is useful to list some of the providers we will support so that developers can plan ahead.
It is important to note that while we intend to maximize our independence from external providers through open source, open weight models, we will give developers the flexibility to use the providers they prefer.
Current LLM Providers
Large Language Models (LLMs) are the brains of idTV. They generate the content and, in some cases, orchestrate the subagents that make our broadcast work (see the sketch after the list below).
Hugging Face (Open Source models)
Anthropic (Claude)
OpenAI (GPT-4, etc.)
Google (Gemini)
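To show the kind of flexibility we have in mind, below is a minimal, hypothetical sketch of a provider-agnostic text generation call using the public Anthropic and OpenAI Python SDKs. The function name, model IDs, and overall structure are illustrative, not part of our framework.

```python
# Hypothetical sketch: a thin provider-agnostic wrapper, not the idTV API.
import os

import anthropic  # pip install anthropic
import openai     # pip install openai

def generate_segment(prompt: str, provider: str = "anthropic") -> str:
    """Generate broadcast copy with whichever LLM provider is configured."""
    if provider == "anthropic":
        client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])
        message = client.messages.create(
            model="claude-3-5-sonnet-20241022",  # example model ID
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        return message.content[0].text
    if provider == "openai":
        client = openai.OpenAI(api_key=os.environ["OPENAI_API_KEY"])
        response = client.chat.completions.create(
            model="gpt-4o",  # example model ID
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content
    raise ValueError(f"Unsupported provider: {provider}")
```

Because the rest of the pipeline only sees a string in and a string out, swapping providers is a configuration change rather than a code change.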
Lip Sync Providers
Lip sync providers are used to generate natural speaking animations for the anchors and guests. As there are no cost- and compute-efficient public providers of lip sync inference, we have spun up a private GPU cluster on Modal to host LatentSync, an open source model.
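As a rough illustration of this setup, the sketch below shows how a model like LatentSync can be served from a Modal GPU function. The app name, image contents, GPU choice, and the run_latentsync() entry point are assumptions for illustration; the real inference pipeline lives in the LatentSync repository.

```python
# Hypothetical sketch of hosting a lip sync model on Modal.
import modal

app = modal.App("latentsync-lipsync")  # illustrative app name

image = (
    modal.Image.debian_slim(python_version="3.11")
    .pip_install("torch", "torchaudio", "opencv-python-headless")
    # The LatentSync code itself would be added here, e.g. from its repo.
)

def run_latentsync(video_bytes: bytes, audio_bytes: bytes) -> bytes:
    # Placeholder for LatentSync's actual inference entry point, which is
    # wrapped internally; see the LatentSync repository for the real pipeline.
    raise NotImplementedError

@app.function(gpu="A100", image=image, timeout=600)
def lip_sync(video_bytes: bytes, audio_bytes: bytes) -> bytes:
    """Generate a lip-synced video for one (video, audio) pair."""
    return run_latentsync(video_bytes, audio_bytes)

@app.local_entrypoint()
def main():
    with open("anchor.mp4", "rb") as v, open("line.wav", "rb") as a:
        result = lip_sync.remote(v.read(), a.read())
    with open("synced.mp4", "wb") as out:
        out.write(result)
```

Running inference behind a Modal function means the GPU cluster scales to zero between broadcasts, which keeps lip sync costs proportional to airtime.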
Media Providers
Image generation providers are used to generate images based on the data generated by the contextual LLM agent. If the 'create_image' flag is set to 'true' in the frame composer endpoint, an image will be generated based on the content of the broadcast; otherwise, the frame will default to a static image provided by the user (see the sketch after the list below).
Down the line, when we open our framework to other streams, we will be able to use these same providers to generate assets such as anchors, guests, and more.
Stability AI (Stable Diffusion)
Runway (RunwayML)
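As a rough sketch of the branch described above: if 'create_image' is true, the frame composer calls an image provider (here, Stability AI's hosted REST API); otherwise it falls back to the user's static image. The function name and payload fields are illustrative assumptions, not our actual endpoint schema.

```python
# Hypothetical sketch of the frame composer's image branch.
import os
import requests

STABILITY_URL = "https://api.stability.ai/v2beta/stable-image/generate/core"

def compose_frame_image(payload: dict) -> bytes:
    """Return the image for a frame: generated if requested, static otherwise."""
    if payload.get("create_image"):
        # Generate an image from the broadcast context via Stability AI's
        # REST API (endpoint as documented at the time of writing).
        response = requests.post(
            STABILITY_URL,
            headers={
                "authorization": f"Bearer {os.environ['STABILITY_API_KEY']}",
                "accept": "image/*",
            },
            files={"none": ""},  # the endpoint expects multipart form data
            data={"prompt": payload["context"], "output_format": "png"},
            timeout=120,
        )
        response.raise_for_status()
        return response.content
    # Fall back to the user-provided static image.
    with open(payload["static_image_path"], "rb") as f:
        return f.read()
```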
Audio Providers
Audio providers are used to generate audio based on the data generated by the contextual LLM agent, ranging from text to speech to music. We are currently working to support generative music to accompany our stream. A text-to-speech sketch follows the list below.
ElevenLabs (Text to Speech, Voice Cloning)
PlayAI (Text to Speech)
Suno (Music Generation)
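For example, a single line of anchor dialogue can be synthesized with ElevenLabs' text-to-speech REST API, as in the hypothetical sketch below. The voice ID is a placeholder you would replace with a voice from your own account.

```python
# Hypothetical sketch: synthesizing one line of dialogue with ElevenLabs TTS.
import os
import requests

VOICE_ID = "YOUR_VOICE_ID"  # placeholder: use a voice ID from your account

def synthesize_line(text: str) -> bytes:
    """Return MP3 audio for one line of broadcast dialogue."""
    response = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": os.environ["ELEVENLABS_API_KEY"]},
        json={"text": text, "model_id": "eleven_multilingual_v2"},
        timeout=60,
    )
    response.raise_for_status()
    return response.content

audio = synthesize_line("Good evening, and welcome back to the broadcast.")
with open("line.mp3", "wb") as f:
    f.write(audio)
```

The resulting audio is what the lip sync stage above consumes, pairing each synthesized line with the anchor's video.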