Agents
Agents are the brains of idTV. They are the models that generate the content and, in some cases, orchestrate the subagents that make our broadcast work. An agent can be thought of as the combination of a single model and a single objective. Once given an objective, an agent may request the use of other agents through the idTV library; in these cases, the results of those calls are recursively fed back to the agent, and execution continues until the objective is complete.
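As a rough sketch of that recursive loop, assuming a hypothetical `Agent` shape and `runAgent` helper (the actual idTV API may differ):

```typescript
// A minimal sketch of the recursive execution loop described above.
// The Agent/AgentStep shapes and runAgent helper are illustrative
// assumptions, not the actual idTV API.
interface AgentStep {
  done: boolean;        // objective satisfied?
  output: string;       // the model's response for this step
  subRequests: Agent[]; // other agents requested through the library
}

interface Agent {
  model: string;     // a single model...
  objective: string; // ...paired with a single objective
  step(context: string[]): Promise<AgentStep>;
}

async function runAgent(agent: Agent, context: string[] = []): Promise<string> {
  const result = await agent.step(context);
  if (result.done) return result.output;
  // Run any requested sub-agents, then recursively feed their results
  // back to the requesting agent until the objective is complete.
  const subResults = await Promise.all(result.subRequests.map((a) => runAgent(a)));
  return runAgent(agent, [...context, result.output, ...subResults]);
}
```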
Agent Types
Not all agents are created equal. We employ a variety of agents to handle different tasks. Below are some of the agents we use, with a brief description of their purpose as well as how they may be initialized and called:
Data Processors
The data processing agent is the agent we employ most often; it generates structured data from often-unstructured sources such as the X newsfeed or parsed article data.
Setting | Value | Description |
---|---|---|
provider | anthropic | The large language model provider used by the agent to generate responses. |
embedder | default | The embedder used by the agent to generate embeddings. |
AI_MODEL | claude-3-opus-20240229 | The large language model used by the agent with the selected provider to generate responses. |
AI_TEMPERATURE | 0.7 | The temperature used by the agent to generate responses. |
MAX_TOKENS | 4000 | The maximum number of tokens used by the agent to generate responses. |
WEBSEARCH_TIMEOUT | 0 | The timeout, in seconds, used as a deadline after which the agent stops retrying a web search. |
WAIT_BETWEEN_REQUESTS | 1 | The number of seconds to wait between requests to the LLM provider. |
WAIT_AFTER_FAILURE | 3 | The number of seconds to wait after a failed request before retrying the LLM provider. |
stream | false | Whether or not to stream the response from the LLM provider. |
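Putting the table together, initialization might look like the following sketch, where `createAgent` is a stand-in for the actual idTV entry point:

```typescript
// Hypothetical initialization using the settings from the table above;
// createAgent is illustrative, not the real idTV API.
declare function createAgent(settings: Record<string, string | number | boolean>): unknown;

const dataProcessor = createAgent({
  provider: "anthropic",
  embedder: "default",
  AI_MODEL: "claude-3-opus-20240229",
  AI_TEMPERATURE: 0.7,
  MAX_TOKENS: 4000,
  WEBSEARCH_TIMEOUT: 0,
  WAIT_BETWEEN_REQUESTS: 1,
  WAIT_AFTER_FAILURE: 3,
  stream: false,
});
```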
Coordinators
The coordinator agent (the content generator) takes in data from various data processors, as well as fixed API requests, and prioritizes and ranks the importance of the information. Many of its parameters are similar to the data processor agent's, with a few exceptions and additions for long-term memory and retrieval.
Setting | Value | Description |
---|---|---|
USE_CACHED_DATA | true | Whether or not to use cached short-term memory; useful, for example, when a token has undergone significant price movement. |
LONG_TERM_RETRIEVAL | true | Whether or not to retrieve long-term memories from vector storage; useful, for example, when a news article is pertinent to an old event. |
LONG_TERM_STORAGE | true | Whether or not to write new memories to vector storage for long-term retention. |
LONG_TERM_MEMORY_PROVIDER | pgvector | The vector storage provider used for long-term memory. |
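As an illustration of the retrieval side of these settings, a pgvector lookup could look like the sketch below; the table and column names are assumptions, not the actual idTV schema:

```typescript
import { Pool } from "pg";

// Sketch of LONG_TERM_RETRIEVAL over pgvector. The long_term_memory
// table and its columns are assumed for illustration.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function recallRelatedMemories(queryEmbedding: number[], k = 5) {
  // pgvector's `<->` operator orders rows by distance to the query
  // vector, so the k closest long-term memories come back first.
  const { rows } = await pool.query(
    `SELECT content, created_at
       FROM long_term_memory
      ORDER BY embedding <-> $1::vector
      LIMIT $2`,
    [`[${queryEmbedding.join(",")}]`, k],
  );
  return rows;
}
```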
Stream Renderer
The stream renderer agent takes in the content generated by the coordinator and renders it to the screen, using cv2 to generate video frames. Based on the priority of the content, it is rendered to specific sections of the screen and may trigger alerts.
Setting | Value | Description |
---|---|---|
GENERATE_BACKGROUND | false | Whether or not to generate a background image for the stream. Otherwise the stream defaults to bg.png. |
IMAGE_PROVIDER | stable_diffusion | The image provider used to generate a background image. |
IMAGE_PROMPT | Cityscape view in the style of an 80s retro film | The prompt used to generate a background image. |
temperature | 0.8 | The temperature used by the image provider. |
NUMBER_OF_SECTIONS | 3 | The number of sections to render to the screen. |
SECTION_COLORS | #FF4B4B, #4B9EFF, #4BFF7A | The colors used by the sections. |
SECTION_POSITIONS | top-left, top-right, scanner | The positions of the sections on the screen. |
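To illustrate how these section settings might be consumed, here is a sketch that routes content by priority; the thresholds and routing rule are assumptions for illustration, not the renderer's actual logic:

```typescript
// Section layout mirroring the table above; the routing thresholds
// are assumed for illustration.
type SectionPosition = "top-left" | "top-right" | "scanner";

interface Section {
  position: SectionPosition;
  color: string;
}

const sections: Section[] = [
  { position: "top-left", color: "#FF4B4B" },
  { position: "top-right", color: "#4B9EFF" },
  { position: "scanner", color: "#4BFF7A" },
];

function routeContent(priority: number): Section {
  // Highest-priority items take the top-left slot; low-priority items
  // scroll through the bottom "scanner" ticker.
  if (priority >= 8) return sections[0];
  if (priority >= 4) return sections[1];
  return sections[2];
}
```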
Anchor Generator
The anchor generator agent takes in the content generated by the coordinator and renders it to the screen. Based on the priority of the content, it is rendered to specific sections of the screen and may trigger alerts. The agent is also capable of taking in a generated video and overlaying the content on top of it.
Setting | Value | Description |
---|---|---|
BASE_VIDEO_PATH | /assets/videos/base.mp4 | The path to the base video of the anchor used for the broadcast. |
INTRO_LENGTH | 20 | The length of the intro video in seconds. If not provided, the agent falls back to the provided intro.mp4. |
OUTRO_LENGTH | 10 | The length of the outro video in seconds. If not provided, the agent falls back to the provided outro.mp4. |
TTS_PROVIDER | elevenlabs | The text to speech provider used to generate the voice of the anchor. |
LIP_SYNC_PROVIDER | latentsync | The lip sync provider used to generate the lip sync of the anchor. We are currently using a local LatentSync model hosted on a Modal cluster. |
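A high-level sketch of that pipeline is shown below; the two provider interfaces are assumptions standing in for the real ElevenLabs and Modal-hosted LatentSync clients:

```typescript
// Sketch of the anchor generation pipeline: script -> TTS audio ->
// lip-synced video over the base anchor footage. Interfaces are
// illustrative stand-ins for the actual provider clients.
interface TtsProvider {
  synthesize(script: string): Promise<Uint8Array>; // returns audio bytes
}

interface LipSyncProvider {
  sync(baseVideoPath: string, audio: Uint8Array): Promise<string>; // returns synced video path
}

async function generateAnchorSegment(
  script: string,
  tts: TtsProvider,         // e.g. elevenlabs
  lipSync: LipSyncProvider, // e.g. latentsync on Modal
  baseVideoPath = "/assets/videos/base.mp4",
): Promise<string> {
  const voice = await tts.synthesize(script);
  return lipSync.sync(baseVideoPath, voice);
}
```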
Stream Agent
The stream agent takes in the outputs of both the stream renderer and anchor generator agents and orchestrates the final output to be broadcast to streaming services. Based on the readiness of the generated videos, it filters the ffmpeg outputs to display a combination of continuously generated frames (e.g. price tickers, trending news, token launches) and anchor-generated videos (e.g. news segments, market commentary).
The stream agent is responsible for populating ffmpeg with the correct inputs and outputs, and for the final send-off of frames to streaming services. A simplified graphical representation of the stream agent is shown below:
```
Input 1 (Financial Data) ──┐
                           ├── [Overlay Filter] ───┐
Input 2 (News MP4) ────────┘                       │
                                                   ├── Output Stream
Background Audio ──────────┐                       │
                           ├── [Audio Mix Filter] ─┘
News Audio ────────────────┘
```
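One possible ffmpeg invocation matching the graph above (the input paths and RTMP endpoint are placeholders, not idTV defaults) would be:

```typescript
import { spawn } from "node:child_process";

// Overlay the news MP4 on the financial-data frames and mix the two
// audio tracks, then push the result to an RTMP endpoint.
const args = [
  "-i", "financial_frames.mp4", // Input 1 (financial data frames)
  "-i", "news_segment.mp4",     // Input 2 (anchor news MP4)
  "-i", "background.mp3",       // background audio
  "-i", "news_audio.mp3",       // news audio
  "-filter_complex",
  "[0:v][1:v]overlay=x=0:y=0[vout];[2:a][3:a]amix=inputs=2[aout]",
  "-map", "[vout]",
  "-map", "[aout]",
  "-c:v", "libx264",
  "-b:v", "4500k",
  "-c:a", "aac",
  "-b:a", "160k",
  "-f", "flv",
  "rtmp://live.example.com/app/streamkey",
];

spawn("ffmpeg", args, { stdio: "inherit" });
```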
The schema for the ffmpeg inputs looks like the following:
```typescript
interface StreamConfiguration {
  // Core settings
  streamSettings: {
    distributor: string;
    videoBitrate: number;
    audioBitrate: number;
    codec: string;
  };
  // Filter Graph Configuration
  filterGraphs: {
    maxGraphs: number;    // Maximum allowed filter graphs
    activeGraphs: number; // Currently active graphs
    defaultGraph: string; // Default graph to use
    graphs: {
      [key: string]: {
        video: {
          filter: string; // FFmpeg filter string
          parameters: {   // Customizable parameters
            position?: string;
            blend?: string;
            timing?: number;
          };
        };
        audio: {
          filter: string;
          parameters: {
            mix?: string;
            fade?: number;
          };
        };
        metadata: {
          description: string;
          triggers: string[];
          dependencies?: string[];
        };
      };
    };
  };
  // Input/Output Configuration
  streams: {
    inputs: {
      primary: string;      // Main video source
      secondary?: string[]; // Additional sources
      audio: string[];      // Audio sources
    };
    outputs: string[]; // Output endpoints
  };
}
```
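For concreteness, here is a hypothetical configuration conforming to the schema above; every value (endpoints, file paths, graph name, triggers) is a placeholder rather than a real idTV default:

```typescript
// Example StreamConfiguration with illustrative values only.
const config: StreamConfiguration = {
  streamSettings: {
    distributor: "rtmp://live.example.com/app",
    videoBitrate: 4500,
    audioBitrate: 160,
    codec: "libx264",
  },
  filterGraphs: {
    maxGraphs: 4,
    activeGraphs: 1,
    defaultGraph: "newsOverlay",
    graphs: {
      newsOverlay: {
        video: {
          filter: "[0:v][1:v]overlay=x=0:y=0[vout]",
          parameters: { position: "top-left", timing: 5 },
        },
        audio: {
          filter: "[2:a][3:a]amix=inputs=2[aout]",
          parameters: { fade: 2 },
        },
        metadata: {
          description: "Overlay a news segment on the data feed",
          triggers: ["news_segment_ready"],
        },
      },
    },
  },
  streams: {
    inputs: {
      primary: "financial_frames.mp4",
      secondary: ["news_segment.mp4"],
      audio: ["background.mp3", "news_audio.mp3"],
    },
    outputs: ["rtmp://live.example.com/app/streamkey"],
  },
};
```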