Agents
Agents are the brains of idTV. They are the models that generate the content and, in some cases, orchestrate the subagents that make our broadcast work. An agent can be thought of as the combination of a single model and a single objective. Once given an objective, an agent may request the use of other agents through the idTV library; in these cases, the results of those calls are recursively fed back to the agent, and execution continues until the objective is complete.
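As a rough sketch of that recursive loop, assuming a hypothetical `Agent` shape and `runAgent` helper (the actual idTV API may differ):

```typescript
// A minimal sketch of the recursive execution loop described above.
// The Agent/AgentStep shapes and runAgent helper are illustrative
// assumptions, not the actual idTV API.
interface AgentStep {
  done: boolean;        // objective satisfied?
  output: string;       // the model's response for this step
  subRequests: Agent[]; // other agents requested through the library
}

interface Agent {
  model: string;     // a single model...
  objective: string; // ...paired with a single objective
  step(context: string[]): Promise<AgentStep>;
}

async function runAgent(agent: Agent, context: string[] = []): Promise<string> {
  const result = await agent.step(context);
  if (result.done) return result.output;
  // Run any requested sub-agents, then recursively feed their results
  // back to the requesting agent until the objective is complete.
  const subResults = await Promise.all(result.subRequests.map((a) => runAgent(a)));
  return runAgent(agent, [...context, result.output, ...subResults]);
}
```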
Agent Types
Not all agents are created equal. We employ a variety of agents to handle different tasks. Below are some of the agents we use, with a brief description of their purpose as well as how they may be initialized and called:
Data Processors
The data processing agent is the agent we employ most often; it generates structured data from often-unstructured sources such as the X newsfeed or parsed article data.
Setting | Value | Description |
---|---|---|
provider | anthropic | The large language model provider used by the agent to generate responses. |
embedder | default | The embedder used by the agent to generate embeddings. |
AI_MODEL | claude-3-opus-20240229 | The large language model used by the agent with the selected provider to generate responses. |
AI_TEMPERATURE | 0.7 | The temperature used by the agent to generate responses. |
MAX_TOKENS | 4000 | The maximum number of tokens used by the agent to generate responses. |
WEBSEARCH_TIMEOUT | 0 | The timeout, in seconds, used as a deadline after which the agent stops retrying a web search. |
WAIT_BETWEEN_REQUESTS | 1 | The number of seconds to wait between requests to the LLM provider. |
WAIT_AFTER_FAILURE | 3 | The number of seconds to wait after a failed request before retrying the LLM provider. |
stream | false | Whether or not to stream the response from the LLM provider. |
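Putting the table together, initialization might look like the following sketch, where `createAgent` is a stand-in for the actual idTV entry point:

```typescript
// Hypothetical initialization using the settings from the table above;
// createAgent is illustrative, not the real idTV API.
declare function createAgent(settings: Record<string, string | number | boolean>): unknown;

const dataProcessor = createAgent({
  provider: "anthropic",
  embedder: "default",
  AI_MODEL: "claude-3-opus-20240229",
  AI_TEMPERATURE: 0.7,
  MAX_TOKENS: 4000,
  WEBSEARCH_TIMEOUT: 0,
  WAIT_BETWEEN_REQUESTS: 1,
  WAIT_AFTER_FAILURE: 3,
  stream: false,
});
```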
Coordinators
The coordinator agent (the content generator) takes in data from various data processors, as well as fixed API requests, and prioritizes and ranks the importance of the information. Many of its parameters are similar to the data processor agent's, with a few exceptions and additions for long-term memory and retrieval.
Setting | Value | Description |
---|---|---|
USE_CACHED_DATA | true | Whether or not to use cached short-term memory; useful, for example, when a token has undergone significant price movement. |
LONG_TERM_RETRIEVAL | true | Whether or not to retrieve long-term memories from vector storage; useful, for example, when a news article is pertinent to an old event. |
LONG_TERM_STORAGE | true | Whether or not to write new memories to vector storage for long-term retention. |
LONG_TERM_MEMORY_PROVIDER | pgvector | The vector storage provider used for long-term memory. |
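As an illustration of the retrieval side of these settings, a pgvector lookup could look like the sketch below; the table and column names are assumptions, not the actual idTV schema:

```typescript
import { Pool } from "pg";

// Sketch of LONG_TERM_RETRIEVAL over pgvector. The long_term_memory
// table and its columns are assumed for illustration.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

async function recallRelatedMemories(queryEmbedding: number[], k = 5) {
  // pgvector's `<->` operator orders rows by distance to the query
  // vector, so the k closest long-term memories come back first.
  const { rows } = await pool.query(
    `SELECT content, created_at
       FROM long_term_memory
      ORDER BY embedding <-> $1::vector
      LIMIT $2`,
    [`[${queryEmbedding.join(",")}]`, k],
  );
  return rows;
}
```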
Stream Renderer
The stream renderer agent takes in the content generated by the coordinator and renders it to the screen, using cv2 to generate video frames. Based on the priority of the content, it is rendered to specific sections of the screen and may trigger alerts.
Setting | Value | Description |
---|---|---|
GENERATE_BACKGROUND | false | Whether or not to generate a background image for the stream. Otherwise the stream defaults to bg.png. |
IMAGE_PROVIDER | stable_diffusion | The image provider used to generate a background image. |
IMAGE_PROMPT | Cityscape view in the style of an 80s retro film | The prompt used to generate a background image. |
temperature | 0.8 | The temperature used by the image provider. |
NUMBER_OF_SECTIONS | 3 | The number of sections to render to the screen. |
SECTION_COLORS | #FF4B4B, #4B9EFF, #4BFF7A | The colors used by the sections. |
SECTION_POSITIONS | top-left, top-right, scanner | The positions of the sections on the screen. |
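To illustrate how these section settings might be consumed, here is a sketch that routes content by priority; the thresholds and routing rule are assumptions for illustration, not the renderer's actual logic:

```typescript
// Section layout mirroring the table above; the routing thresholds
// are assumed for illustration.
type SectionPosition = "top-left" | "top-right" | "scanner";

interface Section {
  position: SectionPosition;
  color: string;
}

const sections: Section[] = [
  { position: "top-left", color: "#FF4B4B" },
  { position: "top-right", color: "#4B9EFF" },
  { position: "scanner", color: "#4BFF7A" },
];

function routeContent(priority: number): Section {
  // Highest-priority items take the top-left slot; low-priority items
  // scroll through the bottom "scanner" ticker.
  if (priority >= 8) return sections[0];
  if (priority >= 4) return sections[1];
  return sections[2];
}
```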
Anchor Generator
The anchor generator agent takes in the content generated by the coordinator and renders it to the screen. Based on the priority of the content, it is rendered to specific sections of the screen and may trigger alerts. The agent is also capable of taking in a generated video and overlaying the content on top of it.
Setting | Value | Description |
---|---|---|
BASE_VIDEO_PATH | /assets/videos/base.mp4 | The path to the base video of the anchor used for the broadcast. |
INTRO_LENGTH | 20 | The length of the intro video in seconds. If not provided, the agent falls back to the provided intro.mp4. |
OUTRO_LENGTH | 10 | The length of the outro video in seconds. If not provided, the agent falls back to the provided outro.mp4. |
TTS_PROVIDER | elevenlabs | The text to speech provider used to generate the voice of the anchor. |
LIP_SYNC_PROVIDER | latentsync | The lip sync provider used to generate the lip sync of the anchor. We are currently using a local LatentSync model hosted on a Modal cluster. |
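A high-level sketch of that pipeline is shown below; the two provider interfaces are assumptions standing in for the real ElevenLabs and Modal-hosted LatentSync clients:

```typescript
// Sketch of the anchor generation pipeline: script -> TTS audio ->
// lip-synced video over the base anchor footage. Interfaces are
// illustrative stand-ins for the actual provider clients.
interface TtsProvider {
  synthesize(script: string): Promise<Uint8Array>; // returns audio bytes
}

interface LipSyncProvider {
  sync(baseVideoPath: string, audio: Uint8Array): Promise<string>; // returns synced video path
}

async function generateAnchorSegment(
  script: string,
  tts: TtsProvider,         // e.g. elevenlabs
  lipSync: LipSyncProvider, // e.g. latentsync on Modal
  baseVideoPath = "/assets/videos/base.mp4",
): Promise<string> {
  const voice = await tts.synthesize(script);
  return lipSync.sync(baseVideoPath, voice);
}
```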
Stream Agent
The stream agent takes in the outputs of both the stream renderer and anchor generator agents and orchestrates the final output to be broadcast to streaming services. Based on the readiness of the generated videos, it filters the ffmpeg outputs to display a combination of continuously generated frames (e.g. price tickers, trending news, token launches) and anchor-generated videos (e.g. news segments, market commentary).
The stream agent is responsible for populating ffmpeg with the correct inputs and outputs, and for the final send-off of frames to streaming services. A simplified graphical representation of the stream agent is shown below:
```
Input 1 (Financial Data) ──┐
                           ├── [Overlay Filter] ───┐
Input 2 (News MP4) ────────┘                       │
                                                   ├── Output Stream
Background Audio ──────────┐                       │
                           ├── [Audio Mix Filter] ─┘
News Audio ────────────────┘
```
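One possible ffmpeg invocation matching the graph above (the input paths and RTMP endpoint are placeholders, not idTV defaults) would be:

```typescript
import { spawn } from "node:child_process";

// Overlay the news MP4 on the financial-data frames and mix the two
// audio tracks, then push the result to an RTMP endpoint.
const args = [
  "-i", "financial_frames.mp4", // Input 1 (financial data frames)
  "-i", "news_segment.mp4",     // Input 2 (anchor news MP4)
  "-i", "background.mp3",       // background audio
  "-i", "news_audio.mp3",       // news audio
  "-filter_complex",
  "[0:v][1:v]overlay=x=0:y=0[vout];[2:a][3:a]amix=inputs=2[aout]",
  "-map", "[vout]",
  "-map", "[aout]",
  "-c:v", "libx264",
  "-b:v", "4500k",
  "-c:a", "aac",
  "-b:a", "160k",
  "-f", "flv",
  "rtmp://live.example.com/app/streamkey",
];

spawn("ffmpeg", args, { stdio: "inherit" });
```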
The schema for the ffmpeg inputs looks like the following:
```typescript
interface StreamConfiguration {
  // Core settings
  streamSettings: {
    distributor: string;
    videoBitrate: number;
    audioBitrate: number;
    codec: string;
  };
  // Filter Graph Configuration
  filterGraphs: {
    maxGraphs: number;    // Maximum allowed filter graphs
    activeGraphs: number; // Currently active graphs
    defaultGraph: string; // Default graph to use
    graphs: {
      [key: string]: {
        video: {
          filter: string; // FFmpeg filter string
          parameters: {   // Customizable parameters
            position?: string;
            blend?: string;
            timing?: number;
          };
        };
        audio: {
          filter: string;
          parameters: {
            mix?: string;
            fade?: number;
          };
        };
        metadata: {
          description: string;
          triggers: string[];
          dependencies?: string[];
        };
      };
    };
  };
  // Input/Output Configuration
  streams: {
    inputs: {
      primary: string;      // Main video source
      secondary?: string[]; // Additional sources
      audio: string[];      // Audio sources
    };
    outputs: string[]; // Output endpoints
  };
}
```
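For concreteness, here is a hypothetical configuration conforming to the schema above; every value (endpoints, file paths, graph name, triggers) is a placeholder rather than a real idTV default:

```typescript
// Example StreamConfiguration with illustrative values only.
const config: StreamConfiguration = {
  streamSettings: {
    distributor: "rtmp://live.example.com/app",
    videoBitrate: 4500,
    audioBitrate: 160,
    codec: "libx264",
  },
  filterGraphs: {
    maxGraphs: 4,
    activeGraphs: 1,
    defaultGraph: "newsOverlay",
    graphs: {
      newsOverlay: {
        video: {
          filter: "[0:v][1:v]overlay=x=0:y=0[vout]",
          parameters: { position: "top-left", timing: 5 },
        },
        audio: {
          filter: "[2:a][3:a]amix=inputs=2[aout]",
          parameters: { fade: 2 },
        },
        metadata: {
          description: "Overlay a news segment on the data feed",
          triggers: ["news_segment_ready"],
        },
      },
    },
  },
  streams: {
    inputs: {
      primary: "financial_frames.mp4",
      secondary: ["news_segment.mp4"],
      audio: ["background.mp3", "news_audio.mp3"],
    },
    outputs: ["rtmp://live.example.com/app/streamkey"],
  },
};
```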