Runway Characters lands as the first commercially deployed real-time conversational video agent that crosses the live-meeting threshold from a single image. It runs at 24 frames per second in HD with 1.75 seconds of end-to-end latency, joins Zoom, Google Meet, and Microsoft Teams, and ships with voice cloning and tool calling. The closest competitors on the market today are HeyGen Interactive Avatar, Synthesia Express avatars, D-ID Agents, and Tavus, each of which has a different cost structure, a different definition of real-time, and a different lock-in around input format. This deep dive compares all five across the seven decisions a creator actually has to make before they commit a workflow.
Background
On May 4, 2026, Runway published the engineering details of Runway Characters on its news blog, alongside an open product page at app.runwayml.com/characters. The product is built on GWM-1, Runway's general world model, and is available through the web and mobile apps and the developer endpoint at docs.dev.runwayml.com/api. Runway is the same lab behind Gen-4 video, Aleph, and the recent Seedance 2.0 API rollout, so Characters is not a side project. It is the conversational layer slotted on top of an existing video generation stack.
The competing stack is older but uneven. HeyGen has been shipping interactive avatars since 2024 and launched a developer platform earlier this year, which we covered in our HeyGen CLI deep dive. Synthesia ships Express avatars and Personal avatars for enterprise video production, with an interactive avatar tier in beta. D-ID Agents and Tavus both target conversational use cases first, with very different latency profiles. The open-weights real-time pipeline behind LPM 1.0 is also in the picture for teams that prefer to self-host. Five different bets on the same surface.
Deep Analysis
Input format and fine-tuning friction
Runway Characters takes a single reference image and extracts style directly. There is no fine-tuning step, and the image can be photorealistic, an illustrated mascot, or a fantasy creature. HeyGen Interactive Avatar requires either a stock avatar from its library or a custom avatar built from a 2 to 5 minute consent video, which then takes hours to process. Synthesia Express works from a 1 to 2 minute script-driven recording and Personal avatars require studio-quality footage. D-ID Agents accept a single still photograph, similar to Runway, but the output is more rigid in the head pose dimension. Tavus uses its Phoenix model and requires a 2-minute training video for a personal replica.
For a creator who wants to spin up a custom mascot for a launch in one afternoon, this is the single biggest dividing line. Runway Characters and D-ID can do it the same day from one image. HeyGen, Synthesia, and Tavus all require a recording session and a processing queue.
Latency and the live-meeting threshold
Runway reports 1.75 seconds of end-to-end latency from when the user stops speaking until the character starts responding, with 37 milliseconds of model time per frame and a 24 fps HD output. The server-side breakdown is 1,185 milliseconds of voice agent work plus 567 milliseconds of video pipeline work, which sums to 1,752 milliseconds. That is the fastest published number for HD output across the five products, and the first HD figure to clear the threshold where a conversation feels like a normal video call rather than a turn-based exchange.
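The arithmetic behind those figures can be checked in a few lines. This is a minimal sketch that uses only the numbers Runway reports above; the constant and function names are illustrative, and the idea that per-frame model time must fit inside the frame interval is our reading of the figures, not something Runway states.

```python
# Figures reported by Runway in the text above.
VOICE_AGENT_MS = 1185      # server-side voice agent work
VIDEO_PIPELINE_MS = 567    # server-side video pipeline work
MODEL_MS_PER_FRAME = 37    # model time per generated frame
TARGET_FPS = 24            # HD output frame rate


def end_to_end_ms(voice_ms: int, video_ms: int) -> int:
    """Sum the two server-side stages into one end-to-end figure."""
    return voice_ms + video_ms


def frame_budget_ms(fps: int) -> float:
    """Wall-clock time available per frame at a given frame rate."""
    return 1000.0 / fps


total_ms = end_to_end_ms(VOICE_AGENT_MS, VIDEO_PIPELINE_MS)
budget_ms = frame_budget_ms(TARGET_FPS)
# Assumption: frames are produced one after another, so model time
# per frame has to stay under the frame interval to hold 24 fps.
headroom_ms = budget_ms - MODEL_MS_PER_FRAME

print(f"end-to-end: {total_ms} ms ({total_ms / 1000:.2f} s)")
print(f"frame budget at {TARGET_FPS} fps: {budget_ms:.1f} ms")
print(f"headroom per frame: {headroom_ms:.1f} ms")
```

The two stages add up to 1,752 ms, which matches the 1.75 second headline, and 37 ms of model time leaves a little under 5 ms of slack inside the roughly 41.7 ms frame interval at 24 fps.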
HeyGen Interactive Avatar reports response times in the 2 to 4 second range depending on language and session load. D-ID Agents currently publish around 2 seconds of end-to-end latency for English. Tavus advertises sub-1 second response time for its Conversational Video Interface but caps output at standard definition for that tier. Synthesia interactive avatars are not yet in the same latency class because Synthesia's core surface is script-driven prerecorded video.
The practical implication is meeting integration. A character that takes more than two seconds to respond pulls a live Zoom or Meet conversation out of flow. Below two seconds the back-and-forth survives. Runway is the only product on this list that publishes its full latency budget at 24 fps HD and lists Zoom, Google Meet, and Microsoft Teams as supported surfaces.
Voice cloning and voice design
Runway Characters offers instant voice cloning from a short audio sample and a text-to-voice design path for creators who do not have a reference voice. The voice stays consistent across sessions inside the same Character. HeyGen integrates ElevenLabs and Azure voices and supports voice cloning through a separate flow. Synthesia voices are sourced from its in-house voice library with cloning available for enterprise tiers. D-ID Agents use ElevenLabs and PlayHT voices through partner APIs. Tavus uses its own voice model tuned for conversational rhythm and offers a cloning path with a consent recording.
The choice here often comes down to whether a creator wants the voice and the face on one bill or wants to keep voice flexibility through ElevenLabs or Inworld TTS-2. Runway and Tavus collapse it to one bill. HeyGen and D-ID keep the voice marketplace open. Synthesia leans toward its closed stack.
Meeting integration and embeddable surfaces
Runway Characters ships with a single-line embed widget plus first-party hooks into Zoom, Google Meet, and Microsoft Teams. The character can also see what the user is sharing through optional webcam and screen-share inputs, which means it can react to a slide deck or a code editor live. HeyGen Interactive Avatar has an SDK and an embeddable widget but does not yet list direct Zoom or Teams integration. D-ID Agents publish an iframe widget and a chat SDK and have a Zoom App in their marketplace. Tavus exposes a Conversational Video Interface widget and SDKs in JavaScript and Python.
For a creator who wants the character to show up as a guest in a recurring meeting, Runway is the only one with all three major meeting platforms supported on day one. For an embedded helper inside a SaaS product, D-ID and Tavus have the most mature widget tooling. HeyGen sits in the middle and is the strongest option for video-first agencies that already use HeyGen Studio.
Tool calling, knowledge bases, and live reactions
Runway Characters supports tool calling so a character can trigger backend functions during a conversation, plus a knowledge base attachment for text and Markdown documents that scope what the character knows. This is the same pattern that text-only agent frameworks have used for two years, now wrapped in a live video face. HeyGen Streaming Avatar exposes a similar tool-calling layer through its API. D-ID Agents support knowledge bases natively and tool calls through their API. Tavus exposes function calling and a knowledge base on its Conversational Video Interface. Synthesia is the outlier because its primary surface is still asynchronous video production rather than agentic conversation.
For creators who plan to wire a character into a real product such as a booking flow or a support backend, this is table stakes. Runway hits it. So do HeyGen, D-ID, and Tavus. Synthesia does not, at least not yet.
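For readers who have not wired up tool calling before, the general pattern looks roughly like the sketch below. This is a generic illustration, not Runway's, HeyGen's, D-ID's, or Tavus's actual API: the request shape (`name` plus `arguments`), the `tool` decorator, `handle_tool_call`, and the `book_demo` function are all hypothetical names invented for this example.

```python
import json
from typing import Any, Callable, Dict

ToolFn = Callable[..., Any]

# Hypothetical registry mapping tool names to backend functions.
TOOLS: Dict[str, ToolFn] = {}


def tool(name: str) -> Callable[[ToolFn], ToolFn]:
    """Register a backend function under a name the agent can call."""
    def register(fn: ToolFn) -> ToolFn:
        TOOLS[name] = fn
        return fn
    return register


@tool("book_demo")
def book_demo(email: str, slot: str) -> Dict[str, str]:
    # Placeholder: a real implementation would call a booking service.
    return {"status": "booked", "email": email, "slot": slot}


def handle_tool_call(raw_request: str) -> str:
    """Parse a JSON tool call, run the matching function, return JSON."""
    request = json.loads(raw_request)
    fn = TOOLS.get(request["name"])
    if fn is None:
        return json.dumps({"error": f"unknown tool: {request['name']}"})
    result = fn(**request.get("arguments", {}))
    return json.dumps(result)


# Assumed request shape; each vendor defines its own payload format.
reply = handle_tool_call(json.dumps({
    "name": "book_demo",
    "arguments": {"email": "a@example.com", "slot": "2026-05-11T15:00"},
}))
print(reply)
```

The point of the registry is that the conversational layer only ever sees tool names and JSON, so the same backend functions could in principle sit behind any of the four products that support tool calling, with a thin adapter for each vendor's payload format.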
Pricing and per-session cost
Runway has not yet published a per-minute price for Characters. The product is bundled with existing Runway plans and is available through the API with rate limits the company calls out in its docs but does not pin to a public per-minute number. Creators on free and Standard plans can run sessions to evaluate the system. HeyGen Interactive Avatar pricing starts at about 2 to 4 dollars per minute of streamed video depending on plan. Synthesia interactive avatars are part of enterprise contracts only. D-ID Agents start around 0.30 dollars per minute on developer plans. Tavus runs roughly 1 to 2 dollars per minute on its conversational tier.
This is the largest unknown for any creator who plans to run a character at scale. A 60-second product demo embedded on a high-traffic page can ring up real cost very quickly at HeyGen or Tavus pricing. D-ID is the cheapest on a strict per-minute basis. Until Runway publishes a public per-minute rate, treat the Characters price as the open variable in any ROI math and budget against the closest analog, which today is HeyGen.
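To make the scale effect concrete, here is a minimal cost sketch. The per-minute rates are the approximate figures quoted above; the traffic number (10,000 sessions per month) is a hypothetical input chosen for illustration, and the `RateRange` and `monthly_cost` names are our own.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RateRange:
    low: float   # USD per streamed minute, low end
    high: float  # USD per streamed minute, high end


# Approximate published rates quoted in this article.
RATES: Dict[str, RateRange] = {
    "HeyGen Interactive Avatar": RateRange(2.00, 4.00),
    "D-ID Agents": RateRange(0.30, 0.30),
    "Tavus": RateRange(1.00, 2.00),
}


def monthly_cost(rate: RateRange, sessions: int,
                 seconds_per_session: float) -> Tuple[float, float]:
    """Return the (low, high) monthly cost in USD for a traffic level."""
    minutes = sessions * seconds_per_session / 60.0
    return (rate.low * minutes, rate.high * minutes)


SESSIONS_PER_MONTH = 10_000  # hypothetical traffic, not a measured figure
SECONDS_PER_SESSION = 60     # the 60-second demo from the text

for name, rate in RATES.items():
    low, high = monthly_cost(rate, SESSIONS_PER_MONTH, SECONDS_PER_SESSION)
    print(f"{name}: ${low:,.0f} to ${high:,.0f} per month")
```

With these hypothetical inputs the loop comes out at roughly $20,000 to $40,000 per month for HeyGen, $10,000 to $20,000 for Tavus, and $3,000 for D-ID. Following the planning advice above, a Runway estimate would reuse the HeyGen row until a public rate exists.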
Output ceiling and model lineage
Runway Characters runs on GWM-1, the same world-model lineage powering Gen-4 video and Aleph. That matters because the ceiling on lip sync quality, expression range, and head motion is tied to the foundation model, not to a separate avatar pipeline. HeyGen runs a proprietary avatar model that has been iterated for two years at subscription scale. Synthesia is the most mature pure avatar pipeline on the market, with multi-year refinement of dubbing and lip sync. D-ID's Live Portrait and Express models are tuned for low-latency talking-head output from a single image. Tavus Phoenix targets conversational rhythm specifically.
The takeaway is that the foundation-model bet matters more than it used to. A general world model can keep absorbing video improvements from the rest of the lab. A pure avatar pipeline is bounded by the avatar roadmap of its parent product.
Comparison Table
| Capability | Runway Characters | HeyGen Interactive | Synthesia Express | D-ID Agents | Tavus CVI |
|---|---|---|---|---|---|
| Input format | 1 image | 2-5 min consent video | 1-2 min recording | 1 image | 2 min training video |
| Fine-tuning required | None | Hours of processing | Hours of processing | None | Hours of processing |
| End-to-end latency | 1.75 s | 2-4 s | Async (script) | ~2 s | Sub-1 s (SD tier) |
| Frame rate (HD) | 24 fps | 30 fps | 30 fps | 25 fps | 24 fps |
| Voice cloning | Instant, native | ElevenLabs partner | Enterprise only | ElevenLabs partner | Native + consent clone |
| Zoom/Meet/Teams | All three native | Widget only | No | Zoom marketplace | Widget only |
| Tool calling | Yes | Yes | No | Yes | Yes |
| Embed widget | One-line embed | SDK + iframe | No | iframe + chat SDK | JS + Python SDKs |
| Foundation lineage | GWM-1 world model | Avatar-specific | Avatar-specific | Live Portrait | Phoenix conversational |
| Per-minute pricing | Bundled / TBA | ~$2-4/min | Enterprise contract | ~$0.30/min | ~$1-2/min |
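The table can also be treated as data and filtered against a list of requirements. The sketch below encodes four of the rows as booleans and counts; the field names and the `shortlist` helper are hypothetical, and the values are transcribed from the table above rather than from any vendor API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Product:
    name: str
    single_image_input: bool        # "Input format" row
    needs_processing: bool          # "Fine-tuning required" row
    tool_calling: bool              # "Tool calling" row
    native_meeting_platforms: int   # count from the "Zoom/Meet/Teams" row


PRODUCTS = [
    Product("Runway Characters", True, False, True, 3),
    Product("HeyGen Interactive", False, True, True, 0),
    Product("Synthesia Express", False, True, False, 0),
    Product("D-ID Agents", True, False, True, 1),
    Product("Tavus CVI", False, True, True, 0),
]


def shortlist(products: List[Product], *, same_day_setup: bool = False,
              tool_calling: bool = False,
              min_meeting_platforms: int = 0) -> List[str]:
    """Return the names of products that meet every requested constraint."""
    picks = []
    for p in products:
        same_day = p.single_image_input and not p.needs_processing
        if same_day_setup and not same_day:
            continue
        if tool_calling and not p.tool_calling:
            continue
        if p.native_meeting_platforms < min_meeting_platforms:
            continue
        picks.append(p.name)
    return picks


print(shortlist(PRODUCTS, same_day_setup=True))
print(shortlist(PRODUCTS, tool_calling=True, min_meeting_platforms=3))
```

Asking for same-day setup leaves Runway Characters and D-ID Agents, and requiring all three meeting platforms leaves Runway alone, which matches the prose analysis. Rows with ranges or soft values, such as latency and pricing, are left out here because they do not reduce cleanly to a yes or no.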
When Each One Wins
Pick Runway Characters if you want a character that joins live Zoom, Google Meet, or Microsoft Teams calls from a single image, with the lowest published HD latency and a foundation model that will keep improving alongside the rest of the Runway video stack. Strongest for creators who want to spin up a launch mascot, an on-screen cohost, or a branded studio agent in a single afternoon.
Pick HeyGen Interactive Avatar if you already use HeyGen Studio for produced video and want one vendor across asynchronous video and live agents. Strongest for agencies and product teams whose video pipeline is already standardized on HeyGen.
Pick Synthesia Express if your primary use case is script-driven enterprise video and your live conversational need is secondary. Strongest for L&D teams that need translated, brand-consistent talking-head content at scale.
Pick D-ID Agents if per-minute cost is the binding constraint and you can accept a slightly lower lip-sync ceiling. Strongest for support agents embedded on high-traffic pages where the math has to work at scale.
Pick Tavus if your product is a one-on-one consultative experience such as a recruiter screen, a tutor, or a sales discovery call, and you want sub-1 second latency at the cost of HD resolution. Strongest for B2B SaaS surfaces with deep function-calling integration.
Impact on Creators
The market has been split into two tiers for the last 18 months. Tier one was asynchronous talking-head video led by Synthesia and HeyGen Studio. Tier two was experimental real-time avatars from D-ID, Tavus, and a long tail of research demos. Runway Characters collapses the split. The same lab that ships state-of-the-art video generation now ships the live talking-face surface on top of it, and the latency budget is finally below the threshold where a creator can defend a character on a live call.
That moves the bottleneck from technology to product. The question is no longer whether the lip sync is good enough or whether the latency is tolerable. The question is what kind of character a creator wants on screen, what knowledge it should carry, and what action it should be able to take through tool calls. That is a creative direction problem more than an engineering problem.
For solo creators and small studios, this is the first time it is reasonable to plan a product launch around a custom on-screen agent with a four-week timeline. For larger teams, the comparison work above is the buy decision. Run the same script through Runway Characters and through the closest competitor on the list, measure both results against the creative direction, and pick.
Key Takeaways
- Runway Characters is the first conversational video product to publish a 1.75-second end-to-end latency figure at 24 fps HD from a single image input.
- The five-product comparison breaks along input format, latency, voice path, meeting integration, and per-minute cost.
- Runway is the only product on the list with native Zoom, Google Meet, and Microsoft Teams support on day one.
- D-ID remains the cheapest per minute. Runway has not published per-minute pricing yet, which is the single largest unknown for high-volume use cases.
- The foundation model lineage matters more than it used to because conversational video now inherits improvements from the parent video stack rather than from an isolated avatar pipeline.
What to Watch
The first thing to watch is Runway's per-minute pricing announcement. Until that lands, scaling decisions stay theoretical. The second is whether HeyGen, Synthesia, and Tavus respond with a foundation-model rewrite of their avatar layers. The third is whether the open-weights pipeline behind LPM 1.0 catches up on latency at 24 fps HD, because a self-hosted real-time character would change the cost equation for any team with GPU access. The fourth is regulatory. Real-time face cloning of public figures from a single image will draw the same authenticity guardrail debate that hit synthetic performers on the film side this spring. Expect platform-level controls, watermarking requirements, and consent verification flows to land before this surface goes mainstream.
FAQ
Can Runway Characters use a photo of a real person without their permission?
Runway's terms restrict use of images of real people without consent. Practically, that means a creator should only build a character from a photo they own or have permission to use. Brands should also obtain written consent before turning a real spokesperson into a live agent.
Is Runway Characters available on the free plan?
Yes. Free and trial users can run sessions to evaluate the system. The exact session minute allowance has not been published but the product is gated by per-session usage rather than by plan tier.
What is the difference between Runway Characters and HeyGen Interactive Avatar?
The biggest difference is input format. Runway works from one image and skips the fine-tuning step. HeyGen requires a consent video. The second difference is latency. Runway publishes the lowest HD number on the market today. The third is meeting integration. Runway natively supports Zoom, Google Meet, and Teams. HeyGen does not yet.
Does Runway Characters work for cartoon and fantasy characters?
Yes. The reference image can be a photorealistic person, an illustrated mascot, or a fantasy creature. Style extraction happens directly from the input image, and Runway reports natural lip sync, facial expressions, and head motion across all three input styles.
How does pricing actually compare across the five products?
D-ID is currently the cheapest at roughly 0.30 dollars per minute on developer plans. Tavus runs about 1 to 2 dollars per minute. HeyGen runs roughly 2 to 4 dollars per minute. Synthesia is enterprise-only. Runway has not yet published a per-minute rate for Characters, so the safest planning assumption is to model against HeyGen pricing until the rate lands.