Sesame iOS Preview: 4 Voice Agents, Real-Time Search
Voice AI lab Sesame opened the iOS preview of its mobile app on May 27, 2026, shipping four distinct voice agents with real-time web search and visual information cards.
MERIT is a new AI framework that learns to separate the melody, rhythm, and timbre components of music similarity. Unlike existing tools that treat a song as a single fingerprint, it lets you ask: do these two tracks share the same groove?
Audio.Observer is a new streaming platform that turns daily news headlines into AI-generated songs. It launched on May 19, 2026, and publishes new tracks every day across tech, markets, world events, and culture.
Parakeet TDT v3, NVIDIA's fast transcription model, silently translates French speech into English, with English intrusion reaching up to 31% on spontaneous recordings.
ElevenLabs shipped a substantial May 13 update: GPT-5.4-mini and nano options in Voice Agents, document-level RAG for knowledge bases, and workspace-level API analytics.
Inworld released Realtime TTS-2 on May 5 with sub-200ms latency, plain-English direction, and one voice across 100+ languages, ranking above ElevenLabs and Google on Speech Arena.
xAI pushed Grok 4.3 to the public API on April 30 and bundled Custom Voices, a one-minute voice cloning workflow, at no per-voice surcharge. The release consolidates text, voice, and creative agents on a single xAI bill.
The first week of April 2026 delivered three significant open-source audio AI models that directly challenge paid alternatives in voice synthesis, sound effects, and multilingual speech.
In one week, Suno v5.5 added voice cloning, Udio pivoted to a walled garden, Google launched Lyria 3 Pro for developers, and Spotify built artist protection tools, fracturing AI music into competing visions.