On May 14, 2026, a team of twelve researchers from Sony AI and collaborating institutions published Break-the-Beat!, a neural synthesizer that converts drum MIDI patterns into high-quality audio while matching the timbre of a reference drum recording. The system fills a gap in AI audio research: most symbolic-to-audio tools handle single-instrument synthesis, leaving polyphonic percussion largely unsolved.
What Happened
The Break-the-Beat! team, led by Shuyang Cui and including Zachary Novack, Woosung Choi, and their co-authors, fine-tuned a pre-trained text-to-audio foundation model using a custom content encoder and a hybrid conditioning mechanism. The system accepts two inputs: a drum MIDI sequence and a reference drum audio clip. It outputs rendered drum audio that follows the rhythmic structure of the MIDI while adopting the sonic character of the reference.
A live demo is available with dozens of generated samples spanning Speed Metal, Funk Rock, Live Fusion, and electronic drum kits. Tempos in the demo range from 75 to 158 BPM, covering production styles from hip-hop to metal.
Why It Matters for Music Producers
Every producer working with MIDI drums faces the same bottleneck: the sonic character of your drums is limited to whatever samples you own. Break-the-Beat! separates rhythmic structure from timbre. You define the groove via MIDI and the sound via any reference audio recording.
This has practical implications across multiple production contexts:
- Rapid prototyping: Evaluate a groove against multiple timbres without switching drum kits
- Vintage sound design: Use a vinyl drum recording as reference to synthesize modern MIDI patterns with analog character
- Consistent session sound: Define a reference once and generate all drum tracks with the same unified sonic identity
- Rhythm-only mode: Tap a beat pattern and render it immediately without building a full MIDI arrangement
The model also supports duration scaling from 0.5x to 2.0x while keeping timbre consistent. This addresses a common arrangement problem: adapting drum loops to different section lengths without pitch artifacts or awkward pattern repetition.
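To make the scaling arithmetic concrete, here is a minimal sketch. It assumes 4/4 time and that the multiplier scales the rendered length linearly; the function names are hypothetical and are not part of any Break-the-Beat! interface.

```python
def pattern_seconds(bars: int, bpm: float, beats_per_bar: int = 4) -> float:
    """Length of a pattern in seconds at a given tempo."""
    return bars * beats_per_bar * 60.0 / bpm


def scaled_seconds(base_seconds: float, multiplier: float) -> float:
    """Apply a duration multiplier within the 0.5x-2.0x range quoted above."""
    if not 0.5 <= multiplier <= 2.0:
        raise ValueError("multiplier must be between 0.5 and 2.0")
    return base_seconds * multiplier


base = pattern_seconds(bars=2, bpm=120)  # 2 bars of 4/4 at 120 BPM = 4.0 s
for m in (0.5, 1.0, 2.0):
    print(f"{m}x -> {scaled_seconds(base, m):.1f} s")
# 0.5x -> 2.0 s
# 1.0x -> 4.0 s
# 2.0x -> 8.0 s
```

A two-bar loop at 120 BPM can therefore cover anything from a one-bar fill to a four-bar section.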
How Break-the-Beat! Works

The model fine-tunes a pre-trained text-to-audio foundation model with two core components. A content encoder extracts rhythmic structure from the MIDI input. A hybrid conditioning mechanism combines the MIDI content with embeddings from the reference audio, ensuring both timing and timbre influence synthesis. Training used a newly curated dataset of paired target-reference drum audio samples built specifically for this task.
Unlike text-to-audio systems such as AudioLDM and Stable Audio, which generate audio from text prompts without precise rhythmic control, Break-the-Beat! is driven directly by MIDI, preserving exact timing relationships across the full polyphonic drum arrangement. Temporal resolution is adjustable at 16th-, 32nd-, or 64th-note precision, with 16th-note resolution sufficient for most production workflows.
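The effect of the resolution setting can be illustrated with a small grid-quantization sketch. This is not the model's content encoder, whose internals are not described here. It only shows how onset times map onto a 16th- or 64th-note grid, and the function names are hypothetical.

```python
def grid_step_seconds(bpm: float, division: int) -> float:
    """Duration of one grid step; division is 16, 32, or 64 steps per whole note."""
    if division not in (16, 32, 64):
        raise ValueError("division must be 16, 32, or 64")
    whole_note = 4 * 60.0 / bpm  # four quarter-note beats
    return whole_note / division


def quantize(onsets: list[float], bpm: float, division: int = 16) -> list[int]:
    """Map onset times in seconds to the nearest grid-step index."""
    step = grid_step_seconds(bpm, division)
    return [round(t / step) for t in onsets]


# Four hits at 120 BPM (one 16th note = 0.125 s); the 2nd and 3rd are 30 ms apart
hits = [0.0, 0.13, 0.16, 0.38]
print(quantize(hits, bpm=120, division=16))  # [0, 1, 1, 3]
print(quantize(hits, bpm=120, division=64))  # [0, 4, 5, 12]
```

On the 16th-note grid the two close hits collapse onto the same step, while the 64th-note grid keeps them apart. That is the kind of fine-grained timing where a higher resolution helps. For scale, a 64th-note step at 158 BPM lasts roughly 24 ms.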
Comparing Drum Synthesis Approaches

| Method | Input | Timbre Control | Rhythmic Precision | Status |
|---|---|---|---|---|
| Sample-based (Battery, Superior Drummer) | MIDI | Fixed library | Exact | Production-ready |
| Physical modeling (DrumSynth) | MIDI + parameters | Parametric | Exact | Production-ready |
| Text-to-audio (AudioLDM, Stable Audio) | Text prompt | Prompt-based | Low (no MIDI) | Production-ready |
| Break-the-Beat! | MIDI + reference audio | Timbre-matched reference | High (MIDI-driven) | Research demo |
Workflow: Integrating Break-the-Beat! Into Your DAW

Here is a step-by-step workflow for using Break-the-Beat! alongside any modern DAW:
- Build your drum pattern in MIDI. Lay out kick, snare, hi-hat, and toms in Ableton, Logic, FL Studio, or your DAW of choice. Export as a standard .mid file.
- Select your reference audio. Choose any drum recording with the timbre you want: a field recording, a sample from your library, or a drum loop whose sound you like but whose pattern does not fit your arrangement.
- Set your duration multiplier. Enter 1.0 for the original pattern length, 0.5 to halve it, or 2.0 to double it. Timbre consistency is preserved across all duration settings.
- Set temporal resolution. Use 16th notes for most production contexts. Move to 32nd- or 64th-note resolution only for technically demanding patterns with fine-grained timing requirements.
- Run the model. Use the Break-the-Beat! demo page to render. The research team plans to release code for local runs; check their project page for updates.
- Import and layer. Bring the rendered audio back into your DAW. Align it to the grid, layer with other elements, and process as you would any recorded drum audio.
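If you would rather script the MIDI pattern than draw it in a DAW, the sketch below writes a one-bar groove as a standard .mid file using only the Python standard library. The note numbers follow the General MIDI percussion map (36 kick, 38 snare, 42 closed hi-hat). The pattern, the file name `groove.mid`, and the helper names are illustrative assumptions, and nothing here is specific to Break-the-Beat!.

```python
import struct

KICK, SNARE, HIHAT = 36, 38, 42  # General MIDI percussion note numbers
TICKS_PER_BEAT = 480
STEP = TICKS_PER_BEAT // 4       # one 16th note in ticks


def vlq(value: int) -> bytes:
    """Encode a delta time as a MIDI variable-length quantity."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def drum_midi(pattern: dict[int, list[int]], bpm: float) -> bytes:
    """Build a format-0 MIDI file from {note: [16th-step indices]}."""
    events = []  # (absolute tick, status byte, note, velocity)
    for note, steps in pattern.items():
        for s in steps:
            events.append((s * STEP, 0x99, note, 100))            # note-on, channel 10
            events.append((s * STEP + STEP // 2, 0x89, note, 0))  # note-off
    events.sort()

    tempo = round(60_000_000 / bpm)  # microseconds per quarter note
    track = b"\x00\xFF\x51\x03" + tempo.to_bytes(3, "big")  # set-tempo meta event
    last = 0
    for tick, status, note, vel in events:
        track += vlq(tick - last) + bytes([status, note, vel])
        last = tick
    track += b"\x00\xFF\x2F\x00"  # end-of-track meta event

    header = b"MThd" + struct.pack(">IHHH", 6, 0, 1, TICKS_PER_BEAT)
    return header + b"MTrk" + struct.pack(">I", len(track)) + track


one_bar = {
    KICK: [0, 8],                  # beats 1 and 3
    SNARE: [4, 12],                # beats 2 and 4
    HIHAT: list(range(0, 16, 2)),  # straight 8th notes
}
with open("groove.mid", "wb") as f:
    f.write(drum_midi(one_bar, bpm=100))
```

The resulting file should open in any DAW that imports standard MIDI files, so you can check the pattern before using it as input.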
Creator Outcome
Break-the-Beat! shifts drum sound design from library management to reference curation. Instead of browsing thousands of samples, producers work with a small set of reference recordings that define sonic character. A lo-fi hip-hop producer can use a cardboard box recording as reference and render any MIDI pattern with that character. A metal producer can use a vintage studio session recording to give modern MIDI patterns analog tonality.
The rhythm-only MIDI mode shortens the iteration cycle between idea and sound. Tap a beat on your controller, set a reference, and hear the groove before committing to a full arrangement. This reduces the friction between creative instinct and evaluable output.
What to Do Next
Listen to the demo examples on the project page, paying attention to timbral consistency across pattern types and genres. The Speed Metal and Live Fusion examples show the model handling high rhythmic precision with complex polyphonic synthesis. Code and dataset materials are planned for public release. Full methodology details are in the official arXiv paper.
Frequently Asked Questions
Can I use Break-the-Beat! with electronic drum sounds?
Yes. The demo includes an Ele-Drum category showing the model working with electronic drum timbres. The hybrid conditioning approach is not restricted to acoustic kits.
Does it work with any drum recording as reference?
Any audio recording with drum content can serve as the reference input. The model extracts timbral characteristics and applies them to the MIDI-specified rhythmic pattern. Reference recording quality will affect output quality.
Is the code available yet?
The team has indicated plans to release code and dataset construction materials. The demo is available now for evaluation. Watch the research team's GitHub for the release announcement.
How does Break-the-Beat! compare to a sampler with velocity layers?
Drum samplers with velocity layers produce deterministic output based on your sample library. Break-the-Beat! is generative: it does not reproduce the reference exactly but adapts its timbre to each new pattern. This produces natural variation that can be desirable in production contexts but reduces predictability compared to traditional samplers.
What tempo range does it support?
The demo examples range from 75 to 158 BPM, covering hip-hop, funk, rock, and metal production ranges. The paper does not specify hard tempo limits, so behavior at extreme tempos is not documented.
Will it work with orchestral percussion or only drum kit patterns?
The research focused specifically on drum kit synthesis. The model architecture is designed for polyphonic percussion in kit format. Orchestral percussion instruments such as timpani or marimba are outside its tested scope.