Song-Ze (Jimmy) Yu

Song-Ze Yu 游松澤 · Jimmy

An engineer & a musician.

Profile

An engineer, a musician.

I'm a Computer Science student from UC Berkeley and National Tsing Hua University, working at the intersection of machine learning for music and audio, LLM alignment, and HCI.

My research is split between BAIR (Berkeley AI Research), advised by Prof. Trevor Darrell and David M. Chan, and CNMAT (Center for New Music & Audio Technologies), advised by Prof. Carmine-Emanuele Cella.

My long-term goal is to build audio-language models that can truly hear and understand music. Such models would give musicians a new mode of human-computer interaction: a way to discuss and collaborate on music with a computer, and even to engage non-musical audiences.

Before research, I was a child musician: 24 solo recitals and more than 70 TV appearances across China and Taiwan between 2009 and 2016. I still play, write, and produce. That dual lens of engineer and musician is the through-line of everything I build.

Publications & Research

My research taste.

Music AI has never looked more promising: companies like Suno, Udio, and Ace Studio show how generative models may reshape music itself. However, I'm deeply concerned that much of the field is moving toward replacing the humanity, intention, and creativity at the soul of musicianship.

I tried to reflect on what is missing, and on how to make the most of this explosive era of generative AI. My research starts from three missing pieces:

Controllability.

Many current models produce impressive zero-shot results but give musicians very little control.

Music understanding.

Many models learn latent-space patterns from large-scale internet data but still lack grounded understanding, from the “perceptual level” (pitch, rhythm, and musical structure) to the “hidden level” (the composer's intention, the player's expression, and the audience's emotion).
Yet these are exactly what turn a random permutation of notes into living music.

Humanity.

Music is the most abstract language we have. There is no ground truth hidden in the notes or the audio signal itself: the same musical phrase can evoke very different responses in different listeners. This leads to a research question I care deeply about:

Can audio-language models (ALMs) model, and align with, human preferences?

Together, these questions shape my long-term research direction: building a unified audio-language model that can truly hear, understand, and communicate about music, enabling new forms of HCI between musicians, computers, and audiences.

Under Review — 2026

MuNo-LM: Music Notation for Language Models — Unifying Score and Performance for Audio-Language Modeling Under Review

Milan Liessens Dujardin, Song-Ze Yu.

ISMIR 2026 · Center for New Music & Audio Technologies (CNMAT), UC Berkeley

  • A text-based representation unifying score-level information (harmony, phrasing, dynamics, voicing, articulation) with time-aligned audio-perceptual features.
  • A format LLMs can consume directly, bridging symbolic notation and recorded performance.
  • An automatic caption-and-QA generation pipeline for time-grounded music understanding in ALMs.

PitchBench: Measuring Pitch Hearing in Audio-Language Models Under Review

Milan Liessens Dujardin*, Song-Ze Yu*.  (*equal contribution)

NeurIPS 2026 — Datasets & Benchmarks Track · BAIR Lab, UC Berkeley

  • 28 experiments measuring pitch hearing in audio-language models: absolute & relative perception, sequences & chords.
  • Varies loudness, note duration, sound source, time stretching, background noise, and notation format.
  • Frontier ALMs perform consistently poorly: current models do not yet possess stable pitch perception.
  • Released as a Python package with evaluation data + generation tools.
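To make the setup above concrete, here is a minimal Python sketch of how one relative-pitch probe could be generated. It is not the released PitchBench package: the function names, the 16 kHz sample rate, the equal-tempered A4 = 440 Hz tuning, the pure-sine sound source, and the yes/no question wording are all assumptions made for illustration. Loudness and note duration are varied per item, mirroring two of the axes listed above.

```python
import math
import random

# Assumptions for this sketch (not taken from PitchBench itself):
SAMPLE_RATE = 16_000  # Hz
A4_MIDI, A4_HZ = 69, 440.0  # 12-tone equal temperament reference


def midi_to_hz(note: int) -> float:
    """Frequency of a MIDI note number in 12-TET with A4 = 440 Hz."""
    return A4_HZ * 2 ** ((note - A4_MIDI) / 12)


def sine_tone(note: int, duration_s: float, gain_db: float) -> list[float]:
    """Render a pure sine tone whose peak level is gain_db relative to full scale."""
    amp = 10 ** (gain_db / 20)
    freq = midi_to_hz(note)
    n = int(SAMPLE_RATE * duration_s)
    return [amp * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def relative_pitch_item(rng: random.Random) -> dict:
    """Build one hypothetical 'is the second note higher?' probe."""
    first = rng.randint(48, 84)  # C3..C6
    second = first + rng.choice([-3, -2, -1, 1, 2, 3])  # never a unison
    duration = rng.choice([0.25, 0.5, 1.0])  # vary note duration (seconds)
    gain = rng.choice([-30.0, -18.0, -6.0])  # vary loudness (dBFS)
    audio = sine_tone(first, duration, gain) + sine_tone(second, duration, gain)
    return {
        "audio": audio,
        "notes": (first, second),
        "question": "Is the second note higher than the first? Answer yes or no.",
        "answer": "yes" if second > first else "no",
    }


item = relative_pitch_item(random.Random(0))
print(item["notes"], item["answer"], len(item["audio"]))
```

Scoring would then be a string match between the model's reply and `answer`; the remaining axes (sound source, time stretching, background noise, notation format) could enter as extra parameters of the renderer or the prompt.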

In Preparation — planned 2026

Anamnesis: An Open-Source Platform for Large-Scale Backstory-Conditioned Survey Simulation EMNLP Demo · Planned

Song-Ze Yu, Joseph Suh, Yutong Bai, Serina Chang, David M. Chan.

EMNLP 2026 Demo · BAIR Lab, UC Berkeley

  • Open-source platform for demographically controlled multimodal survey simulation & virtual population studies.
  • Achieved closer alignment to ground-truth population distributions than individual human respondents on Pew Research Center American Trends Panel (ATP) surveys.
  • Originally submitted to ACL 2026 Demo; being extended for EMNLP 2026 Demo.

InstructFX2FX: A Multi-Turn Text-to-Preset Demo for Iterative Audio Effect Refinement DAFx · Planned

Song-Ze Yu, Milan Liessens Dujardin, Yuxuan Cai, Wantong Zhang.

DAFx 2026 Demo · CNMAT Lab, UC Berkeley

  • Interactive demo for multi-turn, text-guided audio effect refinement; engineers iteratively refine effects through natural-language instructions.
  • Hybrid LLM + CLAP-optimization architecture: the LLM picks effects & initial parameters; CLAP-guided optimization perceptually refines them.
  • Outperforms LLM-only reprompting on directional descriptor pairs (e.g. “less bright / more warm”).
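The hybrid loop described above can be sketched in plain Python. Everything here is a hypothetical stand-in: `TEXT_EMB` and `embed_rendered_audio` replace CLAP's text and audio encoders with a toy 3-D space, the two shelf-gain parameters stand in for whatever the LLM proposes, and the optimizer is a simple greedy coordinate search rather than the method InstructFX2FX actually uses. What it does show is the directional idea: a request like “less bright / more warm” becomes a target embedding shifted along (warm − bright), and the effect parameters are nudged until the rendered audio's embedding points toward that target.

```python
import math

Vec = list[float]

# Hypothetical stand-ins for CLAP text-encoder outputs.
TEXT_EMB: dict[str, Vec] = {"bright": [1.0, 0.0, 0.2], "warm": [0.0, 1.0, 0.2]}


def cosine(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def embed_rendered_audio(params: dict[str, float]) -> Vec:
    """Toy proxy for 'render audio with these params, then CLAP-embed it':
    a high-shelf boost reads as bright, a low-shelf boost reads as warm."""
    return [params["high_shelf_db"] / 12, params["low_shelf_db"] / 12, 0.2]


def directional_target(current: Vec, less: str, more: str, alpha: float = 0.5) -> Vec:
    """Move the current embedding away from `less` and toward `more`."""
    return [c + alpha * (m - l) for c, l, m in zip(current, TEXT_EMB[less], TEXT_EMB[more])]


def refine(params: dict[str, float], target: Vec, step: float = 1.0,
           iters: int = 50, lo: float = -12.0, hi: float = 12.0) -> tuple[dict[str, float], float]:
    """Greedy coordinate search that maximizes cosine similarity to the target."""
    best = dict(params)
    best_score = cosine(embed_rendered_audio(best), target)
    for _ in range(iters):
        improved = False
        for key in list(best):
            for delta in (step, -step):
                cand = dict(best)
                cand[key] = min(max(cand[key] + delta, lo), hi)  # clamp to gain range
                score = cosine(embed_rendered_audio(cand), target)
                if score > best_score:
                    best, best_score, improved = cand, score, True
        if not improved:  # local optimum on the step grid
            break
    return best, best_score


# Suppose the LLM's first guess boosted the highs; the user then asks for "less bright / more warm".
initial = {"low_shelf_db": 0.0, "high_shelf_db": 6.0}
target = directional_target(embed_rendered_audio(initial), less="bright", more="warm")
refined, score = refine(initial, target)
print(refined, round(score, 3))
```

Cosine similarity is the natural objective here because CLAP-style contrastive models compare embeddings by direction rather than magnitude.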

Earlier Work — 2025

Research · Industry · Leadership

Where I've worked.

Research — I.

Berkeley AI Research Lab Researcher

BAIR, UC Berkeley · advised by Prof. Trevor Darrell & David M. Chan

  • Co-first author on PitchBench (NeurIPS D&B 2026, under review): benchmarking pitch hearing in audio-language models.
  • Lead author on Anamnesis (EMNLP 2026 Demo, planned): open-source platform for backstory-conditioned survey simulation.

CNMAT — Center for New Music & Audio Technologies Research Lead

CNMAT, UC Berkeley · advised by Prof. Carmine-Emanuele Cella

  • Co-author on MuNo-LM (ISMIR 2026, under review): unifying score and performance for audio-language modeling.
  • Lead author on InstructFX2FX (DAFx 2026 Demo, planned): multi-turn text-to-preset audio effect refinement via LLM + CLAP optimization.

AHG Music Lab Undergraduate Researcher

National Tsing Hua University · advised by Prof. Yi-Wen Liu

  • Built a ReaScript-generated dataset of piano recordings with varied multi-band EQ parameters.
  • Trained a supervised neural network to predict EQ parameters from reference audio DSP features (MSE 0.0216).
  • Published as the VTR model (arXiv 2509.24404), later shipped as a JUCE plugin.
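As an illustration of what “reference audio DSP features” can look like, the sketch below computes per-band energies in dB with a naive DFT, the kind of spectral summary a regression network could map to EQ gains. The band edges, the choice of band energy as the feature, and the function name are assumptions for this example; the features VTR actually uses are described in the arXiv paper, not here. A real pipeline would use an FFT library instead of this O(n²) loop.

```python
import cmath
import math


def band_energies_db(signal: list[float], sample_rate: int, edges_hz: list[float]) -> list[float]:
    """Energy in dB of `signal` inside each [lo, hi) frequency band, via a naive DFT."""
    n = len(signal)
    # One-sided spectrum: bins 0..n/2 cover 0 Hz up to the Nyquist frequency.
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n // 2 + 1)
    ]
    features = []
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        power = sum(
            abs(spectrum[k]) ** 2
            for k in range(len(spectrum))
            if lo <= k * sample_rate / n < hi
        )
        features.append(10 * math.log10(power + 1e-12))  # epsilon avoids log(0)
    return features


# A 1 kHz test tone should put nearly all of its energy in the 500-2000 Hz band.
SR, N = 8000, 256
tone = [math.sin(2 * math.pi * 1000 * t / SR) for t in range(N)]
feats = band_energies_db(tone, SR, [0, 500, 2000, 4001])
print([round(f, 1) for f in feats])
```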

Industry — II.

Positive Grid Machine Learning Intern

Guitar amp modeling & audio ML in production.

Scrum Master · JUCE · VST · PyTorch
  • Served as Scrum Master for Amp-AI-SaaS, an agentic workflow architecture for Positive Grid's next-generation products.
  • Built two JUCE-based VST plugins and integrated my capstone audio-to-preset model via a PyTorch inference pipeline.
  • Gained exposure to source separation and Zero-Shot Virtual Amplifier (VA) Modeling approaches used in the music industry.

Open Source & Leadership — III.

Berkeleytime ML Pod Lead & Full-Stack Engineer

ASUC's flagship student platform · 122K+ monthly users at UC Berkeley.

Pod Lead · FastAPI · BGE · FAISS · React · GraphQL · Redis
  • Led development and launch of Berkeleytime's new Explore page, serving 122K+ monthly users.
  • Manage a 10-member Machine Learning Pod: driving technical strategy, engineering execution, task allocation, and code reviews across the team.
  • Unblocked production deployment of semantic search by leading the migration from FAISS to Redis + MongoDB Vector Search, improving scalability and operability.
  • Built the AI semantic search service (BGE + FAISS) as a FastAPI microservice integrated with the Node.js / Docker / GraphQL stack.
  • Reduced GraphQL first-load latency from 25 s to 1 s through pagination and Redis caching; shipped 20+ urgent PRs.
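The core of a BGE + FAISS semantic search is exact nearest-neighbour search by inner product over L2-normalized embeddings, which makes the inner product equal to cosine similarity (FAISS's `IndexFlatIP` performs this search in optimized C++). The pure-Python sketch below shows that computation; the three-dimensional vectors and course IDs are hypothetical placeholders, since real BGE embeddings have hundreds of dimensions and would come from the model rather than being written by hand.

```python
import heapq
import math

Vec = list[float]


def l2_normalize(v: Vec) -> Vec:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_k(query: Vec, corpus: dict[str, Vec], k: int = 3) -> list[tuple[float, str]]:
    """Exact search: after L2 normalization, inner product equals cosine similarity."""
    q = l2_normalize(query)
    scored = (
        (sum(a * b for a, b in zip(q, l2_normalize(vec))), doc_id)
        for doc_id, vec in corpus.items()
    )
    return heapq.nlargest(k, scored)  # highest-scoring (score, id) pairs first


# Hypothetical toy embeddings; real ones would be precomputed by the embedding model.
COURSES: dict[str, Vec] = {
    "intro-ml": [0.9, 0.1, 0.0],
    "music-tech": [0.1, 0.9, 0.2],
    "data-science": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.0, 0.0]  # stand-in for an embedded query like "machine learning"
for score, course in top_k(query_vec, COURSES, k=2):
    print(f"{course}: {score:.3f}")
```

Flat search like this is exact, and its cost grows linearly with the number of indexed items.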

Pianist · Composer · Producer

A life in music.

I learned piano by ear, and I won first place in my first competition, a national piano contest in Taiwan, without a teacher. What followed, between 2009 and 2016, was a child-star decade: more than 70 TV appearances, 24 solo piano recitals across mainland China and Taiwan, three on-stage moments with Jay Chou, and a featured guest spot at Yen-J's Taipei Arena concert.

I never stopped writing. I've written 20+ original songs, and my single “失電陣線聯盟” crossed 14.4K plays on Instagram. In college I played keys, lead vocals, cajón, and electric guitar across several bands, filling in whatever the song needed. On keyboard, I don't need rehearsal: after one listen I can improvise the part on stage.

24 Solo Recitals
70+ TV Shows
20+ Originals
Song-Ze Yu

Awards & Hackathons

Selected awards.

Oct 2025 · Berkeley, CA
Claude × a16z × Berkeley M.E.T. Makeathon
Best Design

Built Haven, a room-scanning interior design app, in 5 hours.

Oct 2024 · Hsinchu, Taiwan
Meichu Hackathon 2024
2nd Place (Google) · NT$16,000

Travelity: AI personalized travel planner.

Jul 2024 · Taipei, Taiwan
SITCON Hackathon 2024
1st Place

ccDiary: AI mental-health journaling app with anonymous social features.

May 2024 · Remote, Taiwan
WorldQuant Challenge 2024
Silver Level

Quantitative alpha research from academic papers and social-media signals.

Code & Side Projects

Selected projects.

A subset of my work from github.com/vaclisinc, roughly grouped into audio & ML, hackathon products, open source, and full-stack & hardware. The full list lives on GitHub.

Audio · ML · Music — I.

VTR-SmartEQ

JUCE · C++ · PyTorch · ReaScript

A JUCE-based VST plugin integrating my audio-to-preset (VTR) model for real-time EQ parameter prediction inside any DAW. Companion to the arXiv paper.

github.com/vaclisinc/VTR-smartEQ →

Vaclis Tone Replication

Python · PyTorch · DSP

The underlying audio-to-preset model: predicts multi-band EQ parameters directly from a reference recording. Trained on a ReaScript-automated piano dataset.

arXiv 2509.24404 →

InstructFX2FX

Python · CLAP · LLM

Interactive demo for multi-turn, text-guided audio effect refinement. Parses feedback like “less bright / more warm” into directional updates in CLAP embedding space.

github.com/vaclisinc/InstructFX2FX →

MLB Pitch-Type Prediction (RNN)

Python · PyTorch · RNN

An RNN-based pitch-type prediction system that analyzes MLB pitching sequences to uncover patterns and refine batting strategy, bridging baseball analytics and sequence modeling.

github.com/vaclisinc/Pitch-Type-Prediction-Using-RNN →

Hackathon Products — II.

ParkFlow

Vue 3 · Flutter · Taipei Parking API

Real-time parking, navigation, and notifications for Taipei. Selected for integration into the official TownPass super-app (3M+ downloads). NT$100,000 prize.

github.com/vaclisinc/ParkFlow →

Travelity

Flutter · Python · Gemini · Google Maps

AI personalized travel planner built around interests, personality, and real-time data. 2nd Place (Google) at Meichu Hackathon 2024.

github.com/vaclisinc/Travelity →

ccDiary

Flutter · Python · OpenAI · LangChain

An AI-powered mental-health diary with anonymous social features. 1st Place at SITCON Hackathon 2024: built with the team to make journaling less lonely.

github.com/SimonLiu423/cc_diary →

Open Source — III.

Berkeleytime

TypeScript · GraphQL · MongoDB · Redis

ASUC's flagship student platform serving 122K+ monthly users at UC Berkeley. I lead the ML pod and have shipped 20+ PRs to the production codebase as a full-stack engineer.

my PRs →

Claude Code Remote

JavaScript · IMAP / SMTP

Contributed an execution-trace notification system and terminal-style email UI; fixed a critical self-reply loop. The contributions trended on X with 126K+ views.

github.com/vaclisinc/Claude-Code-Remote →

Full-Stack & Hardware — IV.

InningIQ

React · Python · FastAPI · Firebase

A web platform of subservices for amateur baseball, including jyBaseball, a community-driven digitization tool. Built with the NTHU baseball team and Taiwanese baseball YouTubers.

github.com/vaclisinc/InningIQ →

PilotClean

Verilog · FPGA

An FPGA dual-mode cleaning robot combining manual flexibility with autonomous cleaning. A full hardware/software co-design exercise on an Altera board.

github.com/vaclisinc/PilotClean →

RecyScore

Python · Raspberry Pi · AWS

IoT-based interactive recycling reward system using a Raspberry Pi camera and AWS cloud services. Sustainability hack with real hardware deployed in a campus pilot.

github.com/vaclisinc/RecyScore →

Get in touch

Let's talk.

Actively seeking PhD positions starting Fall 2027.

I'm happy to talk about audio-language models, music technology, hackathon ideas, piano performance, or anything in between. The fastest way to reach me is email.

vaclis@berkeley.edu