
I Taught a Browser to Play Piano — Here's How It Figures Out Which Finger Goes Where

DEV Community
monkeymore studio

Learning piano is hard enough without guessing which finger hits which key. Most sheet music doesn't bother telling you, and when it does, the fingering is often generic or just plain wrong for your hand size. I got tired of that, so I built a tool that reads a MusicXML score, runs a full physics-and-biomechanics simulation in your browser, and shows you exactly how your hands should move across the keys. No servers. No uploads. Just drop a .xml or .mxl file into the page and watch your browser plan out every finger placement in real time. You can try it yourself on our free piano finger visualization tool.

You might think: "Isn't this the kind of heavy computation that belongs on a server?" It turns out that keeping everything client-side has some serious advantages.

Privacy. Musicians work on original compositions, unpublished arrangements, and copyrighted material. Uploading a MusicXML file to a remote server means trusting someone else with your intellectual property. When the entire analysis runs inside your browser, the file never leaves your device. Not even a byte gets transmitted.

Speed. No network round-trips means no loading spinners while a server crunches numbers. A typical score with a few hundred notes gets analyzed in under a second on a modern laptop. The animation starts the moment you hit "Process Score."

Offline use. Once the page loads, you can use it without an internet connection. Great for practice rooms with spotty Wi-Fi or flights where you want to review fingerings for upcoming repertoire.

Zero setup. No software to install, no plugins, no DAW integration. If you have a web browser, you have a fully functional piano fingering analyzer.

Here's the bird's-eye view of what happens from the moment you drop a file to the moment the animation starts playing: parse the MusicXML into note events, build a note sequence with physical key positions, run the fingering optimizer for each hand, then animate the result on a canvas while synthesizing audio. The heavy lifting happens in a custom JavaScript port of the pianoplayer engine, adapted to run entirely in the browser. Let's dig into each stage.

MusicXML is the standard exchange format for digital sheet music.
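For readers who haven't peeked inside one, here is what a single note looks like in a MusicXML file. This is a hand-written illustrative fragment (a C#4 quarter note), not taken from any particular score, and trimmed to the pitch, duration, and staff elements:

```xml
<note>
  <pitch>
    <step>C</step>
    <alter>1</alter>  <!-- +1 semitone: C sharp -->
    <octave>4</octave>
  </pitch>
  <duration>4</duration>
  <type>quarter</type>
  <staff>1</staff>
</note>
```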
It's XML-based, often zipped (.mxl), and can be surprisingly messy. Our parser lives in musicxml_io.js and handles the entire pipeline from raw bytes to structured note events. Every note in the score gets normalized into an INote object:

```javascript
export class INote {
  constructor() {
    this.name = null;
    this.isChord = false;
    this.isBlack = false;
    this.pitch = 0;
    this.octave = 0;
    this.x = 0.0;        // physical key position in cm
    this.time = 0.0;     // onset time in seconds
    this.duration = 0.0;
    this.fingering = 0;  // assigned finger (1-5)
    this.measure = 0;
    this.staff = 0;
  }
}
```

The x field is crucial. It maps a MIDI pitch to a physical position on the keyboard in centimeters, measured from a reference point. This lets us reason about actual hand spans and finger distances rather than abstract semitone counts.

The parser walks the MusicXML DOM tree, extracts the `<note>` elements, resolves `<pitch>` tags with accidentals, and builds a timeline of EventInfo objects. It also handles duplicated notes that sometimes appear in poorly exported scores — we keep the longer duration and drop the rest.

```javascript
function _pitch_from_note(noteEl) {
  const pitchEl = elFindDirect(noteEl, "pitch");
  const step = elFindDirect(pitchEl, "step").textContent.trim().toUpperCase();
  const alter = parseInt(elFindDirect(pitchEl, "alter")?.textContent || "0", 10);
  const octave = parseInt(elFindDirect(pitchEl, "octave").textContent, 10);
  const semitone = _STEP_TO_SEMITONE[step] + alter;
  const midi = (octave + 1) * 12 + semitone;
  return new PitchInfo(_note_name(step, alter), octave, midi);
}
```

Once we have the events, noteseq_from_part converts them into the INote sequence used by the optimizer. Chords get special handling: notes sharing the same onset are grouped, and each chord note gets a tiny time offset (default 50 ms) so the optimizer treats them as simultaneous but distinguishable events.

This is where things get interesting. Assigning fingers to notes isn't just about "thumb on C, middle finger on E."
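The chord handling described above can be sketched like this. It's a simplified stand-in for what noteseq_from_part does, assuming only the grouping behavior the article describes; the function name and object shape are mine, not the library's:

```javascript
// Group notes that share an onset, then nudge each chord member by a
// small offset (default 0.05 s) so downstream stages see them as
// simultaneous but distinguishable events.
function spreadChords(notes, offset = 0.05) {
  const byOnset = new Map();
  for (const n of notes) {
    const key = n.time.toFixed(6); // tolerate floating-point onsets
    if (!byOnset.has(key)) byOnset.set(key, []);
    byOnset.get(key).push(n);
  }
  const out = [];
  for (const group of byOnset.values()) {
    group.forEach((n, i) => {
      out.push({ ...n, time: n.time + i * offset, isChord: group.length > 1 });
    });
  }
  return out;
}
```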
A good fingering minimizes hand movement, avoids awkward stretches, respects the natural strengths of each finger, and stays within the player's physical reach. We model each hand as a Hand object with biomechanical constraints:

```javascript
export class Hand {
  constructor(noteseq, side = "right", size = "M") {
    this.LR = side;
    this.frest = [null, -7.0, -2.8, 0.0, 2.8, 5.6];  // relaxed finger positions (cm)
    this.weights = [null, 1.1, 1.0, 1.1, 0.9, 0.8];  // finger strength weights
    this.bfactor = [null, 0.3, 1.0, 1.1, 0.8, 0.7];  // black-key comfort factors
    this.size = size;
    this.hf = Hand.size_factor(size);                // hand-size multiplier
    this.max_span_cm = 21.0 * this.hf;
    this.max_follow_lag_cm = 2.5 * this.hf;
    this.min_finger_gap_cm = 0.15 * this.hf;
  }
}
```

Hand sizes range from XXS to XXL, scaling the maximum comfortable span from about 7 cm up to 25 cm. A child with small hands gets very different fingerings than an adult with large hands.

For each note, we need to pick a finger (1–5). The naive approach would try all five possibilities for every note, but that explodes exponentially. Instead, we use a sliding window with depth-limited backtracking search. The algorithm looks at a window of upcoming notes (up to 9, automatically adjusted based on note density) and tries every valid fingering combination:

```javascript
optimize_seq(nseq, istart) {
  const u_start = istart === 0 ? [...this.fingers] : [istart];
  let best_fingering = [0, 0, 0, 0, 0, 0, 0, 0, 0];
  let minvel = 1.0e10;
  const candidate = [0, 0, 0, 0, 0, 0, 0, 0, 0];
  const backtrack = (level) => {
    if (level === depth) {
      const velocity = this.ave_velocity(candidate, nseq);
      if (velocity < minvel) {
        minvel = velocity;
        best_fingering = [...candidate];
      }
      return;
    }
    for (let finger = 1; finger <= 5; finger++) {
      if (level > 0 && this.skip(candidate[level - 1], finger, nseq[level - 1], nseq[level], ...)) {
        continue;
      }
      candidate[level] = finger;
      backtrack(level + 1);
    }
  };
  backtrack(0);
  return [best_fingering, minvel];
}
```

The skip function is the secret sauce.
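Stripped of the piano-specific details, the search above has a generic shape worth seeing on its own. This is my simplified sketch, not the engine's actual optimize_seq: enumerate finger choices level by level, prune with an abstract skip predicate, and keep the lowest-cost complete assignment:

```javascript
// Depth-limited backtracking over finger choices 1-5 with pruning.
// cost(candidate) scores a complete assignment; skip(prev, next) rejects
// a transition before recursing into it.
function searchFingering(depth, cost, skip) {
  let best = null;
  let bestCost = Infinity;
  const candidate = new Array(depth).fill(0);
  const backtrack = (level) => {
    if (level === depth) {
      const c = cost(candidate);
      if (c < bestCost) { bestCost = c; best = [...candidate]; }
      return;
    }
    for (let finger = 1; finger <= 5; finger++) {
      if (level > 0 && skip(candidate[level - 1], finger)) continue; // prune
      candidate[level] = finger;
      backtrack(level + 1);
    }
  };
  backtrack(0);
  return { best, cost: bestCost };
}
```

With a toy cost (sum of finger numbers) and a toy rule (never repeat a finger), `searchFingering(3, c => c.reduce((a, b) => a + b, 0), (p, f) => p === f)` finds `[1, 2, 1]` at cost 4 — the same mechanics the real optimizer applies with biomechanical costs and rules.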
The skip function eliminates physically impossible or ergonomically terrible transitions before they waste computation time:

- Same finger on two different consecutive notes? Skip.
- Crossing fingers in the wrong direction (e.g., finger 3 moving left of finger 4 on the right hand)? Skip.
- Impossible stretches inside a chord? Skip.
- Thumb hitting a black key while moving upward? Usually skip.

This pruning turns an exponential search into something that finishes in milliseconds.

For each valid fingering sequence, we compute an "average finger velocity" — essentially, how much effort it takes to move the hand into position:

```javascript
ave_velocity(fingering, notes) {
  let vmean = 0.0;
  for (let i = 1; i < notes.length; i++) {
    // ...compute each finger's target position and its lag behind the
    // moving hand (cost accumulation elided in this excerpt)...
    if (lag > this.max_follow_lag_cm) {
      finger_positions[j] = target + this.max_follow_lag_cm;
    } else if (lag < -this.max_follow_lag_cm) {
      finger_positions[j] = target - this.max_follow_lag_cm;
    }
    // ...then check the total stretch of the hand...
    if (span > this.max_span_cm) {
      // Clamp outer fingers toward center
    }
  }
  // ...average the accumulated travel into vmean and return it...
}
```

Once the optimizer produces a fingerseq — a timeline of finger positions for every note — we need to visualize it. The UI is a React client component that renders everything on a `<canvas>` element.

The keyboard is drawn from scratch for every frame. White keys are rectangles; black keys are slightly offset and shorter. We map MIDI pitches to horizontal positions using a constant key width (KEYBSIZE = 16.5 cm per octave):

```javascript
function pitchToX(pitch) {
  const octave = Math.floor(pitch / 12) - 1;
  const pc = pitch % 12;
  const names = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"];
  const name = names[pc];
  const whitePos = { C: 0.5, D: 1.5, E: 2.5, F: 3.5, G: 4.5, A: 5.5, B: 6.5 };
  const blackPos = { "C#": 1.0, "D#": 2.0, "F#": 4.0, "G#": 5.0, "A#": 6.0 };
  const step = name in blackPos ? blackPos[name] : whitePos[name];
  return octave * KEYBSIZE + step * (KEYBSIZE / 7);
}
```

The canvas uses DPR (device pixel ratio) scaling so it looks crisp on Retina displays. Each finger is drawn as a colored line from the root of the hand down to the key surface.
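A quick aside on that DPR scaling: it boils down to sizing the canvas backing store separately from its CSS box. This helper is my own sketch of the idea, not the component's code:

```javascript
// The backing store gets dpr-times more pixels than the CSS layout box;
// scaling the 2D context by dpr lets drawing code keep using CSS pixels.
function backingStoreSize(cssWidth, cssHeight, dpr) {
  return {
    width: Math.round(cssWidth * dpr),   // assign to canvas.width
    height: Math.round(cssHeight * dpr), // assign to canvas.height
    scale: dpr,                          // pass to ctx.scale(dpr, dpr)
  };
}
```

In the browser you would then keep `canvas.style.width` at the CSS size while applying these device-pixel dimensions, which is what keeps lines crisp on 2x displays.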
Engaged fingers (currently pressing a key) are drawn fully opaque and hover slightly lower; relaxed fingers are semi-transparent and raised:

```javascript
function drawFingers(ctx, hand, color, scaleX, minX, canvasHeight) {
  const fingers = {
    1: { tipOffset: 30, wid: 15 }, // thumb: longer, thicker
    2: { tipOffset: 10, wid: 10 },
    3: { tipOffset: 0, wid: 10 },  // middle: shortest reach
    4: { tipOffset: 12, wid: 9 },
    5: { tipOffset: 26, wid: 8 },  // pinky: thin, awkward reach
  };
  for (let f = 1; f <= 5; f++) {
    // ...draw each finger line from the hand root down to the key
    // surface (drawing calls elided in this excerpt)...
  }
}
```

The animation itself is driven by requestAnimationFrame:

```javascript
const loop = () => {
  if (!playingRef.current) return;
  const elapsed = ((performance.now() - t0Ref.current) / 1000) * speedRef.current;
  updateHand(rhRef.current, elapsed);
  updateHand(lhRef.current, elapsed);
  drawFrame(elapsed);
  animIdRef.current = requestAnimationFrame(loop);
};
```

The updateHand function uses two index pointers — press_idx and release_idx — to efficiently track note onsets and offsets without scanning the entire sequence every frame.

Visuals are great, but hearing the notes helps you internalize the timing. We generate audio using the Web Audio API — no MP3s, no SoundFonts, just raw oscillators:

```javascript
const playNote = (pitch, duration) => {
  const freq = 440 * Math.pow(2, (pitch - 69) / 12); // MIDI 69 = A440
  const now = audioCtx.currentTime;
  const osc = audioCtx.createOscillator();
  const gain = audioCtx.createGain();
  osc.type = "sine";
  osc.frequency.setValueAtTime(freq, now);
  gain.gain.setValueAtTime(0.3, now);
  gain.gain.exponentialRampToValueAtTime(0.001, now + duration);
  osc.connect(gain);
  gain.connect(audioCtx.destination);
  osc.start(now);
  osc.stop(now + duration);
};
```

It's a simple sine wave with an exponential decay envelope. Not concert-grand realism, but perfectly adequate for following the melodic line and checking rhythm. There's also a metronome click (a square wave at 1200 Hz) that ticks along at the current BPM, complete with a little CSS pendulum animation in the UI.

Piano scores often combine both hands in a single part with two staves. The engine automatically routes staff 1 to the right hand and staff 2 to the left hand.
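That staff-to-hand routing can be sketched in a couple of lines. This is my simplification of the behavior just described, not the engine's code:

```javascript
// Route staff 1 to the right hand and staff 2 to the left, as the
// engine does for two-staff piano parts.
function splitByStaff(notes) {
  return {
    right: notes.filter((n) => n.staff === 1),
    left: notes.filter((n) => n.staff === 2),
  };
}
```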
For the optimizer, left-hand logic is symmetric: we simply mirror the x-coordinates (anote.x = -anote.x), run the same optimization, then flip everything back for display. Users can also override this and process only one hand, or specify a custom measure range if they only want to practice a difficult passage.

The complete pipeline from file drop to animated playback looks like this: parse the MusicXML, build the INote sequence, optimize the fingering for each hand, then animate the hands on the canvas while the Web Audio engine plays along.

The magic isn't any single breakthrough — it's the combination of several pragmatic choices:

- Biomechanical modeling beats rule-based heuristics. By simulating actual finger positions and hand spans, we get fingerings that feel natural rather than theoretically "correct."
- Depth-limited backtracking with aggressive pruning gives us optimal-ish results without waiting for a GPU cluster. Most scores process in under a second.
- Client-side execution removes every privacy, latency, and availability concern. The tool works anywhere, instantly.
- Canvas + Web Audio keeps the stack simple. No WebGL, no WASM audio engines, no external dependencies beyond the parser itself.

Try It on Your Own Scores

Got a MusicXML file sitting around? Whether it's a Bach prelude, a pop lead sheet, or your own composition, you can see exactly how your hands should navigate the keyboard. Head over to our free piano finger visualization tool and give it a spin. Upload your score, hit Process, then press Play. You'll see your fingers dance across the keys in real time — all without a single packet leaving your browser.