building a peer-to-peer piano platform: pt 2

the conductor problem

In the last part we discussed how orchestras achieve synchronized sound across musicians and physical space, and how this is a shared problem in p2piano. We designed a peer-to-peer network to reduce the delay between participants and essentially shrink the amount of physical space our orchestra takes up. As helpful as that work is, we still need to synchronize audio with the remaining delay.

Let’s first dive back into this problem in the context of a symphony orchestra. Orchestras are generally laid out in a series of curved rows.

an illustration of an orchestra from above, with hand drawn circles representing the musicians

How long a musician’s sound takes to reach the conductor depends on where they sit in this layout. For instance, the sound from the percussionists at the back of the orchestra takes considerably longer to reach the conductor than the sound from a cellist in the first row. In this example, it takes 35ms longer.

a timing diagram showing the path of sound from percussionists to conductor takes 35ms longer than from cellists
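To put that 35ms in physical terms, here’s a rough sketch of the arithmetic. It assumes sound travels at about 343 meters per second (dry air at roughly 20 °C), and the function names are just for illustration.

```typescript
// Assumption: speed of sound in dry air at roughly 20 °C.
const SPEED_OF_SOUND_M_PER_S = 343;

// Time in milliseconds for sound to travel a given distance in meters.
function acousticDelayMs(distanceMeters: number): number {
  return (distanceMeters / SPEED_OF_SOUND_M_PER_S) * 1000;
}

// Extra distance in meters implied by an extra delay in milliseconds.
function extraDistanceMeters(extraDelayMs: number): number {
  return (extraDelayMs / 1000) * SPEED_OF_SOUND_M_PER_S;
}

// A 35ms difference works out to roughly 12 meters of extra distance.
console.log(extraDistanceMeters(35).toFixed(1));
```

Under that assumption, the percussionists in this example sit about 12 meters farther from the conductor than the cellist does.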

Now, we remember from the first part that the conductor helps out in this case by giving a clear visual signal of time with their baton. This gives the percussionist the information they need to play early, ahead of the cellist. Here’s where our dilemma begins. p2piano doesn’t have a conductor, and we can’t predict when someone will play. Without the ability to anticipate what’s to come, we must instead lean on our ability to react to what’s already happened.

In p2piano, by the time we know that a peer has started a note, some time has already passed. The note has been played, captured by their device, sent across the internet, and received by us. This journey takes precious time that we just can’t get back. Let’s imagine both users start a note at the same time, and it takes 20ms to communicate over the internet. Each of us hears our own note right away, but the other person’s note arrives 20ms later, so the two notes sound 20ms apart.

a timing diagram showing two peers playing notes simultaneously but hearing them 20ms apart

Our solution is to create a shared sense of “now” that’s slightly in the past. Think of it like a live TV broadcast with a small delay: everyone experiences the same moments together, just not at the exact instant they originally happened. In p2piano, we maintain a synchronized timeline that runs slightly behind real time. This gives us the buffer we need to work with.

So, when we find out a peer has started a note, we play the sound immediately because it’s already 20ms behind. However, when we play our own notes, we wait 20ms before sounding them so that they line up with our shared timeline. Here’s what our note timeline looks like now:

a timing diagram showing how delaying local playback by 20ms synchronizes the experience for both peers
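The same idea can be sketched in a few lines of code. This is a simplified model, not p2piano’s actual implementation: the `NoteEvent` shape, the function names, and the peer names are made up for illustration, and it assumes both peers share a clock and that the network delay is a fixed, known 20ms.

```typescript
// Hypothetical note event; not p2piano's real message format.
interface NoteEvent {
  peerId: string;
  // When the note was played, in ms, on a clock both peers are assumed to share.
  playedAtMs: number;
}

// When a given listener hears a note. Remote notes arrive after the network
// delay and sound immediately; local notes are held back by localDelayMs.
function heardAtMs(
  note: NoteEvent,
  listenerId: string,
  networkDelayMs: number,
  localDelayMs: number,
): number {
  const isLocal = note.peerId === listenerId;
  return note.playedAtMs + (isLocal ? localDelayMs : networkDelayMs);
}

// Gap between the earliest and latest notes as one listener hears them.
function perceivedGapMs(
  notes: NoteEvent[],
  listenerId: string,
  networkDelayMs: number,
  localDelayMs: number,
): number {
  const times = notes.map((n) =>
    heardAtMs(n, listenerId, networkDelayMs, localDelayMs),
  );
  return Math.max(...times) - Math.min(...times);
}

// Both peers play at the same instant.
const notes: NoteEvent[] = [
  { peerId: "alice", playedAtMs: 1000 },
  { peerId: "bob", playedAtMs: 1000 },
];

const uncompensated = perceivedGapMs(notes, "alice", 20, 0); // no local delay
const compensated = perceivedGapMs(notes, "alice", 20, 20); // local delay matches network delay
console.log(uncompensated, compensated); // prints "20 0"
```

With no local delay the two notes land 20ms apart. Once we hold our own note back by the same 20ms, both notes land at the same moment for either listener.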

The question then becomes, at what point is the delay in our shared time noticeable? In testing with musicians, we started to notice the delay a little above 30ms and had issues above 40ms. This is in line with what Washburn, Wright, Chafe, and Fujioka found in their research paper Temporal Coordination in Piano Duet Networked Music Performance (NMP): Interactions Between Acoustic Transmission Latency and Musical Role Asymmetries.
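If we wanted to turn those observations into something code can act on, one option is a simple classifier like the sketch below. The category names and the function are hypothetical, and the 30ms and 40ms cutoffs are rough guides from our informal testing rather than hard limits.

```typescript
type DelayFeel = "comfortable" | "noticeable" | "problematic";

// Assumption: rough cutoffs from informal testing, not precise limits.
const NOTICEABLE_ABOVE_MS = 30;
const PROBLEMATIC_ABOVE_MS = 40;

// Map a shared-timeline delay in milliseconds to how it tends to feel.
function classifyDelay(delayMs: number): DelayFeel {
  if (delayMs > PROBLEMATIC_ABOVE_MS) return "problematic";
  if (delayMs > NOTICEABLE_ABOVE_MS) return "noticeable";
  return "comfortable";
}

console.log(classifyDelay(20), classifyDelay(35), classifyDelay(50));
// prints "comfortable noticeable problematic"
```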

This synchronization approach works well for two peers if we know what the delay will be ahead of time. A few interesting challenges naturally follow: how do you measure and coordinate timing across many participants, each with different and constantly changing network conditions? What do you do when someone’s connection is just too slow to play well with others? And how do you achieve precise timing on the web, where browsers and JavaScript execution can introduce their own unpredictable delays?