Notes from Socket.io Background

 

These are my notes from the socket.io talk on July 2, 2012 at NodeConf.

I'm going to talk about what motivated me to start a project like socket.io. Back in 2008 or so I got interested in an app framework called AppJet, whose makers created EtherPad, a real-time collaborative text editor. Around then I was developing a passion for collaborative text editing, and this project inspired me to become an early adopter of node.js. At the time, Google Docs used an algorithm that emulated real time but didn't allow people to edit the same line.

EtherPad did two things differently. First, it used JavaScript on the server, and it used it for long-polling. Back then you would basically use server-side JavaScript as a PHP-like template language, whereas EtherPad was doing long-polling on top of Rhino. This was unlike any implementation of server-side JavaScript that I had ever seen. Second, EtherPad was truly making use of JavaScript on the client; they got the benefits of using a single language on the client and the server.

Let's go on a tangent. Anything related to real-time collaboration comes down to state syncing. In order to merge text changes and produce a result that converges, you can use an algorithm called operational transformation (OT). Google Docs allows two people to collaborate on the same line of text. Each user creates a change set based on a certain shared revision, and then publishes the change set to the other users. These changes can be encoded as JSON. What we want here is intention preservation. To achieve this, the server needs to take the three operations and transform them somehow. To do that, it needs to be smart and realize that the operations happened concurrently. The end result is that we transform the later operations by doing arithmetic on them based on the earlier changes. JavaScript allows you to use the same codebase to execute these operations whether it's on the server or the client.
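
To make the transformation step a bit more concrete, here is a minimal sketch in TypeScript. It is not EtherPad's or Google's actual algorithm: it assumes insert-only operations, and the names `InsertOp`, `applyInsert`, `transformInsert` and the `site` tie-breaker are hypothetical, made up for illustration. It only shows how two concurrent change sets made against the same revision can be shifted so that both orders of application converge.

```typescript
// Hypothetical insert-only operation; real OT systems also handle deletes
// and, in Wave's case, rich-text structure.
interface InsertOp {
  pos: number;  // index in the shared revision where the text goes
  text: string; // text to insert
  site: number; // assumed author id, used only to break ties
}

// Apply a single insert to a document string.
function applyInsert(doc: string, op: InsertOp): string {
  return doc.slice(0, op.pos) + op.text + doc.slice(op.pos);
}

// Transform `op` so that it can be applied after the concurrent `other`.
// If `other` landed at an earlier position (or at the same position with a
// lower site id), shift `op` right by the length of the inserted text.
function transformInsert(op: InsertOp, other: InsertOp): InsertOp {
  const otherGoesFirst =
    other.pos < op.pos || (other.pos === op.pos && other.site < op.site);
  return otherGoesFirst ? { ...op, pos: op.pos + other.text.length } : op;
}

// Two users edit the same line of the shared revision "cat" concurrently.
const baseDoc = "cat";
const opA: InsertOp = { pos: 0, text: "the ", site: 1 };
const opB: InsertOp = { pos: 3, text: "s", site: 2 };

// One replica applies A, then the transformed B.
const viaA = applyInsert(applyInsert(baseDoc, opA), transformInsert(opB, opA));
// Another replica applies B, then the transformed A.
const viaB = applyInsert(applyInsert(baseDoc, opB), transformInsert(opA, opB));
// Both replicas end up with "the cats".
```

Because the functions are plain JavaScript-style code with no dependency on the environment, the same module could in principle run on both the server and the client, which is the point made above.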

But long-polling is awful! The only alternative at the time was iframe streaming. Gmail was getting very popular, and its team found all these undocumented hacks for implementing Ajax on IE and accessing the ActiveX APIs. That was the only alternative to long-polling where we didn't need to display all these progress bars in the browser. Comet wasn't any better; it enforced a paradigm without a really clean API.

EtherPad was acquired by Google for the Wave team. Wave leveraged operational transformation in a way that's even more complicated than what I explained, because they enabled rich-text editing: they added the notion of XML on top of the existing algorithm. They probably made the same realization that I made, which is that long-polling is just terrible for us. The web needed something better than a workaround.

Ian Hickson started working on WebSocket, which began as a series of incremental drafts and took after the work of the creator of Orbited, with the goal of bringing TCP to the web directly. Eventually WebSocket was recognized and documented in RFC 6455. It's a simple API, and it inspired what eventually became socket.io.

The number one problem that we have today is that each browser speaks a different protocol, because different browsers implement different drafts. Even today, the only browsers that implement the RFC completely are Chrome and IE 10. iOS 4.2, Safari and Opera are on older, less secure protocols. These protocols now have to be supported on both ends. Firefox uses MozWebSocket :(
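
On the client side, the prefixed constructor means feature detection before you can even open a connection. A minimal sketch, assuming a hypothetical helper called `pickWebSocketCtor` that is handed the global object (for example `window` in a browser):

```typescript
// Minimal shape of a global object that may expose a WebSocket constructor.
interface SocketGlobals {
  WebSocket?: unknown;
  MozWebSocket?: unknown; // prefixed name used by some Firefox releases
}

// Return the standard constructor if present, then the prefixed one,
// or null when the browser offers neither.
function pickWebSocketCtor(globals: SocketGlobals): unknown {
  return globals.WebSocket || globals.MozWebSocket || null;
}
```

A `null` result is the signal that another transport, such as long-polling, is needed.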

To solve this mess I developed a project called websocket.io, which makes sure WebSocket works under all circumstances. On the server side, you can look at the headers that you're getting and support the different drafts pretty much transparently.
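
As an illustration of what looking at the headers can mean, here is a rough sketch. It is not websocket.io's actual code: the function name `detectDraft` and the draft labels are hypothetical, and it assumes header names have already been lowercased (as Node's HTTP parser does). It relies on the handshake differences between protocol generations: RFC 6455 sends `Sec-WebSocket-Version: 13`, the earlier hybi drafts sent 7 or 8, hixie-76 sent `Sec-WebSocket-Key1` and `Sec-WebSocket-Key2`, and hixie-75 sent no key headers at all.

```typescript
type WsDraft = "rfc6455" | "hybi-07-12" | "hixie-76" | "hixie-75" | "unknown";

// Guess the protocol draft from the upgrade request headers.
// Assumes header names are lowercased.
function detectDraft(headers: Record<string, string | undefined>): WsDraft {
  const version = headers["sec-websocket-version"];
  if (version === "13") return "rfc6455";
  if (version === "7" || version === "8") return "hybi-07-12";
  // hixie-76 has no version header but sends two numbered key headers.
  if (headers["sec-websocket-key1"] && headers["sec-websocket-key2"]) {
    return "hixie-76";
  }
  // hixie-75 only asks for the upgrade, with no key headers.
  if ((headers["upgrade"] || "").toLowerCase() === "websocket") {
    return "hixie-75";
  }
  return "unknown";
}
```

A server built this way would pick a parser and handshake routine per label, so application code never has to know which draft the browser speaks.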

Meanwhile, in the real world we still need a cross-browser, cross-platform thingy that works everywhere and beats anti-virus software. So socket.io was born. We added JSON, events, multiplexing, and more. Eventually it became almost like a standard as more and more people started using it. Gree, Zynga, Yammer, Zendesk, Trello, FlightAware, Cloud9, and other really cool companies use it. Because it's deployed in so many places, we need to be really smart about what we do to it. In 1.0, a lot of work is going into making sure we don't make any mistakes.

Engine.io is currently the most important thing happening in the socket.io realm; the first stable release is going to be tagged today. It has very clear goals:

It's going to upgrade instead of fall back. We'll long-poll everything first, so it connects right away on every browser and every network. The idea is to connect with long-polling and try to connect with WebSocket on the side. Engine.io takes after socket.io in the sense that it connects to a server.
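
The upgrade idea can be sketched as a tiny state machine. This is not Engine.io's API: the class `UpgradingConnection`, its `onProbeResult` method and the `log` array are hypothetical, and the network is left out entirely. The point is only that the connection is usable over long-polling from the start, and a successful probe on the side swaps the transport without the application noticing.

```typescript
type TransportName = "polling" | "websocket";

class UpgradingConnection {
  // Long-polling works everywhere, so the connection starts there.
  transport: TransportName = "polling";
  // Records which transport carried each message (for illustration only).
  log: string[] = [];

  send(msg: string): void {
    this.log.push(`${this.transport}:${msg}`);
  }

  // Called when the WebSocket probe running on the side finishes.
  onProbeResult(ok: boolean): void {
    if (ok) {
      this.transport = "websocket"; // upgrade
    }
    // On failure nothing changes: the polling connection keeps working.
  }
}
```

Compare this with a fallback design, where the client would try WebSocket first and only reach long-polling after a failure or timeout, which delays the first message on restrictive networks.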
