Notes from Streaming at Scale

 

These are my notes from the Streaming at Scale talk given on July 3, 2012, at NodeConf 2012.

Streams are useful for achieving backpressure. You've got some work to do and you can't get it done in time; what do you do? Streams are supposed to help you solve this problem, and they do help. They're better than nothing. But the abstraction hides a big problem: streams are fire and forget. You create a stream, pipe it, and then leave it alone. In the real world, however, all sorts of things are conspiring against you. If something gets stuck, that's not really exposed in the Stream/pipe API. The actual pause/resume buffering logic isn't always ideal on the actual internet.
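
To make the pause/resume logic concrete, here is a minimal sketch of roughly what pipe() does on your behalf: stop writing when write() returns false, and continue when the 'drain' event fires. It assumes a current Node.js stream API. The names writeAll and slowSink are hypothetical, and the 4-byte highWaterMark is an arbitrary value chosen only to make the pause easy to trigger.

```javascript
const { Writable } = require('stream');

// Write every chunk to `dest`, pausing whenever write() returns false
// and resuming only after the 'drain' event fires.
function writeAll(dest, chunks, done) {
  let i = 0;
  let pauses = 0; // how many times we had to wait for 'drain'

  function next() {
    while (i < chunks.length) {
      const ok = dest.write(chunks[i++]);
      if (!ok && i < chunks.length) {
        pauses++;
        dest.once('drain', next);
        return;
      }
    }
    dest.once('finish', () => done(pauses));
    dest.end();
  }

  next();
}

// Hypothetical stand-in for a slow client: a tiny buffer and a write
// that completes on a later turn of the event loop.
function slowSink(received) {
  return new Writable({
    highWaterMark: 4, // bytes; assumed value for illustration
    write(chunk, encoding, callback) {
      received.push(chunk.toString());
      setImmediate(callback);
    },
  });
}
```

Note what the sketch does not cover: if 'drain' never fires because the far end is stuck, nothing here notices. That gap is the one described above.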

These days, the actual internet is all about mobile phones. On the actual internet, you can't control anything between your server and the user's phone: helpful proxies, routers, and buffers all sit in the path. Like anything interesting in life, this is about tradeoffs.

It's easy to schedule new work and not think about when it will be done. One reason is that it's fun! Network programming is hard, and you get to do it in JavaScript, which makes a lot of it easier! Node is sort of like a credit card. You buy things and don't have to deal with change or keep cash with you. But if you don't pay your bill at the end of the month, you get socked with finance charges. Node is like this! It's hard to know if you're getting in over your head, which I think is the hardest problem with distributed systems in general.

What if your disks are too busy? What if your streams are stacking up and no data events are being emitted? Maybe you're stacked up behind Redis? All these queues can also use up memory. I'm not sure exactly what to do on the disk issue, but you can deal with many of the other issues.

Redis
I put in this crude high water/low water mark where your Redis operation will return false if there are too many pending, unacknowledged requests. It's tricky because people assume Redis will always work. The true/false return value is in there, but it's only advisory.
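
The high water/low water idea can be sketched without any Redis client at all. The class below is a hypothetical illustration, not the actual code in any Redis library: it only counts commands that have been sent but not yet acknowledged, and the thresholds are assumed values the caller picks.

```javascript
// Hypothetical helper, not part of any Redis client: it only tracks
// requests that have been sent but not yet acknowledged.
class PendingGauge {
  constructor(highWater, lowWater) {
    this.highWater = highWater;
    this.lowWater = lowWater;
    this.pending = 0;
    this.shouldBuffer = false; // true while the caller is advised to back off
  }

  // Call when a command is sent. Returns false as an advisory signal
  // that too many commands are still unacknowledged.
  sent() {
    this.pending++;
    if (this.pending >= this.highWater) this.shouldBuffer = true;
    return !this.shouldBuffer;
  }

  // Call when a reply arrives. The signal clears only once the backlog
  // falls to the low-water mark, which avoids flapping near the limit.
  acked() {
    this.pending--;
    if (this.pending <= this.lowWater) this.shouldBuffer = false;
  }
}
```

As in the talk, the return value is advisory: nothing stops a caller from ignoring false and sending more work anyway.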

HTTP
Node's HTTP client API does not make concurrency limits very easy. You can http.request() your way right into a huge concurrency problem. You might be tempted to change maxSockets, but this just moves the problem around. If it's too low, things will get queued up inside of Node; if it's too high, you can run out of memory or exhaust other resources. It's a tricky problem. We have a library to help us manage this. We have to deal with connections timing out, and there's no obvious way to affect this. Coulee will manage your sockets.
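
One way to picture the extra layer is a small limiter that caps both the work in flight and the work waiting, and refuses anything beyond that instead of queueing without bound. This is a sketch under assumptions, not the library mentioned above: Limiter, submit, maxActive, and maxQueued are all hypothetical names.

```javascript
// Hypothetical limiter: at most `maxActive` tasks run at once, at most
// `maxQueued` wait, and anything beyond that is refused immediately.
class Limiter {
  constructor(maxActive, maxQueued) {
    this.maxActive = maxActive;
    this.maxQueued = maxQueued;
    this.active = 0;
    this.queue = [];
  }

  // `task` receives a `done` callback and must call it exactly once.
  // Returns false when the task was refused because the queue is full.
  submit(task) {
    if (this.active < this.maxActive) {
      this.start(task);
      return true;
    }
    if (this.queue.length < this.maxQueued) {
      this.queue.push(task);
      return true;
    }
    return false;
  }

  // Run a task now and, when it finishes, pull the next one off the queue.
  start(task) {
    this.active++;
    task(() => {
      this.active--;
      const next = this.queue.shift();
      if (next) this.start(next);
    });
  }
}
```

A task would typically wrap an http.request() call and invoke done when the response ends or fails. The bounded queue is the design choice that matters: a false return tells the caller right away that the system is over capacity, rather than letting the backlog grow silently in memory.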

Hard problems are very hard. This is a hard problem with no obvious easy answers! There are some things you can do here and there, but you really have to understand the internals of what's happening in Node to even approach this problem. It almost violates the abstraction that Node provides; you have to build another layer in places to remember how much work you're actually doing. The subtlety of when things break and how they break is hard, which is part of the reason that we write most of our own libraries.
