Why is Twitter using XMPP/Jabber protocols?

May 21st, 2008 by mgkimsal

This is probably going to make me look dumb, but that’s never stopped me from posting before.  I was reading another article this morning about how Twitter’s been down *again*, which got me thinking about other options, and then about the technical hurdles facing Twitter.  One thing I’ve read in a few places is that Twitter uses XMPP (Jabber’s protocol) for pushing data around.  I’m oversimplifying here, partially because that’s about as much as I know :)   I was able to get Twitter to send me an XMPP subscription to their full feed, and I set up a Jabber server to listen and collect the data for a while.  It’s an interesting concept, and perhaps that’s why they did it that way: to allow for a truer ‘publishing’ model.  However, it seems that the overhead of that process might be causing them problems.

Am I too naive in thinking that straight polling would be easier, more scalable and perhaps more efficient?  Publishing involves having to keep track of listeners and sending information (writing) multiple times.  A straight polling approach would convert the app to more of a ‘read’-heavy app – something which most databases are very good at.  Why is XMPP in the mix at all at Twitter?
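To make the trade-off a bit more concrete, here is a rough back-of-the-envelope sketch. It is a toy model that only counts operations. The function names, follower counts and polling rates are all made up for illustration and say nothing about how Twitter actually works.

```python
# Toy model (hypothetical numbers): compare how many operations a
# push ("publish") design and a polling design would perform.

def push_cost(followers_per_author, posts_per_author):
    """Deliveries (writes) when every post is sent to every follower."""
    return sum(
        followers_per_author[author] * posts_per_author.get(author, 0)
        for author in followers_per_author
    )

def poll_cost(num_readers, polls_per_reader):
    """Requests (reads) when every reader polls a fixed number of times,
    whether or not anything new has been posted."""
    return num_readers * polls_per_reader

# Made-up example: one popular author, one quiet author.
followers = {"alice": 1000, "bob": 10}
posts = {"alice": 5, "bob": 2}

push_writes = push_cost(followers, posts)
poll_reads = poll_cost(num_readers=1010, polls_per_reader=60)
print(push_writes, poll_reads)
```

With these invented numbers the push side does 5,020 deliveries and the polling side does 60,600 reads. The counts alone don’t settle anything, since a cached read is usually much cheaper than a write, and the balance shifts completely depending on how often people post versus how often clients poll.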

Again, yes, there’s probably a perfectly good technical rationale for this, but since the technical infrastructure has seemed to remain largely a secret (yes we know they use Ruby on Rails and that couldn’t possibly have any bearing on their scaling issues because they’ve solved that scaling issue over and over…) I’m just guessing here.  Can someone more informed than me shed more light on this?  Thanks.  :)


3 comments

  1. If they were running XMPP in the backend, scaling it would be trivial (just partition the userbase over many domains with a hashing function and have each domain handled by a separate server), so I very much doubt they depend on it for their core infrastructure. More likely they just gateway to it for external use.

    In fact, there are no good reasons for Twitter to go down as much as they do – their scalability is trivially solved by partitioning (even if it leads to more data duplication ultimately, because you’d need to duplicate “tweets” on each partition where there’s a follower for it to be effective). It’s just not a hard problem to scale the moment you’ve accepted you need to partition the data.

    Besides, your network costs with polling are larger (the polling requests need to be received too), and the cost of keeping the list of listeners is trivial. It scales close to linearly with replication/chaining, and you’re likely to run into CPU and network limits long before you run into RAM limits over the number of connections on a properly tuned system.

  2. I don’t think they’re using XMPP on the backend. They have a front-end bot at twitter@twitter.com that can send and receive updates once you link your account to your Jabber ID.

    I get Twitter notifications via IM from that bot, and I’m pretty sure they don’t use XMPP as a messaging protocol between their servers.

  3. mgkimsal says:

    OK guys – you’re probably correct. Not sure where I inferred that XMPP was used internally – it’s probably only for external interactions.
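
The partitioning idea from the first comment (hash each user onto one of several servers, and copy a tweet to every partition that holds one of its followers) can be sketched roughly as below. This is only an illustration of the general technique. The function names and the choice of SHA-256 are assumptions, not anything Twitter is known to do.

```python
import hashlib

def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user to a partition with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then take the modulus.
    return int.from_bytes(digest[:8], "big") % num_partitions

def partitions_for_tweet(follower_ids, num_partitions):
    """Partitions that need a copy of a tweet: one per partition
    that holds at least one follower (this is the data duplication)."""
    return {partition_for(f, num_partitions) for f in follower_ids}

# Made-up follower names, spread over 4 partitions.
targets = partitions_for_tweet(["carol", "dave", "erin"], num_partitions=4)
print(sorted(targets))
```

The same user always lands on the same partition, so each server only has to handle its own slice of the userbase. The cost is that a tweet may be stored once per partition rather than once overall.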
