Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Cody A. Ray
May 27, 2014

Over the past several months, I’ve been leading an effort to replace our aging Scribe/MongoDB-based stats infrastructure with a more scalable, cost-effective solution based on Suro, Kafka, Storm, and KairosDB.

Let’s see what each of these pieces gives us:

  • Suro effectively replaces Scribe as the store-and-forward component, enabling us to survive the frequent network partitions in AWS without losing data.
  • We’ve introduced Kafka to serve as a queue between our stats producers and consumers, enhancing the reliability and robustness of our system while enabling easier development of new features with alternative stats consumers.
  • Storm is used to pre-aggregate the data before insertion into KairosDB, which drastically decreases the required write capacity at the database level (see the sketch after this list).
  • We’re replacing MongoDB with KairosDB, which is a time-series database built upon Cassandra. This provides us with high linear scalability, tunable replication, and impressive write-throughput.
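
To make the pre-aggregation idea concrete, here's a minimal sketch of what such a Storm bolt could look like, written against the 0.9-era backtype.storm API. It tallies counts per metric per one-minute bucket in memory and flushes one rolled-up tuple per bucket whenever a tick tuple fires. The field names ("metric", "timestamp"), the one-minute bucket, and the 60-second flush interval are illustrative assumptions, not a description of our exact topology.

```java
import java.util.HashMap;
import java.util.Map;

import backtype.storm.Config;
import backtype.storm.Constants;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

/**
 * Illustrative bolt: counts raw stat events per (metric, minute) bucket in
 * memory and emits one aggregated tuple per bucket on every tick, so the
 * database sees one write per metric per minute instead of one per raw event.
 */
public class MinuteCountBolt extends BaseBasicBolt {

  private final Map<String, Long> counts = new HashMap<String, Long>();

  @Override
  public Map<String, Object> getComponentConfiguration() {
    // Ask Storm to deliver a tick tuple to this bolt every 60 seconds.
    Map<String, Object> conf = new HashMap<String, Object>();
    conf.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 60);
    return conf;
  }

  @Override
  public void execute(Tuple tuple, BasicOutputCollector collector) {
    if (isTickTuple(tuple)) {
      // Flush the current buckets downstream (e.g. to a KairosDB writer bolt).
      for (Map.Entry<String, Long> entry : counts.entrySet()) {
        String[] key = entry.getKey().split("\\|");
        collector.emit(new Values(key[0], Long.valueOf(key[1]), entry.getValue()));
      }
      counts.clear();
      return;
    }

    // Assumed tuple layout: a "metric" name plus an event "timestamp" in millis.
    String metric = tuple.getStringByField("metric");
    long bucket = tuple.getLongByField("timestamp") / 60000L * 60000L;
    String key = metric + "|" + bucket;
    Long current = counts.get(key);
    counts.put(key, current == null ? 1L : current + 1L);
  }

  @Override
  public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("metric", "bucket", "count"));
  }

  private static boolean isTickTuple(Tuple tuple) {
    return Constants.SYSTEM_COMPONENT_ID.equals(tuple.getSourceComponent())
        && Constants.SYSTEM_TICK_STREAM_ID.equals(tuple.getSourceStreamId());
  }
}
```

Counting in the topology rather than in the database means KairosDB only has to absorb one datapoint per metric per minute, which is where the drop in required write capacity comes from. (In a sketch like this, counts not yet flushed can be lost on a worker restart, so the flush interval is a trade-off between durability and write volume.)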

Last week, I discussed the last two components in this pipeline at Gluecon 2014 in Denver.

Title: Building a Scalable Distributed Stats Infrastructure with Storm and KairosDB

Abstract: Many startups collect and display stats and other time-series data for their users. A supposedly simple NoSQL option such as MongoDB is often chosen to get started… which soon becomes 50 distributed replica sets as volume increases. This session is about designing a scalable distributed stats infrastructure from the ground up. KairosDB, a rewrite of OpenTSDB built on top of Cassandra, provides a solid foundation for storing time-series data. Unfortunately, though, it has some limitations: millisecond time granularity and a lack of atomic upsert operations, which make counting (critical to any stats infrastructure) a challenge. Additionally, running KairosDB atop Cassandra inside AWS brings its own set of challenges, such as managing Cassandra seeds and AWS security groups as you grow or shrink your Cassandra ring. Join a deep-dive session where we’ll explore how we’ve used a mix of open-source and in-house tools to tackle these challenges and build a robust, scalable, distributed stats infrastructure.
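
One concrete consequence of the missing atomic upsert: you can’t just fire an “increment counter X” call at KairosDB for every event, which is why the counts get rolled up upstream in Storm and written as plain datapoints. KairosDB’s REST API accepts those writes as a JSON array POSTed to /api/v1/datapoints. Below is a minimal, hand-rolled sketch of that call; the host, metric name, and tag are placeholders, and a real writer would batch points and reuse connections rather than opening one per write.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/**
 * Minimal example of pushing one aggregated count into KairosDB via its
 * REST API (POST /api/v1/datapoints). KairosDB expects a JSON array of
 * metric objects, each with a name, datapoints, and at least one tag.
 */
public class KairosDbWriter {

  private final String baseUrl; // e.g. "http://localhost:8080"

  public KairosDbWriter(String baseUrl) {
    this.baseUrl = baseUrl;
  }

  public void writeCount(String metric, long timestampMillis, long count) throws Exception {
    // One metric, one [timestamp, value] datapoint, one illustrative tag.
    String body = String.format(
        "[{\"name\":\"%s\",\"datapoints\":[[%d,%d]],\"tags\":{\"source\":\"storm\"}}]",
        metric, timestampMillis, count);

    HttpURLConnection conn =
        (HttpURLConnection) new URL(baseUrl + "/api/v1/datapoints").openConnection();
    conn.setRequestMethod("POST");
    conn.setRequestProperty("Content-Type", "application/json");
    conn.setDoOutput(true);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body.getBytes(StandardCharsets.UTF_8));
    }

    // KairosDB responds with 204 No Content when the write succeeds.
    int status = conn.getResponseCode();
    conn.disconnect();
    if (status != 204) {
      throw new RuntimeException("KairosDB write failed with HTTP " + status);
    }
  }

  public static void main(String[] args) throws Exception {
    // Write the count for one minute bucket, e.g. as emitted by the bolt above.
    new KairosDbWriter("http://localhost:8080")
        .writeCount("events.count", System.currentTimeMillis(), 42L);
  }
}
```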

If you want a peek into how these pieces fit together, peep the slides.

View The Slides (and Presenter Notes)

This is still a work-in-progress, so if you have any ideas or suggestions for more effectively tuning or scaling this system, hit me up! :-)

How does your company handle high-volume time-series data?

Originally published at http://codyaray.com on May 27, 2014.
