Here at Chicory we collect and analyze over 100 million data points per month, and that number grows daily! We use this data to measure consumer interaction with our products, which helps us make data-driven decisions about how to improve our product and our operations. Additionally, this data serves as the basis for our ad tech offerings.

Given the volume of data we collect and the rate of growth we experience, a pivotal problem we had to tackle was the classic "build vs. buy" question: whether to pay for commercial analytics and visualization tools or to build on Big Data/analytics cloud services such as AWS's Redshift or Google Cloud's BigQuery. Spoiler alert: we went "build." But I think the discussion around how to do this, and the options to consider, are crucial for startups that are bootstrapped for resources.

We'll dive into some of the options to see how they stack up against one another, but first let's break down what a full analytics solution needs in order to handle this scale of data.

A full analytics solution needs to address the collection, retention, analysis and visualization of data in order to be an effective tool for your business. The analytics market is very crowded, with products ranging from those that address individual parts of this pipeline to full end-to-end setups.

Companies offering full solutions market themselves as "web analytics" or "mobile analytics" companies; these include Mixpanel, Kissmetrics, RJMetrics, Google Analytics and Amplitude. They all provide amazing products that let your team hit the ground running. They have mature web and mobile SDKs (software development kits) that let you start collecting events with a few lines of code, along with stunning visualization tools for time series, segment/cohort and funnel analysis.

If your website or app generates a volume of data that fits in their introductory tiers, their pricing is actually very attractive. Mixpanel lets you process up to 8 million data points for $1,000 per month, Kissmetrics gives you 10 million data points for about $1,700 per month, and both Amplitude and Google Analytics treat you to 10 million events for free! While they all also offer different tiers with various features, your volume of data will ultimately determine your costs.
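To put those introductory tiers on a common footing, here is a quick Python calculation of the effective price per million data points. It uses only the list prices quoted above, which vendors revise over time, so treat the figures as illustrative.

```python
# Effective price per million data points at the introductory tiers quoted above.
# (monthly price in dollars, data points included per month)
TIERS = {
    "Mixpanel": (1_000, 8_000_000),
    "Kissmetrics": (1_700, 10_000_000),
    "Amplitude": (0, 10_000_000),
    "Google Analytics": (0, 10_000_000),
}


def cost_per_million(monthly_price: float, monthly_volume: int) -> float:
    """Return the price in dollars per one million data points."""
    return monthly_price / (monthly_volume / 1_000_000)


for vendor, (price, volume) in TIERS.items():
    print(f"{vendor}: ${cost_per_million(price, volume):.2f} per million")
```

Mixpanel works out to $125 per million data points and Kissmetrics to $170 per million. These rates only hold inside the introductory tiers; past them, pricing is negotiated rather than calculated.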

So what happens when you're processing ten times more data than your tool's threshold, like we are? Expect to deal directly with their sales teams and negotiate a custom contract. That's where things get tricky: these contracts can run from several thousand to tens of thousands of dollars per month, and the price doesn't necessarily grow in proportion to the volume of data.

This is when you weigh your choices. When your company reaches a certain stage of growth, do you keep paying for a plug-and-play solution (in many cases, one that your team members are already comfortable and confident with), or do you take your data back and build a custom analytics tool?

The decision should come down to priorities and top-line metrics. If one of your company's core assets is data, then you should consider building an analytics solution in house. You probably already have a sense of whether this is true for your company, but for us, data informs almost every aspect of our business. At our core, we need to know recipes and ingredients really, really well, which meant "build" was our smartest option.

The leading cloud service providers (Amazon Web Services, Azure and Google Cloud) are all investing heavily in managed solutions that help engineering teams pull together high-volume processing systems like the one an analytics platform needs. Chicory currently operates an in-house solution that uses several AWS products (including Kinesis Firehose, Redshift and DynamoDB) for our data infrastructure. For data visualization, we use the open-source package Metabase, running on Google Compute Engine (GCE).
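To make the collection end of such a pipeline concrete, here is a minimal Python sketch of how events might be batched and sent to a Kinesis Firehose delivery stream through boto3's put_record_batch call. It is an illustration under stated assumptions, not the code behind the system described above: the newline-delimited JSON record format and the helper names (to_record, batch_records, send_events) are hypothetical, and the limits (500 records and about 4 MB per call, 1,000 KiB per record) reflect Firehose's documented quotas at the time of writing, so check the current AWS documentation before relying on them.

```python
import json
from typing import Any, Iterable, Iterator

# Firehose PutRecordBatch quotas (assumed; verify against current AWS docs).
MAX_RECORDS_PER_BATCH = 500
MAX_BYTES_PER_BATCH = 4_000_000        # about 4 MB per call, kept conservative
MAX_BYTES_PER_RECORD = 1_000 * 1024    # 1,000 KiB per record

Record = dict[str, bytes]


def to_record(event: dict[str, Any]) -> Record:
    """Serialize one event as a newline-terminated JSON Firehose record."""
    # The trailing newline keeps events separable after Firehose concatenates
    # records into the S3 objects it stages before loading Redshift.
    data = (json.dumps(event, separators=(",", ":")) + "\n").encode("utf-8")
    if len(data) > MAX_BYTES_PER_RECORD:
        raise ValueError(f"event of {len(data)} bytes exceeds the per-record limit")
    return {"Data": data}


def batch_records(events: Iterable[dict[str, Any]]) -> Iterator[list[Record]]:
    """Group events into batches that respect the per-call count and size limits."""
    batch: list[Record] = []
    batch_bytes = 0
    for event in events:
        record = to_record(event)
        size = len(record["Data"])
        # Flush the current batch before it would exceed either limit.
        if batch and (len(batch) == MAX_RECORDS_PER_BATCH
                      or batch_bytes + size > MAX_BYTES_PER_BATCH):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch


def send_events(client: Any, stream_name: str, events: Iterable[dict[str, Any]]) -> int:
    """Send events through a boto3 Firehose client; return the number of failed records."""
    failed = 0
    for batch in batch_records(events):
        response = client.put_record_batch(DeliveryStreamName=stream_name, Records=batch)
        # A production sender would retry the entries whose RequestResponses
        # carry an ErrorCode; this sketch only counts them.
        failed += response["FailedPutCount"]
    return failed
```

With boto3 installed and AWS credentials configured, you would pass boto3.client("firehose") as client, along with the name of your own delivery stream. Taking the client as a parameter keeps the batching logic testable without touching AWS.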

The entire system costs under $1,000 per month to process over 100 million events, with plenty of room to grow. More importantly, our costs grow linearly with the volume of data we process, so we can predict and control tech budgets effectively. Full control over the entire data pipeline has also let us find synergies between our technology systems: our data helps us detect anomalies in near real time and lets us syndicate data to our strategic partners.
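Because costs scale with volume, budgeting reduces to simple arithmetic. The Python sketch below projects a monthly bill at a future volume. The proportional model is a simplifying assumption (services such as a provisioned Redshift cluster are billed per node, so real costs tend to rise in steps), the optional fixed_cost term is a hypothetical knob for anything that does not scale with volume, and the 250 million event figure is only an example. Since the system costs under $1,000 for over 100 million events, $1,000 per 100 million serves as an upper bound on the per-event rate.

```python
def project_monthly_cost(current_cost: float, current_volume: float,
                         future_volume: float, fixed_cost: float = 0.0) -> float:
    """Project monthly cost at a new event volume.

    Assumes the variable part of the bill scales in proportion to volume;
    fixed_cost covers anything that does not (hypothetical, default 0).
    """
    variable_cost = current_cost - fixed_cost
    return fixed_cost + variable_cost * future_volume / current_volume


# Upper-bound rate from the figures above: $1,000 for 100 million events.
print(project_monthly_cost(1_000, 100_000_000, 250_000_000))  # 2500.0
```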

The decision to build vs. buy is one that is unique to your company's situation. Consider the volume of data you are processing now, the rate at which that volume is growing, and whether that data is a core asset of your company.
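The first two questions can be turned into a rough forecast. This Python sketch estimates how many months remain before your monthly volume outgrows a vendor tier, assuming steady compound growth. The starting volume of 2 million events and the 20% monthly growth rate are hypothetical; the 8 million threshold is Mixpanel's introductory tier quoted earlier.

```python
from typing import Optional


def months_until_threshold(current_volume: float, monthly_growth: float,
                           threshold: float, max_months: int = 120) -> Optional[int]:
    """Return how many months until monthly volume exceeds threshold.

    Assumes volume compounds at a constant monthly_growth rate (0.20 = 20%).
    Returns None if the threshold is not crossed within max_months.
    """
    months = 0
    volume = current_volume
    while volume <= threshold:
        if months >= max_months:
            return None
        volume *= 1 + monthly_growth
        months += 1
    return months


# Hypothetical: 2 million events/month growing 20% monthly vs. an 8 million tier.
print(months_until_threshold(2_000_000, 0.20, 8_000_000))  # 8
```

A simple loop is used instead of solving the logarithm directly so that the boundary case (volume exactly at the threshold) is handled without floating-point surprises.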
