Is my data big or just slightly rotund?


                    Jim Headley

Jul 31, 2019

Photo by Markus Spiske on Unsplash

So you have several terabytes of data sitting around and someone asks if you are a “Big Data” practitioner. Likely not. Well then, what is Big Data, and how can I get me some and leverage it to drive my business forward?

Let’s start with a definition. Oxford Dictionary defines Big Data as “Extremely large data sets that may be analysed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.” O.K. Still a little too ambiguous for you? Me too.

The term Big Data underlies the central concern that we at MacLaurin Group have expressed. Data is a single, important ingredient in creating a data-driven culture. Let’s examine cake for a moment. You need flour to make a cake (ignoring the comments from the gluten-free contingent), but a cake is not just flour. The full cake experience demands additional ingredients. Let’s see if we can create a usable recipe for the ultimate objective: actionable analytic insight.

Easy as…cake?

Photo by Alina Karpenko on Unsplash

We will stick with the whole cake analogy. I only hope there are enough ingredients to support icing in this metaphorical confection. The data in Big Data is a very important ingredient and has some fundamental characteristics, as defined by Doug Laney’s three V’s: variety, volume and velocity.

Variety

Let’s call this salt. Variety, as it turns out, is not only the spice of life, it is also important in the formulation of Big Data. Variety refers to the ever-increasing sources, forms and contexts associated with Big Data. Most are familiar with the data that organizations create in the course of doing business. The majority of this data exists in Master Data Stores or Data Warehouses. This data is typically structured and heavily manicured through batch processing. If this is all of the data you possess, your data is likely just rotund. To get Big we need to carbo load. There are lots of carbs in cake, so Big Data should also include unstructured data. Unstructured data is acquired through sources such as video, audio, images, mobile, social media, the Internet of Things (IoT), web interactions, etc. These types of data exist in many different formats and do not lend themselves to the type of curation necessary for inclusion in a structured data environment.

Velocity

This feels like sugar. This is a real differentiator. Big Data is not just big, it is also fast. Like drinking out of the proverbial fire hose. Many of the sources mentioned in Variety above have a near real-time or real-time aspect. That is important. Real-time data enables a business to react to customers while they are still engaged with it.

If you want to be able to anticipate customer needs in that critical moment, then you need up-to-the-minute insight into how they are engaging and/or what they are purchasing, as well as historical information. Because this data is time sensitive, it does not lend itself to the manipulation necessary to incorporate it into a structured data environment (data warehouse). If you waited for it to be loaded into the data warehouse first, you would not be able to interact in real-time.
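For the more hands-on readers, here is a minimal sketch of what "up-to-the-minute" can mean in practice: keep a short sliding window of each customer's most recent events in memory, separate from the warehouse. Everything in it (the `RecentActivity` class, the 60-second window, the event names) is a hypothetical illustration, not a description of any particular product.

```python
from collections import defaultdict, deque


class RecentActivity:
    """Keeps only the events from the last `window_seconds` for each customer.

    Hypothetical helper for illustration; timestamps are plain seconds.
    """

    def __init__(self, window_seconds=60):
        self.window_seconds = window_seconds
        # customer_id -> deque of (timestamp, action), oldest first
        self._events = defaultdict(deque)

    def record(self, customer_id, timestamp, action):
        """Store a new event as it arrives from the stream."""
        self._events[customer_id].append((timestamp, action))

    def recent(self, customer_id, now):
        """Return the actions still inside the window, dropping older ones."""
        events = self._events[customer_id]
        while events and events[0][0] <= now - self.window_seconds:
            events.popleft()
        return [action for _, action in events]


# Made-up events for one customer
tracker = RecentActivity(window_seconds=60)
tracker.record("c1", timestamp=0, action="view:shoes")
tracker.record("c1", timestamp=50, action="cart:shoes")
print(tracker.recent("c1", now=70))  # ['cart:shoes']
```

The historical side of the picture would still come from the warehouse; a window like this only covers the "critical moment" part.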

Volume

Photo by Sharon McCutcheon on Unsplash

Baking powder. I’m struggling to keep this analogy afloat now. Dang, I should have gone nautical. Traditionally, companies were their own source of data. This has changed. The mountains of data that are being generated today are not just the result of a customer’s engagement with a company. They also come from an organization’s various professional affiliations and are generated by the customers themselves. All that IoT and social media data takes up a lot of space. It also provides incredible insight into customer preferences.

Technology

We’ll call this milk. The whole reason Big Data is consumable is that there are now products that can deal with really large quantities of unstructured data. Products like Apache Spark, Amazon EMR, and Azure HDInsight make processing these large data sets possible. Additionally, who asks if we should be cutting back on data storage now that it costs $0.001 per gigabyte? Nobody! Keep it all and let the analysts sort it out! The Big Data technology topic really requires its own analogy to do it justice, so we will save that for another day.
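To put that storage price in perspective, here is a quick back-of-the-envelope sketch. It assumes the $0.001 figure quoted above is a price per gigabyte per month (the billing period is my assumption, not something stated here) and it ignores retrieval and processing charges. The function name is hypothetical.

```python
GB_PER_TB = 1000  # decimal terabytes, the way storage is usually priced


def monthly_storage_cost(terabytes, price_per_gb=0.001):
    """Estimate the monthly storage bill in dollars.

    Assumes a flat price per gigabyte per month; real pricing has tiers
    and extra charges that this sketch leaves out.
    """
    return terabytes * GB_PER_TB * price_per_gb


# "Several terabytes sitting around" comes to a few dollars a month.
for size_tb in (5, 100):
    print(f"{size_tb} TB: ${monthly_storage_cost(size_tb):.2f} per month")
```

Under those assumptions, keeping everything is cheap. Making sense of it is the expensive part.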

Analysts

We’ll call this icing. We made it. It was touch-and-go there for a little while. Cake without icing is little more than sweet bread. In Big Data parlance, this would be analytic rigor. You can have morbidly obese data and spend buckets of cash on technology, but in the end, if you lack the analytic resources and a data-driven culture to support the creation of actionable insight, you have an upset executive team with little to show for their investment. This is where the vast majority of companies seeking analytic nirvana fall down. They focus exclusively on the “How” (the three V’s and Technology) and dedicate little or no energy to the “Why” (insights that affect outcomes). Don’t get me wrong, the “How” is an important step, but this cake is only half baked without the “Why”.

Let’s wrap this thing up before the cake analogy goes stale.

The question should not be “Do I have big data?” It should be, “Do I foster a culture that values data as a means to an end?”

The end in this case is analytic rigor that yields actionable insights that align with an organization’s strategic objectives. If you lack a data-driven culture and are currently throwing your flour, I mean data, away, stop. While you may not be ready to preheat the oven, you can at least begin gathering the ingredients.