Over the years, many buzzwords have become fashionable across industries. Few have stayed as popular, for as long, as big data. But what is big data, exactly?
Big data refers to a virtual ocean of information from a variety of sources, analyzed and filtered in such a way as to develop meaningful and actionable results.
The process of converting “big data” into meaningful results can appear complicated and difficult. However, once you understand what big data is and how it works, understanding how to make it meaningful doesn’t seem so complicated.
What is Big Data?
When you hear people talk about “big data”, it’s usually with a lot of hand waving and big words. But when you boil away the hyperbole, the “data” is simply many separate streams of input data.
To understand this, an example can help. Let’s say you run an umbrella manufacturing company. Your marketing department is looking for a way to better predict when market demand is about to spike.
Before the days of big data, marketers would study market trends, send out customer surveys, and carry out many other research activities.
They would collect all of that data and store it in their own company’s internal databases. Someone might even be in charge of updating marketing research data on an annual or quarterly basis.
However, the advent of big data expands the capability of conducting this kind of research. In particular, big data is especially effective at identifying important trends or events in near real-time.
Data inputs for this kind of “big data” analysis might include real-time data streams, collected by writing code that plugs into the Application Programming Interfaces (APIs) of the many companies that have made their data public:
- Twitter and Facebook: Identify when and why people are discussing purchasing umbrellas.
- Weather: Identify weather conditions or forecasts that could turn into higher umbrella sales.
- Stock Market: Track seasonal changes in the cost of the raw materials used to produce umbrellas.
- Customer Web Use: Use information from the browser cookies of people who visit the company catalog to understand buying behaviors.
- Customer Purchase History: Track the geography and seasonality of point-of-sale trends from retailers.
To use big data, this company’s marketing team would, in some cases, need to install new technologies.
Big Data and the Internet
This might include Internet of Things (IoT) technology at retailers that tracks and reports on consumer behaviors. Or it might involve having a programmer write the code required to interface with Twitter’s API and pick out any Tweets that mention “umbrellas” or the company name.
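The filtering step can be sketched in a few lines of Python. This is only an illustration: it does not call Twitter’s API, the posts are a hypothetical hard-coded sample, and the company name “RainGuard” is made up for the example.

```python
import re
from typing import Iterable, Iterator

# Hypothetical keywords: the product term plus a made-up company name.
KEYWORDS = ("umbrella", "rainguard")

# Case-insensitive match at the start of a word, so "Umbrellas" also matches.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")",
    re.IGNORECASE,
)


def matching_posts(posts: Iterable[dict]) -> Iterator[dict]:
    """Yield only the posts whose text mentions one of the keywords."""
    for post in posts:
        if PATTERN.search(post.get("text", "")):
            yield post


# Stand-in for a live stream; a real feed would arrive from an API client.
sample_stream = [
    {"user": "a", "text": "Forgot my umbrella again, need to buy one"},
    {"user": "b", "text": "Great coffee this morning"},
    {"user": "c", "text": "Are RainGuard products worth the price?"},
]

kept = list(matching_posts(sample_stream))
for post in kept:
    print(post["user"], post["text"])
```

Because the filter accepts any iterable of posts, the same function could sit behind a real streaming client without changes.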
Each of these technologies is now available thanks to the internet. The internet allows anyone to tap into streams of data from across the globe.
Here is how the setup might work in our example.
This diagram shows how data flows into the company’s “data lake” from many different sources. The incoming data may be structured differently, but the important thing is to collect as much data as possible from all sources.
What is a Data Lake?
Unlike a database, which contains structured data organized in specific columns and rows, a data lake is a massive repository for many different forms of data.
The data that’s stored could be structured or unstructured, meaning it may be organized in rows and columns, or it may not. The data could also be delimited strings, such as comma-separated values. Each data source can submit data to a data lake in whatever form it likes.
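One way to picture this is a store that keeps every payload exactly as it arrived and only interprets it when someone reads it. The sketch below is a toy, in-memory illustration under that assumption. The `DataLake` class, its method names, and the sample payloads are all hypothetical, and a real data lake would sit on distributed storage rather than a Python list.

```python
import csv
import io
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class RawRecord:
    source: str            # where the data came from
    fmt: str               # payload encoding: "json", "csv", or "text"
    payload: str           # the data exactly as the source sent it
    received_at: datetime  # when the lake received it


class DataLake:
    """Toy lake: stores payloads untouched and parses them only on read."""

    def __init__(self):
        self._records = []

    def ingest(self, source, fmt, payload):
        now = datetime.now(timezone.utc)
        self._records.append(RawRecord(source, fmt, payload, now))

    def read(self, source):
        """Parse each stored payload from one source according to its format."""
        for rec in self._records:
            if rec.source != source:
                continue
            if rec.fmt == "json":
                yield json.loads(rec.payload)
            elif rec.fmt == "csv":
                yield from csv.DictReader(io.StringIO(rec.payload))
            else:
                yield {"text": rec.payload}


lake = DataLake()
lake.ingest("weather", "json", '{"city": "New York", "rain_probability": 0.9}')
lake.ingest("point_of_sale", "csv", "region,units\nNY,120\nCA,45\n")
lake.ingest("social", "text", "Need an umbrella before the storm hits")

weather = list(lake.read("weather"))
sales = list(lake.read("point_of_sale"))
social = list(lake.read("social"))
```

Note that nothing is validated at ingest time. Each source keeps its own shape, and the cost of interpreting it is paid by whoever reads the data later.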
Picture a data lake like a massive library that contains many forms of media, like books, images on microfiche, and video on DVDs.
Imagine digital intelligence and data analytics engineers as patrons of that library. These patrons can digitally pull data out of books, microfiche, and DVDs and find ways to mix and combine that data and learn things from how the data correlates.
Out of those learnings comes actual, actionable intelligence. In our example, this might include:
- Chatter on Twitter and Facebook indicates an approaching storm in New York City, with thousands of customers planning to buy umbrellas.
- Browser cookie data and retail checkout records indicate that buyers in California are willing to pay more for designer umbrellas than buyers in Virginia are.
- A large approaching storm pattern indicates most of the East Coast will be covered with a rainstorm for a full week.
All of these learnings could prompt the marketing team to invest in more advertising in the regions where demand for umbrellas is strongest. Manufacturing operations could also shift production to sites closer to where sales are most likely to climb.
In this way, using big data, any company can streamline its marketing and operations.
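As a rough illustration of turning several signals into a decision about where to advertise, the Python sketch below combines per-region signals into a single demand score and ranks the regions. Every number here is invented for the example, and the weights are an assumption; a real team would fit them against past sales.

```python
from collections import defaultdict

# Hypothetical, already-cleaned signals: (region, signal name, value scaled 0..1).
signals = [
    ("New York", "social_mentions", 0.8),
    ("New York", "rain_forecast", 0.9),
    ("California", "social_mentions", 0.3),
    ("California", "rain_forecast", 0.1),
    ("Virginia", "social_mentions", 0.4),
    ("Virginia", "rain_forecast", 0.9),
]

# Assumed weights for the illustration, not values from any real model.
WEIGHTS = {"social_mentions": 0.4, "rain_forecast": 0.6}


def demand_scores(rows):
    """Weighted sum of all signals for each region."""
    scores = defaultdict(float)
    for region, name, value in rows:
        scores[region] += WEIGHTS[name] * value
    return dict(scores)


def rank_regions(rows):
    """Regions ordered from highest to lowest combined demand score."""
    scores = demand_scores(rows)
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_regions(signals)
print(ranking)
```

A weighted sum is the simplest possible way to merge signals. It is used here only to show the shape of the step, not as a recommendation for how demand should be modeled.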
What is Hadoop?
The next question is, how do companies process such high volumes of data and identify trends?
This kind of data crunching requires massive computing resources. So much so that many companies no longer rely on large on-premises mainframe computers like they used to. Much of this computing power is now purchased from the cloud. Frameworks like Apache Hadoop spread the work across many computer nodes in a cluster, which is often rented from a cloud provider. Each of these nodes contributes to the processing power required to analyze massive streams of data from multiple sources.
This kind of processing power is at the heart of machine intelligence and data analytics. Hadoop is the open-source software framework that coordinates this network of machines, dividing storage and computation among the nodes so that digital intelligence engineers can work with the cluster as a single system.
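Hadoop’s best-known processing model is MapReduce, in which each node handles its own chunk of the data and the partial results are then grouped and combined. The Python sketch below imitates that pattern in a single process to show the idea. It does not use Hadoop’s API, and the input lines are hypothetical sales-log entries invented for the example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input, already split into chunks as if stored on different nodes.
chunks = [
    ["NY umbrella", "CA sunscreen", "NY umbrella"],
    ["VA umbrella", "NY raincoat"],
]


def map_chunk(lines):
    """Map step: emit a (region, 1) pair for each line that mentions an umbrella."""
    return [(line.split()[0], 1) for line in lines if "umbrella" in line]


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Reduce step: collapse each key's values into one total."""
    return {key: sum(values) for key, values in grouped.items()}


# On a cluster, each map_chunk call could run on a different node.
mapped = chain.from_iterable(map_chunk(chunk) for chunk in chunks)
totals = reduce_counts(shuffle(mapped))
print(totals)
```

The point of the split is that no map call depends on any other chunk, which is what lets a framework add nodes to handle more data.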
Once the computational engine produces actionable intelligence, it is usually delivered to the company in the form of dashboards or reports.
Big Data isn’t Just Buzzwords
The truth is that “big data” is more than just corporate lingo. Many companies are learning that by making better use of data they can achieve concrete results:
- Manufacturers can improve critical production metrics like yield, quality, and efficiency.
- Retailers can better align marketing, advertising, and business investments based on marketplace signals.
- Distributors are able to predict potential problems in a supply chain to preemptively develop contingency plans.
- News organizations can quickly identify newsworthy events by analyzing public signals on the internet.
- Cybersecurity experts use signals across the internet to identify cyber-attacks while they’re in progress.
While much of what big data has accomplished in recent years remains virtually invisible to the public, it has had a significant impact on everyday life for people across the world.