What is Big Data, and why should you care?
With big data now in widespread use, you are probably aware of its major functions, processes, uses, and overall importance. In August 2015, it was dropped from Gartner’s 2015 Hype Cycle for Emerging Technologies, which created a huge buzz in the tech-driven world.
If you are not especially tech-savvy and have missed out on crucial information about big data, this write-up covers what you need to know at the outset to understand the technology better.
Big Data – What does it mean?
As Gartner defines it – “Big Data are high volume, high velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.”
Let's dig deeper and understand this in simpler terms.
The term ‘big data’ is largely self-explanatory: it describes collections of data sets so large that conventional computing techniques cannot process them. The term refers not only to the data itself but also to the frameworks, tools, and techniques used to handle it.
Technological advances, new channels of communication (such as social networking), and newer, more powerful devices have challenged industry players to find new ways to handle the resulting data.
From the beginning of time until 2003, the entire world had produced only five billion gigabytes (five exabytes) of data. The same amount was generated in just two days in 2011, and by 2013 it was being generated every ten minutes. It is therefore not surprising that 90% of all the data in the world was generated in the past few years.
All this data is useful when processed, but much of it went unused before the concept of big data came along.
Pro-Tip: To learn more about Big Data and get your foot in the Data Science industry door, consider professional certification training in Big Data or in allied technologies, such as Impala, Cassandra, Spark and Scala.
The Major Sources of Big Data
- Black Box Data: This is the data generated by airplanes, including jets and helicopters. Black box data includes flight crew voices, microphone recordings, and aircraft performance information.
- Social Media Data: This is data generated on social media sites such as Twitter, Facebook, Instagram, Pinterest, and Google+.
- Stock Exchange Data: This is data from stock exchanges about customers’ decisions to buy and sell shares.
- Power Grid Data: This is data from power grids. It holds information such as the power consumed at particular nodes.
- Transport Data: This includes a vehicle’s model, capacity, availability, and distance traveled.
- Search Engine Data: This is one of the biggest sources of big data. Search engines retrieve vast amounts of data from many different databases.
Additionally, Bernard Marr, a Big Data and Analytics expert, has compiled a list of 20 Big Data sources that are freely available to everybody on the web. Some of them are summarized here.
- Data.gov – where all of the US Government’s data is freely accessible, and information ranging from climate to crime is available.
- Similarly, the UK Government’s portal, Data.gov.uk, offers data including metadata on all UK books and publications since 1950.
- There is also the US Census Bureau, which provides population, geographic, and other data. Similar to this is the European Union Open Data Portal, which offers data from European Union institutions.
- Closer to our own interests, Facebook provides the Graph API, which allows applications to query the information that Facebook users share publicly.
- In the healthcare sector, there is the Healthdata.gov and NHS Health and Social Care Information Centre, from the US and the UK, respectively.
Google Trends, Google Finance, and Amazon Web Services public datasets are all similar examples.
From these examples, it is clear that big data is not only about volume; it also involves a wide variety of data arriving at high velocity. In 2001, industry analyst Doug Laney articulated these as the 3 Vs of big data: velocity, volume, and variety.
Velocity: Data now streams in at unprecedented speed, which makes it difficult to handle in a timely fashion. Smart meters, sensors, and RFID tags generate torrents of data that must be dealt with in near real time, and most organizations find it hard to react to data that quickly.
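To illustrate what handling data in near real time can mean, here is a minimal Python sketch that processes a stream of readings one at a time using a fixed-size sliding window. The meter readings, the window size, and the alert threshold are hypothetical values chosen for illustration; production systems typically rely on a dedicated stream-processing framework rather than hand-written code like this.

```python
from collections import deque


def rolling_alerts(readings, window=3, threshold=5.0):
    """Yield (position, window_average) whenever the average of the last
    `window` readings exceeds `threshold`.

    Each reading is examined once, as it arrives, so memory use stays at
    `window` values no matter how long the stream runs.
    """
    recent = deque(maxlen=window)  # the oldest reading drops out automatically
    for position, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            average = sum(recent) / window
            if average > threshold:
                yield position, average


# Hypothetical smart-meter readings (kWh per interval), arriving one by one.
meter_stream = iter([1.0, 2.0, 3.0, 8.0, 9.0, 2.0, 1.0])
alerts = list(rolling_alerts(meter_stream))
print([position for position, _ in alerts])  # [4, 5]
```

Because `rolling_alerts` is a generator, it behaves the same whether `readings` is a short list or an unbounded live feed.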
Volume: Not many years ago, having too much data was simply a storage problem. With larger storage capacities and lower storage costs, industry players are now focusing on how relevant data can create value.
Variety: There is a greater variety of data today than there was a few years ago. Data is broadly classified as structured (relational data), semi-structured (such as XML files), or unstructured (such as media logs and PDF, Word, and text files). Many companies must grapple with governing, managing, and merging these different varieties.
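To make the three categories concrete, here is a minimal Python sketch that reads one small sample of each kind using only the standard library. The customer IDs, order contents, and review text are invented for illustration; the point is how much work each format needs before it can be queried.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import Counter

# Structured: fixed columns, like rows in a relational table (hypothetical data).
structured = "customer_id,country,total\n101,US,59.90\n102,UK,12.50\n"
rows = list(csv.DictReader(io.StringIO(structured)))

# Semi-structured: self-describing tags, but fields can vary between records
# (order 102 has a gift note that order 101 lacks).
semi_structured = """<orders>
  <order id="101"><item>book</item><item>lamp</item></order>
  <order id="102"><item>pen</item><gift_note>Happy birthday</gift_note></order>
</orders>"""
root = ET.fromstring(semi_structured)
items_per_order = {
    order.get("id"): [item.text for item in order.findall("item")]
    for order in root.findall("order")
}

# Unstructured: free text with no schema; any structure must be extracted.
unstructured = "Great lamp, fast delivery. The lamp is bright but the box was damaged."
words = [word.strip(".,").lower() for word in unstructured.split()]
word_counts = Counter(words)

print(rows[0]["country"])        # US
print(items_per_order["101"])    # ['book', 'lamp']
print(word_counts["lamp"])       # 2
```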
Veracity (the quality of the data), variability (the inconsistency that data sometimes shows), and complexity (the difficulty of linking and managing large volumes of data from different sources) are other important characteristics of big data.
A few of the many merits…
- Today’s consumers are very demanding. They talk to past customers on social media and compare options before buying. Customers want to be treated as individuals and to be thanked after a purchase. Big data gives you actionable information that you can use to engage with each customer one-on-one in real time.
- Big data helps you redevelop the products or services you sell. Information on what others think about your products, for example from unstructured text on social networking sites, feeds directly into product development.
- Big data allows you to test different variations of CAD (computer-aided design) images to determine how minor changes affect your process or product. This makes big data invaluable in manufacturing.
- Predictive analytics can keep you ahead of your competitors. Big data makes this possible by, for example, scanning and analyzing social media feeds and newspaper reports. It also lets you run health checks on your customers, suppliers, and other stakeholders to reduce risks such as default.
- Big data helps keep data safe. Big data tools let you map your company’s data landscape, which helps in analyzing internal threats. For example, you will know whether your sensitive information is protected. More specifically, you will be able to flag the emailing or storage of 16-digit numbers, which could be credit card numbers.
- Big data allows you to diversify your revenue streams. Analyzing big data can give you trend-data that could help you come up with a completely new revenue stream.
- Your website needs to be dynamic to compete in the crowded online space. Big data analysis helps you personalize the look, feel, and content of your site for each visitor based on attributes such as nationality and gender. An example of this is Amazon’s IBCF (item-based collaborative filtering), which drives features such as “Customers who bought this item also bought” and “Frequently bought together”.
- If you run a factory, big data means you no longer have to replace equipment based on how many months or years it has been in use. That approach is costly and impractical because different parts wear at different rates. Big data lets you spot failing devices and predict when they should be replaced.
- Big data is important in healthcare, one of the few industries still relying on a generalized, conventional approach. For example, a cancer patient typically goes through one therapy and, if it does not work, the doctor recommends another. Big data makes it possible to tailor a cancer patient’s medication to his or her individual genes.
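The credit-card example in the security point above can be sketched in a few lines of Python. This is a minimal illustration, assuming that a candidate is any run of 16 digits (optionally grouped in fours by spaces or hyphens) that also passes the Luhn checksum used by payment card numbers; the function names are hypothetical, and real data-loss-prevention tools apply many more rules than this.

```python
import re

# 16 digits, optionally grouped in fours by spaces or hyphens,
# and not part of a longer run of digits.
CANDIDATE = re.compile(r"(?<!\d)(?:\d{4}[ -]?){3}\d{4}(?!\d)")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def flag_card_numbers(text: str) -> list[str]:
    """Return the 16-digit sequences in `text` that could be card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


message = "Card: 4111 1111 1111 1111, ref 1234 5678 9012 3456."
print(flag_card_numbers(message))  # ['4111111111111111']
```

The Luhn check filters out arbitrary 16-digit references (such as the second number in the sample message) that the pattern alone would flag.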
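The idea behind item-based collaborative filtering, mentioned in the website personalization point above, can be shown with a toy example. Amazon’s production system is proprietary, so this is only a minimal Python sketch of the general technique: each item is represented by the set of customers who bought it, and items are compared with cosine similarity. The purchase history and function names are hypothetical.

```python
from math import sqrt

# Hypothetical purchase history: customer -> set of items bought.
purchases = {
    "alice": {"camera", "tripod", "sd_card"},
    "bob": {"camera", "sd_card"},
    "carol": {"tripod", "lamp"},
    "dave": {"camera", "tripod"},
}


def item_similarity(a: str, b: str) -> float:
    """Cosine similarity between the binary 'bought by' vectors of two items."""
    buyers_a = {customer for customer, items in purchases.items() if a in items}
    buyers_b = {customer for customer, items in purchases.items() if b in items}
    if not buyers_a or not buyers_b:
        return 0.0
    return len(buyers_a & buyers_b) / sqrt(len(buyers_a) * len(buyers_b))


def recommend(item: str, top_n: int = 2) -> list[tuple[str, float]]:
    """Items most similar to `item`: 'customers who bought this also bought'."""
    all_items = set().union(*purchases.values())
    scores = [(other, item_similarity(item, other)) for other in all_items if other != item]
    scores = [pair for pair in scores if pair[1] > 0]
    scores.sort(key=lambda pair: (-pair[1], pair[0]))  # best first, ties by name
    return scores[:top_n]


print([name for name, _ in recommend("camera")])  # ['sd_card', 'tripod']
```

Because similarities are computed between items rather than between customers, they can be precomputed offline and reused for every visitor, which is one reason the item-based approach scales well to large catalogs.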
So have we decoded the enigma of big data for you? Still curious to know more? Here’s another write-up – Is Big Data Overhyped?, that dives deeper into the importance of the technology and the hype factors that surround the domain.
To know more on the Big Data Hadoop Administrator course offered by Simplilearn, click here.