The amount of data in the business world continues to grow rapidly every year. The McKinsey Global Institute forecasts that global data volumes will grow at a compound annual rate of 40% between 2009 and 2020. This growth is broadly explained by the importance of data in supporting decision making. Every day, organizations collect large amounts of data about their clients, employees, competitors and the markets in which they operate. Beyond the raw data lies the potential to create and develop business capabilities and competencies.
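To make the compounding concrete, the short sketch below applies the 40% annual rate quoted above over the 11 yearly steps from 2009 to 2020. The function name and the normalized starting volume of 1.0 are illustrative assumptions, not figures taken from the forecast.

```python
def compound_growth(start: float, annual_rate: float, years: int) -> float:
    """Return the volume after compounding `annual_rate` for `years` years."""
    return start * (1.0 + annual_rate) ** years


# Normalized starting volume of 1.0 in 2009 (an illustrative assumption).
growth_multiple = compound_growth(1.0, 0.40, 2020 - 2009)
print(f"Growth multiple 2009-2020: {growth_multiple:.1f}x")
```

Under these assumptions the volume ends up roughly forty times its starting level, which shows how quickly a steady annual rate adds up.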
The real art lies in mining these data sets to unearth critical information on how various stakeholders perceive the organization and on how to develop better products that satisfy customers. Going forward, this is likely to become the standard way of doing business. Big data is now an established discipline in both academia and business practice. Multimedia platforms, social media and the internet will continue to drive the growth of data and information in organizations.
Characteristics of big data
While most organizations focus on the volume of data generated every year, other characteristics of data are equally important. Velocity refers to the speed at which data streams into an organization. With the emergence of social media platforms, customers can give feedback faster than ever before.
This influx of opinions can help organizations develop better customer management capabilities. Another characteristic is variety: data now arrives in many formats, and modern data methods are able to capture more types of information as new services are deployed.
Finally, the value of data is equally important. Data mining is the process by which organizations separate what is important from what is not. There is little doubt about the growing need to capture as much data as possible and to unlock the information hidden in it.
Big data continues to help create competitive organizations that are able to generate and capture more value, and it has become a key theme in various industries. It is worth noting, however, that what we consider big data today will probably look very small in 10 to 20 years, as the volume of data is expected to keep growing rapidly.
Redefining big data
For a long time, organizations relied on traditional data storage platforms to make decisions. Most decision making drew on transactional data captured in organizational databases. The development of non-traditional, less structured data has therefore been of great significance. In recent times, there has been massive growth in data from weblogs, email, social media, sensors, and photographs.
These are crucial data sources that can be mined for critical insights into business management. They can also be efficient and less costly to use. As a result, many organizations are quickly incorporating them into their business intelligence networks. In a nutshell, big data comprises traditional enterprise data, machine-generated data, and social data. Investing in the right tools is important in order to derive maximum value from big data. From capturing transactions to performing complex analysis, organizations need to develop new insights and unearth new business relationships.
The big data advantage
A study conducted by the McKinsey Global Institute (MGI) and McKinsey's technology office is instructive. The study covered five major domains: healthcare in the United States, the public sector in Europe, retail in the United States, manufacturing, and personal location data globally. The underlying conclusion was that big data has the potential to create more value in each of them. In the retail sector, the study found that a retailer making full use of big data has the potential to improve its operating margin by more than 60 percent.
The United States healthcare system could create more than $300 billion in value a year if big data is applied effectively. Similarly, the developed economies of Europe could save $149 billion in service delivery and substantially reduce fraud, errors, and corruption. Alongside these advantages, organizations and governments need to consider how the data is stored.
Security threats and pitfalls mean that these data sets must be well guarded against any form of intrusion. As such, organizations need to constantly re-evaluate their data storage rules in the big data era. The significance of big data shows up in five major ways. First, as discussed earlier, big data can unlock immense value by making information more transparent and available more quickly, which speeds up decision making.
Additionally, managers and employees gain a better understanding of their businesses. This is likely to lead to increased productivity, greater innovation and a competitive advantage, all of which have a positive impact on the top and bottom lines of an organization. A good example is the use of home monitoring devices in healthcare. These devices help improve the delivery of healthcare services.
Patients can be monitored from home, so they do not have to visit the hospital as frequently. Manufacturing companies attach sensors to their products to monitor how customers use them. These companies receive usage patterns and failure rates, which lay the foundation for further improvements. The extensive use of mobile phones and GPS-enabled devices allows marketers and advertisers to reach consumers when they are close to their stores, which helps organizations boost their revenue streams. Second, since organizations are continually collecting information about their products and services, it becomes much easier to develop detailed performance reports that expose weak areas in the organization.
Data can also be used to simulate the organization under different circumstances, which can improve decision making through forecasting and nowcasting techniques. Third, it enables organizations to segment their clients more narrowly and hence develop better-tailored products for them. Fourth, it allows for the creation of more sophisticated business analytics tools. Finally, the data can be used to further develop products and service offerings.
Developing big data capabilities
Developing a big data platform requires special considerations. The recent spate of system hacks and security incidents requires developers to carefully navigate the intricate chains involved in order to build a sound platform that captures data while maintaining the integrity of the system. In developing the platform, attention needs to be paid to three key areas: data acquisition, data organization and data analysis. The acquisition layer should be able to capture huge streams of data.
Big data involves higher velocity and higher variety than traditional data. The platform should deliver predictable latency both in capturing data and in executing simple queries. More importantly, it should be able to accommodate fairly high volumes of transactions, which are usually distributed across many locations. Data organization involves integrating the captured data in a form that can be easily analyzed. For example, one may wish to cluster all the customers according to some specified criterion.
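As a minimal illustration of organizing customers by a specified criterion, the sketch below groups made-up customer records into spending tiers. The field names, thresholds and tier labels are all assumptions chosen for illustration, and this is simple rule-based grouping rather than a statistical clustering algorithm.

```python
from collections import defaultdict

# Made-up customer records; the field names are illustrative assumptions.
customers = [
    {"name": "Alice", "annual_spend": 1200.0},
    {"name": "Bob", "annual_spend": 150.0},
    {"name": "Chen", "annual_spend": 640.0},
    {"name": "Dana", "annual_spend": 90.0},
]


def spend_tier(annual_spend: float) -> str:
    """Assign a tier from annual spend; the thresholds are arbitrary examples."""
    if annual_spend >= 1000:
        return "high"
    if annual_spend >= 500:
        return "medium"
    return "low"


# Group customer names under the tier their spend falls into.
clusters = defaultdict(list)
for customer in customers:
    clusters[spend_tier(customer["annual_spend"])].append(customer["name"])

print(dict(clusters))
```

Any other criterion, such as region or purchase frequency, could replace `spend_tier` without changing the grouping loop.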
The sheer volume of data requires the platform to organize the data at its initial destination. This saves money and time because large volumes of data do not have to be moved around. Hadoop is a notable technology here: it enables large data volumes to be processed on the same clusters and storage devices where the data originally resides. Take the example of the Hadoop Distributed File System (HDFS) used as a long-term storage system for web logs. MapReduce programs run on the cluster turn the web logs into browsing sessions. The programs generate aggregate results on the same cluster, which are then loaded into a relational database management system.
Data analysis may involve moving data from its initial storage location, or it may be done in distributed locations. In a distributed environment, some data may be left at its original location while the rest is moved to a data warehouse. Data analysis involves deeper analytics such as statistical methods and data mining. A good analytics infrastructure should be able to process large volumes, respond quickly and automate decisions based on analytical models. Good analysis should also place new data in the context of previous data sets and so provide new insights into existing problems.
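The web log example can be sketched in plain Python to show the shape of a MapReduce job. This is a single-process imitation of the map, shuffle and reduce steps, not code that runs on Hadoop; the log line format and the 30-minute session gap are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Iterable

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity threshold that ends a session

# Hypothetical web log lines: "<user_id> <unix_timestamp> <url>"
LOG_LINES = [
    "alice 1000 /home",
    "bob 1100 /home",
    "alice 1200 /products",
    "alice 5000 /home",  # more than 30 minutes after alice's previous hit
    "bob 1300 /cart",
]


def map_phase(line: str) -> tuple[str, int]:
    """Map step: emit a (user_id, timestamp) pair for one log line."""
    user_id, timestamp, _url = line.split()
    return user_id, int(timestamp)


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group all timestamps by user_id, as the framework would."""
    grouped = defaultdict(list)
    for user_id, timestamp in pairs:
        grouped[user_id].append(timestamp)
    return grouped


def reduce_phase(timestamps: list[int]) -> int:
    """Reduce step: count sessions, starting a new one after each long gap."""
    ordered = sorted(timestamps)
    sessions = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > SESSION_GAP_SECONDS:
            sessions += 1
    return sessions


grouped = shuffle(map_phase(line) for line in LOG_LINES)
sessions_per_user = {user: reduce_phase(stamps) for user, stamps in grouped.items()}
print(sessions_per_user)
```

The resulting per-user session counts are the kind of small aggregate that would then be loaded into a relational database, while the raw logs stay on the cluster.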
The solution and storage spectrum
There is a wide variety of technology platforms that address the infrastructure needed for data capture, integration and analysis. While there are more than 120 open source databases for capturing and storing data, Hadoop has stood out as the most common technology for organizing huge volumes of data. The available solutions span both SQL and NoSQL systems. NoSQL systems are designed to capture data without classifying or parsing it on entry into the system, whereas SQL systems place the data in defined structures.
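The difference between a store with a defined structure and one that accepts records of varying shape can be sketched as follows. The relational side uses Python's built-in `sqlite3` module; the document-style side uses a plain dictionary as a stand-in, which is an illustrative assumption and not the API of any real NoSQL product. The table, keys and records are made up.

```python
import json
import sqlite3

# SQL side: the table structure is defined before any data is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.execute(
    "INSERT INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
    (1, "alice", 25.0),
)
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]

# Document-style side: a plain dict stands in for a key-value store
# (an illustrative stand-in, not a real NoSQL client).
store = {}
store["event:1"] = json.dumps({"user": "alice", "action": "click", "page": "/home"})
store["event:2"] = json.dumps({"user": "bob", "sensor": {"temp_c": 21.5}})  # different shape

# Structure is interpreted only when the records are read back.
users = [json.loads(doc)["user"] for doc in store.values()]

print(total, users)
conn.close()
```

The relational table rejects rows that do not fit its columns, while the stand-in store accepts both records even though they share only one field.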
Every time organizations want to move huge amounts of data from existing storage platforms to a big data platform, they need to substantially alter their ETL (extract, transform, load) processes. Recent developments in CPU technology help the new processes to be integrated smoothly. The biggest challenge, however, has been storage. Big data platforms require the transfer of huge amounts of data, and in many circumstances the databases need to be taken through the ETL process. During this process, the information is extracted from its current sources and converted into a Hadoop-compatible format before being loaded into HDFS.
As such, the ETL process is a major component of the big data pipeline. It is important to realize that network and storage usually run at a small fraction of the speed of the rest of the computing process. This reduces the effectiveness of the system and increases its latency. Hadoop processes data in batches, which requires all information to be routed through the ETL process. Solid-state storage improves the performance of reading and writing information at either end of the ETL process, because it allows reads and writes to be done at high speed. Installing solid-state storage at both ends of the network is therefore a strong option for big data platforms, since it enables information to be moved quickly from one end of the network to the other.
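The three ETL stages can be sketched end to end with the Python standard library. In this sketch an in-memory SQLite database plays the role of the existing source, newline-delimited JSON is assumed as the target text format, and a temporary local file stands in for HDFS; the `customers` table and its rows are made up for illustration.

```python
import json
import sqlite3
import tempfile
from pathlib import Path


def extract(conn: sqlite3.Connection) -> list:
    """Extract: read rows from the source database (a hypothetical table)."""
    return conn.execute("SELECT id, name, country FROM customers ORDER BY id").fetchall()


def transform(rows: list) -> list:
    """Transform: normalize fields and serialize each row as one JSON line."""
    return [
        json.dumps({"id": row_id, "name": name.strip().title(), "country": country.upper()})
        for row_id, name, country in rows
    ]


def load(lines: list, target: Path) -> int:
    """Load: write the lines to the target; a local file stands in for HDFS."""
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Build a tiny in-memory source database with made-up sample rows.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE customers (id INTEGER, name TEXT, country TEXT)")
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, " alice smith ", "us"), (2, "BOB JONES", "de")],
)

with tempfile.TemporaryDirectory() as tmp:
    target_path = Path(tmp) / "customers.jsonl"
    loaded = load(transform(extract(source)), target_path)
    first_record = json.loads(target_path.read_text(encoding="utf-8").splitlines()[0])

source.close()
print(loaded, first_record)
```

Keeping the stages as separate functions makes it easier to change only the load step when the real target is a distributed file system rather than a local file.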
Faster storage speeds up data writes and lowers latency, and the effective bandwidth also improves. The high-speed network can even be dispensed with if the source and destination storage are on the same system, which removes the delays associated with the ETL process because the data does not have to travel over a network. Solid-state storage provides a practical and economical solution for big data platforms because these platforms routinely process millions of records. Traditional hard drives, by contrast, are usually too slow for such workloads, which makes them expensive for the performance they deliver. Performance also depends greatly on time to value and time to insight.
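The effect of a slow stage on a bulk transfer can be estimated with simple arithmetic: time is data size divided by throughput, and a streaming pipeline is limited by its slowest stage. The throughput figures and batch size below are placeholders chosen for illustration, not measurements of any particular hardware.

```python
def transfer_seconds(size_gb: float, throughput_gb_per_s: float) -> float:
    """Time to move `size_gb` through a stage with the given sustained throughput."""
    return size_gb / throughput_gb_per_s


def pipeline_seconds(size_gb: float, stage_throughputs: list) -> float:
    """A streaming pipeline runs at the speed of its slowest stage."""
    return transfer_seconds(size_gb, min(stage_throughputs))


DATA_GB = 1000.0  # illustrative batch size

# Placeholder throughputs in GB/s (assumptions for illustration only).
DISK_READ, NETWORK, SSD_READ, SSD_WRITE = 0.2, 1.0, 2.0, 2.0

over_network_from_disk = pipeline_seconds(DATA_GB, [DISK_READ, NETWORK, SSD_WRITE])
same_system_ssd = pipeline_seconds(DATA_GB, [SSD_READ, SSD_WRITE])

print(f"disk -> network -> SSD: {over_network_from_disk:.0f} s")
print(f"SSD -> SSD, same system: {same_system_ssd:.0f} s")
```

With these placeholder numbers the slow disk dominates the first pipeline, which is why speeding up only the network would not help until the storage at both ends is also fast.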
The time to insight can be reduced by processing more information faster. Reducing the time to value, however, may require augmenting the process with experience and intuition. This makes simplicity a key consideration when solving the storage problems of big data platforms. Big data platforms usually incorporate a number of improvements to deal with increased complexity, which is a recurring feature: as more and more components need to be integrated, the level of complexity rises, and managing the additional components becomes another headache.
It is important to note that reducing the number of components does not necessarily reduce the complexity of a big data platform. The use of an all-flash array has been noted as one of the best practices for addressing storage problems in big data technology. Solid-state devices do not need to spin up, as is the case with traditional magnetic storage systems. Big data technologies present an array of challenges and are likely to fail quite often. Hence, constant testing is a trusted mantra in the development and maintenance of these systems.