Information is power if you know how to extract value and insights out of it. The more that is known about a particular issue, situation, product, organization, or individual, the greater the likelihood of a better decision and business outcome.
Data is like oil because it can be refined and used in many different ways, increasing its market value. Unlike oil, however, it is a renewable resource.
Large enterprises have many applications, systems, and sources of data, some of which are used to fulfill specific business functions. Often, it is necessary and beneficial to bring these “islands” of information together to reveal a complete and accurate picture, ultimately getting closer to the truth by taking into account multiple perspectives.
The industry term for integrating and analyzing multiple different sources of data (see Figure 1) to gather deeper insights is big data.
Figure 1. Multiple different sources of data
Did you know?
Today, most business analytics are based on information that is stored in enterprise data warehouses that are fed mainly from transaction and operational systems. This data is rich in value, and both the data and its provenance are trusted and understood. Used by 96 of the top 100 global banks, and 23 of the top 25 US retailers, IBM® System z® holds a significant amount of the world’s business-critical information.
Although valuable, this data on its own provides just one view of the world. The big data paradigm focuses on combining this data with many other information sources, such as social media, web logs, emails, documents, multimedia, text messages, and sensor information, providing a richer and more complete view to augment our knowledge of the world around us.
New technologies, such as Hadoop, use a map/reduce paradigm that enables parallel processing of massive volumes of differently structured data that is spread across potentially hundreds or thousands of nodes. This approach breaks the analysis of seemingly unmanageable data volumes into small, discrete analytics jobs, and the reduced result sets are then combined to provide the complete answer. This IBM Redbooks® Solution Guide is intended to help organizations understand how IBM InfoSphere® BigInsights™ for Linux on System z and other related technologies can help deliver improved business outcomes as part of a big data strategy.
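The map/reduce idea described above can be illustrated with a minimal, single-machine sketch in plain Python. This is not Hadoop or BigInsights code: the function names (`map_chunk`, `merge_counts`, `word_count`), the sample log lines, and the use of threads as stand-ins for cluster nodes are all assumptions made for illustration only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_chunk(lines):
    """Map step: count words in one chunk, independently of all other chunks."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce step: combine two partial result sets into one."""
    return left + right


def word_count(lines, num_chunks=4):
    # Split the input into chunks; each chunk stands in for data held on one node.
    chunks = [lines[i::num_chunks] for i in range(num_chunks)]
    # Run the map step on every chunk in parallel (threads stand in for nodes).
    with ThreadPoolExecutor(max_workers=num_chunks) as pool:
        partials = list(pool.map(map_chunk, chunks))
    # Combine the partial results into the complete answer.
    return reduce(merge_counts, partials, Counter())


# Hypothetical sample data, invented for this sketch.
log_lines = [
    "error disk full",
    "info job started",
    "error network timeout",
    "info job finished",
]
totals = word_count(log_lines)
print(totals["error"], totals["info"], totals["job"])
```

Because each chunk is processed without reference to any other, the map step can in principle be spread across as many workers as there are chunks. Only the final merge needs to see all of the partial results.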
Business value
Analysts, including Forrester Research, have described the interest in and uptake of Apache Hadoop in the market as unstoppable. The appeal of no-cost open source software and low-cost commodity hardware favors a “divide and conquer” parallel processing approach to analyzing large semi-structured and unstructured data sets. But what starts as an experiment with low-cost, “good enough” hardware and software often falls apart when it is applied to mission-critical data. A challenge that clients face in big data initiatives is extracting, transforming, and loading (ETL) large volumes of data from sources such as IBM DB2®, IBM IMS™, and VSAM into Hadoop clusters in a timely and cost-efficient manner.
One critical decision clients make is choosing where to analyze the data. This decision is often influenced by where the data originates and the classification of the data’s sensitivity.
IBM InfoSphere BigInsights elevates “good enough” Hadoop to an enterprise-ready, business-critical analytics solution. IBM InfoSphere BigInsights for Linux on System z, combined with IBM InfoSphere System z Connector for Hadoop, provides customers with two key advantages: