Big Data Analytics: Asking the Right Questions

(Part 2 of 4)

According to IDC’s 2011 Extracting Value from Chaos study – the 5th consecutive report of its kind – last year the amount of data created and replicated burst through the zettabyte barrier for the first time. That’s more than one trillion gigabytes of data. Even if you don’t know the scale between zettas and gigas, you know that’s Big Data. In her second post of a 4-part series for the SHARE President’s Corner, veteran tech journalist Renee Boucher Ferguson explores how organizations are gleaning Big Analytics from Big Data.

Two business trends are changing the way Big Analytics are derived from Big Data, and both have more to do with dollars than zettabytes. The first trend is the global economic recession, which, in many cases, has forced companies to re-evaluate the way they do business and reengineer processes based on the bottom line. That means understanding what customers want, or say they want, and being able to respond to those needs on the fly. The second trend is risk mitigation. Big data analysis helps organizations address both.

A number of analytical approaches—and technologies—help companies react quickly to business changes, from streaming and predictive analytics to data visualization and sentiment analysis. And their uses span vertical markets.

Streaming analytics is essentially the ability to analyze data in real time. Examples include location information and sensor data, where companies need to react quickly to changing conditions. Last year Yahoo! open-sourced its S4 platform for developing real-time MapReduce applications. (MapReduce, a programming model developed initially by Google, pushes code down to the data for analysis; Hadoop is its widely used open-source implementation.)

Streaming analytics also has huge potential in healthcare. The University of Ontario Institute of Technology, for example, has been working with IBM to detect changes in streams of real-time data such as respiration, heart rate and blood pressure. The analytics can be applied to models that compare the differences and similarities among diverse populations of premature babies. The results can be used to tune rules that alert specialists in neonatal intensive care units, in real time, when symptoms occur.
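As a rough illustration of the kind of rule described above, the sketch below watches a stream of readings and raises an alert when a rolling average drifts outside a set range. It is a minimal sketch, not the UOIT or IBM system: the function name, window size, thresholds and sample values are all assumptions chosen for the example, and the thresholds are not clinical guidance.

```python
from collections import deque
from statistics import mean

def stream_alerts(readings, window=5, low=100, high=180):
    """Yield (index, rolling_mean) whenever the rolling mean leaves [low, high].

    The window size and thresholds are hypothetical defaults for illustration.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` readings
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) < window:
            continue  # wait until the window is full before judging
        avg = mean(recent)
        if avg < low or avg > high:
            yield i, avg

# Made-up heart-rate samples (beats per minute), not real patient data.
heart_rate = [140, 142, 138, 141, 139, 150, 175, 190, 200, 205, 210]
alerts = list(stream_alerts(heart_rate))
for index, avg in alerts:
    print(f"reading {index}: rolling mean {avg} is out of range")
```

Each reading is processed as it arrives and only the last few values are kept in memory, which is what separates this style of analysis from a batch job that scans the full history.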

Predictive analytics is not a new concept and sometimes, as is the case with IBM’s InfoSphere Identity Solutions, overlaps with streaming analytics. In the mid-1990s, IBM developed a statistics-based program to help the NBA be more predictive about how players play a basketball game. What’s changed with big data analytics is the ability to explore different data types, looking at trends, patterns and deviations to predict the probability of outcomes. SAS Institute and IBM are two vendors developing predictive capabilities for big data analytics.

In October 2011, IBM announced new predictive analytics software, SPSS Statistics 20.0, with a mapping feature that can be used across industries for marketing campaigns, retail store allocation, crime prevention and academic assessment. At the same time, the company announced its IBM Content and Predictive Analytics for Healthcare capability that uses natural language processing to help doctors and healthcare professionals advance diagnosis and treatment by understanding the relationships buried in large volumes of clinical and operational data.

In its “Big Data Analytics” report, The Data Warehousing Institute (TDWI) found advanced data visualization (ADV) to have the strongest potential among options for big data analytics. While 20 percent of respondents said they are currently using ADV tools, 47 percent said they will be using them in three years. More importantly, 58 percent of those surveyed were committed to implementing ADV at some point in the future. A number of smaller companies have sprung up, such as Yellowfin, Tableau and Tibco Spotfire, with ADV products that are simple and visually powerful.

Cornell University began using Tableau’s visually based analytics software nearly five years ago as a reporting tool that, initially, would allow college deans to keep better track of KPIs (key performance indicators). Today more than 600 employees use Tableau to do all manner of analysis, from dissecting the student applicant pool, evaluating risk and analyzing university expenditures to visualizing faculty salary statistics, keeping track of which students are in what classes and managing contributor relations.

Sentiment analysis also has had a lot of play, especially in the hearts and minds of marketers, as it helps organizations sift through mounds of unstructured data to determine what’s being said about them online. Sentiment analysis uses natural language processing, computational linguistics and text analytics to extract opinions, emotions and sentiments from text. Big tech companies like SAS and IBM have products that mine data in blogs, wikis and social networking sites to uncover insights in a number of industries. There also are smaller providers like Lexalytics.
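To make the idea concrete, here is a deliberately simple, lexicon-based sketch of sentiment scoring. It is a toy stand-in for the natural language processing that commercial products use, not a description of how SAS, IBM or Lexalytics tools work. The word lists, function names and sample comments are all invented for the example.

```python
import re

# Hypothetical word lists; real tools use far richer linguistic models.
POSITIVE = {"great", "love", "helpful", "fast", "thanks"}
NEGATIVE = {"late", "duplicate", "failed", "outraged", "leaving", "never"}

def sentiment_score(text):
    """Count positive words minus negative words in a piece of text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def label(text):
    """Map a score to a coarse label: positive, negative or neutral."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# Made-up comments for illustration only.
comments = [
    "Love the new mobile app, fast and helpful",
    "My debit card never arrived and my payment posted late",
    "Visited the branch on Main Street today",
]
labels = [label(c) for c in comments]
for comment, tag in zip(comments, labels):
    print(f"{tag:>8}: {comment}")
```

A word-counting approach like this misses sarcasm, negation and context, which is a large part of why production systems lean on computational linguistics rather than fixed word lists.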

To gain insight on what customers were saying about Bank of America, consulting firm Beyond the Arc analyzed more than 41,000 comments on Facebook and Twitter to identify key trends. The company focused on about 9,000 comments, 88 percent of which were short, sentiment-driven tweets and 12 percent of which were more narrative in nature, according to a Beyond the Arc case study. The team uncovered a number of primary themes of concern about the bank, including about 20 actionable service breaks. Among them were a failure to send out new debit cards, payments not posted on time, late tax forms and duplicate charges, all of which had customers considering switching banks. The company found, too, that many customers were outraged by a news article that suggested Bank of America might impose a $50 limit on debit-card transactions, and many threatened to leave the bank over the issue.

The various big data analytics methodologies can be divided into two processing approaches: batch and streaming. Because Hadoop is batch-oriented, it enables deep discovery as an iterative process (finding an answer, reflecting on the implications and finding new answers), while streaming provides more immediate analytics.

“[Batch processing] is a hugely important process because that’s where you think, yeah maybe this and that are related,” said Jeff Jonas, chief scientist of the IBM Entity Analytics group and an IBM Distinguished Engineer. “In fraud systems you would do this kind of deep reflection at the end of the month to say, ‘Oh, there is a pattern emerging.’ One of the things I’ve really come to appreciate over the last year is this is a form of feedback loops.”

“You’re doing real-time sense and respond [with streaming analytics] but you want to do this periodic reflection and out of that you find really interesting, emerging patterns that you really couldn’t notice on a stream,” Jonas continued. “And then you promote those back and you start to become aware of them as they’re happening.”
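The feedback loop Jonas describes can be sketched in a few lines: a periodic batch pass reflects on accumulated history to find a pattern, and that pattern is then promoted into the check applied to each new event as it arrives. This is only an illustration under stated assumptions. The function names, the "three disputed charges" rule and the sample transactions are hypothetical and do not represent IBM's software.

```python
from collections import Counter

def batch_discover(history, min_count=3):
    """Periodic batch pass: find merchants with repeated disputed charges.

    `min_count` is an arbitrary threshold chosen for this example.
    """
    disputed = Counter(t["merchant"] for t in history if t["disputed"])
    return {merchant for merchant, n in disputed.items() if n >= min_count}

def stream_check(event, watchlist):
    """Real-time check: flag an event that matches a promoted pattern."""
    return event["merchant"] in watchlist

# Made-up transaction history for illustration only.
history = [
    {"merchant": "A", "disputed": True},
    {"merchant": "A", "disputed": True},
    {"merchant": "A", "disputed": True},
    {"merchant": "B", "disputed": True},
    {"merchant": "C", "disputed": False},
]

# Month-end reflection produces a watchlist...
watchlist = batch_discover(history)

# ...which the streaming side then applies to events as they happen.
incoming = [{"merchant": "A"}, {"merchant": "B"}]
flags = [stream_check(event, watchlist) for event in incoming]
print(watchlist, flags)
```

The batch step is free to be slow and exhaustive because it runs occasionally, while the streaming step stays cheap enough to run on every event.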

In the third installment of this 4-part series, Renee Boucher Ferguson continues her conversation with experts in the field, who discuss in depth how IBM in particular is homing in on Big Data Analytics.
