As the scope of big data rapidly expands, so does the scope of the analytics needed to extract insight from that data. It is simply impossible for humans, or indeed rules-based engines, to turn all of that information into action. More and more, clients need analytics to make the best decisions possible, or better yet, to embed those analytics into processes so that decision-making is automated and the answers are delivered at the point of impact, based on the questions being asked there. To address these rapidly evolving needs, we need to ensure the right analytics capabilities are deployed to suit each situation, each point of interaction, and each decision point within a process. Join this session and learn how IBM can provide a solution for the varying types of analytics: from descriptive to predictive to prescriptive to cognitive.
To put cognitive systems into the proper context, let's look at some of the ways they differ from a more traditional programmatic approach to problem solving. What Google is to search, Watson is to discovery. We have all entered keywords into a search bar only to have millions of entries returned for our review. Unfortunately, the majority of the information retrieved is not what we were looking for, so we start over. Watson aims to bring back relevant results, with confidence, putting content into context. Unlike systems that produce deterministic outcomes, Watson is probabilistic in nature.
Take a simple question like 2+2. A precise answer is 4, and that is exactly how a deterministic system would respond. Watson, however, is not so sure. It may have high confidence that 2+2=4 is the right answer, but if the context of the question were automotive, 2+2 could refer to a car configuration: two front seats, two back seats. If we had been talking to a family psychologist, 2+2 could refer to a family unit with two parents and two children. You can quickly see how things have varying meanings that need to be analyzed and properly considered in the context of the broader question being asked.
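To make the contrast concrete, here is a minimal Python sketch, not Watson's actual implementation: the contexts, hypotheses, and confidence values are invented purely for illustration. It shows the difference between committing to a single deterministic answer and keeping several ranked, context-dependent hypotheses.

def deterministic_answer(question):
    # A rules-based system commits to a single answer.
    return "4" if question == "2+2" else "unknown"

def probabilistic_answer(question, context):
    # A cognitive-style system keeps several hypotheses, each with a
    # confidence that shifts as the context changes (values illustrative only).
    hypotheses = {
        "arithmetic": [("4", 0.97), ("a 2+2 seating configuration", 0.02)],
        "automotive": [("a 2+2 seating configuration", 0.85), ("4", 0.10)],
        "family":     [("two parents and two children", 0.80), ("4", 0.15)],
    }
    return sorted(hypotheses.get(context, [("4", 0.5)]), key=lambda h: -h[1])

print(deterministic_answer("2+2"))                # 4
print(probabilistic_answer("2+2", "automotive"))  # seating configuration ranked first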
Unlike traditional systems, which thrive on structure, where information is stored in a binary fashion all neatly organized into rows and columns, Watson can tackle unstructured data spread across disparate sources to unlock patterns and possibilities. And we have already touched on the importance of working in natural language.
This chart illustrates the evolution from descriptive to predictive, prescriptive, and cognitive analytics, and lists the key characteristics of each phase or analytics domain.
It is important to highlight the different possible analytics journeys and entry points. Clients will not always need to have a mature prescriptive analytics platform in place to launch a cognitive analytics initiative.
We define Cognitive Systems as systems that can navigate the complexities of human language and understanding, ingest and process vast amounts of structured and unstructured data, generate and evaluate countless possibilities, and scale in proportion to the task. These systems apply human-like characteristics to conveying and manipulating ideas which, when combined with the inherent strengths of digital computing, can solve problems with higher accuracy, more resilience, and on a massive scale.
Watson is an example of a Cognitive System. It is able to tease apart human language to identify inferences between text passages with human-like accuracy, at speeds and on a scale far beyond what any person could achieve alone. Watson doesn't really understand the individual words in the language, but it does understand the features of language as used by people, and from that it is able to determine whether one text passage (call it the 'question') infers another text passage (call it the 'answer') with remarkable accuracy under changing circumstances. In Jeopardy! we had to determine whether the question, "Jodie Foster took this home for her role in 'The Silence of the Lambs'", inferred the answer "Jodie Foster won an Oscar for her role in 'The Silence of the Lambs'". In this case, taking something home inferred winning an Oscar. But it doesn't always: sometimes 'taking something home' infers a cold, or groceries, or any number of things. Context matters. Temporal and spatial constraints matter. All of that contributes to enabling a Cognitive System to behave with human-like characteristics. A rules-based approach would require a near-infinite number of rules to capture every case we might encounter in language.
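As a toy stand-in for what passage inference means, the Python sketch below scores whether a "question" passage is plausibly supported by an "answer" passage using simple lexical overlap. This is not one of Watson's algorithms; Watson combines hundreds of far richer linguistic features, and the example only shows why overlap alone is not enough without context.

import re

def overlap_score(question, answer):
    # Crude lexical-overlap score between two passages (toy illustration only).
    tokenize = lambda s: set(re.findall(r"[a-z']+", s.lower()))
    q, a = tokenize(question), tokenize(answer)
    return len(q & a) / len(q) if q else 0.0

q = "Jodie Foster took this home for her role in The Silence of the Lambs"
a = "Jodie Foster won an Oscar for her role in The Silence of the Lambs"
print(round(overlap_score(q, a), 2))  # high lexical overlap, yet context still decides the inference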
This chart illustrates the technical capabilities that characterize the various analytics areas. The reality of analytics-related use case scenarios is that clients may have requirements across the entire analytics continuum; i.e., the focus may be on descriptive analytics, with the need to implement all corresponding technical capabilities, but there may also be analytical requirements from the remaining three analytics domains, including cognitive analytics. For instance, some clients may have a rather mature descriptive analytics platform and require natural language processing capabilities for a sentiment analytics project to derive brand sentiment and affinity insight, without necessarily implementing a sophisticated predictive analytics platform.
When walking through this chart, explain the individual technical capabilities for each analytics capability.
This chart lists some of the clients that IBM has worked with. In order to become familiar with the details of these projects, please visit the “IBM client reference Database”.
Here is the link: http://w3-01.ibm.com/sales/references/crdb/ibmref.nsf/winsubmit?openform
The key message of this chart is the role of the blue areas in enabling the various analytics domains. These blue areas are the BI Data Infrastructure and Big Data & Analytics capabilities, which embrace mobile needs, social media analytics, and cloud deployment models. Together, these blue domains enable predictive, prescriptive, and cognitive analytics (and descriptive analytics as well, although that is not explicitly mentioned on the chart). The capabilities translate into key initiatives such as Smarter Commerce, Smarter Workforce, Smarter Analytics, and Smarter Cities, and provide key business value to the C-level stakeholders listed at the top of the chart. The business value is delivered for all industries, which is illustrated by the 12 small symbols at the top of the figure. So the key message of the chart is the illustration, or rather the transformation, of key technical domains such as BI Data Infrastructure and Big Data & Analytics capabilities into a broad, industry-relevant set of business values.
Following are the 3 key messages of the chart:
Clients are leveraging various types of analytics to solve real business challenges. They soon realize there is no single solution to address all of their analytics requirements. Businesses in different industries will have specific needs that applying analytics can address, but there is no one size fits all.
This is why IBM offers many different analytics offerings, including industry-specific solutions that address unique needs in major industries as well as optimized business and predictive analytics solutions. Cognitive computing, like IBM Watson, is another example.
IBM is delivering these solutions on Power Systems because of the platform’s design points and capabilities. Power was built from the ground up to handle data-related applications and analytics workloads.
The key point of the chart is to highlight the 4 Vs, which represent an essential way to characterize Big Data: veracity, variety, velocity, and volume.
Volume is about rising volumes of data in all of your systems, which presents a challenge both for scaling those systems and for the integration points among them.
Variety is about managing many types of data, and understanding and analyzing them in their native form.
Velocity is about ingesting data in real time and in motion.
Veracity deals with the certainty, or truthfulness of big data. Veracity is a big issue – and one that directly relates to confidence. In fact, as the complexity of big data rises (the first 3 Vs grow), it actually becomes harder to establish veracity.
The left part of the chart illustrates just 1 dimension (volume) in the context of increasing analytical complexity.
This chart puts into perspective three key areas that influence and drive the need of cognitive systems and analytics:
Big Data: highlight the 4 Vs again as a key driver, especially the need to analyze text, speech, video content, and other non-structured data, such as logs, call center transcripts, etc. Also highlight veracity (meaning trustworthiness) of the data, which requires reasoning and other cognitive analytical capabilities to put insight into context and provide contextual meaning.
Cloud: as a key deployment model, cloud is a driving force to also take cognitive analytics in the cloud into consideration. Highlight the need to provide analytics capabilities that can be deployed and leveraged in the cloud. As an example, point out IBM's Social Media Analytics (SMA) v1.2 (the former Cognos Consumer Insight), which is not only cloud-enabled but is offered as a cloud service by IBM.
SoLoMo: still a rather new term, SoLoMo (Social, Local, Mobile) is increasingly used to describe these three aspects as a combined area that characterizes today's consumer lifestyles. As such, all three aspects drive specific requirements and influence the technical and business capabilities that cognitive analytics needs to deliver. Social means, for instance, understanding contributions to social media networks and their meaning in context, and being able to analyze natural language and text in all languages and dialects, including the sometimes unique style of communication that takes place in social media networks. Local requires locality awareness, for instance to deliver location-based services, preferences, and culture awareness when running cognitive analytics. Mobile, in regard to cognitive analytics, requires the inclusion and understanding of the mobile lifestyle, mobility patterns, and preferences.
This chart focuses on the veracity and trustworthiness of big data, and introduces some dimensions of trustworthiness and veracity (right side of the chart). One of the key aspects of big data is the analysis of social media networks. Contributions via social media networks, however, need to be analyzed by taking the listed dimensions into consideration: for instance, what was the usage intention of a social media contribution, and what is its relevance for the analytics in scope or the use case scenario. The left side of the chart lists some of the challenges that require sophisticated, state-of-the-art cognitive analytics capabilities, for instance to understand whether a statement or contribution was made in a certain mood or emotional state, whether it was meant as a joke, whether it represents sarcasm, and so forth.
Watson, the computer system we developed to play Jeopardy!, is based on the DeepQA software architecture. Here is a look at the DeepQA architecture. This is like looking inside the brain of the Watson system from about 30,000 feet.
Remember, the intended meaning of natural language is ambiguous, tacit and highly contextual. The computer needs to consider many possible meanings, attempting to find the evidence and inference paths that are most confidently supported by the data.
So, the primary computational principle supported by the DeepQA architecture is to assume and pursue multiple interpretations of the question, to generate many plausible answers or hypotheses and to collect and evaluate many different competing evidence paths that might support or refute those hypotheses.
Each component in the system adds assumptions about what the question might mean, what the content means, what the answer might be, or why it might be correct.
DeepQA is implemented as an extensible architecture and was designed at the outset to support interoperability.
For this reason it was implemented using UIMA, a framework and OASIS standard for interoperable text and multi-modal analysis contributed by IBM to the open-source community.
Over 100 different algorithms, implemented as UIMA components, were integrated into this architecture to build Watson.
In the first step, Question and Category Analysis, parsing algorithms decompose the question into its grammatical components. Other algorithms identify and tag specific semantic entities like names, places, or dates. In particular, the type of thing being asked for, if it is indicated at all, will be identified. We call this the LAT, or Lexical Answer Type, such as this "FISH", this "CHARACTER", or this "COUNTRY".
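As a feel for LAT detection, here is a deliberately simple Python heuristic, not Watson's parsing algorithms: it just grabs the noun that follows a demonstrative like "this" or "these". Real clues often have multi-word or implicit LATs, which is exactly why Watson uses full grammatical parsing rather than a pattern like this.

import re

def guess_lat(clue):
    # Naive heuristic: the word after "this"/"these" is often the answer type.
    match = re.search(r"\b(?:this|these)\s+([a-z]+)", clue.lower())
    return match.group(1) if match else None

print(guess_lat("This country is home to the Great Barrier Reef"))       # country
print(guess_lat("These fish are known for swimming upstream to spawn"))  # fish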
In Query Decomposition, different assumptions are made about whether and how the question might be decomposed into sub-questions. The original question and each identified sub-part follow parallel paths through the system.
In Hypothesis Generation, DeepQA performs a variety of very broad searches for each of several interpretations of the question. Note that Watson, in order to compete on Jeopardy!, was not connected to the internet.
These searches are performed over a combination of unstructured data (natural language documents) and structured data (available databases and knowledge bases) fed to Watson during training.
The goal of this step is to generate possible answers to the question and/or its sub-parts. At this point there is very little confidence in these possible answers, since little intelligence has been applied to understanding the content that might relate to the question. The focus at this point is on generating a broad set of hypotheses, or, for this application, what we call "candidate answers".
To implement this step for Watson, we integrated and advanced multiple open-source text and knowledge-base search components.
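The following Python sketch illustrates the idea of broad primary search for candidate generation using TF-IDF retrieval over a toy, made-up corpus. It is only illustrative; Watson's actual candidate generation combines multiple dedicated open-source text and knowledge-base search engines, as noted above.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Tiny illustrative "corpus": titles paired with short passages.
corpus = {
    "Oscar":  "The Academy Award, or Oscar, is given for achievement in film.",
    "Emmy":   "The Emmy Award recognizes excellence in television.",
    "Grammy": "The Grammy Award is presented for achievement in music.",
}

def generate_candidates(question, k=2):
    # Rank corpus entries by similarity to the question; titles become
    # candidate answers with a crude initial score.
    titles, texts = list(corpus), list(corpus.values())
    vec = TfidfVectorizer().fit(texts + [question])
    sims = cosine_similarity(vec.transform([question]), vec.transform(texts))[0]
    return sorted(zip(titles, sims), key=lambda x: -x[1])[:k]

print(generate_candidates("Jodie Foster took this home for her role in a 1991 film"))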
After candidate generation, DeepQA also performs Soft Filtering, where it makes parameterized judgments about which and how many candidate answers are most likely worth further computation, given specific constraints on time and available hardware. Based on a trained threshold for optimizing the tradeoff between accuracy and speed, Soft Filtering uses different lightweight algorithms to judge which candidates are worth gathering evidence for and which should get less attention and continue through the computation as-is. In contrast, if this were a hard filter, the candidates falling below the threshold would be eliminated from consideration entirely at this point.
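A minimal Python sketch of the soft-filtering idea follows; the threshold and scores are invented for illustration (in Watson the threshold is learned from training data, and the lightweight scorers are real algorithms rather than fixed numbers).

SOFT_FILTER_THRESHOLD = 0.30  # illustrative only; Watson learns this threshold

def soft_filter(candidates):
    # Candidates above the threshold get expensive evidence gathering;
    # the rest stay in the pipeline as-is instead of being discarded.
    promoted, deferred = [], []
    for answer, light_score in candidates:
        (promoted if light_score >= SOFT_FILTER_THRESHOLD else deferred).append(answer)
    return promoted, deferred

candidates = [("Oscar", 0.62), ("Emmy", 0.34), ("Golden Globe", 0.12)]
promoted, deferred = soft_filter(candidates)
print(promoted)  # gather deep evidence for these
print(deferred)  # kept in play, but with no further evidence gathering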
In Hypothesis & Evidence Scoring the candidate answers are first scored independently of any additional evidence by deeper analysis algorithms. This may for example include Typing Algorithms. These are algorithms that produce a score indicating how likely it is that a candidate answer is an instance of the Lexical Answer Type determined in the first step – for example Country, Agent, Character, City, Slogan, Book etc.
Many of these algorithms may fire using different resources and techniques to come up with a score. What is the likelihood that “Washington” for example, refers to a “General” or a “Capital” or a “State” or a “Mountain” or a “Father” or a “Founder”?
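Here is a toy type-coercion sketch in Python. The hand-built type map is fabricated for illustration; real typing algorithms consult many resources (ontologies, corpora, lexical databases) and return graded scores rather than a simple yes/no.

# Illustrative only: a tiny hand-built map of entities to possible types.
TYPE_MAP = {
    "washington": {"general", "capital", "state", "mountain", "father", "founder"},
    "oscar": {"award", "statuette", "name"},
}

def type_score(candidate, lat):
    # Score how likely the candidate is an instance of the Lexical Answer Type.
    types = TYPE_MAP.get(candidate.lower(), set())
    return 1.0 if lat.lower() in types else 0.0  # real scorers are graded, not binary

print(type_score("Washington", "state"))    # 1.0
print(type_score("Washington", "country"))  # 0.0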
For each candidate answer, many pieces of additional evidence are searched for. Each of these pieces of evidence is subjected to further algorithms that deeply analyze the evidentiary passages and score the likelihood that the passage supports or refutes the correctness of the candidate answer. These algorithms may consider variations in grammatical structure, word usage, and meaning.
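A toy passage-scoring sketch in Python follows. It only checks that the candidate appears in the passage and measures shared terms with the question; Watson's evidence scorers analyze grammatical structure, word usage, and meaning rather than raw overlap, so treat this purely as an illustration of the input/output shape of such a scorer.

import re

def passage_support(question, candidate, passage):
    # Toy score of how strongly an evidence passage supports a candidate answer.
    words = lambda s: set(re.findall(r"[a-z]+", s.lower()))
    if candidate.lower() not in passage.lower():
        return 0.0
    shared = words(question) & words(passage)
    return len(shared) / max(len(words(question)), 1)

q = "Jodie Foster took this home for her role in The Silence of the Lambs"
passage = "Jodie Foster won the Oscar for Best Actress for The Silence of the Lambs."
print(round(passage_support(q, "Oscar", passage), 2))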
In the Synthesis step, if the question had been decomposed into sub-parts, one or more synthesis algorithms will fire. They apply methods for inferring a coherent final answer from the constituent elements derived from the question's sub-parts.
Finally, the last step, Final Merging and Ranking, receives many possible answers, each paired with many pieces of evidence, and each of these scored by many algorithms to produce hundreds of feature scores, all giving some evidence for the correctness of each candidate answer.
Trained models are applied to weigh the relative importance of these feature scores. These models are trained with machine learning methods to predict, based on past performance, how best to combine all of these scores to produce a final, single confidence number for each candidate answer and to produce the final ranking of all candidates.
The answer with the strongest confidence would be Watson's final answer, and Watson would try to buzz in provided that the top answer's confidence was above a certain threshold.
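To illustrate final merging and ranking, the Python sketch below uses logistic regression to combine a handful of feature scores into one confidence per candidate and applies a buzz-in threshold. The training rows, feature values, and threshold are all invented; Watson's actual models are trained on thousands of questions and hundreds of features, but the shape of the computation is the same.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data: rows of feature scores (e.g. type match, passage support,
# source reliability), labeled 1 if the candidate was the correct answer.
X_train = np.array([[0.9, 0.8, 0.7], [0.2, 0.1, 0.4], [0.8, 0.6, 0.9], [0.3, 0.2, 0.1]])
y_train = np.array([1, 0, 1, 0])
model = LogisticRegression().fit(X_train, y_train)

candidates = {"Oscar": [0.85, 0.75, 0.8], "Emmy": [0.25, 0.15, 0.3]}
confidences = {a: model.predict_proba([f])[0, 1] for a, f in candidates.items()}
best, conf = max(confidences.items(), key=lambda kv: kv[1])

BUZZ_THRESHOLD = 0.5  # illustrative value only
print(best, round(conf, 2), "buzz in" if conf > BUZZ_THRESHOLD else "stay silent")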
----
The DeepQA system defers commitments and carries possibilities through the entire process, while searching for increasingly broad contextual evidence and more credible inferences to support the most likely candidate answers.
All the algorithms used to interpret questions, generate candidate answers, score answers, collect evidence, and score evidence are loosely coupled but work holistically by virtue of DeepQA's pervasive machine learning infrastructure.
No one component could realize its impact on end-to-end performance without being integrated and trained with the other components, and they are all evolving simultaneously. In fact, what had a 10% impact on some metric one day might, a month later, contribute only 2% to overall performance due to evolving component algorithms and interactions. This is why the system, as it develops, is regularly trained and retrained.
DeepQA is a complex system architecture designed to extensibly deal with the challenges of natural language processing applications and to adapt to new domains of knowledge.
The Jeopardy! Challenge greatly inspired the design and implementation of the Watson system.
IBM Watson is the very embodiment of the new era of cognitive systems. It represents a new category of solutions that leverages deep content analysis and evidence-based reasoning to accelerate and improve decisions, reduce operational costs, and optimize outcomes. Cognitive Systems offer a whole new way of computing. Keeping pace with the demands of an increasingly complex business environment requires a paradigm shift in what we should expect from IT. We need an approach that recognizes today’s realities and treats them as opportunities rather than challenges.
Main point: At the core of what makes Watson different are three powerful technologies: natural language, hypothesis generation, and evidence-based learning. But Watson is more than the sum of its individual parts. Watson is about bringing these capabilities together in a way that has never been done before, resulting in a fundamental change in the way businesses look at quickly solving problems.
Further speaking points: Looking at these one by one, understanding natural language and the way we speak breaks down the communication barrier that has stood between people and their machines for so long. Hypothesis generation bypasses the historically deterministic way that computers function and recognizes that there are various probabilities of various outcomes rather than a single definitive 'right' response. And adaptation and learning help Watson continuously improve in the same way that humans learn: it keeps track of which of its responses were selected by users and which received positive feedback, thus improving future response generation.
Additional information: The result is a machine that functions alongside us as an assistant rather than something we wrestle with to get an adequate outcome.
This section introduces the Big Data analytics reference model and serves as an introduction to the use case scenarios, which illustrate the various stages of analytics.
Best in Breed Analytics Placed On Top
Fuel all decision-making with powerful analytics & analytic adoption without silos
Analyze all data wherever it lives
Accelerate business value with solutions that leverage all data types, with predictive insight to let you know what has happened, what is happening and what is likely to happen next
Delivering optimized decisions at point of impact through business applications
Empower end business users with the information to deliver the best decision every time
All touchpoints are managed in real time, via the appropriate channel
Feedback loop ensures all decisions are accurate, dynamic
Business rules integrated with analytics and optimization
All of these different technologies come together (“an integrated platform”) to create decision services for the different LOB areas (e.g. marketing)
The depicted Big Data analytics reference model serves as an introduction into this section, and illustrates the key capabilities. In presenting this chart, explain the capabilities in the order listed here:
Heterogeneous data sources
Data transformation and integration layer
Data persistency layer
Business analytics and application layer
Visualization and reporting layer
Infrastructure services
Highlight the message that these capability categories need to be addressed in every project. The focus, however, can vary depending on specific project requirements.
This figure describes the Big Data analytics reference model in a slightly different way and lists the different technical capabilities within the various layers. We are listing essentially the same components as on the previous chart, highlighting the breadth of technical capabilities that each component, or layer, comprises. Point out that not all capabilities need to be included in every project; the concrete subset of technical capabilities is derived from the concrete requirements and set of use cases for an individual project.
This chart contains a product mapping to the Big Data analytics reference model, which has been further customized for CSPs (Communication Service Providers). This chart and the two previous ones also serve as an introduction to the examples that are described in the following section. The presenter should become familiar with all products and tools referenced in this product mapping chart.
This section describes, at a very high level, sample projects for all four analytics areas: descriptive, predictive, prescriptive, and cognitive. All examples and corresponding use case scenarios in this section are derived from real customer engagements in Asia Pacific.
This is an example of a descriptive analytics project in which a telco service provider is interested in competitive analytics based on CDRs (Call Detail Records). Analysis of the CDRs was optionally enhanced with analytics from social media networks. The data sources are depicted on the left side of the chart. The component in the center of the chart comprises the core capabilities derived from BigInsights and BigInsights applications, such as Customer Modeler (an IBM Research asset). Analytical insight is derived from the combined components in this central box. The analytics can optionally be enhanced with SPSS to deliver predictive analytics; in the real customer project, however, this was not part of the use case scenario.
The left side of the chart illustrates the data warehouse and the descriptive BI analytics component, Cognos BI.
This example also illustrates that descriptive analytics is very much a part of Big Data. CDRs are very large in volume and semi-structured, and the combination with non-structured data from social media networking sites in particular makes descriptive analytics a very good Big Data use case scenario. It illustrates the changing role that descriptive analytics plays in Big Data.
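To give a flavor of the descriptive side of this use case, here is a minimal pandas sketch with hypothetical column names and data; in the actual project such aggregations run on BigInsights over very large CDR volumes rather than in memory.

import pandas as pd

# Hypothetical CDR sample: which network a subscriber calls, and for how long.
cdrs = pd.DataFrame({
    "caller_network": ["TelcoA", "TelcoA", "TelcoB", "TelcoA", "TelcoB"],
    "callee_network": ["TelcoB", "TelcoA", "TelcoA", "TelcoC", "TelcoC"],
    "duration_sec":   [120, 45, 300, 60, 15],
})

# Off-net traffic per competitor: how often and how long subscribers call
# numbers on rival networks, a simple descriptive competitive-analytics view.
offnet = cdrs[cdrs["caller_network"] != cdrs["callee_network"]]
summary = offnet.groupby("callee_network")["duration_sec"].agg(["count", "sum", "mean"])
print(summary)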
This sample project is geared towards determining demographic information for unknown pre-paid subscribers. The first main heading on the chart (gain analytical insight for pre-paid demographics) explains the logical flow and the main steps that need to be performed. The second main heading (required data sources) lists the data sources, such as voice and data CDRs, behavioral data, and so forth. This is also a very nice example illustrating that predictive analytics, as well as descriptive analytics, is part of Big Data, i.e., can be seen from a Big Data angle.
The main step in this analytics flow is to predict demographics information for pre-paid subscribers by correlating and mapping post-paid with pre-paid subscribers.
This sample project is further described on the following 2 charts with:
a contextual diagram and
an architecture overview diagram
This chart contains a high-level contextual diagram with the key components, such as the data sources on the left, the cloud-based analytics system that leverages the IBM SmartCloud at IBM Singapore, and the key products on the right of the chart.
The blue figure at the lower right corner is an illustration of the analytics and admin roles and responsibilities that exist in operating the environment.
The yellow figure at the upper left corner illustrates the LoB user using the system and deriving predictive insight.
This chart contains an architecture overview diagram that contains the key components and the component interaction at a high-level.
Public data sources: will be used in the scenario to gain analytical insight and to leverage existing categorization of, for instance, websites that are visited by subscribers.
Post-paid data sources: will be used to understand preferences, interests, websites visited, micro-segmentation, etc. for post-paid subscribers.
Pre-paid data sources: the same data sources will be used for pre-paid subscribers, where the same analytical insight is derived for this category of subscribers.
Post-paid demographics information: will be used and correlated with the analytical insight that is derived from post-paid data sources. This allows a comprehensive view on post-paid subscribers, which includes knowledge on demographics.
The analytics engine, depicted in the centre of the chart, is used to correlate post-paid with pre-paid segments, clusters, behavior, interests, etc., and to map known demographics for post-paid subscribers to corresponding pre-paid subscribers. This allows prediction of demographics for pre-paid subscribers, e.g. age, gender, income, and other demographic measures.
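The Python sketch below captures the core idea of this prediction step under invented, hypothetical features: train a classifier on post-paid subscribers whose demographics are known, then apply it to pre-paid subscribers with the same behavioral features. The real engagement used SPSS and BigInsights assets rather than this toy model.

import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical behavioral features; post-paid demographics are known.
post_paid = pd.DataFrame({
    "avg_daily_calls":  [3, 12, 7, 1, 9],
    "night_data_mb":    [450, 80, 220, 600, 120],
    "sports_sites_pct": [0.10, 0.40, 0.20, 0.05, 0.50],
    "age_band":         ["18-25", "36-50", "26-35", "18-25", "36-50"],
})
pre_paid = pd.DataFrame({
    "avg_daily_calls":  [2, 10],
    "night_data_mb":    [500, 100],
    "sports_sites_pct": [0.08, 0.45],
})

features = ["avg_daily_calls", "night_data_mb", "sports_sites_pct"]
model = RandomForestClassifier(random_state=0).fit(post_paid[features], post_paid["age_band"])
print(model.predict(pre_paid[features]))  # predicted age bands for pre-paid subscribers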
Client Name:
XO Communications
Case study Link:
http://www-01.ibm.com/software/success/cssdb.nsf/CS/STRD-9E4L7Y
Pull Quote:
"We are only just starting to realize the true potential that IBM analytics holds across the business."
—Bill Helmrath, Director of Business Intelligence,
XO Communications
Company Background:
XO Communications is one of the United States’ largest communications service providers, offering a comprehensive portfolio of communications, network and hosted IT services through a 19,000-mile nationwide inter-city network and over 1,000 office locations. Priding itself on superior customer experience, the company is always looking for ways to raise the bar.
Solution components:
Software
• IBM® SPSS® Analytics Catalyst
• IBM SPSS Modeler
• IBM SPSS Modeler Server
• IBM SPSS Statistics
• IBM InfoSphere® BigInsights™
Business challenge:
XO Communications had already taken the first steps in identifying customer retention risks through analytics; now it wanted to seize the opportunity to put these insights into action more effectively.
The benefit:
142 percent estimated reduction in revenue erosion for customers at most risk of churning.
$10 million+ estimated savings per year from increased customer retention and reduced customer service costs
5 months to achieve full return on investment
Link to reference profile: http://w3-01.ibm.com/sales/ssi/cgi-bin/ssialias?infotype=CR&subtype=NA&htmlfid=0CRDD-8C53TV&appname=crmd
Solution synopsis
A global provider of information management and electronic commerce services for financial institutions in the United States anticipates increased revenue and increased competitive edge as it works with IBM Global Technology Services - Integrated Technology Services and IBM Software Services for Information Management to develop a powerful predictive analytics service for small to midsize banks, comprising IBM Power Systems technology and IBM Information Management software.
This chart describes at high-level a sentiment analytics project with ABS-CBN in the Philippines.
The objective of the project was to analyze social media about election candidates and the issues that impact them:
Buzz - candidates, topics, personalities, broadcasters: How much, and what, is being said about the candidates (ongoing and for key "events" like debates, advertisements, etc.), the different shows, and the news anchors? How does this change over time; what is trending?
Sentiment - popular opinion: What do voters like or dislike about the candidates, the parties, campaigns, constituents, etc.? How does this sentiment break down by the different groups (voters, political affiliation, news professionals, demographics, affinity groups, etc.)? Understand brand sentiment: whether ABS-CBN is perceived as unbiased and trusted, and how the different news personalities are perceived (credible, neutral, and fair).
Intent: What is the intent to act (support/vote) for each candidate? What election outcomes can be predicted (shifts in candidate sentiment, voter intent, etc.)? A toy sketch of the buzz and sentiment measures follows below.
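The Python sketch below is a toy illustration of the buzz and sentiment measures, not IBM Social Media Analytics: the posts, candidate names, and sentiment lexicon are all invented, and real sentiment analytics relies on full natural language processing rather than word lists.

from collections import defaultdict

# Tiny illustrative sentiment lexicon (hypothetical).
POSITIVE = {"great", "trusted", "credible", "support", "fair"}
NEGATIVE = {"biased", "corrupt", "dislike", "unfair"}

posts = [
    "Candidate A was great in the debate, very credible",
    "I dislike Candidate B, seems biased",
    "Candidate A has my support",
]

buzz, sentiment = defaultdict(int), defaultdict(int)
for post in posts:
    words = set(post.lower().split())
    for candidate in ("candidate a", "candidate b"):
        if candidate in post.lower():
            buzz[candidate] += 1
            sentiment[candidate] += len(words & POSITIVE) - len(words & NEGATIVE)

print(dict(buzz))       # mention counts ("buzz") per candidate
print(dict(sentiment))  # crude net sentiment per candidate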
Main point: How does Watson work? It's not a simple answer, but Watson solutions are built on a set of repeatable assets that draw from decades of market leadership, research, and best practices. Beginning at the bottom:
Watson solutions are implemented with customers through a full lifecycle: readiness preparation, building the solution itself, teaching Watson about the industry, use case, and data involved, and finally putting it into production, during which it continues to improve through experience and feedback loops.
The basic platform of Watson operation is built on a core of ingestible natural language content, tooling to train and utilize Watson's functionality, proven methods for successful lifecycle operation, algorithms for analytic parsing of language and identification of responses, and APIs that allow other modular functionality to interact with Watson.
Built on this platform of core function is a set of capabilities used across Watson solutions. These include natural language processing capabilities to understand human communication (both from a user interface perspective and, more importantly, as a source of information upon which to draw for evidence-based responses) and machine learning capabilities to learn from experience. Data is the fuel of Watson's engine, and a curated data corpus of structured and unstructured data is where Watson draws evidence for its responses. Watson draws on IBM's leadership in analytics (predictive, business, etc.) to find patterns and relationships invisible to the naked eye. Watson solutions use cloud-based delivery to help scale their reach, optimize utilization of the required infrastructure, and improve accessibility for users. With cloud-based delivery comes mobile accessibility, since processing requirements on the user interface device itself are minimized. And finally, Watson infrastructure is optimized for the unique workloads it requires, yet Watson runs on commercial off-the-shelf IBM p-series hardware.
Drawing upon these capabilities are the Ask, Discover, and Decide services discussed previously.
Actual Watson solutions are developed in close collaboration with industry and domain leaders. IBM has partnered with leaders in healthcare, financial services, and other areas to develop Watson Advisor Solutions that help professionals make better use of available information to improve outcomes. Early brainstorming has led to initial pilots, which have led to full-production Watson Advisor Solutions, which in turn are leading to expansion into new use cases, industries, and domains. The future of Watson and cognitive systems is as bold and compelling as the imagination itself.
This chart elaborates on an IBM research effort to use BigInsights as a platform for massive scale Social Network Analytics (SNA).
Further description of X-RIME can be found here: