1. From sports to scientific research, a surprising range
of industries will begin to find value in big data.
2. Digital Health Technologies
These are some of the most important DIGITAL HEALTH CATEGORIES:
• Digital Imaging – (MRI / CTI / X-Ray / Ultrasound)
• Robotic Surgery – (Microsurgery / Remote Surgery)
• Patient Monitoring – (Clinical Trials / Health / Wellbeing)
• Biomedical Data – (Data Streaming / Biomedical Analytics)
• Epidemiology – (Disease Transmission / Contact Management)
• Emergency Incident Management – (Response Teams / Alerts and Alarms)
Here are a few of the most important DIGITAL MONITORING SMART APPS:
• Activity Monitor – (Pedometer / GPS)
• Position Monitor – (Falling / Fainting / Fitting)
• Breathing Monitor – (Breathing Rate / SATS Level)
• Sleep Monitor – (Light Sleep / Deep Sleep / REM / Apnoea)
• Blood Monitor – (Glucose / Oxygen / Hormones / Organ Function)
• Cardiac Monitor – (Heart Rhythm / Blood Pressure / Cardiac Events)
3. Digital Health Technologies
These are some of the most influential FUTURE DIGITAL HEALTH leaders:
– Huawei – John Frieslaar (Digital Futures)
– Cisco – Andrew Green (Digital Healthcare)
– ElationEMR – Kyna Fong (Digital Imaging)
– Microsoft – John Coplin (Digital Healthcare)
– Google – Eze Vidra (Head of Campus at Tech City)
– GE Healthcare – Catherine Yang (Digital Healthcare)
– MIT – Prof. Alex “Sandy” Pentland (Digital Epidemiology)
– Telefónica Digital – Matthew Key, CEO (Digital Healthcare)
– Open University – Dr. Blaine Price (Digital Patient Monitoring)
– UC San Diego – Prof. Larry Smarr (FuturePatient – Digital Patient Monitoring)
– Telefónica – Dr. Mike Short CBE (Digital Futures and the Smart Ward)
– Thames Valley Health Innovation and Education Cluster – David Doughty
– Department for Business, Innovation & Skills – Richard Foggie, KTN Executive
– Science City Research Alliance – Sarah Knaggs (Strategic Project Manager)
4. Digital Healthcare – Executive Summary
• Digital Healthcare is a cluster of new and emerging applications and technologies that exploit digital, mobile
and cloud platforms to treat and support patients. The term "Digital Healthcare" is necessarily broad
and generic, as this novel, innovation-driven Bioinformatics and Medical Analytics approach is
applied to a very wide range of social and health problems – from monitoring patients in intensive care,
general wards, convalescence or at home, to helping general practitioners make better-informed and
more accurate diagnoses and improving the effectiveness of prescription and referral decisions for clinical treatment.
• Bioinformatics and Medical Analytics utilises Data Science to provide actionable clinical insights. Digital
Healthcare has evolved from the need for more proactive and efficient healthcare service delivery, and
seeks to offer new and improved types of proactive and preventive monitoring and medical care at reduced
cost – using methods that are only possible thanks to emerging SMAC Digital Technology.
Digital Healthcare Technologies – Bioinformatics and Medical Analytics:
– Digital Patient Monitoring
– Biomedical Data Streaming
– Biomedical Data Science and Analytics
– Epidemiology, Clinical Trials, Morbidity and Actuarial Outcomes
• Novel and emerging high-impact Biomedical Health Technologies such as Bioinformatics and Medical
Analytics are transforming the way that Healthcare Service Providers can deliver Digital Healthcare globally
– with Digital Health Technology entrepreneurs, investors and researchers becoming increasingly interested in
and attracted to this important and rapidly expanding Life Sciences industry sector.
6. Digital Healthcare – Executive Summary
• While many industries can benefit from SMAC digital technology – Smart Devices, Mobile Platforms,
Analytics and the Cloud – this is especially true of the Life Sciences, Pharma and Healthcare
industry sectors, where it results in more accurate diagnosis, improved treatment regimes, more reliable
prognosis, and better patient monitoring, care and clinical outcomes. Let’s take a look at some of the
Digital Technologies that are bringing significant improvements and benefits to Healthcare.
• Today, thanks to the regulatory compliance requirements for HIPAA, HITECH, PCI DSS and ISO
27001, the reluctance to adopt Digital Technology has been overcome, and Digital Healthcare
adoption is gaining traction. Many of the security features required for data protection and
patient confidentiality are being addressed by Digital Healthcare service providers, relieving
healthcare delivery organizations of tedious and complex security and data protection frameworks.
Biomedical Data Analytics:
• Applying analytical methods such as statistical, predictive and quantitative models to patient
segments or population groups will provide better insights and achieve better outcomes.
As far back as 2010, there was evidence that “93 percent of healthcare providers
identified the digital information explosion as the major factor which will drive organizational change
over the next 5 years.”
(Related article: Cloud and healthcare: A revolution is coming)
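The segment-level analysis described above can be sketched in a few lines. This is a minimal illustration only, assuming a hypothetical list of (patient_id, segment, systolic_bp) readings; the segment names, the values and the `z_limit` threshold are invented for the example. It computes a mean and standard deviation per segment (descriptive statistics), then flags patients who sit unusually far from their own segment's norm, a very simple stand-in for the predictive models the text mentions.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical toy readings: (patient_id, segment, systolic blood pressure in mmHg)
records = [
    ("p1", "chronic", 150), ("p2", "chronic", 155),
    ("p3", "chronic", 160), ("p4", "chronic", 185),
    ("p5", "acute", 120), ("p6", "acute", 125), ("p7", "acute", 130),
]


def segment_profiles(rows):
    """Group readings by segment and return {segment: (mean, population std dev)}."""
    groups = defaultdict(list)
    for _patient_id, segment, value in rows:
        groups[segment].append(value)
    return {seg: (mean(vals), pstdev(vals)) for seg, vals in groups.items()}


def flag_outliers(rows, profiles, z_limit=2.0):
    """Return patients whose reading is more than z_limit std devs from their segment mean."""
    flagged = []
    for patient_id, segment, value in rows:
        mu, sigma = profiles[segment]
        if sigma > 0 and abs(value - mu) / sigma > z_limit:
            flagged.append(patient_id)
    return flagged


profiles = segment_profiles(records)
alerts = flag_outliers(records, profiles, z_limit=1.5)
```

For this toy data, only p4 is flagged: its reading is about 1.7 standard deviations above the chronic segment's mean of 162.5, while no acute patient exceeds the threshold.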
7. Digital Healthcare – Executive Summary
Data Security and Privacy:
• Today, thanks to the regulatory compliance requirements for HIPAA, HITECH, PCI DSS and
ISO 27001, reluctance to adopt emerging technologies is starting to be addressed and digital
technology is beginning to gain traction. Bear in mind also that many of the security features
required for data security and protection are addressed by the service providers, relieving
the healthcare organization of tedious and complex security frameworks.
Mobility:
• Mobility Services, in which Smart Devices, Smart Apps, Mobile Platforms and Cloud
Infrastructure provide the backbone for medical personnel to access all sorts of patient
information from anywhere – and from a wide range of mobile devices.
Collaboration with patients:
• Mobility means that complete patient records are now available to healthcare professionals
anytime, anywhere – allowing physicians to access historical patient case records, images
and clinical data to fine-tune their diagnosis and make informed decisions on treatment –
thus reducing diagnosis latency, increasing accuracy and improving patient care and clinical
outcomes from initial consultation to specialist referral. Some scenarios are illustrated
below:
• Physician Collaboration Solutions (PCS) •
• PCS offers video conferencing to facilitate remote consultations and continuity of care,
allowing patients to be seen remotely. PCS allows physicians to consult with patients and
even perform remote robotic surgery. This is dubbed “tele-health solutions”.
8. Digital Healthcare – Executive Summary
• Electronic Medical Records (EMR) •
• Every piece of information pertaining to a specific patient is recorded and stored. The solution is
designed to capture and provide a patient’s data at any point in the patient’s monitoring
cycle, including the complete medical records and history.
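To make the idea of providing a patient's data "at any time of the monitoring cycle" concrete, here is a minimal sketch of a time-indexed patient record. It assumes a hypothetical in-memory model: `Entry`, `PatientRecord` and `as_at` are illustrative names, not part of any EMR product or standard. Entries are kept in timestamp order, so the record can be replayed as it stood at any chosen moment.

```python
from bisect import insort
from dataclasses import dataclass, field
from datetime import datetime


@dataclass(order=True)
class Entry:
    """One timestamped item in a patient's record; ordering uses recorded_at only."""
    recorded_at: datetime
    kind: str = field(compare=False)
    detail: str = field(compare=False)


@dataclass
class PatientRecord:
    """Hypothetical EMR record: a patient ID plus a chronologically sorted history."""
    patient_id: str
    entries: list = field(default_factory=list)

    def add(self, entry: Entry) -> None:
        # insort keeps the history sorted by timestamp even if entries arrive out of order
        insort(self.entries, entry)

    def as_at(self, moment: datetime) -> list:
        """Return every entry recorded up to and including `moment`."""
        return [e for e in self.entries if e.recorded_at <= moment]


record = PatientRecord("p-001")
record.add(Entry(datetime(2014, 3, 2, 9, 0), "observation", "BP 150/95"))
record.add(Entry(datetime(2014, 3, 1, 14, 30), "consultation", "GP visit"))
record.add(Entry(datetime(2014, 3, 5, 11, 0), "prescription", "Repeat prescription"))
history_on_3_march = record.as_at(datetime(2014, 3, 3))
```

Keeping the list sorted on insertion makes "as at" queries a simple filter. A production system would index by time in a database rather than hold records in memory.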
• Patient Information Exchange (PIE) •
• This allows healthcare information to be shared electronically across organizations
within a region, community or hospital system. Several Digital Healthcare cloud service
providers currently address this market, collecting and distributing medical information
from and among multiple organizations.
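A patient information exchange has to combine feeds from several organisations into one consistent history. The sketch below shows only the merge step. It assumes hypothetical feeds that are already keyed by a matched patient ID. Real exchanges also have to handle patient identity matching, consent and interoperability standards such as HL7, none of which is modelled here.

```python
from datetime import date

# Hypothetical feeds: patient_id -> list of (date, source organisation, item)
gp_feed = {
    "p-001": [(date(2014, 1, 10), "GP", "Referral to cardiology")],
}
hospital_feed = {
    "p-001": [
        (date(2014, 2, 3), "Hospital", "ECG"),
        (date(2014, 1, 10), "GP", "Referral to cardiology"),  # copy already held by the GP
    ],
    "p-002": [(date(2014, 2, 7), "Hospital", "X-Ray")],
}


def merge_feeds(*feeds):
    """Combine feeds into one de-duplicated, chronological history per patient."""
    merged = {}
    for feed in feeds:
        for patient_id, items in feed.items():
            # a set drops identical items reported by more than one organisation
            merged.setdefault(patient_id, set()).update(items)
    # tuples sort by date first, then by source and item text
    return {pid: sorted(items) for pid, items in merged.items()}


shared_records = merge_feeds(gp_feed, hospital_feed)
```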
• The New York Times has published an interesting article illustrating the use of the cloud
in healthcare - leveraging big data in the cloud to manage patient relationships and clinical
outcomes.
Collaboration among peers:
• Technology can provide medical assistance to doctors in the field, be it in remote areas or
in emergency relief operations, through satellite communications. Refer to the Remote
Assistance for Medical Teams Deployed Abroad (T4MOD) project, which could easily
find its place in the Digital Healthcare cloud space.
10. GIS Mapping and Spatial Analysis
• 4D Geospatial Analytics is the geographic profiling and analysis of large
aggregated datasets in order to determine a ‘natural’ structure of clusters or
groupings – this provides an important basic technique for many statistical and
analytic applications.
• Environmental and Demographic Geospatial Cluster Analysis – based on geographic
distribution or profile similarities – is a statistical method whereby no prior
assumptions are made concerning the nature of internal data structures (the number
and type of groups and hierarchies). Geospatial and geodemographic techniques are
frequently used to profile and segment populations using ‘natural’ groupings such
as shared or common behavioural traits – Medical, Clinical Trial, Morbidity or
Actuarial outcomes – along with many other common factors and shared characteristics.
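The text does not name a specific clustering algorithm. As one illustration of clustering in which the number of groups is not assumed in advance, the sketch below implements DBSCAN, a density-based method. Here the number of clusters emerges from the data rather than being fixed beforehand, although the method still needs a neighbourhood radius `eps` and a minimum density `min_pts`. The sketch assumes points are already in projected planar coordinates (for example metres on a local grid); latitude and longitude would need a great-circle distance instead.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: return one cluster label per point (-1 marks noise).

    The number of clusters is not fixed in advance; it emerges from the
    density parameters eps (neighbourhood radius) and min_pts.
    """
    labels = [None] * len(points)

    def neighbours(i):
        # brute-force neighbourhood query; fine for small illustrative datasets
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point becomes a border point
            if labels[j] is not None:
                continue  # already assigned; border points are not expanded further
            labels[j] = cluster
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point, so keep growing the cluster
    return labels


# Hypothetical planar coordinates: two dense groups and one isolated point
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

Here the two dense groups come out as clusters 0 and 1, and the isolated point at (50, 50) is labelled as noise, without the number of clusters ever being specified.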
11. GIS Mapping and Spatial Analysis
• GIS MAPPING and SPATIAL DATA ANALYSIS •
• A Geographic Information System (GIS) integrates hardware, software and digital data
capture and streaming devices – including machine-generated data such as Computer-Aided
Design (CAD) information from land and building surveys, Global Positioning System
(GPS) terrestrial location data, wearable technology and biomedical data streams – in order to
acquire, manage, analyse, distribute, communicate and display every type of static and mobile
geographically dependent location data, along with data streams such as imaging data feeds –
including personal, transportation and environmental HD CCTV, aerial and satellite image data.
• Spatial Data Analysis is a set of techniques for analysing 3-dimensional spatial (Geographic)
data and location (Positional) object data overlays. GIS Software that implements spatial data
analysis techniques requires access to both the locations of objects and their physical attributes.
Spatial statistics extends traditional statistics to support the analysis of geographic data. Spatial
Data Analysis provides techniques to describe the distribution of data in a geographic space
(descriptive spatial statistics), analyse the spatial patterns of the data (spatial pattern or cluster
analysis), identify and measure spatial relationships (clusters and spatial regression), and create
3D surface models from sampled data (spatial interpolation, often categorised as geo-statistics).
• The results of spatial data analysis depend largely on the type, quantity,
distribution and data quality of the spatial objects being analysed.
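Two of the techniques listed above can be sketched with the standard library alone: a descriptive spatial statistic (the mean centre of a set of points) and spatial interpolation. For interpolation the sketch uses inverse distance weighting (IDW), one common and simple method chosen here for illustration. Geostatistical methods such as kriging are more involved and are not shown. Coordinates are assumed to be planar, the sample values are hypothetical, and the `power` parameter of 2 is a conventional default rather than a value taken from the text.

```python
from math import dist


def mean_centre(points):
    """Descriptive spatial statistic: the average x and y of a set of locations."""
    xs, ys = zip(*points)
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def idw(samples, target, power=2.0):
    """Inverse distance weighted estimate at `target` from (x, y, value) samples."""
    numerator = denominator = 0.0
    for x, y, value in samples:
        d = dist((x, y), target)
        if d == 0:
            return value  # the target coincides with a sampled location
        weight = 1.0 / d ** power  # nearer samples get more influence
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Hypothetical sampled readings at three sites: (x, y, value)
samples = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0), (1.0, 3.0, 30.0)]
centre = mean_centre([(x, y) for x, y, _ in samples])
estimate = idw(samples, (1.0, 0.0))
```

As the text notes, the result depends heavily on the quantity and distribution of the samples. With only three sites, the estimate at (1, 0) is dominated by the two nearby samples.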
17. The Cone™ - Eight Primitives
• Who? – Domain: People – Patient; Function: Patient Information System; Product: Electronic Medical Records (EMR)
• Where? – Domain: Places – Location; Function: 1st Responders, Emergency Services, GP, Nurse, Doctor; Product: Command / Control / Geospatial Analytics
• When? – Domain: Medical Incident / Event; Function: Event Type – Referral, Walk-in, Appointment, Emergency; Product: Incident Management – Event Type / Time / Date
• What? – Domain: Emergency / Medical / Clinical Procedure; Function: Investigate / Test / Diagnose / Treatment / Follow-up; Product: Patient Administration / Patient Care Systems
• Why? – Domain: Reason / Motivation / Cause / Outcome; Function: Triage Patient Status – Acute, Chronic, Casual, Indifferent; Product: Biomedical Information Streaming and Analytics
• How? – Domain: Patient Medical Data; Function: Automatic Streaming of Biomedical Data to Cloud; Product: Mobile Platforms / IoT, Smart Devices / Apps
• Which? – Domain: Investigation / Test / Observe / Diagnosis; Function: Healthcare Provider – GP Surgeries, Clinics, Hospitals; Product: Patient Administration / Patient Care Systems
• Via? – Domain: Referral Channel / Health Service Delivery Partner; Function: Healthcare Service Provider – Surgery, Clinics, Hospitals; Product: Healthcare Service Partner / Procedure
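One way to read the eight primitives is as the dimensions of a single medical fact, in the style of a dimensional (star-schema) data model. The sketch below is a hypothetical illustration of that reading, not the Cone™ product's actual schema: the `MedicalFact` class, its field names and the sample values are all assumptions. Validating the Why? primitive against the four triage statuses from the table shows how a dimension can constrain a fact, and `count_by` shows how facts can be sliced along any primitive.

```python
from collections import Counter
from dataclasses import dataclass

# Triage statuses listed for the Why? primitive
TRIAGE_STATUSES = {"Acute", "Chronic", "Casual", "Indifferent"}


@dataclass(frozen=True)
class MedicalFact:
    """Hypothetical fact record with one field per primitive."""
    who: str    # People: the patient
    where: str  # Places: location
    when: str   # Medical incident / event date
    what: str   # Clinical procedure, e.g. Diagnose, Treatment
    why: str    # Triage patient status
    how: str    # Patient medical data source
    which: str  # Investigation / test / observation
    via: str    # Referral channel / service delivery partner

    def __post_init__(self):
        # reject facts whose Why? value is not one of the known triage statuses
        if self.why not in TRIAGE_STATUSES:
            raise ValueError(f"unknown triage status: {self.why}")


def count_by(facts, primitive):
    """Slice facts along one primitive name, e.g. 'why' or 'where'."""
    return Counter(getattr(fact, primitive) for fact in facts)


facts = [
    MedicalFact("p-001", "Soho", "2014-03-01", "Diagnose", "Chronic",
                "Lab test results", "Blood test", "GP Surgery"),
    MedicalFact("p-002", "Soho", "2014-03-02", "Treatment", "Acute",
                "Cardiac monitor", "ECG", "Hospital"),
    MedicalFact("p-003", "Camden", "2014-03-02", "Follow-up", "Chronic",
                "Smart app", "Observation", "Clinic"),
]
```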
18. The Patient Cone™ – EIGHT PRIMITIVES
[Figure: The Patient Cone™ (Version 3 – Healthcare) – a central MEDICAL FACT linked to eight dimensions: Party (Who?), Event (What?), Geographic (Where?), Data (How?), Time (When?), Motivation (Why?), Procedure (Which?) and Channel (Via?).]
19. The Biomedical Cone™
Converting Data Streams into Actionable Insights
[Figure: The Biomedical Cone™ data flow – Biomedical Data Streaming, Electronic Medical Records (EMR), Social Media Monitoring and EXPERIAN MOSAIC geo-demographics feeding Big Data Analytics (Cone™, Anomaly 42, Unica, Salesforce) on a Patient Monitoring Platform, producing Actionable Medical Insights for Intervention (Treatment, Smart Apps, Health Campaigns).]
20. Proof-of-concept and Prototype
The Patient Pyramid™ approach is lean, agile, smart and creative:
• We start by providing a custom Pyramid™ Enterprise Application as a proof of concept.
We then work with client key stakeholders to scope a detailed brief which articulates a
business problem domain that the Patient Pyramid™ can help understand and resolve.
• We then harvest all current and past patient records along with any other available internal
and public domain biomedical data – in order to establish a baseline Patient Pyramid™.
• This is augmented by overlaying external data - Social Intelligence and other live
streamed Biomedical and Patient Lifestyle Data that drives our new real-time Patient
Pyramid™ view describing the six primitives - who / what / why / where / when and how.
• Finally, we exploit social intelligence for Patient Lifestyle Understanding – creating new
actionable insights to inform creative medical campaign solutions against the agreed brief.
• Post proof-of-concept, we can then agree a Pyramid™ Enterprise Application fixed term
licence along with Patient Pyramid™ add-ons, enhancements, consulting, mentoring,
training and support – on-line, on-site, on-demand - whenever and wherever required.
21. 4D Geospatial Analytics in
Digital Healthcare
Digital Futures: - Creating new roles and value chains
Novel and emerging Biomedical Health Technologies are transforming the way that
Healthcare Providers can deliver Healthcare globally – with Digital Health
Technology entrepreneurs and investors becoming increasingly attracted to this
rapidly growing industry sector.
Healthcare Delivery is currently undergoing a global transformation – with Digital
Health Technologies leading the way. Companies such as BT Health, Blueprint
Health, BUPA, Microsoft (John Coplin), Telefónica Digital (Dr. Mike Short) and
Rock Health are all shaping novel and emerging Digital Healthcare Technologies –
bringing new and innovative business propositions to market.
22. 4D Geospatial Analytics
Geo-spatial and geodemographic
techniques are frequently used to
profile, stream and segment human
populations using ‘natural’ groupings
such as shared or common
behavioural traits – Medical, Clinical
Trial, Morbidity or Actuarial outcomes
– along with many other common
factors and shared characteristics.
The profiling and analysis of large
aggregated datasets in order to
determine a ‘natural’ structure of
clusters or groupings provides an
important basic technique for many
statistical and analytic applications.
Based on geographic distribution or
profile similarities – Geospatial
Clustering is a statistical method
whereby no prior assumptions are
made concerning the nature of
internal data structures (the number
and type of groups and hierarchies).
24. The Flow of Information through Time
• Space-Time is a four-dimensional (4D) integrated dimensional cluster consisting of the
three Spatial dimensions (x, y and z axes) plus Time (the fourth dimension - t). Space-
Time exists in discrete packages (Temporal Planes) - with the whole of Space-Time
existing as an endless stack of Temporal Planes extending from the remote Past, through
into our Present, and onwards to the distant Future. Events exist as a line through this
stack of Temporal Planes. Thus Time Present is always inextricably woven into both Time
Past and Time Future. Every item of Global Content in the Present is somehow connected
with both Past and Future temporal planes in a timeline which is composed of a sequence
of temporal planes stacked one on top of another. The “arrow of time” governs the flow of
Space-Time which can only flow in a single direction - relentlessly towards the future.
• Space-Time does not flow uniformly – the path of the “arrow of time” may be deflected or
changed by various factors – gravitational fields, dark matter, dark energy, dark flow,
hidden dimensions or unknown Membranes in Hyperspace. There may also exist “hidden
external forces” (unseen interactions) that create disturbance in the temporal plane stack
which marks the passage of time - with the potential to create eddies, vortices and
whirlpools along the trajectory of Time (chaos, disorder and uncertainty) – which in turn
possess the capacity to generate ripples and waves (randomness and disruption) – thus
changing the course of the Space-Time continuum. “Weak Signals” are “Ghosts in the
Machine” – echoes of these subliminal temporal interactions – that may contain within them
insights or clues about possible future “Wild Card” or “Black Swan” random events.
25. The Flow of Information through Time
• String Theory physicists and mathematicians postulate that Space-Time exists in discrete
packages (Temporal Planes) - with the whole of Space-Time existing as an endless stack
of Temporal Planes extending from the remote Past, through into our Present, and
onwards to the distant Future. Thus Time Present is always inextricably woven into both
Time Past and Time Future. This yields the intriguing possibility of glimpses through the
mists of time into the outcomes of future Event Paths – both isolated Events and linked
Event Clusters – as any item of Data or Information (Global Content) may contain faint
traces which offer insights into the future trajectory of Past, Present and Future Events.
• If all future timelines were linear in nature - then every event would unfold in an unerringly
predictable manner towards a known and certain conclusion. The future is, however, both
unknown and unknowable (Hawking Paradox). Events exist as a line through this stack of
Temporal Planes. Future timelines are non-linear (branched) with an infinite multitude of
possible alternative futures – rendering future outcomes as uncertain and unpredictable.
Chaos Theory suggests that even the most ethereal and subliminal system inputs,
originating from invisible random events in the Space-Time continuum, are able to project
minute unknown forces so small as to be undetectable, which may then simply disappear
– or become amplified over time through numerous system cycles to grow in influence and
impact – slowly deviating predicted Space-Time trajectories far away from their original
estimated path, thus fundamentally altering the flow and outcome of Future Events.
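The sensitivity to tiny inputs described above is the defining property of chaotic systems. It can be demonstrated (for a simple mathematical system, not for Space-Time itself) with the logistic map, a standard textbook example of chaos. The sketch below runs two trajectories whose starting values differ by one part in ten billion. With the growth parameter r = 4, a value chosen here because it lies in the map's fully chaotic regime, the gap roughly doubles on average at each step, so within a few dozen iterations the two runs become unrelated.

```python
def logistic_trajectory(x0, r=4.0, steps=50):
    """Iterate the logistic map x -> r * x * (1 - x), starting from x0 in (0, 1)."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs


baseline = logistic_trajectory(0.2)
perturbed = logistic_trajectory(0.2 + 1e-10)  # an almost undetectable nudge
gaps = [abs(a - b) for a, b in zip(baseline, perturbed)]
# first iteration at which the two trajectories differ by more than 0.1
first_large_gap = next(step for step, gap in enumerate(gaps) if gap > 0.1)
```

The initial difference is far below anything a measurement could detect, yet it is amplified until the predicted trajectory bears no resemblance to the actual one.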
26. 4D Geospatial Analytics – The Temporal Wave
• The Temporal Wave is a novel and innovative method for Visual Modelling and Exploration
of Geospatial “Big Data” - simultaneously within a Time (history) and Space (geographic)
context. The problems encountered in exploring and analysing vast volumes of spatial–temporal
information in today's data-rich landscape are becoming increasingly difficult to
manage effectively. Overcoming the problem of data volume and scale in a Time
(history) and Space (location) context requires not only traditional location–space and
attribute–space analysis common in GIS Mapping and Spatial Analysis, but now also the
additional dimension of time–space analysis. The Temporal Wave supports a new method
of Visual Exploration for Geospatial (location) data within a Temporal (timeline) context.
• This time-visualisation approach integrates Geospatial (location) data within Temporal
(timeline) data, along with data visualisation techniques – thus improving accessibility,
exploration and analysis of the huge amounts of geospatial data used to support geo-visual
“Big Data” analytics. The Temporal Wave combines the strengths of both linear timeline
and cyclical wave-form analysis – and is able to represent data within a Space
(geographic) and Time (history) context simultaneously, even at different levels of
granularity. Linear and cyclic trends in space-time data may be represented in combination
with other graphic representations typical for location–space and attribute–space data
types. The Temporal Wave can be used in multiple roles for exploring very large-scale
datasets containing Geospatial (location) data within a Temporal (timeline) context: as an
integrated Space-Time data reference system, as a Space-Time continuum representation
and animation tool, and as a Space-Time interaction, simulation and analysis tool.
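The combination of a linear timeline with a cyclical wave form can be illustrated by mapping each timestamp onto a helix. The angle around the helix encodes the position within a cycle (here, the year), and the height encodes elapsed time. This is a minimal sketch of that general idea, assuming a 365.25-day period and an arbitrary origin date; it is not the Temporal Wave's actual geometry, which the text does not specify. Events that fall in the same season of different years line up vertically, which is what makes cyclic patterns visible alongside the linear trend.

```python
from datetime import datetime
from math import cos, pi, sin


def helix_coordinates(moment, origin, period_days=365.25):
    """Map a timestamp to (x, y, z): angle = phase within the cycle, z = elapsed days."""
    elapsed_days = (moment - origin).total_seconds() / 86400
    # cyclic component: how far through the current period this moment lies
    phase = 2 * pi * (elapsed_days % period_days) / period_days
    return (cos(phase), sin(phase), elapsed_days)


origin = datetime(2014, 1, 1)
start = helix_coordinates(datetime(2014, 1, 1), origin)
midyear = helix_coordinates(datetime(2014, 7, 2, 15), origin)
one_year_on = helix_coordinates(datetime(2015, 1, 1, 6), origin)
```

`start` and `one_year_on` share the same angle but sit 365.25 units apart in height, while `midyear` lies on the opposite side of the helix.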
27. 4D Geospatial Analytics – The Temporal Wave
• The problems encountered in exploring, analysing and extracting insights from the vast
volumes of spatial–temporal information in today's data-rich landscape are becoming
increasingly difficult to manage effectively. Overcoming the problem of data
volume and scale in an integrated Time (history) and Space (location) context requires
not only traditional location–space and attribute–space analysis common in GIS Mapping
and Spatial Analysis, but now also the additional dimension of Space-Time analysis. The
Temporal Wave supports a new method of Visual Exploration for Geospatial (location)
data within a Temporal (timeline) context. The Temporal Wave is a novel and innovative
method for Visual Modelling, Exploration and Analysis of the Space-Time dimension
fundamental to understanding Geospatial “Big Data” – through simultaneously visualising
and displaying complex data within a Time (history) and Space (geographic) context.
[Figure: complexity along “the arrow of time” – axes from Simplicity to Complexity (increasing element and interaction density), Order to Chaos, and Enthalpy to Entropy; regions labelled Linear Systems, Simplexity, Ordered Complexity, Complex Adaptive Systems (CAS) and Disordered Complexity.]
28. 4D Geospatial Analytics – The Temporal Wave
• The Temporal Wave time-visualisation approach integrates Geospatial (location) data
within a Temporal (timeline) dataset - along with other data visualisation techniques - thus
improving accessibility, exploration and analysis of the huge amounts of geo-spatial data
used to support geo-visual “Big Data” analytics. The Temporal Wave combines the
strengths of both linear timeline and cyclical wave-form analysis – and is able to represent
complex data both within a Time (history) and Space (geographic) context simultaneously
– even at different levels of granularity. Linear and cyclic trends in space-time data may be
represented in combination with other graphic representations typical for location–space
and attribute–space data types. The Temporal Wave can be deployed and used in roles
as diverse as a Space-Time data reference system, a Space-Time continuum
representation tool, and a Space-Time display / interaction / simulation / analysis tool.
31. 4D Geospatial Analytics – London Timeline
• How did London evolve from its creation as a Roman city in 43AD into the
crowded, chaotic cosmopolitan megacity we see today? The London Evolution
Animation takes a holistic view of what has been constructed in the capital over
different historical periods – what has been lost, what saved and what protected.
• Greater London covers 600 square miles. Up until the 17th century, however,
the capital city was crammed largely into a single square mile which today is
marked by the skyscrapers which are a feature of the financial district of the City.
• This visualisation, originally created for the Almost Lost exhibition by the Bartlett
Centre for Advanced Spatial Analysis (CASA), explores the historic evolution of
the city by plotting a timeline of the development of the road network - along with
documented buildings and other features – through 4D geospatial analysis of a
vast number of diverse geographic, archaeological and historic data sets.
• Unlike other historical cities such as Athens or Rome, with an obvious patchwork
of districts from different periods, London's individual structures, scheduled sites
and listed buildings were in many cases constructed gradually, from parts assembled
during different periods. Researchers who have previously tried to locate and
document archaeological structures and research historic references will know
that these features, when plotted, appear scrambled up like pieces of different
jigsaw puzzles – all scattered across the contemporary London cityscape.
32. History of Digital Epidemiology
• Doctor John Snow (15 March 1813 – 16
June 1858) was an English physician and a
leading figure in the adoption of anaesthesia
and medical hygiene. John Snow is largely
credited with sparking and pursuing a total
transformation in Public Health and epidemic
disease management and is considered one
of the fathers of modern epidemiology in part
because of his work in tracing the source of
a cholera outbreak in Soho, London, in 1854.
• John Snow’s investigation and findings into
the Broad Street cholera outbreak - which
occurred in 1854 near Broad Street in the
London district of Soho in England - inspired
fundamental changes in both the clean and
waste water systems of London, which led to
further similar changes in other cities, and a
significant improvement in understanding of
Public Health around the whole of the world.
33. History of Digital Epidemiology
• The Broad Street cholera outbreak of
1854 was a severe outbreak of cholera
which occurred near Broad Street in
the London district of Soho in England.
• This cholera outbreak is best known for
statistical analysis and study of the
epidemic by the physician John Snow
and his discovery that cholera is spread
by contaminated water. This knowledge
drove improvement in Public Health with
mass construction of sanitation facilities
from the middle of the 19th century.
• Later, the term "focus of infection" would
be used to describe factors such as the
Broad Street pump – where Social and
Environmental conditions may result in
the outbreak of local infectious diseases.
34. History of Digital Epidemiology
• It was the study of
cholera epidemics,
particularly in
Victorian England
during the middle of
the 19th century,
which laid the
foundation for
epidemiology - the
applied observation
and surveillance of
epidemics and the
statistical analysis of
public health data.
• This discovery came
at a time when the
miasma theory of
disease transmission
by noxious “foul air”
prevailed in the
medical community.
35. History of Digital Epidemiology
Modern epidemiology has its origin with the study of Cholera
Broad Street cholera outbreak of 1854
36. History of Digital Epidemiology
Modern epidemiology has its origin with the study of Cholera.
• It was the study of cholera epidemics, particularly in Victorian England
during the middle of the 19th century, that laid the foundation for the science
of epidemiology - the applied observation and surveillance of epidemics and
the statistical analysis of public health data. It was during a time when the
miasma theory of disease transmission prevailed in the medical community.
• John Snow is largely credited with sparking and pursuing a transformation in
Public Health and epidemic disease management from the extant paradigm
in which communicable illnesses were thought to have been carried by
bad, malodorous airs, or “miasmas” – towards a new paradigm which would
begin to recognize that virulent contagious and infectious diseases are
communicated by various other means – such as water being polluted by
human sewage. This new approach to disease management recognised that
contagious diseases were either directly communicable through contact with
infected individuals - or via vectors of infection (water, in the case of cholera)
which are susceptible to contamination by viral and bacterial agents.
37. History of Digital Epidemiology
• This map is John Snow’s
famous plot of the 1854
Broad Street Cholera
Outbreak in London. By
plotting epidemic data on a
map like this, John Snow
was able to identify that the
outbreak was centred on a
specific water pump.
• Interviews confirmed that
outlying cases were from
people who would regularly
walk past the pump and
take a drink. At his urging,
the pump handle was removed
and the outbreak soon
subsided.
• The cause of cholera
(the bacterium Vibrio cholerae)
was unknown at the time,
and Snow’s important work
with cholera in London
during the 1850s is
considered the beginning of
modern epidemiology.
Some have even gone so
far as to describe Snow’s
Broad Street Map as the
world’s first GIS.
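The essence of Snow's spatial reasoning, attributing each case to the nearest water source and counting, can be sketched as a nearest-neighbour tally. The coordinates below are made up purely for illustration and are not Snow's data. The pumps other than Broad Street have placeholder names, and straight-line distance is a simplification; walking distances along streets would be more faithful.

```python
from collections import Counter
from math import dist


def cases_by_nearest_pump(cases, pumps):
    """Attribute each case to its nearest pump (straight-line distance) and count per pump."""
    counts = Counter({name: 0 for name in pumps})  # start every pump at zero cases
    for case in cases:
        nearest = min(pumps, key=lambda name: dist(case, pumps[name]))
        counts[nearest] += 1
    return counts


# Made-up coordinates for illustration only
pumps = {"Broad Street": (0.0, 0.0), "Pump B": (-4.0, -3.0), "Pump C": (-3.0, 4.0)}
cases = [(0.5, 0.2), (-0.3, 0.8), (1.0, -0.5), (0.2, -0.9), (-3.5, -2.8)]
counts = cases_by_nearest_pump(cases, pumps)
```

A pump that attracts a disproportionate share of nearby cases, as Broad Street does in this toy data, becomes the prime suspect, which is the pattern Snow's map made visible.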
39. Clinical Risk Types
[Figure: Clinical Risk Types and Morbidity Risk Types, grouped by risk group (Patient / Patients, Employee or Service Provider, Stakeholders). Risk types shown: Human, Process, Legal, 3rd Party, Technology, Trauma, Sponsorship, Compliance (Performance, Finance, Standards), Triage (Airways, Cognitive, Bleeding), Predation, Environment, Patient, Disease, Shock, Toxicity, Organ Failure, and Immunological, Cardiovascular, Pulmonary and Neurological System Risk.]
42. • Case Study • Pandemics
• Pandemics – during a pandemic episode, such as the recent Ebola outbreak, current
policies emphasise the need to ground decision-making on empirical evidence. This section
studies the tension that remains in decision-making processes when there is a sudden and
unpredictable change of course in an outbreak – or when key evidence is weak or ‘silent’.
• The current focus in epidemiology is on the ‘known unknowns’ – factors with which we
are familiar in pandemic risk assessment processes. These risk processes cover, for
example, monitoring the course of the pandemic, estimating the most affected age groups,
and assessing population-level clinical and pharmaceutical interventions. This section
looks for the ‘unknown unknowns’ – factors for which evidence is lacking or silent, and of
which we have only a limited or weak understanding in pandemic risk assessment processes.
• Pandemic risk assessment shows that any new, emerging, sudden or unpredictable
change in the pandemic situation does not accumulate a robust body of
evidence for decision-making. These uncertainties may be conceptualised as ‘unknown
unknowns’, or “silent evidence”. Historical and archaeological pandemic studies indicate
that there may well have been evidence that was not discovered, known or recognised.
This section looks at a new method to discover “silent evidence” – unknown factors – that
affect pandemic risk assessment, by focusing on the tension under pressure that impacts
upon the actions of key decision-makers in the pandemic risk decision-making process.
44. Pandemic Black Swan Events
• Malaria – Type / Location: for the entirety of human history, Malaria has been a human pathogen; Impact: the Malaria pathogen kills more humans than any other disease; Date: 20 kya – present
• Smallpox (Antonine Plague) – Type / Location: Smallpox, Roman Empire / Italy; Impact: Smallpox is the 2nd worst killer; Date: 165–180
• Plague of Justinian – Type / Location: Bubonic Plague, Roman Empire; Impact: 50 million people died; Date: 6th century
• Black Death (Late Middle Ages) – Type / Location: Bubonic Plague, Europe; Impact: 75 to 200 million people died; Date: 1340–1400
• Smallpox – Type / Location: Amazonian Basin Indians; Impact: 90% of Amazonian Indians died; Date: 16th century
• Tuberculosis – Type / Location: Western Europe; Impact: 900 deaths per 100,000 population; Date: 18th – 19th century
• Syphilis – Type / Location: global pandemic, invariably fatal; Impact: 10% of Victorian men were carriers; Date: 19th century
• 1st Cholera Pandemic – Type / Location: global pandemic; Impact: started in the Bay of Bengal; Date: 1817–1823
• 2nd Cholera Pandemic – Type / Location: global pandemic; Impact: arrived in London in 1832; Date: 1826–1837
• Spanish Flu – Type / Location: global pandemic; Impact: 50 million people died; Date: 1918
• Smallpox – Type / Location: global pandemic; Impact: 300 million people died in the 20th century; Date: eliminated in the 20th century
• Poliomyelitis – Type / Location: global pandemic; Impact: contracted by up to 500,000 persons per year in the 1950s / 1960s; Date: 1950s – 1960s
• AIDS – Type / Location: global pandemic, mostly fatal; Impact: 10% of Sub-Saharans are carriers; Date: late 20th century
• Ebola – Type / Location: West African epidemic, 50% fatal; Impact: Sub-Saharan Africa epicentre; Date: late 20th century
45. For the entirety of human history, Malaria has
been the most lethal pathogen to attack man
46. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
1 Malaria – Parasitic Biological Disease
The Malaria pathogen has killed more humans than any other disease. Malaria
may have been a human pathogen for the entire history of our species. Human
malaria most likely originated in Africa and has coevolved along with its hosts,
mosquitoes and non-human primates. Humans could have originally caught
Plasmodium falciparum from gorillas. The earliest evidence of malaria parasites is
approximately 30 million years old, found in mosquitoes preserved in amber from
the Palaeogene period. About 10,000 years ago, a period which coincides with the
development of agriculture (the Neolithic revolution), malaria started having a major
impact on human survival. A consequence was natural selection for sickle-cell
disease, thalassaemias, glucose-6-phosphate dehydrogenase deficiency,
ovalocytosis, elliptocytosis and loss of the Gerbich antigen (glycophorin C) and
the Duffy antigen on erythrocytes because such blood disorders confer a selective
advantage against malarial infection (balancing selection). The first description of
malaria dates back almost 5,000 years, to 2700 BC in China, where ancient writings
refer to symptoms now commonly associated with malaria. Early anti-malarial
treatments were first developed in China from the qinghao plant, which contains
the active ingredient artemisinin, since rediscovered and still used in anti-malaria drugs
today. The three major types of inherited genetic resistance to malaria (sickle-cell
disease, thalassaemias, and glucose-6-phosphate dehydrogenase deficiency)
were all present in the Mediterranean world 2,000 years ago, at the peak of the
Roman Empire. The role of epidemics and disease in the ultimate decline and fall
of the Roman Empire has been largely overlooked by Epidemiology researchers.
47.
48. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
2 Smallpox – Viral Biological Disease
The history of smallpox holds a unique place in medical history. One of the
deadliest viral diseases known to man, it was the first disease to be prevented by
vaccination - and is also the only human disease to have been eradicated from the
face of the earth by vaccination. Smallpox plagued human populations for
thousands of years. Researchers who examined the mummy of Egyptian
pharaoh Ramses V (died 1157 BCE) observed scarring similar to that from
smallpox on his remains. Ancient Sanskrit medical texts, dating from about
1500 BCE, describe a smallpox-like illness. Smallpox was most likely
present in Europe by about 300 CE, although there are no unequivocal
records of smallpox in Europe before the 6th century CE. It has been
suggested that it was a major component of the Plague of Athens that
occurred in 430 BCE, during the Peloponnesian War, and was described
by Thucydides. A recent analysis of the description of clinical features
provided by Galen during the Antonine Plague that swept through the
Roman Empire and Italy in 165–180, indicates that the probable cause was
smallpox. In 1796, after noting smallpox immunity amongst milkmaids,
Edward Jenner carried out his now famous experiment on eight-year-old
James Phipps, using cowpox as a vaccine to confer immunity to smallpox.
Some estimates indicate that 20th century worldwide deaths from smallpox
numbered more than 300 million. The last known case of naturally occurring
smallpox was in Somalia in 1977, and the WHO declared smallpox eradicated in 1980.
49. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
3 Bubonic Plague – Bacterial Biological Disease
The Bubonic Plague – or Black Death – was one of the most devastating
pandemics in human history, killing an estimated 75 to 200 million people
and peaking in Europe in the years 1348–50 CE. The Bubonic Plague is a
bacterial disease – spread by fleas carried by Asian Black Rats - which
originated in or near China and then travelled to Italy, overland along the Silk
Road, or by sea along the Silk Route. From Italy the Black Death spread
onwards through other European countries. Research published in 2002
suggests that the Black Death began in the spring of 1346 in the Russian
steppe region, where a plague reservoir stretched from the north-western
shore of the Caspian Sea into southern Russia. Although there were
several competing theories as to the etiology of the Black Death, analysis of
DNA from victims in northern and southern Europe published in 2010 and
2011 indicates that the pathogen responsible was the Yersinia pestis
bacterium, possibly causing several forms of plague. The first recorded
epidemic ravaged the Byzantine Empire during the sixth century, and was
named the Plague of Justinian after emperor Justinian I, who was infected
but survived through extensive treatment. The epidemic is estimated to have
killed approximately 50 million people in the Roman Empire alone. During
the Late Middle Ages (1340–1400), Europe experienced the most deadly
disease outbreak in history when the Black Death, the infamous pandemic
of bubonic plague, peaked in 1348–50, killing an estimated one third of Europe's population.
50. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
4 Syphilis – Bacterial Biological Disease
Syphilis - the exact origin of syphilis is unknown. There are two primary
hypotheses: one proposes that syphilis was carried from the Americas to
Europe by the crew of Christopher Columbus, the other proposes that
syphilis previously existed in Europe but went unrecognized. These are
referred to as the "Columbian" and "pre-Columbian" hypotheses. In late 2011
newly published evidence suggested that the Columbian hypothesis is valid.
The appearance of syphilis in Europe at the end of the 1400s heralded
decades of death as the disease raged across the continent. The first
evidence of an outbreak of syphilis in Europe was recorded in 1494/1495
in Naples, Italy, during a French invasion. First spread by returning French
troops, the disease was known as the “French Pox”, and it was not until
1530 that the term "syphilis" was first applied by the Italian physician and
poet Girolamo Fracastoro. By the 1800s it had become endemic, carried by
as many as 10% of men in some areas - in late Victorian London this may
have been as high as 20%. Often fatal if untreated, and associated with extramarital sex
and prostitution, syphilis was accompanied by enormous social stigma. The
secretive nature of syphilis helped it spread - disgrace was such that many
sufferers hid their symptoms, while others carrying the latent form of the
disease were unaware they even had it. Treponema pallidum, the syphilis
causal organism, was first identified by Fritz Schaudinn and Erich Hoffmann
in 1905. The first effective treatment (Salvarsan) was developed in 1910
by Paul Ehrlich, and was followed by the introduction of penicillin in 1943.
51. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
5 Tuberculosis – Bacterial Biological Disease
Tuberculosis - studies of the evolutionary origins of Mycobacterium tuberculosis
indicate that the most recent common ancestor was a human-specific
pathogen, which encountered an evolutionary bottleneck leading to
diversification. Analysis of mycobacterial interspersed repetitive units has
allowed dating of this evolutionary bottleneck to approximately 40,000 years
ago, which corresponds to the period subsequent to the expansion of Homo
sapiens out of Africa. This analysis of mycobacterial interspersed repetitive
units also dated the Mycobacterium bovis lineage as dispersing some 6,000
years ago. Tuberculosis existed 15,000 to 20,000 years ago, and has been
found in human remains from ancient Egypt, India, and China. Human
bones from the Neolithic show the presence of the bacteria, which may be
linked to early farming and animal domestication. Evidence of tubercular
decay has been found in the spines of Egyptian mummies, and TB was
common both in ancient Greece and Imperial Rome. Tuberculosis reached
its peak in the 18th century in Western Europe, with a mortality rate as high as
900 deaths per 100,000 - due to malnutrition and overcrowded housing with
poor ventilation and sanitation. Although relatively little is known about its
frequency before the 19th century, the incidence of pulmonary tuberculosis (consumption),
“the captain of all men of death”, is thought to have peaked between the end
of the 18th century and the end of the 19th century. With the advent of HIV there
has been a dramatic resurgence of tuberculosis with more than 8 million
new cases reported each year worldwide and more than 2 million deaths.
52. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
6 Cholera – Bacterial Biological Disease
Cholera is a severe infection in the small intestine caused by the bacterium
Vibrio cholerae, contracted by drinking water or eating food contaminated
with the bacterium. Cholera symptoms include profuse watery diarrhoea and
vomiting. The primary danger posed by cholera is severe dehydration, which
can lead to rapid death. Cholera can now be treated with re-hydration and
prevented by vaccination. Cholera outbreaks in recorded history have
indeed been explosive and the global proliferation of the disease is seen by
most scholars to have occurred in six separate pandemics, with the seventh
pandemic still rampant in many developing countries around the world. The
first recorded instance of cholera was described in 1563 in an Indian medical
report. In modern times, the story of the disease begins in 1817 when it
spread from its ancient homeland of the Ganges Delta in the Bay of Bengal
in North East India - to the rest of the world. The first cholera pandemic
raged from 1817-1823, the second from 1826-1837. The disease reached
Britain during October 1831 - and finally arrived in London in 1832 (13,000
deaths) with subsequent major outbreaks in 1841, 1848 (21,000 deaths)
1854 (15,000 deaths) and 1866. Surgeon John Snow – by studying the
outbreak centred around the Broad Street pump in 1854 – traced the source
of cholera to drinking water which was contaminated by infected human
faeces – ending the “miasma” or “bad air” theory of cholera transmission.
53. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
7 Poliomyelitis – Viral Biological Disease
The history of poliomyelitis (polio) infections extends into prehistory.
Ancient Egyptian paintings and carvings depict otherwise healthy people
with withered limbs, and children walking with canes at a young age.[3] It is
theorized that the Roman Emperor Claudius was stricken as a child, and this
caused him to walk with a limp for the rest of his life. Perhaps the earliest
recorded case of poliomyelitis is that of Sir Walter Scott. At the time, polio
was not known to medicine. In 1773 Scott was said to have developed "a
severe teething fever which deprived him of the power of his right leg." The
disease has historically been known by many names, including Dental Paralysis,
Infantile Spinal Paralysis, Essential Paralysis of Children, Regressive
Paralysis, Myelitis of the Anterior Horns and Paralysis of the Morning.
In 1789 the first clinical description of poliomyelitis was provided by the
British physician Michael Underwood as "a debility of the lower extremities”.
Although major polio epidemics were unknown before the late 19th century, the
disease has caused paralysis and death for much of human history. Over
millennia, polio survived quietly as an endemic pathogen until the 1880s
when major epidemics began to occur in Europe; soon after, widespread
epidemics appeared in the United States. By 1910, frequent epidemics
became regular events throughout the developed world, primarily in cities
during the summer months. At its peak in the 1940s and 1950s, polio would
maim, paralyse or kill over half a million people worldwide every year.
54. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
8 Typhoid – Bacterial Biological Disease
Typhoid fever is an acute illness associated with a high fever that
is most often caused by the Salmonella typhi bacteria. Typhoid may also be
caused by Salmonella paratyphi, a related bacterium that usually leads to a
less severe illness. The bacteria are spread via deposition in water or food
by a human carrier. An estimated 16–33 million cases of typhoid fever occur
annually. Its incidence is highest in children and young adults between 5 and
19 years old. These cases as of 2010 caused about 190,000 deaths, up from
137,000 in 1990. Historically, in the pre-antibiotic era, the case fatality rate of
typhoid fever was 10-20%. Today, with prompt treatment, it is less than 1%.
9 Dysentery – Bacterial / Parasitic Biological Disease
Dysentery (the Flux or the bloody flux) is a form of gastroenteritis – a type
of inflammatory disorder of the intestine, especially of the colon, resulting in
severe diarrhoea containing blood and mucus in the faeces, accompanied by
fever, abdominal pain and rectal tenesmus (a feeling of incomplete defecation),
caused by bacterial, parasitic or other intestinal infections. Conservative estimates suggest
that 90 million cases of Bacterial Dysentery (Shigellosis) are contracted
annually, killing at least 100,000. Amoebic Dysentery (Amebiasis) infects
some 50 million people each year, with over 50,000 cases resulting in death.
55. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
10 Spanish Flu – Viral Biological Disease
In the United States, the Spanish Flu was first observed in Haskell County,
Kansas, in January 1918, prompting a local doctor, Loring Miner, to warn the
U.S. Public Health Service's academic journal. On 4th March 1918, army cook
Albert Gitchell reported sick at Fort Riley, Kansas. A week later, on 11th March
1918, over 100 soldiers were in hospital and the Spanish Flu virus had now
reached Queens, New York. Within days, 522 men had reported sick at the
army camp. In August 1918, a more virulent strain appeared simultaneously
in Brest, Brittany-France, in Freetown, Sierra Leone, and in the U.S, in Boston,
Massachusetts. It is estimated that in 1918, between 20% and 40% of the world's
population became infected by Spanish Flu - with 50 million deaths globally.
11 HIV / AIDS – Viral Biological Disease
AIDS was first reported in America in 1981 – and provoked reactions which
echoed those associated for so long with syphilis. Many of the earliest cases
were among homosexual men - creating a climate of prejudice and moral
panic. Fear of catching this new and terrifying disease was also widespread
among the public. The observed time-lag between contracting HIV and the
onset of AIDS, coupled with new drug treatments, changed perceptions.
Increasingly it was seen as a chronic but manageable disease. The global
story was very different - by the mid-1980s it became clear that the virus had
spread, largely unnoticed, throughout the rest of the world. The nature of this
global pandemic varies from region to region, with poorer areas hit hardest. In
parts of sub-Saharan Africa nearly 1 in 10 adults carries the virus - a statistic
which is reminiscent of the spread of syphilis in parts of Europe in the 1800s.
56. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
12 Ebola – Haemorrhagic Viral Biological Disease
Ebola is a highly lethal Haemorrhagic Viral Biological Disease, which has
caused at least 16 confirmed outbreaks in Africa between 1976 and 2015.
Ebola Virus Disease (EVD) is found in wild great apes and kills up to 90% of
humans infected - making it one of the deadliest diseases known to man. It is
so dangerous that it is considered to be a potential Category A bioterrorism agent
– on a par with anthrax, smallpox, and bubonic plague. The current outbreak
of EVD has seen confirmed cases in Guinea, Liberia and Sierra Leone,
countries in an area of West Africa where the disease has not previously
occurred. There were also a handful of suspected cases in neighbouring Mali,
but these patients were found to have contracted other diseases.
For each epidemic, transmission was quantified in different settings (illness in
the community, hospitalization, and traditional burial) and predictive analytics
simulated various epidemic scenarios to explore the impact of medical control
interventions on an emerging epidemic. A key medical parameter was the
rapid institution of control measures. For both epidemic profiles identified,
increasing the rate of hospitalization reduced the predicted epidemic size.
Over 4000 suspected cases of EVD have been recorded, with the majority of
them in Guinea. The outbreak has so far resulted in over 2000
deaths. These figures will continue to rise as more patients die and as test
results confirm that they were infected with Ebola.
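The finding that raising the rate of hospitalisation reduces the predicted epidemic size can be illustrated with a minimal compartmental sketch. This is not the model used in the studies referred to above: it is a toy S-I-H-R model (susceptible, infectious in the community, hospitalised, removed), it omits the traditional-burial setting, and every parameter value (`beta_c`, `beta_h`, `gamma`, `gamma_h`, population size, seed cases) is a hypothetical, illustrative assumption.

```python
def cumulative_infections(hosp_rate, beta_c=0.3, beta_h=0.05, gamma=0.1,
                          gamma_h=0.1, n=1_000_000, i0=10, days=3000, dt=0.1):
    """Cumulative infections in a toy S-I-H-R model (forward Euler steps).

    hosp_rate: per-day rate at which community cases (I) are hospitalised (H).
    beta_c, beta_h: hypothetical transmission rates from I and from H.
    gamma, gamma_h: hypothetical removal (recovery or death) rates.
    """
    s, i, h = n - i0, float(i0), 0.0
    total = float(i0)
    for _ in range(int(days / dt)):
        new_inf = (beta_c * i + beta_h * h) * s / n * dt  # new infections this step
        to_hosp = hosp_rate * i * dt                      # community cases admitted
        s -= new_inf
        i += new_inf - to_hosp - gamma * i * dt
        h += to_hosp - gamma_h * h * dt
        total += new_inf
    return total


for rate in (0.0, 0.1, 0.3, 0.5):
    print(f"hospitalisation rate {rate}: {cumulative_infections(rate):,.0f} infections")
```

With these illustrative numbers, each increase in the hospitalisation rate lowers the cumulative number of infections, because hospitalised cases are assumed to transmit far less than community cases; at the highest rate the outbreak dies out early. The absolute figures have no meaning outside the toy model.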
57. Pandemic Black Swan Event Types
Ebola is a highly lethal Haemorrhagic Viral Biological Disease, which has
caused at least 16 confirmed outbreaks in Africa between 1976 and 2015.
58. Pandemic Black Swan Event Types
Type Force Epidemiology Black Swan Event
13 Future Bacterial Pandemic Infections – Bacterial Biological Disease
Bacteria were most likely the real killers in the 1918 H1N1 Flu Pandemic - the
vast majority of deaths in the 1918–1919 influenza pandemic resulted directly
from secondary bacterial pneumonia, caused by common upper respiratory-
tract bacteria. Less substantial data from the subsequent 1957 and 1968 Flu
pandemics are consistent with these findings. If severe pandemic influenza is
largely a problem of viral-bacterial co-pathogenesis, pandemic planning needs
to go beyond addressing the viral cause alone (influenza vaccines and
antiviral drugs). The diagnosis, prophylaxis, treatment and prevention of
secondary bacterial pneumonia - as well as stockpiling of antibiotics and
bacterial vaccines – should be high priorities for future pandemic planning.
14 Future Viral Pandemic Infections – Viral Biological Disease
What was Learned from Reconstructing the 1918 Spanish Flu Virus
Comparing pandemic H1N1 influenza viruses at the molecular level yields key
insights into pathogenesis – the way animal viruses mutate to cross species.
The availability of these two H1N1 virus genomes (1918 and 2009), separated by over 90 years,
provided an unparalleled opportunity to study and recognise genetic properties
associated with virulent pandemic viruses - allowing for a comprehensive
assessment of emerging influenza viruses with human pandemic potential.
Only four to six mutations, arising within the first three days of viral
infection in a new human host, are required to turn an animal virus into one
that is highly virulent and infectious to human beings. Candidate gene pools for
possible future Human Pandemics include Anthrax (bacterial) and the viruses Ebola, Lassa Fever, Rift Valley
Fever, SARS, MERS, H1N1 Swine Flu (2009) and H7N9 Avian / Bat Flu (2013).
59.
60. Clustering in “Big Data”
“A Cluster is a group of the same or similar data elements
which are aggregated – or closely distributed – together”
Clustering is a technique used to explore content and
understand information in every business sector and scientific
field that collects and processes very large volumes of data
Clustering is an essential tool for any “Big Data” problem
61. Multiple Factor Regression Analysis
In a multiple regression case, where
there are two or more independent
variables, the resultant regression
plane (a hyperplane, for three or more
variables) cannot be visualised within the
constraints of a two-dimensional chart…..
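For the two-variable case, y = b0 + b1·x1 + b2·x2, the fitted plane can still be computed even though it cannot be drawn on a flat chart. The sketch below is one way to do it, by ordinary least squares via the normal equations (X^T X) b = X^T y; the function names and the sample data are hypothetical, and in practice a numerical library would normally replace the hand-written solver.

```python
def solve_linear_system(a, b):
    """Solve a·x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_regression_plane(x1, x2, y):
    """Least-squares coefficients (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    rows = [(1.0, u, v) for u, v in zip(x1, x2)]  # design matrix X with intercept
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise)
x1 = [0, 1, 2, 3, 4, 5]
x2 = [1, 0, 2, 1, 3, 2]
y = [1 + 2 * u + 3 * v for u, v in zip(x1, x2)]
print(fit_regression_plane(x1, x2, y))  # approximately [1.0, 2.0, 3.0]
```

Because the sample data are noise-free, the recovered coefficients match the generating plane up to floating-point rounding; with real data they would be the best-fitting plane in the least-squares sense.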
63. Data Visualisation - Tufte in R
"The idea behind Tufte in R is to use R - the easiest and most powerful
open-source statistical analysis programming language - to replicate
the excellent data visualisation practices developed by Edward Tufte"
- Diego Marinho de Oliveira - Lead Data Scientist / Ph.D. candidate
64. • “Big Data” refers to vast aggregations (super sets) consisting of numerous individual
datasets (structured and unstructured) - whose size and scope are beyond the capability of
conventional transactional (OLTP) or analytics (OLAP) Database Management Systems
and Enterprise Software Tools to capture, store, analyse and manage. Examples of “Big
Data” include the vast and ever changing amounts of data generated in social networks
where we maintain Blogs and have conversations with each other, news data streams,
geo-demographic data, internet search and browser logs, as well as the ever-growing
amount of machine data generated by pervasive smart devices - monitors, sensors and
detectors in the environment – captured via the Smart Grid, then processed in the Cloud –
and delivered to end-user Smart Phones and Tablets via Intelligent Agents and Alerts.
• Data Set Mashing and “Big Data” Global Content Analysis – drives Horizon Scanning,
Monitoring and Tracking processes by taking numerous, apparently un-related RSS and
other Information Streams and Data Feeds, loading them into Very Large Scale (VLS)
DWH Structures and Document Management Systems for Real-time Analytics – searching
for and identifying possible signs of relationships hidden in data (Facts/Events) – in order to
discover and interpret previously unknown Data Relationships driven by hidden Clustering
Forces – revealed via “Weak Signals” indicating emerging and developing Application
Scenarios, Patterns and Trends - in turn predicting possible, probable and alternative
global transformations which may unfold as future “Wild Card” or “Black Swan” events.
“Big Data”
65. Clustering in “Big Data”
• The profiling and analysis of
large aggregated datasets in
order to determine a ‘natural’
structure of groupings provides
an important technique for many
statistical and analytic
applications. Cluster analysis
on the basis of profile similarities
or geographic distribution is a
method where no prior
assumptions are made
concerning the number of
groups or group hierarchies and
internal structure. Geo-
demographic techniques are
frequently used in order to
profile and segment populations
by ‘natural’ groupings - such as
common behavioural traits,
Clinical Trial, Morbidity or
Actuarial outcomes - along with
many other shared
characteristics and common
factors.....
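One way to find ‘natural’ groupings without fixing the number of groups in advance is single-linkage agglomerative clustering cut at a distance threshold: any two records closer than the threshold are chained into the same cluster, and the number of clusters emerges from the data. The sketch below assumes 2-D numeric profiles and Euclidean distance; the function name, the threshold and the points are hypothetical, and the pairwise O(n²) loop only suits small illustrations, not "Big Data" volumes.

```python
import math


def threshold_clusters(points, max_dist):
    """Single-linkage clusters: points within max_dist are chained together."""
    parent = list(range(len(points)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_dist:
                parent[find(i)] = find(j)  # merge the two groups

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical 2-D profiles (e.g. two standardised attributes per record)
profiles = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (5.0, 5.0), (5.4, 5.1), (10.0, 0.0)]
print(threshold_clusters(profiles, max_dist=1.0))  # [[0, 1, 2], [3, 4], [5]]
```

No cluster count is supplied, yet three groups are found; the distance threshold is the one modelling assumption, and changing it changes how coarse the groupings are.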
66. Clustering in “Big Data”
• "BIG DATA” ANALYTICS – PROFILING, CLUSTERING and 4D GEOSPATIAL ANALYSIS •
• The profiling and analysis of large aggregated datasets in order to determine a ‘natural’
structure of data relationships or groupings, is an important starting point forming the basis of
many mapping, statistical and analytic applications. Cluster analysis of implicit similarities -
such as time-series demographic or geographic distribution - is a critical technique where no
prior assumptions are made concerning the number or type of groups that may be found, or
their relationships, hierarchies or internal data structures. Geospatial and demographic
techniques are frequently used in order to profile and segment populations by ‘natural’
groupings. Shared characteristics or common factors such as Behaviour / Propensity or
Epidemiology, Clinical, Morbidity and Actuarial outcomes – allow us to discover and explore
previously unknown, concealed or unrecognised insights, patterns, trends or data relationships.
• PREDICTIVE ANALYTICS and EVENT FORECASTING •
• Predictive Analytics and Event Forecasting uses Horizon Scanning, Tracking and Monitoring
methods combined with Cycle, Pattern and Trend Analysis techniques for Event Forecasting
and Propensity Models in order to anticipate a wide range of business, economic, social and
political Future Events – ranging from micro-economic Market phenomena such as forecasting
Market Sentiment and Price Curve movements - to large-scale macro-economic Fiscal
phenomena using Weak Signal processing to predict future Wild Card and Black Swan Events
- such as Monetary System shocks.
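As one concrete example of trend analysis for forecasting, the sketch below uses Holt's linear (double) exponential smoothing, which tracks a smoothed level and trend and projects them forward. This is a swapped-in, standard technique chosen for illustration, not the forecasting method used in this work; the smoothing constants `alpha` and `beta`, the initialisation and the sample series are hypothetical.

```python
def holt_forecast(series, alpha=0.5, beta=0.3, horizon=3):
    """Holt's linear exponential smoothing: forecast `horizon` steps ahead."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level, trend = series[0], series[1] - series[0]  # simple initialisation
    for y in series[1:]:
        previous_level = level
        level = alpha * y + (1 - alpha) * (level + trend)             # smoothed level
        trend = beta * (level - previous_level) + (1 - beta) * trend  # smoothed slope
    return [level + k * trend for k in range(1, horizon + 1)]


# Hypothetical weekly counts of a tracked signal
signal = [12, 15, 14, 18, 21, 23, 26]
print(holt_forecast(signal))
```

Holt's method only extrapolates an existing trend; by construction it cannot anticipate a Wild Card or Black Swan break in that trend, which is why such forecasts are usually paired with horizon scanning.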
67.
68. Digital Healthcare - Patient Experience and Journey
• The last decade has seen an unprecedented explosion in mobile platforms
as the internet and mobile worlds came of age. It is no longer acceptable
to have only a bricks-and-mortar clinical presence – patient-focused
healthcare providers are now expected to deliver their Patient Experience
and Journey via internet websites, mobile phones and more recently tablets.
69. Targeting – Map / Reduce
Consume – End-User Data
Data Acquisition – High-Volume Data Flows
– Mobile Enterprise Platforms (MEAPs)
Apache Hadoop Framework
HDFS, MapReduce, Matlab, “R”
Autonomy, Vertica
Smart Devices
Smart Apps
Smart Grid
Clinical Trial, Morbidity and Actuarial Outcomes
Market Sentiment and Price Curve Forecasting
Horizon Scanning, Tracking and Monitoring
Weak Signal, Wild Card and Black Swan Event Forecasting
– Data Delivery and Consumption
News Feeds and Digital Media
Global Internet Content
Social Mapping
Social Media
Social CRM
– Data Discovery and Collection
– Analytics Engines - Hadoop
– Data Presentation and Display
Excel
Web
Mobile
– Data Management Processes
Data Audit
Data Profile
Data Quality Reporting
Data Quality Improvement
– Performance Acceleration
GPUs – massive parallelism
SSDs – in-memory processing
DBMS – ultra-fast data replication
– Data Management Tools
DataFlux
Embarcadero
Informatica
Talend
– Info. Management Tools
Business Objects
Cognos
Hyperion
Microstrategy
Biolap
Jedox
Sagent
Polaris
Teradata
SAP HANA
Netezza (now IBM)
Greenplum (now EMC2)
Extreme Data xdg
– Data Warehouse Appliances
Ab Initio
Ascential
Genio
Orchestra
Social Intelligence – The Emerging Big Data Stack
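The Map / Reduce pattern named in the stack above can be sketched in plain Python: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step aggregates each group. This only imitates the programming model on a single machine; it does not use the Apache Hadoop API, and the term-counting task, function names and feed snippets are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (term, 1) pair for every term in one document."""
    return [(term.lower(), 1) for term in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)


def map_reduce(documents):
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(mapped).items())


# Hypothetical news-feed snippets
feeds = ["Ebola outbreak alert", "cholera outbreak reported", "ebola alert raised"]
print(map_reduce(feeds))
# {'ebola': 2, 'outbreak': 2, 'alert': 2, 'cholera': 1, 'reported': 1, 'raised': 1}
```

In a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle; keeping each phase a pure function, as here, is what makes that distribution possible.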
70. GIS MAPPING and SPATIAL DATA ANALYSIS
• A Geographic Information System (GIS) integrates hardware, software and
digital data capture devices for acquiring, managing, analysing, distributing and
displaying all forms of geographically dependent location data – including
machine generated data such as Computer-aided Design (CAD) data from land
and building surveys, Global Positioning System (GPS) terrestrial location data -
as well as all kinds of data streams - HDCCTV, aerial and satellite image data.....
71. GIS Mapping and Spatial Analysis
• GIS MAPPING and SPATIAL DATA ANALYSIS •
• Spatial Data Analysis is a set of techniques for analysing 3-dimensional spatial
(Geographic) data and location (Positional) object data overlays. Software that
implements spatial analysis techniques requires access to both the locations of
objects and their physical attributes. Spatial statistics extends traditional statistics
to support the analysis of geographic data. Spatial Data Analysis provides
techniques to describe the distribution of data in the geographic space (descriptive
spatial statistics), analyse the spatial patterns of the data (spatial pattern or cluster
analysis), identify and measure spatial relationships (spatial regression), and
create a surface from sampled data (spatial interpolation, usually categorized as
geo-statistics).
• The results of spatial data analysis are largely dependent upon the type,
quantity, distribution and data quality of the spatial objects under analysis.
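Spatial interpolation (creating a surface from sampled data) can be illustrated with inverse distance weighting (IDW), where each estimate is a weighted average of the sampled values with weights 1/d^p. IDW is a simple deterministic interpolator rather than a geostatistical method such as kriging; the function name, the sample readings and the power p = 2 are hypothetical choices.

```python
import math


def idw_estimate(samples, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) samples."""
    weighted_sum = weight_total = 0.0
    for sx, sy, value in samples:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a sample point
        w = 1.0 / d ** power
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total


# Hypothetical sampled readings: (x, y, measured value)
readings = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (5.0, 8.0, 30.0)]
grid = [[round(idw_estimate(readings, gx, gy), 1) for gx in range(0, 11, 5)]
        for gy in range(0, 9, 4)]
for row in grid:
    print(row)
```

As the text notes, the result depends heavily on the quantity and distribution of the sampled points: with only three readings, most of the surface is simply a blend of those three values.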
73. Geo-demographic Clustering in “Big Data”
• GEODEMOGRAPHIC PROFILING – CLUSTERING IN “BIG DATA” •
• The profiling and analysis of large aggregated datasets in order to determine a
‘natural’ or implicit structure of data relationships or groupings where no prior
assumptions are made concerning the number or type of groups discovered or group
relationships, hierarchies or internal data structures - in order to discover hidden data
relationships - is an important starting point forming the basis of many statistical and
analytic applications. The subsequent explicit Cluster Analysis of discovered data
relationships is a critical technique which attempts to explain the nature, cause and
effect of those implicit profile similarities or geographic distributions. Demographic
techniques are frequently used in order to profile and segment populations using
‘natural’ groupings - such as common behavioural traits, Clinical, Morbidity or Actuarial
outcomes, along with many other shared characteristics and common factors – and
then attempt to understand and explain those natural group affinities and geographical
distributions using methods such as Causal Layered Analysis (CLA).....
74. GIS Mapping and Spatial Analysis
• Spatial Data Analysis is a set of techniques for analysing spatial (Geographic)
location data. The results of spatial analysis are dependent on the locations of
the objects being analysed. Software that implements spatial analysis techniques
requires access to both the locations of objects and their physical attributes.
78. Targeting – Map / Reduce
Consume – End-User Data
Data Acquisition – High-Volume
– Mobile Enterprise Platforms (MEAPs)
– Data Delivery and Consumption
– Data Discovery and Collection
– Analytics Engines - Hadoop
– Data Management Processes
– Performance Acceleration
Apache Hadoop Framework
HDFS, MapReduce, Matlab, “R”
Autonomy, Vertica
Smart Devices
Smart Apps
Smart Grid
Clinical Trial, Morbidity and Actuarial Outcomes
Market Sentiment and Price Curve Forecasting
Horizon Scanning, Tracking and Monitoring
Weak Signal, Wild Card and Black Swan Event Forecasting
News Feeds and Digital Media
Global Internet Content
Social Mapping
Social Media
Social CRM
Data Audit
Data Profile
Data Quality Reporting
Data Quality Improvement
Data Extract, Transform, Load
GPUs – massive parallelism
SSDs – in-memory processing
DBMS – ultra-fast data replication
– Data Presentation and Display
– Data Management Tools
– Info. Management Tools
– Data Warehouse Appliances
Excel
Web
Mobile
DataFlux
Embarcadero
Informatica
Talend
Business Objects
Cognos
Hyperion
Microstrategy
Biolap
Jedox
Sagent
Polaris
Teradata
SAP HANA
Netezza (now IBM)
Greenplum (now EMC2)
XtremeData xdb
Zybert Gridbox
Ab Initio
Ascential
Genio
Orchestra
79. Clustering Phenomena in “Big Data”
“A Cluster is a group of profiled data similarities aggregated closely together”
• Cluster Analysis is a technique used to explore very large volumes of
structured and unstructured data - transactional, machine-generated (automatic),
social media and internet content, and geo-demographic information - in order to
discover previously unknown, unrecognised or hidden logical data relationships.
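As a minimal illustration of cluster analysis, the sketch below implements Lloyd's k-means algorithm, one common clustering technique among many (density-based and hierarchical methods are others), in plain Python. It assumes the data has already been profiled into numeric feature vectors of equal length and that the number of clusters `k` is chosen in advance; the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _nearest(point: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def _mean(points: Sequence[Point]) -> Point:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points: Sequence[Point], k: int, iterations: int = 100,
           seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means: returns (centroids, cluster label for each point)."""
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(list(points), k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[_nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [_mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # converged: assignments will no longer change
        centroids = new_centroids
    labels = [_nearest(p, centroids) for p in points]
    return centroids, labels
```

On big-data volumes the same assign-and-update loop would normally be distributed across a cluster, but the logic of each iteration is unchanged.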
80. Event Clusters and Connectivity
[Figure: Event connectivity network – Events A to H]
The above is an illustration of Event relationships, showing how Events might be connected. A detailed,
intimate understanding of the connections between Events may help us to answer questions such as: -
• If Event A occurs, does it make Event B or H more or less likely to occur?
• If Event B occurs, what effect does it have on Events C, D, E, F and G?
Answering questions such as these allows us to plan our Event Management approach and Risk
mitigation strategy, and to decide how best to focus our Incident / Event resources and effort…..
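One simple way to put a number on "does Event A make Event B more or less likely?" is to compare the conditional rate P(B | A) with the base rate P(B) over a history of observations. The sketch below assumes a hypothetical event log already bucketed into observation windows, each holding the set of Events seen in it; a lift above 1 suggests A makes B more likely, below 1 less likely. It measures co-occurrence only, not causation.

```python
from typing import Hashable, Optional, Sequence, Set

# Hypothetical data model: each observation window (e.g. one hour or one day)
# is the set of Event labels recorded during that window.
Window = Set[Hashable]


def base_rate(windows: Sequence[Window], target: Hashable) -> float:
    """P(target): share of all windows in which `target` occurs."""
    if not windows:
        raise ValueError("no observation windows")
    return sum(target in w for w in windows) / len(windows)


def conditional_rate(windows: Sequence[Window], given: Hashable,
                     target: Hashable) -> Optional[float]:
    """P(target | given): share of windows containing `given` that also contain `target`."""
    with_given = [w for w in windows if given in w]
    if not with_given:
        return None  # `given` never observed, so the estimate is undefined
    return sum(target in w for w in with_given) / len(with_given)


def lift(windows: Sequence[Window], given: Hashable,
         target: Hashable) -> Optional[float]:
    """> 1: `given` makes `target` more likely; < 1: less likely; 1: no association."""
    p_cond = conditional_rate(windows, given, target)
    p_base = base_rate(windows, target)
    if p_cond is None or p_base == 0:
        return None
    return p_cond / p_base
```

With the hypothetical windows `[{"A", "B"}, {"A", "B"}, {"A"}, {"B"}, {"C"}, {"H"}, {"A", "H"}, set()]`, P(B) is 0.375 and P(B | A) is 0.5, so Event A raises the likelihood of B (lift of about 1.33), while the lift of A on H is exactly 1.0 (no association).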
81. Event Clusters and Connectivity
• Aggregated Events include coincident, related, connected and interconnected Events: -
• Coincident - two or more Events appear simultaneously in the same domain,
but they arise from different triggers (unrelated causal events)
• Related - two or more Events materialise in the same domain sharing common
Event features or characteristics (they may share a hidden common trigger or
cause, and so are candidates for further analysis and investigation)
• Connected - two or more Events materialise in the same domain due to the same
trigger (common cause)
• Interconnected - two or more Events materialise together in an Event cluster, series
or “storm”, with each prior Event triggering the subsequent (next) Event
in an Event Series…..
• A series of Aggregated Events may have a significant cumulative impact, and is
therefore frequently misidentified as a Wild-card or Black Swan Event, rather
than simply as an event cluster or event “storm”.....
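The four aggregation types above can be expressed as simple classification rules over pairs of Events. The sketch assumes a hypothetical `Event` record carrying a domain, an observation window, a trigger (or `None` when the trigger is unknown), a set of descriptive features and an optional link to the Event that caused it; the rule ordering is one reasonable reading of the definitions, not a standard.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical Event record; field names are illustrative only."""
    name: str
    domain: str
    window: int                      # observation window in which the Event occurred
    trigger: Optional[str] = None    # None means the trigger is not (yet) known
    features: FrozenSet[str] = frozenset()
    caused_by: Optional[str] = None  # name of the prior Event that triggered this one


def classify_pair(a: Event, b: Event) -> Optional[str]:
    """Label a pair of Events with one of the four aggregation types, or None."""
    # Interconnected: one Event in the series triggered the other.
    if a.caused_by == b.name or b.caused_by == a.name:
        return "interconnected"
    if a.domain != b.domain:
        return None
    # Connected: same domain, same known trigger (common cause).
    if a.trigger is not None and a.trigger == b.trigger:
        return "connected"
    # Coincident: simultaneous, with both triggers known and different.
    if a.trigger is not None and b.trigger is not None and a.window == b.window:
        return "coincident"
    # Related: shared features while a trigger is still unknown, so a hidden
    # common cause is possible and worth investigating.
    if (a.trigger is None or b.trigger is None) and a.features & b.features:
        return "related"
    return None
```

For instance, two hypothetical Events at the same site in the same window, both traced to a "wiring fault", are classified as connected; if one is instead traced to a "burst pipe", the pair is coincident.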
82. Event Clusters and Connectivity
[Figure: Risk Event connectivity network – Events 1 to 8]
The above is an illustration of Event relationships, showing how Risk Events might be connected. A detailed and
intimate understanding of Event clusters and the connections between Events may help us to understand: -
• What is the relationship between Events 1 and 8, and what impact do they have on Events 2 - 7?
• Events 2 - 5 and Events 6 and 7 occur in clusters: what are the factors influencing these clusters?
Answering questions such as these allows us to plan our Risk Event management approach and mitigation
strategy, and to decide how best to focus our resources and effort on Risk Events and fraud management.
[Figure: Event Cluster – Claimant 1, Claimant 2, Residence, Vehicle and Risk Event]
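The Event Cluster illustration above groups claimants with a residence and a vehicle. One way to find such clusters in claims data is to group claims wherever they share any attribute, directly or through a chain of other claims. The sketch below uses a union-find (disjoint-set) structure, a standard technique for computing such connected groups; the claim identifiers and attribute strings such as `"vehicle:AB12 CDE"` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def _find(parent: Dict[str, str], x: str) -> str:
    """Root of x's cluster, halving the path as it walks up."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def event_clusters(claims: Dict[str, Iterable[str]]) -> List[Set[str]]:
    """Group claims linked, directly or transitively, by any shared attribute."""
    parent = {claim_id: claim_id for claim_id in claims}
    first_claim_with: Dict[str, str] = {}  # attribute -> first claim carrying it
    for claim_id, attributes in claims.items():
        for attr in attributes:
            if attr in first_claim_with:
                root_a = _find(parent, first_claim_with[attr])
                root_b = _find(parent, claim_id)
                if root_a != root_b:
                    parent[root_b] = root_a  # merge the two clusters
            else:
                first_claim_with[attr] = claim_id
    clusters: Dict[str, Set[str]] = defaultdict(set)
    for claim_id in claims:
        clusters[_find(parent, claim_id)].add(claim_id)
    return list(clusters.values())
```

With hypothetical claims where claim-1 shares a vehicle with claim-2 and a residence with claim-3, all three fall into one cluster, while an unconnected claim-4 stands alone; such multi-claim clusters are the candidates for closer fraud review.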
83. Aggregated Event Types
[Figure: Aggregated Event Types – Coincident Events: A and B (Triggers A and B); Related Events: C and D (Triggers 1 and 2); Connected Events: E and F (a single Trigger); Inter-connected Events: G and H (a single Trigger)]
86. Big Data – Products
The MapReduce technique has spilled over into many other disciplines that process vast
quantities of information, including science, industry and systems management. The Apache
Hadoop library has become the most popular implementation of MapReduce, with
framework distributions from Cloudera, Hortonworks and MapR.
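The map, shuffle and reduce phases can be sketched in a single Python process using the classic word-count example. This only illustrates the data flow; it does not use the Hadoop API. In Hadoop the map and reduce functions run in parallel across cluster nodes and the framework performs the shuffle over the network. The function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    # Split: each record is an independent unit of work for a mapper.
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())
```

Because each mapper only sees its own record and each reducer only sees one key's values, both phases can be spread over as many nodes as the data requires; only the shuffle needs to move data between them.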
87. “Big Data” Applications
• Science and Technology
– Pattern, Cycle and Trend Analysis
– Horizon Scanning, Monitoring and Tracking
– Weak Signals, Wild Cards, Black Swan Events
• Multi-channel Retail Analytics
– Customer Profiling and Segmentation
– Human Behaviour / Predictive Analytics
• Global Internet Content Management
– Social Media Analytics
– Market Data Management
– Global Internet Content Management
• Smart Devices and Smart Apps
– Call Detail Records
– Internet Content Browsing
– Media / Channel Selections
– Movies, Video Games and Playlists
• Broadband / Home Entertainment
– Call Detail Records
– Internet Content Browsing
– Media / Channel Selections
– Movies, Video Games and Playlists
• Smart Metering / Home Energy
– Energy Consumption Detail Records
• Civil and Military Intelligence
– Digital Battlefields of the Future – Data Gathering
– Future Combat Systems - Intelligence Database
– Person of Interest Database – Criminal Enterprise, Political organisations and Terrorist Cell networks
– Remote Warfare - Threat Viewing / Monitoring / Identification / Tracking / Targeting / Elimination
– HDCCTV Automatic Character/Facial Recognition
• Security
– Security Event Management - HDCCTV, Proximity and Intrusion Detection, Motion and Fire Sensors
– Emergency Incident Management - Response Services Command, Control and Co-ordination
• Biomedical Data Streaming
– Care in the Community
– Assisted Living at Home
– Smart Hospitals and Clinics
• Internet of Things (IoT)
– SCADA Remote Sensing, Monitoring and Control
– Smart Grid Data (machine-generated data)
– Vehicle Telemetry Management
– Intelligent Building Management
– Smart Home Automation
88. Comparing Data in RDBMS, Appliances and Hadoop
Feature | RDBMS DWH | DWH Appliance | Hadoop Cluster
Data size | Gigabytes | Terabytes | Petabytes
Access | Interactive and batch | Interactive and batch | Batch
Structure | Fixed schema | Fixed schema | Schema-on-read (unstructured data)
Language | SQL | SQL | MapReduce APIs and higher-level languages (HiveQL, Pig Latin, etc.)
Data Integrity | High | High | Low
Architecture | Shared memory - SMP | Shared nothing - MPP | Hadoop DFS
Virtualisation | Partitions / Regions | MPP / Nodal | MPP / Clustered
Scaling | Nonlinear | Nodal / Linear | Clustered / Linear
Updates | Read and write | Write once, read many | Write once, read many
Selects | Row-based | Set-based | Column-based
Latency | Low – Real-time | Low – Near Real-time | High – Historic Information
Figure 1: Comparing RDBMS to MapReduce
89. “Big Data” – Analysing and Informing
• “Big Data” is now a torrent raging through every aspect of the global economy – both the
public sector and private industry. Global enterprises generate enormous volumes of
transactional data – capturing trillions of bytes of information from the internal and
external environment. Data Sources include Social Media, Internet Content, Remote
Sensors, Monitors and Controllers, and transactions from their own internal business
operations: global markets, supply chain, business partners, customers and suppliers.
1. SENSE LAYER – Remote Monitoring and Control Devices – WHAT and WHEN?
2. COMMUNICATION LAYER – Mobile Enterprise Platforms (3G / WiFi + 4G / LTE) – VIA ?
3. SERVICE LAYER – 4D Geospatial / Real-time / Predictive Analytics – WHY?
4. GEO-DEMOGRAPHIC LAYER – Social Media, People and Places – WHO and WHERE ?
5. INFORMATION LAYER – “Big Data” and Internet Content data set “mashing” – HOW ?
6. INFRASTRUCTURE LAYER – Cloud Services / Hadoop Clusters / GPGPUs / SSDs
90. “Big Data” – Analysing and Informing
COMMUNICATION LAYER – Mobile Enterprise Platforms (3G / WiFi + 4G / LTE)
Biomedical Smart Apps – VIA ?
SERVICE LAYER – 4D Geospatial / Real-time / Predictive Analytics – HOW ?
INFORMATION LAYER – “Big Data” Analytics MapReduce / Data Set “mashing”
Data Science / Causal Layered Analysis – WHY ?
INFRASTRUCTURE LAYER – Cloud Service Platforms
Hadoop Clusters / GPGPUs / SSDs
SENSE LAYER – Remote Monitoring and Control Devices – WHAT and WHEN ?
GEO-DEMOGRAPHIC LAYER – People and Places – WHO and WHERE?
91. “Big Data” – Analysing and Informing
• SENSE LAYER – Remote Monitoring and Control – WHAT and WHEN?
– Remote Sensing – Sensors, Monitors, Detectors, Smart Appliances / Devices
– Remote Viewing – Satellite, Airborne, Mobile and Fixed HDCCTV
– Remote Monitoring, Command and Control – SCADA
• GEO-DEMOGRAPHIC LAYER – People and Places – WHO and WHERE?
– Person and Social Network Directories - Personal and Social Media Data
– Location and Property Gazetteers - Building Information Models (BIM)
– Mapping and Spatial Analysis - Topology, Landscape, Global Positioning Data
• COMMUNICATION LAYER – Mobile Enterprise Platforms and the Smart Grid
– Connectivity - Smart Devices, Smart Apps, Smart Grid
– Integration - Mobile Enterprise Application Platforms (MEAPs)
– Backbone – Wireless and Optical Next Generation Network (NGN) Architectures
92. “Big Data” – Analysing and Informing
SERVICE LAYER – 4D Geospatial / Real-time / Predictive Analytics – WHY?
COMMUNICATION LAYER – Mobile Enterprise Platforms (3G / WiFi + 4G / LTE)
Biomedical Smart Apps – VIA ?
[Figure: data sources and applications – Market Survey Data; TV Set-top Box Channel Selections; Smart App Playlists; Geographic & Demographic Survey Data; Entertainment; Factory, Office & Warehouse; Wearable & Personal Technology; Transport; Public Buildings; Smart Homes; Public House; Mall, Shop, Store; Smart Kiosks & Cubicles; Mobile Smart Apps; CCTV / ANPR; Social Intelligence; Campaign Management; e-Business Smart Apps; Big Data Analytics; The Pyramid™ Customer Loyalty & Brand Affinity; The Pyramid™ Analytics Smart Apps]
INFRASTRUCTURE LAYER – Cloud Services
Hadoop Clusters / GPGPUs / SSDs
SENSE LAYER – Remote Monitoring, Data and Control Devices – WHAT and WHEN ?
93. “Big Data” – Analysing and Informing
• SERVICE LAYER – Real-time Analytics – WHY?
– Global Mapping and Spatial Analysis
– Service Aggregation, Intelligent Agents and Alerts
– Data Analysis, Data Mining and Statistical Analysis
– Optical and Wave-form Analysis and Recognition, Pattern and Trend Analysis
– Big Data - Hadoop Clusters / GPGPUs / SSDs
• INFORMATION LAYER – “Big Data” and Data Set “mashing” – HOW?
– Content – Structured and Unstructured Data and Content
– Information – Atomic Data, Aggregated, Ordered and Ranked Information
– Transactional Data Streams – Smart Devices, EPOS, Internet, Mobile Networks
• INFRASTRUCTURE LAYER – Cloud Service Platforms
– Cloud Models – Public, Private, Mixed / Hybrid, Enterprise, Secure and G-Cloud
– Infrastructure – Network, Storage and Servers
– Applications – COTS Software, Utilities, Enterprise Services
– Security – Principles, Policies, Users, Profiles and Directories, Data Protection
94. “DATA SCIENCE” – my own special area of Business expertise
Targeting – Split / Map / Shuffle / Reduce
Consume – End-User Data
Data Provisioning – High-Volume Data Flows
– Mobile Enterprise Platforms (MEAPs)
Apache Hadoop Framework
HDFS, MapReduce, MATLAB, “R”
Autonomy, Vertica
Smart Devices
Smart Apps
Smart Grid
Clinical Trial, Morbidity and Actuarial Outcomes
Market Sentiment and Price Curve Forecasting
Horizon Scanning, Tracking and Monitoring
Weak Signal, Wild Card and Black Swan Event Forecasting
– Data Delivery and Consumption
News Feeds and Digital Media
Global Internet Content
Social Mapping
Social Media
Social CRM
– Data Discovery and Collection
– Analytics Engines - Hadoop
– Data Presentation and Display
Excel
Web
Mobile
– Data Management Processes
Data Audit
Data Profile
Data Quality Reporting
Data Quality Improvement
Data Extract, Transform, Load
– Performance Acceleration
GPUs – massive parallelism
SSDs – in-memory processing
DBMS – ultra-fast data replication
– Data Management Tools
DataFlux
Embarcadero
Informatica
Talend
– Info. Management Tools
Business Objects
Cognos
Hyperion
Microstrategy
Biolap
Jedox
Sagent
Polaris
Teradata
SAP HANA
Netezza (now IBM)
Greenplum (now Pivotal)
XtremeData xdb
Zybert Gridbox
– Data Warehouse Appliances
Ab Initio
Ascential
Genio
Orchestra
The Emerging “Big Data” Stack
Information Management Strategy
Data Acquisition Strategy
95. Big Data – Process Overview
[Figure: Big Data process overview – Data Stream; Big Data Provisioning; Big Data Platform; Big Data Management; Big Data Analytics; Big Data Consumption; Big Data Administration; Insights; Revenue Stream. Roles: Data Scientists, Data Architects, Data Analysts, Data Administrators, Data Managers, Hadoop Platform Engineering Team]
97. Apache Hadoop Component Stack
HDFS Hadoop Distributed File System (HDFS)
MapReduce Scalable Data Applications Framework
Pig Procedural Language – abstracts low-level MapReduce operators
ZooKeeper High-reliability distributed cluster co-ordination
Hive Structured Data Access Management
HBase Hadoop Database Management System
Oozie Job Management and Data Flow Co-ordination
Mahout Scalable Machine Learning Library
98. Data Management Component Stack
Informatica Informatica Big Data Edition / Vibe Data Stream
Drill Data Analysis Framework
MillWheel Data Analytics on-the-fly + Extract – Transform – Load Framework
Flume Extract – Transform – Load
Sqoop Extract – Transform – Load
Scribe Extract – Transform – Load
Talend Extract – Transform – Load
Pentaho Extract – Transform – Load Framework + Data Reporting on-the-fly
99. Big Data Storage Platforms
Autonomy HP Unstructured Data DBMS
Vertica HP Columnar DBMS
MongoDB High-availability DBMS
CouchDB Couchbase Database Server for Big Data with NoSQL / Hadoop Integration
Pivotal Pivotal Big Data Suite – Greenplum, GemFire, SQLFire, HAWQ
Cassandra Cassandra Distributed Database for Big Data with NoSQL and Hadoop Integration
NoSQL NoSQL Database for Oracle, SQL Server, Couchbase etc.
Riak Basho Technologies Riak Big Data DBMS with NoSQL / Hadoop Integration
100. Big Data Analytics Engines and Appliances
Alpine Alpine Data Studio - Advanced Big Data Analytics
Karmasphere Karmasphere Studio and Analyst – Hadoop Customer Analytics
Kognitio Kognitio In-memory Big Data Analytics MPP Platform
Skytree Skytree Server Artificial Intelligence / Machine Learning Platform
Redis Redis is an open-source key-value database for AWS, Pivotal etc.
Teradata Teradata Appliance for Hadoop
Neo4j Neo4j - Graph Database for Big Data
InfiniDB Columnar MPP open-source DB version hosted on GitHub
Big Data Analytics Engines / Appliances
101. Big Data Analytics and Visualisation Platforms
Tableau Tableau - Big Data Visualisation Engine
Eclipse Symentec Eclipse - Big Data Visualisation
Mathematica Mathematical Expressions and Algorithms
StatGraphics Statistical Expressions and Algorithms
FastStats Numerical computation, visualization and programming toolset
MATLAB
R
Data Acquisition and Analysis Application Development Toolkit
“R” Statistical Programming / Algorithm Language
Revolution Revolution Analytics Framework and Library for “R”
102. Hadoop / Big Data Extended Infrastructure Stack
SSD Solid State Drive (SSD) – configured as cached memory / fast HDD
CUDA CUDA (Compute Unified Device Architecture)
GPGPU GPGPU (General-Purpose computing on Graphics Processing Units)
IMDG IMDG (In-memory Data Grid – extended cached memory)
Vibe High Velocity / High Volume Machine / Automatic Data Streaming
Splunk High Velocity / High Volume Machine / Automatic Data Streaming
Ambari Hadoop cluster provisioning, management and monitoring
YARN Hadoop Resource Scheduling
Big Data Extended Architecture Stack
103. Cloud-based Big-Data-as-a-Service and Analytics
AWS
Amazon Web Services (AWS) – Big Data-as-a-Service (BDaaS)
Elastic Compute Cloud (EC2) and Simple Storage Service (S3)
1010data Big Data Discovery, Visualisation and Sharing Cloud Platform
SAP HANA SAP HANA Cloud - In-memory Big Data Analytics Appliance
Azure Microsoft Azure Data-as-a-Service (DaaS) and Analytics
Anomaly 42 Anomaly 42 Smart-Data-as-a-Service (SDaaS) and Analytics
Workday Workday Big-Data-as-a-Service (BDaaS) and Analytics
Google Cloud Google Cloud Platform – Cloud Storage, Compute Platform
Firebrand API Resource Framework
Apigee Apigee API Resource Framework
104. Data Warehouse Appliance / Real-time Analytics Engine Price Comparison
Manufacturer | Server Configuration | Cached Memory | Server Type | Software Platform | Cost (est.)
SAP HANA | 32-node (4 Channels x 8 CPU) | 1.3 Terabytes | SMP | Proprietary | $ 6,000,000
Teradata | 20-node (2 Channels x 10 CPU) | 1 Terabyte | MPP | Proprietary | $ 1,000,000
Netezza (now IBM) | 20-node (2 Channels x 10 CPU) | 1 Terabyte | MPP | Proprietary | $ 180,000
IBM eX5 (non-HANA configuration) | 32-node (4 Channels x 8 CPU) | 1.3 Terabytes | SMP | Proprietary | $ 120,000
Greenplum (now Pivotal) | 20-node (2 Channels x 10 CPU) | 1 Terabyte | MPP | Open Source | $ 20,000
XtremeData xdb (BO BW) | 20-node (2 Channels x 10 CPU) | 1 Terabyte | MPP | Open Source | $ 18,000
Zybert Gridbox | 48-node (4 Channels x 12 CPU) | 20 Terabytes | SMP | Open Source | $ 60,000
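Normalising the estimated costs in the price comparison table by cached memory makes the spread between proprietary and open-source platforms easier to compare. The short calculation below uses only the figures in that table; it ignores node count, CPU configuration, support and licensing terms, so it is a rough illustration rather than a price model. The names `APPLIANCES` and `cost_per_terabyte` are hypothetical.

```python
from typing import List, Tuple

# (manufacturer, cached memory in terabytes, estimated cost in USD),
# copied from the price comparison table
APPLIANCES: List[Tuple[str, float, int]] = [
    ("SAP HANA", 1.3, 6_000_000),
    ("Teradata", 1.0, 1_000_000),
    ("Netezza (now IBM)", 1.0, 180_000),
    ("IBM eX5 (non-HANA configuration)", 1.3, 120_000),
    ("Greenplum (now Pivotal)", 1.0, 20_000),
    ("XtremeData xdb (BO BW)", 1.0, 18_000),
    ("Zybert Gridbox", 20.0, 60_000),
]


def cost_per_terabyte(rows: List[Tuple[str, float, int]]) -> List[Tuple[str, float]]:
    """Return (manufacturer, USD per TB of cached memory), cheapest first."""
    return sorted(((name, cost / tb) for name, tb, cost in rows), key=lambda r: r[1])


if __name__ == "__main__":
    for name, usd_per_tb in cost_per_terabyte(APPLIANCES):
        print(f"{name:34} ${usd_per_tb:>12,.0f} per TB")
```

On these estimates the cost ranges from about $3,000 per TB of cached memory (Zybert Gridbox) to about $4.6 million per TB (SAP HANA).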
105. Apache Hadoop - Framework Distributions
FEATURE Hortonworks Teradata
Hadoop
Cloudera MAPR Pivotal
Open Source Hadoop Library Hcatalog (Hortonworks) Impala MAPR HD
Support Yes Yes Yes Yes Yes
Professional Services Yes Yes Yes Yes Yes
Catalogue Extensions Yes Yes Yes Yes Yes
Management Extensions Yes Yes Yes
Architecture Extensions Yes Yes
Infrastructure Extensions Yes Yes
[Figure: Hadoop distribution feature stacks (layers: Library, Support, Services, Catalogue, Management, Resilience, Availability, Performance) for Hortonworks, Teradata, Cloudera with Impala, MAPR with MAPR Control System and EMC Pivotal HD distribution; Hortonworks HCatalog System]
109. Apache Hadoop – Cloud Hadoop Platforms
FEATURE HP HAVEn AWS EMR SAP HANA Mono-Clustered Big Data Cloud Solution
Open Source Hadoop Library HP HAVEn Elastic MapReduce SAP HANA
Support Yes Yes Yes
Professional Services Yes Yes Yes
Catalogue Extensions Yes Yes Yes
Management Extensions Yes
Architecture Extensions Yes
Infrastructure Extensions Yes
[Figure: cloud Hadoop platform feature stacks (layers: Library, Support, Services, Catalogue) for HP HAVEn, AWS EMR and SAP HANA Mono-Clustered Big Data Cloud Solution]
117. Turing Institute
• In his Budget announcement, the Chancellor, George Osborne, pledged government
support for the Turing Institute, a specialist centre named after the great computer
pioneer Alan Turing – which will provide a British home for studying Data Science and
Big Data Analytics. Clustering and Wave-form algorithms in Big Data are the key to
unlocking Cycles, Patterns and Trends in complex (non-linear) systems – Cosmology,
Climate and Weather, Economics and Fiscal Policy – in order to forecast future trends,
outcomes and events with far greater accuracy.
• The Chancellor, George Osborne, has announced that a £42m Alan Turing Institute is to be
founded to ensure that Britain leads the way in Data Science and Big Data Analytics for
studying complex (non-linear) systems, through Clustering and Wave-form algorithmic research
in both Deterministic (human activity) and Stochastic (random, chaotic) processes.
• Drawing on the name of the famous British mathematician and computer pioneer Alan
Turing - who led the Enigma code-breaking work during the Second World War at
Bletchley Park - the institute is intended to help British companies by bringing together
expertise and experience in tackling the challenges of understanding both deterministic
and stochastic systems – such as Weather, Climate, Economics, Econometrics and the
impact of Fiscal Policy – which require massive data sets and computational power.
119. Turing Institute
• The Turing Institute comes at a time when Data Science, Big Data Analytics and
complex system algorithm research is front and centre on the commercial stage. The
Turing Institute will be the first step to realising the UK’s digital innovation potential.
Exploitation of big data by applying analytical methods - statistical analysis, predictive
and quantitative modelling - provides deeper insights and achieves better outcomes.
• The UK needs a centre of excellence capable of nurturing the talent required to make
British Data Science and Big Data Technology world-class. The cornerstone for the
new digital technologies isn’t just infrastructure, but the talent that’s needed to found,
innovate and grow technology firms and create a knowledge-based digital economy.
• The tender to house the institute will be produced this year. It may be a brand-new
facility or use existing facilities and space in a university, a Treasury spokesman said.
Its funding will come from the Department for Business, Innovation and Skills, and its
chief will report to the science minister, David Willetts. Executive appointments and
establishment numbers for the Turing Institute have yet to be announced.
• "The intention is for this work to benefit British companies to take a critical advantage
in the field of Data Science – algorithms, analytics and big data," said the spokesman.
121. Turing Institute
• Alan Turing was a pivotal figure in mathematics and computing and has long been
recognised as such by fellow mathematicians and computer scientists for his ground-
breaking work on Computational Theory. There already exists a Turing Institute at
Glasgow University, and an Alan Turing Institute in the Netherlands, as well as the Alan
Turing building at the Manchester Institute for Mathematical Sciences.
• Alan Turing’s code-breaking work using “the Bombe”, an electromechanical decryption
system, led to the deciphering of the German "Enigma" codes, which used highly
complex encryption. His cryptanalysis work is claimed to have saved hundreds or even
thousands of lives and shortened WWII by as much as two years. Before the war, in his 1936
paper "On Computable Numbers", Turing had formalised the Computational Theory which underpins
modern computer science by the separation of data from algorithms – sequences of instructions.
• Osborne's announcement marks further official rehabilitation of a scientist who many see
as having been badly treated by the British establishment after his work during WWII.
Turing, who was homosexual, was convicted of gross indecency in March 1952, and lost his
security clearance with GCHQ - the successor to Bletchley Park. Turing killed himself in
June 1954, but was only given an official (posthumous) pardon by the UK government in December
2013 after a series of public campaigns for recognition of his achievements.
125. Proof-of-concept and Prototype
The Patient Pyramid™ approach is lean, agile, smart and creative: -
• We start by providing a custom Pyramid™ Enterprise Application as a proof of concept.
We then work with client key stakeholders to scope a detailed brief which articulates a
business problem domain that the Patient Pyramid™ can help understand and resolve.
• We then harvest all current and past patient records along with any other available internal
and public domain biomedical data – in order to establish a baseline Patient Pyramid™.
• This is augmented by overlaying external data - Social Intelligence and other live
streamed Patient Lifestyle / Biomedical data that drives our new real-time Patient
Pyramid™ view describing the six primitives - who / what / why / where / when and how.
• Finally, we exploit social intelligence for Patient Lifestyle understanding – creating new
actionable insights to inform creative medical campaign solutions against the agreed brief.
• Post proof-of-concept, we then agree a Pyramid™ Enterprise Application fixed term
licence along with Patient Pyramid™ consulting, mentoring, training and support – on-
line, on-site, on-demand - whenever and wherever required.