1. Big data and
Knowledge Engineering
for Health
Prof. Anthony J Brookes: University of Leicester, UK
May 2012, London
Eduserv Symposium 2012: Big Data, Big Deal?
2. Different or No Big Data problems?
- Changing or stable rate of data generation / availability
- Changing or stable complexity of data
- Changing or stable requirement to use the data
- Changing or stable tooling to use the data
- Changing or stable mass of ‘useless’ data (vs knowledge)
3. ‘KNOWLEDGE ENGINEERING’
for HEALTH
Knowledge engineering was first defined in 1983 as “an
engineering discipline that involves integrating
knowledge into computer systems in order to solve
complex problems normally requiring a high level of
human expertise”
(Feigenbaum and McCorduck, 1983).
5. Building and engaging with the community:
- presentations & discussions at many international meetings and forums
- half-day workshop as a satellite to ESHG (6 invited speakers)
- workshop session at MIE2011 (3 invited speakers, audience discussion)
- I-Health 2011 workshop in Brussels, 3-4 Oct 2011
- growing community, currently >150 academics, companies, healthcare providers
8. [Diagram: Data in the research and healthcare worlds — bio-informatics and med-informatics, academics and companies, with data held in biobanks and registries]
10. [Diagram: 'I4HEALTH' — 'Knowledge Engineering' for Health, bridging healthcare and research]
13. RESEARCH WORLD ⇄ CLINICAL WORLD
- 'Knowledge Generation' (research world): ...make sense of these entities
- 'Knowledge Engineering' (towards the clinical world): ...identify & use the bits you understand
14. STANDARDS
• Semantic Standards (to allow unambiguous understanding of the data)
– Terminologies, Ontologies, Vocabularies, Coding systems
– Need cross-mapping between semantic standards, and across languages
• Syntactic Standards (to make data structures interoperable)
– Data and Metadata object models, and Exchange formats
– Minimal content specifications, harmonised across domains
– Robust core requirements, with general principles that bring flexibility
• Technical Standards (to build a system that works efficiently)
– Database models, Search systems, and User interfaces (e.g., browsers)
– Web-service specifications, Web 2.0 technologies
– ID solutions for data, databases, publications, biobanks, researchers
– Technologies for controlling data access and user permissions
– Ethical and Legal policies, implementation, and recognition-rewards structures
• Quality Standards (to match data to needs)
– Measuring and representing quality in a meaningful way
– Important role here for metadata
– Recording and standardising SOPs
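The cross-mapping requirement under Semantic Standards can be illustrated with a minimal sketch. The mapping table and `translate` function below are invented placeholders (a real system would consume full terminology mapping services rather than a hand-built dictionary), though the ICD-10 E11 / SNOMED CT 44054006 pair shown is a commonly cited type 2 diabetes correspondence.

```python
# Minimal sketch of cross-mapping between semantic standards.
# The table and function are illustrative placeholders, not real
# terminology-service content.

CROSS_MAP = {
    # (source system, source code) -> (target system, target code)
    ("ICD-10", "E11"): ("SNOMED CT", "44054006"),  # type 2 diabetes
}

def translate(system: str, code: str, target_system: str):
    """Translate a code into the target system; None if unmapped."""
    mapped = CROSS_MAP.get((system, code))
    if mapped and mapped[0] == target_system:
        return mapped[1]
    return None

print(translate("ICD-10", "E11", "SNOMED CT"))  # 44054006
```

Real mappings are many-to-many and version-dependent, which is why the slide stresses maintained cross-maps across standards and languages rather than static lookup tables.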
16. Electronic Healthcare Records
[Diagram: EHR concept map. Recoverable labels: Recording Models, Search and Retrieval Models, Decision Making, Communication Models, Registration and Location, Search Models, Storage; surrounding qualities include Expressiveness, Terminology, Precision/rigour, Collection, Searchability, Comparability, Best Practice, EHR Classifications, Utility, Categorisation, Secondary use, Information Model, Structure, Detail, Interoperability, Notify/Find]
17. Data sharing
- Incentive/reward systems
- 3 categories of risk, with ‘speed pass’ access control
- Compulsion/sanctions
- Researcher IDs (ORCID)
- Open data discovery (e.g., Cafe Variome)
- Remote pooled analysis (e.g., DataSHIELD, EU-ADR/EMIF)
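To make "remote pooled analysis" concrete, here is a minimal sketch in the spirit of DataSHIELD-style federated analysis; the function names are invented for illustration. Each site computes non-disclosive aggregates locally, and only those aggregates travel to the central analyst — individual records never leave a site.

```python
# Sketch of remote pooled analysis: sites share only aggregates.

def local_summary(values):
    """Run at each site: return only non-disclosive statistics."""
    return {"n": len(values), "sum": sum(values)}

def pooled_mean(summaries):
    """Run centrally: combine site aggregates into a pooled mean."""
    total_n = sum(s["n"] for s in summaries)
    total_sum = sum(s["sum"] for s in summaries)
    return total_sum / total_n

# Example: two sites holding private measurements
site_a = local_summary([1.0, 2.0, 3.0])
site_b = local_summary([4.0, 5.0])
print(pooled_mean([site_a, site_b]))  # 3.0
```

Real systems add disclosure controls (e.g., minimum cell counts) on top of this pattern, but the division of labour — local computation, central combination — is the core idea.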
25. [Diagram: emerging architectural concept — a self-optimising system built over disorganised digital information relevant to personalized healthcare. Recoverable components: Decision Support Systems, BioScience & Omics data, Text & Web pages, Computer Models, Modalities, Databases, Biosensors, EHR systems, Feedback / Optimisation]
26. [Diagram] Data + Information + Knowledge → Knowledge Portals → Optimised Healthcare
- Data: Imaging, Instrumentation, Omics, Clinical
- Information: Personal, Population, Models
- Knowledge Portals deliver Health Care Utility, towards Optimised Healthcare
27. Big Data can mainly stay at ‘source’, feeding the Knowledge Extraction process
Knowledge Extraction/Distillation filters therefore need to be created
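A knowledge extraction/distillation "filter" of this kind can be sketched in a few lines. The record fields and predicate below are invented for illustration; the point is the architecture the slide describes — raw Big Data stays at source, and only distilled, decision-relevant facts flow out.

```python
# Sketch of "data stays at source" knowledge extraction: a filter
# runs where the data lives and emits only distilled facts.

def knowledge_filter(record_stream, predicate, distil):
    """Stream source records; yield only distilled knowledge."""
    for record in record_stream:
        if predicate(record):
            yield distil(record)

# Hypothetical source records: genetic variants, of which only the
# 'pathogenic' ones are reduced to the minimal fact needed downstream.
variants = [
    {"gene": "BRCA1", "class": "pathogenic", "depth": 120},
    {"gene": "TP53", "class": "benign", "depth": 98},
]
knowledge = list(knowledge_filter(
    variants,
    predicate=lambda r: r["class"] == "pathogenic",
    distil=lambda r: (r["gene"], r["class"]),
))
print(knowledge)  # [('BRCA1', 'pathogenic')]
```

Because the generator pulls records lazily, the full dataset is never copied or centralised — only the filtered output leaves the source.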
28. Policy and Strategy
- To kick-start the field: Put money into research, development, and
application projects based upon the Knowledge Engineering concept
- To create the needed expertise: Cross-train people who have a talent
for engineering in computer science + bioscience + healthcare
- To ensure interoperability across the total system: Organise activities
on a middle-out basis, rather than the usual top-down or bottom-up
approaches
- To ensure innovation and sustainability: explore ways to get academic
and commercial players working together
- To start bringing the system to life: Emphasise knowledge 'filtration',
'distillation', and 'provision' from sources of (Big) Data
30. Acknowledgments
• GEN2PHEN Partners
• My team:
Robert Free, Rob Hastings, Adam Webb, Tim Beck, Sirisha
Gollapudi, Gudmundur Thorisson, Owen Lancaster
• Some key discussants:
Søren Brunak, Debasis Dash, Carlos Diaz, Norbert Graf, Johan
van der Lei, Heinz Lemke, Ferran Sanz
“Data-to-Knowledge-for-Practice”
(D2K4P) Center
This work received funding from the European
Community's Seventh Framework Programme
(FP7/2007-2013) under grant agreement number
200754 - the GEN2PHEN project.