This document provides an overview of a book on enabling data ecosystems for intelligent systems. It discusses key concepts like digital twins, physical-cyber-social computing, and mass personalization. It also outlines the architecture of a real-time linked dataspace platform that supports pay-as-you-go data integration and sharing for applications and intelligent systems. The platform is designed to handle streaming data from sensors and integrate it with contextual data sources using approximate semantic matching techniques.
1. From Data Platforms to Dataspaces:
Enabling Data Ecosystems for Intelligent Systems
Edward Curry
Insight @ NUI Galway
edward.curry@nuigalway.ie
2. Open Access Book
Contents
Part I: Fundamentals and Concepts
Part II: Data Support Services
Part III: Stream and Event Processing Services
Part IV: Intelligent Systems and Applications
Part V: Future Directions
Team
http://dataspaces.info
4. Data-Driven Innovations
Digital Twins: A digital replica of physical assets (e.g. a car), processes (e.g. a value chain), systems, or physical environments (e.g. a building). The digital representation (e.g. simulation modelling or data-driven models) provided by the digital twin can be analysed to optimise the operation of the “physical twin”.
Physical-Cyber-Social (PCS) Computing: A computing paradigm that supports a richer human experience through a holistic, data-rich view of the smart environment that integrates, correlates, and interprets data and provides contextually relevant abstractions to humans.
Mass Personalisation: More human-centric thinking in the design of systems, as users have growing expectations for highly personalised digital services for the “Market of One”.
Data Network Effects: As more systems and users join and contribute data to the smart environment, a “network effect” can take place, making the overall data available more valuable.
5. Digital Twins
[Figure: a physical twin (asset-centric) in the real world linked to a digital twin (system-centric) in the digital world through an Observe, Orient, Decide, Act loop, with sensors feeding observations and actuators carrying out actions.]
8. Data Management Challenges
• Pay-as-you-go Data Integration, Accessibility, and Sharing
– Standard data syntax, semantics, and linkage: Facilitate integration and sharing, ideally with open standards and non-proprietary approaches.
– Single-point data discoverability and accessibility: Allow datasets and metadata to be organised and accessed through a single location.
– Incremental data management: Enable a low barrier to entry and a pay-as-you-go paradigm to minimise costs.
• Secure Access Control: Support data access rights to preserve the security of data and the privacy of users in the smart environment.
• Real-time Data Processing and Historical Querying
– Real-time data processing: Including ingestion, aggregation, and pattern detection within event streams originating from sensors and things in the smart environment.
– Unified querying of real-time and historical data: Provide applications and end-users with a holistic, queryable state of the smart environment at a latency suitable for user interaction.
• Entity-centric Data Views
– Entity management: The storage, linkage, curation, and retrieval of entity data, such as users, zones, and locations.
– Event enrichment: Enhancement of sensor/thing streams with contextual data (e.g. entities) to make the stream data more encapsulated and useful in downstream processing.
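Event enrichment can be pictured as a lookup join between each raw reading and the entity data held for its source. The sketch below is a minimal illustration under stated assumptions, not the RLD's enrichment service: the in-memory `ENTITIES` store, the `sensor_id` field, and the `entity_` key prefix are all hypothetical.

```python
# Hypothetical entity store mapping sensor IDs to contextual entity data.
ENTITIES = {
    "meter-17": {"zone": "kitchen", "floor": 1, "kind": "energy meter"},
}


def enrich(event, entities):
    """Return a copy of a raw sensor event with its entity context attached.

    Context keys are prefixed with "entity_" so they cannot overwrite the
    event's own fields. Events from unknown sensors pass through unchanged
    (best effort rather than failure).
    """
    context = entities.get(event.get("sensor_id"), {})
    enriched = dict(event)  # copy, so the original event is not mutated
    for key, value in context.items():
        enriched[f"entity_{key}"] = value
    return enriched


raw = {"sensor_id": "meter-17", "value": 2.4, "unit": "kWh"}
print(enrich(raw, ENTITIES))
```

Downstream operators can then filter or aggregate by `entity_zone` without querying the entity store themselves, which is the sense in which the enriched stream is more self-contained.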
9. The “gold mining” metaphor applied to data processing
10. Traditional Approaches to Data Integration
[Figure: frequency of use (popularity) plotted against the number of data sources, entities, and attributes, together with the cost of administration and semantic integration using traditional approaches.]
11. Data is Key to AI…Data Platforms will Fuel AI Decisions
[Figure: Data Generation and Analysis (including IoT); Data Platforms (Access and Portability); AI and Decision Platforms.]
12. IoT-Enablement
Layer 1 – Communication and Sensing: IPv6, Wi-Fi, RFID, CoAP, AVB, etc.
Layer 2 – Middleware: Peer-to-Peer, Events, Pub/Sub, SOA, SDN, etc.
Layer 3 – Data: Schema, Entities, Catalog, Sharing, Access/Control, etc.
Layer 4 – Intelligent Apps, Analytics, and Users: Predictive Analytics, Situation Awareness, Decision Support, Digital Twin, Machine Learning, Users
Data sources: Datasets; Things / Sensors; Contextual Data Sources (including legacy systems)
A Data Sharing Layer is needed…
Adapted from: L. Atzori, A. Iera, and G. Morabito, “The Internet of Things: A survey,” Comput. Networks, vol. 54, no. 15, pp. 2787–2805, Oct. 2010.
13. Cost of Data Management Solutions
Administrative Proximity:
– With close control, many assumptions can hold concerning guarantees such as data quality and consistency.
– Far control refers to a loosely coupled environment with a lack of coordination among the data sources.
Semantic Integration:
– The degree to which data schemas are matched up (types, attributes, and names).
– Ranges from all data conforming to an agreed-upon schema to no schema information at all. This dimension determines how much semantically rich querying can be done.
Halevy, A., Franklin, M. and Maier, D. 2006. Principles of dataspace systems. Proceedings of the 25th ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems (PODS ’06) (New York, NY, USA, 2006), 1–9.
14. (Real-time Linked) Dataspace
Principles (adapted from Halevy et al.):
• Must deal with many different formats of streams and events.
• Does not subsume the stream and event processing engines; they still provide individual access via their native interfaces.
• Queries are provided on a best-effort and approximate basis.
• Must provide pathways to improve the integration among the data sources, including streams and events, in a pay-as-you-go fashion.
Dataspace
“Dataspaces are not a data integration approach; rather, they are more of a data co-existence approach. The goal of dataspace support is to provide base functionality over all data sources, regardless of how integrated they are.” (Halevy, A., Franklin, M. and Maier, D. 2006.)
Real-time Linked Dataspace (RLD)
An enabling platform for data management for intelligent systems within smart environments that combines the pay-as-you-go paradigm of dataspaces, linked data, and knowledge graphs with entity-centric real-time query capabilities.
15. Approximate and Best Effort Approaches
[Figure: frequency of use (popularity) plotted against the number of data sources, entities, and attributes, contrasting approximate and best-effort approaches with the cost of administration and semantic integration using traditional approaches.]
16. Architecture of Real-time Linked Dataspace
• Support Platform: Provides the functionality and services essential for managing the dataspace.
• Things / Sensors: Produce real-time data streams that need to be processed and managed.
• Data Sources: Available in a wide variety of formats and accessible through different system interfaces.
• Managed Entities: Actively managed entities, including their relationships to participating things, data sources, and other entities.
• Intelligent Applications, Analytics, & Users: Leverage the RLD's data and services to provide data analytics, decision support tools, user interfaces, and data visualisations.
17. Pay-as-you-Go Tiered Data Model
• Provides flexibility by reducing the initial cost of, and barriers to, joining the dataspace.
• A specialisation of the 5-star Linked Open Data scheme defined by Tim Berners-Lee.
• Over time, the level of integration with the support services can be improved incrementally, on an as-needed basis.
• The more that is invested in integrating with the support services, the better the integration achievable in the dataspace.
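The cumulative nature of a star scheme, where each tier builds on the ones below it, can be sketched as a simple rating function. This sketch uses Tim Berners-Lee's original five Linked Open Data criteria as stand-ins; the RLD's own specialised tiers are not reproduced here, and the `DataSource` fields and example source are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical description of a source joining the dataspace."""
    name: str
    on_web_open_license: bool = False  # 1 star: available under an open licence
    structured: bool = False           # 2 stars: machine-readable structured data
    open_format: bool = False          # 3 stars: non-proprietary format (e.g. CSV)
    uses_uris: bool = False            # 4 stars: URIs identify things (e.g. RDF)
    linked: bool = False               # 5 stars: linked to other data for context


def star_rating(src: DataSource) -> int:
    """Count stars cumulatively: a tier counts only if all lower tiers hold."""
    checks = [src.on_web_open_license, src.structured, src.open_format,
              src.uses_uris, src.linked]
    stars = 0
    for ok in checks:
        if not ok:
            break
        stars += 1
    return stars


legacy = DataSource("bms-export", on_web_open_license=True, structured=True)
print(legacy.name, star_rating(legacy))  # bms-export 2
```

In pay-as-you-go terms, moving this hypothetical source from two to three stars only requires converting its export to an open format; nothing above that tier has to be paid for until an application needs it.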
19. Part II: Data Support Services
20. Part III: Stream and Event Processing Services
21. Data Self-Management
Techniques for:
• Self-Configuration
• Self-Healing
• Self-Optimizing
Automatic Source Selection
• Source Selection
• Source Replacement
• Model Selection
• Model Training
• Parameterization
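Self-healing through automatic source replacement can be illustrated with a small selection rule: choose the best live source for a required observation, and fall back to the next candidate when the preferred one goes silent. This is a minimal sketch, not the book's technique; the `Source` fields, the quality score, and the 300-second staleness threshold are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    """Hypothetical catalog entry for a sensor or data source."""
    source_id: str
    zone: str
    quantity: str      # observed property, e.g. "temperature"
    quality: float     # assumed prior quality score in [0, 1]
    last_seen: float   # timestamp (s) of the most recent reading


def select_source(sources: List[Source], zone: str, quantity: str,
                  now: float, max_age: float = 300.0) -> Optional[Source]:
    """Pick the highest-quality live source for (zone, quantity).

    A source silent for more than max_age seconds is treated as failed, so a
    failed source is automatically replaced by the next-best live candidate.
    Returns None when no live candidate exists.
    """
    live = [s for s in sources
            if s.zone == zone and s.quantity == quantity
            and now - s.last_seen <= max_age]
    return max(live, key=lambda s: s.quality, default=None)


catalog = [
    Source("temp-a", "room-1", "temperature", quality=0.9, last_seen=0.0),
    Source("temp-b", "room-1", "temperature", quality=0.7, last_seen=950.0),
]
# temp-a has been silent for 1000 s, so the lower-quality temp-b takes over.
print(select_source(catalog, "room-1", "temperature", now=1000.0).source_id)  # temp-b
```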
22. Entity Data Management and Humans in the Loop
Enables users in the smart environment to participate in data management tasks:
• Collection & Enrichment
• Mapping & Matching
• Operator Evaluation
• Feedback & Refinement
• Citizen Actuation
Key Human-in-the-Loop (HIL) Challenges:
• Task Specification (simplicity)
• Interaction Mechanism
• Task Assignment (geospatial, expertise)
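Task assignment by location and expertise can be approximated as "nearest qualified participant". The sketch below assumes great-circle (haversine) distance and a simple skill set per user; the `Participant` type, skill names, and coordinates are hypothetical, and a real human task service would also weigh availability, workload, and user preferences.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical user in the smart environment who can accept tasks."""
    name: str
    lat: float
    lon: float
    skills: set = field(default_factory=set)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (mean Earth radius 6371 km)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def assign_task(task_lat, task_lon, required_skill, people):
    """Return the nearest participant with the required skill, or None."""
    qualified = [p for p in people if required_skill in p.skills]
    return min(qualified,
               key=lambda p: haversine_km(task_lat, task_lon, p.lat, p.lon),
               default=None)


people = [
    Participant("alice", 53.280, -9.060, {"maintenance"}),
    Participant("bob", 53.275, -9.050, {"verify_reading"}),
    Participant("carol", 53.290, -9.070, {"verify_reading"}),
]
print(assign_task(53.276, -9.051, "verify_reading", people).name)  # bob
```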
23. Semantic Approximation Matching of Streams
Challenges:
• Heterogeneity in event semantics (thousands of schemas)
• Heterogeneity in processing rules (thousands of rules tied to schemas)
Approximate Semantic Event Matcher:
• Sub-symbolic distributional event semantics
• Enables pay-as-you-go event matching for data streams
• Replaced 48,000 exact rules with 100 approximate rules at around 85% accuracy
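The idea behind approximate semantic matching is that a subscription and an event can match when their terms are close in a distributional vector space, even if the strings differ. The sketch below illustrates that idea only; it is not the book's matcher. The three-dimensional `VECTORS` are invented toy values standing in for corpus-derived embeddings, and the per-attribute averaging and the 0.9 threshold are assumptions.

```python
import math

# Toy "distributional" vectors invented for illustration; a real matcher
# would derive term vectors from a large text corpus.
VECTORS = {
    "energy": [0.9, 0.1, 0.0], "power": [0.85, 0.15, 0.05],
    "consumption": [0.2, 0.9, 0.1], "usage": [0.25, 0.85, 0.1],
    "laptop": [0.1, 0.2, 0.95], "computer": [0.15, 0.15, 0.9],
    "temperature": [0.0, 0.1, -0.9],
}


def phrase_vector(phrase):
    """Average the vectors of known words; None if no word is known."""
    vecs = [VECTORS[w] for w in phrase.lower().split() if w in VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def match_score(subscription, event):
    """Mean cosine similarity over the subscription's attributes.

    An attribute missing from the event, or with no known words, scores 0.
    """
    scores = []
    for attr, wanted in subscription.items():
        u = phrase_vector(wanted)
        v = phrase_vector(event.get(attr, ""))
        scores.append(cosine(u, v) if u and v else 0.0)
    return sum(scores) / len(scores)


def matches(subscription, event, threshold=0.9):
    return match_score(subscription, event) >= threshold


sub = {"type": "power usage", "device": "computer"}
evt = {"type": "energy consumption", "device": "laptop"}
print(round(match_score(sub, evt), 3), matches(sub, evt))
```

An exact rule on `type == "power usage"` would never fire for this event; the approximate matcher accepts it because the terms are semantically close. This trade of exactness for coverage is what allows one approximate rule to stand in for many exact, schema-specific rules, at the cost of occasional false matches.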
24. Part IV: Intelligent Systems and Applications
Smart Water and Energy Management Pilots
Airport: Linate Airport, Milan, Italy
• Target users: Corporate users; ~9.5 million passengers; Utilities management; Maintenance staff; Environmental managers
• Infrastructure: Safety critical; 10 km water network; Multiple buildings; Water meters; Energy meters; Legacy systems
Office: Insight, Galway, Ireland
• Target users: 130 staff; Office consumers; Operations managers; Utility providers; Building managers
• Infrastructure: 2190 m² space; 22 offices + 160 open plan spaces; Conference room; 4 meeting rooms; 3 kitchens; Data centre; 30-person café; Energy meters
Home: Houses, Thermi, Greece
• Target users: Domestic consumers (adults, young adults and children); Utility providers
• Infrastructure: 10 households; Typical variety of domestic settings including kitchen, showers, baths, living room, bedrooms, and garden; Water meters
Mixed Use: Engineering, NUI Galway
• Target users: Mixed/Public consumers; Building managers; 100 staff; 1000 students (ages 18 to 24)
• Infrastructure: Water meters; Energy meters; Rainwater harvesting; Café; Weather station; Wet labs; Showers
School: Coláiste na Coiribe, Ireland
• Target users: Mixed/Public consumers; School management; Maintenance staff; 500 students (ages 12 to 18); 40 teachers
• Infrastructure: Water meters; Energy meters; Rainwater harvesting
25. Need to Target Different Users
Pilot sites: Smart School (CnaC School, Galway, Ireland); Mixed Use (Galway, Ireland); Smart Airport (Milan Linate, Italy); Smart Homes (Municipality of Thermi, Greece); Smart Office (Galway, Ireland)
Target users: Building Manager; University Students; Corporate Staff; Passengers; Families; Operational Staff; Researchers; Application Developers; Teaching Staff; School Students; Data Scientist
26. IoT-enabled Digital Twins and Intelligent Applications
[Figure: an “OODA” (Observe, Orient, Decide, Act) loop built on the Real-time Linked Dataspace. Inputs: Datasets; Things / Sensors. Dataspace services: Entity Management Service; Catalog & Access Control Service; Search & Query Service; Entity-Centric Real-Time Query Service; Complex Event Processing (CEP) Service; Human Task Service. Applications: Digital Twin; Decision Analytics and Machine Learning; Personal Dashboard; Public Dashboards; Notifications, Apps, and Alerts.]
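A single step of the OODA loop can be sketched as a tiny complex event processing rule over a sensor stream. The example below is hypothetical and not the RLD's CEP service: it assumes water-flow readings per zone, a sliding window of three readings, and a 5 litres/minute limit, and it "acts" by returning an alert string instead of notifying a dashboard.

```python
from collections import defaultdict, deque


class LeakDetector:
    """Hypothetical CEP rule mapped onto the OODA steps.

    Observe: ingest flow readings per zone.
    Orient:  keep a sliding window of the last `window` readings per zone.
    Decide:  flag a zone when every reading in a full window exceeds `limit`.
    Act:     return an alert (a real system might push it to an app).
    """

    def __init__(self, window=3, limit=5.0):
        self.window = window
        self.limit = limit
        # zone -> most recent readings; deque drops the oldest automatically
        self.state = defaultdict(lambda: deque(maxlen=window))

    def observe(self, zone, litres_per_min):
        readings = self.state[zone]
        readings.append(litres_per_min)                      # orient
        if len(readings) == self.window and min(readings) > self.limit:  # decide
            return f"ALERT: sustained flow in {zone}"        # act
        return None


det = LeakDetector()
stream = [("kitchen", 6.0), ("kitchen", 7.5), ("garden", 1.0), ("kitchen", 6.2)]
alerts = []
for zone, flow in stream:
    alert = det.observe(zone, flow)
    if alert:
        alerts.append(alert)
print(alerts)  # ['ALERT: sustained flow in kitchen']
```

Keeping the window per zone means readings from different parts of the building never combine into a false pattern; in an RLD setting the zone would come from entity enrichment rather than from the raw event.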
28. Experiences and Lessons Learnt from Dataspaces
• Developers need education on stream processing and approximate results
• Incremental data management can support agile software development
• Build the business case for data-driven innovation
• Integration with legacy data is a significant cost in smart environments
• The 5-star pay-as-you-go model simplified communication with non-technical users
• A secure canonical source for entity data simplifies application development
• Data quality with things and sensors is challenging in an operational environment
• Working with three pipelines adds overhead (LAMBDA + Entity Layer)
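To see where the three-pipeline overhead comes from, the sketch below assumes the pipelines are the batch and speed layers of a Lambda architecture plus an entity layer, and shows a unified query that must consult all three. The meters, zones, and cutoff are hypothetical; a production system would use a stream processor and a persistent store, not in-memory counters.

```python
from collections import Counter

# Entity layer (hypothetical): which zone each meter belongs to.
ENTITY_ZONE = {"m1": "office", "m2": "office", "m3": "kitchen"}


def batch_view(history, cutoff):
    """Batch layer: recompute per-meter totals from events before `cutoff`."""
    totals = Counter()
    for meter, ts, kwh in history:
        if ts < cutoff:
            totals[meter] += kwh
    return totals


class SpeedLayer:
    """Speed layer: incrementally add events at or after `cutoff`."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.totals = Counter()

    def ingest(self, meter, ts, kwh):
        if ts >= self.cutoff:  # older events are covered by the batch view
            self.totals[meter] += kwh


def query_zone(zone, batch, speed, entity_zone):
    """Serving layer: merge both views for every meter in the zone."""
    meters = [m for m, z in entity_zone.items() if z == zone]
    return sum(batch[m] + speed.totals[m] for m in meters)


history = [("m1", 10, 1.5), ("m2", 20, 2.0), ("m1", 120, 0.5)]
batch = batch_view(history, cutoff=100)
speed = SpeedLayer(cutoff=100)
for event in [("m1", 120, 0.5), ("m3", 130, 0.7)]:
    speed.ingest(*event)
print(query_zone("office", batch, speed, ENTITY_ZONE))  # 4.0
```

Each layer has to agree on the cutoff and on entity identifiers for the merged answer to be correct, which is one concrete source of the coordination overhead noted above.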
29. Part V: Future Directions
Large-scale Decentralised Support Services
• Enhanced Support Services
• Scaling Entity Management
• Maintenance and Operation Cost
Multimedia/Knowledge-Intensive Event Processing
• Support Services for Multimedia Data
• Placement of Multimedia Data and Workloads
• Adaptive Training of Classifiers
• Complex Multimedia Event Processing
Trusted Data Sharing
• Trusted Platforms
• Usage Control
• Personal/Industrial Dataspaces
Ecosystem Governance and Economic Models
• Decentralised Data Governance
• Economic Models
Incremental Intelligent Systems
Engineering Cognitive Adaptability
• Pay-as-you-go Systems
• Cognitive Adaptability
Towards Human-centric Systems
• Explainable Artificial Intelligence and Data Provenance
• Human-in-the-loop
30. Some Final Thoughts on Impacts, Influence, and Future Funding
31. Data Sharing Spaces – Position Paper
Key Recommendations:
• Create the conditions for the development of a trusted European data sharing framework.
• Incorporate data sharing at the core of the data lifecycle to enable greater access to data.
• Provide supportive measures for European businesses to safely embrace new technologies, practices, and policies.
• Assemble a European-wide digital skills strategy to equip the workforce for the new data economy.
32. A European Strategy for Data
BDVA Meeting
26 February 2020
Yvo Volman
Head of Unit G1 - Data Policy and Innovation
DG CNECT, European Commission
33. European Strategy for Data
A common European data space, a single market for data:
• Data can flow within the EU and across sectors
• European rules and values are fully respected
• Rules for access and use of data are fair, practical and clear, and clear data governance mechanisms are in place
• Availability of high-quality data to create and innovate
34. Common European Data Spaces
[Figure: common European data spaces for Health; Industrial & Manufacturing; Agriculture; Finance; Mobility; Green Deal; Energy; Public Administration; and Skills, offering a rich pool of data (with varying degrees of accessibility), free flow of data across sectors and countries, and full respect of GDPR, alongside a horizontal framework for data governance and data access.]
– Technical tools for data pooling and sharing
– Standards & interoperability (technical, semantic)
– Sectoral data governance (contracts, licences, access rights, usage rights)
– IT capacity, including cloud storage, processing and services