ANALYSIS OF OPEN HEALTH DATA QUALITY USING
DATA OBJECT-DRIVEN APPROACH TO DATA QUALITY EVALUATION:
INSIGHTS FROM A LATVIAN CONTEXT
13th Multi Conference on Computer Science and Information Systems
11th International Conference on e-Health
17 – 19 July 2019, Porto, Portugal
Anastasija Nikiforova
Faculty of Computing, University of Latvia
Anastasija.Nikiforova@lu.lv
(The New York Times, The Economist, WIRED)
Def. I: «Open data» are data that anyone can access, use and share.
 The popularity of open data continuously increases.
 For instance, European Data Portal collects
more than 800 thousand data sets.
OPEN DATA
The aggregate economic impact from applications
based on open data across the EU27 economy is
estimated to be €140 billion annually.
Open Government Data (OGD):
 drives economic growth,
 improves government services,
 reduces fraud,
 reduces waste.
The McKinsey Global Institute report estimated
that open data could add over $3 trillion
annually in total value to the global economy.
A number of studies indicate the existence of
data quality problems in open data:
 Ferney et al., 2017;
 Kerr et al., 2007;
 Kuk and Davies, 2011;
 Martin, 2014;
 Nikiforova, 2018a, 2018b;
 Nikiforova and Bicevskis, 2019;
 Vetrò et al., 2016
 etc.
8 PRINCIPLES OF OPEN DATA
 OGD: the quality aspect takes only the 4th place by popularity after policy,
benefit and risk, although quality can impact these aspects. (Klein et al., 2018)
 Data quality appears to be one of the most problematic dimensions for
open data portals.
Def. II: «Quality» is a desirable goal to be achieved through management of the production process.
Def. III: «Data quality» is a relative concept, largely dependent on specific requirements resulting from the data use.
(SunlightFoundation, 2007), (European Data Portal, 2018)
Open data must be:
1. complete
2. timely
3. primary
4. accessible
5. non-discriminatory
6. licence-free
7. machine-processable
8. non-proprietary
And what about data quality*???
*
Latvia:
 is one of 70 countries participating in the
Open Government Partnership - an
international platform for domestic reformers that
committed to making their governments more open,
accountable, and responsive to citizens;
 is the fast-tracker (among beginners,
followers, fast-trackers, trend-setters);
-Open Data Maturity report
 has the highest open data maturity rate
compared with its neighbours among the
Baltic States and Scandinavian countries.
THE STATE OF OPEN DATA IN LATVIA
 In 2017 the Latvian Ministry of Environmental Protection and Regional Development
launched the new Latvian Open Data Portal:
• at the moment of its launch: 33 data sets from 13 data publishers,
• in July 2018: 139 data sets from 41 publishers,
• in June 2019: 228 data sets from 62 publishers.
 Quality is the weakest aspect for Latvia among
impact, policy, portal, and quality: only 62%, while the average
rate for all analysed countries is 71%.
Open data maturity ranking of the Latvian open
data portal:
• in 2016 - 31st,
• in 2017 - 20th,
• in 2018 - 12th.
As for the quality aspect: 11th place
with just 370 out of 520 points.
OPEN HEALTH(CARE) DATA I
 Aims and possible uses of open health(care) data can be very different, since
health data and information have a large number of possible
applications, uses and users.
 The volume of health(care) data is continuously
increasing, and it is expected to grow
dramatically in the years ahead.
 Open health(care) data is one of the most popular categories of open data.
(Cabitza and Batini, 2016)
Health and healthcare data are very broad concepts*;
this research focuses on one subdomain: open health
data.
*Def. IV: «Health care data» are items of knowledge
about an individual patient or a group of patients.
*Def. V: «Health data» are any representation of facts
related to the health of single individuals or entire
populations and that is suitable for communication,
interpretation or processing by manual or electronic
means;
(World Health Organization, 2003)
(Abdelhak, Grostick and Hanken, 2012)
Healthcare is characterized by highly complex labor-
and skill intensive services where the actors involved
still rely primarily on paper tools, their own cognition
(competencies and memory), and
other traditional methods.
(Cabitza & Batini, 2016) HUMAN FACTOR!!!
OPEN HEALTH(CARE) DATA II
 Between 56% and 79% of Internet users seek
health information online:
 - 35%,
 - 42%,
with the lowest proportion in the Southern
countries:
 - 30%,
 - 23%.
(Andreassen et al., 2007)
 Open health(care) data must be of high quality, as they:
 are needed for health(care) planning and administrative purposes:
• form the basis for the health and medicines authority’s hospital statistics, or health economic calculations,
• provide authorities with data to support hospital planning,
• monitor the frequency of various diseases and treatments,
• provide a sampling frame for medical research,
• facilitate quality assurance of the health(care) services,
• etc.
 can be useful when searching for data on medications, their dose, contraindications and other
information available to a wide audience.
Studies discussing the quality of health(care) data in many
countries come to the same conclusion:
health(care) data have
data quality problems.
Assumption: as the level of detail of “open” data
might be lower than that of “closed” data
stored in databases, quality checks can be simpler.
Open data are usually used by a wide audience that
may not have deep knowledge of IT or data quality,
so a solution should be simple enough
to let individual users take part in
the analysis of «third-party» open data
for their own purposes.
OPEN [HEALTH] DATA QUALITY
Solution: previously proposed user-oriented data object-driven
approach
(Bicevskis, Bicevska, Nikiforova, Oditis, 2018), (Nikiforova, 2019)
!!! The same data may be
of sufficient quality in one case
BUT
completely useless under other
circumstances.
 General studies on data and information quality define different dimensions of quality and their
groupings.
✘ The key data quality dimensions are not universally agreed*;
✘ There is no agreement on their meanings and usability **;
✘ Each dimension can be supplied with one or more metrics that vary from one solution
to another;
✘ The number of different data quality dimensions, their definitions and grouping are often
useful for only a particular solution.
Question: How to relate a particular dimension (and which one?) to a particular use case???
RELATED RESEARCH
Problem: necessity to involve data quality experts at every stage of
data quality analysis process.
Solution: data object-driven approach to data quality evaluation.
(Bicevskis, Bicevska, Nikiforova, Oditis, 2018), (Nikiforova, Bicevskis, 2019)
* «… This state of affairs has led to much confusion within the data
quality community and is even more bewildering for those who are
new to the discipline and more importantly to business
stakeholders…» (DAMA UK, 2018)
** In different proposals, dimensions of the same name can have
different semantics and vice versa. (Batini, 2016)
Example I: (Kerr, et al., 2007):
New Zealand’s healthcare data:
 6 data quality dimensions,
 24 characteristics
 69 data quality criteria.
Example II: (Dahbi et al., 2018; Weiskopf et al.,
2013):
 2 data quality dimensions:
accuracy and completeness
TDQM data quality lifecycle: data quality definition → data quality measuring → data quality analysis → data quality improvement
MAIN PRINCIPLES OF THE
PROPOSED SOLUTION
 Each specific application can have its own specific DQ checks;
 DQ requirements can be formulated on several levels: from informal text in natural language
to an automatically executable model, SQL statements or program code;
 DQ can be checked at various stages of data processing;
 The DQ definition language is a graphical DSL:
• the diagrams are easy to read, create, understand and edit even by
non-IT and non-DQ experts;
• syntax and semantics can be easily applied to any new IS.
!!! All three components are
defined by using a graphical
domain specific language
(DSL)**
**Three DSL families were developed as graphic languages based on
the possibilities of the modelling platform DIMOD
1. DATA OBJECT (DO) - the set of values of the parameters that characterize a real-life object
 primary data object - the initial DO whose quality is analysed;
 secondary data object - a DO that determines the context for analysis of the primary DO.
* Many objects of the same structure form a class of data objects
2. DATA QUALITY REQUIREMENTS - conditions that must be met for a data object to be
considered of high quality.
** May contain: informal or formalized implementation-independent descriptions of conditions
3. DATA QUALITY MEASURING PROCESS - procedures that should be performed to
evaluate the data object’s quality.
DATA QUALITY MODEL
instead of dimensions
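The three components of the data quality model can be sketched in code. This is a minimal illustration only: the deck defines the components with graphical DSLs, whereas the class names `QualityRequirement` and `QualityModel`, the `measure` method and the sample records below are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# One data object instance: parameter name -> value (hypothetical representation).
Record = Dict[str, Optional[str]]


@dataclass
class QualityRequirement:
    """A named condition that a data object must satisfy."""
    name: str
    condition: Callable[[Record], bool]


@dataclass
class QualityModel:
    """A data object class plus its requirements; measure() plays the measuring process."""
    data_object: str
    requirements: List[QualityRequirement] = field(default_factory=list)

    def measure(self, records: List[Record]) -> Dict[str, int]:
        """Return, per requirement, how many records violate it."""
        return {
            req.name: sum(1 for rec in records if not req.condition(rec))
            for req in self.requirements
        }


model = QualityModel(
    data_object="Medicinal_Product",
    requirements=[
        QualityRequirement("product_id exists", lambda r: bool(r.get("product_id"))),
        QualityRequirement("atc_code exists", lambda r: bool(r.get("atc_code"))),
    ],
)

# Made-up records, for illustration only.
sample = [
    {"product_id": "A-1", "atc_code": "N02BE01"},
    {"product_id": None, "atc_code": "N02BE01"},
    {"product_id": "A-3", "atc_code": ""},
]
protocol = model.measure(sample)
print(protocol)
```

Keeping requirements separate from the data object is what lets each use case attach its own checks to the same data.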
 15 data sets from 7 different data publishers;
 15 primary data objects, 11 secondary data objects were involved in
data quality analysis and applied on 35 parameters of primary data objects;
 The most frequent data quality issues:
✘ contextual data quality issues;
✘ empty values (completeness);
✘ multiple notations for the same object within one data object and even one
parameter;
✘ issues in interrelated parameters.
DATA QUALITY ANALYSIS OF
OPEN HEALTH(CARE) DATA
✘ only 6 out of 15 data sets are updated as frequently as promised;
✘ only 8 out of 15 data sets are supplied with an explanation of parameters;
✔ almost all available data sets are provided in a machine-readable format:
 the most popular open data format is .xlsx (53.3%), while 26.7% are in .zip archives containing data
sets in .xlsx and .csv format,
✘ 1 data set cannot be considered open data.
Primary DO: Medicinal_Product
  product_id: varchar
  original_name: varchar
  pharmaceutical_form: varchar
  marketing_authorisation_holder: varchar
  exp_country_en: varchar
  exp_country_lv: varchar
  atc_code: varchar
  authorisation_procedure: enumerable {Eiropas centralizētā reģistrācijas procedūra, Nacionālā reģistrācijas procedūra, ...}
  summary_of_product_characteristics: varchar - pattern
Secondary DO: Country
  ISO3: varchar
  ISO2: varchar
  OfficialName: varchar
  ShortName: varchar
Secondary DO: Country_LV
  Code (ISO-3166-1): varchar
  ShortName_LV: varchar
  OfficialName_LV: varchar
Secondary DO: ATC
  ATC_code: varchar
 Data object is platform-independent.
 Checking parameter values is a local and
formal process.
 Checking the quality of one DO
parameter's values is an examination of the
properties of the individual values, e.g. whether:
 a text string may serve as a value of the field Name,
 the value of the field Address is a correct address.
 Can be formulated at different levels of abstraction:
 from the formal language grammar
 to definitions of variables in programming
languages.
DATA OBJECT
Data quality specification of the primary DO "Medicinal_Product" (diagram: each Assess Field step has OK and NO outcomes and a SendMessage action; contextual checks take a secondary DO as their first argument):
Assess Field "product_id": checkValueExists(product_id)
Assess Field "original_name": checkValueExists(original_name)
Assess Field "pharmaceutical_form": checkValueExists(pharmaceutical_form)
Assess Field "marketing_authorisation_holder": checkValueExists(marketing_authorisation_holder); checkMarketing_authorisation_holderName(Country, marketing_authorisation_holder)
Assess Field "exp_country_en": checkValueExists(exp_country_en); checkExp_country_enName(Country, exp_country_en)
Assess Field "exp_country_lv": checkValueExists(exp_country_lv); checkExp_country_lvName(Country_LV, exp_country_lv)
Assess Field "atc_code": checkValueExists(atc_code); checkAtc_codeName(ATC, atc_code)
Assess Field "authorisation_procedure": checkValueExists(authorisation_procedure); checkValueEnumerable(authorisation_procedure)
Assess Field "summary_of_product_characteristics": checkValueExists(summary_of_product_characteristics); checkValueSummary_of_product_characteristics(summary_of_product_characteristics, 'https://www.zva.gov.lv/zalu-registrs/attachments/pdf.php?id=%'+'&src=description')
Secondary DOs: Country (ISO3, ISO2, OfficialName, ShortName), Country_LV (Code (ISO-3166-1), ShortName_LV, OfficialName_LV), ATC (ATC_code)
 Quality conditions are defined only for the
primary data object.
 DQ requirements are defined by using logical
expressions.
 The names of DO attributes/fields serve as
operands in the logical expressions.
 Both syntactic and semantic data quality
can be analysed according to unified principles.
DATA QUALITY SPECIFICATION
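A data quality specification of this kind can be read as logical expressions over field names. The sketch below is an illustration under assumptions: the helper names mirror `checkValueExists` and `checkValueEnumerable` from the diagram but are hypothetical Python functions, the list of permissible values contains only the three named in the deck (the full list is elided there), and the regular expression is an assumed reading of the SQL `LIKE` pattern.

```python
import re
from typing import Dict, List, Optional, Set

Record = Dict[str, Optional[str]]

# Only the values named in the deck; the complete enumeration is elided in the source.
AUTHORISATION_PROCEDURES: Set[str] = {
    "Eiropas centralizētā reģistrācijas procedūra",
    "Nacionālā reģistrācijas procedūra",
    "Decentralizētā reģistrācijas procedūra",
}

# Assumed equivalent of LIKE '...pdf.php?id=%&src=description' (% = any characters).
SPC_URL = re.compile(
    r"^https://www\.zva\.gov\.lv/zalu-registrs/attachments/"
    r"pdf\.php\?id=.*&src=description$"
)


def check_value_exists(value: Optional[str]) -> bool:
    return value is not None and value.strip() != ""


def check_value_enumerable(value: Optional[str], allowed: Set[str]) -> bool:
    return value in allowed


def check_value_pattern(value: Optional[str], pattern) -> bool:
    return value is not None and pattern.match(value) is not None


# Field names are the operands; each requirement is a logical expression over them.
REQUIREMENTS = {
    "authorisation_procedure": lambda r: (
        check_value_exists(r.get("authorisation_procedure"))
        and check_value_enumerable(r.get("authorisation_procedure"), AUTHORISATION_PROCEDURES)
    ),
    "summary_of_product_characteristics": lambda r: check_value_pattern(
        r.get("summary_of_product_characteristics"), SPC_URL
    ),
}


def failed_requirements(record: Record) -> List[str]:
    """Names of the fields whose requirement is not met."""
    return sorted(name for name, cond in REQUIREMENTS.items() if not cond(record))


# Made-up records, for illustration only.
good = {
    "authorisation_procedure": "Nacionālā reģistrācijas procedūra",
    "summary_of_product_characteristics":
        "https://www.zva.gov.lv/zalu-registrs/attachments/pdf.php?id=123&src=description",
}
bad = {"authorisation_procedure": "National", "summary_of_product_characteristics": None}
print(failed_requirements(good), failed_requirements(bad))
```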
Secondary DO
Link between
primary and
secondary DOs
(informal rule)
DATA QUALITY MEASURING
PROCESS
The activities to be taken to select data object values from data sources.
+ For contextual checks: gather values of the secondary DOs from the data sources if the parameter
indicating the secondary DO’s value within the defined quality condition is true:
1. read/write operations from the data source into the database,
2. connection of primary and secondary data objects via appropriate
parameters.
One or more steps to evaluate the quality of the data, each of which describes one
test for the compliance of the data object with a specific quality specification.
The steps to improve data quality, automatically or manually triggering changes
in the data source.
 The language describing the quality evaluation
process involves verification activities for a
particular DO that can be defined:
 informally as a natural language text,
 using UML activity diagrams,
 in a dedicated DSL.
 Additionally, processing instances of DO classes
may require looping constructs,
similar to the iterators used in C#.
 A concrete DO or a class of DOs is used as
an input for a quality verification process.
 The quality verification process creates a
test protocol.
In case of SQL:
 the SELECT statement specifies the target DO,
 the WHERE clause specifies quality requirements,
+
 the JOIN clause links primary and secondary DOs.
DATA QUALITY MEASURING
PROCESS
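The SELECT / WHERE / JOIN roles described for SQL can be tried out on a toy database. The sketch below is an assumption-laden illustration: an in-memory SQLite database stands in for the SQL Server database used in the deck, the table and column names are simplified, and all rows are made up.

```python
import sqlite3

# In-memory SQLite stands in for the SQL Server database used in the deck.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE medicinal_product (product_id TEXT, exp_country_en TEXT);
    CREATE TABLE country (short_name TEXT, official_name TEXT, iso3 TEXT);
    INSERT INTO country VALUES
        ('Latvia', 'Republic of Latvia', 'LVA'),
        ('Germany', 'Federal Republic of Germany', 'DEU');
    -- Made-up rows, for illustration only.
    INSERT INTO medicinal_product VALUES
        ('P1', 'Latvia'), ('P2', 'DEU'), ('P3', 'Latvija'), ('P4', NULL);
    """
)

# WHERE expresses a requirement on the primary DO alone (the value must exist).
empty_rows = [row[0] for row in conn.execute(
    "SELECT product_id FROM medicinal_product WHERE exp_country_en IS NULL"
)]

# LEFT JOIN links the primary DO to the secondary DO; rows without a match
# in the classifier come back with NULLs on the secondary side.
unmatched_rows = [row[0] for row in conn.execute(
    """
    SELECT mp.product_id
    FROM medicinal_product AS mp
    LEFT JOIN country AS c
      ON c.short_name = mp.exp_country_en
      OR c.official_name = mp.exp_country_en
      OR c.iso3 = mp.exp_country_en
    WHERE c.short_name IS NULL
    ORDER BY mp.product_id
    """
)]

print(empty_rows, unmatched_rows)
```

Note that a row with an empty value also fails the contextual check, so the two result sets overlap; a test protocol would probably report them separately.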
Read data from data sources and write into DB "Medicinal_Product"
Read data from data sources and write into DB "Country"
Read data from data sources and write into DB "Country_LV"
Read data from data sources and write into DB "ATC"
(Diagram: each Assess Field step below has an OK and a NO outcome and is paired with a SendMessage action.)
Assess Field "product_id"
SELECT * FROM [dbo].[Medicinal_product] WHERE [product_id] IS NULL
SendMessage
Assess Field "original_name"
SELECT * FROM [dbo].[Medicinal_product] WHERE [original_name] IS NULL
SendMessage
Assess Field "pharmaceutical_form"
SELECT * FROM [dbo].[Medicinal_product] WHERE [pharmaceutical_form] IS NULL
SendMessage
Assess Field "marketing_authorisation_holder"
SELECT * FROM [dbo].[Medicinal_product]
LEFT JOIN [dbo].[country] ON
  [dbo].[country].[Short name] = RIGHT(marketing_authorisation_holder, CHARINDEX(',', REVERSE(marketing_authorisation_holder)) - 2)
  OR [dbo].[country].[Official name] = RIGHT(marketing_authorisation_holder, CHARINDEX(',', REVERSE(marketing_authorisation_holder)) - 2)
  OR [dbo].[country].[ISO3] = RIGHT(marketing_authorisation_holder, CHARINDEX(',', REVERSE(marketing_authorisation_holder)) - 2)
WHERE [dbo].[country].[Short name] IS NULL
  AND [dbo].[country].[Official name] IS NULL
  AND [dbo].[country].[ISO3] IS NULL
SendMessage
Assess Field "exp_country_en"
SELECT * FROM [dbo].[Medicinal_product]
LEFT JOIN [dbo].[country] ON
  [dbo].[country].[Short name] = exp_country_en
  OR [dbo].[country].[Official name] = exp_country_en
  OR [dbo].[country].[ISO3] = exp_country_en
WHERE [dbo].[country].[Short name] IS NULL
  AND [dbo].[country].[Official name] IS NULL
  AND [dbo].[country].[ISO3] IS NULL
SendMessage
Assess Field "exp_country_lv"
SELECT * FROM [dbo].[Medicinal_product]
LEFT JOIN [dbo].[country_lv] ON
  [dbo].[country_lv].[Code (ISO-3166-1)] = exp_country_lv
  OR [dbo].[country_lv].[ShortName_LV] = exp_country_lv
  OR [dbo].[country_lv].[LongName_LV] = exp_country_lv
WHERE [dbo].[country_lv].[Code (ISO-3166-1)] IS NULL
  AND [dbo].[country_lv].[ShortName_LV] IS NULL
  AND [dbo].[country_lv].[LongName_LV] IS NULL
SendMessage
Assess Field "atc_code"
SELECT product_id,
  REPLACE(SUBSTRING(atc_code, CHARINDEX(';', atc_code), LEN(atc_code)), ';', '') AS atc1,
  LEFT(atc_code, CHARINDEX(';', atc_code) - 1) AS atc2
INTO #atc_divided
FROM [dbo].[Medicinal_product]
WHERE LEFT(atc_code, CHARINDEX(';', atc_code) - 0) NOT LIKE '';
SELECT product_id FROM [dbo].[Medicinal_product]
LEFT JOIN [dbo].[ATC] ON [dbo].[ATC].[ATC_code] = [dbo].[Medicinal_product].[atc_code]
WHERE [dbo].[ATC].[ATC_code] IS NULL
EXCEPT SELECT product_id FROM #atc_divided
SendMessage
Assess Field "authorisation_procedure"
SELECT * FROM [dbo].[Medicinal_product]
WHERE authorisation_procedure IS NULL
  OR (authorisation_procedure NOT LIKE 'Eiropas centralizētā reģistrācijas procedūra'
  AND authorisation_procedure NOT LIKE 'Nacionālā reģistrācijas procedūra'
  AND ...
  AND authorisation_procedure NOT LIKE 'Decentralizētā reģistrācijas procedūra')
SendMessage
Assess Field "summary_of_product_characteristics"
SELECT * FROM [dbo].[Medicinal_product]
WHERE summary_of_product_characteristics IS NULL
  OR summary_of_product_characteristics NOT LIKE 'https://www.zva.gov.lv/zalu-registrs/attachments/pdf.php?id=%'+'&src=description'
SendMessage
Publisher | Dataset | Context issues/context total | Empty/Total | Multiple notation/Total | Clean/Total
Centre for Disease Prevention and Control | Incidence of 2nd type diabetes in Latvia | - | 0/6 | 0/6 (0) | 6/6
Ministry of Welfare | Distribution of persons receiving tech aid by AT | 2/2 (100%) | 3/7 (43%) | 0/7 (0) | 2/7
Ministry of Welfare | Number of social service providers | 2/2 (100%) | 22/27 (82%) | 10/27 (37%) | 4/27
Ministry of Welfare | Persons with disabilities by the severity of the disability and AT | 2/2 (100%) | 0/23 (0) | 0/23 (0) | 20/23
Ministry of Welfare | Number of children with disabilities by AT | 2/2 (100%) | 0/10 (0) | 0/10 (0) | 8/10
State labour inspectorate | Accidents at work | (0-1/1) (0-100%) | 1/10 (10%) | 0/10 (0) | 8/10
State labour inspectorate | Occupational diseases confirmed | 4/5 (80%) | 2/11 (18%) | 1/11 (9%) | 9/11
National Blood Donor Centre Statistics | National Blood Donor Center Statistics | - | 0/4 (0) | 0/4 (0) | 4/4
State Agency of medicines | Register of licensed pharmaceutical companies | 1/2 (50%) | 17/38 (45%) | 0/38 (0) | 19/38
State Agency of medicines | Medicines consumption statistics | 3/3 (100%) | 5/8 (63%) | 2/8 (25%) | 0/8
State Agency of medicines | Medicinal Product Register of Latvia | 4/9 (44%) | 21/41 (51%) | 1/41 (2%) | 14/41
Food and veterinary service | Food supplements register | 2/2 (100%) | 30/35 (86%) | 4/35 (11%) | 5/35
Food and veterinary service | Dietary foodstuffs register | 2/2 (100%) | 19/22 (87%) | 4/22 (18%) | 3/22
APPROBATION. RESULTS
DATA QUALITY ANALYSIS OF OPEN
HEALTH(CARE) DATA: CONTEXTUAL ISSUES
 Only 1 data set out of 12 (8.3%) didn’t have any data quality issues
(“Accidents at work”); however, some manipulations were needed in order to
achieve this result.
 In total, 25 out of 35 parameters (71.4%) had at least a few
data quality issues.
Data set “Accidents at work”, value «88.3332-03» = data set «Work codes», value I “8332” AND value II “03”
Example II: 4 data sets published by the
Ministry of Welfare:
 [ATTU code] and [City, county] parameters are
supposed to store the code of the administrative
territory and city that must correspond to the
secondary data object “Classification of
Administrative Territories and Territorial Units”;
✘ 3 values are invalid, as they aren’t available in the secondary
data set: “Total”, “Abroad” and “Address isn’t specified”.
 Possibly, the data publisher is aware of this, as
these values make sense;
BUT!!!
!!! This data quality problem can easily go
unnoticed and can lead to inaccurate data analysis
results.
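A contextual check of this kind amounts to a membership test against the secondary data object. The following is a minimal sketch under assumptions: the function name is hypothetical, the classifier holds two made-up entries rather than the real “Classification of Administrative Territories and Territorial Units”, and the parameter values are invented.

```python
from collections import Counter
from typing import Iterable, Set


def values_missing_from_classifier(values: Iterable[str], classifier: Set[str]) -> Counter:
    """Count each primary-DO value that is absent from the secondary DO."""
    return Counter(v for v in values if v not in classifier)


# Hypothetical excerpt of the secondary DO (classifier of territories).
classifier = {"Rīga", "Daugavpils"}

# Made-up values of the [City, county] parameter, including aggregate rows.
city_values = ["Rīga", "Total", "Abroad", "Address isn't specified", "Daugavpils", "Total"]

missing = values_missing_from_classifier(city_values, classifier)
print(dict(missing))
```

Aggregate rows such as “Total” are reported here exactly like genuine errors, so a user summing the column would still need to decide how to treat them.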
 Example I: “Number of social service providers” data set has 3 parameters: [Service with
accommodation], [Service without accommodation] and [Service with and without
accommodation].
From the users’ viewpoint:
[Service with and without accommodation] = [Service with accommodation] + [Service without
accommodation]
BUT!!! For 95 records this assumption does not hold.
 Example II: “Number of children with disabilities by administrative territory” data set:
For 121 records this assumption does not hold.
At least two possible explanations:
1) there are data quality problems;
2) these fields aren’t interconnected, and the sum of the values of the first two parameters does not necessarily
have to equal the value of the 3rd parameter.
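The sum assumption on interrelated parameters can be tested record by record. This is a sketch only: the function name and the three records are made up for illustration, and whether the fields are really meant to add up is exactly the open question raised above.

```python
from typing import Dict, List, Sequence


def sum_violations(
    records: List[Dict[str, int]], total_field: str, part_fields: Sequence[str]
) -> List[Dict[str, int]]:
    """Return the records where total_field differs from the sum of part_fields."""
    return [
        rec for rec in records
        if rec[total_field] != sum(rec[p] for p in part_fields)
    ]


WITH = "Service with accommodation"
WITHOUT = "Service without accommodation"
BOTH = "Service with and without accommodation"

# Made-up records, for illustration only.
records = [
    {WITH: 2, WITHOUT: 3, BOTH: 5},
    {WITH: 1, WITHOUT: 0, BOTH: 1},
    {WITH: 4, WITHOUT: 1, BOTH: 2},  # violates the assumed relationship
]

violations = sum_violations(records, BOTH, [WITH, WITHOUT])
print(len(violations))
```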
DATA QUALITY ANALYSIS OF OPEN
HEALTH(CARE) DATA: CONTEXTUAL ISSUES
Another problem for 4 out of 15 data sets (26.7%) - different
number of interrelated values that may appear in different ways:
(a) values in different languages,
(b) ID number and name,
(c) name and supplementary data such as type, country, phone
number of representatives.
which of these
options???
Dataset | Context issues/context total
Incidence of 2nd type diabetes in Latvia | 0/0
Distribution of persons receiving tech aid by AT | 2/2 (100%)
Number of social service providers | 2/2 (100%)
Persons with disabilities by the severity of the disability … | 2/2 (100%)
Number of children with disabilities by AT | 2/2 (100%)
Accidents at work | (0-1/1) (0-100%)
Occupational diseases confirmed | 4/5 (80%)
National Blood Donor Center Statistics | 0/0
Register of licensed pharmaceutical companies | 1/2 (50%)
Medicines consumption statistics | 3/3 (100%)
Medicinal Product Register of Latvia | 4/9 (44%)
Food supplements register | 2/2 (100%)
Dietary foodstuffs register | 2/2 (100%)
Veterinary medicinal product register | 1/3 (33%)
[1# group] = [18-29 years 1# group] + [30-44 years 1# group] + … + [>=65 years 1# group];
[2# group] = [18-29 years 2# group] + [30-44 years 2# group] + … + [>=65 years 2# group];
[3# group] = [18-29 years 3# group] + [30-44 years 3# group] + … + [>=65 years 3# group]
!!! Data publishers must provide a brief explanation of the parameters and of how numerical data were
obtained
DATA QUALITY ANALYSIS OF OPEN
HEALTH(CARE) DATA: COMPLETENESS
 For 136 out of 167 (81.4%) analysed parameters at least one value was empty.
 The number of empty values per parameter varies from 1 to all values of a certain
parameter.
 Empty values make up 15% of all values in the analysed data sets.
 The problem of empty values appears even for the primary data of the data sets:
 Example: “Dietary foodstuffs register” data set:
✘ 4 records don’t have [Name] and [ProducerName].
 This issue is almost “traditional” in many sectors and
countries.
 However, some studies demonstrate that a high level of data
completeness can be achieved.
(Schmidt et al., 2015), (Oliveira, 2016), (Wanner et al., 2018), (Tomic, 2015), (Yi, 2019), (Sigurdardottir, 2012), (Larsen, 2009)
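The completeness figures above (parameters with at least one empty value, share of empty values) can be computed with a few lines of code. The sketch below rests on assumptions: the function names are hypothetical, "empty" is taken to mean a missing value or a blank string, and the four records are invented.

```python
from typing import Any, Dict, List, Sequence


def is_empty(value: Any) -> bool:
    """Treat None and blank strings as empty; note that 0 is a value, not a gap."""
    return value is None or str(value).strip() == ""


def completeness_report(records: List[Dict[str, Any]], parameters: Sequence[str]) -> Dict[str, Any]:
    """Count empty values per parameter and the overall share of empty cells."""
    empty_counts = {
        p: sum(1 for rec in records if is_empty(rec.get(p))) for p in parameters
    }
    total_cells = len(records) * len(parameters)
    return {
        "empty_counts": empty_counts,
        "parameters_with_empty": sorted(p for p, n in empty_counts.items() if n > 0),
        "empty_share": sum(empty_counts.values()) / total_cells if total_cells else 0.0,
    }


# Made-up records, for illustration only.
rows = [
    {"Name": "Product A", "ProducerName": "Producer 1", "Amount": 0},
    {"Name": "", "ProducerName": None, "Amount": 5},
    {"Name": "Product C", "ProducerName": "  ", "Amount": 2},
    {"Name": "Product D", "ProducerName": "Producer 2", "Amount": 7},
]

report = completeness_report(rows, ["Name", "ProducerName", "Amount"])
print(report["parameters_with_empty"], report["empty_share"])
```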
Dataset | Empty/Total
Incidence of 2nd type diabetes in Latvia | 0/6
Distribution of persons receiving tech aid by AT | 3/7 (43%)
Number of social service providers | 22/27 (82%)
Persons with disabilities by the severity of the disability … | 0/23 (0)
Number of children with disabilities by AT | 0/10 (0)
Accidents at work | 1/10 (10%)
Occupational diseases confirmed | 2/11 (18%)
National Blood Donor Center Statistics | 0/4 (0)
Register of licensed pharmaceutical companies | 17/38 (45%)
Medicines consumption statistics | 5/8 (63%)
Medicinal Product Register of Latvia | 21/41 (51%)
Food supplements register | 30/35 (86%)
Dietary foodstuffs register | 19/22 (87%)
Veterinary medicinal product register | 16/26 (62%)
NOTE: 28 of the 136 detected cases of empty values may not be
quality issues; however, as long as the data publisher provides no notes
on their nullability, there is no certainty
that there are no problems, as
empty values may have different
interpretations.
DATA QUALITY ANALYSIS OF OPEN
HEALTH(CARE) DATA: MULTIPLE NOTATIONS
FOR A SINGLE OBJECT
 Multiple notations for a single object within a single data set and even a
parameter:
✘ in 6 out of 15 data sets (40%) in 22 out of 167 parameters (13.2%).
May appear in different ways, such as:
• a different name for one country, for instance, (a) USA vs. United States vs. United States of America; (b) Northern Ireland vs. Republic of Ireland vs. Ireland; (c) Scotland vs. Scotland UK, etc.
• different patterns for one value, for instance, phone or registration number: with or without (1) code or (2) delimiter; type of delimiter etc.
• different notations indicating the absence of a value: NULL and ‘0’**
Do both NULL and ‘0’ values have the same meaning??? ‘0’ can point to a value that is equal to zero, while NULL can mean that the value isn’t known.
**often called “heterogeneity”
• a different name for the type of preparation, ingredient or unit size, for instance, (a) singular, (b) plural, (c) shortened form, (d) with a spelling mistake, etc.
• This problem is also widespread in many sectors and even countries, e.g. OGD of the UK (Kuk and Davies, 2011).
Dataset | Multiple notation/Total
Incidence of 2nd type diabetes in Latvia | 0/6 (0)
Distribution of persons receiving tech aid by AT | 0/7 (0)
Number of social service providers | 10/27 (37%)
Persons with disabilities by the severity of the disability … | 0/23 (0)
Number of children with disabilities by AT | 0/10 (0)
Accidents at work | 0/10 (0)
Occupational diseases confirmed | 1/11 (9%)
National Blood Donor Center Statistics | 0/4 (0)
Register of licensed pharmaceutical companies | 0/38 (0)
Medicines consumption statistics | 2/8 (25%)
Medicinal Product Register of Latvia | 1/41 (2%)
Food supplements register | 4/35 (11%)
Dietary foodstuffs register | 4/22 (18%)
Veterinary medicinal product register | 0/26 (0)
In 5 out of 8 cases it could be solved by
mechanisms that control the list of
permissible values.
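A controlled list of permissible values can also be used to detect multiple notations after the fact. The sketch below is an illustration under assumptions: the `CANONICAL` mapping, the `normalise` rule (case folding, dropping full stops, collapsing spaces) and the sample values are all hypothetical, not part of the presented approach.

```python
from typing import Dict, Iterable, Set


def normalise(value: str) -> str:
    """Case-fold, drop full stops and collapse whitespace before lookup."""
    return " ".join(value.casefold().replace(".", "").split())


# Hypothetical controlled vocabulary: normalised notation -> canonical name.
CANONICAL: Dict[str, str] = {
    "usa": "United States",
    "united states": "United States",
    "united states of america": "United States",
}


def find_multiple_notations(values: Iterable[str], canonical: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group raw values by canonical name; keep groups with more than one notation."""
    groups: Dict[str, Set[str]] = {}
    for value in values:
        key = canonical.get(normalise(value))
        if key is not None:
            groups.setdefault(key, set()).add(value)
    return {key: found for key, found in groups.items() if len(found) > 1}


# Made-up parameter values, for illustration only.
country_values = ["USA", "United States", "U.S.A.", "Latvia", "United States of America"]
duplicates = find_multiple_notations(country_values, CANONICAL)
print(duplicates)
```

Values that are not in the vocabulary at all (here "Latvia") are ignored by this function; they would be caught by a separate permissible-values check.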
 Despite the importance of data quality, the quality of open data is not always one of the main
areas of analysis and evaluation of open data.
 Open health(care) data have a high number of different data quality problems; however, data
publishers (who provide data used in their IS) are probably not even aware of them. The most
frequent are:
✘ contextual data quality issues;
✘ empty values even for primary data;
✘ multiple notations for the same object within one data object and even one parameter;
✘ issues in interrelated parameters.
RESULTS I
 Such an analysis and the use of a data object-driven approach to data quality evaluation can be applied
not only to open health(care) data but also to other structured and semi-structured data; this solution
is effective in many domains.
 The advantages of the approach used:
 it can be applied to “third-party” data sets without any information on how data were accrued and processed: it is an
external mechanism with a higher level of abstraction,
 it can be used even by users without IT and DQ knowledge.
 The use of open data brings significant benefits to data providers: because of the large number of possible use cases,
data users address various challenges that can rarely be solved by data providers alone.
This can improve data quality not only at the national level, but also at the international level.
RESULTS II
THANK YOU!
For more information, see ResearchGate
See also anastasijanikiforova.com
For questions or any other queries, contact me via email - Anastasija.Nikiforova@lu.lv
Article: Nikiforova, A. (2019). Analysis of open health data quality using data object-driven approach to data
quality evaluation: insights from a Latvian context. In IADIS International Conference e-Health (pp. 119-126).

More Related Content

What's hot

Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...
Invited talk "Open Data as a driver of Society 5.0: how you and your scientif...Anastasija Nikiforova
 
A Survey on Big Data Analytics: Challenges
A Survey on Big Data Analytics: ChallengesA Survey on Big Data Analytics: Challenges
A Survey on Big Data Analytics: ChallengesDr. Amarjeet Singh
 
Augmenting Open Government Data with Social Media Data
Augmenting Open Government Data with Social Media DataAugmenting Open Government Data with Social Media Data
Augmenting Open Government Data with Social Media DataEvangelos Kalampokis
 
Open Government Data - updates from around the world
Open Government Data - updates from around the worldOpen Government Data - updates from around the world
Open Government Data - updates from around the worldAndrew Stott
 
NIC Linked Data: the OHIO project
NIC Linked Data:   the OHIO projectNIC Linked Data:   the OHIO project
NIC Linked Data: the OHIO projectMichael Wilkinson
 
A SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONA SURVEY OF LINK MINING AND ANOMALIES DETECTION
A SURVEY OF LINK MINING AND ANOMALIES DETECTIONIJDKP
 
Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Acclerating biomedical discovery with an internet of FAIR data and services -...
Acclerating biomedical discovery with an internet of FAIR data and services -...Michel Dumontier
 
Predicting Big Data Adoption in Companies With an Explanatory and Predictive ...
Predicting Big Data Adoption in Companies With an Explanatory and Predictive ...Predicting Big Data Adoption in Companies With an Explanatory and Predictive ...
Predicting Big Data Adoption in Companies With an Explanatory and Predictive ...eraser Juan José Calderón
 
#StopBigTechGoverningBigTech . More than 170 Civil Society Groups Worldwide O...
#StopBigTechGoverningBigTech . More than 170 Civil Society Groups Worldwide O...#StopBigTechGoverningBigTech . More than 170 Civil Society Groups Worldwide O...
#StopBigTechGoverningBigTech . More than 170 Civil Society Groups Worldwide O...eraser Juan José Calderón
 
Risks, Harms and Benefits Assessment Tool (Updated as of Jan 2019)
Risks, Harms and Benefits Assessment Tool (Updated as of Jan 2019)Risks, Harms and Benefits Assessment Tool (Updated as of Jan 2019)
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 

Analysis of open health data quality using data object-driven approach to data quality evaluation: insights from a Latvian context

  • 1. ANALYSIS OF OPEN HEALTH DATA QUALITY USING DATA OBJECT-DRIVEN APPROACH TO DATA QUALITY EVALUATION: INSIGHTS FROM A LATVIAN CONTEXT 13th Multi Conference on Computer Science and Information Systems 11th International Conference on e-Health 17 – 19 July 2019, Porto, Portugal Anastasija Nikiforova Faculty of Computing, University of Latvia Anastasija.Nikiforova@lu.lv
  • 2. OPEN DATA
    (The New York Times, The Economist, WIRED)
    Def. I: «Open data» are data that anyone can access, use and share.
    - The popularity of open data is continuously increasing.
    - For instance, the European Data Portal collects more than 800 thousand data sets.
    - The aggregate economic impact of applications based on open data across the EU27 economy is estimated at €140 billion annually.
    - The McKinsey Global Institute report estimated that open data could add over $3 trillion annually in total value to the global economy.
    Open Government Data (OGD) contributes to:
    - economic growth,
    - improved government services,
    - reduced fraud,
    - reduced waste.
  • 3. A number of studies indicate the existence of data quality problems in open data: Ferney et al., 2017; Kerr et al., 2007; Kuk and Davies, 2011; Martin, 2014; Nikiforova, 2018a, 2018b; Nikiforova and Bicevskis, 2019; Vetrò et al., 2016; etc.
    8 PRINCIPLES OF OPEN DATA (Sunlight Foundation, 2007), (European Data Portal, 2018)
    Open data must be: 1. complete, 2. timely, 3. primary, 4. accessible, 5. non-discriminatory, 6. licence-free, 7. machine-processable, 8. non-proprietary.
    And what about data quality?
    - OGD: the quality aspect takes only 4th place by popularity, after policy, benefit and risk, although quality can affect these aspects. (Klein et al., 2018)
    - Data quality appears to be one of the most problematic dimensions for open data portals.
    Def. II: «Quality» is a desirable goal to be achieved through management of the production process.
    Def. III: «Data quality» is a relative concept, largely dependent on specific requirements resulting from the data use.
  • 4. THE STATE OF OPEN DATA IN LATVIA
    Latvia:
    - is one of 70 countries participating in the Open Government Partnership, an international platform for domestic reformers committed to making their governments more open, accountable, and responsive to citizens;
    - is a fast-tracker (among beginners, followers, fast-trackers and trend-setters) according to the Open Data Maturity report;
    - has the highest rate of open data maturity compared with its neighbours among the Baltic States and Scandinavian countries.
    In 2017 the Latvian Ministry of Environmental Protection and Regional Development launched the new Latvian Open Data Portal:
    - at the moment of its launch: 33 data sets from 13 data publishers;
    - in July 2018: 139 data sets from 41 publishers;
    - in June 2019: 228 data sets from 62 publishers.
    Open data maturity ranking of the Latvian open data portal: 31st in 2016, 20th in 2017, 12th in 2018. For the quality aspect: 11th place, with just 370 out of 520 points.
    Quality is the weakest aspect for Latvia among impact, policy, portal and quality (only 62%, while the average is 71%), compared with the average rate for all analysed countries.
  • 5. OPEN HEALTH(CARE) DATA I
    - The aims and possible uses of open health(care) data can be very different, since health data and information have a large number of possible applications, uses and users.
    - The volume of health(care) data is continuously increasing and is expected to grow dramatically in the years ahead.
    - Open health(care) data is one of the most popular categories of open data. (Cabitza and Batini, 2016)
    Health and healthcare data are very broad concepts*; this research focuses on one subdomain: open health data.
    *Def. IV: «Health care data» are items of knowledge about an individual patient or a group of patients.
    *Def. V: «Health data» are any representation of facts related to the health of single individuals or entire populations that is suitable for communication, interpretation or processing by manual or electronic means.
    (World Health Organization, 2003), (Abdelhak M, Grostick S, Hanken MA, 2012)
    Healthcare is characterized by highly complex labor- and skill-intensive services where the actors involved still rely primarily on paper tools, their own cognition (competencies and memory), and other traditional methods. (Cabitza & Batini, 2016)
    HUMAN FACTOR!!!
  • 6. OPEN HEALTH(CARE) DATA II
    - Between 56% and 79% of Internet users seek health information online: - 35%, - 42%, with the lowest proportion in the Southern countries: - 30%, - 23%. (Andreassen et al., 2007)
    - Open health(care) data must be of high quality, as they:
        - are needed for health(care) planning and administrative purposes:
            • form the basis for the health and medicines authority's hospital statistics or health economic calculations,
            • provide authorities with data to support hospital planning,
            • monitor the frequency of various diseases and treatments,
            • provide a sampling frame for medical research,
            • facilitate quality assurance of health(care) services,
            • etc.
        - can be useful when searching for data on medications, their dose, contraindications and other information available to a wide audience.
    Studies discussing the quality of health(care) data in many countries come to one conclusion: health(care) data have data quality problems.
  • 7. OPEN [HEALTH] DATA QUALITY
    !!! The same data may be of sufficient quality in one case BUT completely useless under other circumstances.
    Assumption: as the level of detail of "open" data might be lower than that of "closed" data stored in databases, quality checks can be simpler.
    Open data are usually used by a wide audience that may not have deep knowledge of IT or data quality, so a solution should be simple enough to let such users take part in the analysis of «third-party» open data for their own purposes.
    Solution: the previously proposed user-oriented data object-driven approach (Bicevskis, Bicevska, Nikiforova, Oditis, 2018), (Nikiforova, 2019)
  • 8. RELATED RESEARCH
    General studies on data and information quality define different dimensions of quality and their groupings.
    ✘ The key data quality dimensions are not universally agreed*;
    ✘ There is no agreement on their meanings and usability**;
    ✘ Each dimension can be supplied with one or more metrics, which vary from one solution to another;
    ✘ The number of data quality dimensions, their definitions and their grouping are often useful only for a particular solution.
    Example I (Kerr et al., 2007): New Zealand's healthcare data: 6 data quality dimensions, 24 characteristics, 69 data quality criteria.
    Example II (Dahbi et al., 2018; Weiskopf et al., 2013): 2 data quality dimensions: accuracy and completeness.
    Question: how to relate a particular dimension (and which one?) to a particular use case?
    Problem: the necessity to involve data quality experts at every stage of the data quality analysis process.
    Solution: a data object-driven approach to data quality evaluation. (Bicevskis, Bicevska, Nikiforova, Oditis, 2018), (Nikiforova, Bicevskis, 2019)
    * «… This state of affairs has led to much confusion within the data quality community and is even more bewildering for those who are new to the discipline and more importantly to business stakeholders…» (DAMA UK, 2018)
    ** In different proposals, dimensions of the same name can have different semantics and vice versa. (Batini, 2016)
  • 9. MAIN PRINCIPLES OF THE PROPOSED SOLUTION
    TDQM data quality lifecycle: data quality definition, data quality measuring, data quality analysis, data quality improvement.
    - Each specific application can have its own specific DQ checks;
    - DQ requirements can be formulated on several levels: from informal text in natural language to an automatically executable model, SQL statements or program code;
    - DQ can be checked at various stages of data processing;
    - The DQ definition language is a graphical DSL:
        • the diagrams are easy to read, create, understand and edit even by non-IT and non-DQ experts;
        • syntax and semantics can be easily applied to any new IS.
  • 10. DATA QUALITY MODEL instead of dimensions
    1. DATA OBJECT (DO): the set of values of the parameters that characterize a real-life object
        - primary data object: the initial DO whose quality is analysed;
        - secondary data object: a DO that determines the context for the analysis of the primary DO.
        * Many objects of the same structure form a class of data objects.
    2. DATA QUALITY REQUIREMENTS: conditions that must be met for a data object to be considered of high quality.
        ** May contain informal or formalized implementation-independent descriptions of conditions.
    3. DATA QUALITY MEASURING PROCESS: procedures that should be performed to evaluate the data object's quality.
    !!! All three components are defined using a graphical domain specific language (DSL)**
    ** Three DSL families were developed as graphical languages based on the capabilities of the modelling platform DIMOD.
  • 11. DATA QUALITY ANALYSIS OF OPEN HEALTH(CARE) DATA
    - 15 data sets from 7 different data publishers;
    - 15 primary data objects and 11 secondary data objects were involved in the data quality analysis, applied to 35 parameters of the primary data objects;
    - The most frequent data quality issues:
        ✘ contextual data quality issues;
        ✘ empty values (completeness);
        ✘ multiple notations for the same object within one data object and even one parameter;
        ✘ issues in interrelated parameters.
    ✘ only 6 out of 15 data sets are updated as frequently as promised;
    ✘ only 8 out of 15 data sets are supplied with an explanation of their parameters;
    ✔ almost all available data sets are provided in a machine-readable format: the most popular open data format is .xlsx (53.3%), while 26.7% are .zip archives containing data sets in .xlsx and .csv format;
    ✘ 1 data set cannot be considered open data.
  • 12. DATA OBJECT
    Primary DO: Medicinal_Product
        product_id varchar
        original_name varchar
        pharmaceutical_form varchar
        marketing_authorisation_holder varchar
        exp_country_en varchar
        exp_country_lv varchar
        atc_code varchar
        authorisation_procedure enumerable {Eiropas centralizētā reģistrācijas procedūra, Nacionālā reģistrācijas procedūra, ...}
        summary_of_product_characteristics varchar - pattern
    Secondary DO: Country
        ISO3 varchar
        ISO2 varchar
        OfficialName varchar
        ShortName varchar
    Secondary DO: Country_LV
        Code (ISO-3166-1) varchar
        ShortName_LV varchar
        OfficialName_LV varchar
    Secondary DO: ATC
        ATC_code varchar
    - A data object is platform-independent.
    - Checking parameter values is a local and formal process.
    - The quality check for a DO parameter value is an examination of the properties of the individual value, e.g. whether:
        - a text string may serve as a value of the field Name,
        - the value of the field Address is a correct address.
    - It can be formulated at different levels of abstraction: from a formal language grammar to definitions of variables in programming languages.
  • 13. DATA QUALITY SPECIFICATION
    - Quality conditions are defined only for the primary data object.
    - DQ requirements are defined using logical expressions.
    - The names of DO attributes/fields serve as operands in the logical expressions.
    - Both syntactic and semantic data quality can be analysed according to unified principles.
    Diagram content (each check has OK and NO outcomes; SendMessage nodes are attached to the checks):
    Assess Field "product_id": checkValueExists(product_id)
    Assess Field "original_name": checkValueExists(original_name)
    Assess Field "pharmaceutical_form": checkValueExists(pharmaceutical_form)
    Assess Field "marketing_authorisation_holder": checkValueExists(marketing_authorisation_holder); checkMarketing_authorisation_holderName(Country, marketing_authorisation_holder)
    Assess Field "exp_country_en": checkValueExists(exp_country_en); checkExp_country_enName(Country, exp_country_en)
    Assess Field "exp_country_lv": checkValueExists(exp_country_lv); checkExp_country_lvName(Country_LV, exp_country_lv)
    Assess Field "atc_code": checkValueExists(atc_code); checkAtc_codeName(ATC, atc_code)
    Assess Field "authorisation_procedure": checkValueExists(authorisation_procedure); checkValueEnumerable(authorisation_procedure)
    Assess Field "summary_of_product_characteristics": checkValueExists(summary_of_product_characteristics); checkValueSummary_of_product_characteristics(Summary_of_product_characteristics, 'https://www.zva.gov.lv/zalu-registrs/attachments/pdf.php?id=%'+'&src=description')
    Secondary DOs, linked to the primary DO by informal rules: Country (ISO3, ISO2, OfficialName, ShortName), Country_LV (Code (ISO-3166-1), ShortName_LV, OfficialName_LV), ATC (ATC_code).
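The checks named in the specification above can be approximated in ordinary code. The following is only a rough Python sketch, not the author's DSL implementation: the function names mirror the slide's check names, but their bodies, the sample records, and the single-row Country data object are assumptions made for illustration.

```python
# Illustrative sketch only: bodies and sample data are assumptions, not the author's tooling.

# Values of the enumerable field that are visible on the slides (the full list is elided there).
ALLOWED_PROCEDURES = {
    "Eiropas centralizētā reģistrācijas procedūra",
    "Nacionālā reģistrācijas procedūra",
    "Decentralizētā reģistrācijas procedūra",
}

# Hypothetical one-row secondary data object "Country".
COUNTRY = [{"ISO3": "LVA", "ISO2": "LV", "OfficialName": "Republic of Latvia", "ShortName": "Latvia"}]


def check_value_exists(value):
    """checkValueExists: the value is neither missing nor blank."""
    return value is not None and str(value).strip() != ""


def check_value_enumerable(value, allowed):
    """checkValueEnumerable: the value belongs to the permitted list."""
    return value in allowed


def check_in_secondary(value, secondary_rows, fields):
    """Contextual check: the value matches one of the given fields of a secondary DO."""
    return any(row.get(field) == value for row in secondary_rows for field in fields)


def assess(record, countries):
    """Return a small test protocol: a list of (field, failed check) pairs."""
    failures = []
    for field in ("product_id", "original_name", "exp_country_en", "authorisation_procedure"):
        if not check_value_exists(record.get(field)):
            failures.append((field, "checkValueExists"))
    procedure = record.get("authorisation_procedure")
    if check_value_exists(procedure) and not check_value_enumerable(procedure, ALLOWED_PROCEDURES):
        failures.append(("authorisation_procedure", "checkValueEnumerable"))
    country = record.get("exp_country_en")
    if check_value_exists(country) and not check_in_secondary(
        country, countries, ("ISO3", "ShortName", "OfficialName")
    ):
        failures.append(("exp_country_en", "checkExp_country_enName"))
    return failures


good_record = {
    "product_id": "P1",
    "original_name": "Product A",
    "exp_country_en": "Latvia",
    "authorisation_procedure": "Nacionālā reģistrācijas procedūra",
}
bad_record = {
    "product_id": "P2",
    "original_name": None,
    "exp_country_en": "Latvija",
    "authorisation_procedure": "unknown",
}
```

Calling `assess(bad_record, COUNTRY)` would report the empty name, the non-permitted procedure and the unmatched country, which corresponds to the NO branches of the diagram.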
  • 14. DATA QUALITY MEASURING PROCESS
    The measuring process consists of:
    - the activities to be taken to select data object values from data sources;
    - one or more steps to evaluate the quality of the data, each of which describes one test of the data object's compliance with a specific quality specification;
    - the steps to improve data quality by automatically or manually triggering changes in the data source.
    For contextual checks, additionally: gather values of the secondary DOs from the data sources if the parameter indicating the secondary DO's value in scope of the defined quality condition is true:
    1. read/write operations from the data source into the database,
    2. connection of primary and secondary data objects via appropriate parameters.
    - The language describing the quality evaluation process involves verification activities for a particular DO that can be defined:
        - informally as natural language text,
        - using UML activity diagrams,
        - in its own DSL.
    - Additionally, processing instances of DO classes may require looping constructs, similar to the iterators used in C#.
  • 15. DATA QUALITY MEASURING PROCESS
    - A concrete DO or a class of DOs is used as input for a quality verification process.
    - The quality verification process creates a test protocol.
    In the case of SQL:
    - the SELECT statement specifies the target DO;
    - the WHERE clause specifies the quality requirements;
    - the JOIN clause links primary and secondary DOs.
    Diagram content (each check has OK and NO outcomes; SendMessage nodes are attached to the checks):
    Read data from data sources and write into DB "Medicinal_Product"
    Read data from data sources and write into DB "Country"
    Read data from data sources and write into DB "Country_LV"
    Read data from data sources and write into DB "ATC"
    Assess Field "product_id":
        SELECT * FROM [dbo].[Medicinal_product] WHERE [product_id] IS NULL
    Assess Field "original_name":
        SELECT * FROM [dbo].[Medicinal_product] WHERE [original_name] IS NULL
    Assess Field "pharmaceutical_form":
        SELECT * FROM [dbo].[Medicinal_product] WHERE [pharmaceutical_form] IS NULL
    Assess Field "marketing_authorisation_holder":
        SELECT * FROM [dbo].[Medicinal_product]
        LEFT JOIN [dbo].[country]
          ON [dbo].[country].[Short name] = (RIGHT(marketing_authorisation_holder, CHARINDEX(',', REVERSE(marketing_authorisation_holder)) - 2))
          OR [dbo].[country].[Official name] = (RIGHT(marketing_authorisation_holder, CHARINDEX(',', REVERSE(marketing_authorisation_holder)) - 2))
          OR [dbo].[country].[ISO3] = (RIGHT(marketing_authorisation_holder, CHARINDEX(',', REVERSE(marketing_authorisation_holder)) - 2))
        WHERE [dbo].[country].[Short name] IS NULL AND [dbo].[country].[Official name] IS NULL AND [dbo].[country].[ISO3] IS NULL
    Assess Field "exp_country_en":
        SELECT * FROM [dbo].[Medicinal_product]
        LEFT JOIN [dbo].[country]
          ON [dbo].[country].[Short name] = (exp_country_en)
          OR [dbo].[country].[Official name] = (exp_country_en)
          OR [dbo].[country].[ISO3] = (exp_country_en)
        WHERE [dbo].[country].[Short name] IS NULL AND [dbo].[country].[Official name] IS NULL AND [dbo].[country].[ISO3] IS NULL
    Assess Field "exp_country_lv":
        SELECT * FROM [dbo].[Medicinal_product]
        LEFT JOIN [dbo].[country_lv]
          ON [dbo].[country_lv].[Code (ISO-3166-1)] = (exp_country_lv)
          OR [dbo].[country_lv].[ShortName_LV] = (exp_country_lv)
          OR [dbo].[country_lv].[LongName_LV] = (exp_country_lv)
        WHERE [dbo].[country_lv].[Code (ISO-3166-1)] IS NULL AND [dbo].[country_lv].[ShortName_LV] IS NULL AND [dbo].[country_lv].[LongName_LV] IS NULL
    Assess Field "atc_code":
        SELECT product_id,
               REPLACE(SUBSTRING(atc_code, CHARINDEX(';', atc_code), LEN(atc_code)), ';', '') AS atc1,
               LEFT(atc_code, CHARINDEX(';', atc_code) - 1) AS atc2
        INTO #atc_divided
        FROM [dbo].[Medicinal_product]
        WHERE LEFT(atc_code, CHARINDEX(';', atc_code) - 0) NOT LIKE '';
        SELECT product_id FROM [dbo].[Medicinal_product]
        LEFT JOIN [dbo].[ATC] ON [dbo].[ATC].[ATC_code] = [dbo].[Medicinal_product].[atc_code]
        WHERE [dbo].[ATC].[ATC_code] IS NULL
        EXCEPT SELECT product_id FROM #atc_divided
    Assess Field "authorisation_procedure":
        SELECT * FROM [dbo].[Medicinal_product]
        WHERE authorisation_procedure IS NULL
           OR authorisation_procedure NOT LIKE 'Eiropas centralizētā reģistrācijas procedūra'
          AND authorisation_procedure NOT LIKE 'Nacionālā reģistrācijas procedūra'
          AND ...
          AND authorisation_procedure NOT LIKE 'Decentralizētā reģistrācijas procedūra'
    Assess Field "summary_of_product_characteristics":
        SELECT * FROM [dbo].[Medicinal_product]
        WHERE summary_of_product_characteristics IS NULL
           OR summary_of_product_characteristics NOT LIKE 'https://www.zva.gov.lv/zalu-registrs/attachments/pdf.php?id=%'+'&src=description'
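The two query shapes used on this slide (an `IS NULL` filter for completeness, and a `LEFT JOIN` followed by `IS NULL` on the joined table for contextual checks) can be tried out on a small in-memory database. The sketch below uses Python's built-in `sqlite3` module rather than the SQL Server dialect shown on the slide, and the table contents are made-up sample rows, so it illustrates the pattern rather than reproducing the author's environment.

```python
import sqlite3

# In-memory database with a tiny, made-up sample of the primary DO and one secondary DO.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE medicinal_product (product_id TEXT, original_name TEXT, exp_country_en TEXT);
    CREATE TABLE country (iso3 TEXT, short_name TEXT, official_name TEXT);
    INSERT INTO medicinal_product VALUES
        ('P1', 'Product A', 'Latvia'),
        ('P2', NULL,        'LVA'),
        ('P3', 'Product C', 'Latvija');
    INSERT INTO country VALUES ('LVA', 'Latvia', 'Republic of Latvia');
    """
)

# Completeness check: SELECT names the target DO, WHERE states the requirement being violated.
empty_names = [
    row[0]
    for row in conn.execute(
        "SELECT product_id FROM medicinal_product WHERE original_name IS NULL ORDER BY product_id"
    )
]

# Contextual check: rows of the primary DO with no matching row in the secondary DO.
unmatched_countries = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.product_id
        FROM medicinal_product AS p
        LEFT JOIN country AS c
          ON c.short_name = p.exp_country_en
          OR c.official_name = p.exp_country_en
          OR c.iso3 = p.exp_country_en
        WHERE c.iso3 IS NULL
        ORDER BY p.product_id
        """
    )
]
```

Here `empty_names` collects the records that fail the completeness requirement and `unmatched_countries` those whose country value is not found in the secondary data object; together they play the role of the test protocol.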
  • 16. APPROBATION. RESULTS
    Publisher | Dataset | Context issues/ context total | Empty/ Total | Multiple notation/ Total | Clean/ Total
    Centre for Disease Prevention and Control | Incidence of 2nd type diabetes in Latvia | - | 0/6 | 0/6 (0) | 6/6
    Ministry of Welfare | Distribution of persons receiving tech aid by AT | 2/2 (100%) | 3/7 (43%) | 0/7 (0) | 2/7
    Ministry of Welfare | Number of social service providers | 2/2 (100%) | 22/27 (82%) | 10/27 (37%) | 4/27
    Ministry of Welfare | Persons with disabilities by the severity of the disability and AT | 2/2 (100%) | 0/23 (0) | 0/23 (0) | 20/23
    Ministry of Welfare | Number of children with disabilities by AT | 2/2 (100%) | 0/10 (0) | 0/10 (0) | 8/10
    State labour inspectorate | Accidents at work | (0-1/1) (0-100%) | 1/10 (10%) | 0/10 (0) | 8/10
    State labour inspectorate | Occupational diseases confirmed | 4/5 (80%) | 2/11 (18%) | 1/11 (9%) | 9/11
    National Blood Donor Centre | National Blood Donor Center Statistics | - | 0/4 (0) | 0/4 (0) | 4/4
    State Agency of medicines | Register of licensed pharmaceutical companies | 1/2 (50%) | 17/38 (45%) | 0/38 (0) | 19/38
    State Agency of medicines | Medicines consumption statistics | 3/3 (100%) | 5/8 (63%) | 2/8 (25%) | 0/8
    State Agency of medicines | Medicinal Product Register of Latvia | 4/9 (44%) | 21/41 (51%) | 1/41 (2%) | 14/41
    Food and veterinary service | Food supplements register | 2/2 (100%) | 30/35 (86%) | 4/35 (11%) | 5/35
    Food and veterinary service | Dietary foodstuffs register | 2/2 (100%) | 19/22 (87%) | 4/22 (18%) | 3/22
  • 17. DATA QUALITY ANALYSIS OF OPEN HEALTH(CARE) DATA: CONTEXTUAL ISSUES
    - Only 1 data set out of 12 (8.3%) did not have any data quality issues ("Accidents at work"); however, some manipulations were needed to achieve this result.
    - In total, 25 out of 35 parameters (71.4%) had at least a few data quality issues.
    Example I: data set "Accidents at work", value «88.3332-03» = data set «Work codes», value I "8332" AND value II "03".
    Example II: 4 data sets published by the Ministry of Welfare:
    - the [ATTU code] and [City, county] parameters are supposed to store the code of the administrative territory and the city, which must correspond to the secondary data object "Classification of Administrative Territories and Territorial Units";
    ✘ 3 values are invalid, i.e. not available in the secondary data set: "Total", "Abroad" and "Address isn't specified".
    - Possibly the data publisher is aware of this, as these values make sense; BUT!!!
    !!! This data quality problem can easily go unnoticed and can lead to inaccurate data analysis results.
  • 18. DATA QUALITY ANALYSIS OF OPEN HEALTH(CARE) DATA: CONTEXTUAL ISSUES
    Example I: "Number of social service providers" data set, 3 parameters: [Service with accommodation], [Service without accommodation] and [Service with and without accommodation].
    From the users' viewpoint:
        [Service with and without accommodation] = [Service with accommodation] + [Service without accommodation]
    BUT!!! For 95 records this assumption does not hold.
    Example II: "Number of children with disabilities by administrative territory" data set:
        [1# group] = [18-29 years 1# group] + [30-44 years 1# group] + … + [>=65 years 1# group];
        [2# group] = [18-29 years 2# group] + [30-44 years 2# group] + … + [>=65 years 2# group];
        [3# group] = [18-29 years 3# group] + [30-44 years 3# group] + … + [>=65 years 3# group]
    For 121 records this assumption does not hold.
    At least two explanations are possible (which of these options???):
    1) there are data quality problems;
    2) these fields are not interconnected, and the sum of the values of the first two parameters does not necessarily have to equal the value of the 3rd parameter.
    !!! Data publishers must provide a brief explanation of the parameters and of how the numerical data were obtained.
    Another problem, for 4 out of 15 data sets (26.7%): a different number of interrelated values, which may appear in different ways: (a) values in different languages, (b) ID number and name, (c) name and supplementary data such as type, country, phone number of representatives.
    Dataset | Context issues/ context total
    Incidence of 2nd type diabetes in Latvia | 0/0
    Distribution of persons receiving tech aid by AT | 2/2 (100%)
    Number of social service providers | 2/2 (100%)
    Persons with disabilities by the severity of the disability … | 2/2 (100%)
    Number of children with disabilities by AT | 2/2 (100%)
    Accidents at work | (0-1/1) (0-100%)
    Occupational diseases confirmed | 4/5 (80%)
    National Blood Donor Center Statistics | 0/0
    Register of licensed pharmaceutical companies | 1/2 (50%)
    Medicines consumption statistics | 3/3 (100%)
    Medicinal Product Register of Latvia | 4/9 (44%)
    Food supplements register | 2/2 (100%)
    Dietary foodstuffs register | 2/2 (100%)
    Veterinary medicinal product register | 1/3 (33%)
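The interrelated-parameter assumption from Example I (a total column equal to the sum of its parts) is straightforward to test mechanically. The Python sketch below is a hypothetical helper, not part of the presented approach; the field names are taken from the slide, while the three sample rows are invented for illustration.

```python
def find_sum_mismatches(rows, total_field, part_fields):
    """Return the indexes of rows where the total differs from the sum of its parts."""
    mismatches = []
    for index, row in enumerate(rows):
        values = [row.get(field) for field in (total_field, *part_fields)]
        if any(value is None for value in values):
            # Empty values are a completeness issue and are skipped here.
            continue
        if row[total_field] != sum(row[field] for field in part_fields):
            mismatches.append(index)
    return mismatches


WITH = "Service with accommodation"
WITHOUT = "Service without accommodation"
BOTH = "Service with and without accommodation"

# Made-up sample rows; the real data set is not reproduced here.
sample_rows = [
    {WITH: 2, WITHOUT: 3, BOTH: 5},     # consistent with the users' assumption
    {WITH: 1, WITHOUT: 1, BOTH: 3},     # violates the assumption
    {WITH: None, WITHOUT: 4, BOTH: 4},  # cannot be judged: one part is empty
]

sum_mismatches = find_sum_mismatches(sample_rows, BOTH, (WITH, WITHOUT))
```

A non-empty result only shows that the assumption is violated; as the slide notes, whether that is a data quality problem or a misunderstanding of the parameters depends on documentation from the data publisher.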
  • 19. DATA QUALITY ANALYSIS OF OPEN HEALTH(CARE) DATA: COMPLETENESS
    - For 136 out of 167 (81.4%) analysed parameters, at least one value was empty.
    - The number of empty values per parameter varies from 1 to all values of a given parameter.
    - Empty values make up 15% of the analysed data sets.
    - The problem of empty values appears even for the primary data of the data sets:
        - Example: "Dietary foodstuffs register" data set: ✘ 4 records do not have [Name] and [ProducerName].
    - This issue is almost "traditional" in many sectors and countries.
    - However, some studies demonstrate that a high level of data completeness can be achieved. (Schmidt et al., 2015) (Oliveira, 2016) (Wanner et al., 2018) (Tomic, 2015) (Yi, 2019) (Sigurdardottir, 2012) (Larsen, 2009)
    Dataset | Empty/ Total
    Incidence of 2nd type diabetes in Latvia | 0/6
    Distribution of persons receiving tech aid by AT | 3/7 (43%)
    Number of social service providers | 22/27 (82%)
    Persons with disabilities by the severity of the disability … | 0/23 (0)
    Number of children with disabilities by AT | 0/10 (0)
    Accidents at work | 1/10 (10%)
    Occupational diseases confirmed | 2/11 (18%)
    National Blood Donor Center Statistics | 0/4 (0)
    Register of licensed pharmaceutical companies | 17/38 (45%)
    Medicines consumption statistics | 5/8 (63%)
    Medicinal Product Register of Latvia | 21/41 (51%)
    Food supplements register | 30/35 (86%)
    Dietary foodstuffs register | 19/22 (87%)
    Veterinary medicinal product register | 16/26 (62%)
    NOTE: 28 of 136 detected empty values may not be considered quality issues; however, as there are no notes from the data publisher regarding their nullability, there is no certainty that they are unproblematic, since empty values may have different interpretations.
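A per-parameter count of empty values, of the kind summarised in the "Empty/ Total" column above, could be obtained along the following lines. This is a minimal Python sketch under stated assumptions: the helper name, the definition of "empty" (missing or blank) and the three sample records are all hypothetical, with only the [Name] and [ProducerName] parameter names borrowed from the slide.

```python
def count_empty_values(rows, fields):
    """Count missing or blank values per parameter; a numeric 0 is not treated as empty."""
    counts = {}
    for field in fields:
        counts[field] = sum(
            1
            for row in rows
            if row.get(field) is None or str(row.get(field)).strip() == ""
        )
    return counts


# Made-up sample records with three parameters.
sample_records = [
    {"Name": "Product A", "ProducerName": "Producer X", "Count": 0},
    {"Name": "", "ProducerName": None, "Count": 3},
    {"Name": "Product C", "ProducerName": "Producer Z", "Count": 5},
]
fields = ("Name", "ProducerName", "Count")

empty_counts = count_empty_values(sample_records, fields)
affected_parameters = [field for field in fields if empty_counts[field] > 0]
```

In this sample two of the three parameters have at least one empty value, which would be reported as 2/3 in the notation of the table.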
  • 20. DATA QUALITY ANALYSIS OF OPEN HEALTH(CARE) DATA: MULTIPLE NOTATIONS FOR A SINGLE OBJECT
    - Multiple notations for a single object within a single data set and even a single parameter: ✘ in 6 out of 15 data sets (40%), in 22 out of 167 parameters (13.2%).
    They may appear in different ways, such as:
    - a different name for one country, for instance (a) USA vs. United States vs. United States of America; (b) Northern Ireland vs. Republic of Ireland vs. Ireland; (c) Scotland vs. Scotland UK, etc.;
    - different patterns for one value, for instance a phone or registration number: with or without (1) a code or (2) a delimiter; the type of delimiter, etc.;
    - different notations indicating the absence of a value: NULL and '0'**. Do both NULL and '0' have the same meaning??? '0' can indicate a value that is equal to zero, while NULL can mean that the value is not known;
    - a different name for the type of preparation, ingredient or unit size, for instance (a) singular, (b) plural, (c) shortened form, (d) with a spelling mistake, etc.
    ** often called "heterogeneity"
    • This problem is also widespread in many sectors and countries, e.g. the OGD of the UK (Kuk and Davies, 2011).
    Dataset | Multiple notation/ Total
    Incidence of 2nd type diabetes in Latvia | 0/6 (0)
    Distribution of persons receiving tech aid by AT | 0/7 (0)
    Number of social service providers | 10/27 (37%)
    Persons with disabilities by the severity of the disability … | 0/23 (0)
    Number of children with disabilities by AT | 0/10 (0)
    Accidents at work | 0/10 (0)
    Occupational diseases confirmed | 1/11 (9%)
    National Blood Donor Center Statistics | 0/4 (0)
    Register of licensed pharmaceutical companies | 0/38 (0)
    Medicines consumption statistics | 2/8 (25%)
    Medicinal Product Register of Latvia | 1/41 (2%)
    Food supplements register | 4/35 (11%)
    Dietary foodstuffs register | 4/22 (18%)
    Veterinary medicinal product register | 0/26 (0)
    In 5 out of 8 cases the problem could be solved by mechanisms that control the list of permissible values.
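Some of the notation variants listed above (differences in letter case, spacing or delimiters) can be surfaced by grouping raw values under a normalised key. The Python sketch below is an assumed heuristic, not the method used in the study; the normalisation rule and the sample values are invented, and it would not catch synonyms such as "USA" vs. "United States", which need a controlled list of permissible values.

```python
import re
from collections import defaultdict


def normalise(value):
    """Collapse whitespace and common delimiters, then ignore letter case."""
    return re.sub(r"[\s\-./]+", " ", value).strip().casefold()


def find_multiple_notations(values):
    """Group raw values by normalised form and keep groups with more than one notation."""
    groups = defaultdict(set)
    for value in values:
        if value is None:
            # A missing value is a completeness question, not a notation variant.
            continue
        groups[normalise(value)].add(value)
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}


# Made-up sample values for a single parameter.
sample_values = ["Tablet", "tablet", "TABLET ", "Capsule", None]

notation_variants = find_multiple_notations(sample_values)
```

The result maps each normalised form to the distinct raw spellings found for it, which gives a reviewer a short list of candidates to merge or to add to a list of permissible values.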
  • 21. RESULTS I
    - Despite the importance of data quality, the quality of open data is not always one of the main areas of analysis and evaluation of open data.
    - Open health(care) data have a high number of different data quality problems; however, data publishers (who provide data used in their IS) are probably not even aware of them. The most frequent are:
        ✘ contextual data quality issues;
        ✘ empty values, even for primary data;
        ✘ multiple notations for the same object within one data object and even one parameter;
        ✘ issues in interrelated parameters.
  • 22. RESULTS II
    - Such an analysis, using a data object-driven approach to data quality evaluation, can be applied not only to open health(care) data but also to other structured and semi-structured data: this solution is effective in many domains.
    - The advantages of the approach used:
        - it can be applied to "third-party" data sets without any information on how the data were collected and processed, as it is an external mechanism with a higher level of abstraction,
        - it can be used even by users without IT and DQ knowledge.
    - The use of open data brings significant benefits to data providers: because of the large number of possible use cases, data users address various challenges that can rarely be solved by data providers alone. This can improve data quality not only at the national level, but also at the international level.
  • 23. THANK YOU!
    For more information, see ResearchGate. See also anastasijanikiforova.com.
    For questions or any other queries, contact me via email: Anastasija.Nikiforova@lu.lv
    Article: Nikiforova, A. (2019). Analysis of open health data quality using data object-driven approach to data quality evaluation: insights from a Latvian context. In IADIS International Conference e-Health (pp. 119-126).