Better Data Leads to Better Analytics:
Three Ways to Improve Healthcare Data Quality
in an EDW
Written by
Jason B. Buskirk
Chief Operating Officer
Health Care DataWorks
Too often, organizations embark on Enterprise Data Warehouse (EDW) projects with the notion
that all their data needs will be met once the implementation is complete. It is understandable
why this thinking becomes pervasive throughout the organization. Typically, organizations have
decided to take on such projects after lengthy and time-intensive meetings, presentations
and reviews to bring together the myriad interests of their key stakeholders, followed by the due
diligence necessary to secure the funding and select the technology partner. Expectations begin
to run very high.
While an EDW undoubtedly will empower organizations to do more with their data than ever before
and the investment will pay dividends in terms of the value it brings, an EDW is only as good as the
data that is fed into it. Every organization will encounter data quality issues during or leading up to
EDW implementation, and these issues can negatively affect the timeline of the implementation. If
there are issues with data quality, the organization will find that, when it comes time to extract the
data, it will not be as useful as expected. It is important to discover and address data quality issues
as early as possible. Failing to do so is expensive, both in developers’ time and in the loss of
trust that can follow within the organization. Think of it this way: If you put bad data in,
you get bad data out, and the sooner you find the bad data, the better off your project will be. This
white paper details three ways to improve data quality in an EDW.
Establish realistic expectations
Improving data quality starts with understanding the data challenges and proactively
communicating and working with stakeholders to address potential pitfalls. Taking these steps
will contribute to a successful, cost-efficient and relatively smooth implementation that can
achieve results at a quicker pace.
It is important that everyone in the organization knows that the EDW will only be as effective
as the data that goes into it. This will help manage expectations and reduce potential frustration.
Everyone wants access to data that is relevant, understandable and, ultimately, results in
actionable knowledge. But the reality is that an organization will not know how bad its data is
until it begins profiling the data that is to be extracted.
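As a rough illustration of what profiling can look like in practice, the following minimal sketch reports, for each field in an extract, the share of rows that are populated and how often each distinct value occurs. The function name `profile`, the field names and the sample rows are hypothetical and are not taken from any particular source system.

```python
from collections import Counter


def profile(records, fields):
    """Summarize each field: share of populated rows and counts of distinct values."""
    report = {}
    for field in fields:
        values = [row.get(field) for row in records]
        # Treat missing keys, None and empty strings as "not populated".
        populated = [v for v in values if v not in (None, "")]
        report[field] = {
            "fill_rate": len(populated) / len(values) if values else 0.0,
            "distinct": Counter(populated),
        }
    return report


# Hypothetical rows pulled from a source system before loading the EDW.
sample = [
    {"patient_id": "A1", "gender": "male", "zip": "43210"},
    {"patient_id": "A2", "gender": "2", "zip": ""},
    {"patient_id": "A3", "gender": None, "zip": "43215"},
    {"patient_id": "A4", "gender": "female"},
]

result = profile(sample, ["patient_id", "gender", "zip"])
for field, stats in result.items():
    print(field, stats["fill_rate"], dict(stats["distinct"]))
```

A low fill rate points to sparse data, and an unexpected mix of distinct values (such as "male" alongside "2") points to a standardization problem.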
Know the causes of data issues
Virtually none of the data issues encountered in a data warehouse implementation are
technological in nature – they are operational. These operational causes generally
fall into two broad categories:
• Data collection requirements. Organizations have multiple systems capturing and
storing their electronic medical records, financial records and human resource information.
But these systems tend to operate in silos, which often leads to inconsistency in whether and
when data is collected. Some systems may require data elements to be populated, while
others may not make them mandatory for data capture. This leads to sparse data sets that
could have limited usefulness in the future.
• Lack of standardization. Because myriad systems are in use and individual departments
can track data in different ways, problems with standardization often arise and take many
forms. For example, two units within a health system may track the same information – patient
gender. In one system, the information is input and categorized as “male” or “female.” In
the other system, gender is input as a “1” or “2.” Even though these issues can and
should be fixed during the extract process, the time needed to identify these issues and
decide how the data should be stored in the data warehouse is something the organization
needs to take into consideration when planning the data warehouse project.
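The gender example can be sketched as a small lookup applied during the extract process. This is only an illustration under stated assumptions: the system names `system_a` and `system_b`, the conformed codes "M", "F" and "U", and in particular the meaning assigned to "1" and "2" are all hypothetical, since which numeric code stands for which gender has to be confirmed with the people who own the source system.

```python
# Hypothetical mapping from (source system, raw value) to one conformed code.
GENDER_MAP = {
    ("system_a", "male"): "M",
    ("system_a", "female"): "F",
    ("system_b", "1"): "M",  # assumption: confirm the meaning of "1" with the source owners
    ("system_b", "2"): "F",  # assumption: confirm the meaning of "2" with the source owners
}
UNKNOWN = "U"  # conformed value for anything the mapping does not recognize


def conform_gender(source_system, raw_value):
    """Translate a source-specific gender value into the conformed code."""
    key = (source_system, str(raw_value).strip().lower())
    return GENDER_MAP.get(key, UNKNOWN)


print(conform_gender("system_a", " Male "))
print(conform_gender("system_b", 2))
print(conform_gender("system_b", None))
```

Routing unrecognized values to an explicit "unknown" code, rather than dropping or guessing them, keeps the problem visible so the governance group can decide how to handle it.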
Improve data quality
By taking the following steps before the implementation process begins, organizations can cleanse
and improve the quality of the data, positioning the organization for a successful enterprise data
warehouse project.
• Establish a governance body or data quality group to create consistent standards.
Most organizations do not have this in place prior to an EDW implementation. The body
or group should be composed of stakeholders who know which data is being collected,
how it is being categorized, how and where it is stored, and all the other details critical
to establishing an organization-wide standard. The goal should be to identify “bad” and
non-standardized data. Doing this sooner rather than later can ensure the most
cost-efficient implementation.
• Identify subject matter experts to play an ongoing role in the implementation
process. These should be individuals who understand the data and know how it can
be used. Make them part of the implementation team. They are valuable resources in
that they not only know the data, but also understand how existing operational systems
work. By including them on the team, you will identify data quality issues earlier in your
implementation. Their involvement will also help provide built-in credibility when it
comes time to go live.
It is also important that these subject matter experts be freed from enough of their
regular duties to devote the required attention to the implementation process. It is an
in-kind investment that is worthwhile because of the positive outcome that will result.
• Standardize your data model up front. Having a
data model up front will not only accelerate the data
warehouse’s implementation timeline but will also help
the organization with the data issues mentioned earlier
by connecting multiple and disparate source systems.
Remember, data elements will be captured
inconsistently by different operational systems. When
the data model is populated, it will have a place to store
each data element regardless of the source system.
Data quality rules can be implemented to populate the
data based on data availability in each source system.
In the example of gender mentioned earlier, the same
data elements may be stored using different data values.
Possessing and populating a robust data model will
force an organization to standardize these data
elements and serve as a blueprint for how these data
elements should be handled. In this example, the data
model will have a conformed dimension to standardize
gender values.
Organizations have two options for obtaining a data model: They can build their own data
model or buy one. Health Care DataWorks, for instance, offers a mature data model that
is proven over many years of effective use. Regardless of how the organization proceeds,
the data model needs to be in place up front in order for an organization to be ready for
the data quality issues that it should expect.
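One way to picture a data quality rule that populates the model "based on data availability in each source system" is an ordered list of sources per data element, where the first system that actually holds a value wins. The sketch below assumes hypothetical system names (`emr`, `billing`), hypothetical element names and a hypothetical `populate` helper; it is not a description of any vendor's data model.

```python
def populate(element_sources, source_rows):
    """Fill each target element from the first listed source that has a value."""
    target = {}
    for element, systems in element_sources.items():
        target[element] = None  # every element has a place, even if no source fills it
        for system in systems:
            value = source_rows.get(system, {}).get(element)
            if value not in (None, ""):
                target[element] = value
                break
    return target


# Hypothetical rules: which systems to consult for each element, in priority order.
rules = {
    "birth_date": ["emr", "billing"],
    "insurance_plan": ["billing", "emr"],
    "preferred_language": ["emr"],
}

# Hypothetical rows for one patient, keyed by source system.
rows = {
    "emr": {"birth_date": "", "insurance_plan": "PLAN-X"},
    "billing": {"birth_date": "1970-01-01"},
}

loaded = populate(rules, rows)
print(loaded)
```

Elements that no source can supply stay empty rather than being dropped, which makes gaps in data collection easy to report back to the governance group.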
Conclusion
Organizations can expect data quality challenges when undertaking an EDW implementation. But
when they understand the potential pitfalls, remain committed to improving the quality of data, and
involve their internal experts and users in the process, they will be well on the way to adding value
to the entire organization in the most cost-effective and timely manner.
About the Author
Jason Buskirk is responsible for managing the day-to-day operations of Health Care DataWorks
(HCD) and leading product strategies for the company's pre-built analytics applications. He is one
of the company's founders.
Prior to HCD, Buskirk worked for Deloitte Consulting, where he implemented analytic applications
built using Oracle's Business Intelligence Enterprise Edition. Buskirk also served as Manager of
the Information Warehouse and Research Information Systems at the Wexner Medical Center at
The Ohio State University.
Buskirk holds a bachelor's degree in computer information systems from DeVry University.
About Health Care DataWorks
Health Care DataWorks, Inc., a leading provider of business intelligence solutions, empowers
healthcare organizations to improve their quality of care and reduce costs. Through its pioneering
KnowledgeEdge™ product suite, including its enterprise data model, analytic dashboards,
applications, and reports, Health Care DataWorks delivers an Enterprise Data Warehouse
necessary for hospitals and health systems to effectively and efficiently gain deeper insights
into their operations. For more information, visit www.hcdataworks.com.