This document discusses the challenges of legacy data systems and provides recommendations for addressing them. It notes that customer expectations around data access and use are increasing due to digital transformation trends. Legacy data systems often have fragmented, disconnected data stored in multiple formats with limited documentation. The document recommends three basic requirements to unlock the value of legacy data: understand the data by finding and documenting it, manage the data by defining governance and access rules, and exploit the data by building new capabilities. It emphasizes that data needs clear ownership, documentation, understanding of usage, and well-defined architecture. The best approach is often to co-exist with legacy systems by implementing a common access layer instead of a risky, costly migration.
4. Introductions
David Jones
VP of Product Marketing, Nuxeo
@InstinctiveDave
Norman Wren
Former Technical and Operations Director, Santander
Digital Transformation: A Reality Check
5. Digital Transformation in Financial Services
Massive spend: an average of $42M in 2018, rising to $45M in 2019
Purpose: 66% spent on customer-facing innovations
Success? 88% of projects delayed, reduced in scope, or cancelled; 26% see digital transformation as an insurmountable task
Statistics courtesy of Couchbase
6. “Everyone who hears these words of mine, and doesn't do them
will be like a foolish man, who built his house on the sand. The rain
came down, the floods came, and the winds blew, and beat on that
house; and it fell—and great was its fall.”
— Matthew 7:24–27
9. Why is this important?
Customer expectation: customer in control; unlimited data access; always on, 24x7; real time; added-value services
Regulation: customer rights; data portability; open access to third parties; remediation of historic practices
Internal driver: exploit data assets
10. Legacy Data Challenges
Access
• Fragmented data; not real time; internal view; unstructured data; access limited
Data Quality and Integrity
• Multiple formats; limited documentation; knowledge gap
• Integrity within applications, not across
• Degradation over time
Risk
• Security and data leakage
• Obsolescence
• Compliance with regulation
Cost
• Cost of change; time to market
• Consolidation of data stores
• Architecture and technology compatibility
• BAU running costs
Performance
• No scalability
• Access limitations
• Poor schema design
11. Value: 3 Basic Requirements
1. Understand the Data: find data; document attributes and meaning; understand usage and context
2. Manage the Data: define architecture; set usage, access and security rules; build governance and ownership
3. Exploit the Data: organise around common business purposes; make accessible through a common access layer; use metadata to organise and add value; build new capabilities – data driven
12. Considerations
Archaeology: find and document data and how it is used
Architecture: define data architecture and principles
Data ecosystem: distributed data; define data usage (update, query, analytics)
Common layer: to bridge technologies
Scalability: cloud processing; distributed data
Ownership: governance and accountability
13. Key Points
Data as an asset needs:
• Clear ownership and accountability
• Knowledge and documentation (metadata)
• Clear understanding of usage and value
• Data architects and engineers
• Architectural readiness
• Capability
• Data strategy
14. Stick or Twist?
Do Nothing: obsolescence; security; maintenance; cost; risk
Big Bang Migration: cost; risk; data quality; integrity; business case
Co-existence / Common Layer: less risk; unlock assets; bridge old and new; create a data ecosystem for the future
15. Summary
Data knowledge and documentation are fundamental
• Hard work and time consuming
Clear target architecture addressing data
• Technology stack and data usage
Common layer to separate data from business processing
Avoid migration if possible
• Avoid the pitfall of access in situ
Define common business purposes
Build a road map
• Balance value, cost and risk
Invest in capability
Intro: Is this a problem or an opportunity? The aim is to show that management of data needs to be a priority and that doing nothing is not an option.
Data in itself does not add value, but combined with technology and customer and business insight it is a game changer.
Data is the only constant
Data has historically been the poor relation, but the world has changed: the way data is used, managed and controlled has changed fundamentally in the digital age.
This has significant consequences for traditional businesses. There are three critical events which have changed the way we understand and use data:
The first is customer experience and expectation.
Financial systems have traditionally been developed for the benefit of the institution, not the customer: the institution controlled access (e.g. branch hours) and presented data to fit internal, batch-oriented processes. Today customers control access: they expect services that are always on and always available, with real-time performance and updates, open access to their data, and added value from that data.
Secondly, regulators have changed the game. Customers have strong rights over their data. PSD2 champions data portability and opens access to third parties; the impact of this is significant and compliance is costly. GDPR highlighted the difficulty of finding data and redacting it. PPI also demonstrated the cost and difficulty of remediating customers, as data is often fragmented and difficult to use.
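To make the data-portability point concrete, here is a minimal sketch of assembling one customer's records, fragmented across several stores, into a single machine-readable export, the shape that PSD2/GDPR-style rights demand. All store and field names are invented for illustration.

```python
# Sketch only: three in-memory "stores" stand in for fragmented legacy systems.
import json

stores = {
    "accounts": [{"cust": "C1", "acct": "001", "balance": 250}],
    "addresses": [{"cust": "C1", "line1": "1 High St"}],
    "marketing": [{"cust": "C2", "optin": True}],
}

def export_customer(cust_id):
    """Gather every record held for one customer into one portable document."""
    return json.dumps({
        store: [r for r in records if r.get("cust") == cust_id]
        for store, records in stores.items()
    }, indent=2)

print(export_customer("C1"))
```

In a real estate the hard part is the lookup itself: without metadata saying which stores hold customer data, the comprehension above cannot even be written.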
Thirdly, new technologies such as big data, analytics, AI and machine learning can all bring significant advantage, but all require access to massive amounts of data and significant computing power. They also need new ways to access and process data, which highlights shortfalls in legacy architectures and technologies.
So, to make sense of legacy data, and to be able to exploit it for commercial advantage, meet customer expectations and remain regulatory compliant, there are some significant challenges to be overcome.
The first is around access to data: many systems limit access as they rely on batch updates, the data is designed for internal use, it is fragmented, and much of it is not accessible, including vast amounts of image and PDF information.
Data quality and data integrity issues become much more visible and have an instant impact: customers report issues on social media and the reputational damage is immediate. Presentation of data is hampered by multiple systems and formats, different rules and definitions, and limited knowledge and documentation within the organisation and third-party suppliers. Data integrity is generally maintained within a system, not across applications. Finally, data standards have changed, and some data degrades in quality over time.
With the opening up of data and systems, data-related risks are becoming more prominent and more likely. Legacy systems and obsolescence contribute to data access breaches and data leakage, with severe consequences both financially and reputationally. As technology cycles shorten, applications and infrastructure become obsolete more quickly, and maintaining, updating and upgrading them becomes more costly. This is compounded by the regulatory agenda of compliance, data security and protection, and the question of obsolescence.
The costs of IT are rising: BAU running costs soar because access to legacy data is inefficient and MIPS are costly. Rebuilding the architecture to be scalable, real time and always on requires significant investment, and exploiting the data requires further investment.
Accessing legacy data in situ is inefficient and expensive. Performance suffers due to the lack of scalability and the tension between batch and real time, and ensuring integrity and security in the new world places heavy overheads on a traditional environment.
So the big question: how to avoid being caught in a costly problem?
Unfortunately there is no easy answer, except that doing nothing is not an option.
Three basic requirements
1. Build a detailed understanding of the data: where it is, what it means and how it is used. This is like archaeology: time consuming, hard work, and if you are not careful you may not find anything of value.
2. Focus on managing the data. Have a clear enterprise-wide architecture for data. Understand how it will be used, who can access it, and how it will be kept secure and up to date.
Implement clear ownership and governance around data. Treat it as an asset.
3. Be clear on the business purpose around exploitation and usage. Not all data is worth keeping or making available for everyone to access.
Define the data ecosystem and how to access it; make it scalable, performant and up to date. Add value: it is not enough to give access, you need to organise the data and add value by building metadata.
Build skills and capability in the organisation and become more data driven: data architects, data engineers, data administrators. Build new skills and approaches.
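The "organise around business purposes and add value with metadata" requirement above could be sketched as a simple catalog. This is a sketch, not any particular product; the class, dataset and attribute names are all hypothetical.

```python
# Sketch of a metadata catalog that organises datasets around common
# business purposes, with each dataset carrying an accountable owner.
from dataclasses import dataclass, field

@dataclass
class DatasetRecord:
    name: str            # physical name in the legacy store (illustrative)
    owner: str           # accountable business owner
    purpose: str         # the common business purpose it serves
    attributes: dict = field(default_factory=dict)  # attribute -> meaning

class Catalog:
    """A registry of dataset metadata, queryable by business purpose."""
    def __init__(self):
        self._records = []

    def register(self, record):
        self._records.append(record)

    def by_purpose(self, purpose):
        # Consumers ask for a business purpose, not a storage location.
        return [r for r in self._records if r.purpose == purpose]

catalog = Catalog()
catalog.register(DatasetRecord(
    name="CUST_MASTER", owner="Retail Banking", purpose="customer-view",
    attributes={"CUST_NO": "unique customer identifier"}))
catalog.register(DatasetRecord(
    name="ADDR_HIST", owner="Retail Banking", purpose="customer-view"))
print([r.name for r in catalog.by_purpose("customer-view")])
```

The design point is that consumers query the catalog by purpose, so ownership and meaning travel with the data rather than living in someone's head.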
Getting to the third level is challenging. Neither migrating all legacy data to new technology, architecture and infrastructure, nor dumping all the data into a massive data lake, is cheap, easy or without risk. Migrating data, extracting, transforming and loading into new environments, and/or maintaining co-existence all require major programmes and carry high risk, as recent high-profile examples have shown.
So, if we assume that doing nothing is not an option and we wish either to migrate or to use co-existence to buy time and reduce risk, as well as to allow islands of success, there are some considerations that apply regardless.
The first is that understanding what your data is, where it goes and how it is used is like archaeology, as frequently the knowledge is not available and documentation is limited, missing or out of date. No matter what route you go down, you need this information. There is some good news: there is an increasing range of tools to assist the discovery, definition and creation of metadata – but the effort is not to be underestimated.
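As a minimal illustration of what such discovery tooling does (a sketch, not any specific product), profiling a legacy table to draft a data dictionary might look like this. An in-memory SQLite table stands in for a legacy store, and the table and column names are invented.

```python
# Sketch of "data archaeology": profile a table and emit a draft data
# dictionary (null rates and distinct counts per column).
import sqlite3

def profile_table(conn, table):
    """Document each column of `table`: null rate and distinct values.
    Table/column names would come from a trusted catalog in practice."""
    cur = conn.execute(f"SELECT * FROM {table} LIMIT 0")
    columns = [d[0] for d in cur.description]
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    dictionary = {}
    for col in columns:
        nulls = conn.execute(
            f"SELECT COUNT(*) FROM {table} WHERE {col} IS NULL").fetchone()[0]
        distinct = conn.execute(
            f"SELECT COUNT(DISTINCT {col}) FROM {table}").fetchone()[0]
        dictionary[col] = {
            "null_rate": nulls / total if total else 0.0,
            "distinct": distinct,
        }
    return dictionary

# Demo against an in-memory database standing in for a legacy store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER, sort_code TEXT, notes TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)",
                 [(1, "12-34-56", None), (2, "12-34-56", "dormant")])
print(profile_table(conn, "accounts"))
```

Real discovery tools go much further (lineage, semantics, usage), but even this much mechanises the tedious first pass of the dig.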
Traditionally, architectures have focused on applications and infrastructure, but today it is critical that there is a defined data architecture and a set of data principles around definition, storage and usage.
Big data was all the rage, but given the scale of the data, the complexity of management and the different usage profiles, there is increasingly a need to define an ecosystem which allows new cloud technologies to be used and makes co-existence simpler. Distributed data, cloud-native processing and data management tools are critical.
Rules need to be defined around how data is used, and therefore how it is stored and utilised, as well as how integrity and quality are maintained in this environment. For example, transaction updates are not the highest volume of demand, customer inquiries are, and this needs to be thought about in the design. Similarly, processing massive amounts of data independently of the application cannot be allowed to impact the standard customer or business processes.
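The read/write point above can be sketched as routing high-volume inquiries to a query-optimised copy while updates hit the system of record. All class and store names are illustrative, and replication is simplified to a synchronous stand-in.

```python
# Sketch: separate the inquiry path from the transactional path so that
# query load never touches the core system.
class DataRouter:
    def __init__(self, system_of_record, read_store):
        self.sor = system_of_record      # legacy transactional store
        self.read_store = read_store     # replicated, query-optimised copy

    def update(self, key, value):
        # Updates go to the system of record; in practice an async
        # replication pipeline would refresh the read copy.
        self.sor[key] = value
        self.read_store[key] = value     # synchronous stand-in for replication

    def inquire(self, key):
        # High-volume customer inquiries are served from the copy.
        return self.read_store.get(key)

router = DataRouter(system_of_record={}, read_store={})
router.update("acct-001", {"balance": 250})
print(router.inquire("acct-001"))
```

The trade-off, as the talk notes, is maintaining integrity between the two paths: the replication lag and reconciliation rules must be part of the design, not an afterthought.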
In looking at the future, consideration should be given to creating a common layer which separates data from process and makes use of metadata. This is critical to a) access legacy data and b) reduce the risk from migration and rapidly changing technology cycles, allowing a staged migration approach.
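One way to picture such a common layer (a sketch under assumed names, not a definitive design) is a routing facade over adapters for legacy and modern stores, so consumers are insulated from where data currently lives and migration can be staged by changing routes only.

```python
# Sketch of a common access layer: one interface, multiple backing stores.
from abc import ABC, abstractmethod

class DataAdapter(ABC):
    @abstractmethod
    def fetch(self, entity_id: str) -> dict: ...

class LegacyMainframeAdapter(DataAdapter):
    def __init__(self, records):
        self.records = records           # stand-in for a legacy extract

    def fetch(self, entity_id):
        return self.records[entity_id]

class CloudStoreAdapter(DataAdapter):
    def __init__(self, records):
        self.records = records           # stand-in for a cloud-native store

    def fetch(self, entity_id):
        return self.records[entity_id]

class CommonAccessLayer:
    """Routes each entity to wherever it currently lives. As data is
    migrated, only this routing table changes, never the consumers."""
    def __init__(self):
        self.routes = {}

    def route(self, entity_id, adapter):
        self.routes[entity_id] = adapter

    def fetch(self, entity_id):
        return self.routes[entity_id].fetch(entity_id)

legacy = LegacyMainframeAdapter({"C1": {"name": "Ada"}})
cloud = CloudStoreAdapter({"C2": {"name": "Norman"}})
layer = CommonAccessLayer()
layer.route("C1", legacy)
layer.route("C2", cloud)
print(layer.fetch("C1"), layer.fetch("C2"))
```

Migrating customer C1 would then mean copying the record, repointing one route, and retiring the legacy adapter, a staged move rather than a big bang.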
With the explosion of demand from internal users, consumers and regulators, combined with the visibility of any failure, any architecture has to be scalable and on demand; there is also a need to move data to lower unit-cost storage.
Finally, until recently business ownership of data has not been well defined. As an asset, data needs to be managed, and so needs clear ownership and accountability as well as sound governance.
Developing on these points there are some fundamentals that need to be in place:
Business ownership of the data needs to be defined, with clear accountability for its maintenance, integrity and accuracy, and appropriate governance in place.
Investment needs to be in place to ensure that the data is understood, which requires documentation that is clear on definitions and the ability to maintain the metadata in a cost-effective manner.
Not all data needs the same attention, so being clear about what needs to be kept and made accessible, and having clarity over usage, are important.
There is a need to upgrade skills around data: investment in data architects and data engineers is a requirement.
Architecture definitions and rules need to be clear. Finally, a clear road map needs to be developed setting out how the data will be used, whether it will be left in place, co-exist or be migrated, and how and when this will happen.
So the big question: does one stick or twist?
Unfortunately there is no easy answer, except that doing nothing is not an option.
Most organisations look at two options:
The first option is to migrate all data to new technology, infrastructure and applications. There was a desire to put everything into a single data lake, but I think most organisations have found that to be too difficult. The challenges: the cost of managing a migration is significant because of the complexity of data compatibility, knowledge and quality, as well as the desire to use data differently.
Migration is a major programme in its own right and carries high risk, as the recent experience of TSB has shown. There can be issues because of missing data and changes of rules that may not be discovered until after migration. These migrations mean that other changes cannot occur, and as a result they demand significant management attention and time, which in itself carries a cost.
Most organisations are looking at a form of co-existence as a stepping stone to minimise the risk of a big-bang change. But this means organisations are still faced with the risks of obsolescence, security, access and BAU running costs.
So to summarise:
If you are to survive, meet consumer expectations, be compliant and get value from your data, you will need to:
Understand and document your data and make it available in a simple, understandable way. Build mechanisms to maintain the data definitions and usage.
Be clear what the architecture looks like for data, infrastructure and applications and how the new world will interface with the old
Have a road map to eat the elephant in manageable pieces
Build a common layer as a critical step on the route map.
Thank you