The document discusses how a data lake approach can help financial institutions address regulatory challenges more effectively than traditional ETL approaches. A data lake allows raw data to be ingested rapidly and indexed as needed for analysis, reducing preparation time. It also enables unified queries across all data sources and quick fusion of multiple sources. This significantly reduces operational complexity and costs while improving security, flexibility, and the ability to address evolving requirements. The data lake approach is well-suited for challenges involving streaming analytics, point-to-point data marts, or data-heavy ETL requirements. Booz Allen has successfully implemented this approach for government clients to prototype solutions around critical applications.
Data Lake-based Approaches to Regulatory-Driven Technology Challenges
How a Data Lake Approach Improves Accuracy and Cost
Effectiveness in the Extract, Transform, and Load Process
for Business and Regulatory Purposes
The concept of big data offers financial institutions an opportunity to build capabilities that both reduce costs
and produce better insight. In the area of regulatory compliance, the work required to prepare the organization
typically involves modifications to systems, processes, and data to allow Collection, Alignment, Aggregation, and
Analysis (CA3) to occur. For example, new rules, such as Dodd-Frank, over-the-counter (OTC) collateral, and
risk management requirements, rely on the same legal entity and customer data infrastructure that needs to
be upgraded for Anti-Money Laundering/Bank Secrecy Act, Sanctions, and Foreign Account Tax Compliance Act
(FATCA). Linking the data while limiting the modifications to the systems that underpin both the business and
compliance requirements improves performance for customer-facing platforms and regulatory compliance
systems alike.
The potential is real, but the volume, variety, and velocity of the data are growing so fast that they are outpacing the
ability of current tools to take full advantage of the data. Much of the problem lies in the need to extensively prepare
the data before it can be analyzed. In parallel, the technologies and techniques underpinning Big Data have
matured to the point where they can address the challenge. While early uses focused on deriving insights from
very large pools of unstructured data, recent deployments have harnessed multiple tools, including advanced data
management, pattern recognition, and adaptive analytics, to address large-scale, high-accuracy, low-latency CA3 of
diverse, dispersed data.
Applying Robust Financial Intelligence and
Analytics to Stay Ahead
The Extract, Transform, and Load Challenge
For the past 30 years, traditional approaches to sharing and transferring data have all involved some type of
Extract, Transform, and Load (ETL) capability that extracts information from one format (database, silo, file, etc.)
and transforms it into another data format. The process then loads the data into the target system for use in a
set of predetermined analyses. While these approaches to handling data have served some organizations well in
the past, they have some notable drawbacks, which become more significant as the volume, variety, and velocity
of the data expand.
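The three ETL stages described above can be illustrated with a minimal sketch. It assumes a hypothetical CSV export as the source and an in-memory SQLite table as the target; the field names (`trade_id`, `counterparty`, `notional`, `currency`) and the sample rows are invented for illustration and do not come from any real system.

```python
import csv
import io
import sqlite3

# Hypothetical source extract: a CSV export from one upstream system.
SOURCE_CSV = """trade_id,counterparty,notional,currency
T1,Acme Bank,1000000,USD
T2,Globex,250000,EUR
"""


def extract(text):
    """Extract: parse rows out of the source format."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: coerce each row into a target schema fixed in advance."""
    return [
        (r["trade_id"], r["counterparty"].upper(), float(r["notional"]), r["currency"])
        for r in rows
    ]


def load(conn, rows):
    """Load: write the transformed rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trades "
        "(trade_id TEXT PRIMARY KEY, counterparty TEXT, notional REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))

# The predetermined analysis that this target table was designed to serve.
total_usd = conn.execute(
    "SELECT SUM(notional) FROM trades WHERE currency = 'USD'"
).fetchone()[0]
row_count = conn.execute("SELECT COUNT(*) FROM trades").fetchone()[0]
```

Note that the target schema is decided before any question is asked of the data: a new report that needs a field the transform step discarded would require changing and rerunning the pipeline.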
First and foremost, the process is resource intensive and requires
investments in high-cost tools to access the data. For example, each
time a new regulation is issued that calls for a new type of analytically
derived report, banks must initiate a dedicated IT project, often focused
on solving the data ingest issue. This portfolio of projects results in a
very large number of data warehouses, each with its own ETL process.
Using these diverse data warehouses calls for the creation of customized
Point-to-Point (PtP) solutions. These PtP solutions can certainly meet
the short-term goal, but often fail to scale up to meet longer-term
organizational goals. As banks move into the era of big data, this PtP
approach becomes overly complex and difficult to manage.
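One way to see why the PtP approach becomes hard to manage is to count integrations. The sketch below is only an illustration under a stated worst-case assumption: every pair of systems needs its own custom link, whereas a shared repository needs one connection per system. Real estates are rarely fully connected, so the figures are an upper bound, and both function names are hypothetical.

```python
def point_to_point_links(n_systems):
    """Worst case (assumption): every pair of systems has its own custom link."""
    return n_systems * (n_systems - 1) // 2


def shared_repository_links(n_systems):
    """Each system connects once to a single shared repository."""
    return n_systems


for n in (5, 20, 50):
    # Prints: system count, pairwise links, shared-repository links
    print(n, point_to_point_links(n), shared_repository_links(n))
```

Under this assumption the pairwise count grows quadratically with the number of systems, while the shared-repository count grows linearly.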
The Data Lake-based Approach
In stark contrast to the challenges presented by a point-to-point ETL approach, Booz Allen Hamilton, a leading
strategy and technology consulting firm, has found that a data lake-based approach to CA3 requirements is
scalable, extensible, and improves the range and sophistication of analyses that can be supported while providing
higher levels of data control and security.
A data lake-based approach takes advantage of the most recent developments in large-scale distributed
computing hardware/software to create an innovative way to ingest, index, and analyze massive amounts of
data in batch and real time that can scale to exabytes—without compromising integrity, cost-effectiveness,
or performance. The Data Lake Approach embeds business rules, often the result of policy and procedure
documentation for regulatory compliance, in the cell-level data, allowing alignment, aggregation, and analysis to
occur rapidly and with far less upfront work by IT departments. With the data lake, an organization’s repository of
information—including structured and unstructured data—is consolidated in a single, large “table.” Every inquiry
can use the entire body of information stored in the data lake—and it is all available at the same time.
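The idea of a single large table whose cells carry their own rules can be sketched as follows. This is a toy model, not a description of any particular product: the `Cell` structure, the tag names (`pii`, `compliance`), and the rule that a reader must hold every tag on a cell to see it are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    row_id: str
    column: str
    value: str
    tags: frozenset  # labels a reader must hold to see this cell (assumed rule)


# One table holds cells from different sources and of different shapes.
TABLE = [
    Cell("cust-1", "name", "Acme Bank", frozenset()),
    Cell("cust-1", "tax_id", "XX-0000001", frozenset({"pii"})),
    Cell("cust-1", "sanctions_hit", "false", frozenset({"compliance"})),
    Cell("trade-9", "counterparty", "cust-1", frozenset()),
]


def scan(table, reader_labels, row_id=None):
    """Return the cells visible to a reader, optionally restricted to one row."""
    labels = frozenset(reader_labels)
    return [
        c for c in table
        if c.tags <= labels and (row_id is None or c.row_id == row_id)
    ]


# A reader holding only the "compliance" label sees untagged cells and
# compliance-tagged cells, but not the cell tagged "pii".
analyst_view = scan(TABLE, {"compliance"}, row_id="cust-1")
visible_columns = [c.column for c in analyst_view]
```

Because the rule travels with each cell, the same scan serves every reader; no separate, pre-filtered copy of the data has to be built per audience.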
This approach, also referred to as “schema on read,” has five core features that can help banks address
increasingly demanding, constantly evolving regulatory requirements. In a data lake-based approach:
1. ETL is not done en masse prior to the analysis. Data is ingested
rapidly in “raw” form, and the indices and relationships to support
the analysis are derived, enriched, and overlaid as needed—or
even executed at the time of the analysis, reducing the time to
operationalize data.
2. Unified queries can be created quickly to allow access across all
information sources, reducing the time and complexity involved in
creating and federating queries across multiple databases.
3. Multiple data sources can be more quickly fused to enable a very
high degree of data agility to compose new reports that meet
emerging requirements (e.g., new regulations).
4. Operations and management (O&M) complexity is significantly
reduced, with a corresponding drop in O&M costs, while creating
the basis for improved security and data management posture.
5. The low-cost, streamlined ingestion process can be performed in near real-time, making the Data Lake
Approach a viable alternative for some requirements that would typically be addressed by implementing
Straight Through Processing platforms, at far less cost and disruption to the revenue-generating operations
of the bank.
[Figure 1. Traditional Point-to-Point Solution. Diagram labels: Transactions, ETL, Federated Query, Tailored Reporting Tools]
[Figure 2. Advanced Data Lake-based Approaches. Diagram labels: Transactions, Lightweight Security Tagging, Runtime Creation of Views]
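The first three features in the list above (raw ingestion, unified queries, and fusing sources) can be sketched in a few lines. This is a minimal in-memory illustration of schema on read, assuming two hypothetical sources, `trading` (JSON) and `kyc` (pipe-delimited); the record layouts, field names, and the `exposure_by_country` report are invented for the example.

```python
import json

# Raw records are stored exactly as received; nothing is transformed at ingest.
RAW_STORE = []


def ingest(source, payload):
    """Append the untouched payload, labeled only with its source."""
    RAW_STORE.append({"source": source, "raw": payload})


ingest("trading", '{"trade_id": "T1", "cpty": "C-100", "notional": "1000000"}')
ingest("trading", '{"trade_id": "T2", "cpty": "C-200", "notional": "250000"}')
ingest("kyc", "C-100|Acme Bank|US")
ingest("kyc", "C-200|Globex|DE")


# Schemas are applied at read time: one small parser per source.
def read_trading(raw):
    d = json.loads(raw)
    return {"customer_id": d["cpty"], "trade_id": d["trade_id"],
            "notional": float(d["notional"])}


def read_kyc(raw):
    customer_id, name, country = raw.split("|")
    return {"customer_id": customer_id, "name": name, "country": country}


READERS = {"trading": read_trading, "kyc": read_kyc}


def query(predicate):
    """Unified query: parse every raw record on read, then filter."""
    for rec in RAW_STORE:
        row = READERS[rec["source"]](rec["raw"])
        row["source"] = rec["source"]
        if predicate(row):
            yield row


def exposure_by_country():
    """Fuse the two sources at analysis time to compose a new report."""
    country = {r["customer_id"]: r["country"]
               for r in query(lambda r: r["source"] == "kyc")}
    totals = {}
    for t in query(lambda r: r["source"] == "trading"):
        key = country.get(t["customer_id"], "UNKNOWN")
        totals[key] = totals.get(key, 0.0) + t["notional"]
    return totals


report = exposure_by_country()
```

A new reporting requirement in this sketch means writing a new read-time function over the same raw store, rather than building another warehouse and another ETL pipeline. A production system would add indexing so that queries do not reparse every record.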
Putting the Data Lake to Work
With the Data Lake Approach, it now becomes practical—in terms of time, cost, and analytic ability—to turn big
data into a powerful tool to deal with escalating regulatory challenges while meeting business demands. We can
now ask more far-reaching and complex questions, and find the often hidden patterns and relationships that can
lead to game-changing knowledge and insight. The Data Lake Concept is particularly well suited for challenges
that have one or more of the following characteristics:
1. Streaming analytics are performed on large-scale data sets
2. PtP data mart solutions are involved
3. The ETL requirement is data heavy, not process heavy
While applying a big data approach to financial regulatory requirements may be innovative, it is not
experimental: Booz Allen has created data lake-based systems for more than a dozen government clients. Each
time we addressed a new class of problem (e.g., Homeland Security, Defense), we used a prototype approach to
build/test/tailor the Data Lake Approach. We are prepared to work with your leadership team in a similar manner
to introduce this capability.
To launch a prototype project, we work with clients to:
• Identify a small set of business and regulatory critical applications as the basis for the prototype—basically,
a subset of projects in process that can be executed quickly to yield results
• Define design requirements for information reporting to internal and external users
• Mirror a set of real-world scenarios to create an analytics platform (i.e., a data lake) that we will use to
demonstrate the schema on read process against the critical applications identified above
• Develop a results summary on multiple levels (speed, cost, accuracy) and test the data for internal validity
and defensibility
Booz Allen knows that a clean-sheet approach is not feasible; any viable solution approach must be able to deal
with a diverse base of legacy systems and select from the existing portfolio of regulatory IT project requirements.
While such conditions can be challenging, by creating an isolated, parallel analytics platform, we are able to
work with live data with no risk to the bank’s production systems.
About Booz Allen
Booz Allen Hamilton has been at the forefront of strategy and technology consulting for nearly a century.
Today, Booz Allen is a leading provider of management and technology consulting services to the US government
in defense, intelligence, and civil markets, and to major corporations, institutions, and not-for-profit organizations.
In the commercial sector, the firm focuses on leveraging its existing expertise for clients in the financial services,
healthcare, and energy markets, and to international clients in the Middle East. Booz Allen offers clients
deep functional knowledge spanning strategy and organization, engineering and operations, technology, and
analytics—which it combines with specialized expertise in clients’ mission and domain areas to help solve
their toughest problems.
The firm’s management consulting heritage is the basis for its unique collaborative culture and operating model,
enabling Booz Allen to anticipate needs and opportunities, rapidly deploy talent and resources, and deliver
enduring results. By combining a consultant’s problem-solving orientation with deep technical knowledge and
strong execution, Booz Allen helps clients achieve success in their most critical missions—as evidenced by
the firm’s many client relationships that span decades. Booz Allen helps shape thinking and prepare for future
developments in areas of national importance, including cybersecurity, homeland security, healthcare, and
information technology.
Booz Allen is headquartered in McLean, Virginia, employs approximately 25,000 people, and had revenue of
$5.86 billion for the 12 months ended March 31, 2012. For over a decade, Booz Allen’s high standing as a
business and an employer has been recognized by dozens of organizations and publications, including Fortune,
Working Mother, G.I. Jobs, and DiversityInc. More information is available at www.boozallen.com. (NYSE: BAH)
For more information, contact
Thomas Sanzone
Senior Vice President
sanzone_thomas@bah.com
917-305-8003
James Newfrock
Vice President
newfrock_jim@bah.com
917-305-8037
Joshua Sullivan
Vice President
sullivan_joshua@bah.com
301-543-4611
Albert Belman
Principal
belman_albert@bah.com
917-305-8002
Michael Delurey
Principal
delurey_mike@bah.com
703-902-6858