1) The document discusses the challenges of automating Hadoop cluster deployment in a large banking ecosystem. It describes the journey of adopting Hadoop across ING from initial exploratory uses to standardizing deployment.
2) Early efforts by exploration teams installed everything with Ansible but ran without a production system. Subsequently, the FileStore and Deep Data Hadoop patterns were implemented using the Nolio deployment tool, which proved GUI-bound, slow, and weak on version control.
3) Learning from private cloud experiences, the bank is evaluating Ansible as a standardized way to deploy packaged software. The conclusion advocates scriptable tools and owning the software stack, while noting that full automation might not be needed.
3. Automate Hadoop Cluster Deployment in a Banking Ecosystem
The Goal
Prelude: Hadoop Patterns in ING
Chapter 1: First Steps
Chapter 2: Standardizing
Chapter 3: The Cloud
Conclusion
Questions
4. The Goal
IN WHICH we look at the challenges that a bank has to face in the 21st
century, and how this translates into decisions made in the IT landscape.
5. Market leaders Benelux
Growth markets
Commercial Banking
Challengers
The world of ING – Data Driven Since 1881
Customers: 33 million private, corporate and institutional customers
Countries: 41, in Europe, Asia, Australia, North and South America
Employees: 52,000
7. We accelerate through the Concept of One
Provide standardized and easy to use
global capabilities and services
Accelerate strategy and
concentrate on business value
Concept of One
8. Prelude: Hadoop Patterns in ING
IN WHICH we describe the journey of some interesting characters that set out to get Hadoop adopted
within a large, venerable institution, and across the world.
9. Data Lake and Advanced Analytics within ING
External and internal reporting for
own or regulatory purposes
Integrate all data sources within the
bank into one processing platform
• Batch data streams
• Live transactions
• Model building for customer
interaction
Better understand customer
needs in an increasingly digital world
Data can help us offer
tailored products and services
Empower data scientists and analysts
to get the best results with advanced
analytics tools and predictive models
Open source software where possible
– Hadoop as a core component
10. 1. File Storage
2. Deep Data
3. Analytical Hadoop
4. Real Time
Hadoop Usage Patterns
11. Analytical Hadoop
• Our first use case
• Development and Production environments
• P environment has Production level security but Test level SLA
FileStore and Deep Data
• Completely automated
• Full DTAP street (Development, Test, Acceptance, Production)
Patterns and their maintenance
12. • Vendors give us tools to do a GUI based install
• Maintain several clusters in parallel, DTAP!
• Auditability!
• Not for us, we need to do automated installs
• APIs and scripting facilities do exist, but are often poorly tested and documented
Standard installation doesn’t cut it
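The requirements on this slide (several clusters in parallel, DTAP, auditability) are the kind of thing a script handles more reliably than a GUI. The following is only a minimal sketch, not the tooling used at ING: it renders one configuration per DTAP environment from shared defaults plus per-environment overrides, and computes a digest of each rendered configuration that could serve as an audit record. All setting names and values (`DEFAULTS`, `OVERRIDES`, `replication`, `datanodes`, `kerberos`) are hypothetical.

```python
import hashlib
import json

# Hypothetical shared defaults and per-environment overrides (illustrative only).
DEFAULTS = {"replication": 3, "kerberos": True, "datanodes": 4}
OVERRIDES = {
    "D": {"replication": 1, "datanodes": 2, "kerberos": False},
    "T": {"datanodes": 2},
    "A": {},
    "P": {"datanodes": 12},
}


def render(env):
    """Return the effective settings for one DTAP environment."""
    if env not in OVERRIDES:
        raise KeyError(f"unknown environment: {env}")
    # Later dicts win, so overrides replace defaults key by key.
    return {**DEFAULTS, **OVERRIDES[env], "environment": env}


def fingerprint(settings):
    """Stable SHA-256 digest of a rendered config, usable as an audit record."""
    canonical = json.dumps(settings, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


if __name__ == "__main__":
    for env in "DTAP":
        cfg = render(env)
        print(env, fingerprint(cfg)[:12], cfg)
```

Because the rendered configuration is plain data, the same digest can be recomputed later to check that an environment still matches what was approved.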
13. • Layers – IaaS, PaaS, Application (we want IaaS not PaaS)
• Organizational divide: Platform team vs. Infra team
• Different privileges
• Different tool choices
• Trust and collaboration need to be actively built
• Convince security audit teams!
Organizational challenge
14. Chapter 1: First Steps
IN WHICH a first expedition ventures into uncharted territory,
encounters strange monsters and reconsiders their equipment.
15. • First take by Exploration teams (Analytical Pattern)
• Unusual Ops mode: No Production system (although we use production data)
• Install everything with Ansible
• YAML based, ssh based access
• All text files. Easy to put in git and to document
• The Power of Root
• Great power and flexibility
• Risk people and GUI users do not like it
• You are on your own
• We tried to learn from this!
Tooling part 1
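The point about "all text files, easy to put in git" can be illustrated with a small sketch. Assuming hypothetical host and group names, the function below renders an INI-style Ansible inventory from a dictionary, sorting groups and hosts so that repeated runs produce identical text and git diffs stay small. It is an illustration of the idea, not the inventory layout used by the exploration teams.

```python
def render_inventory(groups):
    """Render an INI-style Ansible inventory as text.

    groups maps a group name to a list of host names. Output is sorted
    so the same input always yields byte-identical text.
    """
    lines = []
    for group in sorted(groups):
        lines.append(f"[{group}]")
        lines.extend(sorted(groups[group]))
        lines.append("")  # blank line between groups
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical cluster layout; host names are placeholders.
    cluster = {
        "workers": ["w2.example", "w1.example"],
        "masters": ["m1.example"],
    }
    print(render_inventory(cluster), end="")
```

The resulting file can be committed next to the playbooks, so a change in cluster layout shows up as an ordinary reviewed commit.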
16. Chapter 2: Standardizing
IN WHICH a larger party sets out with better equipment, reaches the
shores of a new world but finds that still, much is to be improved.
17. • Now we needed a Datalake integrated solution with full support
• Also need a full DTAP street
Infra team has legacy tooling (proprietary tools) but limited flexibility.
• Basically, we roll our specific configuration into homemade rpm packages.
Tool choice for application deployment: CA Lisa aka Nolio
• GUI based
• No version control (tagging added as an afterthought)
• Slow and awkward to use
• Dumbed down by organizational restrictions
Conclusion: Don’t go there!
Implementing the FileStore and DeepData patterns
18. • By then, we had a lot of structure to help us
• Standardized build server with GitBlit, Artifactory, Jenkins
• Agile Way of Working
• Now deployment is a split approach
• Infra parts use TEM (and Ambari blueprint) to deploy full Hadoop stack
• On top of the stack we deploy our own applications with Nolio
• Handovers CIO-Infra still hurt us
• We do have: Deployment on a given system at the press of a button
• We do have: Automatic propagation of Git changes into Artifactory via Jenkins
• We do not (yet) have: Automated propagation D->T->A->P via Jenkins
Implementing the FileStore and DeepData patterns
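The piece listed as not yet available, automated propagation D->T->A->P, essentially needs a gate that stops a version from skipping a stage. A minimal sketch of such a gate follows, under the assumption that each stage simply records which versions have been deployed to it; in practice this logic would live in Jenkins jobs rather than in a standalone script, and the names `STAGES`, `new_street`, `promote`, and `PromotionError` are hypothetical.

```python
STAGES = ["D", "T", "A", "P"]  # Development, Test, Acceptance, Production


class PromotionError(Exception):
    """Raised when a version would skip a stage of the DTAP street."""


def new_street():
    """Return an empty record of deployed versions per stage."""
    return {stage: set() for stage in STAGES}


def promote(street, version, target):
    """Record `version` as deployed on `target`.

    A version may enter D freely; any later stage requires that the
    same version is already recorded on the stage directly before it.
    """
    idx = STAGES.index(target)  # raises ValueError for an unknown stage
    if idx > 0 and version not in street[STAGES[idx - 1]]:
        raise PromotionError(
            f"{version} has not been deployed to {STAGES[idx - 1]} yet"
        )
    street[target].add(version)
    return street


if __name__ == "__main__":
    street = new_street()
    for stage in STAGES:
        promote(street, "1.4.2", stage)
    print({stage: sorted(versions) for stage, versions in street.items()})
```

Keeping the rule this small makes it easy to audit: the only way into Production is through Acceptance with the same version.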
19. Chapter 3: The Cloud
IN WHICH our heroes learn from the cloud experience and from explorers around the world,
and make deployment a safe experience for everyone.
20. • ING Private Cloud is essentially Datacenter v2.0
• However, we get the chance to rethink our tooling
• Puppet integrates nicely with RH Satellite and is used to provision PaaS solutions
• Ansible is gaining ground in the internal discussion
• External Ansible community: the Meetup has grown a lot over the last year and is now
larger than the Puppet and Chef meetups combined
• ING has an initiative to come up with a standardized way to deploy packaged software,
based on Ansible
The Cloud
22. • Be aware: Deployment of mostly prepackaged software is different from developing your
own software
• Full automation might not be needed because we do not change as quickly as, e.g.,
a mobile app
• Use tools that are scriptable
• GUIs suck
• Own your stack
Conclusion
24. • Crane Gears by Kevin Utting is licensed under CC BY 2.0
• Hellmar in Nîmes / With Python in Mindanao, by the author
• Domtoren in het oranje licht by helena_is_here is licensed under CC BY 2.0
Attributions