This document discusses database virtualization and instant cloning technologies. It begins by outlining the challenges businesses face with growing databases and increasing demands for copies from developers, reporting teams, etc. It then covers three main parts:
1) Cloning technologies, including physical cloning, thin-provisioned cloning using file system snapshots, and database virtualization.
2) How these technologies can accelerate businesses by enabling faster development, testing, recovery and reporting.
3) Specific use cases like development acceleration through frequent, full clones; branching for rapid QA; recovery and testing capabilities; and enabling fast data refreshes for reporting.
2. Database Cloning Challenge
Businesses want data now.
Businesses don’t understand DBAs.
Databases getting bigger & harder to copy.
Developers want more copies.
Reporting wants more copies.
Everyone has storage constraints.
If you can’t satisfy the business
demands, your process is broken.
10. The Production ‘Wall’
The classic problem is that queries that
run fast on subsets hit the wall in
production.
Developers are unable to test against
all of the data.
12. Shared access = Poor Productivity
Developers and
testers get frustrated
Databases become stale
and unrepresentative
of production.
Requires complex
scheduling and
management
14. Physical Copies
Time consuming
Time to make copies, days to weeks
RMAN backup, archive logs, copy data over, recover
Meetings, days to weeks
Admins: System, Storage, Database, Network,
manager coordination
Space consuming
40 devs x 2.5TB production = 100TB
20 report DBs x 40 TB = 800TB
=> bottlenecks
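The space arithmetic above is easy to check. A minimal Python sketch, assuming decimal TB and that every physical copy is a full duplicate; the thin-clone function and its `changed_fraction` parameter are hypothetical, added only to contrast one shared base copy plus per-clone changes with full copies:

```python
def full_copy_storage_tb(copies: int, db_size_tb: float) -> float:
    """Storage when every copy is a complete physical duplicate."""
    return copies * db_size_tb


def thin_clone_storage_tb(copies: int, db_size_tb: float, changed_fraction: float) -> float:
    """Hypothetical thin-clone estimate: one shared base copy plus the
    blocks each clone has changed (changed_fraction is an assumption)."""
    return db_size_tb + copies * db_size_tb * changed_fraction


dev_tb = full_copy_storage_tb(40, 2.5)     # 40 devs x 2.5 TB production
report_tb = full_copy_storage_tb(20, 40)   # 20 report DBs x 40 TB
print(f"physical copies: dev {dev_tb} TB, reporting {report_tb} TB")

# Illustrative only: assume each dev clone changes 5% of its blocks.
print(f"thin clones (5% changed): dev {thin_clone_storage_tb(40, 2.5, 0.05)} TB")
```

The 5% figure is not from the deck; it only shows why storage for thin clones grows with the amount of changed data rather than with the number of copies.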
45. Incremental forever change collection
Database
Production
Instance
File system
Changes are collected
automatically forever
Data older than the retention
window is freed
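One way to picture incremental-forever collection with a retention window: keep one full base image, store only changed blocks after that, and fold anything older than the window into the base so superseded block versions are freed. The sketch below is a hypothetical in-memory model (the `ChangeLog` class, integer timestamps and dict-of-blocks layout are all assumptions), not how any product actually stores data:

```python
from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Hypothetical model: one full base image, then only changed blocks."""
    base: dict[int, bytes]              # block number -> contents at base_time
    base_time: int = 0
    changes: list[tuple[int, dict[int, bytes]]] = field(default_factory=list)

    def collect(self, t: int, changed_blocks: dict[int, bytes]) -> None:
        # Store only the blocks that changed; calls must arrive in time order.
        self.changes.append((t, dict(changed_blocks)))

    def image_at(self, t: int) -> dict[int, bytes]:
        # Rebuild the database as of time t: base image plus every change up to t.
        if t < self.base_time:
            raise ValueError("point in time is older than the retention window")
        image = dict(self.base)
        for ct, blocks in self.changes:
            if ct > t:
                break  # changes are kept in chronological order
            image.update(blocks)
        return image

    def purge(self, now: int, retention: int) -> None:
        # Fold changes older than the window into the base; the block versions
        # they overwrite are dropped, which is where the space is freed.
        cutoff = now - retention
        while self.changes and self.changes[0][0] <= cutoff:
            ct, blocks = self.changes.pop(0)
            self.base.update(blocks)
            self.base_time = ct


log = ChangeLog(base={0: b"a", 1: b"b"})
log.collect(10, {1: b"B"})
log.collect(20, {0: b"A"})
print(log.image_at(15))           # state as of time 15
log.purge(now=25, retention=10)   # the change at t=10 folds into the base
print(log.base_time, log.image_at(20))
```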
75. • Production vs Virtual
– invisible index on Prod
– Creating index on virtual
• Flashback vs Virtual
• Keep tests to compare
2 b) Upgrades, Patches, RAT, A/B
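Keeping test results lets you compare runs. A minimal sketch, assuming each run records elapsed seconds per query (the dict layout, the `compare_runs` name and the 10% threshold are hypothetical): run the same workload on a baseline clone and on a clone with the change (for example the new index), then flag queries whose median time got worse.

```python
from statistics import median


def compare_runs(baseline: dict[str, list[float]],
                 candidate: dict[str, list[float]],
                 threshold: float = 1.10) -> dict[str, float]:
    """Return {query: slowdown ratio} for queries whose median elapsed time
    on the candidate clone exceeds the baseline median by more than threshold."""
    regressions = {}
    for query, base_times in baseline.items():
        if query not in candidate:
            continue  # query was not run on the candidate clone
        ratio = median(candidate[query]) / median(base_times)
        if ratio > threshold:
            regressions[query] = round(ratio, 2)
    return regressions


# Clone A: baseline; clone B: same data plus the new index (illustrative timings).
run_a = {"q1": [1.0, 1.2, 1.1], "q2": [2.0, 2.0, 2.0]}
run_b = {"q1": [0.5, 0.6, 0.5], "q2": [3.0, 3.1, 2.9]}
print(compare_runs(run_a, run_b))
```

Medians are used so a single noisy run does not decide the comparison.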
85. Review: Use Cases
1. Development
a) Full, Fresh, Fast, Self Serve
b) Branching
c) Federated
2. Recovery, Testing:
a) Forensics
b) Testing: A/B, upgrade, patch
c) Recovery: logical, physical
3. Reporting
a) Fast refresh
b) Temporal Data
c) Confidence testing
Kyle Hailey. I work for a company called Delphix. We write software that enables companies to copy their databases in 2 minutes with almost no storage overhead. We accomplish that by taking one initial copy and sharing the duplicate blocks across all the clones.
Businesses want data now. Businesses don’t understand DBAs. Databases are bigger and harder to copy. Developers want more copies. Reporting wants more copies. Everyone has storage constraints. If you can’t satisfy the business demands, your process is broken.
What are these technologies, and what are their benefits and drawbacks? The technology is awesome and coming of age: Clonedb (3 presentations at OOW), SMU, OEM 12c DBaaS, 12c “clone” of pluggable databases.
Databases are large. Moving data around is hard work. Moving databases takes time, resources, equipment and experience. I spent ½ my time cloning. How much do you? How many of you copy databases? How much time do you spend on it?
----
1) 10 years in support: no backups; hospital companies still don’t back up dev, and often dev is the new prod.
2) Full-time DBA, half my time copying.
3) OEM wanted me to do full physical cloning.
4) As a consultant, I always wanted a database to play on.
Prod is critical for the business. Performance of prod is the top priority. Protect prod from load.
2 options to create enough copies
Xxx spends 50% of its time copying databases and has to subset because there is not enough storage. The subsetting process constantly needs fixing and modification. What happens when developers use subsets? -- ****** -----
StubHub (eBay) estimates that 20% of their production bugs arise from testing on subsets instead of full database copies.
Wait on eBay till next slide.
Example at eBay: 2 dozen developers share one massive copy of production. Example at DB: 3 development teams agree among themselves who is doing which testing this week, as some runs destroy other teams’ data. If the database is shared, it’s hard to get an opportunity to refresh, and the data gets old.
Only having enough equipment to support 2 or 3 environments causes massive delays. The State of Colorado has 100 projects but can support only 3. KLA-Tencor can only support 2 projects out of a dozen.
DB had databases that were not refreshed for 6+ months due to refresh time and size.
Slowdowns mean bottlenecks. These bottlenecks cause failures in IT projects. I’m into eliminating bottlenecks (whether it is wait events, tuning SQL or provisioning copies of databases).
When development asks for a database, it takes days or weeks.
At one customer, 90% of lost developer days were due to waiting for environment builds.
This happens for both dev and QA.
Tightening constraints on resources has a cascading effect on companies. The business doesn’t know or understand this DBA work. DBAs are often the hardest resource for IT to justify because they are invisible. DBAs are already being asked to do a tremendous amount. DBAs are often on call 24x7. DBAs are foundational.
Delays cause failures.*
*http://www.pcworld.com/article/246647/10_biggest_erp_software_failures_of_2011.html
Misguided attributing. Relax the constraints: http://martinfowler.com/bliki/NoDBA.html
The fastest query is the query not run.
Creating a thin clone on one LUN is easy. But how do you get it off the production filer? How do you bring in new changes from the source? How do you purge old changes?
Ask a customer: how long does it take to thin clone a database on NetApp? 2-5 days!? 2-4 hours if the DBA, sys admin and storage admin were in the same room.
The most commonly implemented thin cloning technology I see is NetApp.
http://partners.netapp.com/go/techontap/empower-dba.html?fmt=print
Create LUNs, aggregates, snapshots, clones. Mirroring file systems. Exporting file systems. Mounting file systems.
The technology has existed for 15+ years. Why hasn’t there been more adoption?
It requires expert storage admins, specialized equipment and scripting. 2-7 days, or 2-8 hours if everyone is together. CERN recently gave a presentation where they wrote almost 30,000 lines of code (13k lines & 15k lines of PHP).
Like the internet: the internet existed before the browser (FTP, bulletin boards, chat rooms, Gopher, telnet, etc.) but didn’t take off until the browser. Thin cloning didn’t take off until database virtualization.
Like the internet
The Delphix GUI is what Oracle Enterprise Manager would look like if Apple had designed it. I like it at Delphix. Frustrated. Steve and Larry gave awesome presentations. Steve Jobs and Ellison UI combined forces; now I have it.
Database virtualization is to the data tier what VMware is to the compute tier. On the compute tier, VMware allows the same hardware to be shared by multiple virtual machines. On the data tier, virtualization allows the same datafiles to be shared by multiple clones, allowing almost instantaneous creation of new copies of databases with almost no disk footprint.
In the physical world: 3 copies of a database
The software installs on any x86 hardware, uses any storage, and supports any Oracle OS.
Fast = fresh. Full = quality. Many = a jet pack on development.
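To make the sharing concrete, here is a minimal copy-on-write sketch, assuming a toy model where a database is a dict of numbered blocks (the `BaseCopy` and `ThinClone` names are hypothetical): every clone reads unchanged blocks from the one shared base copy and stores only the blocks it writes.

```python
class BaseCopy:
    """The single full copy whose blocks every clone shares (read-only)."""

    def __init__(self, blocks: dict[int, bytes]):
        self.blocks = dict(blocks)

    def clone(self) -> "ThinClone":
        return ThinClone(self)


class ThinClone:
    """A virtual copy: private storage holds only blocks this clone has written."""

    def __init__(self, base: BaseCopy):
        self.base = base
        self.private: dict[int, bytes] = {}

    def read(self, n: int) -> bytes:
        if n in self.private:
            return self.private[n]
        return self.base.blocks[n]   # unchanged block: served from the shared copy

    def write(self, n: int, data: bytes) -> None:
        self.private[n] = data       # copy-on-write: the base copy is never modified


base = BaseCopy({n: b"orig" for n in range(1000)})
clones = [base.clone() for _ in range(3)]
clones[0].write(7, b"new")
clones[0].write(8, b"new")

stored = len(base.blocks) + sum(len(c.private) for c in clones)
print(stored)  # 1000 shared blocks + 2 private blocks
```

Three clones cost 1,002 blocks here instead of 3,000, which is the "almost no disk footprint" point in miniature.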
Self Service
Source Control for the database data
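Branching applies the same copy-on-write idea one level up. A minimal sketch, assuming a toy layer model (the `Layer` and `Branch` names are hypothetical): branching freezes the current state as a shared parent layer, and both branches write into new, empty layers on top of it. As a result, a QA branch taken from development keeps the data exactly as it was at the branch point.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Layer:
    """Blocks written in this layer; anything else comes from the parent."""
    parent: Optional[Layer] = None
    blocks: dict = field(default_factory=dict)

    def read(self, n):
        layer = self
        while layer is not None:
            if n in layer.blocks:
                return layer.blocks[n]
            layer = layer.parent
        raise KeyError(n)


class Branch:
    def __init__(self, name: str, head: Layer):
        self.name = name
        self.head = head  # the only layer this branch writes to

    def write(self, n, data) -> None:
        self.head.blocks[n] = data

    def read(self, n):
        return self.head.read(n)

    def branch(self, name: str) -> Branch:
        # Freeze the current head: both branches continue on fresh layers
        # that share it as their parent.
        frozen = self.head
        self.head = Layer(parent=frozen)
        return Branch(name, Layer(parent=frozen))


dev = Branch("dev", Layer(blocks={0: "v1", 1: "v1"}))
qa = dev.branch("qa")          # QA branch taken from development
dev.write(0, "dev change")     # later dev work...
qa.write(1, "qa change")       # ...and destructive QA tests stay separate
print(dev.read(0), dev.read(1), qa.read(0), qa.read(1))
```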
Physically independent but logically correlated. Cloning multiple source databases at the same time can be a daunting task.
One example among our customers is Informatica, which had a project to integrate 6 databases into one central database. The project was estimated at 12 months, with much of that time coming from trying to orchestrate getting copies of the 6 databases at the same point in time. Like herding cats.
Informatica had a 12-month project to integrate 6 databases. After installing Delphix they did it in 6 months. I delivered this early. I generated more revenue. I freed up money and put it into innovation. Won an award with Ventana Research for this project.
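The hard part of federated cloning is getting every source to the same point in time. Suppose each source can be provisioned to any time inside its retention window, as in the incremental-forever collection slide. Picking a common point then reduces to intersecting those windows. A minimal sketch, assuming each window is known as an (oldest, newest) pair; the helper name, source names and timestamps are hypothetical:

```python
from datetime import datetime


def latest_common_point(windows: dict[str, tuple[datetime, datetime]]) -> datetime:
    """Each source can be provisioned to any time in [oldest, newest].
    Return the latest time that lies inside every source's window."""
    if not windows:
        raise ValueError("no source databases given")
    start = max(oldest for oldest, _ in windows.values())
    end = min(newest for _, newest in windows.values())
    if start > end:
        raise ValueError("retention windows do not overlap")
    return end


windows = {
    "crm": (datetime(2013, 1, 1), datetime(2013, 3, 1, 12, 0)),
    "billing": (datetime(2013, 1, 15), datetime(2013, 3, 1, 9, 30)),
    "orders": (datetime(2013, 2, 1), datetime(2013, 3, 1, 11, 0)),
}
print(latest_common_point(windows))  # every clone is provisioned to this time
```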
Production vs virtual: creating an invisible index on Prod; creating an index on virtual. Flashback vs virtual.
Multiple scripted dumps or RMAN backups are used to move data today. With application awareness, we only request changed blocks, dramatically reducing production loads by as much as 80%. We also eliminate the need for DBAs to manage custom scripts, which are expensive to maintain and support over time.
Com: DBA dropped the movie titles table; 8 hours to restore a backup.
PG: operator entered euros instead of US $.
Fid: Oracle bug caused logical corruption on Data Guard; it wouldn’t start.
HD: developer truncated a 232-million-row table and wanted it back.
Developers each get a copy
Fast, fresh, full, frequent
Self service
QA branch from development
Federated cloning made easy
Forensics
A/B testing
Recovery: logical and physical
• Development
– Provision and refresh: full, fresh, frequent (many)
– Source control for code, data control for the database
– Data version per release version
– Federated cloning
• QA
– Fork copies off to QA
– Fork copies back to Dev
– Instant replay: set up and run destructive tests
– Performance A/B
– Upgrade, patching
• Recovery
– Backup: 50 days in the size of 1 copy, continuous data protection (use the recent slide on backup schedules: full, incr, incr, incr, full)
– Restore: logical recovery on prod, logical recovery on Dev
• Debugging
– Debug on a clone instead of prod
– Debug on data at the time of a problem
– Validate physical integrity (test for physical corruption)
Change the mentality from “as few as possible” to “as many as accelerates the business”. Remember Jenga?
If every MB was an inch:
300,000 customers, 12 copies on average, 100 GB average size
300,000 x 12 x 100 GB = 360,000,000 GB (360 PB)
300,000 x 1 x 0.3 x 100 GB = 9,000,000 GB (9 PB)
Difference: 351 PB = 351,000,000,000 MB = 351,000,000,000 inches
1,191,290,000 feet to the moon; 132,000,000 feet around the earth
15,133,979,520 inches to the moon
HD: 720 TB down to 8 TB (create 19 x 36 TB VDBs)
Informatica: finished 2x faster
StubHub: 2x as many releases a year
KLA-Tencor: 5x as many projects
QA/Quality
StubHub: 20% fewer bugs in production; found a full table scan that would have been missed on subsets
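The same arithmetic as a short Python check. It assumes decimal units (1 PB = 1,000,000 GB, 1 GB = 1,000 MB) and takes the inches-to-the-moon figure from the notes as given:

```python
GB_PER_PB = 1_000_000
MB_PER_GB = 1_000
INCHES_TO_MOON = 15_133_979_520  # figure from the notes

customers, copies, avg_gb = 300_000, 12, 100
full_copies_gb = customers * copies * avg_gb   # 360,000,000 GB
virtual_gb = customers * 1 * 0.3 * avg_gb      # 9,000,000 GB
saved_pb = (full_copies_gb - virtual_gb) / GB_PER_PB

# "If every MB was an inch"
saved_inches = saved_pb * GB_PER_PB * MB_PER_GB
trips_to_moon = saved_inches / INCHES_TO_MOON
print(f"{saved_pb:.0f} PB saved = {saved_inches:,.0f} inches "
      f"= {trips_to_moon:.1f} trips to the moon")
```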
Moral of this story: instead of dragging behind the enormous amounts of infrastructure and bureaucracy required to provide database copies, using database virtualization eliminates the drag and provides power and acceleration to your company. Defining moment. Competitors. Services.
One Last Thing http://www.dadbm.com/wp-content/uploads/2013/01/12c_pluggable_database_vs_separate_database.png
250 PDBs x 200 GB = 50 TB. EMC sells 1 GB for $1,000. Dell sells 32 GB for $1,000. A terabyte of RAM on a Dell costs around $32,000; a terabyte of RAM on a VMAX 40k costs around $1,000,000.
http://www.emc.com/collateral/emcwsca/master-price-list.pdf These prices are on pages 897/898: the storage engine for a VMAX 40k with 256 GB RAM is around $393,000, and the storage engine for a VMAX 40k with 48 GB RAM is around $200,000. So, the cost of RAM here is 193,000 / 208 ≈ $928 a gigabyte. That seems like a good deal for EMC, as Dell sells 32 GB RAM DIMMs for just over $1,000. So, a terabyte of RAM on a Dell costs around $32,000, and a terabyte of RAM on a VMAX 40k costs around $1,000,000.
2) Most DBs have a buffer cache that is less than 0.5% (not 5%, 0.5%) of the datafile size.
Reduces storage. Relieves DBAs of repetitive tasks so they can focus on innovation. Accelerates development. Eliminates the bottleneck: more code, faster and of better quality.
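The per-gigabyte figures come from simple differences. A short check, assuming (as the note does) that the whole price gap between the two VMAX 40k engines is the extra RAM, that "just over $1,000" for a Dell 32 GB DIMM is rounded to $1,000, and that 1 TB = 1,024 GB:

```python
# List prices quoted in the note
vmax_engine_256gb = 393_000
vmax_engine_48gb = 200_000
dell_32gb_dimm = 1_000

# Attribute the whole price difference to the extra 208 GB of RAM.
vmax_per_gb = (vmax_engine_256gb - vmax_engine_48gb) / (256 - 48)
dell_per_gb = dell_32gb_dimm / 32

print(f"VMAX 40k: ${vmax_per_gb:,.0f}/GB, ${vmax_per_gb * 1024:,.0f}/TB")
print(f"Dell:     ${dell_per_gb:,.2f}/GB, ${dell_per_gb * 1024:,.0f}/TB")

# 250 pluggable databases of 200 GB each (decimal TB)
pdb_total_tb = 250 * 200 / 1_000
print(f"250 PDBs x 200 GB = {pdb_total_tb:.0f} TB")
```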