2. Kellyn Pot’Vin-Gorman
Technical Intelligence Manager for the Office of CTO,
Delphix
• Multi-platform DBA (Oracle, MSSQL, MySQL, Sybase, PostgreSQL…)
• Oracle ACE Director, (Alumni), Oak Table
• APEX Women in Technology Award, CTA
• STEM education with Raspberry Pi and Python
• Liaison for Denver SQL Server User Group
• President, Rocky Mountain Oracle User Group
• Author, blogger, (http://dbakevlar.com)
3. Management of Non-production Environments:
Virtualized sanity for the DBA Realist.
Provisioning: Patching, refreshing and if you
ask me one more time!
Cloud: Cloudy with a chance of failures.
Security: Yo Developer- Is that the SA password
taped to your monitor??
4. The Life of a DBA
Provision Databases
Refresh and provide data to reporting, testing
and development
Secure database environments
Optimize data access
Collaborate to solve business challenges
5. What is Copy Data Management (CDM)?
The management of all non-production databases.
Broad Term- Physical and virtual clones
Managed or unmanaged
Command line or User Interface, (or both)
Administrative, Infrastructure, security
6. Why Virtualize? The Scientific Reasons
• The Economics of Data: the natural life of a database is growth. It’s only going to get bigger.
• Von Neumann’s Bottleneck: computing speed is limited by where the data resides and how much data there is.
• Data Gravity: Dave McCrory coined this term for the gravitational pull of applications and services toward data.
These may be theories, and they may be viewed as technology challenges to be overcome another day, but physics is an important consideration in technology.
7. Why Virtualize? The Business Reasons
Storage costs: thin provisioning avoids allocating full-size copies
Data transfer costs: far less data transferred during provisioning/refresh operations than the volume moved by traditional cloning techniques
Simplifies provisioning vs. archaic processes to copy data
8. Virtualize and Deploy
[Diagram] A 1 TB production database/app tier is read into a storage pool for Delphix (~0.6 TB after compression), from which virtual QA, DEV, TEST and PATCH TEST environments are deployed; roughly 80% of environments are repeat data. Spin up a VIRTUAL database for a patch test without having to remove a current development or test one.
11. Data Virtualization: From Prod to Virtual
[Diagram] The source environment syncs to a validated sync environment, served over SCSI/SSL on any storage. Create as many VDBs as needed!
12. Data Virtualization: Space Savings
[Diagram] The Delphix Virtualization Engine holds one validated sync target, from which a dozen virtual database copies (VDBs) are provisioned.
13. Data Virtualization: How Is It Possible?
[Diagram] Virtual databases are served from the Delphix Virtualization Engine, on any storage.
14. • Uses any storage and only a fraction of the space
• Syncs with native or third-party SQL Server backups
• Can maintain two weeks of data changes
• Managed just like any SQL Server database
• Users can instantly provision a read/write virtual copy
of a database
• Can be used for replication, mirroring, change data
capture (CDC), and maintenance.
This is Data Version Control
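The space savings behind those virtual copies come down to simple arithmetic: one compressed golden copy plus only the unique changed blocks per copy. The sketch below uses illustrative assumptions (40% compression, ~10% unique blocks per copy), not published Delphix figures:

```python
# Back-of-the-envelope storage math: full clones vs. virtual copies.

def physical_footprint_tb(source_tb: float, copies: int) -> float:
    """Traditional full clones: every copy is a full-size copy."""
    return source_tb * copies

def virtual_footprint_tb(source_tb: float, copies: int,
                         compression: float = 0.4,
                         unique_pct: float = 0.1) -> float:
    """Compressed golden copy + per-copy unique blocks (assumed ratios)."""
    return source_tb * compression + copies * source_tb * unique_pct

print(physical_footprint_tb(1.0, 12))            # 12.0 TB of full clones
print(round(virtual_footprint_tb(1.0, 12), 2))   # 1.6 TB under these assumptions
```

Under these assumptions a dozen copies of a 1 TB source occupy roughly the space of one and a half copies, which matches the "dozen VDBs in about one copy's space" claim in spirit.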
17. Night-Time ETL/Maintenance Challenges
[Diagram] Finance, Dev, Test and QA environments compete with nightly ETL, BI reporting and DBCCs. Spin up a new VIRTUAL DB and SQL bin files, scripted out to run nightly jobs, maintenance, etc.
18. Epiphany
e·piph·a·ny /əˈpifənē/ noun
a (1): a usually sudden manifestation or perception of the essential nature or meaning of something (2): an intuitive grasp of reality through something (as an event) usually simple and striking (3): an illuminating discovery, realization, or disclosure
b: a revealing scene or moment
22. Patching and Upgrading Databases
Each patch must be applied to a development database, requiring an outage for development teams, and then tested before being applied to test, UAT and finally production.
This has to be performed for EACH environment, every SQL Server, each quarter.
[Diagram] Finance, HR and CRM databases, each with Prod, UAT, Test and Dev environments.
23. Risks/Challenges of This Approach
Downtime for valuable resources.
DBAs working after hours.
Each database must have it done, and the tedious task must be performed over and over again.
Little opportunity for advanced learning.
Each database may experience different bugs.
24. Environment Virtualization, DB Style
[Diagram] Spin up a new VIRTUAL DB and SQL bin files and apply the patch to it (CRM, Finance, HR).
25. After Testing, Apply to Production
[Diagram] No need to keep the extra VDBs after patching production (CRM, Finance, HR).
26. Environment Virtualization, DB Style
[Diagram] The compressed copies in the Delphix Engine are upgraded! (HR, Finance, CRM)
29. Patching and Upgrading with Virtualization
• I didn’t have to take away a valuable resource’s database environment to test the patches.
• I didn’t have to apply the patches to subsequent environments; as virtualized copies of the source, they simply require a refresh from production after the final patch.
• I save significant time that is commonly allocated to quarterly and annual maintenance for patching.
• I apply the patch twice: once to test, once to production. I only need to refresh my environments after I’m done.
• For releases, this can be “containerized”, simplifying release and, if required, rollback.
33. How Are Companies Migrating to the Cloud?
Just copy data and applications into the cloud…
• Straightforward approach
• Inefficient and non-incremental for large environments
• Bulk-copy tools like “bcp” compress, multi-thread and can use encryption
• Archaic processes recommended by vendors
Start with backups to IaaS storage, then populate re-hosted applications by restoring from those backups
• Cloud backups are easy, known technology
34. Cost Estimates for Azure
https://azure.microsoft.com/en-us/pricing/details/storage/blobs/

Storage Capacity                      | LRS            | ZRS
First 1 TB / Month                    | $0.024 per GB  | $0.03 per GB
Next 49 TB (1 to 50 TB) / Month       | $0.0236 per GB | $0.0295 per GB
Next 450 TB (50 to 500 TB) / Month    | $0.0232 per GB | $0.029 per GB
Next 500 TB (500 to 1,000 TB) / Month | $0.0228 per GB | $0.0285 per GB
36. Migration Complete… Not So Fast…
• What if you only want dev and test in the cloud?
• What about the application, support files and other data sources?
• The data is migrated, but that doesn’t account for ongoing data loads or application connectivity across the network.
• A refresh will take considerable time to perform with traditional tools or cloning methods.
• The difference in cost structure for processing large amounts of data on-premises vs. in the cloud is rarely considered.
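That refresh-time concern is worth quantifying before promising a nightly schedule. A rough sketch; the 70% link-efficiency factor is an assumption covering protocol overhead, contention and retransmits:

```python
# Rough wall-clock estimate for moving a refresh over a WAN link.

def transfer_hours(data_gb: float, link_mbps: float,
                   efficiency: float = 0.7) -> float:
    """Hours to move data_gb (decimal GB) over a link_mbps connection
    at an assumed sustained efficiency."""
    bits = data_gb * 8 * 1000 ** 3
    effective_bps = link_mbps * 1000 ** 2 * efficiency
    return bits / effective_bps / 3600

# A 1 TB refresh on a 500 Mbps link: over six hours, before any retries.
print(round(transfer_hours(1000, 500), 1))   # 6.3
```

Run that for each environment you intend to refresh and the case for moving only changed blocks, rather than full copies, makes itself.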
38. https://docs.microsoft.com/en-us/azure/sql-database/sql-database-cloud-migrate
• Database must be MSSQL 2005 or higher (easy)
• Ensure the database is compatible with Azure SQL DB (correct any incompatible functions, etc.)
• Identify beforehand all performance issues that will be impacted.
• Ensure there is as little physical distance as possible between the cloud data center and the bacpac files to be used for migration.
• Disable management jobs that would hinder migration processing.
• Drop any objects or historical data that would impact migration time and can be handled post-migration.
40. Let’s Discuss Network Latency
• The network has been the bottleneck of every cloud project I’ve been a part of.
• There’s a reason AWS has invested in the Snowball and the Snowmobile.
• We can’t break the laws of physics.
• Let’s talk about Shannon’s Law…
• In layman’s terms: the data is only going to go as fast as it can without hitting an error threshold.
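Shannon's Law (the Shannon-Hartley theorem) puts a hard ceiling on that: C = B * log2(1 + S/N), where B is bandwidth and S/N the signal-to-noise ratio. A small sketch with illustrative numbers:

```python
import math

def shannon_capacity_bps(bandwidth_hz: float, snr_linear: float) -> float:
    """Shannon-Hartley channel capacity: C = B * log2(1 + S/N)."""
    return bandwidth_hz * math.log2(1 + snr_linear)

# Illustrative: a 100 MHz channel at 30 dB SNR tops out near 1 Gbps,
# no matter how clever the transfer tooling is.
snr_linear = 10 ** (30 / 10)                  # 30 dB -> linear ratio of 1000
capacity = shannon_capacity_bps(100e6, snr_linear)
print(f"{capacity / 1e9:.2f} Gbps")           # about 1 Gbps
```

Note that doubling the SNR buys far less than doubling the bandwidth, which is why moving less data in the first place beats tuning the pipe.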
41. How We Migrate It All
[Diagram] Database server → Cloud Storage → Database server
42. Migrating from On-Prem to IaaS with Virtualization
A much-improved option: data virtualization
Easy, secure, revolutionary
Simple movement via virtualized environments: the storage moved encompasses approximately one environment, no matter how many are migrated.
Environments can be “rehydrated” to physical later, if desired.
Ability to containerize data sources, applications and support files as Data Pods and move them easily as one.
43. Data Virtualization, On-Prem & Data Pods
[Diagram] Source DB server (SQL Server 2008-2016, 1 TB) sends backups via SMB to the Delphix Virtualization Engine (2 TB storage), which serves a target DB server (SQL Server 2008-2016, no database storage) over SCSI/SSL. Bin files, flat files and data sources are virtualized and now containerized as a Data Pod.
44. Data Virtualization: From On-Premises into the Cloud
[Diagram] Source DB server (SQL Server 2008-2016, 2 TB storage) → Delphix Virtualization Engine (2 TB storage) → target DB server (SQL Server 2008-2016, no database storage).
45. How Does Data Virtualization Enhance This?
Optimized for the cloud in the first place… not after!
Different cost structures
Much smaller storage footprint, much less data transfer
46. Different Cost Structures
Traditional copy data management techniques were developed without concerns about infrastructure chargeback; this translates to higher cost.
IaaS vendors monitor storage and data transfers to help meet SLAs and garner profits.
It’s not just the data that exists in the end; transformations can equal big money for cloud vendors.
47. Know Thy Enemy…
• Tune SQL and apps to perform as efficiently as possible before migrating; the natural life of a database is growth (in processes, resources, etc.).
• The less network latency, the better; network tracing to eliminate database blame is important.
• Many of the same tools and data provide value; DMVs provide data internally to SQL Server.
• Look at management tools such as CloudMonix (formerly AzureWatch), AppDynamics, Dynatrace, Zabbix or LogicMonitor.
48. Secondary Considerations
For non-production systems…
Change the way you’ve always performed tasks: performing common tasks the same way as before might end up costing more.
Secure data: all IaaS alternatives promote encryption for data in-flight and at-rest, but encryption may not be the right answer…
49. Confidential Data
All IaaS solutions provide encryption in-flight and encryption at-rest, but encryption doesn’t protect data as much as it needs to be.
Europe already requires data masking, not just data encryption, for any confidential data (GDPR):
http://ec.europa.eu/justice/data-protection/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf
50. Confidential Data
Encryption is reversible data obfuscation, which is very different from masking data.
• Data masking is non-reversible. It solves the issue at the data level.
Are authentication and authorization in non-production in compliance with security goals?
All organizations will soon need to review whether critical data in non-production environments should be accessible to developers, testers and users.
51. Why Masking Is Part of the Answer
Masking personally identifiable information (PII, HIPAA, PCI, etc.) renders it useless from a security standpoint.
It resolves both the technical and the personal-responsibility issue.
The data can be masked before it moves to non-production, removing unnecessary risk.
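The reversible-vs-non-reversible distinction fits in a few lines. This is a toy standard-library sketch, not how any masking product actually works, and deterministic masking of low-entropy values like SSNs remains guessable by brute force, which is why real tools offer richer algorithms:

```python
import hashlib
import random

def mask_ssn(ssn: str, seed: str = "project-seed") -> str:
    """Replace an SSN with a deterministic but non-reversible fake.
    No key or mapping back to the original is ever stored, unlike
    encryption, where the key can always recover the plaintext."""
    digest = hashlib.sha256(f"{seed}:{ssn}".encode()).digest()
    rng = random.Random(digest)          # seed RNG from the hash digest
    return (f"{rng.randint(100, 899):03d}-"
            f"{rng.randint(10, 99):02d}-"
            f"{rng.randint(1000, 9999):04d}")

masked = mask_ssn("123-45-6789")
# Same input always masks to the same fake value, so joins across tables
# and refreshes stay consistent without any recoverable original.
print(masked == mask_ssn("123-45-6789"))   # True
```

The determinism is the useful property here: referential integrity survives masking, while the sensitive value does not leave production.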
53. Masking: All the Data
[Diagram] A SQL Server validated sync environment, plus applications/flat files, feeds the Delphix Virtualization Engine (2 TB storage) and the Delphix Masking Engine, which serve a SQL Server target with no database storage.
54. Simple Masking and Then to the Cloud
[Diagram] A SQL Server validated sync environment feeds an on-prem Delphix Virtualization Engine (2 TB storage) and Delphix Masking Engine; the masked data moves to a Delphix Virtualization Engine in the cloud (2 TB storage), which serves SQL Server targets holding storage only for masked data.
56. Masked Data Pods Migrated to the Cloud
[Diagram] A SQL Server validated sync environment feeds an on-prem Delphix Virtualization Engine (4 TB storage) and Delphix Masking Engine; a second Delphix Virtualization Engine (4 TB storage) in the cloud serves multiple SQL Server targets, each holding storage only for masked data.
57. Virtualization makes management of non-production environments simple.
Security with masking and encryption is best.
Cloud migrations are more successful when virtualized and planned accordingly.
Patching and maintenance can be done with less effort and fewer resources.
58. Want to try it out? Download the Delphix Azure trial: https://www.delphix.com/products/free-trial-request
Twitter: @DBAKevlar
LinkedIn: http://linkedin.com/in/kellynpotvin
Blog: http://dbakevlar.com
59. References and Tips
Delphix with SQL Server, the basics:
https://docs.delphix.com/docs/delphix-administration/sql-server-environments-and-data-sources/managing-sql-server-environments/overview-of-setting-up-sql-server-environments
Delphix upgrade workflow:
https://community.delphix.com/delphix/topics/tip-of-the-day-upgrading-a-sql-server-dsource
Upgrading the dSource after an upgrade:
https://docs.delphix.com/docs/delphix-administration/sql-server-environments-and-data-sources/virtualizing-databases-using-delphix-with-sql-server/managing-sql-server-dsources/additional-dsource-topics/upgrading-a-dsource-after-a-sql-server-upgrade
Delphix in the cloud:
https://www.delphix.com/solutions/cloud-migration-virtual-data
Editor’s notes
The real question is- should it be?
Let’s Talk about the future of the DBA with DevOps-
Learn other database platforms
Learn Shell, other than Powershell, learn Python and automation tools for DevOps
ETL, subsets of data, as well as physical and virtual clones, backup, replication.
Where is all that data going?
DBA 1.0/2.0? Does it translate?
Manage all those copies.
Data gravity suffers from the von Neumann bottleneck, a basic limitation on how fast computers can be. Pretty simple, but it states that the distance between where data resides and where it is processed is the limiting factor in computing speed.
Microsoft researcher Jim Gray spent most of his career looking at the economics of data, one of the most accurate terms for this area of technical study. He started working at Microsoft in 1995 and, although passionate about many areas of technology, his research on large databases and transactional processing speeds is held in great respect in my world.
80-90% storage savings from traditional migration methods.
Data In flight can be significant cost for many cloud vendors
The network is the new bottleneck. You can avoid that with fewer copies: one golden copy, which we call the “validated sync environment”.
We create virtualized environments for database, application, flat files and other data sources, we can containerize it, (which I’ll go into more later)
And allow you to create as many dev, test, reporting and patching environments you need.
You can create copies of your SQL Bin files and it’s heterogeneous, so we can do this for SQL Server, Oracle, MySQL, Hadoop, applications and flat files.
These are read-write copies that take little to no storage, so keep that in mind. They will have background processes and memory structures, along with a unique transaction log for each environment. Those are also written to the Delphix engine.
This is an example: let’s say it’s 1 TB.
We take that 1 TB and, using the native backups, we create a validated sync copy (a golden source) in our Delphix engine.
It’s really a VMware software appliance that is kept in a state of perpetual recovery. We use a Postgres DB on the backend that tracks the ongoing LSNs and timestamps from the transaction log backups, applies them to the golden source and keeps the source up to date with production. As we pull from the transaction log backups, there is little to no latency to the environment.
We are then able to create as many VDBs (virtual databases) as we want from the source (i.e., the validated sync copy).
Using SCSI/SSL connectivity, very fast IO, as 80% or more of your data is the same across environments.
We can do this for
Now as you see all these copies, you might assume that production takes 1TB, so ten copies would take 10 TB, but not so…
As we spoke about earlier, the only blocks that are written per VDB and Data pod, (aka container) are the unique blocks for that environment. This saves extensive space and along with the dedup and compression on the golden source/validated sync, this means that all those copies only take up the approximate space of one copy.
We hear about code and script version control, but what about data version control?
As the time flow for a virtualized environment is tracking each unique LSN and timestamp, it’s easy to recover to any point in time via a GUI with just the shift of a slider in the “timeflow”.
Two weeks is default for what we retain, but we have customers who retain much more and those that retain less. Just depends on the configuration.
Think about all the use cases for this, as we like to refer to this as a “swiss army knife solution”.
We’ve saved one customer in Denver due to a removed datafile scenario.
What you’re seeing here is the admin console.
It doesn’t look like a standard Vmware interface or storage/backup utility. This is focused on the DBA, so databases, applications and other sources are shown in the interface.
We can go through our “snapshots” and we have a timeflow slider at the bottom to toggle through each of them.
Notice that the vFile (application) can be rewound, refreshed or provisioned, even off the virtual copy.
This is the interface for Developers and testers- they can bookmark before important tasks or rewind to any point in the process. They can bookmark and branch for full development/testing needs.
Let’s say a catastrophic situation occurred: the developer can simply dial back to the previous bookmark or anywhere in between. Bookmarking allows for designation of a change, making it easy to know when a change has occurred, i.e., version control.
For DBAs and analysts, there often isn’t enough time or resources to run all the processes, ETL, maintenance in the time allocated each night.
You can spin up another VDB to perform tasks like ETL and maintenance jobs like DBCCs.
Spin up another for running Power BI reports against and no need to use up resources valuable to the production environment.
How often does Microsoft send patches?
Do we start picking priorities about what we apply depending on environment access, resources and such?
DBA has to commandeer a database for patch testing.
This has to be performed for EACH environment, 100’s or 1000’s of databases!
Most are not synchronized with production, different outcomes when released to production.
Bugs occurring in one, not another!
The standard process is to patch on a regular basis and to commandeer a database from a developer or tester, once tested, then you have to apply the patch upwards to production.
This is how I would do so with a virtualized environment for all non-prod databases.
I would spin up a VDB, vs. commandeering a database from developers and/or testers so they can continue to be productive.
I’d spin up a virtualized bin home for my SQL Server installation, too.
I would then apply the patch to this new virtual environment, verifying that everything tests without incident.
Once complete, submit the patch submission for production and take the downtime to apply to prod.
Link the production database with the Delphix Server.
Provision a VDB at the existing patch level.
Patch the existing SQL Server bin files against the live VDB.
or
Create the new SQL Server bin file directory and switch the VDB.
Rollback VDB or Refresh from production.
Repeat 3 or 4 until confident.
Once the process has been tested and confirmed, it can be rolled out with confidence into production.
Now that I’ve patched production and my Delphix Engine has kept all the changes up to date in all my virtual SQL homes
Now we simply refresh the VDB and virtual homes. They are automatically patched and I don’t have to apply them to the environments.
The refresh only takes a matter of minutes for each environment, can be automated/scheduled.
Do any of you see the problem with the high level project steps?
We commonly leave optimizing the environment until after we’ve migrated to the cloud.
Bulk copy protocol or other archaic processes
This also leaves the project open to failure.
90% of the cloud projects I’ve been on, including those for Oracle, suffered issues with getting data to the cloud.
Amazon is now using a truck to get your data to their data center. Kind of odd, isn’t it?
The cost looks small when we see it from a monthly perspective, but anyone who thinks they’re going to the cloud to save money, may be leaving themselves open for quite the surprise.
Standard backup and recovery methods
Replication
Cloning, SSIS Packages to push data to Azure
Continual feed to keep up to date, or refresh on a regular basis, via archaic tools: bcp, log shipping or paid replication tools.
How many of you have moved dev and test to the cloud? How many moved cloud or moved it first??
If you moved it, would you consider keeping processing the same?
How can the cost structure impact you?
What all has to be moved? What issues are you going to run into?
Optimize first? Why?
This is for Azure migrations- the requirements
And if you choose wrong or use more resources than expected, you can experience severe performance issues.
What resources are you really using? DBAs know, but do the developers and other stakeholders in the cloud migration project?
Network is the new bottleneck.
IO is the second and with how pricing is done in the cloud, compute and storage doesn’t often consider IO or network issues.
Shannon-Hartley Theorem: the equation gives the maximum capacity (transmission bit rate) that can be achieved over a given channel with certain noise characteristics and bandwidth.
A given communication, (or data) system has a maximum rate of information C known as the channel capacity
If the transmission information rate R is less than C, then the data transmission in the presence of noise can be made to happen with arbitrarily small error probabilities by using intelligent coding techniques
To get lower error probabilities, the encoder has to work on longer blocks of signal data. This entails longer delays and higher computational requirements
In layman’s terms: the data is only going to go as fast as it can without hitting an error threshold.
Once final tests are done- you are testing.
Perform final migration, final sync to prod and downtime to switch from on-prem to cloud.
With virtualization, we virtualize it all- the database, data sources, application and flat files. We containerize it into a Data Pod
By doing so, it’s easier to lift and shift to the cloud. It’s lighter, and we can move 20+ environments in the same space as one.
By going to a single source, loading to a single source and maintaining a single source, a smaller footprint is attained.
Cost savings in the way of less storage required results in even bigger savings.
The data in the golden source is also compressed and deduped, so less data in flight even from it!
Different cloud manufacturers have different pricing structures- verify what you are being charged for and make sure those costs aren’t in contrast with your environment.
Many avoid RDS on Amazon; we don’t support it. For our Oracle customers, it’s too limited.
Before you start, tune SQL instead of after.
Use network tools like Nagios network analyzer or. Solarwinds Network Performance Monitor, (NPM)
Your performance data can assist you in identifying huge IO, CPU and remote resource work that should be minimized beforehand.
Data in flight can cost you and data processing that was normal on-prem, may need to be redesigned post cloud migration.
Inspect the pricing small print carefully, and know that your final choice of cloud and type of service will determine what you’re charged for.
Encryption is important for production.
SQL 2016 dynamic data masking isn’t production-ready: three steps and I had un-masked data!
Or does it shift the problem toward authentication and authorization?
Anyone who felt the pain of the Target breach, this is a great example: it was a non-prod environment accessed via a vendor, and customer data was violated.
Same as before, we are virtualizing everything, the app, the database, the data sources, etc.
We can then, after it’s virtualized, add a secondary Delphix engine just for masking.
Masking is resource intensive, so for many of our customers, who have enterprise level environments, we let it do the work.
We then mask the flat files, big data sources and database data that you set up.
Going to the cloud, we mask the data before it ever goes there. Again: put a Delphix engine on-prem and a masking engine on-prem, mask everything, and then push it to the Delphix engine in the cloud, which then grants great performance out there.