NBCUniversal, a worldwide mass media corporation, was looking for a more affordable and easier way to manage the database solution that hosts its extensive online digital assets. With Datavail’s assistance, NBCUniversal made the move from MongoDB 3.6 to MongoDB Atlas on AWS.
In this presentation, learn how making this move enabled the entertainment titan to reduce overhead and labor costs associated with managing its database environment.
1. Why NBCUniversal Migrated to MongoDB Atlas
Charleste King,
MongoDB Practice Lead, Datavail
Trinh Nguyen,
Sr. Director of Technology, NBCUniversal
2. MongoDB Practice Lead, Datavail
Charleste King
Over 20 years in the IT industry
Expertise ranging from software development to data analysis, architecture, and administration
Industry experience includes aerospace, agriculture, medicine, education, and more
Specializes in developing solutions for unique client problems, ranging from multi-level upgrades to forecasting problem areas to tuning and performance monitoring
Has a Master of Science from Georgia Institute of Technology
3. Datavail Overview
Databases, Analytics, and Application Development
We are data specialists
15+ years delivering data services
Projects, Managed Services and staffing
500+ customers
7 years average retention
All major platforms including MongoDB and MongoDB Atlas
AWS, Azure and Google Cloud
Comprehensive development & operational services
US & Global delivery models
Health checks & performance tuning
Upgrades
Database Operations
4. Sr. Director of Technology,
NBCUniversal
Trinh Nguyen
12 years with E!’s Interactive Technology team
Oversees three teams: Software Engineers, Web Developers, and DevOps
Expertise in using Lambda, EC2, ECS, Redis, GraphQL, and Chef to build Spring-based applications as well as React apps with Node.js
Works primarily in Java/JavaScript, using MongoDB and MySQL for data storage
Migrated E! to the cloud completely in March 2017, making it the first cable brand at NBCU to move its entire infrastructure to the cloud
Has a dog named Professor, a son named Maverick, and a wife who let him come to this conference
Bachelor’s in Computer Engineering from the University of California, Irvine
5. www.nbcuniversal.com
E! is the only global, multi-platform brand for all things pop culture. The network is currently available to 91 million cable and satellite subscribers in the U.S. and 161 countries globally.
E! is a network of NBCUniversal Entertainment & Lifestyle Group, a division of NBCUniversal, one of the world's leading media and entertainment companies.
NBC Universal
E! Entertainment
6. Agenda
• NBCUniversal – Business drivers & technical needs
• Top pain points driving move to MongoDB Atlas
• Technical needs to be overcome before migration
• Migration
• Post migration observations and benefits
• Lessons learned
8. NBCUniversal Business Drivers For Atlas Migration
Ability to build digital products and features quickly
Highly available and secure database
Streamline operational costs – Licensing and Infrastructure
Self Service provisioning and management
Intuitive metrics and visualizations
9. Technical Needs for E! Digital to Migrate to Atlas
Cost and Ease of Management
Ongoing/Repeated tasks:
Daily health checks
Storage issues – constantly checking storage
Manual deletion of snapshots
Upgrading/Patching
Performance tuning
12. Top Pain Points Driving Move to MongoDB Atlas
Prior Challenge | Benefits of Atlas Migration
Provisioning Instances | Self Service
Access to Ops Manager | Access Atlas Portal from Anywhere
Downtime / Patching | Auto patching and one-click upgrades w/o downtime
Storage / Data Retention | Storage Auto Expand and Automated Backups
Alerts Lagged in Getting to Right Individuals and Teams | Streamlined and Targeted Alerts
14. “In our network we have a proxy appliance, which all connections to the internet must route through. Normally this is fine with web browsing; however, when trying to connect from our desktops to Mongo Atlas instances, the connection is denied. Is there a way to set up connectivity to include the proxy settings?”
Arbi Begian, DevOps Lead
15. First Challenge
Developers were not able to connect from our corporate office to Mongo Atlas because our corporate network egress is restricted to a proxy firewall device, which blocks all ports except those for HTTP and HTTPS.
16. “Connectivity to Atlas is established using the MongoDB wire protocol, which is not HTTP-based. Unless the proxy appliance that you mention has explicit support for the wire protocol or port forwarding, then it is unlikely that the appliance will function in the manner you intend.”
Mongo Support
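A quick way to confirm this kind of restriction from a developer machine is a plain TCP check against the cluster port. The following is a minimal sketch in Python and was not part of the original talk; the function name `can_reach` is hypothetical, and 27017 is assumed because it is MongoDB's default port.

```python
import socket


def can_reach(host: str, port: int = 27017, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This only proves that the port is reachable at the network level;
    it does not speak the MongoDB wire protocol or authenticate.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("<one of your cluster hostnames>")` from inside the corporate network and again from a peered VPC would show whether the proxy is the blocking hop.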
17. How E! Online Overcame Connectivity Restriction
To overcome the connectivity limitation between our corporate network and hosted Mongo Atlas, we had to reconfigure our network and routing setup:
Instead of using corporate internet egress to connect to Atlas, we decided to route through our AWS VPC.
By VPC peering the two networks, E! was able to establish communication to MongoDB clusters within the network.
To comply with our AWS environment security, we had to peer each of our specific VPCs (Dev/Shared/Staging/Prod) separately.
We reconfigured our internal routing to pass the MongoDB communication internally rather than going through the public interface.
18. Developer Setup
/etc/hosts
#Using Dev or Shared Open VPN
172.22.64.XXX dev-pub-shard-00-XX-abcde.mongodb.net
172.22.64.XX dev-pub-shard-00-XX-abcde.mongodb.net
172.22.64.XX dev-pub-shard-00-XX-abcde.mongodb.net
#MongoDB Atlas Properties
### Dev env Atlas ###
atlasConnectionUrl=mongodb://webclient:USER@dev-pub-shard-00-XX-abcde.mongodb.net:27017,dev-pub-shard-00-XX-abcde.mongodb.net:27017,dev-pub-shard-00-XX-abcde.mongodb.net:27017/{database}?authSource=admin&retryWrites=true&ssl=true&replicaSet=dev-pub-shard-0
atlasBudConnectionUrl=mongodb://
atlasEnable=true
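A long seed-list connection string like the one on this slide is easy to mistype, so it can help to assemble it from its parts. The sketch below is illustrative only and is not the team's actual tooling: `build_atlas_uri` and its parameters are hypothetical names, and the option set simply mirrors the one shown on the slide. Credentials are percent-encoded because special characters in a MongoDB URI username or password must be escaped.

```python
from urllib.parse import quote_plus, urlencode


def build_atlas_uri(user, password, hosts, database, replica_set, port=27017):
    """Assemble a standard (non-SRV) MongoDB connection string."""
    # Percent-encode credentials so characters such as '@' or ':' are safe.
    credentials = f"{quote_plus(user)}:{quote_plus(password)}"
    # Every replica set member is listed explicitly with its port.
    host_list = ",".join(f"{host}:{port}" for host in hosts)
    # Same options as the properties file on the slide.
    options = urlencode({
        "authSource": "admin",
        "retryWrites": "true",
        "ssl": "true",
        "replicaSet": replica_set,
    })
    return f"mongodb://{credentials}@{host_list}/{database}?{options}"
```

The host names passed in would be the same ones mapped to private IPs in /etc/hosts, so the driver resolves them over the peered network rather than the public interface.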
19. VPC Peering
Developers’ communication issues were solved since we did not have to go through the proxy device.
Unexpected Performance Gains
By going this route it also made our connection to the Atlas cluster much faster, even faster than accessing our EC2 instances that were running MongoDB Enterprise.
20. Setting up Your Subnets
When we set up our subnets, we tried to keep the address spaces similar:
Dev-Test subnet before: 172.21.68.0
Atlas Dev-Test Cluster after: 172.22.68.0
During cluster setup we did not use the default subnet; you have to define it explicitly if you want to use your own.
Keeping the address spaces similar makes it easy to recognize your database IPs when scanning, setting up firewall rules, inspecting logs, etc.
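As a rough illustration of why a predictable address layout helps, the sketch below labels an IP address by the subnet it falls into, and checks that two address blocks do not overlap (AWS will not peer VPCs whose CIDR blocks overlap). The /24 prefix lengths and the labels are assumptions for illustration; the slide gives only the base addresses.

```python
import ipaddress

# Base addresses come from the slide; the /24 prefix is an assumption.
NETWORKS = {
    "dev-test app subnet": "172.21.68.0/24",
    "dev-test Atlas cluster": "172.22.68.0/24",
}


def networks_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR blocks share any address."""
    return ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))


def label_for_ip(ip, networks=NETWORKS):
    """Return the label of the first network containing ip, or None."""
    address = ipaddress.ip_address(ip)
    for label, cidr in networks.items():
        if address in ipaddress.ip_network(cidr):
            return label
    return None
```

With a table like this, an address seen in a firewall log or a scan can be attributed to the application side or the Atlas side at a glance.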
21. Route 53
Challenge
Long Mongo hostnames
Setting up simple-to-read CNAMEs
Namespace “mongodb.net”
Solution and Learnings
We ran into problems configuring SRV within Route 53.
The SRV service call in Atlas already has logic to figure out which node is primary.
Keeping the DNS in sync when new nodes were brought up for maintenance or under load (cluster nodes increasing/decreasing) wasn’t worth it just for cleaner/shorter names.
In our case we weren’t able to leverage Route 53 to abstract the cluster’s configuration parameters into DNS, but many of you will be able to leverage SRV and TXT records as a solution.
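For readers who have not used SRV connection strings, the sketch below shows roughly what a driver does with a `mongodb+srv://` URI: it looks up SRV records under `_mongodb._tcp.<hostname>` to get the member hosts and ports, reads connection options from a TXT record on the same hostname, and turns TLS on by default. No DNS lookup is performed here; the records are passed in by hand, and the function names and example hosts are hypothetical.

```python
def srv_query_name(cluster_host):
    """DNS name a driver queries for SRV records of a mongodb+srv:// URI."""
    return f"_mongodb._tcp.{cluster_host}"


def expand_seedlist(srv_records, txt_options, database=""):
    """Build the equivalent mongodb:// URI from already-resolved records.

    srv_records: iterable of (target, port) pairs from the SRV answer.
    txt_options: option string from the TXT record, e.g.
                 "authSource=admin&replicaSet=rs0" (may be empty).
    """
    # SRV targets are often returned with a trailing dot; strip it.
    hosts = ",".join(f"{target.rstrip('.')}:{port}" for target, port in srv_records)
    # The SRV scheme implies TLS, so add it alongside the TXT options.
    options = "&".join(part for part in (txt_options, "ssl=true") if part)
    return f"mongodb://{hosts}/{database}?{options}"
```

This is why the short SRV form keeps working when members are added or replaced: only the DNS records change, not the string stored in each application.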
22. Updating Application Drivers
Almost all of our applications use a centralized back-end service called “commons client interface” (CCI).
Previously, CCI was configured with Spring Data MongoDB Core 2.0.7.RELEASE, packaged with mongodb-driver-async v3.5.0.
We upgraded the driver to 3.6.1 and leveraged the retryWrites=true parameter.
Cutover timing.
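Because the driver version is often pulled in transitively by a framework, it can be worth checking it explicitly before relying on `retryWrites=true`. The sketch below is a hypothetical pre-flight check; it assumes the 3.6 driver series is the first to support retryable writes, which matches the upgrade described on this slide, and it only handles plain dotted version strings.

```python
# Assumed minimum driver version for retryable writes (3.6 series).
MIN_RETRY_WRITES_DRIVER = (3, 6, 0)


def parse_version(text):
    """Turn '3.6.1' or '2.0.7.RELEASE' into a comparable tuple of three ints."""
    return tuple(int(part) for part in text.split(".")[:3])


def supports_retry_writes(driver_version):
    """Return True if the driver version meets the assumed minimum."""
    return parse_version(driver_version) >= MIN_RETRY_WRITES_DRIVER
```

A check like this could run in a build step for each application, including the legacy ones that carry their own driver configuration.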
24. Why We Used mongomirror
Challenge
If you have VPC Peering with MongoDB 3.6, Live Migration requires extra network configuration that we wanted to avoid.
Solution
Luckily, MongoDB provides mongomirror, which works in this situation.
25. How We Leveraged mongomirror to Help with the Migration
Helps to perform live migration without downtime.
mongomirror is a utility for migrating data from an existing MongoDB replica set to a MongoDB Atlas replica set.
Does not require shutting down existing replica sets or applications.
Drawback: it does not import the user/role data, so we had to manually create the users in MongoDB Atlas after migrating the data.
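Since users and roles have to be recreated by hand, one way to prepare is to flatten the reply of the `usersInfo` command from the source replica set into a checklist. The sketch below assumes the reply has already been fetched and is available as a Python dict; the helper name and the sample data are hypothetical, and this is not the process the team described.

```python
def users_to_recreate(users_info_reply):
    """Flatten a usersInfo reply into sorted (user, auth_db, roles) tuples."""
    checklist = []
    for entry in users_info_reply.get("users", []):
        # Each role is a document with a role name and the database it applies to.
        roles = sorted(f'{role["role"]}@{role["db"]}' for role in entry.get("roles", []))
        checklist.append((entry["user"], entry["db"], roles))
    return sorted(checklist)


# Hypothetical sample reply, for illustration only.
SAMPLE_REPLY = {
    "users": [
        {"user": "webclient", "db": "admin",
         "roles": [{"role": "readWrite", "db": "content"}]},
        {"user": "reporting", "db": "admin",
         "roles": [{"role": "read", "db": "content"}]},
    ],
    "ok": 1,
}
```

The resulting list can then be worked through in the Atlas project once the data has been migrated.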
26. 7 Steps to Use mongomirror
1. Download mongomirror on a secondary node
2. Put the configuration changes and prerequisites in place for mongomirror to work
3. Set up a MongoDB user on the source cluster in AWS and run the getParameter command
4. Set up a MongoDB user in the target MongoDB Atlas cluster with the Atlas admin role
5. Add IP(s) to the whitelist
6. Run the mongomirror sync and tailing of the oplog
7. Switch the application over and stop mongomirror
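To make the sync step more concrete, the sketch below assembles a mongomirror command line without running it. The flag names (`--host`, `--destination`, `--destinationUsername`, `--destinationPassword`) are the ones I understand the mongomirror documentation to use and should be checked against `mongomirror --help`; the function name, replica set names, and hosts are hypothetical, and any source authentication flags your setup needs are omitted.

```python
import shlex


def mongomirror_command(source_rs, source_hosts, dest_rs, dest_hosts,
                        dest_user, dest_password):
    """Return the argv list for a mongomirror run (nothing is executed)."""
    # Both ends are given as <replicaSetName>/<host:port,host:port,...>.
    source = f"{source_rs}/{','.join(source_hosts)}"
    destination = f"{dest_rs}/{','.join(dest_hosts)}"
    return [
        "mongomirror",
        "--host", source,
        "--destination", destination,
        "--destinationUsername", dest_user,
        "--destinationPassword", dest_password,
    ]


def as_shell_line(argv):
    """Quote each argument so the line can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

Building the command as a list keeps the replica set names and host lists in one place, which is convenient when the same migration is repeated for Dev, Test, and Prod.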
34. Benefits of Mongo Hosted Cloud
Additional Gains
Self Service: Developers can spin up clusters for project/prototype-based microservice work.
35. Benefits of Mongo Hosted Cloud
Additional Gains
Roles & Permissions are transparent and don’t necessarily need a DBA for setup.
36. Benefits of Mongo Hosted Cloud
Additional Gains
Maintenance patching requires no downtime and is an easy one-click operation.
37. Lessons Learned
If you’re part of a large corporation that uses a proxy, be ready:
Know about the port restrictions
Or have a plan for who can accommodate your testing/migration, and how
If you are running in the cloud, VPC peering helps, but test routing
Allocate time in your migration plan to update your applications’ drivers
Talk openly about your gains to the organization:
Rapid Prototyping
Decreased Downtime
Accessibility
Alerting
Storage/Backup
Autopatching
Editor's notes
Determine if Dan will play moderator role for the breakout session. If so should we include his name on the title slide.
Charleste Will own this slide and open up the event
Thank everyone for joining!
Charleste will own this slide
Need to validate metrics to be consistent with other Datavail messaging
If Dan moderates then he will present this slide. If not then Charleste will present this slide
The audience is here to learn. So make sure this is not a sales pitch, but background on the experience Datavail has working with many customers to address their business and technical needs which enables us to share what we have learned so others can benefit
Trinh will own this slide
“Nuwin” or “n-Win”
This slide is wordy. Is it ok if the formal company message at the bottom is trimmed down?
Trinh will own this slide
Thank Charleste for the intro and talk about E! as a linear TV network that also maintains a strong digital presence due to its global news operations (Celebrity & Entertainment News)
Bread and butter is Red Carpet and Live Events
Also we run our own awards show with last year being the first People’s Choice Awards owned by E!
I’m going to throw It back to Charleste to talk about some of our pain points and why we decided to move to Atlas, and then come back and recap some of the technical challenges that NBCUniversal and my development team had to deal with directly.
Agenda to set the stage for what the Audience will see and learn
Trinh will own this slide
Talk about where E! was:
Originally when we moved from Datacenter to AWS, we figured just running Databases on EC2 would be enough. Set it up and forget it.
Running the underlying infrastructure for Ops Manager, the replica sets that power Ops Manager, and the databases themselves (including patching the CentOS operating systems) became time consuming for my two sysadmins, who were trying to lean into DevOps practices
Managing a support ticket in our JIRA and then a ServiceNow ticket with Datavail, and coordinating back and forth between infra, engineering, and Datavail, was slow for new projects as well as maintenance. It ate up time that my small team needed for more product development.
Potentially pull some thoughts from Trinh’s slide notes on the NBCUniversal overview slide
Build Digital Products Quickly = to compete in the digital publishing space = singular location to store our structured, semi-structured, and unstructured data for various applications
HA and SSL encrypted - Databases designed to be distributed in the cloud
Streamline Operational costs = allow Mongo to manage the underlying infrastructure & software updates + allow my developers to do more configuration and tuning outside of the application (reduce people dependency/communication back and forth)
Keep our databases updated consistently in order to leverage new features as they present themselves
Intuitive metrics = Database health and performance, easily reconfigurable, straightforward billing visibility/estimations, & Dashboard/chart creation
Trinh will own this slide
Talk about the amount of time and logistics needed to interface with the engineering team and the Datavail team, anywhere from 8-20hrs per week for the team just for maintenance items.
Upgrading/Patching was more reactive than proactive, and needing to communicate downtime across teams was time consuming
Developers were always focused on application performance but we needed to give additional value to database performance tuning
Trinh will own this slide
Environment
We were running MongoDB Ops Manager 3.6 & MongoDB 3.6 databases
The DEV and TEST environments had one cluster
The PROD environment had two clusters – emongo and eolbud (We consolidated into one emongo cluster post migration)
PROD clusters had three-node replica sets (one primary and two secondaries)
Ops Manager HTTP Service (UI) and backup agent were running on one EC2 instance. This agent was also used to store HeadDB and snapshots
Ops Manager application database and Oplog store database were stored in 3 EC2 instances, where one EC2 instance was running two mongod processes on different ports.
NEW Environment (MongoDB Atlas)
· We are running MongoDB 3.6.12
· Environments are DEV, TEST and PROD (DEV and TEST are on the same project and PROD is on a separate project in MongoDB Atlas)
Charleste will own this slide
Talk about how Datavail supported the E! account and what the problems or inefficiencies were.
Trinh will own this slide
Even with a well-planned migration, we ran into some preliminary issues we needed to address before using the Atlas product. Let me talk you through one of our technical challenges to adoption (not software or workflow)…
These are the top challenges that Sig suggests Trinh cover
Network Connection / Connectivity Restrictions
VPC Peering & Setting Up Subnets
Configuring SRV within Route 53
Updating Application Drivers
Trinh will own this slide
Trinh will own this slide
The biggest issue was the developer experience and access
Trinh will own this slide
There are always nuances
Trinh will own this slide
VPC peering between Atlas and internal environment
Put records of Atlas cluster nodes in route 53 in AWS
Mapping was done and they were able to connect
Trinh will own this slide
Trinh will own this slide
Application host-to-Atlas cluster connections were showing better response times than application host-to-MongoDB Enterprise on EC2
Trinh will own this slide
Trinh will own this slide
The use of SRV records eliminates the requirement for every client to pass in a complete set of state information for the cluster. Instead, a single SRV record identifies all the nodes associated with the cluster (and their port numbers) and an associated TXT record defines the options for the URI.
Normally SRV would have gained us the following: the complexity of the cluster and its configuration parameters is stored in the DNS server and hidden from the end user. If a node's IP address or name changes, or we want to change the replica set name, this can all be done completely transparently from the client’s perspective. We can also add and remove nodes from a cluster without impacting clients.
Trinh will own this slide
Check your driver version when it is packaged with SpringData
Cutover Timing: While many of our applications have a singular driver configuration location, we still have some legacy applications (about 5) that need their drivers updated manually. Cutover timing therefore matters, depending on whether an application is focused more on read-only or read/write work.
Regression test new drivers just as you would new features or new database versions: what in your application is relying on deprecated features?