Public cloud adoption is growing rapidly, and big data technologies are becoming an important driver of this growth. According to Wikibon, the public cloud's share of big data revenue will grow from 4.4% in 2016 to 24% of all big data spend by 2026. Digital transformation initiatives are now a priority for most organizations, with data and advanced analytics at the heart of enabling this change. This is key to driving competitive advantage in every industry.
There is nothing better than a real-world customer use case to help you understand how to get value from big data in the cloud and apply the learnings to your business. Join Microsoft, MapR, and Sullexis on November 10th to:
Hear from Sullexis on the business use case and technical implementation details of one of their oil & gas customers
Understand the integration points of the MapR Platform with other Azure services and why they matter
Know how to deploy the MapR Platform on the Azure cloud and get started easily
You will also get to hear about customer use cases of the MapR Converged Data Platform on Azure in other verticals such as real estate and retail.
Speakers
Rafael Godinho
Technical Evangelist
Microsoft Azure
Tim Morgan
Managing Director
Sullexis
7. About Sullexis
• Sullexis is a professional services firm that specializes in helping its clients to
create, manage, and enhance data to accelerate and improve decision making
across the enterprise. We bring data and technology together to make our clients
measurably more effective
• With industry experience ranging from energy and manufacturing to finance and
high tech, Sullexis brings the technology, processes, and strategies together to
make you more effective in what you do
• Founded in 2006, Sullexis is headquartered in Houston, TX and has a delivery
center in Monterrey, MX.
• Our consultants have implemented solutions across the US, Caribbean, Europe
and Latin America.
8. Client Background
• Our client is one of North America’s largest Oilfield Services companies
providing well construction, completion and operating services to exploration
and production companies.
• A significant number of acquisitions over the last 10 years resulted in 18
different ERP applications running on 5 different platforms. To enable
future, scalable growth, they embarked on an ERP standardization project.
The goal was to put the entire company on one technology stack with a common
process.
• Having decided to consolidate on a single ERP, the client still needed to
determine how best to handle compliance, regulatory and operational needs
associated with the legacy systems.
• Migrating transaction data to the new ERP would be cost-prohibitive and
risky, and market-ready data archiving solutions were costly and unable to
meet the defined business needs.
• This left retaining the legacy systems themselves, which would be very
costly, or finding a new approach that was cost effective, reliable and could
meet the business needs.
18 to 1
9. Key Requirements
Preserve and provide easy access to ALL data
• Preserve all structured and unstructured data (approx. 12 TB)
• Ability to run legacy reports to meet compliance, regulatory and ongoing business needs
• Easy for a business person to use directly to minimize IT resource dependency
• Ability to provide consolidated views across disparate data sets
Be cost effective
• Flexible and scalable compute/data storage options (e.g. use of cold storage)
• Provide access through existing BI and reporting tools (e.g. Hyperion, MS Power BI, SAP Lumira)
to eliminate new purchases and training
• Enable 100% decommissioning of legacy systems
Enable the future
• Establish processes and tools that support future company acquisitions
• Provide a platform to enable new and innovative data applications and solutions
10. Solution Selection Process
Initial Analysis
• Market Research
• Vendor presentations
Two week POC ‘bake-off’ to demonstrate:
• Rapid integration of different data sources both structured and unstructured
• Connectivity to SAP ECC and Oracle EBS
• Reporting capabilities re-using SAP Lumira
Winning POC Solution
• A MapR Converged Data Platform cluster installed in MapR’s private cloud
• Predefined adapters for Oracle used to extract and load structured data to MapR (<100GB)
• Unstructured data (CSV, PDF and TXT files) loaded and made viewable through Elasticsearch
• Apache Drill and a local install of SAP Lumira connected to the MapR cluster to demonstrate
reporting capabilities
12. Project Considerations
Technology Factors
• Reliability and speed of connection to cloud
• Count and category of machines in cloud
(CPU, RAM, Storage)
• Volume of data (row size and count)
• Ongoing transaction use of source system
• Variable needs for data (frequency,
response, volume)
Project Factors
• Timeliness of and accessibility to various
parties
• Cataloging of all data
• Evaluation of transactional status of existing
data sets, and how to address moving
targets (blackout periods, iterative loads,
journaling)
• Sample extracts from every table
• Ability to validate data loads (row counts,
samples)
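The load-validation consideration above (row counts and samples) can be sketched as follows. This is a minimal illustration, not the project's actual tooling: the `LoadCheck` class, the helper names, and the table names are hypothetical, and the counts would in practice come from queries against the source system and the loaded data.

```python
from dataclasses import dataclass


@dataclass
class LoadCheck:
    """Row counts for one table, taken from the source and from the loaded copy."""
    table: str
    source_rows: int
    target_rows: int

    @property
    def matches(self) -> bool:
        return self.source_rows == self.target_rows


def find_mismatches(checks):
    """Return the checks whose source and target row counts differ."""
    return [c for c in checks if not c.matches]


def sample_matches(source_sample, target_rows):
    """True if every sampled source row is present in the loaded target rows.

    Rows are expected to be hashable (for example, tuples of column values).
    """
    target = set(target_rows)
    return all(row in target for row in source_sample)


# Hypothetical table names and counts, for illustration only.
checks = [
    LoadCheck("gl_entries", 1200, 1200),
    LoadCheck("invoices", 530, 529),
]

for c in find_mismatches(checks):
    print(f"{c.table}: source={c.source_rows} target={c.target_rows}")
```

Row counts catch dropped or duplicated rows cheaply; the sample comparison is a second, independent check that the values themselves survived the load.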
13. Solution Architecture
[Architecture diagram. Data sources: PDF, CSV, XLS, TIFF, Oracle, Navision, SysPro, MS Excel, Great Plains. Ingest path: NFS. MapR Converged Data Platform: web-scale storage (MapR-FS), database (MapR-DB), event streaming (MapR Streams). Enterprise-grade platform: real time, unified security, multi-tenancy, disaster recovery, global namespace, high availability.]
14. Why Azure
• Sullexis and client both experienced with Azure and MSFT
• MapR Quick Start on Azure made it easy and fast to get started
• MapR already running successfully on Azure (see blog)
• Client’s enterprise MSFT account made it simple to procure and administer
• Connectivity to Azure via ExpressRoute mitigated some of the reliability and latency
concerns of the connection
15. Apache Drill - Flexible & Fast
Access to any data type, any data source
• Relational
• Nested data
• Schema-less
Rapid time to insights
• Query data in-situ
• No Schemas required
• Easy to get started
Integration with existing tools
• ANSI SQL
• BI tool integration
Scale in all dimensions
• TB-PB of scale
• 1000s of users
• 1000s of nodes
Granular security
• Authentication
• Row/column level controls
• De-centralized
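As an illustration of querying data in situ with ANSI SQL, the sketch below builds a request for Drill's REST query endpoint (`/query.json`, default web port 8047). The host, the `dfs` path, and the column names are assumptions made for the example and do not come from the client's environment; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_drill_request(sql, host="localhost", port=8047):
    """Build (but do not send) a POST request for Drill's REST query endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}/query.json",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical archive path and columns; Drill reads the files where they sit,
# so no table definition is created beforehand.
sql = (
    "SELECT vendor_id, SUM(amount) AS total "
    "FROM dfs.`/archive/erp/invoices` "
    "GROUP BY vendor_id"
)
request = build_drill_request(sql)

# Against a running Drillbit, the query would be submitted with:
#   with urllib.request.urlopen(request) as response:
#       rows = json.load(response)["rows"]
```

BI tools would normally use the ODBC or JDBC drivers instead; the REST form is shown only because it needs nothing beyond the standard library.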
16. Sqoop – Easy & Efficient
A Sullexis-developed direct-connect extract tool based on Sqoop was
seen as meeting all the technology and project factors:
• Addresses all source data
• Support for both Oracle and SQL Server
• Import direct to Parquet
• Supports type mapping
• Supports incremental imports and merges
• Enables validation via row count matches
• Provides for parallel imports for enhanced speed (but also allows for throttling)
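The capabilities listed above map onto standard options of the stock Sqoop 1 `import` command. The sketch below assembles such a command line; it is not the Sullexis tool, whose interface is not described here. The JDBC URL, table, directory, and column names are hypothetical, credentials are omitted, and whether a given combination of options is supported depends on the Sqoop version.

```python
def build_sqoop_import(jdbc_url, table, target_dir, check_column, last_value, mappers=4):
    """Assemble a stock `sqoop import` argument list (no credentials included)."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        "--as-parquetfile",               # write Parquet files directly
        "--incremental", "append",        # import only rows added since the last run
        "--check-column", check_column,   # column used to detect new rows
        "--last-value", str(last_value),  # highest value already imported
        "--num-mappers", str(mappers),    # parallelism; lower it to throttle the source
    ]


# Hypothetical connection details, for illustration only.
args = build_sqoop_import(
    jdbc_url="jdbc:oracle:thin:@//dbhost:1521/ORCL",
    table="INVOICES",
    target_dir="/archive/erp/invoices",
    check_column="INVOICE_ID",
    last_value=0,
    mappers=2,
)
print(" ".join(args))
```

Keeping the mapper count as a parameter reflects the trade-off noted in the list: more mappers shorten the import, fewer reduce load on a source system that is still in transactional use.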
17. Elasticsearch – Simple & Transparent
[Diagram labels: Reporting Client, Browser, Web UI, ODBC or JDBC, HTTP(S), edge node, node 0, node 1, node 2, POSIX Client, MapR-FS; PDF, TIFF, and CSV files on each node.]
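To make the HTTP(S) search path concrete, here is a minimal sketch of a full-text query against Elasticsearch's `_search` endpoint using a `match` query. The index name `legacy_docs`, the field name `content`, and the host are assumptions for illustration; the slide does not describe how the documents were actually indexed.

```python
import json


def build_search(index, text, size=10, host="localhost", port=9200):
    """Return the URL and JSON body for a simple full-text search request."""
    url = f"http://{host}:{port}/{index}/_search"
    body = {
        "query": {"match": {"content": text}},  # "content" is a hypothetical field
        "size": size,                            # maximum number of hits to return
    }
    return url, json.dumps(body)


# Hypothetical index and search text.
url, body = build_search("legacy_docs", "purchase order 4711", size=5)
print(url)
print(body)
```

A web UI would send this body with a POST request and render the returned hits as links to the underlying PDF, TIFF, or CSV files.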
18. Highlights
• Quick and easy startup
• Primary technical concerns around latency to the cloud can be successfully mitigated (e.g. client’s
cluster enabled transfer rates of 100-140 million records per hour)
• Although it is still early, the base business case is expected to pay back within a few months, and
business users have suggested that data access is easier now than it was
in the legacy systems
• This ERP legacy system decommissioning approach can be executed in as little as 2 months
for a complete data archive, or up to 6 months with robust operational reporting
• Provides repeatable tools and process available for future system decommissioning needs
• The client is already experimenting with the platform for use as an IoT sensor data
historian. So far the results have been encouraging
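The transfer rates quoted above are easier to judge when converted to per-second figures. The short calculation below does that, then estimates the load time for a hypothetical table of one billion records; that record count is an assumption chosen for illustration, not a figure from the project.

```python
SECONDS_PER_HOUR = 3600


def records_per_second(records_per_hour):
    """Convert an hourly transfer rate to records per second."""
    return records_per_hour / SECONDS_PER_HOUR


def hours_to_transfer(total_records, records_per_hour):
    """Hours needed to move total_records at a steady hourly rate."""
    return total_records / records_per_hour


low, high = 100_000_000, 140_000_000  # records per hour, from the slide

print(round(records_per_second(low)), round(records_per_second(high)))
# → 27778 38889

hypothetical_table = 1_000_000_000  # assumed size, for illustration only
print(f"{hours_to_transfer(hypothetical_table, low):.1f}")   # → 10.0
print(f"{hours_to_transfer(hypothetical_table, high):.1f}")  # → 7.1
```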
19. About Us
We are attuned to the challenges facing organizations in a variety of industries and understand the constant pressure to improve
business processes and make better decisions. But beyond that, we have a passion for technology. Using that passion, we help our
clients use proven technology coupled with our real-world knowledge to accelerate and improve the flow of data and information and
improve productivity. The technical improvements we provide equip our customers to make the best business decisions possible.
Helping our clients unleash the power of their data is our focus.
MapR on Azure: Getting Value from Big Data in the Cloud
20. Nearly 50 million Office Online users
Office for iOS has been downloaded over 80M times
23. Platform Services and Infrastructure Services
[Slide graphic: Azure service catalog. Services shown: Web Apps, Mobile Apps, API Apps, Notification Hubs, Hybrid Cloud, Backup, StorSimple, Azure Site Recovery, Import/Export, SQL Database, DocumentDB, Redis Cache, Azure Search, Storage Tables, SQL Data Warehouse, Azure AD Health Monitoring, AD Privileged Identity Management, Operational Analytics, Cloud Services, Batch, RemoteApp, Service Fabric, Visual Studio, Application Insights, VS Team Services, Domain Services, HDInsight, Machine Learning, Stream Analytics, Data Factory, Event Hubs, Data Lake Analytics Service, IoT Hub, Data Catalog, Security & Management, Azure Active Directory, Multi-Factor Authentication, Automation, Portal, Key Vault, Store/Marketplace, VM Image Gallery & VM Depot, Azure AD B2C, Scheduler, Xamarin, HockeyApp, Power BI Embedded, SQL Server Stretch Database, Mobile Engagement, Functions, Cognitive Services, Bot Framework, Cortana, Security Center, Container Service, VM Scale Sets, Data Lake Store, BizTalk Services, Service Bus, Logic Apps, API Management, Content Delivery Network, Media Services, Media Analytics.]