Informatica + MongoDB is a powerful combination that increases developer productivity up to 5x, enabling them to build and deploy big data applications much faster. Informatica provides access to virtually all types of data from modern and legacy systems at any latency, processes and integrates data at scale, and delivers it directly into MongoDB.
With Informatica, companies can unlock the data in MongoDB for downstream analytics to improve decision making and business operations. Using the Informatica PowerCenter Big Data Edition with the PowerExchange for MongoDB adapter, users can read and write data in MongoDB, parse JSON-based documents, and then transform the data and combine it with other information for big data analytics, all without writing a single line of code.
In this webinar you will see a live demo and learn how to:
- Discover insights from Big Data faster
- Run better applications with better data
- Lower costs of data integration
- Deliver business impact with rapid deployment
Partner Webinar: Deliver Big Data Apps Faster With Informatica & MongoDB
1. Deliver Big Data Apps Faster With Informatica & MongoDB
John Haddad, Senior Director, Big Data Product Marketing, Informatica
Gaurav Pathak, Senior Product Manager, Big Data, Informatica
4. Informatica for MongoDB
[Diagram: data sources — applications; relational, mainframe; documents and emails; social media, web logs; machine device, cloud — flow through Informatica in two directions: access, integrate, transform, and ingest data into MongoDB; and access, integrate, transform, and ingest MongoDB data into other analytic systems (e.g. Hadoop, data warehouse) feeding BI/analytic apps and the data warehouse and data marts.]
5. Informatica for MongoDB
Integrate and move data into MongoDB
[Diagram: a Customers table and a Policies table from relational and mainframe sources — alongside applications, documents and emails, social media and web logs, and machine device and cloud data — are accessed, integrated, transformed, and ingested into MongoDB as a Customer Policies document.]
6. Informatica for MongoDB
Integrate data on Hadoop for downstream analytics
[Diagram: an Orders table (relational, mainframe), a Product content document (documents and emails), and machine data (machine device, cloud; social media, web logs) are accessed, integrated, transformed, and ingested from MongoDB into other analytic systems (e.g. Hadoop, data warehouse) feeding BI/analytic apps and the data warehouse and data marts.]
7. Capabilities
• High Productivity Development Environment
• Universal Data Access
• High-Speed Data Ingestion and Extraction
• Metadata Discovery for Flexible Data Models
• Embedded Entity Access
• JSON Document Handling
• Distributed Data Access through Read Preferences
• Map Once, Deploy Anywhere
• Unified Administration and High Availability
9. Unleash the Power of Big Data
With high-performance Universal Data Access
• Messaging and Web Services: WebSphere MQ, JMS, MSMQ, SAP NetWeaver XI, Web Services, TIBCO, webMethods, HTTP
• Relational and Flat Files: Oracle, DB2 UDB, DB2/400, SQL Server, Sybase, Informix, Teradata, Netezza, ODBC, JDBC, flat files
• Mainframe and Midrange: ADABAS, Datacom, DB2, IDMS, IMS, VSAM, C-ISAM, binary flat files, tape formats, RPG
• Unstructured Data and Files: Word, Excel, PDF, StarOffice, WordPerfect, email (POP, IMAP), ASCII reports, HTML, LDAP, ANSI
• MPP Appliances: Pivotal, Vertica, Netezza, Teradata, Aster
• Packaged Applications: JD Edwards, SAP NetWeaver, SAP NetWeaver BI, Lotus Notes, Oracle E-Business, SAS, PeopleSoft, Siebel
• SaaS/BPO: Salesforce CRM, Force.com, RightNow, NetSuite, ADP, Hewitt, SAP By Design, Oracle OnDemand
• Industry Standards: EDI-X12, EDI-Fact, RosettaNet, HL7, HIPAA, AST, FIX, SWIFT, Cargo IMP, MVR
• XML Standards: XML, LegalXML, IFX, cXML, ebXML, HL7 v3.0, ACORD (AL3, XML)
• Social Media: Facebook, Twitter, LinkedIn, Kapow, Datasift
10. Metadata Discovery for Flexible Data Models
• Schema inference is done by sampling the first “n” rows.
• A union of fields is performed to create the representative schema.
• Schema information is cached in either the Mongo instance or a file.
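The sample-and-union approach above can be sketched in a few lines of Python. This is a hypothetical illustration of the idea, not PowerExchange's actual implementation; the function name and the in-memory document list are assumptions for the example.

```python
def infer_schema(collection, n=100):
    """Infer a representative schema by sampling the first n documents
    and taking the union of all fields seen across them."""
    schema = {}
    for doc in collection[:n]:  # against a live server this would be find().limit(n)
        for field, value in doc.items():
            # Record each field with its observed type; first type seen wins.
            schema.setdefault(field, type(value).__name__)
    return schema

# Documents with different shapes: the union picks up every field.
docs = [
    {"sku": "A1", "title": "Album", "price": 9.99},
    {"sku": "B2", "title": "Book", "pages": 320},
]
representative = infer_schema(docs)
```

Because the schema is a union, a field present in only some sampled documents (like `pages` above) still appears in the representative schema, which is why the discovered schema can then be edited down to the columns needed for analysis.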
12. JSON Document Handling
• Users can read and write JSON documents directly through the DocumentAsJSON port.
• Filtering while reading is available for the “Document as JSON” feature.
• When writing to the “DocumentAsJSON” port, there is no schema constraint.
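The “no schema constraint” point can be illustrated with plain JSON serialization: documents with entirely different shapes can be passed through a document-as-JSON style port as opaque strings. This is a minimal sketch using the standard `json` module; the sample documents are invented for the example.

```python
import json

# Two documents with different shapes: with no schema constraint,
# both can be written as-is through a document-as-JSON style port.
doc_a = {"sku": "A1", "title": "Album", "tracks": [{"no": 1, "name": "Intro"}]}
doc_b = {"sku": "B2", "title": "Book", "author": {"first": "Ann", "last": "Lee"}}

payloads = [json.dumps(d) for d in (doc_a, doc_b)]  # what the port would carry
restored = [json.loads(p) for p in payloads]        # round-trips losslessly
```

Since the port carries whole documents rather than mapped columns, nested arrays and embedded objects survive intact without any prior schema discovery.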
14. The Value of a Virtual Data Machine (like Vibe)
Integration flexibility: same skills, multiple deployment modes.
• Skills leverage
• Future-proof investment
• Development acceleration
[Diagram: develop once, deploy to desktop, server, cloud, or Hadoop, spanning data virtualization, data federation, embedded data quality in apps, and a data integration hub.]
18. Demo – Product Catalog Creation
• Sources
  • Read music data from a flat file
  • Read books data from a normalized database schema
  • Read movies data from a JSON file
• Lookup
  • Look up SKU details in MongoDB
• Target
  • Write to the ProductCatalog collection in MongoDB
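The demo flow above can be approximated in miniature: three differently shaped sources are normalized into one product shape and enriched via a SKU lookup before loading. This is a hypothetical pure-Python sketch; all record values and the `sku_details` lookup table are invented for illustration.

```python
# Three sources, as in the demo (values are placeholders):
music  = [{"sku": "M1", "title": "Album"}]   # from a flat file
books  = [{"sku": "B1", "title": "Book"}]    # from a normalized relational schema
movies = [{"sku": "V1", "title": "Movie"}]   # from a JSON file

# Lookup table standing in for the SKU-details lookup against MongoDB.
sku_details = {"M1": {"price": 9.99}, "B1": {"price": 15.0}, "V1": {"price": 12.5}}

product_catalog = []
for record in music + books + movies:
    # Merge the source record with its looked-up details.
    enriched = {**record, **sku_details.get(record["sku"], {})}
    product_catalog.append(enriched)
# product_catalog would then be written to the ProductCatalog collection.
```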
19. Demo – Batch Aggregation
• Sources
  • Read from the Orders collection in MongoDB
• Transform
  • Aggregate data based on Nation and ShipMode
• Target
  • Write to the Nation collection in MongoDB
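The aggregation step can be sketched as a group-by over the two keys. This pure-Python sketch mirrors what a MongoDB `$group` stage keyed on nation and ship mode would produce; the order records and field names are invented for the example.

```python
from collections import defaultdict

# Orders as they might come back from the Orders collection (illustrative data).
orders = [
    {"nation": "US", "ship_mode": "AIR",  "total": 100.0},
    {"nation": "US", "ship_mode": "AIR",  "total": 50.0},
    {"nation": "FR", "ship_mode": "SHIP", "total": 75.0},
]

# Group by (nation, ship_mode) and sum order totals.
totals = defaultdict(float)
for order in orders:
    totals[(order["nation"], order["ship_mode"])] += order["total"]
# Each (nation, ship_mode) bucket would become one document
# written to the target collection.
```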
20. Customer Benefits
• Discover Insights from Big Data Faster
• Run Better Applications with Better Data
• Lower Costs of Data Integration
• Deliver Business Impact with Rapid Deployment
21. Resources and Q & A
http://bit.ly/mongodbbi
• Configuring PowerExchange to connect to MongoDB: http://bit.ly/pwxmongodb1
• Create mappings that read and write to MongoDB: http://bit.ly/pwxmongodb2
• Write JSON documents directly to MongoDB: http://bit.ly/pwxmongodb3
Editor's notes
And of course, what we are seeing is not new. We have seen this phenomenon before as wave after wave of technology introductions came upon us, each bringing its own set of opportunities to deliver on bigger and better business outcomes. As has been clear over the past 10-15 years, though, the pace is increasing. Just consider that Facebook, Twitter, and iPhones weren't there a mere decade ago. Each wave brings new technologies: new applications, new computing platforms, new development languages. And it is not just the new that IT needs to deal with when we think of the disruptions. Almost every technology on this slide is still around in your data centers (mainframes, SAP ERP, Oracle databases), plus we have to bring in the new, such as Hadoop, VMware, and cloud services like Amazon Web Services. In addition to the technologies, one pervasive theme is the explosion in data collected and required across all of these different waves. This is not just the volume but the sheer number of connection points between the different information sources. In many ways, the information is the most important aspect in today's world, with the technologies becoming the medium whereby we collect or disseminate insights or insight-driven interactions.
We live in a world that's dominated by smart devices that are constantly connected to each other and the internet, where data is collected about more events or things than ever before, and where data promises to change the way the world operates in every aspect: how we run and operate companies, how we build new customer experiences, and how we build new products. We are in the process of building out a very big data-centric world! This new world impacts every industry and every domain. In aviation, companies like GE now embed sensors into aircraft engines that can evaluate engine performance constantly, allowing them to increase the fuel efficiency of those engines and optimize engine maintenance. In healthcare, genome sequencing is going to reach the mainstream in the next couple of years, allowing individuals to get it done at a local Walgreens. Imagine the impact on personalized healthcare and treatment for diseases or conditions like obesity. Business is connecting innovation to big data. Here are some common use cases across industries:
• Financial Services: risk and portfolio analysis, investment recommendations
• Retail & Telco: proactive customer engagement, location-based services
• Media & Entertainment: online and in-game behavior, customer cross-/up-sell
• Manufacturing: connected vehicle, predictive maintenance
• Healthcare & Pharma: predicting patient outcomes, total cost of care, drug discovery
• Public Sector: health insurance exchanges, public safety, tax optimization, fraud detection
Move policies into MongoDB: this is basically taking massive amounts of policy information from a dozen relational data sources, transforming it from relational form into hierarchical JSON documents, and populating MongoDB.
Once data is in production, take MongoDB data, transform it, and put it into a data warehouse. This is mostly new or changed data.
High Productivity Development Environment: Provides a visual, metadata-driven development environment so developers can achieve up to five times the productivity of hand-coding data integration from scratch.
Universal Data Access: Provides access to virtually all types of data, including RDBMS, mainframe, ERP, CRM, social media, machine and sensor device data, cloud applications, and industry-standards data (e.g. FIX, SWIFT, HL7, HIPAA, ASN.1, EDI).
High-Speed Data Ingestion and Extraction: Accesses, loads, transforms, and extracts big data between source and target systems and MongoDB at high speeds using high-performance connectivity.
Metadata Discovery for Flexible Data Models: Automatically discovers schemas by sampling records in a collection to provide a representative collection schema. Users can edit the discovered schema by adding or removing columns required for analysis.
Embedded Entity Access: Creates pivoted columns so users can easily access data from embedded documents and arrays and integrate data independent of the application data modeling in MongoDB.
JSON Document Handling: Interacts with MongoDB's BSON serialization format, with the ability to ingest complex JSON structures directly into MongoDB as documents or extract selected JSON elements for analysis.
Distributed Data Access through Read Preferences: Provides "read preferences" that allow users to choose which members of a MongoDB replica set to use for data integration. This lets users distribute data integration jobs to secondary nodes in a replica set so as not to affect the performance of the primary node and the main application load.
Map Once, Deploy Anywhere: Includes the Vibe virtual data machine so users can build data integration pipelines once and deploy them anywhere, such as distributed grid computing platforms like Hadoop, on-premise, or in the cloud.
Unified Administration and High Availability: Automatically schedule and coordinate data integration workflows with high availability and reliability using a unified administration console.
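Read preferences of this kind are expressed through MongoDB's standard connection-string `readPreference` option. A minimal sketch of building such a URI, assuming placeholder host names and replica set name:

```python
# Build a MongoDB connection string that routes reads to secondaries
# when available, per the standard connection-string "readPreference"
# option. Host names and the replica set name are placeholders.
hosts = ["replica-a:27017", "replica-b:27017", "replica-c:27017"]
uri = "mongodb://{}/?replicaSet=rs0&readPreference=secondaryPreferred".format(
    ",".join(hosts)
)
# A driver given this URI would direct read-heavy integration jobs to
# secondary members, keeping the primary free for application writes.
```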
Vibe is the industry's first and only embeddable virtual data machine to access, aggregate, and manage data, regardless of data type, source, volume, compute platform, or user. It lets you map once and deploy anywhere: you can take logic you defined on-premise, move it to the cloud, then move it to Hadoop or embed it in an application, all without recoding. This makes your architecture faster, more flexible, and future-proof.
Business benefits: five times faster turnaround from business idea to solution; adapt the technology to your business, not vice versa; utilize all your data, regardless of location, type, or volume.
IT benefits: five times faster project delivery; eliminate skills gaps when adopting new technologies and approaches; reduce the cost of maintaining a complex assortment of technologies.
Discover Insights from Big Data Faster: The power of big data means you can access and analyze all of your data. A growing number of the world's top companies and government agencies use MongoDB for applications today, which means more and more data is being stored in MongoDB. Using Informatica, data analysts can discover insights by combining data in MongoDB with data from other operational and analytic data stores across the enterprise.
Run Better Applications with Better Data: The ease of getting data into MongoDB from other enterprise and third-party systems is critical to big data applications delivering trusted, complete, and relevant information to business users. Informatica ensures that users can access and integrate all of their data with MongoDB and other enterprise data to provide a complete picture.
Lower Costs of Data Integration: Up to 80% of the work in a big data project involves data integration, and the resource skills to do the work are often in short supply. The good news is that organizations can staff big data projects from over 100,000 trained Informatica developers around the world. Informatica makes it easier to load and extract data into and from MongoDB using readily available skills, thereby lowering ongoing operational costs.
Deliver Business Impact with Rapid Deployment: MongoDB is the world's leading NoSQL database, designed for ease of development and scaling. By providing a flexible schema, it makes application development truly agile. Informatica is a leading provider of data integration software. By providing a single, consistent data integration approach for all types of data, including traditional relational structures, complex file formats, and flexible dynamic schemas such as in MongoDB, organizations are empowered to rapidly adopt MongoDB into their enterprise data management infrastructure for maximum business impact.