Informatica & Big Data
Sanjeev Kumar, VP & MD, Informatica India
Apache Hadoop India Summit 2011
Agenda
  - Big Data
  - Big Data in Enterprise
  - Informatica & Data
  - Informatica & Big Data
Why “Big Data” Now?: Exploding Data Volumes
  (Chart labels: Complex, Unstructured; Relational)
  - Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “zettabytes” this year.
  Source: An IDC White Paper, sponsored by EMC. As the Economy Contracts, the Digital Universe Expands. May 2009.
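The figures on this slide can be cross-checked with a little unit arithmetic. The sketch below assumes decimal SI prefixes (1 zettabyte = 1,000,000 petabytes). The variable names are illustrative, and the derived values are implications of the quoted numbers, not figures taken from the IDC paper.

```python
# Unit arithmetic for the figures quoted on the slide (decimal SI prefixes assumed).
PB_PER_ZB = 1_000_000  # 1 zettabyte = 10**21 bytes = 10**6 petabytes

current_pb = 800_000       # "800K petabytes" from the slide
growth_last_year = 0.62    # "grew by 62% last year"
projected_zb = 1.2         # "1.2 zettabytes this year"

current_zb = current_pb / PB_PER_ZB                # same size expressed in zettabytes
previous_pb = current_pb / (1 + growth_last_year)  # size implied one year earlier
implied_growth = projected_zb / current_zb - 1     # growth implied by the projection

print(f"current size: {current_zb:.1f} ZB")
print(f"implied size a year earlier: {previous_pb:,.0f} PB")
print(f"implied growth this year: {implied_growth:.0%}")
```

In other words, 800K petabytes is 0.8 zettabytes, so the projection to 1.2 zettabytes corresponds to roughly another 50% of growth.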
Why Now? Exploding Data Volumes
  - Explosion in user-generated content
      - e.g. blogs, Twitter, Facebook etc.
  - Proliferation of web-connected devices
      - Smartphone interactions with the web
  - Increased consumption of digital content
      - Netflix, Hulu, Pandora etc.
  - Internet of things
      - Smart-grid and smart-meters
      - Machine-generated data via the web
Why Now?: New Apps/Use-cases
  - Analyze customer/market sentiment
      - Text analytics on social media, blogs
  - Achieve operational efficiency
      - e.g. analyze CDRs (call detail records) to optimize cell tower placements
  - Make recommendations
      - Data mining on click-stream, purchase history
  - Predict the future
      - e.g. Flightcast predicts flight delays
Big Data Challenges
  - Storage
      - Cost-effective scalability: to multi-terabytes and petabytes
      - Non-traditional data models: complex, semi-structured data
  - Processing
      - Data mining, collaborative filtering for structured data
      - Text analytics, classification etc. for unstructured data
  - Regulatory Compliance
      - Data Privacy / Masking
      - Data Archival
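To make the "Data Privacy / Masking" item concrete, here is a minimal sketch of two common approaches: partial masking, which hides most of a value, and salted hashing, which replaces it with a stable pseudonym. The function names, the sample number, and the salt are hypothetical and are not taken from the talk or from any Informatica product.

```python
import hashlib

def mask_number(number: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters behind '*'."""
    if visible <= 0:
        return "*" * len(number)
    hidden = max(len(number) - visible, 0)
    return "*" * hidden + number[-visible:]

def pseudonymize(number: str, salt: str = "demo-salt") -> str:
    """Map a value to a stable 12-character token using salted SHA-256."""
    digest = hashlib.sha256((salt + number).encode("utf-8")).hexdigest()
    return digest[:12]

subscriber = "9876543210"  # made-up phone number
print(mask_number(subscriber))
print(pseudonymize(subscriber))
```

Partial masking keeps a value recognizable to a human operator, while the hashed token still lets records for the same subscriber be joined without exposing the number.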
Addressing Big Data Challenges
  - Storage
      - Parallel databases: Greenplum (EMC), Vertica, AsterData
      - Distributed key/value stores: HBase, Google’s BigTable, Amazon’s SimpleDB
      - Distributed file systems: HDFS, GFS, ParAccel
  - Analytics
      - SQL with extensions
      - MapReduce
      - Dataflow languages: Pig, Sawzall etc.
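The MapReduce model listed under Analytics can be illustrated without a cluster. The sketch below simulates the map, shuffle, and reduce phases in plain Python on a few made-up call detail records, in the spirit of the cell-tower use case mentioned earlier. It is a single-process illustration of the programming model, not Hadoop's API; the records and function names are assumptions.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical call detail records: (tower_id, call_duration_seconds).
cdrs = [
    ("tower-A", 120),
    ("tower-B", 45),
    ("tower-A", 300),
    ("tower-C", 60),
    ("tower-B", 15),
]

def map_phase(record):
    """Emit one (key, value) pair per record: one call seen at this tower."""
    tower_id, _duration = record
    yield tower_id, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)

mapped = chain.from_iterable(map_phase(record) for record in cdrs)
calls_per_tower = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(calls_per_tower)
```

On a real cluster the map and reduce functions run in parallel across nodes and the shuffle moves data over the network; only the two user-written functions would carry over from this sketch.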
Hadoop Technology Stack
  (Diagram labels: Pig, Hive, Cascading, ZooKeeper, Map/Reduce, HBase, HDFS)
Hadoop Momentum
  (Charts: Job Trends from Indeed.com; Search Volume Index; News Reference Volume)
Big Data in the Enterprise – Hadoop Usage
Big Data in the Enterprise
Case Studies: Hadoop World 2009
  - Yahoo!: Social Graph Analysis
  - VISA: Large Scale Transaction Analysis
  - China Mobile: Data Mining Platform for Telecom Industry
  - JP Morgan Chase: Data Processing for Financial Services
  - eHarmony: Matchmaking in the Hadoop Cloud
  - Rackspace: Cross Data Center Log Processing
  - Visible Technologies: Real-Time Business Intelligence
  - Booz Allen Hamilton: Protein Alignment using Hadoop
  Slides and videos at http://www.cloudera.com/hadoop-world-nyc
Big Data in the Enterprise
Case Studies: Hadoop World 2010
  - eBay: Hadoop at eBay
  - Twitter: The Hadoop Ecosystem at Twitter
  - General Electric: Sentiment Analysis powered by Hadoop
  - Yale University: MapReduce and Parallel Database Systems
  - AOL: AOL’s Data Layer
  - Facebook: HBase in Production
  - Bank of America: The Business of Big Data
  - StumbleUpon: Mixing Real-Time and Batch Processing
  - Raytheon: SHARD: Storing and Querying Large-Scale Data
  More info at http://www.cloudera.com/company/press-center/hadoop-world-nyc/
Agenda
  - Big Data
  - Big Data in Enterprise
  - Informatica & Data
  - Informatica & Big Data
Informatica – Our Singular Mission
Enabling The Information Economy
We enable organizations to gain a competitive advantage from all their information assets to drive their top business imperatives.
Informatica – What We Do
Comprehensive, Unified, Open and Economical platform
  (Diagram labels: Application; Partner Data (SWIFT, NACHA, HIPAA, …); Cloud Computing; Unstructured; Database; Complex Event Processing; Data Warehouse; Data Migration; Test Data Management & Archiving; Master Data Management; Data Synchronization; B2B Data Exchange; Data Consolidation; UltraMessaging)
Informatica & Data
Verbs on Data – We do things to data!
INFA = Data + [ Archival | As a Service | Cleansing | Clustering | Consolidation | Conversion | De-duping | Exchange | Extraction | Federation | Hub | Identity | Integration | Life-cycle Management | Loading | Masking | Mastering | Matching | Migration | On Demand | Privacy | Profiling | Provisioning | Quality | Quality Assessment | Registry | Replication | Retirement | Services | Stewardship | Sub-setting | Synchronization | Test Management | Transformation | Validation | Virtualization | Warehousing ]
Informatica & Big Data
  - HDFS as a source and a target: enable universal data connectivity for Hadoop developers
  - Enable Hadoop developers to leverage prebuilt Data Transformation and Data Quality logic
  - Lower the barrier to Hadoop entry by using Informatica Developer as a development tool
  - Support virtualized access to data split across HDFS and (relational) data warehouses
Informatica & Hadoop – Big Picture
  (Diagram labels: Enterprise Connectivity for Hadoop programs; Weblogs; Databases; BI; DW/DM; Metadata Repository; Graphical IDE for Hadoop Development; Semi-structured; Un-structured; Enterprise Applications; Transformation Engine for custom data processing; Hadoop Cluster; HDFS; Job Tracker; Name Node; Data Node)

Containerized Services on Apache Hadoop YARN: Past, Present, and Future, Shan...
 
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, OathHDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
HDFS Scalability and Security, Daryn Sharp, Senior Engineer, Oath
 
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
Hadoop {Submarine} Project: Running deep learning workloads on YARN, Wangda T...
 
Moving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, OathMoving the Oath Grid to Docker, Eric Badger, Oath
Moving the Oath Grid to Docker, Eric Badger, Oath
 
Architecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI ApplicationsArchitecting Petabyte Scale AI Applications
Architecting Petabyte Scale AI Applications
 
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
Introduction to Vespa – The Open Source Big Data Serving Engine, Jon Bratseth...
 
Jun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step BeyondJun 2017 HUG: YARN Scheduling – A Step Beyond
Jun 2017 HUG: YARN Scheduling – A Step Beyond
 
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
Jun 2017 HUG: Large-Scale Machine Learning: Use Cases and Technologies
 
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
February 2017 HUG: Slow, Stuck, or Runaway Apps? Learn How to Quickly Fix Pro...
 
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache ApexFebruary 2017 HUG: Exactly-once end-to-end processing with Apache Apex
February 2017 HUG: Exactly-once end-to-end processing with Apache Apex
 
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data AnalyticsFebruary 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
February 2017 HUG: Data Sketches: A required toolkit for Big Data Analytics
 

Informatica & Big Data - Leveraging Hadoop for Enterprise Analytics

  • 1. Informatica & Big Data. Sanjeev Kumar, VP & MD, Informatica India. Apache Hadoop India Summit 2011
  • 2. Agenda: Big Data; Big Data in Enterprise; Informatica & Data; Informatica & Big Data
  • 3. Why “Big Data” Now?: Exploding Data Volumes. Complex, Unstructured; Relational
  • 4. Digital universe grew by 62% last year to 800K petabytes and will grow to 1.2 “zettabytes” this year. Source: An IDC White Paper sponsored by EMC, “As the Economy Contracts, the Digital Universe Expands,” May 2009.
  • 5. Why Now? Exploding Data Volumes: explosion in user-generated content (e.g. blogs, Twitter, Facebook); proliferation of web-connected devices (smartphone interactions with the web); increased consumption of digital content (Netflix, HULU, Pandora, etc.); Internet of things (smart-grid and smart-meters, machine-generated data via the web)
  • 6. Why Now?: New Apps/Use-cases. Analyze customer/market sentiment (text analytics on social media, blogs); achieve operational efficiency (e.g. analyze CDRs to optimize cell tower placements); make recommendations (data mining on click-stream, purchase history); predict the future (e.g. Flightcast predicts flight delays)
  • 7. Big Data Challenges. Storage: cost-effective scalability to multi-terabytes and petabytes; non-traditional data models (complex, semi-structured data). Processing: data mining, collaborative filtering for structured data; text analytics, classification, etc. for unstructured data. Regulatory compliance: data privacy / masking; data archival
  • 8. Addressing Big Data Challenges. Storage: parallel databases (Greenplum (EMC), Vertica, AsterData); distributed key/value stores (HBase, Google’s BigTable, Amazon’s SimpleDB); distributed file systems (HDFS, GFS, ParAccel). Analytics: SQL with extensions; Map Reduce; dataflow languages (Pig, Sawzall, etc.)
  • 9. Hadoop Technology Stack: Pig, Hive, Cascading, ZooKeeper, Map/Reduce, HBase, HDFS
  • 10. Hadoop Momentum: job trends from Indeed.com; search volume index; news reference volume
  • 11. Big Data in the Enterprise – Hadoop Usage
  • 12. Big Data in the Enterprise. Case Studies: Hadoop World 2009. Yahoo!: Social Graph Analysis; VISA: Large Scale Transaction Analysis; China Mobile: Data Mining Platform for Telecom Industry; JP Morgan Chase: Data Processing for Financial Services; eHarmony: Matchmaking in the Hadoop Cloud; Rackspace: Cross Data Center Log Processing; Visible Technologies: Real-Time Business Intelligence; Booz Allen Hamilton: Protein Alignment using Hadoop. Slides and videos at http://www.cloudera.com/hadoop-world-nyc
  • 13. Big Data in the Enterprise. Case Studies: Hadoop World 2010. eBay: Hadoop at eBay; Twitter: The Hadoop Ecosystem at Twitter; General Electric: Sentiment Analysis powered by Hadoop; Yale University: MapReduce and Parallel Database Systems; AOL: AOL’s Data Layer; Facebook: HBase in Production; Bank of America: The Business of Big Data; StumbleUpon: Mixing Real-Time and Batch Processing; Raytheon: SHARD: Storing and Querying Large-Scale Data. More info at http://www.cloudera.com/company/press-center/hadoop-world-nyc/
  • 14. Agenda: Big Data; Big Data in Enterprise; Informatica & Data; Informatica & Big Data
  • 15. Informatica – Our Singular Mission: Enabling the Information Economy. We enable organizations to gain a competitive advantage from all their information assets to drive their top business imperatives
  • 16. Informatica – What We Do: Comprehensive, Unified, Open and Economical platform. Application; Partner Data; SWIFT; NACHA; HIPAA; …; Cloud Computing; Unstructured; Database; Complex Event Processing; Data Warehouse; Data Migration; Test Data Management & Archiving; Master Data Management; Data Synchronization; B2B Data Exchange; Data Consolidation; UltraMessaging
  • 17. Informatica & Data: Verbs on Data. We do things to data! INFA = Data + [ Archival | As a Service | Cleansing | Clustering | Consolidation | Conversion | De-duping | Exchange | Extraction | Federation | Hub | Identity | Integration | Life-cycle Management | Loading | Masking | Mastering | Matching | Migration | On Demand | Privacy | Profiling | Provisioning | Quality | Quality Assessment | Registry | Replication | Retirement | Services | Stewardship | Sub-setting | Synchronization | Test Management | Transformation | Validation | Virtualization | Warehousing ]
  • 18. Informatica & Big Data: HDFS as a source and a target (enable universal data connectivity for Hadoop developers); enable Hadoop developers to leverage prebuilt Data Transformation and Data Quality logic; lower the barrier to Hadoop entry by using Informatica Developer as a development tool; support virtualized access to data split across HDFS and (relational) data warehouses
  • 19. Informatica & Hadoop – Big Picture: Enterprise Connectivity for Hadoop programs; Weblogs; Databases; BI; DW/DM; Metadata Repository; Graphical IDE for Hadoop Development; Semi-structured; Un-structured; Enterprise Applications; Transformation Engine for custom data processing; Hadoop Cluster; HDFS; Job Tracker; HDFS; Name Node; Data Node; HDFS

Editor's notes

  1. Map/Reduce implementation. Apache open source project: Yahoo dominated. Two major components: HDFS (failure-resilient distributed file system) and Map/Reduce (failure-resilient distributed computing framework). Scales to thousand+ node clusters. Used by Yahoo, Facebook, etc.
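
To make the Map/Reduce component in the note above more concrete, here is a minimal single-process sketch of the programming model in Python. It is an illustration only, not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and it leaves out everything that makes the real framework distributed and failure resilient (HDFS blocks, task scheduling, retries).

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence inside one process."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

In a real cluster the map and reduce calls would run as separate tasks on many nodes and the shuffle would move data across the network; the sketch only shows how the three stages fit together.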