Welcome to Our June Meeting
June 13, 2013 1
About SCQAA-SF: A Not-for-Profit Organization
• The SCQAA-SF chapter (www.scqaa.net) sponsors the sharing of information to promote and encourage improvement in information technology quality practices and principles through networking, training and professional development.
• Networking: We meet once every two months in the San Fernando Valley.
• Check us out on LinkedIn (SCQAA-SF).
• Contact Sujit at sujit58@gmail.com or call 818-878-0834.
Membership Benefits:
• Excellent speaker presentations on
advancements in technology and
methodology
• Networking opportunities
• PDU, CSTE and CSQA credits
• Regular meetings are free for members
and include dinner
Membership Policy
• We recently revised our membership dues policy to better accommodate member needs and current economic conditions.
• Annual membership is $50, or $35 for those who are between jobs.
• Please check your renewal status with Cheryl Leoni. If you have recently joined or renewed, please check before renewing again.
Sunil Sabat
Data Practitioner, Scientist, Architect
Insights to Big Data and Quality
Ref: Jan 2012, for SoCalCodeCamp
Agenda
• Big Data and modern data management
• Old BI and New BI
• Hadoop Frameworks
• Big Data Quality – Hybrid Approach
• Big Data Processing - ETL
• Examples of Hadoop ETL/QA
• Big Data QA To-Do List
• Q/A
Big Data
• Today, roughly 80% of useful data is unstructured and 20% is structured
• Old-style data warehouses are difficult and very expensive to build and maintain
• Today, business needs are real-time and driven by actionable insight
• Big Data is characterized by volume, variety, velocity and veracity
• Fact: businesses need actionable intelligence to succeed
Modern Data Management Hub
Obama Election and Big Data
• “The Obama campaign found a way to integrate social media, technology, email databases, fundraising databases and consumer market data,” said GOP digital strategist Vincent Harris, who did digital work for Newt Gingrich and Rick Perry in 2012. “That does not exist on the Republican side to that degree.” The gap worked to the detriment of Mitt Romney’s campaign (quoted by Politico, “GOP seeks to up its online game,” December 8, 2012). For more on how the Obama campaign used big data, see BusinessWeek’s November 29, 2012 article “The Science Behind Those Obama Campaign Emails.”
BI = ‘Current State’ Questions
• What did we sell?
• When did we sell it?
• Where did we sell it?
• What did we sell with it?
Collecting transactional data
Big Data = ‘Next State’ BI Questions
• What could happen?
• Why didn’t this happen?
• When will the next new thing happen?
• What will the next new thing be?
• What should happen?
Collecting behavioral and temporal data
Comparing old and new BI data
               Old BI data                New BI data
Data size      Gigabytes (terabytes)      Petabytes (exabytes)
Access         Interactive and batch      Batch
Updates        Read/write many times      Write once, read many times
Structure      Static schema              Dynamic schema
Integrity      High (ACID)                Low
Scaling        Nonlinear                  Linear
DBA ratio      1:40                       1:3000
Reference: Tom White’s Hadoop: The Definitive Guide
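The “Dynamic schema” and “Write once, read many times” rows describe schema-on-read: raw records are stored unchanged, and a structure is applied only when they are read. A minimal Python sketch, assuming newline-delimited JSON records whose field names (user, page, ms, device) are hypothetical:

```python
import json

# Hypothetical raw, write-once records: each line may carry different fields.
RAW_LINES = [
    '{"user": "a1", "page": "/home", "ms": 120}',
    '{"user": "b2", "page": "/cart"}',
    '{"user": "c3", "page": "/home", "ms": 95, "device": "tablet"}',
]


def read_with_schema(lines, schema):
    """Apply a schema at read time: keep only the requested fields and
    fill missing ones with None instead of rejecting the record."""
    for line in lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in schema}


if __name__ == "__main__":
    # Two different "views" over the same stored data, with no upfront schema.
    for row in read_with_schema(RAW_LINES, ["user", "ms"]):
        print(row)
    for row in read_with_schema(RAW_LINES, ["page", "device"]):
        print(row)
```

A static-schema warehouse would instead reject or transform the second and third records at load time.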
Deeper Comparison Chart
Is Data Science your next Career?
R language
Hadoop: MapR, Hortonworks, Cloudera, IBM, Apache, …
Oracle Loader for Hadoop
SQL Server Connector for Hadoop
Hadoop on Azure
Amazon AWS
Google App Engine Data
Google: MySQL and Cloud Storage
Big Data QA Process
• Hybrid approach: use traditional Perl-style scripting, tools and JUnit tests on the destination side
• Use Hadoop jobs to refine unstructured data and perform ETL on the source side
• Improve the upstream QA process so that most ETL/QA happens at the source
• Leverage Hadoop infrastructure for data mining
• Fact: the Big Data QA window is getting smaller
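Pushing QA to the source side can be as simple as tagging every record before it moves downstream. A minimal sketch of a source-side check written in the style of a Hadoop Streaming mapper (one tab-separated key/value pair per output line), assuming a hypothetical three-field CSV layout (order_id,customer_id,amount) and hypothetical rejection codes:

```python
# Hypothetical record layout: order_id,customer_id,amount
EXPECTED_FIELDS = 3


def validate(line):
    """Return (status, payload) for one raw CSV line."""
    fields = line.rstrip("\n").split(",")
    if len(fields) != EXPECTED_FIELDS:
        return "reject", f"bad_field_count:{len(fields)}"
    order_id, customer_id, amount = (f.strip() for f in fields)
    if not order_id or not customer_id:
        return "reject", "missing_key"
    try:
        value = float(amount)
    except ValueError:
        return "reject", "non_numeric_amount"
    if value < 0:
        return "reject", "negative_amount"
    # Normalize the amount so downstream tests see one consistent format.
    return "valid", f"{order_id},{customer_id},{value:.2f}"


def mapper(lines):
    """Emit 'status<TAB>payload' lines, the default key/value format
    that Hadoop Streaming expects from a mapper."""
    for line in lines:
        if line.strip():  # skip blank lines
            status, payload = validate(line)
            yield f"{status}\t{payload}"


if __name__ == "__main__":
    sample = ["1001,C42,19.5\n", "1002,,5\n", "1003,C7,abc\n"]
    for out in mapper(sample):
        print(out)
```

In an actual streaming job the script would iterate over `sys.stdin` instead of `sample`; the "valid" records can then be checked again on the destination side with conventional scripts or JUnit tests, as the hybrid approach suggests.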
Microsoft SSIS - Hadoop ETL
• Use an ODBC driver to extract data from any Hadoop (HDFS) cluster
• Use HDInsight (Microsoft’s Hadoop distribution) as the data store
• Use SSIS for ETL
• Source lookups (reference data) from Melissa Data and others
• Load into SQL Server
Reference URL:
http://sqlmag.com/blog/use-ssis-etl-hadoop
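The lookup and load steps of such a pipeline can be sketched outside SSIS. The example below is a minimal Python illustration, not SSIS code: an in-memory ZIP-code table stands in for a commercial reference-data service (it is NOT the Melissa Data API), sqlite3 stands in for SQL Server, and the table and column names are hypothetical.

```python
import sqlite3

# Hypothetical reference data set standing in for a commercial lookup service.
ZIP_REFERENCE = {
    "91324": ("Northridge", "CA"),
    "91367": ("Woodland Hills", "CA"),
}


def enrich(rows):
    """Look up each row's ZIP code; split rows into matched and unmatched."""
    matched, unmatched = [], []
    for name, zip_code in rows:
        ref = ZIP_REFERENCE.get(zip_code.strip())
        if ref is None:
            unmatched.append((name, zip_code))  # route to a review queue
        else:
            city, state = ref
            matched.append((name, zip_code.strip(), city, state))
    return matched, unmatched


def load(conn, matched):
    """Load cleansed rows; sqlite3 stands in for SQL Server here."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(name TEXT, zip TEXT, city TEXT, state TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", matched)
    conn.commit()


if __name__ == "__main__":
    extracted = [("Ann", "91324"), ("Raj", " 91367"), ("Lee", "00000")]
    good, bad = enrich(extracted)
    conn = sqlite3.connect(":memory:")
    load(conn, good)
    print(conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0])  # 2
    print(bad)  # [('Lee', '00000')]
```

Keeping unmatched rows separate, rather than silently dropping them, gives QA a concrete list of source records to investigate.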
Amazon EMR - Hadoop ETL
• Design and code a job on Amazon AWS using EMR (Elastic MapReduce)
• Source lookups from Melissa Data and others
• Run the job to do ETL
• Read from and write to S3 buckets
• Use open-source Pig (Pig Latin) scripts and Java UDFs for ETL
Reference URL:
http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-etl.html
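To make the job’s ETL step concrete, here is a local, in-process simulation of the map, shuffle and reduce phases in plain Python. It is not EMR, S3 or Pig code; the input layout (region,amount lines) and the aggregation (total amount per region, roughly what a Pig GROUP followed by SUM would produce) are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Parse 'region,amount' lines and drop malformed ones (the 'T' in ETL)."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2 or not parts[0].strip():
            continue
        region, amount = parts[0].strip().lower(), parts[1].strip()
        try:
            value = float(amount)
        except ValueError:
            continue
        yield region, value


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group; here, the total amount per region."""
    return {key: sum(values) for key, values in sorted(groups.items())}


if __name__ == "__main__":
    raw = ["West,10.0", "east,5", "WEST,2.5", "bad-row", "east,oops"]
    print(reduce_phase(shuffle_phase(map_phase(raw))))
    # {'east': 5.0, 'west': 12.5}
```

On a real cluster the same three roles are distributed across nodes, with input and output read from and written to S3 rather than Python lists.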
Google – Freebase & Refine
Karmasphere Studio
for Amazon Elastic MapReduce
Hadoop Connector to Excel
BI > Big Data QA To-Do List
Get trained and store some (more) data in the cloud
• Relational and non-relational
Process some data in the cloud
• Do ETL and QA
• Try data mining
• Learn about data science
Update your client tools
• New UI (touch, gestures)
• Click to query
• New form factors (phone, tablet)
Keep Up With Big Data QA
• Learn Big Data now (NRIT is a boot-camp training provider): learn to write ETL/QA jobs and to query HDFS using ODBC
• Assume source data is not clean; do upstream ETL and QA using lookups and reference data sets
• Fact: most Fortune 500 companies now use Hadoop for fast analytics and insights
• Fact: investment in Hadoop ultimately depends on BI/analytics, as the Obama election example shows
• Fact: QA matters; garbage in, garbage out is still true!
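One lightweight way to act on “garbage in, garbage out” is to profile each incoming batch before it is loaded. A minimal Python sketch, assuming records arrive as dicts; the metrics (per-field completeness, duplicate key count) and the 95% completeness threshold are hypothetical choices, not a standard:

```python
from collections import Counter


def profile(records, fields, key):
    """Return the row count, per-field completeness and duplicate-key count."""
    total = len(records)
    completeness = {}
    for field in fields:
        # Treat None and empty strings as missing values.
        filled = sum(1 for r in records if r.get(field) not in (None, ""))
        completeness[field] = filled / total if total else 0.0
    key_counts = Counter(r.get(key) for r in records)
    duplicates = sum(count - 1 for count in key_counts.values() if count > 1)
    return {"rows": total, "completeness": completeness, "duplicate_keys": duplicates}


def passes(report, min_completeness=0.95):
    """Hypothetical gate: fail the batch if any field is too sparse or keys repeat."""
    return report["duplicate_keys"] == 0 and all(
        rate >= min_completeness for rate in report["completeness"].values()
    )


if __name__ == "__main__":
    batch = [
        {"id": 1, "email": "a@example.com"},
        {"id": 2, "email": ""},
        {"id": 2, "email": "c@example.com"},
        {"id": 3, "email": "d@example.com"},
    ]
    report = profile(batch, ["id", "email"], key="id")
    print(report)
    print(passes(report))  # False
```

Failing a batch early keeps bad data out of the warehouse instead of discovering it later in reports.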
Questions?
Please contact NRIT at www.nritinc.com or
sunil.sabat@gmail.com
Available on LinkedIn and Twitter (@ssabat)
NRIT Big Data Architecture
NRIT and BIG DATA BI

Big Data Presentation at SCQAA-SF on June 12 2013

  • 1. Welcome to Our June Meeting June 13, 2013 1
  • 2. • SCQAA-SF (www.scqaa.net) chapter sponsors the sharing of information to promote and encourage the improvement in information technology quality practices and principles through networking, training and professional development. • Networking: We meet once in 2 months in San Fernando Valley. • Check us out on LinkedIn (SCQAA-SF) • Contact Sujit at sujit58@gmail.com or call 818-878-0834 About SCQAA-SF- A Not-for Profit Organization June 13, 2013 2
  • 3. Membership Benefits: • Excellent speaker presentations on advancements in technology and methodology • Networking opportunities • PDU, CSTE and CSQA credits • Regular meetings are free for members and include dinner June 13, 2013 3
  • 4. Membership Policy • Recently revised our membership dues policy to better accommodate member needs and current economic conditions. • Annual membership is $50, or $35 for those who are in between jobs. • Please check your renewal with Cheryl Leoni. If you have recently joined or renewed, please check before renewing again June 13, 2013 4
  • 5. Sunil Sabat, Data Practitioner, Scientist, Architect: Insights to Big Data and Quality (Ref: Jan 2012, for SoCalCodeCamp)
  • 6. Agenda • Big Data and modern data management • Old BI and New BI • Hadoop Frameworks • Big Data Quality – Hybrid Approach • Big Data Processing - ETL • Examples of Hadoop ETL/QA • Big Data QA ToDo • Q/A
  • 7. Big Data • Today, useful data is 80% unstructured and 20% structured • Old-style warehouses are not easy to build and are very expensive to build and maintain • Today, business needs are real-time and driven by actionable insight • Big Data features volume, variety, velocity and veracity • Fact: businesses need actionable intelligence to succeed
  • 9. Obama Election and Big Data • “The Obama campaign found a way to integrate social media, technology, email databases, fundraising databases and consumer market data,” said GOP digital strategist Vincent Harris, who did digital work for Newt Gingrich and Rick Perry in 2012. “That does not exist on the Republican side to that degree,” he said, to the detriment of Mitt Romney’s campaign (quoted by Politico, “GOP seeks to up its online game,” December 8, 2012). For more on how the Obama campaign used big data, see BusinessWeek’s November 29, 2012 article “The Science Behind Those Obama Campaign Emails”.
  • 10. BI = ‘Current State’ Questions • What did we sell? • When did we sell it? • Where did we sell it? • What did we sell with it? Collecting transactional data
  • 11. BigData = ‘Next State’ BI Questions • What could happen? • Why didn’t this happen? • When will the next new thing happen? • What will the next new thing be? • What should happen? Collecting behavioral temporal data
  • 12. Comparing old and new BI data
      |            | Old BI data            | New BI data                  |
      |------------|------------------------|------------------------------|
      | Data size  | Gigabytes (terabytes)  | Petabytes (exabytes)         |
      | Access     | Interactive and batch  | Batch                        |
      | Updates    | Read/write many times  | Write once, read many times  |
      | Structure  | Static schema          | Dynamic schema               |
      | Integrity  | High (ACID)            | Low                          |
      | Scaling    | Nonlinear              | Linear                       |
      | DBA ratio  | 1:40                   | 1:3000                       |
    Reference: Tom White’s Hadoop: The Definitive Guide
  • 14. Is Data Science your next Career?
  • 18. SQL Server Connector for Hadoop
  • 22. Google – MySQL & Cloud Storage
  • 23. Big Data QA Process • Hybrid approach: use traditional Perl-style scripting, tools, and JUnit tests on the destination side • Use Hadoop jobs to refine and ETL unstructured data on the source side • Improve the upstream QA process so that most ETL/QA happens at the source • Leverage Hadoop infrastructure for data mining • Fact: the Big Data QA window is getting smaller
  • 24. Microsoft SSIS: Hadoop ETL • Use an ODBC driver to extract data from Hadoop HDFS • Use HDInsight (Microsoft Hadoop) as the data store • Use SSIS for ETL • Source lookups from Melissa Data and others • Load to SQL Server. Reference URL: http://sqlmag.com/blog/use-ssis-etl-hadoop
  • 25. Amazon EMR: Hadoop ETL • Design and code a job on Amazon AWS using EMR (Elastic MapReduce) • Source lookups from Melissa Data and others • Run the job to do ETL • Read from and write to S3 buckets • Use open-source Pig Latin and Java UDFs for ETL. Reference URL: http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-etl.html
  • 27. Karmasphere Studio for Amazon Elastic MapReduce
  • 29. BI > Big Data QA ‘To Do’ List • Get trained • Store some (more) data in the cloud: relational and non-relational • Process some data in the cloud: do ETL and QA • Try data mining • Learn about data science • Update your client tools: new UI (touch, gestures), click to query, new form factors (phone, tablet)
  • 30. Keep Up With Big Data QA • Learn Big Data now (NRIT is a bootcamp training provider); learn to write ETL/QA jobs and to query HDFS using ODBC • Assume source data is not clean; do upstream ETL and QA using lookups and reference data sets • Fact: Hadoop is now being used by most Fortune 500 companies for fast analytics and insights • Fact: investment in Hadoop ultimately depends on BI/analytics (e.g., the Obama election) • Fact: QA matters; garbage in, garbage out is still TRUE!
  • 31. Questions? Please contact NRIT at www.nritinc.com or sunil.sabat@gmail.com. Available on LinkedIn and Twitter (@ssabat).
  • 32. NRIT Big Data Architecture
  • 33. NRIT and BIG DATA BI
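Slides 23 and 25 recommend running Hadoop jobs (for example on Amazon EMR) to refine and ETL raw data at the source. The sketch below illustrates that idea as a Hadoop Streaming-style mapper and reducer in Python. It is not the presenter's code (the slides mention Pig Latin and Java UDFs); the tab-separated log layout (timestamp, user_id, page) and the counter name `QA,malformed_records` are hypothetical, and `run_locally` only simulates the shuffle/sort step in one process for testing.

```python
import io
import sys
from itertools import groupby


def mapper(lines, err=sys.stderr):
    """Drop malformed records and emit 'page<TAB>1' for each clean one."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Hypothetical layout: timestamp<TAB>user_id<TAB>page; user_id is required.
        if len(fields) != 3 or not fields[1].strip():
            # Hadoop Streaming reads counter updates written to stderr in this form.
            err.write("reporter:counter:QA,malformed_records,1\n")
            continue
        page = fields[2].strip().lower()  # normalize the key
        yield f"{page}\t1"


def reducer(lines):
    """Sum the counts for each key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_locally(raw_lines):
    """Simulate map -> shuffle/sort -> reduce in one process for testing."""
    err = io.StringIO()
    mapped = sorted(mapper(raw_lines, err=err))
    reduced = list(reducer(mapped))
    rejects = err.getvalue().count("\n")  # one counter line per rejected record
    return reduced, rejects


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. "python etl.py map" as the mapper command, "python etl.py reduce" as the reducer
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Hadoop sorts mapper output by key before the reducer runs, which is why `reducer` can rely on `itertools.groupby` seeing each key as one contiguous run. Counting rejects instead of silently dropping them keeps a QA signal attached to every ETL run.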
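Slide 30 advises assuming that source data is not clean and doing upstream QA with lookups against reference data sets (slides 24 and 25 name Melissa Data among lookup sources). A minimal sketch of lookup-based cleansing follows, assuming a hypothetical in-memory `STATE_CODES` table; no vendor API is shown, and a real pipeline would load curated reference data instead.

```python
from dataclasses import dataclass, field

# Hypothetical reference data set: raw spellings -> standard state codes.
STATE_CODES = {
    "california": "CA", "calif": "CA", "ca": "CA",
    "new york": "NY", "ny": "NY",
    "texas": "TX", "tex": "TX", "tx": "TX",
}


@dataclass
class LookupResult:
    clean: list = field(default_factory=list)
    rejects: list = field(default_factory=list)

    @property
    def match_rate(self):
        """Share of records that matched the reference data."""
        total = len(self.clean) + len(self.rejects)
        return len(self.clean) / total if total else 1.0


def standardize_states(records, reference=STATE_CODES):
    """Replace each record's 'state' with its reference code; reject unknowns."""
    result = LookupResult()
    for rec in records:
        # Normalize whitespace, a trailing period, and case before the lookup.
        raw = (rec.get("state") or "").strip().rstrip(".").lower()
        code = reference.get(raw)
        if code is None:
            result.rejects.append(rec)  # keep the original record for review
        else:
            result.clean.append({**rec, "state": code})  # copy, don't mutate the source
    return result
```

Routing unmatched values to a reject list, rather than guessing, keeps the "garbage in, garbage out" risk visible, and the match rate can be tracked per load as an upstream QA metric.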
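Slide 23's hybrid approach keeps conventional tests (JUnit in the slide) on the destination side. The sketch below shows the same idea with Python's `unittest`, which follows the xUnit pattern JUnit popularized: a hypothetical `reconcile` helper compares a source extract with the loaded destination rows on row counts, key sets, and a summed measure. The `id`/`amount` column names and the fixtures are assumptions for illustration.

```python
import unittest


def reconcile(source_rows, dest_rows, key="id", measure="amount"):
    """Compare a source extract with what landed in the destination."""
    src_keys = {row[key] for row in source_rows}
    dst_keys = {row[key] for row in dest_rows}
    src_total = sum(row[measure] for row in source_rows)
    dst_total = sum(row[measure] for row in dest_rows)
    return {
        "source_count": len(source_rows),
        "dest_count": len(dest_rows),
        "missing_in_dest": sorted(src_keys - dst_keys),
        "unexpected_in_dest": sorted(dst_keys - src_keys),
        "measure_diff": round(src_total - dst_total, 2),
    }


class DestinationLoadTest(unittest.TestCase):
    # Hypothetical fixtures; a real test would query the source extract
    # and the loaded destination table instead.
    SOURCE = [{"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}, {"id": 3, "amount": 4.5}]
    DEST = [{"id": 3, "amount": 4.5}, {"id": 1, "amount": 10.0}, {"id": 2, "amount": 5.5}]

    def setUp(self):
        self.report = reconcile(self.SOURCE, self.DEST)

    def test_row_counts_match(self):
        self.assertEqual(self.report["source_count"], self.report["dest_count"])

    def test_no_missing_or_unexpected_keys(self):
        self.assertEqual(self.report["missing_in_dest"], [])
        self.assertEqual(self.report["unexpected_in_dest"], [])

    def test_measure_totals_match(self):
        self.assertAlmostEqual(self.report["measure_diff"], 0.0)
```

Saved as a module, the tests can be run with `python -m unittest <module>`. Checking key sets as well as counts catches the case where one row is dropped and another is duplicated, which a count alone would miss.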

Speaker notes

  1. Presentation: BI/Big Data Futures - Is it really all about the Cloud? In this survey session, SKS will bring you up to date on what's happening in the world of enterprise Business Intelligence. BigData, NoSQL, Hadoop, Big Analytics, Cloud Storage: what does all of this mean to you as a data professional? Which products and technologies are mature enough for enterprise adoption and which ones are not? Which vendors should you be trying out and why? What is the reality of hosting enterprise data on the cloud? What are the business reasons to explore these new technologies? How do you learn to implement them? SKS frames this talk with the three major trends that she sees in the Enterprise BI space, highlighting products and technologies that warrant a deeper look.
  2. From the blog - http://www.thisisthegreenroom.com/2011/data-science-vs-business-intelligence/
  3. http://www.romymisra.com/the-new-job-market-rulers-data-scientists/
  4. http://www.r-project.org/
  5. http://hortonworks.com/technology/hortonworksdataplatform/, http://www.cloudera.com/
  6. http://www.oracle.com/technetwork/bdc/hadoop-loader/overview/index.html
  7. http://www.microsoft.com/download/en/details.aspx?id=27584
  8. https://www.hadooponazure.com/Account
  9. http://aws.amazon.com/
  10. http://code.google.com/appengine/, http://code.google.com/appengine/articles/datastore/overview.html
  11. http://code.google.com
  12. http://www.freebase.com/, http://code.google.com/p/google-refine/
  13. http://www.youtube.com/watch?v=gjsMDAcI1Mo
  14. http://dennyglee.com/