Thinking Big in Small Spaces
One Hadoop Two Hadoop
(Big Data & 21st Century Analytics in the Classroom)
Nathan Kohn, BU MET, enzyme@bu.edu
Stanislav Seltser, BU MET, sseltser@bu.edu
Mar 7, 2014 2
Big Data is Everywhere
YouTube: 72 hours a minute
Wikipedia pages: 28 million
Facebook users: 900 million
Flickr photos: 6 billion
“… data a new class of economic asset, like currency or gold.”
“…growing at 50 percent a year…”
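To put the quoted growth rate in perspective, here is a minimal sketch of the compounding arithmetic. It assumes a constant 50 percent annual rate, which the slide does not state; the function names are illustrative only.

```python
import math

def doubling_time(rate):
    """Years needed for a quantity growing at `rate` per year to double."""
    return math.log(2) / math.log(1 + rate)

def growth_factor(rate, years):
    """Total multiplication after `years` of compound growth at `rate`."""
    return (1 + rate) ** years

# Assumed constant 50% annual growth, as in the quote above.
print(round(doubling_time(0.5), 2))      # 1.71 years to double
print(round(growth_factor(0.5, 10), 1))  # 57.7 times larger after a decade
```

Under that assumption, data volume doubles roughly every 21 months.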
Big Learning
How will we design and implement big learning systems?
GPUs, multicore, clusters, clouds, supercomputers
Graphs are Everywhere
Collaborative filtering (Netflix): users and movies
Text analysis (Wiki): docs and words
Probabilistic analysis: social network
Big Data & Linear Regression
Stochastic Gradient Descent
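The slide's figure is not available in this transcript. As a rough illustration of the idea, the following sketch fits a one-variable linear regression with stochastic gradient descent. The function name, learning rate, epoch count, and toy data are all assumptions chosen for the example, not taken from the talk.

```python
import random

def sgd_linear_regression(xs, ys, lr=0.05, epochs=500, seed=0):
    """Fit y ~ w*x + b by SGD on squared error, one sample per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)              # visit samples in random order
        for i in indices:
            err = (w * xs[i] + b) - ys[i] # prediction error on one sample
            w -= lr * err * xs[i]         # gradient of 0.5*err^2 w.r.t. w
            b -= lr * err                 # gradient of 0.5*err^2 w.r.t. b
    return w, b

# Hypothetical noise-free data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b = sgd_linear_regression(xs, ys)
print(round(w, 3), round(b, 3))
```

Each update touches a single sample, which is what makes the method attractive when the full data set is too large to process in one pass.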
Serial vs Parallel SGD
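The comparison on this slide is not recoverable from the transcript. One common way to parallelize SGD is parameter averaging: each worker runs serial SGD on its own shard of the data and the resulting parameters are averaged. The sketch below assumes that scheme; it may not be the one the authors presented. Python threads are used only to show the structure, and the names and toy data are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor

def sgd(points, lr=0.05, epochs=500, seed=0):
    """Serial SGD for y ~ w*x + b over a list of (x, y) pairs."""
    rng = random.Random(seed)
    points = list(points)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(points)
        for x, y in points:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def parallel_sgd(points, workers=2):
    """Run SGD independently on each shard, then average the parameters."""
    shards = [points[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(sgd, shards))
    w = sum(r[0] for r in results) / workers
    b = sum(r[1] for r in results) / workers
    return w, b

# Hypothetical noise-free data from y = 2x + 1.
data = [(i / 2.0, 2.0 * (i / 2.0) + 1.0) for i in range(8)]
serial = sgd(data)
parallel = parallel_sgd(data, workers=2)
print(serial, parallel)
```

On a real cluster each shard would live on a separate machine, so only the final parameters need to cross the network.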
Big Data Landscape: Apps, Infrastructure, Data Semantics
Landscape
Grad Student Response #1
How Big is Big? How is BigData measured?
As per my understanding, the term big data doesn’t refer directly to the size of the
data itself. What the term might mean is that the demand for data
(storage/transfer/analysis) has surpassed several parameters that relational
databases cannot control (or handle): it is too big to handle.
How it is measured, I really don’t know. Server storage keeps increasing and
increasing (5 TB, 10 TB, 50 TB, 100 TB, ...) and RDBMSs like Oracle seem to be
keeping up with it, but then again I don’t know exactly what measure is being used.
Is Big Data relevant to you professionally?
Indeed it is, even though I am not using it or practicing it daily.
I am really interested in learning it.
Is Big Data relevant to you personally?
Very relevant, and it is a topic that drove me into pursuing a master’s degree.
Grad Student Response #2
How Big is Big? How is BigData measured?
Big data is a term for large data sets that are too complex to compute by traditional
data management processes and tools. Its points and data types are dependent and
measured by the parameters set forth by each organization.
Where does BigData come from?
Big data can come from various sources that can be categorized as internal or external
contributors.
What is BigData good for?
BigData is good for complex and large data sets that exist within relational databases
and may require object-oriented programming.
Would you like to see Big Data incorporated in your courses?
Yes, I think that we exist in a period in which we are inundated by social media,
numbers, photographs and other forms of data which require us to be well versed in
the storage, maintenance, and interface design so that we are better able to parse
through the Big Data that we encounter on a daily basis.
Undergrad Student #1
Is Big Data relevant to you personally?
Yes. As my current major is Business Application Development, I can see myself
gaining a lot of opportunities to deal with not only the technologies for building
user interfaces in the future but also the technologies for storing user information,
and the techniques used to understand those data could be another opportunity for
the business.
Would you like to see Big Data incorporated in your courses?
Yes. I would like to see our courses include some of the techniques that
corporations use nowadays to understand the relation between their data and the
problems they need to address, such as how they decide which part of their big
data provides them with the most helpful information for their problem, and how they
explain the meaning of their data analysis based on the result, such as how they can decide
the result is accurate and meaningful enough to allow them to take action.
Do you have any questions about Big Data?
Big data is a pretty interesting and useful topic. It will be nice to have more
background information to help our understanding.
Undergrad Student #2
How Big is Big? How is BigData measured?
The survey is asking rather easy conceptual questions about big data. Big data is easy to understand at
that level: we finally have the technology to store, retrieve (cheap memory), and analyze (with proper
languages) data on magnitudes that were impossible before. Instead of just a phone book type of data,
people can gather every relevant or even possibly relevant piece of information about anything (often
but not limited to customers of a business). I have read articles about how some companies (credit card
companies mostly, if I remember correctly) can tell if a woman is pregnant before she even knows herself.
Or they can predict divorce rates a year in advance quite reliably. All this from customers' spending habits and
deviations from those habits.
While all this is fascinating, I don't have any real interest in learning the conceptual level like this. If big
data is to be relevant in a class, it needs to show HOW all this is done. Teach the language, teach the
search and statistical algorithms, or even the methods people use to collect big data (the penta+bytes
aren't being entered by hand).
Students should come away from classes or lectures on big data with some practical knowledge of the subject;
otherwise we're just applying a name to something people generally understand: organizations collect
and analyze as much data as they can, and recent technology has made that amount of data
staggeringly large. The key- and buzz-words are nice for sounding like an expert, but the how-to is
generally more important.
Student Response #4
How Big is Big? How is BigData measured?
Big data is a term developed recently to describe the trend of exponentially
increasing amounts of data stored by organizations for business uses. Very often
these big data might be extremely big, such as 16 petabytes. These data are measured
by the memory space they occupy. Thus, 16 petabytes of big data occupies
approximately 16 × 10^15 bytes of memory.
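As a quick check on the scale mentioned in this response, the sketch below converts petabytes to bytes. It assumes the decimal SI definition (1 petabyte = 10^15 bytes) and also shows the binary pebibyte (2^50 bytes) for comparison; the helper name is illustrative.

```python
PETABYTE = 10**15  # decimal SI prefix: 1 PB = 10^15 bytes
PEBIBYTE = 2**50   # binary IEC prefix: 1 PiB = 2^50 bytes

def petabytes_to_bytes(count, binary=False):
    """Convert a petabyte count to bytes, using SI units unless binary=True."""
    return count * (PEBIBYTE if binary else PETABYTE)

si_bytes = petabytes_to_bytes(16)
iec_bytes = petabytes_to_bytes(16, binary=True)
print(f"{si_bytes:.1e}")  # 1.6e+16
print(iec_bytes)          # 18014398509481984, i.e. 2**54
```

Either way, 16 petabytes is on the order of 10^16 bytes, sixteen times the size of a single petabyte.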
Where does BigData come from?
Big Data could come from different sources, such as emails, social-networking sites,
sensors on the web, sensors installed on other tracking devices, or line of business
applications.
Is Big Data relevant to you professionally?
Yes. In my previous work as a market researcher, we always needed to gather
information and analyze it for business decision making. The technologies
for gathering big data and the techniques used to analyze and filter data are also
considered extremely helpful for the career.
Data Warehouse Course
Student Comments:
Very informative, content-rich course; covers the latest technologies, trends, and
skills of data warehousing, data management, and data analysis. I would
recommend including this course in the required courses for the MS in CIS
with concentration in Database Management and BI Program.
Relevance to job opportunities and cutting edge technologies.
This is probably the most useful course I have taken at Boston University. I have
used every bit of what this professor taught every night at work. I have made
contributions to my employer, a data mining company, in ways that had
never been done before as a result of this course. I have for the first time in
my 8-year career planned, designed, and augmented a Data Warehouse from
scratch. I have configured an analysis server and reported using MDX queries.
This professor has been helpful in many ways. He has guided me through
some Data Warehouse design projects at work. Moreover, he has been
available to work with me and others after class and on weekdays.
Road map
A: Archeology
Professor Mark Eramian et al. have been awarded $548,000 through the Digging into Data
Challenge (National Endowment for the Humanities) to help archaeologists find answers to
questions hidden in thousands of images and text files generated from field sites around
the world.
B: Biology
Recently, a researcher wanted to ascertain whether a
search against GQ-Pat could provide novel insight into
his work related to a specific gene, the cAMP
Responsive Element Modulator.
Reporting to the VP of R&D:
Apply data mining and machine learning techniques to
develop better search and content discovery in the field of
patents.
Invent new ways to index tens of millions of
documents with semantic information.
Z: Zymurgy
(hint: beer)
QUIZ?
Quiz:
Nathan Kohn
BU MET
enzyme@bu.edu
Stanislav Seltser
BU MET
sseltser@bu.edu
