SlideShare una empresa de Scribd logo
1 de 20
Descargar para leer sin conexión
BioCloud

Random large-scale tools that you
           can use
Disclaimer

I'm working on computer security research... no biology
background anywhere in my field, not even on computer virus ;)

While working, I stumbled across hadoop for scalable web
spidering purposes.

I'm not a bioinformatician (yet)... but I saw a powerful tool that
could be useful in your research field(s):

                       "biodatacrunching" ?
Glossary

• Cluster (beowulf)
• Grid
• Cloud
Biology and computer science

• Increasingly resource-hungry applications
   o Nowadays, they can be approached by "brute force"
   o More data means more "iron" to crunch it
• Local IT team nor budget keep up with this pace
   o €€€ spent on new hardware
   o €€€ spent on IT personnel
   o Isn't it wiser to scale one machine at a time ?
• Developers get angry or frustrated on
   o Delays on software installation and config
   o Unscheduled downtimes
   o Delays as a result of not enough computing power
What is cloud computing ?

In plain english:
http://www.youtube.com/watch?v=XdBd14rjcs0
Infrastructure layer
Cloud niche
Infraestructure

• Amazon
  o EC2
  o S3
  o AMI
      Recently added BioInformatic appliances
      Public data sets
• Eukalyptus
  o EC2 + AMI server-side open source implementation
  o We run it for our internal projects
• Enomalism
• Rightscale & Service Cloud
  o Tools/Consultants for the upcoming cloud issues
Application layer
        • Tecnologias para paralelizar
          aplicaciones
Application layer

• Hadoop
  o Open source mapreduce implementation
  o Java based, but any language can be used
• Cloudburst-bio
  o MapReduce fine tuned implementation for Bio (XXX)
Easy mapreduce
What is hadoop

Quotation from official web page:

 "Hadoop is a software platform that lets one easily write and
    run applications that process vast amounts of data."

"vast amounts of data (ATGTTAG...)" + "easily" = sounds good

                                  
                isn't it ? or is it vaporware ?
Why is it used for ?

• Attack problems that imply several GB, TB even PB of data
• The programmer does not care on job management
   o The focus is on data transformation, piping (useful work)

• Not intended for realtime processing
• Suitable to offload databases from long batch jobs
What is MapReduce

Joel on software explanation
Useful to crunch *tons* of data parallellized by design
HDFS: Hadoop Distributed FileSystem
What about Jobs control ?
Who is using it ?

• Google
  o Lots of internal projects (proprietary MapReduce)
      GMail spam machine learning
      Google maps
      ...

• Yahoo
  o Internal web graph (powers search engine)
  o Pig (sqlish abstraction)
  o Sort 1 terabyte of data in 209 seconds

• Facebook
   o Users big graph, used for data mining (Hive)
Hadoop has (lots of) new friends

•   Nutch
•   Mahout
•   Hbase
•   Hama
•   Pig
•   ZooKeeper
•   Smartfrog
•   ...
Next steps ?

Identify resource-hungry applications (batch vs interactive)
Migrate apps to cloud
1) Allocate a certain fixed amount of money
2) Give a try on amazon EC2
3) Optional: Build (local) rocks cluster with Eukaliptus cloud

Test, deploy, automate, automate and automate ... puppet ?
(a few) References


http://www.cloudera.com/hadoop-training-thinking-at-scale
http://www.slideshare.net/tag/hadoop
http://sourceforge.net/projects/cloudburst-bio/
http://hadoop.apache.org/core/
http://people.apache.org/~rdonkin/hadoop-talk/hadoop.html

Más contenido relacionado

La actualidad más candente

Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioAlluxio, Inc.
 
Data Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudData Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudAlluxio, Inc.
 
Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructuredatastack
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick viewRajesh Nadipalli
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irdatastack
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightGert Drapers
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟datastack
 
The Pandemic Changes Everything, the Need for Speed and Resiliency
The Pandemic Changes Everything, the Need for Speed and ResiliencyThe Pandemic Changes Everything, the Need for Speed and Resiliency
The Pandemic Changes Everything, the Need for Speed and ResiliencyAlluxio, Inc.
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1Milind gunjan
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Alluxio, Inc.
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystemJakub Stransky
 
Alluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio, Inc.
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overviewvhrocca
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And HiveCloudera, Inc.
 
Orchestrate a Data Symphony
Orchestrate a Data SymphonyOrchestrate a Data Symphony
Orchestrate a Data SymphonyAlluxio, Inc.
 
Optimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public CloudOptimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public CloudQubole
 

La actualidad más candente (20)

Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & AlluxioUltra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
Ultra Fast Deep Learning in Hybrid Cloud Using Intel Analytics Zoo & Alluxio
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
 
Data Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and CloudData Orchestration for AI, Big Data, and Cloud
Data Orchestration for AI, Big Data, and Cloud
 
Big data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructureBig data architecture on cloud computing infrastructure
Big data architecture on cloud computing infrastructure
 
Hd insight essentials quick view
Hd insight essentials quick viewHd insight essentials quick view
Hd insight essentials quick view
 
Big data vahidamiri-datastack.ir
Big data vahidamiri-datastack.irBig data vahidamiri-datastack.ir
Big data vahidamiri-datastack.ir
 
The Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsightThe Fundamentals Guide to HDP and HDInsight
The Fundamentals Guide to HDP and HDInsight
 
عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟عصر کلان داده، چرا و چگونه؟
عصر کلان داده، چرا و چگونه؟
 
The Pandemic Changes Everything, the Need for Speed and Resiliency
The Pandemic Changes Everything, the Need for Speed and ResiliencyThe Pandemic Changes Everything, the Need for Speed and Resiliency
The Pandemic Changes Everything, the Need for Speed and Resiliency
 
BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1BigData- On - AWS Cloud -1
BigData- On - AWS Cloud -1
 
Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS Enabling big data & AI workloads on the object store at DBS
Enabling big data & AI workloads on the object store at DBS
 
Hadoop distributions - ecosystem
Hadoop distributions - ecosystemHadoop distributions - ecosystem
Hadoop distributions - ecosystem
 
Hadoop
HadoopHadoop
Hadoop
 
Alluxio Use Cases and Future Directions
Alluxio Use Cases and Future DirectionsAlluxio Use Cases and Future Directions
Alluxio Use Cases and Future Directions
 
Hd insight overview
Hd insight overviewHd insight overview
Hd insight overview
 
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
Hw09   Rethinking The Data Warehouse With Hadoop And HiveHw09   Rethinking The Data Warehouse With Hadoop And Hive
Hw09 Rethinking The Data Warehouse With Hadoop And Hive
 
Orchestrate a Data Symphony
Orchestrate a Data SymphonyOrchestrate a Data Symphony
Orchestrate a Data Symphony
 
Hadoop
HadoopHadoop
Hadoop
 
The solution for big data
The solution for big dataThe solution for big data
The solution for big data
 
Optimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public CloudOptimizing Big Data to run in the Public Cloud
Optimizing Big Data to run in the Public Cloud
 

Destacado

Código limpo: Comentários
Código limpo:   ComentáriosCódigo limpo:   Comentários
Código limpo: ComentáriosInael Rodrigues
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud ComputingDavid Wallom
 
Bioinformática y supercomputación. Razones para hacerse bioinformático en la UMA
Bioinformática y supercomputación. Razones para hacerse bioinformático en la UMABioinformática y supercomputación. Razones para hacerse bioinformático en la UMA
Bioinformática y supercomputación. Razones para hacerse bioinformático en la UMAM. Gonzalo Claros
 
Contact Center Technology Trends: Part 1
Contact Center Technology Trends: Part 1Contact Center Technology Trends: Part 1
Contact Center Technology Trends: Part 1DATAMARK
 
Global remittances product writeup
Global remittances   product writeupGlobal remittances   product writeup
Global remittances product writeupPartho Chakraborty
 
Cloud Based Infrastructure for Banking
Cloud Based Infrastructure for BankingCloud Based Infrastructure for Banking
Cloud Based Infrastructure for BankingHeri Supriadi
 
Cloud Computing for Banking - Accenture
Cloud Computing for Banking - AccentureCloud Computing for Banking - Accenture
Cloud Computing for Banking - AccentureKim Jensen
 
The Everyday Bank: The Role of Cloud Computing in the Future of Banking
The Everyday Bank: The Role of Cloud Computing in the Future of BankingThe Everyday Bank: The Role of Cloud Computing in the Future of Banking
The Everyday Bank: The Role of Cloud Computing in the Future of BankingAccenture Technology
 
fog computing ppt
fog computing ppt fog computing ppt
fog computing ppt sravya raju
 
Cloud computing security issues and challenges
Cloud computing security issues and challengesCloud computing security issues and challenges
Cloud computing security issues and challengesDheeraj Negi
 
Cloud Computing Security
Cloud Computing SecurityCloud Computing Security
Cloud Computing SecurityNinh Nguyen
 
Data security in cloud computing
Data security in cloud computingData security in cloud computing
Data security in cloud computingPrince Chandu
 

Destacado (17)

Map Reduce
Map ReduceMap Reduce
Map Reduce
 
Código limpo: Comentários
Código limpo:   ComentáriosCódigo limpo:   Comentários
Código limpo: Comentários
 
Federated Cloud Computing
Federated Cloud ComputingFederated Cloud Computing
Federated Cloud Computing
 
Bioinformática y supercomputación. Razones para hacerse bioinformático en la UMA
Bioinformática y supercomputación. Razones para hacerse bioinformático en la UMABioinformática y supercomputación. Razones para hacerse bioinformático en la UMA
Bioinformática y supercomputación. Razones para hacerse bioinformático en la UMA
 
Contact Center Technology Trends: Part 1
Contact Center Technology Trends: Part 1Contact Center Technology Trends: Part 1
Contact Center Technology Trends: Part 1
 
Global remittances product writeup
Global remittances   product writeupGlobal remittances   product writeup
Global remittances product writeup
 
Cloud banking
Cloud bankingCloud banking
Cloud banking
 
Salesforce - AI for CRM
Salesforce - AI for CRMSalesforce - AI for CRM
Salesforce - AI for CRM
 
Cloud Based Infrastructure for Banking
Cloud Based Infrastructure for BankingCloud Based Infrastructure for Banking
Cloud Based Infrastructure for Banking
 
Cloud Computing for Banking - Accenture
Cloud Computing for Banking - AccentureCloud Computing for Banking - Accenture
Cloud Computing for Banking - Accenture
 
The Everyday Bank: The Role of Cloud Computing in the Future of Banking
The Everyday Bank: The Role of Cloud Computing in the Future of BankingThe Everyday Bank: The Role of Cloud Computing in the Future of Banking
The Everyday Bank: The Role of Cloud Computing in the Future of Banking
 
fog computing ppt
fog computing ppt fog computing ppt
fog computing ppt
 
Cloud security ppt
Cloud security pptCloud security ppt
Cloud security ppt
 
Cloud computing security issues and challenges
Cloud computing security issues and challengesCloud computing security issues and challenges
Cloud computing security issues and challenges
 
Cloud Computing Security
Cloud Computing SecurityCloud Computing Security
Cloud Computing Security
 
Data security in cloud computing
Data security in cloud computingData security in cloud computing
Data security in cloud computing
 
FOG COMPUTING
FOG COMPUTINGFOG COMPUTING
FOG COMPUTING
 

Similar a Cloud computing and Hadoop introduction

Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talksyhadoop
 
Raspberry pi x kubernetes x tensorflow
Raspberry pi x kubernetes x tensorflowRaspberry pi x kubernetes x tensorflow
Raspberry pi x kubernetes x tensorflow霈萱 蔡
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)Sascha Dittmann
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridEvert Lammerts
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Chris Baglieri
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionCodemotion
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdbjixuan1989
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudKhazret Sapenov
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An OverviewC. Scyphers
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sangerChris Dwan
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist SoftServe
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Mahantesh Angadi
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-HadoopNagarjuna D.N
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopJosh Patterson
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopAmir Shaikh
 

Similar a Cloud computing and Hadoop introduction (20)

Hadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University TalksHadoop at Yahoo! -- University Talks
Hadoop at Yahoo! -- University Talks
 
Architecting Your First Big Data Implementation
Architecting Your First Big Data ImplementationArchitecting Your First Big Data Implementation
Architecting Your First Big Data Implementation
 
Raspberry pi x kubernetes x tensorflow
Raspberry pi x kubernetes x tensorflowRaspberry pi x kubernetes x tensorflow
Raspberry pi x kubernetes x tensorflow
 
Hadoop seminar
Hadoop seminarHadoop seminar
Hadoop seminar
 
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
SQLSaturday #230 - Introduction to Microsoft Big Data (Part 1)
 
Hadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG GridHadoop @ Sara & BiG Grid
Hadoop @ Sara & BiG Grid
 
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
 
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in ProductionTugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
Tugdual Grall - Real World Use Cases: Hadoop and NoSQL in Production
 
From a student to an apache committer practice of apache io tdb
From a student to an apache committer  practice of apache io tdbFrom a student to an apache committer  practice of apache io tdb
From a student to an apache committer practice of apache io tdb
 
The elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloudThe elephantintheroom bigdataanalyticsinthecloud
The elephantintheroom bigdataanalyticsinthecloud
 
WTIA Cloud Computing Series - Part I: The Fundamentals
WTIA Cloud Computing Series - Part I: The FundamentalsWTIA Cloud Computing Series - Part I: The Fundamentals
WTIA Cloud Computing Series - Part I: The Fundamentals
 
Big Data: An Overview
Big Data: An OverviewBig Data: An Overview
Big Data: An Overview
 
2016 05 sanger
2016 05 sanger2016 05 sanger
2016 05 sanger
 
Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist Essential Data Engineering for Data Scientist
Essential Data Engineering for Data Scientist
 
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
Introduction and Overview of BigData, Hadoop, Distributed Computing - BigData...
 
Introduction to Cloud computing and Big Data-Hadoop
Introduction to Cloud computing and  Big Data-HadoopIntroduction to Cloud computing and  Big Data-Hadoop
Introduction to Cloud computing and Big Data-Hadoop
 
Oct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on HadoopOct 2011 CHADNUG Presentation on Hadoop
Oct 2011 CHADNUG Presentation on Hadoop
 
Introduction to BIg Data and Hadoop
Introduction to BIg Data and HadoopIntroduction to BIg Data and Hadoop
Introduction to BIg Data and Hadoop
 
Hadoop and Big Data
Hadoop and Big DataHadoop and Big Data
Hadoop and Big Data
 
Hadoop jon
Hadoop jonHadoop jon
Hadoop jon
 

Último

Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfAijun Zhang
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6DianaGray10
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?SANGHEE SHIN
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IES VE
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopBachir Benyammi
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...Aggregage
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataSafe Software
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIUdaiappa Ramachandran
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Websitedgelyza
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...DianaGray10
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Commit University
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum ComputingGDSC PJATK
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1DianaGray10
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.YounusS2
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesDavid Newbury
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostMatt Ray
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxGDSC PJATK
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfAnna Loughnan Colquhoun
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UbiTrack UK
 

Último (20)

Machine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdfMachine Learning Model Validation (Aijun Zhang 2024).pdf
Machine Learning Model Validation (Aijun Zhang 2024).pdf
 
UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6UiPath Studio Web workshop series - Day 6
UiPath Studio Web workshop series - Day 6
 
Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?Do we need a new standard for visualizing the invisible?
Do we need a new standard for visualizing the invisible?
 
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
IESVE Software for Florida Code Compliance Using ASHRAE 90.1-2019
 
NIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 WorkshopNIST Cybersecurity Framework (CSF) 2.0 Workshop
NIST Cybersecurity Framework (CSF) 2.0 Workshop
 
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
The Data Metaverse: Unpacking the Roles, Use Cases, and Tech Trends in Data a...
 
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial DataCloud Revolution: Exploring the New Wave of Serverless Spatial Data
Cloud Revolution: Exploring the New Wave of Serverless Spatial Data
 
RAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AIRAG Patterns and Vector Search in Generative AI
RAG Patterns and Vector Search in Generative AI
 
COMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a WebsiteCOMPUTER 10 Lesson 8 - Building a Website
COMPUTER 10 Lesson 8 - Building a Website
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
Connector Corner: Extending LLM automation use cases with UiPath GenAI connec...
 
Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)Crea il tuo assistente AI con lo Stregatto (open source python framework)
Crea il tuo assistente AI con lo Stregatto (open source python framework)
 
Introduction to Quantum Computing
Introduction to Quantum ComputingIntroduction to Quantum Computing
Introduction to Quantum Computing
 
Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1Secure your environment with UiPath and CyberArk technologies - Session 1
Secure your environment with UiPath and CyberArk technologies - Session 1
 
Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.Basic Building Blocks of Internet of Things.
Basic Building Blocks of Internet of Things.
 
Linked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond OntologiesLinked Data in Production: Moving Beyond Ontologies
Linked Data in Production: Moving Beyond Ontologies
 
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCostKubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
KubeConEU24-Monitoring Kubernetes and Cloud Spend with OpenCost
 
Cybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptxCybersecurity Workshop #1.pptx
Cybersecurity Workshop #1.pptx
 
Spring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdfSpring24-Release Overview - Wellingtion User Group-1.pdf
Spring24-Release Overview - Wellingtion User Group-1.pdf
 
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
UWB Technology for Enhanced Indoor and Outdoor Positioning in Physiological M...
 

Cloud computing and Hadoop introduction

  • 2. Disclaimer I'm working on computer security research... no biology background anywhere in my field, not even on computer virus ;) While working, I stumbled across hadoop for scalable web spidering purposes. I'm not a bioinformatician (yet)... but I saw a powerful tool that could be useful in your research field(s): "biodatacrunching" ?
  • 4. Biology and computer science • Increasingly resource-hungry applications o Nowadays, they can be approached by "brute force" o More data means more "iron" to crunch it • Local IT team nor budget keep up with this pace o €€€ spent on new hardware o €€€ spent on IT personnel o Isn't it wiser to scale one machine at a time ? • Developers get angry or frustrated on o Delays on software installation and config o Unscheduled downtimes o Delays as a result of not enough computing power
  • 5. What is cloud computing ? In plain english: http://www.youtube.com/watch?v=XdBd14rjcs0
  • 8. Infraestructure • Amazon o EC2 o S3 o AMI  Recently added BioInformatic appliances  Public data sets • Eukalyptus o EC2 + AMI server-side open source implementation o We run it for our internal projects • Enomalism • Rightscale & Service Cloud o Tools/Consultants for the upcoming cloud issues
  • 9. Application layer • Tecnologias para paralelizar aplicaciones
  • 10. Application layer • Hadoop o Open source mapreduce implementation o Java based, but any language can be used • Cloudburst-bio o MapReduce fine tuned implementation for Bio (XXX)
  • 12. What is hadoop Quotation from official web page: "Hadoop is a software platform that lets one easily write and run applications that process vast amounts of data." "vast amounts of data (ATGTTAG...)" + "easily" = sounds good   isn't it ? or is it vaporware ?
  • 13. Why is it used for ? • Attack problems that imply several GB, TB even PB of data • The programmer does not care on job management o The focus is on data transformation, piping (useful work) • Not intended for realtime processing • Suitable to offload databases from long batch jobs
  • 14. What is MapReduce Joel on software explanation Useful to crunch *tons* of data parallellized by design
  • 16. What about Jobs control ?
  • 17. Who is using it ? • Google o Lots of internal projects (proprietary MapReduce)  GMail spam machine learning  Google maps  ... • Yahoo o Internal web graph (powers search engine) o Pig (sqlish abstraction) o Sort 1 terabyte of data in 209 seconds • Facebook o Users big graph, used for data mining (Hive)
  • 18. Hadoop has (lots of) new friends • Nutch • Mahout • Hbase • Hama • Pig • ZooKeeper • Smartfrog • ...
  • 19. Next steps ? Identify resource-hungry applications (batch vs interactive) Migrate apps to cloud 1) Allocate a certain fixed amount of money 2) Give a try on amazon EC2 3) Optional: Build (local) rocks cluster with Eukaliptus cloud Test, deploy, automate, automate and automate ... puppet ?