KNOW THY LOGOS
DATA SCIENCE EDITION
BY VISHAL
Excel (software
application)
• What is it?
• A spreadsheet application that helps you
analyse data efficiently. It is an elite member
of the Microsoft Office suite of software
applications. Unless you have been living under
a rock all these years, you have surely worked
with Excel. From schools to industries, everybody
uses Excel. It is an indispensable tool in the
data analyst’s arsenal.
• Who made it?
• It was developed by Microsoft for Windows,
macOS, Android and iOS.
R (programming
language)
• What is it?
• An open source (freely available) language for
statistical investigation and visualization. It is
the descendant of the S language. You can call
R the “Batman” of the data science world.
Current version (as of May 2018): 3.5.0
• R has a commercial sibling called S-PLUS.
• Who made it?
• This incredible tool was created by Ross Ihaka
and Robert Gentleman. You can easily guess
how the language got its name. R is currently
developed by the R Core Team.
RStudio (integrated
development
environment for R)
• What is it?
• An open source integrated development
environment (IDE) for working with the R
language. Whenever you hear about R, you will
also hear about RStudio. RStudio is like the
“Batcave” where you can perform all your
statistical analysis. It autocompletes your
commands much as Google completes your
search queries. RStudio needs R itself to be
installed, so download both.
• Who made it?
• RStudio was founded by JJ Allaire, creator of
the programming language ColdFusion.
Python
(programming
language)
• What is it?
• An open source language used for general
purpose programming. It can be used for
statistical computing, implementing AI,
creating games, and building web applications.
You can call it the “Superman” of the data
science world. Current version (as of May
2018): 3.6
• Who made it?
• Created by Guido van Rossum and first
released in 1991.
Jupyter (a non-profit,
open-source project)
• What is it?
• Project Jupyter is a revolutionary non-profit,
open-source project that builds software
applications for interactive computing. These
applications support dozens of programming
languages. A popular web-based application used
by data scientists and data enthusiasts is the
Jupyter Notebook.
• The Jupyter Notebook is an incredibly powerful
tool for interactively developing and presenting
data science projects.
• Who made it?
• Jupyter is developed in the open on GitHub,
through the consensus of the Jupyter community.
Anaconda (an open
source distribution
for Python and R)
• What is it?
• An open source distribution of the Python and
R programming languages for data science
and machine learning applications. It comes
with the commonly needed tools and packages
for data analysis, sparing the user the burden
of hunting for them.
• The distribution includes Jupyter Notebook.
• Who made it?
• Developed by Anaconda Inc.
SPSS (software
application)
• What is it?
• SPSS is a commercially available software
package for performing statistical analysis. It
offers a rich set of capabilities for every stage
of the analytical process.
• SPSS stands for “Statistical Package for the
Social Sciences”, and is officially known as
IBM SPSS Statistics, but most users refer to it
as “SPSS”.
• Who made it?
• The software was developed by SPSS Inc.
• SPSS Inc. was acquired by IBM in 2009.
Java (programming
language)
• What is it?
• Java is a general purpose programming
language that can be used for data analysis,
statistical modelling, and building virtually
any kind of application. Java is instrumental in
popular data science applications in use
today. A prime example is Hadoop.
• As Java has been in wide use since the
mid-1990s, it comes with a great many libraries
and tools for machine learning and data science.
• Who made it?
• Developed by Sun Microsystems (now owned
by Oracle Corporation) and designed by James
Gosling.
Julia (programming
language)
• What is it?
• Julia is an open source programming language
for technical computing, data exploration, and
analysis. It is relatively new.
• It has attracted some high-profile clients, from
investment manager BlackRock, which uses it
for time-series analytics, to the British insurer
Aviva, which uses it for risk calculations.
• Who made it?
• Designed by Jeff Bezanson, Alan Edelman,
Stefan Karpinski, and Viral B. Shah.
MATLAB
(programming
language)
• What is it?
• MATLAB stands for Matrix Laboratory. It is a
commercially available programming
language for mathematical computing, data
processing and visualization. MathWorks
promotes it as the easiest and most productive
software environment for engineers and
scientists.
• Who made it?
• Designed by Cleve Moler and developed by
MathWorks.
GNU Octave
(programming
language)
• What is it?
• GNU Octave is an open source programming
language used for numerical computations and
data analysis. Octave is one of the major free
alternatives to MATLAB. It can be used for
creating data visualizations in 2D and 3D.
• Octave has support for various statistical
methods. This includes basic descriptive
statistics, probability distributions, statistical
tests, random number generation, and much
more. It was named after the chemical
engineering professor Octave Levenspiel.
• Who made it?
• Developed by John W. Eaton and many others.
Database (any data
management
system)
• What is it?
• A Database is a general term for an organized collection of
data.
• Databases support storage and manipulation of data.
• In a relational database, the data is organized into
tables made up of rows and columns.
• SQL is a popular language used by 90% of data scientists
for inserting, searching, updating, and deleting database
records. It stands for Structured Query Language.
• Relational databases such as MySQL, Oracle, Microsoft SQL
Server, and Sybase use SQL. SQL can be pronounced
“sequel” or “ess-cue-el”.
• Who made it?
• SQL was developed by Donald D. Chamberlin and Raymond
F. Boyce
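The four SQL operations named above (inserting, searching, updating, and deleting records) can be tried out with Python's built-in sqlite3 module. This is only a minimal sketch: the table name "tools" and its columns are hypothetical, chosen for illustration, and SQLite stands in for whichever relational database you actually use.

```python
import sqlite3

# In-memory SQLite database; the "tools" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE tools (name TEXT PRIMARY KEY, kind TEXT)")

# Insert records.
cur.executemany(
    "INSERT INTO tools (name, kind) VALUES (?, ?)",
    [("R", "language"), ("Python", "language"), ("Excel", "application")],
)

# Search records.
cur.execute("SELECT name FROM tools WHERE kind = ? ORDER BY name", ("language",))
languages = [row[0] for row in cur.fetchall()]

# Update a record.
cur.execute("UPDATE tools SET kind = ? WHERE name = ?", ("spreadsheet", "Excel"))

# Delete a record.
cur.execute("DELETE FROM tools WHERE name = ?", ("R",))

# Read back what is left in the table.
cur.execute("SELECT name, kind FROM tools ORDER BY name")
remaining = cur.fetchall()

conn.commit()
conn.close()

print(languages)
print(remaining)
```

The "?" placeholders let the database driver handle quoting of values, which is generally safer than building SQL strings by hand.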
Tableau (software
company)
• What is it?
• Tableau is a provider of interactive data
visualization tools focused on business intelligence.
Its commercial product is called Tableau
Desktop, and it comes with a 14-day trial period.
• Tableau can connect to almost any database, and
allows the user to drag and drop data to create
interesting visualizations.
• A free edition is available as Tableau Public.
• Tableau is based on VizQL (a visual query language), which
allows a simple drag-and-drop approach to creating
data visualizations.
• Who made it?
• Tableau was founded by Pat Hanrahan, Christian
Chabot, and Chris Stolte.
Qlik (software
company)
• What is it?
• Qlik is the provider of QlikView and Qlik Sense,
business intelligence & visualization software.
• QlikView allows users to rapidly build and
deploy analytic apps without the need for
professional development skills.
• Who made it?
• Qlik was founded by Björn Berg and Staffan
Gestrelius
Hadoop (a big
data framework)
• What is it?
• Hadoop is an open source, Java-based programming
framework for working with large volumes and
varieties of data that are impractical to store and process
in traditional relational databases.
• The name Hadoop is made up. It comes from
a stuffed toy elephant owned by the son of its creator,
Doug Cutting.
• Hadoop consists of three key parts: HDFS (distributed
file storage layer), MapReduce (distributed processing
layer) and YARN (resource management layer).
• Who made it?
• Hadoop was created by Doug Cutting and Mike
Cafarella and is presently developed by the Apache Software
Foundation.
• Hadoop's MapReduce and HDFS components drew
inspiration from Google papers on MapReduce and
Google File System.
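To give a feel for the MapReduce processing model mentioned above, here is a small word-count sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase) are made up for illustration, and everything runs in one process, whereas Hadoop spreads the same steps across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


lines = ["big data big ideas", "data beats ideas"]

# Map every line, then group by word, then reduce each group to a count.
pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

print(word_counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be run in parallel, which is what makes the model suitable for very large datasets.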
Hive (a data
warehouse
software)
• What is it?
• Hive is data warehouse software built on top
of Hadoop for providing data summarization,
query and analysis.
• Hive provides a mechanism to work on data
using a SQL-like language called HiveQL.
• Hive automatically translates HiveQL
queries into MapReduce jobs executed on
Hadoop.
• Who made it?
• While initially developed by Facebook, Hive is
now also used and developed by other
organizations such as Netflix and the Financial
Industry Regulatory Authority (FINRA).
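As a rough illustration of how a SQL-like query can be expressed as map and reduce steps, the sketch below works through a GROUP BY aggregation in plain Python. It is a conceptual picture only, not how Hive actually plans or executes queries, and the "sales" table with its rows is hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a "sales" table: (region, amount).
sales = [("north", 120), ("south", 80), ("north", 30), ("east", 50), ("south", 20)]

# Query being illustrated (HiveQL-style):
#   SELECT region, SUM(amount) FROM sales GROUP BY region;

# Map: emit the GROUP BY column as the key and the aggregated column as the value.
mapped = [(region, amount) for region, amount in sales]

# Shuffle: collect the values for each key.
groups = defaultdict(list)
for region, amount in mapped:
    groups[region].append(amount)

# Reduce: apply the aggregate function (SUM) to each group.
totals = {region: sum(amounts) for region, amounts in groups.items()}

print(totals)
```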
Pig (an open-source
technology)
• What is it?
• Pig is a high-level platform for creating
programs that run on Hadoop. The scripting
language used for this platform is called Pig
Latin.
• Pig Latin enables users to write complex data
transformations without knowing Java, the
language in which MapReduce programs were
primarily written.
• Pig scripts are translated into a series of
MapReduce jobs that are executed on Hadoop.
• Who made it?
• Pig was a result of development effort at
Spark (a big data
processing
framework)
• What is it?
• Apache Spark is a fast and efficient big data
processing framework with built-in modules for
streaming, SQL, machine learning and graph
processing.
• While Hadoop is suited to batch processing of
data, Spark is especially useful for real-time
streaming data.
• Who made it?
• Spark was authored by Matei Zaharia.
• It is developed by the Apache Software
Foundation, UC Berkeley AMPLab, and
Databricks.
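The difference between batch and streaming processing mentioned above can be sketched in a few lines of plain Python. This does not use Spark or any Spark API; the function names and the sample events are hypothetical and serve only to contrast the two styles.

```python
from collections import Counter


def batch_counts(events):
    """Batch style: the full dataset is available before processing starts."""
    return Counter(events)


def streaming_counts(event_stream):
    """Streaming style: update the result after each event and yield a snapshot."""
    running = Counter()
    for event in event_stream:
        running[event] += 1
        yield dict(running)


events = ["click", "view", "click"]

# Batch: one answer, computed once all the data has been collected.
final_batch = dict(batch_counts(events))

# Streaming: an up-to-date answer after every incoming event.
snapshots = list(streaming_counts(iter(events)))

print(final_batch)
print(snapshots)
```

The last streaming snapshot matches the batch result; the difference is that the streaming version has a usable answer available while the data is still arriving.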
GitHub (software
development
platform)
• What is it?
• GitHub is a web-based hosting platform for
software projects. Its core function is version
control, built on Git, which helps in
keeping track of changes to a project. GitHub
allows developers to discover, share, and build
better software.
• A budding data scientist can present her/his data
science projects on GitHub. If a Facebook account
is your personal profile and a LinkedIn account is
your professional profile, think of GitHub as your
technical profile.
• Who made it?
• GitHub was founded by Tom Preston-Werner, Chris
Wanstrath, PJ Hyett, and Scott Chacon.
Kaggle (a data
science platform)
• What is it?
• Kaggle is a platform for learning data science
and hosting analytics competitions in which
users compete to build the best models for
analysing and making predictions from datasets
uploaded by companies and users.
• Datasets are available on everything from
government, health, and science to popular
games and dating trends.
• Who made it?
• Kaggle was founded by Anthony Goldbloom
and its parent organization is Google.
DataCamp (a web-
based learning
platform)
• What is it?
• DataCamp is a popular online interactive
training and education platform in the field of
data analytics.
• DataCamp offers free and premium interactive
online training by experts from various fields.
• Who made it?
• DataCamp was founded by Martijn Theuwissen
and Jonathan Cornelissen.
TO BE CONTINUED..
