2. Excel (software
application)
• What is it?
• A spreadsheet application that helps you
analyse data efficiently. It is part of the
Microsoft Office suite of software
applications. Chances are you have already
worked with Excel: it is used everywhere
from schools to industry, and it is an
indispensable tool in the data analyst’s
arsenal.
• Who made it?
• It was developed by Microsoft for Windows,
macOS, Android and iOS.
3. R (programming
language)
• What is it?
• An open source (freely available) language for
statistical analysis and visualization. It is
a descendant of the S language. You can call
R the “Batman” of the data science world.
Current version (as of May 2018): 3.5.0
• R has a commercial sibling called S-PLUS.
• Who made it?
• This incredible tool was created by Ross Ihaka
and Robert Gentleman. You can easily guess
how the language got its name. R is currently
developed by the R Development Core Team.
4. RStudio (integrated
development
environment for R)
• What is it?
• An open source tool for writing and running R
code. Whenever you hear about R, you will
also hear about RStudio. RStudio is like the
“Batcave” where you can perform all your
statistical analysis. Its auto-completion of
commands is as intuitive as Google
completing your sentences. RStudio does not
include R itself, so you need to install R as well.
• Who made it?
• RStudio was founded by JJ Allaire, creator of
the programming language ColdFusion.
5. Python
(programming
language)
• What is it?
• An open source language used for general
purpose programming. It can be used for
statistical computing, implementing AI,
creating games, and web applications. You can
call it the “Superman” of the data science
world. Latest stable version (as of May 2018): 3.6
• Who made it?
• Created by Guido van Rossum and first
released in 1991.
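As a small illustration of the statistical computing mentioned above, here is a minimal sketch that uses only the `statistics` module from Python's standard library. The exam scores are made-up values chosen for illustration.

```python
import statistics

# Hypothetical exam scores, invented for illustration only.
scores = [72, 85, 90, 66, 85, 78]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value
spread = statistics.stdev(scores)         # sample standard deviation

print(f"mean={mean_score:.2f} median={median_score} "
      f"mode={mode_score} stdev={spread:.2f}")
```

For larger datasets, data scientists usually reach for third-party packages instead, but the standard library is enough to show the idea.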
6. Jupyter (a non-profit,
open-source project)
• What is it?
• Project Jupyter is a revolutionary non-profit, open-source
project that builds software applications
for interactive computing. These applications
support dozens of programming languages. A
popular web-based application used by data
scientists and data enthusiasts is the Jupyter
Notebook.
• The Jupyter Notebook is an incredibly powerful
tool for interactively developing and presenting
data science projects.
• Who made it?
• Jupyter is developed in the open on GitHub,
through the consensus of the Jupyter community.
7. Anaconda (an open
source distribution
for Python and R)
• What is it?
• An open source distribution of the Python and
R programming languages for data science
and machine learning applications. It
comes bundled with commonly used tools and
packages for data analysis, so users do not
have to hunt for and install them one by
one.
• The distribution includes Jupyter Notebook.
• Who made it?
• Developed by Anaconda Inc.
8. SPSS (software
application)
• What is it?
• SPSS is a commercially available software
package for performing statistical analysis. It
offers a rich set of capabilities for every stage
of the analytical process.
• SPSS stands for “Statistical Package for the
Social Sciences”, and is officially known as
IBM SPSS Statistics, but most users refer to it
as “SPSS”.
• Who made it?
• The software was developed by SPSS Inc.
• It was later acquired by IBM in 2009.
9. Java (programming
language)
• What is it?
• Java is a general purpose programming
language that can be used for data analysis,
statistical modelling and to build virtually
anything. Java is instrumental in the creation
of popular data science applications that are
used today. A prime example would be
Hadoop.
• As Java has been in wide use since the
mid-1990s, it comes with a great many libraries
and tools for machine learning and data science.
• Who made it?
• Developed by Sun Microsystems (now owned
by Oracle Corporation) and designed by James
Gosling.
10. Julia (programming
language)
• What is it?
• Julia is an open source programming language
for technical computing, data exploration, and
analysis. It is relatively new.
• It has attracted some high-profile clients, from
investment manager BlackRock, which uses it
for time-series analytics, to the British insurer
Aviva, which uses it for risk calculations.
• Who made it?
• Designed by Jeff Bezanson, Alan Edelman,
Stefan Karpinski, and Viral B. Shah.
11. MATLAB
(programming
language)
• What is it?
• MATLAB stands for Matrix Laboratory. It is a
commercially available programming
language for mathematical computing, data
processing and visualization. It is widely
used as a software environment by
engineers and scientists.
• Who made it?
• Designed by Cleve Moler and developed by
MathWorks.
12. GNU Octave
(programming
language)
• What is it?
• GNU Octave is an open source programming
language used for numerical computations and
data analysis. Octave is one of the major free
alternatives to MATLAB. It can be used for
creating data visualizations in 2D and 3D.
• Octave has support for various statistical
methods. This includes basic descriptive
statistics, probability distributions, statistical
tests, random number generation, and much
more. It was named after the chemical
engineering professor Octave Levenspiel.
• Who made it?
• Developed by John W. Eaton and many others.
13. Database (any data
management
system)
• What is it?
• A Database is a general term for an organized collection of
data.
• Databases support storage and manipulation of data.
• In a relational database, the data is organized into tables
made up of rows and columns.
• SQL is a popular language used by 90% of data scientists
for inserting, searching, updating, and deleting database
records. It stands for Structured Query Language.
• Relational databases such as MySQL, Oracle, Microsoft SQL
Server, and Sybase use SQL. SQL can be pronounced
“sequel” or “ess-cue-ell”.
• Who made it?
• SQL was developed by Donald D. Chamberlin and Raymond
F. Boyce
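The four SQL operations named above (inserting, searching, updating, and deleting) can be sketched in a few lines. This example assumes SQLite, accessed through Python's built-in `sqlite3` module, rather than any of the database servers listed on this slide. The `analysts` table and its contents are hypothetical.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A table is a set of rows with named columns.
cur.execute("CREATE TABLE analysts (id INTEGER PRIMARY KEY, name TEXT, tool TEXT)")

# INSERT: add records (the ? placeholders keep values separate from the SQL text).
cur.executemany(
    "INSERT INTO analysts (name, tool) VALUES (?, ?)",
    [("Asha", "R"), ("Ben", "Python"), ("Chen", "Excel")],
)

# UPDATE: change an existing record.
cur.execute("UPDATE analysts SET tool = ? WHERE name = ?", ("Python", "Chen"))

# DELETE: remove a record.
cur.execute("DELETE FROM analysts WHERE name = ?", ("Asha",))

# SELECT: search for records matching a condition.
cur.execute("SELECT name FROM analysts WHERE tool = ? ORDER BY name", ("Python",))
python_users = [row[0] for row in cur.fetchall()]
print(python_users)

conn.close()
```

The same statements would look almost identical on other relational databases, although each product has its own SQL dialect and connection library.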
14. Tableau (software
company)
• What is it?
• Tableau is the provider of various interactive data
visualization tools focused on business intelligence.
Their commercially available product is called Tableau
Desktop and it comes with a 14-day trial period.
• Tableau can connect to almost any database, and
allows the user to drag and drop data to create
interesting visualizations.
• Tableau is also freely available as Tableau Public.
• Tableau is based on VizQL (Visual Query Language), which
allows a simple drag-and-drop approach to creating
incredible data visualizations.
• Who made it?
• Tableau was founded by Pat Hanrahan, Christian
Chabot, and Chris Stolte
15. Qlik (software
company)
• What is it?
• Qlik is the provider of QlikView and Qlik Sense,
business intelligence & visualization software.
• QlikView allows users to rapidly build and
deploy analytic apps without the need for
professional development skills
• Who made it?
• Qlik was founded by Björn Berg and Staffan
Gestrelius
16. Hadoop (a big
data framework)
• What is it?
• Hadoop is an open source, Java-based programming
framework for working with large volumes and
varieties of data that cannot easily be stored and
processed in relational databases.
• The name Hadoop is made up. It comes from
a stuffed toy elephant owned by the son of its creator,
Doug Cutting.
• Hadoop consists of three key parts: HDFS (distributed
file storage layer), MapReduce (distributed processing
layer) and YARN (resource management layer).
• Who made it?
• Hadoop was created by Doug Cutting and Mike
Cafarella and is presently developed by the Apache
Software Foundation.
• Hadoop's MapReduce and HDFS components drew
inspiration from Google papers on MapReduce and
Google File System.
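To make the MapReduce idea concrete, here is a minimal single-process sketch of the programming model, using the classic word-count task. It is not Hadoop's API: the function names `map_phase`, `shuffle`, and `reduce_phase` and the input lines are hypothetical, and a real cluster would run these steps in parallel across many machines.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

# Hypothetical input; on a cluster each line could live on a different node.
lines = ["big data big ideas", "data beats opinions"]

mapped = [pair for line in lines for pair in map_phase(line)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(word_counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread over many machines, which is what makes the model suitable for very large datasets.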
17. Hive (a data
warehouse
software)
• What is it?
• Hive is a data warehouse software built on top
of Hadoop for providing data summarization,
query and analysis.
• Hive provides a mechanism to work on data
using a SQL like language called HiveQL.
• Hive automatically translates HiveQL
queries into MapReduce jobs executed on
Hadoop.
• Who made it?
• While initially developed by Facebook, Hive is
used and developed by other companies such
as Netflix and the Financial Industry
Regulatory Authority (FINRA).
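The idea of turning a SQL-like query into map and reduce steps can be illustrated conceptually. The sketch below is not Hive and does not show Hive's actual query plan. It assumes SQLite (via Python's built-in `sqlite3` module) as a stand-in for the SQL side, and the `sales` data is hypothetical. It runs a GROUP BY query and then computes the same totals with an explicit map step and reduce step.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sales records: (region, amount).
sales = [("north", 120), ("south", 80), ("north", 30), ("east", 50), ("south", 20)]

# 1) The declarative version: a SQL GROUP BY, run here on SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?)", sales)
sql_totals = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)
conn.close()

# 2) The same aggregation written as explicit map and reduce steps.
grouped = defaultdict(list)
for region, amount in sales:      # map: emit each amount keyed by its region
    grouped[region].append(amount)
mr_totals = {region: sum(amounts) for region, amounts in grouped.items()}  # reduce

print(sql_totals == mr_totals)
```

The appeal of a SQL-like layer is that the analyst writes only the one-line query in step 1 and leaves the equivalent of step 2 to the system.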
18. Pig (an open-
source
technology)
• What is it?
• Pig is a high-level platform for creating
programs that run on Hadoop. The scripting
language used for this platform is called Pig
Latin.
• Pig Latin enables users to write complex data
transformations without knowing Java, the
language in which MapReduce programs are
primarily written.
• Pig scripts are translated into a series of
MapReduce jobs that are executed on Hadoop.
• Who made it?
• Pig was a result of development effort at
19. Spark (a big data
processing
framework)
• What is it?
• Apache Spark is a fast and efficient big data
processing framework with built-in modules for
streaming, SQL, machine learning and graph
processing.
• While Hadoop is suited to batch processing of
data, Spark is especially useful for real-time
streaming data.
• Who made it?
• Spark was authored by Matei Zaharia.
• It is developed by the Apache Software
Foundation, UC Berkeley AMPLab, and
Databricks.
20. GitHub (software
development
platform)
• What is it?
• GitHub is a web-based hosting platform for
software projects. Its core feature is version
control, which helps in keeping track of
changes to a project. GitHub
allows developers to discover, share, and build
better software.
• A budding data scientist can present her/his data
science projects on GitHub. If a Facebook account
is your personal profile and a LinkedIn account is
your professional profile, think of GitHub as your
technical profile.
• Who made it?
• GitHub was co-founded by Tom Preston-Werner.
21. Kaggle (a data
science platform)
• What is it?
• Kaggle is a platform for learning data science
and hosting analytics competitions in which
users compete to build the best models for
analysing and making predictions from
datasets uploaded by companies and users.
• Datasets are available on everything from
government, health, and science to popular
games and dating trends.
• Who made it?
• Kaggle was founded by Anthony Goldbloom
and its parent organization is Google.
22. DataCamp (a web-
based learning
platform)
• What is it?
• DataCamp is a popular online interactive
training and education platform in the field of
data analytics.
• DataCamp offers free and premium interactive
online training by experts from various fields.
• Who made it?
• DataCamp was founded by Martijn Theuwissen
and Jonathan Cornelissen.