So You Want To Be A Data Scientist?
What It Means To Be A Data Scientist
About:Me
Mohd Izhar Firdaus Ismail
- Current: Solution Architect @ ABYRES Enterprise
Technologies Sdn Bhd
- Open Source Activist & (self-proclaimed) Hacker, Open Data
Advocate, Fedora Ambassador, Data Architect, Data Engineer,
Consultant, Python Programmer, Analyst, Trainer, and a bunch of
other hats ;-)
- Contributing to Open Source projects for over 8 years
- Over 6 years building systems related to data, content,
information and knowledge management
- http://linkedin.com/in/kagesenshi
- izhar@abyres.net / kagesenshi.87@gmail.com
The People I Work For
● Open Source Technology
Company
– Specializes in Cloud, Big Data &
Enterprise Application
Development
– Red Hat & Hortonworks Partner
● IT Consulting & Professional
Services around Open Source
Software
– Design, development,
implementation and training
services
– Consulting practice around
leveraging Open Source
technologies and implementing
Big Data projects
● The largest organized mafia of
pure-play open source geeks in
Malaysia ;-)
Before I Start
Some people call me a data scientist,
but I don't consider myself one (yet)
(( it's a personal integrity thing – Machine Learning & Stats are not (yet) my strong points ))
But I do work quite a bit with data: designing applications,
infrastructure, algorithms, processes and pipelines for big data
workloads – from data acquisition to visualization
Who is A Data Scientist?
"Data scientists are involved with gathering data,
massaging it into a tractable form, making it tell its
story, and presenting that story to others."
- Mike Loukides, VP, O’Reilly Media.
"A data scientist is someone who can obtain, scrub,
explore, model and interpret data, blending hacking,
statistics and machine learning. Data scientists not only are
adept at working with data, but appreciate data itself as a
first-class product."
- Hilary Mason, Data Scientist, Accel; Scientist
Emeritus, bitly; co-founder, HackNY.
What's With The Superhuman
Requirements?
Domain Knowledge & Soft Skills
● Knowledge to find what matters
– Knowing the statistics does not mean knowing
what the results signify for a
business
– Business rules, terminology, problem-solving
techniques, scientific theories & formulas
– Identifying actionable information
● Problem solving & Hacker mindset
– New & creative ways to find, acquire,
transform, manipulate, mash up, and use
data
– Possibly unconventional uses of the same
result
– Knowing what data is needed, and how to get
it, to solve a particular business problem
Math & Statistics
● People use your output for
decision making – wrong numbers
can lead to bad decisions
– Lies, damned lies, and statistics
● Machine Learning
– Predict future values
– Analyze patterns in structured and
unstructured data
– Automated decision support
systems
Programming & Database
● Programming
– Calculating a few thousand rows in Excel might be
okay, but dealing with distributed processing needs
some skills
● Query over distributed data – you don't want a query that is
stuck on a single core of a cluster with hundreds of nodes
– Simple visualizations can be done with drag-and-drop
builders; complex visualizations will require you to get
your hands dirty
– Advanced decision system capabilities can only be
implemented through some sort of rule programming
– Develop data pipelines, both batch and stream
– Develop data collection, scraping, machine learning &
artificial intelligence software
● Database
– Ingesting data from various types of sources,
managing data formats, data storage, governance
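
The point about a query stuck on a single core can be illustrated with a minimal sketch, assuming the data is already split into partitions. Each partition is aggregated on its own and the partial results are then combined, which is the general shape of a distributed aggregation. The names `partial_sum_count`, `combine` and `parallel_mean` are hypothetical, and a thread pool on one machine only stands in for a real cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum_count(partition):
    """Aggregate one partition on its own: return (sum, count)."""
    return sum(partition), len(partition)


def combine(partials):
    """Merge the per-partition (sum, count) pairs into a global mean."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else float("nan")


def parallel_mean(partitions, workers=4):
    """Compute a mean by aggregating every partition independently.

    A thread pool stands in for cluster workers here (an assumption
    made for illustration, not any particular framework's API).
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum_count, partitions))
    return combine(partials)


if __name__ == "__main__":
    # Hypothetical data: three partitions of transaction amounts.
    partitions = [[10, 20, 30], [40, 50], [60]]
    print(parallel_mean(partitions))
```

The design choice worth noting is that only small (sum, count) pairs travel between workers, never the raw rows, so the work scales with the number of partitions.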
Communication & Visualization
● Spreading information and discoveries
– Presenting data in a form that
non-scientists can understand
– Knowing how to explain to business users
why a result matters and how it can be
used to benefit the business,
organization, society
● Identifying patterns through visual
analysis
– Some insights might not be obvious when
presented in columns and rows
– Knowing how to visualize information so
as to make hidden patterns more obvious
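
As a small illustration of why a visual form can reveal what columns and rows hide, the sketch below renders numbers as horizontal text bars. The function name `text_bar_chart` and the monthly figures are hypothetical, and a real project would more likely use a plotting library.

```python
def text_bar_chart(values, width=40):
    """Render {label: number} as horizontal bars scaled to `width` characters."""
    if not values:
        return ""
    peak = max(values.values())
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        # Scale each bar relative to the largest value.
        bar_length = round(width * value / peak) if peak > 0 else 0
        lines.append(f"{label.ljust(label_width)} | {'#' * bar_length} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical monthly sales figures, made up for this example.
    monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 60, "Apr": 140}
    print(text_bar_chart(monthly_sales))
```

In the printed chart the dip in March stands out at a glance, while in a table it is just one more number.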
Data Science
VS
Data Engineering
The Key Differences
● Data Science
– Problem solving through
strategies around data
– Hindsight, Insight,
Foresight
– Understanding of patterns,
behaviors, etc
– Automated Data-Driven
Decision Making
● Data Engineering
– Ingestion pipelines
– Data integration
– Data enrichment
– Data cleansing
– Data preparation
– Data pipeline
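
The data engineering steps listed above can be sketched as a tiny pipeline of generator functions. This is only an illustration under stated assumptions: the record fields (`amount`, `country`, `region`), the reference table and the function names are all hypothetical.

```python
def cleanse(records):
    """Data cleansing: drop records with no amount, normalise country codes."""
    for record in records:
        if record.get("amount") is None:
            continue
        country = (record.get("country") or "").strip().upper()
        yield {**record, "country": country}


def enrich(records, region_by_country):
    """Data enrichment: attach a region looked up from a reference table."""
    for record in records:
        region = region_by_country.get(record["country"], "UNKNOWN")
        yield {**record, "region": region}


def prepare(records):
    """Data preparation: total the amounts per region, ready for analysis."""
    totals = {}
    for record in records:
        totals[record["region"]] = totals.get(record["region"], 0) + record["amount"]
    return totals


if __name__ == "__main__":
    # Hypothetical raw input with typical defects: padding, case, missing values.
    raw = [
        {"amount": 10, "country": " my "},
        {"amount": None, "country": "sg"},
        {"amount": 5, "country": "SG"},
    ]
    regions = {"MY": "SEA", "SG": "SEA"}
    print(prepare(enrich(cleanse(raw), regions)))
```

Because each stage is a generator, records flow through one at a time, so the same shape works for a file that does not fit in memory.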
Hadoop?
Hadoop is for Big Data
● Core of "Big Data"
– Techniques, technologies &
strategies to handle ingestion,
storage, and processing of
high-velocity, high-volume,
high-variety datasets
– Historical data, and not just
current state
– Transaction + interaction +
observation = Big Data
Data Science Needs Big Data
"The reaction of one man could be forecast by no known mathematics;
the reaction of a billion is something else again"
– Asimov
● Without rich historical data, analysis and development become
more challenging
– Patterns start to show themselves in rich historical data
– Models that are accurate with small data might start to fall apart when
more parameters/data are introduced
● Start collecting data today! You never know when you will need it,
and when you do, the historical data will be there for you to mine
Getting Started With Data Science
Some tips for beginners
Attn.
● Courses, training, documents, tools, etc. will definitely
help you establish your foundations and basics in
data science
– but, as in any technical field, what is important is your ability to
mash everything up and apply it to solve problems
● Anybody can learn how to draw, anybody can draw, but
not everybody can be an artist.
Domain & Business
● Learn more about your industry (or your target industry)
● Learn what makes it tick, which numbers matter, and
what scientific knowledge surrounds the domain
● Businesses exist for the key purpose of making profit,
which usually translates to: increase sales & reduce
cost
– Find out how to help your organization's business by collecting
and analyzing data to produce visualizations that will help
the organization make more profit
Math & Statistics
● Find that old textbook you had from university, and
study it again ;-)
● Learn, understand and start to apply how statistics can
be used for estimation and prediction.
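
To make "estimation and prediction" concrete, here is a minimal sketch using only the Python standard library. The first function estimates a mean with an approximate 95% confidence interval (assuming the normal approximation is acceptable), and the second fits a straight line by ordinary least squares so it can be used to predict. The function names and the sample data are hypothetical.

```python
import math
import statistics


def mean_with_interval(sample, z=1.96):
    """Estimate the mean with an approximate 95% confidence interval.

    Uses the normal approximation, an assumption that suits
    reasonably large samples.
    """
    centre = statistics.mean(sample)
    margin = z * statistics.stdev(sample) / math.sqrt(len(sample))
    return centre, centre - margin, centre + margin


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


if __name__ == "__main__":
    # Hypothetical daily order counts.
    print(mean_with_interval([12, 15, 11, 14, 13, 16, 12, 15]))

    # Hypothetical ad spend (x) against sales (y); predict sales at x = 5.
    slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
    print(slope * 5 + intercept)
```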
Programming & Information System
● If you don't know programming yet, start by picking up a language
– I suggest Python as it has a strong background in scientific computing
communities, and was designed by Guido van Rossum, who trained as a mathematician
– Though I'm a biased parseltongue :P
– Books:
● Packt's Practical Data Analysis
● How to Think Like A Computer Scientist
● SQL is important
– Pretty much the most mature method for declaring data queries
● Pick up Big Data technologies to help you handle massive datasets
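
The remark that SQL is a mature way of declaring data queries can be tried without installing anything, since Python ships with the `sqlite3` module. The sketch below is an assumption-laden illustration: the `sales` table, its columns and the function name `sales_by_region` are hypothetical.

```python
import sqlite3


def sales_by_region(rows):
    """Load (region, amount) rows into an in-memory table and total them per region."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # The query states what is wanted; the engine decides how to compute it.
        cursor = conn.execute(
            "SELECT region, SUM(amount) AS total "
            "FROM sales GROUP BY region ORDER BY total DESC"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical rows, made up for this example.
    rows = [("north", 10.0), ("south", 5.0), ("north", 2.5)]
    for region, total in sales_by_region(rows):
        print(region, total)
```

The same `GROUP BY` statement carries over largely unchanged to many SQL-capable big data engines, which is part of why the skill transfers well.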
One more thing
http://pysiphae.rtfd.org
Thanks
Contact:
Izhar Firdaus (KageSenshi)
izhar@abyres.net / kagesenshi.87@gmail.com
+60172792765
