History of Data Mining and Big Data …
What is Big Data?
What are the real-life dimensions of Big Data?
How to use Big Data for STEM and INFONOMICS?
Analytical case studies and tools using Big Data fintech examples
What is the future of Data Science?
1. Real life for Big Data:
what is data science?
Irina Muhina,
PhD in AI with 25 years of practical experience,
Big Data and STEM Expert, Founder of iECARUS,
President of ERUDITE school
iECARUS is your concierge for educational intelligence.
www.iecarus.com
September 2016,
Russia
The future belongs to the companies and people that turn data into products.
2. Agenda
• History of Data Mining and Big Data …
• What is Big Data?
• What are the real-life dimensions of Big Data?
- return on investment (ROI)
- amount of real-time data
- demand for data scientist jobs and average compensation packages
- expectations for the data scientist
- salaries for data scientists
• How to use Big Data for STEM and INFONOMICS?
• Case studies and tools using Big Data examples from industries:
– Trading strategy analysis
– Parametric and distribution analysis
– Two-regimes risk model
– Correlation analysis with different cut-off
– Optimization models with re-sampling
• What is the future of Data Science?
3. History of data mining
https://rayli.net/blog/data/history-of-data-mining/
8. Big Data is #1
- on the 2012 list of most ambiguous terms (Global Language Monitor)
- most searched term among clients on Gartner.com
Big Data initiatives
Traditional DW & BI     | Big Data & Advanced Analytics
Requirements-based      | Opportunity-oriented
Top-down design         | Bottom-up experimentation
Integration and reuse   | Immediate use
Competence centers      | Hackathons
Better decisions        | Business innovation
Enterprise              | Functional
11. Who is a Data Scientist?
• Works more closely with multiple teams than statisticians do
• Is always expected to work with several types of big data: operational technology, text, streaming
• Combines mathematics, statistics, machine learning and algorithmic processing
• Needs communication skills much more often than BI or statistics roles do
• Has to be able to code, write and present well
Current roles:
• Solution architect
• Business analyst
• Requirements analyst
• Data modeler
• Data integration lead
• Data integration developer
• Report writer
• BI platform lead
• Database administrator
• User trainer
• Data steward
12. Success of Data Science Solutions:
skills, roles, responsibilities
17. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.
It has many similarities with existing distributed file systems.
However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and
is designed to be deployed on low-cost hardware.
HDFS provides high throughput access to application data and is suitable for applications that have large data sets.
HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as
infrastructure for the Apache Nutch web search engine project. HDFS is now an Apache Hadoop subproject.
The project URL is http://hadoop.apache.org/hdfs/.
Master Management System
Database Management System
Hybrid information architectures
19. Anticipate, govern and hedge information-borne risks.
Data is the new currency and new asset.
Likelihood of optimistic, pessimistic and realistic scenarios.
20. My role is a translator: from business to analytics to IT
and back to business.
25. O’Reilly Data Science Salary Survey: input was analyzed from 983 respondents
working in the data space, across a variety of industries,
representing 45 countries and 45 US states. Three-fifths of the respondents are from the US.
27. There is a difference of $10K
between the median salaries of
men and women. Keeping all other
variables constant—same roles,
same skills—women make less than
men.
32. • How to use Big Data for STEM?
Emerging Role of the Data Scientist
The Art of Data Science for IT, business
The Birth of Infonomics, the New Economics of Information
33. Real projects using Big Data
case studies and tools from industries
• Trading strategy case study
• Parametric and distribution case study
• Two-regimes risk model case study
• Correlation analysis with different cut-off
• Optimization models with re-sampling
Analytical Tools
Excel, SAS, SPSS, R, SQL, Tableau,
MATLAB, Watson, Hadoop
40. Trading on the daily price of ACWI crossing its 50-day EMA seems to be a good strategy
• Price crosses the EMA from below: go overweight
• Price crosses the EMA from above: go underweight
Different trading strategies analysis
Trade benefit vs. trade length
Bad trades tend to be very short, i.e. they occur when the model is
switching rapidly between overweight and underweight.
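The crossover rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model behind the slide: the EMA is seeded with the first price and uses the common smoothing factor 2 / (span + 1), the positions +1 and -1 stand in for "overweight" and "underweight", and the function names (ema, crossover_positions, trade_lengths) and the sample prices are hypothetical, not ACWI data.

```python
def ema(prices, span=50):
    """Exponential moving average, seeded with the first price.

    Assumes the common smoothing factor alpha = 2 / (span + 1).
    """
    alpha = 2.0 / (span + 1)
    out = []
    for p in prices:
        out.append(p if not out else alpha * p + (1 - alpha) * out[-1])
    return out


def crossover_positions(prices, span=50):
    """Daily position: +1 (overweight), -1 (underweight), 0 before the first cross."""
    avg = ema(prices, span)
    positions = []
    current = 0
    for i in range(len(prices)):
        if i > 0:
            was_below = prices[i - 1] <= avg[i - 1]
            was_above = prices[i - 1] >= avg[i - 1]
            if was_below and prices[i] > avg[i]:
                current = 1    # price crossed the EMA from below
            elif was_above and prices[i] < avg[i]:
                current = -1   # price crossed the EMA from above
        positions.append(current)
    return positions


def trade_lengths(positions):
    """Length in days of each consecutive run of the same non-zero position."""
    lengths = []
    run = 0
    prev = 0
    for pos in positions:
        if pos != 0 and pos == prev:
            run += 1
        else:
            if prev != 0:
                lengths.append(run)
            run = 1 if pos != 0 else 0
        prev = pos
    if prev != 0:
        lengths.append(run)
    return lengths


# Synthetic prices and a short span, chosen only to keep the example readable.
sample = [10, 10, 12, 12, 8, 8]
positions = crossover_positions(sample, span=3)
lengths = trade_lengths(positions)
```

Plotting each trade's profit against its entry in trade_lengths would give the "trade benefit vs. trade length" view; very short runs correspond to the rapid switching between overweight and underweight described above.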
45. 3 Scenarios for the Future of Data Science
• Big Data Ventures
Data Science will be practiced exclusively by companies
specializing in big data analytics.
• Big Data Accountants
Data Science will become a specialized, in-house function,
similar to today’s Accounting, Legal, and IT departments.
• Everybody’s a Big Data Expert
The vision of “data democracy” will come true and everybody in
the organization will create and consume big data. Data
science fundamentals will be thoroughly integrated in all levels
of management education.
https://whatsthebigdata.com/2012/03/12/3-scenarios-for-the-future-of-data-science/
50. No one knows for certain what the future can bring, but
without vision, how can we achieve our dreams?
www.gartner.com
www.theoryandpractice.ru
www.ted.com
www.zonein.ca/virtual-child
www.digcompass.ca
www.ictc-ctic.ca
www.computingcareers.acm.org
www.tfsa.ca/centre-of-excellence
http://thinkbigdata.in/
http://data-informed.com/
If you have questions about this presentation, you can
write to us at iecarus.ca@gmail.com