If you ask people what BIG DATA is, they often say it is about a lot of data. But the world has ALWAYS had a lot of data. Big Data is really about datafication, a word so new that even spellcheckers don’t recognize it!
Learn more about:
» How BIG DATA changes the career paths of even the most unsuspecting
» How BIG DATA changes the way business decisions are made
» How BIG DATA changes who makes those decisions & the reshuffle of the balance of power it causes
» What BIG DATA skills you can bring to the office tomorrow to increase your value to the firm
5. Today’s Agenda
How BIG DATA changes the career paths of even the most unsuspecting.
How BIG DATA changes the way business decisions are made.
How BIG DATA changes who makes the decisions & reshuffles the balance of power.
What BIG DATA skills you can bring to the office tomorrow to increase your value.
6. The experienced: data scientists & those managers who leverage them.
BIG DATA is a management tool even if you have other employees perform the coding.
BIG DATA is as ubiquitous as the internet.
Gut instinct is now of less value.
7. Datafication
A modern technological trend that turns many aspects of our lives into computerized data and transforms that information into new forms of value.
15. Just as voice mail and email obviated the manager’s need for secretarial functions, algorithms eating BIG DATA are now obviating tactical managerial functions.
Transactional Work
Tactical Work
16. Strategy needs to consume data.
Data, without strategy, has little value.
17. What is the difference between analogue and digital?
Figure labels: Sine wave; Modified sine wave
Datafication is only possible due to the digitization of analogue information.
41. Shigeomi Koshimizu datafied body contour (posture, weight distribution, etc.).
Quantified “sitting down.” Sensors measured the pressure drivers exert at 360 different points, each on a 0 to 256 scale.
Quality Quantify
Datafication Turns Everything into a Data Point
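Once "sitting down" is a list of 360 numbers, it can be compared like any other data. The sketch below is only an illustration under stated assumptions: the driver names, the stored profiles and the use of Euclidean distance are all hypothetical and are not described in the slides.

```python
import math

N_SENSORS = 360   # pressure points, as on the slide
SCALE_MAX = 256   # each reading is on a 0 to 256 scale, as on the slide

def distance(profile_a, profile_b):
    """Euclidean distance between two seat-pressure profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same number of readings")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(profile_a, profile_b)))

def closest_driver(reading, known_profiles):
    """Return the name whose stored profile is nearest to the new reading."""
    return min(known_profiles, key=lambda name: distance(reading, known_profiles[name]))

# Hypothetical stored profiles: 360 readings per driver.
known = {
    "driver_a": [100] * N_SENSORS,
    "driver_b": [180] * N_SENSORS,
}
new_reading = [105] * N_SENSORS
print(closest_driver(new_reading, known))
```

The point is not the distance formula itself but that a quality (how someone sits) has become a quantity that can be stored, compared and searched.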
46. Place sensors on parts to identify heat & vibrational patterns associated with failures that lead to breakdowns.
Can predict a breakdown before it happens & replace parts in the garage, not on the side of the road.
Data does not tell us why the part is in trouble.
It reveals enough to know the what.
Can guide investigations into discovering the underlying cause.
Causation to Correlation
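A minimal sketch of the idea, assuming a simple rule: treat the first few sensor readings as the "healthy" baseline and flag any later reading that sits far outside it. The readings, the baseline size and the threshold of 3 standard deviations are illustrative assumptions, not values from the slides.

```python
from statistics import mean, stdev

def flag_anomalies(readings, baseline_size=5, threshold=3.0):
    """Return indexes of readings whose z-score against the baseline exceeds threshold."""
    baseline = readings[:baseline_size]
    mu, sigma = mean(baseline), stdev(baseline)
    flagged = []
    for i, value in enumerate(readings[baseline_size:], start=baseline_size):
        # The rule only says the reading is unusual (the "what"), not why.
        if sigma > 0 and abs(value - mu) / sigma > threshold:
            flagged.append(i)
    return flagged

# Hypothetical vibration readings from one part; the last two drift upward.
vibration = [1.0, 1.1, 0.9, 1.0, 1.05, 1.02, 0.98, 2.4, 2.9]
print(flag_anomalies(vibration))
```

A flagged index is a prompt to pull the part into the garage and investigate; the rule itself carries no explanation of the cause.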
47. When saving lives, knowing that something is likely to occur is more important than knowing why.
Eventually, “the why” will be investigated.
48. Can Big Data Save Babies?
Used Big Data to spot infections in premature
babies before symptoms appear.
Information flow >1000 data points per second
Discovered correlations between very minor changes and more serious problems
49. Big Data Predicts Epidemics Better than CDC
CDC tracks patient visits to clinics
Information suffers from a 2-week reporting lag
Google took the 50 million most commonly searched terms from 2003 to 2008
Compared them against historical influenza data from CDC.
Searches then correlated with CDC’s data on outbreaks of flu.
50. How All Three Shifts Are Illustrated
Small to All
Ran 100% of US searches for 6 years through an algorithm, which identified 45 search terms correlated with CDC data on flu outbreaks (runny nose, body aches, etc.).
Clean to Messy
Searches imperfect with misspellings, incomplete phrases & included healthy
people searching on behalf of others.
Causation to Correlation
Will anyone claim typing symptoms in a search engine gives you the flu?
Big Data via searches predicts outbreaks in real time, compared to the CDC’s traditional data analytics, which lag by 2 weeks.
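The "compare searches against CDC data" step can be sketched as ranking search terms by how closely their weekly counts track reported flu cases. This is an illustration only: the weekly numbers and terms below are made up, and Pearson correlation is an assumed choice of measure, not a description of Google's actual method.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no signal
    return cov / (sx * sy)

def rank_terms(term_counts, flu_cases):
    """Sort search terms from most to least correlated with flu cases."""
    scores = {term: pearson(counts, flu_cases) for term, counts in term_counts.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical weekly data.
flu_cases = [10, 20, 40, 80, 40, 20]
term_counts = {
    "runny nose": [12, 22, 41, 79, 43, 19],
    "body aches": [5, 9, 22, 40, 18, 11],
    "car insurance": [50, 48, 52, 49, 51, 50],
}
for term, score in rank_terms(term_counts, flu_cases):
    print(f"{term}: {score:.2f}")
```

Terms that rise and fall with the flu data float to the top; nothing in the ranking claims that searching causes illness.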
52. NYC created database of 900K buildings augmented
by troves of data collected by 19 agencies:
• Records of tax liens
• Anomalies in utility usage
• Service cuts
• Missed payments
• Ambulance visits
• Local crime rates
• Rodent complaints
• Etc.
Big Data increases the productivity of each inspector
53. How Did They Do It?
1. Compared the database against 5 years of building fire data
2. Ranked by severity
3. Observed correlation. (Not causality!)
4. Data scientists triaged complaints for inspections.
Concluded that:
• a building’s type & age are the main predictors of fire; other variables are superfluous
• a permit for exterior brickwork correlates with a lower risk of fire
Result: inspections ending in vacate orders increased from 13% to 70%
Building characteristics did not cause fire but were correlated with fire risk.
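The triage step can be sketched as: compute a historical fire rate per building group, then sort incoming complaints so the riskiest groups are inspected first. The building types, records and complaint IDs below are hypothetical, and grouping by a single attribute is a simplifying assumption rather than the city's actual model.

```python
from collections import defaultdict

def fire_rate_by_group(history):
    """history: list of (building_type, had_fire) pairs -> fire rate per type."""
    totals, fires = defaultdict(int), defaultdict(int)
    for group, had_fire in history:
        totals[group] += 1
        fires[group] += int(had_fire)
    return {group: fires[group] / totals[group] for group in totals}

def triage(complaints, rates):
    """Sort complaints so buildings in historically riskier groups come first."""
    return sorted(complaints, key=lambda c: rates.get(c["type"], 0.0), reverse=True)

# Hypothetical historical records and open complaints.
history = [
    ("pre-1940 walk-up", True), ("pre-1940 walk-up", True), ("pre-1940 walk-up", False),
    ("modern high-rise", False), ("modern high-rise", False),
    ("modern high-rise", True), ("modern high-rise", False),
]
complaints = [
    {"id": 1, "type": "modern high-rise"},
    {"id": 2, "type": "pre-1940 walk-up"},
]
rates = fire_rate_by_group(history)
print([c["id"] for c in triage(complaints, rates)])
```

The ordering reflects correlation in past records only; it says nothing about what causes a fire.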
54. Spending money on the exterior correlates with an up-to-code interior.
But just the intent to begin work correlates enough to predict an outcome.
56. Step 1: Pull disparate sets of texts & put them into a “point of singularity.” Currently about 70% of data is text. Pictures are to be quantified under separate protocols. Create a corpus: a body of text to be analyzed.
Step 2: R, for example, has a set of functions to clean up a corpus by excluding data points superfluous to analysis (delete commas, periods & words such as “but” & “and”, etc.). R cleans up files by reducing the corpus to the primary words crucial to analysis.
Step 3: Truncate words with a common stem; this is called stemming (e.g. “engineer” & “engineering” both become the same word). Think of the mathematical analogy of number factoring versus least common denominator.
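The cleanup steps on this slide can be sketched in a few lines. The slide describes R; this version uses Python, and both the stop-word list and the suffix-stripping "stemmer" are deliberately naive assumptions for illustration. A real project would use an established stemmer rather than this one.

```python
import string

# Illustrative stop words only; real lists are much longer.
STOPWORDS = {"but", "and", "the", "a", "an", "of", "to", "is", "in"}
SUFFIXES = ("ing", "ed", "s")

def naive_stem(word):
    """Strip one common suffix, keeping at least three letters of the word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean(text):
    """Lowercase, drop punctuation and stop words, then stem what remains."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [naive_stem(word) for word in no_punct.split() if word not in STOPWORDS]

print(clean("The engineer is engineering, but the engineers rested."))
# → ['engineer', 'engineer', 'engineer', 'rest']
```

After cleanup, "engineer", "engineering" and "engineers" all count as the same word, which is what makes later frequency counts meaningful.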
57. Step 4: A mathematical matrix that describes the frequency of terms that occur in a collection of documents.
Rows correspond to documents in the collection & columns correspond to terms.
Create a document-term matrix that measures the frequency of the words that remain after the corpus “cleanup” discussed in the previous slide.
You are left with primary outputs that enable you to do counts in each cell.
You’ve datafied, or quantified, words that others only qualify, which prevents analysis.
You can now do lots of interesting stuff!
Cluster analysis of the document-term matrix reveals prevalent themes.
Document-term matrix
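A minimal sketch of building such a matrix, assuming each document has already been reduced to a list of cleaned words. The three tiny documents are hypothetical.

```python
from collections import Counter

def document_term_matrix(docs):
    """docs: list of word lists. Returns (terms, matrix) with rows = documents, columns = terms."""
    terms = sorted({word for doc in docs for word in doc})
    matrix = []
    for doc in docs:
        counts = Counter(doc)
        matrix.append([counts[term] for term in terms])
    return terms, matrix

# Hypothetical cleaned documents.
docs = [
    ["engineer", "data", "data"],
    ["data", "manager"],
    ["manager", "engineer", "engineer"],
]
terms, matrix = document_term_matrix(docs)
print(terms)
for row in matrix:
    print(row)
```

Each cell is a simple count: how often the column's term appears in the row's document.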
58. Cluster analysis looks at how all your words cluster in your document-term matrix.
The result of this analysis is that we can reduce our matrix to fewer columns.
Font size & even color are embedded with information.
This information is actionable.
59. For centuries we have manually counted sets of words to determine their frequencies.
Zipf's law states that, given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table.
Used for resumes as a way to increase information density; to be covered at a future webinar.
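Zipf's law says frequency is roughly proportional to 1/rank, so rank times frequency should stay roughly constant down the table. The sketch below checks that on a toy word list constructed to fit the law exactly; real text only follows it approximately.

```python
from collections import Counter

def rank_frequency(tokens):
    """Return (rank, word, count, rank * count) rows, most frequent word first."""
    counts = Counter(tokens).most_common()
    return [(rank, word, n, rank * n) for rank, (word, n) in enumerate(counts, start=1)]

# Toy corpus built so that counts are 12, 6, 4, 3 (that is, 12/1, 12/2, 12/3, 12/4).
tokens = ["the"] * 12 + ["of"] * 6 + ["and"] * 4 + ["data"] * 3
for row in rank_frequency(tokens):
    print(row)
```

On a real corpus the last column drifts rather than staying fixed, but it stays in the same ballpark for the most common words.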
60. With these data sets, we can run sentiment analysis!
Determine the occurrence rate of certain themes qualified as opinions.
To determine if people like a restaurant, we’d look at the words reviewers used on social media in the comment section.
Word scores: Love 10, Like 7, Dislike -7, Hate -10
Qualitatively, we quantify the weakness or strength of these signals.
We determine which words correlate with having liked or disliked the movie, and to what degree, along a predetermined discrete continuum.
Pre-established words in narrative responses, now embedded in clusters, signal positive or negative statements about a movie, a restaurant or a Hammacher Schlemmer customer review.
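The scoring idea can be sketched as a word-to-score lookup summed over a review. The four scores come from the slide; the sample reviews and the choice to simply add scores together are assumptions for illustration.

```python
import string

# Scores taken from the slide.
LEXICON = {"love": 10, "like": 7, "dislike": -7, "hate": -10}

def sentiment_score(review):
    """Sum the lexicon scores of the words in a review; unknown words score 0."""
    no_punct = review.lower().translate(str.maketrans("", "", string.punctuation))
    return sum(LEXICON.get(word, 0) for word in no_punct.split())

print(sentiment_score("We like the service and love the food!"))  # → 17
print(sentiment_score("I love the food but hate the wait."))      # → 0
```

A simple sum ignores negation ("did not like") and sarcasm, so treat the result as a rough signal rather than a verdict.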
61. The difference between analog and digital signals is that an analog signal is a continuous electrical signal, while a digital signal is a series of discrete values that represent information.
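To make the distinction concrete, the sketch below turns a continuous sine wave into a short list of integers by sampling it at fixed intervals and rounding each sample to a fixed number of levels. The frequency, sample rate and 16 levels are arbitrary assumptions chosen to keep the output small.

```python
import math

def sample_and_quantize(freq_hz, sample_rate_hz, n_samples, levels):
    """Sample sin(2*pi*f*t) and round each sample to an integer in [0, levels - 1]."""
    samples = []
    for n in range(n_samples):
        t = n / sample_rate_hz                             # time of this sample
        analog = math.sin(2 * math.pi * freq_hz * t)       # continuous value in [-1, 1]
        digital = round((analog + 1) / 2 * (levels - 1))   # nearest discrete level
        samples.append(digital)
    return samples

print(sample_and_quantize(freq_hz=1, sample_rate_hz=8, n_samples=8, levels=16))
```

More samples per second and more levels give a closer copy of the original wave, at the cost of more data.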
62. To determine which traits can predict future outcomes, look at historical data.
Correlate “judgements” across groupings to see which ones still predict when tested against another dataset.
This is cross-validation, and it is carried out on historical data sets.
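A minimal sketch of k-fold cross-validation: split the historical records into k groups, fit on all but one group, and measure the error on the group held out. The "model" here just predicts the average of the training outcomes; it and the data are placeholders, not anything from the slides.

```python
def k_fold_splits(n_items, k):
    """Yield (train_indexes, test_indexes) for k folds over range(n_items)."""
    indexes = list(range(n_items))
    fold_size, remainder = divmod(n_items, k)
    start = 0
    for fold in range(k):
        stop = start + fold_size + (1 if fold < remainder else 0)
        yield indexes[:start] + indexes[stop:], indexes[start:stop]
        start = stop

def cross_validate(xs, ys, k, fit, predict):
    """Average absolute error on the held-out folds."""
    errors = []
    for train, test in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors += [abs(predict(model, xs[i]) - ys[i]) for i in test]
    return sum(errors) / len(errors)

# Placeholder model: always predict the mean outcome seen in training.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def predict_mean(model, x):
    return model

xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 12.0, 11.0, 13.0, 12.0, 14.0]  # hypothetical historical outcomes
print(cross_validate(xs, ys, k=3, fit=fit_mean, predict=predict_mean))
```

Because every record is scored by a model that never saw it, the error is a fairer estimate of how the model would do on new data.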
Master algorithms script other algorithms on an as-needed basis, free of human interaction.
Machine to machine (M2M) technology enables networked devices to exchange information & perform actions without the manual assistance of humans.
This is what is replacing traditional managerial jobs.
Firms that still employ these types of jobs feel less pressure to keep salaries at pace with inflation over time.
63. Machine learning can test statistical models, for example by testing against known political party membership & updating the algorithm as new data comes in.
In M2M, we let data points come in, refresh & update to automatically script an even more accurate algorithm.
Can infer your political affiliation from your first 19 likes, even if those likes are completely apolitical.
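"Updating the algorithm as new data comes in" can be sketched with a toy model that keeps running counts and changes its predictions whenever a new labeled record arrives. The class name, the likes and the labels are hypothetical, and vote counting is an assumed stand-in for whatever model a real system would use.

```python
from collections import defaultdict

class OnlineLikeModel:
    """Toy model: each like votes for the labels it has been seen with so far."""

    def __init__(self):
        self.counts = {}  # like -> {label: times seen together}

    def update(self, likes, label):
        """Fold one new labeled person into the running counts."""
        for like in likes:
            per_label = self.counts.setdefault(like, defaultdict(int))
            per_label[label] += 1

    def predict(self, likes):
        """Return the label with the most votes, or None if no like is known."""
        votes = defaultdict(int)
        for like in likes:
            for label, n in self.counts.get(like, {}).items():
                votes[label] += n
        return max(votes, key=votes.get) if votes else None

model = OnlineLikeModel()
model.update(["hiking", "jazz"], "party_a")
model.update(["bbq", "trucks"], "party_b")
print(model.predict(["jazz"]))

# New data arrives and the same model shifts without being rebuilt.
model.update(["jazz"], "party_b")
model.update(["jazz"], "party_b")
print(model.predict(["jazz"]))
```

No one rewrites the rules between the two predictions; the incoming data does the updating.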
64. What Can I Do Tomorrow Morning at the Office?
1. Take inventory of the data you already collect
A. Internal data.
B. External data accessed via the Freedom of Information Act (FOIA), to be discussed subsequently.
C. External data legally purchased from vendors (Yelp, FB, DoubleClick, etc.)
D. Create a glossary of data definitions (headcount example).
2. Determine decisions to derive from Big Data
A. Select most pressing problem based on Pareto 80/20 rule.
B. In plain English, state your problem statement.
C. Write down independent variables (inventory set of data at your disposal.)
D. Determine dependent variable (preferred outcome to your problem statement.)
3. Write down your hypothesis
4. Contact your IT or data science department. If not …..
5. Contract STEM grad students & turn them into data scientists
6. Code your hypothesis
Even if I hate coding and math!
Quantitative Skills
65. The Freedom of Information Act (FOIA), 5 U.S.C. § 552, is a federal law that allows for disclosure of previously unreleased information controlled by the US government.
Enacted in 1966, it allows U.S. citizens to petition the government for official information.
Correlate your own data with troves of external data from the US gov’t (example: MTA apps)!
66. State the business problem you are trying to solve in plain language, as a problem statement.
State it as a hypothesis.
Collect data from systems already in place.
Test the hypothesis.
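As a sketch of what "coding your hypothesis" might look like, suppose the hypothesis is "more training hours go with higher sales." The variable names and numbers below are invented for illustration; the code fits a straight line by least squares and reports how much of the variation it explains. It is a first look at an association, not a formal significance test.

```python
def fit_line(xs, ys):
    """Least-squares line through (xs, ys). Returns (slope, intercept, r_squared)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    ss_tot = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("need variation in both variables")
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data already collected by existing systems.
training_hours = [2, 4, 6, 8, 10]   # independent variable
units_sold = [21, 30, 38, 52, 59]   # dependent variable

slope, intercept, r_squared = fit_line(training_hours, units_sold)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, r_squared={r_squared:.2f}")
```

A positive slope with a high R-squared supports the hypothesis as a correlation; it does not show that training caused the sales.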
68. Coding is the new literacy.
Coding classes: most are on-line, a few on-site. Some are free & some at cost.
Most of you will not be competing with other coders, just other Marketing, HR or Financial professionals who know nothing about coding!
69. Should I learn to read?
Should I learn how to use the internet?
Should I learn about coding?
72. A little about R
• R is free
• Contains embedded tools to pull external data
• Tools that scrape data from websites (Reuters, as one example)
• Text mining: KNIME is another software tool for text mining that you can download (pronounced like “nine” but with an “m”). It has a graphical interface instead of a scripting language.
• Remember, a word cloud is an example of text mining.
• R itself is largely written in C: coders wrote functions in C that R exposes for pulling data, analogous to a macro in Excel.
• R will let you pull data into a corpus.
KNIME (Konstanz Information Miner) is an open-source data analytics, reporting & integration platform. It integrates various components for machine learning & data mining.
73. You’re not competing against other coders.
You’re competing against others in your field who know nothing about coding.
75. Datafication takes all aspects of life & turns them into data.
Google’s augmented reality glasses datafy the gaze
Twitter datafies stray thoughts
LinkedIn datafies professional networks
77. Download a Coupon Texted to You
• What aisles did you walk down or ignore?
• In what sequence did you browse the aisles?
• How long were you in the store?
• What is length of time between store visits?
• How long did you linger in front of the cereal aisle?
• When you checked out, did cereal wind up in your cart? How many boxes?
Compare viewing patterns with what wound up in your shopping cart.
Script algorithms that better relate the independent variables (what the store stocks) to the dependent variable of revenue thresholds.
78. So, what’s my role again in a Big Data World?
As Big Data becomes ubiquitous, what skills mark points of differentiation?
Discovering latent needs & intuition that goes against the facts?
The mere ability to define a problem precedes its solution
Big Data has a quantitative & a qualitative side
And if you hate math, there are qualitative skills to harness:
Develop observational skills to separate signal from the noise
Take inventory of existing data
Learn to develop hypotheses to test
Learn how to access external data (FOIA, LinkedIn, etc.)
Liaise between internal ERP data & external data
Network with STEM students to contract as data scientists
82. Recommended Courses
NetCom Learning offers a comprehensive portfolio for Big Data training
options. Please see below the list of recommended courses with upcoming
schedules:
Introduction to Python Programming
Essential Python
Introduction to Python Scripting: for the Security Analyst