Presentation to Analyze Boulder on Sept 3, 2014, by the Data Detectives of Boulder (https://www.linkedin.com/groups?home=&gid=6525462). We share our experiences over the past 3 years with MOOCs, Kaggle, etc.
Staying Competitive in Data Analytics: Analyze Boulder 20140903
1. How to Stay Competitive in Data Analytics
Data Detectives of Boulder
Richard Hackathorn, Jennifer Brendle, Scott Oetting & Jay Brophy
with contributions from Mike McUne, Karen Blakemore, Ali Ongun,
Larry Rupp, Jon Bates, Sara Bates, Brian Feifarek
https://www.linkedin.com/groups?home=&gid=6525462
2. Never Stop Learning…
• Massive open online courses (MOOCs)
- learning at scale
• Coursera, Udacity, edX…
• Certificate Programs
• Johns Hopkins Specialization
• Bootcamps (hands-on labs)
Exploding & Diversifying
• Big Data Bootcamp, Denver Oct 17-19, $1,500!
http://globalbigdataconference.com
• University Degree Programs
• Leeds M.S. in Business Analytics
From Richard Hackathorn
4. Johns Hopkins Specialization
Note: The first two
courses in the series are
prerequisites for the
others. The remaining
courses may be taken in
any order and can be
taken in parallel.
1. The Data Scientist’s Toolbox
2. R Programming
3. Getting and Cleaning Data
4. Exploratory Data Analysis
5. Reproducible Research
6. Statistical Inference
7. Regression Models
8. Practical Machine Learning
9. Developing Data Products
plus… The Capstone Project
From Jennifer Brendle
5. PROS
• Pre-defined group of related courses
• Somewhat self-paced
• Active discussion forums
• Volunteer Teaching Assistants
• Develops a portfolio of work
CONS
• No direct interaction with instructors
• Some strict due dates
• Focuses completely on R and heavily on biostatistics
• Unknown value as a credential compared to an MS or more traditional certification programs
• Low completion rates: during its first five months, more than 800K enrolled, but only 266 students have completed all nine courses
6. Learn Analytics, Then Do Analytics
• Do Competitions
• Kaggle (especially involvement in the forums)
• CrowdANALYTIX
• HackerRank
• Do Mess with Real Data
• KDnuggets dataset directory http://www.kdnuggets.com/datasets/
• Data Mining Competitions http://www.kdnuggets.com/competitions/
• Social & IoT Data Streams http://www.programmableweb.com/
• Do Know Your Industry’s Data (from Mike McUne)
• Do Appreciate The Classics (like iris, titanic, bird strikes…)
From Richard Hackathorn
7. Depend upon Your Community
• Professional Communities
• IEEE Computer, ACM …
• LinkedIn groups, Data Science Central, KDnuggets…
• Reddit: machine learning, statistics, computer vision
• Technical Communities
• R, Python, and hundreds of others
• Tutorials, galleries, cheat sheets
• Local Communities
• Data Detectives, Analyze Boulder, and
dozens of Denver/Boulder meetups
Lots of learning tips in the Data Detectives blog
From Richard Hackathorn
8. What’s In Your Toolbox?
• Lots to choose from…
• 2014 KDnuggets Software Poll – 69 tools used in real projects*
• “You need to be familiar with different tools, but you also
need to be able to use them. A lot of tools have tutorials
available. You have to be able to use the tools to solve
real problems. That being said, there is not enough time
to become adept at everything. Choose carefully.”
• Balancing excellence with comprehensiveness
• But still do something useful
• Compatibility with your team and company
*http://www.kdnuggets.com/2014/06/kdnuggets-annual-software-poll-rapidminer-continues-lead.html
From Mike McUne
9. What Do Employers Value?
• Still being decided…
• “Unfortunately, I do not know if there is an answer to this
yet because (1) we are in the very early innings, (2) the
technology is rapidly evolving, and (3) the business usage of
analytics is exploding into all kinds of new areas.”
• Depth versus Breadth?
• Specialist or generalist?
• Small toolbox or large toolbox?
• Deep industry smarts or cross-industry savvy?
• Degree, certificate or track record?
• Document your abilities. Build your portfolio!
• “Pro-bono consulting groups, such as Code for Colorado”
• Regardless… “Stick to a focused plan”
From Scott Oetting
10. In Summary
• Lots of resources! Perhaps too many?
• Learn and then do
• Depend upon your community
• Choose your toolbox wisely
• Document your abilities for desired employers
• Need to get serious & stay focused
Join the Data Detectives at BJ’s Boulder on Weds at 11:30 for
group mentoring and discussion. See the blog in our LinkedIn
group for current information.
- https://www.linkedin.com/groups?home=&gid=6525462
Editor's Notes
Please fold these thoughts into this slide
Nine courses plus a capstone project
Each course runs 4 weeks and is offered every month
The only prerequisites are some programming experience in any language and a working knowledge of math up to algebra.
The estimated effort is 3-5 hours per week per course, but many students have reported this to be low, especially those with no experience in the R programming language.
In order to complete the Capstone Project and receive the certificate, all nine courses must be taken in the Signature Track, which costs $49 per course (allows 1 free retake).
A single payment of $490 for all 9 courses plus the capstone entitles the student to unlimited retakes for 2 years.
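The pricing and workload figures in these notes can be turned into a quick planning estimate. The sketch below is a hypothetical illustration, not an official calculator: it assumes the capstone also costs $49 when paying per course (the notes do not say), and that any retake beyond the free one costs another full course fee. Under those assumptions, paying per course comes to 10 × $49 = $490, the same as the bundle, so the bundle's real advantage is the unlimited retakes.

```python
# Rough planning numbers for the Johns Hopkins Data Science Specialization (2014),
# taken from the notes above. CAPSTONE_FEE and the paid-retake pricing are
# hypothetical assumptions, not figures stated in the notes.
COURSES = 9
WEEKS_PER_COURSE = 4
HOURS_PER_WEEK = (3, 5)          # stated estimate; students report it runs low
SIGNATURE_FEE_PER_COURSE = 49    # USD per course, includes one free retake
BUNDLE_PRICE = 490               # USD for 9 courses + capstone, unlimited retakes for 2 years
CAPSTONE_FEE = 49                # hypothetical: capstone price when paying per course


def total_hours(courses=COURSES):
    """Return (low, high) total study hours across all courses."""
    low, high = HOURS_PER_WEEK
    weeks = courses * WEEKS_PER_COURSE
    return weeks * low, weeks * high


def pay_as_you_go(paid_retakes=0):
    """Cost of paying per course.

    paid_retakes counts retakes beyond each course's free one; each is
    assumed (hypothetically) to cost another full course fee.
    """
    return (COURSES + paid_retakes) * SIGNATURE_FEE_PER_COURSE + CAPSTONE_FEE


if __name__ == "__main__":
    print("Estimated hours (low, high):", total_hours())
    print("Pay as you go, no paid retakes:", pay_as_you_go())
    print("Pay as you go, 2 paid retakes:", pay_as_you_go(paid_retakes=2))
    print("Bundle:", BUNDLE_PRICE)
```

Passing `paid_retakes=2`, for example, shows how quickly extra retakes push the per-course route above the $490 bundle. The hours are effort totals, not calendar time, since courses can be taken in parallel.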
The capstone project will include real-world applications: Hopkins recently announced a partnership with SwiftKey, a London-based company specializing in keyboard and language-prediction technology, to develop capstone projects.
More recommendations from Scott Oetting…
From class to class, there is a high degree of variability. Just because the Intro course did not seem too hard does not mean that all the classes will be the same. They won't be. In retrospect, taking more than two at a time is inadvisable, but if you insist on taking three or more, audit one for the first two weeks to make sure you can handle the workload. Since the classes are free, you can always drop out and start again next month.
The professors at Johns Hopkins put out a post on the recommended course sequence. Do not second-guess it; treat it like the law.
Read and contribute to the discussion boards for every class without exception. Even if you think you are getting the class material, you may have a blind spot that you will probably discover in one of the forums. Also, you may get an opportunity to help out a complete stranger.
Many of the class projects are open-ended, which means that the amount of time you spend on them can be huge (particularly for type-A perfectionists).
On the peer assessments, always go through the grading rubric before you start, and make sure you follow it no matter how trivial the criteria might seem. It is very frustrating to do a great job but neglect to label a graph or forget a descriptive title. Beyond that, there seems to be a certain randomness to the grading, since it is based on the sometimes subjective perspective of your fellow students; that is part of being in a MOOC.
Use LinkedIn alongside this program. Connect with your classmates (almost every class will have a discussion thread on connecting with each other) and join the LinkedIn groups 'Coursera Specialization - Data Science' and, of course, 'Data Detectives of Boulder'.
If you are new to R programming, you should take the Intro to R Programming class twice: simply audit it the first time before taking it for real the second. Shortchanging this class will make things significantly harder down the road.
Unfortunately, this sequence is taught by three professors and the lectures are not all of the same quality. Be prepared for this.
Make sure you get the book 'The Cartoon Guide to Statistics' by Larry Gonick & Woollcott Smith. Why? It covers about 95% of the material in the Statistical Inference class, explains it better than I found the lectures did, and it is, obviously, in cartoons.