This document provides information about a data science course taught using Apache Spark and Apache Hadoop. It introduces the instructors Sean Owen and Tom White and describes what data science is and the roles of data scientists. Data scientists have skills in engineering, statistics, and business domains. The document discusses why companies need data scientists due to the growth of data and its value. It presents the tools used in data science, including Apache Spark, and how Spark can be used for both investigative and operational analytics. The course teaches a complete data science problem process through hands-on examples using tools like Hadoop, Python, R, Hive, and Spark MLlib.
You’re the subject-matter expert (SME) who understands the business you operate in; we can never teach that knowledge. We are here to give you the tools to analyze it.
Code not just for their own use
Easy to get nonsense
Simpson’s paradox can be an artifact or real. Civil Rights Act of 1964 example
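A minimal sketch of how the paradox arises, assuming two groups voting in two regions. The counts are invented for illustration and are not historical vote tallies. The names `COUNTS`, `regional_rate`, and `overall_rate` are hypothetical.

```python
from fractions import Fraction

# Hypothetical (yes_votes, total_votes) per group and region.
# These counts are invented for illustration; they are not historical data.
COUNTS = {
    "group_a": {"north": (9, 10), "south": (30, 100)},
    "group_b": {"north": (80, 100), "south": (2, 10)},
}


def regional_rate(group, region):
    """Share of yes votes for one group within one region."""
    yes, total = COUNTS[group][region]
    return Fraction(yes, total)


def overall_rate(group):
    """Share of yes votes for one group with both regions pooled."""
    yes = sum(y for y, _ in COUNTS[group].values())
    total = sum(t for _, t in COUNTS[group].values())
    return Fraction(yes, total)


# Compare the two groups inside each region, then with regions pooled.
for region in ("north", "south"):
    a = regional_rate("group_a", region)
    b = regional_rate("group_b", region)
    print(f"{region}: group_a={float(a):.0%} group_b={float(b):.0%}")

print(f"overall: group_a={float(overall_rate('group_a')):.1%} "
      f"group_b={float(overall_rate('group_b')):.1%}")
```

With these counts, group_a has the higher yes rate inside each region but the lower rate once the regions are pooled, because most of group_a sits in the low-rate region. Whether the pooled or the per-region figure is the "real" one depends on the question being asked.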
Building infrastructure.
Don’t want data scientists, want INFRA, MODELS, INSIGHTS
More Qs asked == better results/insights
Moneyball example: buying wins, for which you need to buy runs. Not players.
Don’t want data scientists, want INFRA, MODELS, INSIGHTS. For that, you need to be able to ask and answer lots of questions, using the minimum resources. Fewer questions than a data analyst, better than software engineers. Qs == runs in the Moneyball example
Someone coming from the business org. Software engineers with experience in these areas. A more difficult case for this course to accommodate is a student with neither a computing nor an analytics background, although a dedicated student should be able to complete the course from that starting point as well.