Scalding @ Coursera
- 3. Technical (Online Stack)
• 100% hosted on AWS
• Service-oriented architecture
• Mix of MySQL and Cassandra for persistence
• Scala
- 9. {
  "typeName": "multipart",
  "definition": {
    "assignmentParts": {
      "id1": {
        "typeName": "plainText",
        "order": 0,
        "definition": {
          "prompt": "Write a sentence describing what you think about cereal."
        }
      },
      "id2": {
        "typeName": "richText",
        "order": 1,
        "definition": {
          "prompt": "Write a long essay with lots of fancy formatting describing what you think about cereal."
        }
      },
      "id3": {
        "typeName": "url",
        "order": 2,
        "definition": {
          "prompt": "Post a link to your favorite cereal."
        }
      },
      "id4": {
        "typeName": "plainText",
        "order": 3,
        "definition": {
          …
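One way to picture how an online data model like the JSON above could be reused from Scala is a small set of case classes. This is only a sketch under assumptions: the names `PartDefinition`, `AssignmentPart`, `Multipart`, and `ordered` are hypothetical and are not taken from Coursera's code. The truncated part `id4` is left out because its prompt is not shown.

```scala
object AssignmentModel {
  // Hypothetical model of the assignment JSON; all type names are illustrative.
  sealed trait PartDefinition { def prompt: String }
  final case class PlainText(prompt: String) extends PartDefinition
  final case class RichText(prompt: String) extends PartDefinition
  final case class Url(prompt: String) extends PartDefinition

  final case class AssignmentPart(order: Int, definition: PartDefinition)

  final case class Multipart(assignmentParts: Map[String, AssignmentPart]) {
    // Parts as (id, part) pairs, sorted by their "order" field.
    def ordered: Seq[(String, AssignmentPart)] =
      assignmentParts.toSeq.sortBy { case (_, part) => part.order }
  }

  // The first three parts from the slide, deliberately listed out of order.
  val assignment: Multipart = Multipart(Map(
    "id3" -> AssignmentPart(2, Url("Post a link to your favorite cereal.")),
    "id1" -> AssignmentPart(0, PlainText("Write a sentence describing what you think about cereal.")),
    "id2" -> AssignmentPart(1, RichText("Write a long essay with lots of fancy formatting describing what you think about cereal."))
  ))
}
```

Calling `AssignmentModel.assignment.ordered` returns the parts sorted by `order`, whatever order the map was built in.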
- 11. Hive
• SQL-like language
• Great for simple rollups and aggregations
• Procedural transforms difficult to express
- 13. Scalding – Pros
• Succinct
• Expressive
• All code in one language
• Re-use online data models
- 15. Scalding – Cons
• Have to learn Scala
• Heavyweight for simple, experimental tasks
• Many layers of abstraction above MapReduce
- 17. Scalding – Example
val events = TypedTsv … /* load data */
  .toTypedPipe
val courses = TypedTsv …
  .toTypedPipe
val topics = TypedTsv …
  .toTypedPipe
- 18. Scalding – Example
events.groupBy(_.courseId)
  .leftJoin(courses.groupBy(_.courseId))
  .groupBy(_._2.topicId)
  .leftJoin(topics.groupBy(_.topicId))
  /* more analysis */
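To make the shape of the join chain concrete without a Hadoop cluster, here is a rough in-memory sketch in plain Scala. It does not use Scalding. The `leftJoin` helper is a hypothetical stand-in that mimics the typed API's left-join result, where each left row is paired with an `Option` of the right row. The record types and the fields `userId` and `name` are assumptions; only `courseId` and `topicId` appear in the slides.

```scala
object LeftJoinSketch {
  // Hypothetical record types; only courseId and topicId come from the slides.
  final case class Event(userId: String, courseId: String)
  final case class Course(courseId: String, topicId: String)
  final case class Topic(topicId: String, name: String)

  // In-memory stand-in for a keyed left join: every left row is kept, and
  // the right side is None when no right row shares the key.
  def leftJoin[K, A, B](left: Seq[A], right: Seq[B])(
      leftKey: A => K, rightKey: B => K): Seq[(K, (A, Option[B]))] = {
    val rightByKey: Map[K, Seq[B]] = right.groupBy(rightKey)
    left.flatMap { a =>
      val k = leftKey(a)
      val matched: Seq[Option[B]] = rightByKey.getOrElse(k, Seq.empty).map(b => Some(b))
      val padded: Seq[Option[B]] = if (matched.isEmpty) Seq(None) else matched
      padded.map(m => (k, (a, m)))
    }
  }

  val events = Seq(Event("u1", "c1"), Event("u2", "c2"), Event("u3", "c9"))
  val courses = Seq(Course("c1", "t1"), Course("c2", "t1"))
  val topics = Seq(Topic("t1", "Data Science"))

  // Step 1: attach the course (if any) to each event.
  val withCourse = leftJoin(events, courses)(_.courseId, _.courseId)

  // Step 2: re-key by the course's topicId (None when the course is unknown)
  // and attach the topic.
  val withTopic = leftJoin(withCourse.map(_._2), topics)(
    { case (_, course) => course.map(_.topicId) },
    t => Option(t.topicId))

  // Example of "more analysis": count events per topic name.
  val eventsPerTopic: Map[String, Int] =
    withTopic
      .groupBy { case (_, (_, topic)) => topic.map(_.name).getOrElse("unknown") }
      .map { case (name, rows) => name -> rows.size }
}
```

Because the right side of each join is an `Option`, an event whose course is missing (here `c9`) survives both joins and is counted under `"unknown"`.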
- 20. Scalding – Example
events.groupBy(_.courseId)
  .leftJoin(courses.groupBy(_.courseId))
  .groupBy(_._2.topicId)
  .sketch(reducers = 100)
  .leftJoin(topics.groupBy(_.topicId))
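The point of adding `sketch` before the second join is presumably key skew: if a few topics account for most events, plain hash partitioning sends most rows to a few reducers. The plain-Scala snippet below illustrates that effect. It is not Scalding's implementation. As far as I understand, Scalding estimates key frequencies with a Count-Min Sketch; this sketch uses exact counts, and the data, the `partition` function, and the heavy-key threshold are all assumptions made for illustration.

```scala
object SkewSketch {
  val reducers = 4

  // Hypothetical skewed input: one topic dominates the event stream.
  val topicIds: Seq[String] = Seq.fill(97)("popular") ++ Seq("a", "b", "c")

  // Plain hash partitioning: every row for a key lands on the same reducer.
  def partition(key: String): Int = Math.floorMod(key.hashCode, reducers)

  val plainLoad: Map[Int, Int] =
    topicIds.groupBy(k => partition(k)).map { case (r, ks) => r -> ks.size }

  // Skew-aware partitioning: count keys (exactly here; an approximate sketch
  // would be used at scale) and spread heavy keys round-robin over reducers.
  val counts: Map[String, Int] =
    topicIds.groupBy(identity).map { case (k, v) => k -> v.size }
  val heavy: Set[String] = counts.filter(_._2 > topicIds.size / reducers).keySet

  val spreadLoad: Map[Int, Int] =
    topicIds.zipWithIndex
      .groupBy { case (k, i) => if (heavy(k)) i % reducers else partition(k) }
      .map { case (r, rows) => r -> rows.size }
}
```

With plain partitioning, one reducer receives at least 97 of the 100 rows. With the heavy key spread out, no reducer receives more than 28. In a real join the matching rows from the other side would also have to be replicated to every reducer that receives part of a heavy key, which this snippet does not model.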