Real-time User Attribute Estimation with Spark Streaming

Presentation slides for Spark Meetup December 2015 (http://connpass.com/event/23159/)

  1. Spark Streaming / @laclefyoshi <ysaeki@r.recruit.co.jp>
  2. • Spark Streaming • Spark Streaming Tips
  3. SAEKI Yoshiyasu • IT • Web 4 9 • R&D: Hadoop, Kafka, Storm, Spark, Druid • RICOH Theta + Google Cardboard
  4. Spark Streaming: http://spark.apache.org/docs/1.5.2/streaming-programming-guide.html
  6. http://www.recruit.jp/company/about/structure.html
  7. • ≒ … • ! OS etc.
  8. 1. Web (JavaScript) 2. fluentd → Kafka
  9. fluentd → Kafka • fluent-plugin-kafka (https://github.com/htgc/fluent-plugin-kafka) • output type = kafka_buffered (file buffer) • Kafka 0.8.2.2 • 0.9.0 • ACL
  11. Suro • by Netflix • https://github.com/Netflix/suro • Input: Kafka Consumer API, Thrift API • Output: HDFS, AWS S3, Kafka Producer, Elasticsearch • LinkedIn Gobblin
  12. Hadoop • HDFS • MLlib • Streaming linear regression (Regression) • Streaming k-means (Clustering)
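MLlib's streaming k-means updates each cluster center once per batch and weights old against new data with a decay factor α. The MLlib guide gives the center update as c_{t+1} = (c_t n_t α + x_t m_t) / (n_t α + m_t), where x_t is the mean of the m_t new points assigned to that cluster. The pure-Scala sketch below (no Spark dependency, hypothetical names, not MLlib's `StreamingKMeans` class) applies that rule to one batch and, as a stated assumption, carries the weight forward as n_t α + m_t.

```scala
object StreamingKMeansSketch {
  type Point = Vector[Double]

  // Squared Euclidean distance between two points of equal dimension.
  def sqDist(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  // Index of the center closest to p.
  def nearest(centers: Vector[Point], p: Point): Int =
    centers.indices.minBy(i => sqDist(centers(i), p))

  /** One mini-batch update with decay factor alpha:
    * c' = (c * n * alpha + mean * m) / (n * alpha + m), n' = n * alpha + m,
    * where m is the number of new points assigned to the center.
    */
  def update(centers: Vector[Point], weights: Vector[Double],
             batch: Seq[Point], alpha: Double): (Vector[Point], Vector[Double]) = {
    val assigned = batch.groupBy(p => nearest(centers, p))
    val updated = centers.indices.map { i =>
      val decayed = weights(i) * alpha // n * alpha: older data counts less
      assigned.get(i) match {
        case None => (centers(i), decayed) // no new points: center stays, weight decays
        case Some(pts) =>
          val m = pts.size.toDouble
          val mean = pts.transpose.map(_.sum / m).toVector
          val newWeight = decayed + m
          val center = centers(i).zip(mean).map { case (c, x) => (c * decayed + x * m) / newWeight }
          (center, newWeight)
      }
    }
    (updated.map(_._1).toVector, updated.map(_._2).toVector)
  }
}
```

With α = 1 every past point keeps full weight; with α = 0 each center simply jumps to the mean of its newest points.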
  13. Spark Streaming
  14. Kafka • Direct Approach (>= Spark 1.3) • Exactly-once • Kafka Simple Consumer API (Direct Approach)
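With the Direct Approach, Spark itself decides which Kafka offset range each micro-batch reads (via the Simple Consumer API) instead of relying on receivers. According to the Spark Kafka integration guide, exactly-once output additionally needs a write that is idempotent or that stores results and offsets atomically. The pure-Scala sketch below models only the offset bookkeeping. `OffsetSpan`, `planBatch`, `commit` and the per-partition cap are hypothetical names; the cap is only loosely analogous to `spark.streaming.kafka.maxRatePerPartition`, which is a per-second rate, not a per-batch count.

```scala
object DirectApproachSketch {
  // Stand-in with the same fields as Spark's OffsetRange; not the Spark class itself.
  final case class OffsetSpan(topic: String, partition: Int, fromOffset: Long, untilOffset: Long) {
    def count: Long = untilOffset - fromOffset
  }

  /** Plan one micro-batch: for each partition, read from the last committed offset
    * up to the latest available offset, capped at maxPerPartition messages.
    */
  def planBatch(topic: String, committed: Map[Int, Long], latest: Map[Int, Long],
                maxPerPartition: Long): Seq[OffsetSpan] =
    committed.toSeq.sortBy(_._1).map { case (partition, from) =>
      val until = math.min(latest(partition), from + maxPerPartition)
      OffsetSpan(topic, partition, from, until)
    }

  /** Advance committed offsets only after the batch's output succeeded; until then
    * a retried batch starts again from the same offsets.
    */
  def commit(committed: Map[Int, Long], spans: Seq[OffsetSpan]): Map[Int, Long] =
    committed ++ spans.map(s => s.partition -> s.untilOffset)
}
```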
  15. Spark Streaming (1): a DStream as a series of RDDs (RDD @ time1, RDD @ time2, RDD @ time3, RDD @ time4); http://spark.apache.org/docs/1.5.2/streaming-programming-guide.html
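The figure on slide 15 shows a DStream as a series of RDDs, one per batch interval. A hedged pure-Scala sketch of that discretization follows, assuming timestamped events with a hypothetical `Event` shape. Each returned group stands in for one "RDD @ time t", and intervals without data are kept as empty batches, since Spark Streaming also generates an (empty) RDD for such intervals.

```scala
object DiscretizeSketch {
  // Hypothetical event shape: epoch-millis timestamp plus a cookie ID.
  final case class Event(timeMs: Long, cookie: String)

  /** Split events into consecutive batches of batchMs, keyed by each batch's start
    * time. Assumes non-negative timestamps; gaps become empty batches.
    */
  def discretize(events: Seq[Event], batchMs: Long): Seq[(Long, Seq[Event])] =
    if (events.isEmpty) Seq.empty
    else {
      val byBatch = events.groupBy(e => e.timeMs / batchMs * batchMs)
      val first = byBatch.keys.min
      val last = byBatch.keys.max
      (first to last by batchMs).map(t => t -> byBatch.getOrElse(t, Seq.empty[Event]))
    }
}
```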
  16. Spark Streaming (2): http://spark.apache.org/docs/1.5.2/streaming-programming-guide.html
  17. Micro-batch • 1 Micro-batch (Cookie)
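Slide 17 processes each micro-batch per Cookie. Assuming, purely for illustration, that each event carries a cookie ID and a page category, a per-batch aggregation could collapse the batch into one entry per cookie with a `Map[String, String]` of attributes. That is the `(String, Map[String, String])` shape `hbaseConvert` takes on slide 19. The event fields and the per-category counts are hypothetical, not the author's features.

```scala
object PerCookieSketch {
  // Hypothetical event shape: a cookie ID and a page category.
  final case class PageView(cookie: String, category: String)

  /** Aggregate one micro-batch into cookie -> (column -> value), with the
    * page-view count per category as the (illustrative) attribute value.
    */
  def aggregate(batch: Seq[PageView]): Map[String, Map[String, String]] =
    batch.groupBy(_.cookie).map { case (cookie, views) =>
      cookie -> views.groupBy(_.category).map { case (cat, vs) => cat -> vs.size.toString }
    }
}
```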
  18. Window-based micro-batch • 1 / 1 Micro-batch / 1 Micro-batch
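Window-based micro-batches combine the results of several consecutive batches. In Spark Streaming, both the window length and the slide interval must be multiples of the batch interval. The pure-Scala sketch below models that on per-batch count maps. `merge` and `windowed` are hypothetical helpers that mimic what a `reduceByKeyAndWindow(_ + _, ...)` computes, not Spark's implementation.

```scala
object WindowSketch {
  /** Merge two count maps by summing values per key (like reduceByKey(_ + _)). */
  def merge(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (k, v)) => acc.updated(k, acc.getOrElse(k, 0) + v) }

  /** After every slideBatches-th batch, sum the counts of the last windowBatches
    * batches (window and slide expressed in batch intervals).
    */
  def windowed(batches: IndexedSeq[Map[String, Int]], windowBatches: Int,
               slideBatches: Int): Seq[Map[String, Int]] =
    (0 until batches.size)
      .filter(i => (i + 1) % slideBatches == 0)
      .map { i =>
        val from = math.max(0, i + 1 - windowBatches)
        batches.slice(from, i + 1).foldLeft(Map.empty[String, Int])(merge)
      }
}
```

Consecutive windows overlap when the window is longer than the slide, so the same batch can contribute to more than one result.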
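  19. Micro-batch • RDD → HBase • 0.5 1
      import org.apache.hadoop.conf.Configuration
      import org.apache.hadoop.hbase.client.Put
      import org.apache.hadoop.hbase.mapreduce.TableOutputFormat
      import org.apache.hadoop.hbase.util.Bytes
      import org.apache.hadoop.io.Text
      import org.apache.spark.rdd.PairRDDFunctions

      dstream.foreachRDD { rdd =>
        // createHbaseConfiguration() is the author's helper (not shown); TableOutputFormat
        // needs it to provide the target table (TableOutputFormat.OUTPUT_TABLE)
        val hbaseConf = createHbaseConfiguration()
        val jobConf = new Configuration(hbaseConf)
        jobConf.set("mapreduce.job.output.key.class", classOf[Text].getName)
        jobConf.set("mapreduce.job.output.value.class", classOf[Put].getName) // values are Puts
        jobConf.set("mapreduce.outputformat.class", classOf[TableOutputFormat[Text]].getName)
        new PairRDDFunctions(rdd.map(hbaseConvert)).saveAsNewAPIHadoopDataset(jobConf)
      }

      // RDD[(String, Map[K, V])] → RDD[(String, Put)]: one Put per row key,
      // one column per map entry in the "seg" column family
      def hbaseConvert(t: (String, Map[String, String])): (String, Put) = {
        val p = new Put(Bytes.toBytes(t._1))
        t._2.toSeq.foreach(m => p.addColumn(Bytes.toBytes("seg"), Bytes.toBytes(m._1), Bytes.toBytes(m._2)))
        (t._1, p)
      }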
  21. Spark Streaming: • DStream, RDD • Spark, Spark Streaming (http://spark.apache.org/docs/1.5.2/streaming-programming-guide.html)
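Because every DStream batch is an RDD, logic written for batch Spark can be reused per micro-batch; in Spark this is typically done with `dstream.transform(rdd => logic(rdd))`. A minimal pure-Scala illustration of the same idea follows, with plain collections standing in for RDDs. `countByValueSketch` and the two job functions are hypothetical.

```scala
object ReuseSketch {
  // Hypothetical business logic, written once against a plain collection.
  def countByValueSketch(records: Seq[String]): Map[String, Int] =
    records.groupBy(identity).map { case (k, vs) => k -> vs.size }

  // Batch use: the whole data set at once (think: one RDD).
  def batchJob(all: Seq[String]): Map[String, Int] = countByValueSketch(all)

  // Streaming use: the same function applied to every micro-batch
  // (think: dstream.transform, which hands each batch's RDD to a function).
  def streamingJob(batches: Seq[Seq[String]]): Seq[Map[String, Int]] =
    batches.map(countByValueSketch)
}
```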
  22. Spark Streaming: • Fault Tolerance • Micro-batch • YARN • YARN Dynamic Resource Allocation
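  23. Spark Streaming: • RDD → RDD, DStream → DStream • 1 Micro-batch
      import org.apache.spark.rdd.RDD
      import org.apache.spark.streaming.dstream.DStream

      // sparkContext and sparkStreamingContext are assumed to be in scope
      // RDD → RDD: build a test input RDD
      val input: RDD[String] = sparkContext.makeRDD(Seq("a", "b", "c"))
      // DStream → DStream: feed queued RDDs as a stream (one RDD per batch by default)
      val queue = scala.collection.mutable.Queue(input)
      val dstream: DStream[String] = sparkStreamingContext.queueStream(queue)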
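`queueStream` takes, by default (`oneAtATime = true`), one RDD from the queue per batch, which makes it convenient for feeding known test batches. A Spark-free sketch of that harness follows, assuming plain `Seq`s in place of RDDs. `QueueStreamSketch.run` is a hypothetical helper, and the test mirrors the upper-casing example on slide 24.

```scala
import scala.collection.mutable

object QueueStreamSketch {
  /** Drain the queue one batch per "tick", as queueStream does with its default
    * oneAtATime = true, applying op to each batch and collecting every output.
    */
  def run[A, B](queue: mutable.Queue[Seq[A]], op: Seq[A] => Seq[B]): List[Seq[B]] = {
    val outputs = mutable.ListBuffer.empty[Seq[B]]
    while (queue.nonEmpty) outputs += op(queue.dequeue())
    outputs.toList
  }
}
```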
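  24. Spark Streaming: • spark-testing-base • https://github.com/holdenk/spark-testing-base
      import com.holdenkarau.spark.testing.StreamingSuiteBase
      import org.apache.spark.streaming.dstream.DStream

      class JsonElementCountTest extends StreamingSuiteBase {
        // Operation under test (not shown on the slide; assumed here to upper-case
        // each element, consistent with the expected output below)
        def converterMethod(ds: DStream[String]): DStream[String] = ds.map(_.toUpperCase)

        test("simple") {
          val input = List(List("aa"), List("bb")) // one inner List per batch
          val expected = List(List("AA"), List("BB"))
          testOperation[String, String](input, converterMethod _, expected, useSet = true)
        }
      }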
  25. Spark Streaming: • Window-based micro-batch • o.a.spark.streaming.util.ManualClock • private class (Scala) • http://mkuthan.github.io/blog/2015/03/01/spark-unit-testing/
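Window-based logic depends on time, which is why slide 25 points to a manually advanced clock. The sketch below shows the idea without Spark. `TestClock` is a hypothetical stand-in, not Spark's private ManualClock, and the test advances it explicitly so that window membership is deterministic.

```scala
object ManualClockSketch {
  /** Hypothetical manually advanced clock: time only moves when the test says so. */
  final class TestClock(private var now: Long = 0L) {
    def getTimeMillis: Long = now
    def advance(ms: Long): Unit = now += ms
  }

  /** Timestamps that fall in the window (clock - windowMs, clock]. */
  def inWindow(eventTimes: Seq[Long], clock: TestClock, windowMs: Long): Seq[Long] = {
    val end = clock.getTimeMillis
    eventTimes.filter(t => t > end - windowMs && t <= end)
  }
}
```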
  26. Spark Streaming: • Scala, Java • Spark Streaming, Kafka, HBase: Scala • Java
      // excerpt from api/java/JavaRDD.scala
      import scala.reflect.ClassTag
      import org.apache.spark.rdd.RDD

      object JavaRDD {
        implicit def fromRDD[T: ClassTag](rdd: RDD[T]): JavaRDD[T] = new JavaRDD[T](rdd)
        implicit def toRDD[T](rdd: JavaRDD[T]): RDD[T] = rdd.rdd
      }
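The `JavaRDD` companion excerpt uses a pair of implicit conversions so that Scala code can pass a `JavaRDD` where an `RDD` is expected and the other way round. The same pattern in miniature follows, with hypothetical `ScalaBox`/`JavaBox` types standing in for `RDD`/`JavaRDD`; putting the conversions in the wrapper's companion object makes them visible without an import at the call site.

```scala
import scala.language.implicitConversions

object InteropSketch {
  // Hypothetical Scala-side type, standing in for RDD.
  final class ScalaBox[T](val items: Seq[T])

  // Hypothetical Java-facing wrapper, standing in for JavaRDD.
  final class JavaBox[T](val underlying: ScalaBox[T]) {
    def size(): Int = underlying.items.size
  }

  object JavaBox {
    // Same shape as JavaRDD.fromRDD / JavaRDD.toRDD: wrap and unwrap implicitly.
    implicit def fromScala[T](box: ScalaBox[T]): JavaBox[T] = new JavaBox(box)
    implicit def toScala[T](box: JavaBox[T]): ScalaBox[T] = box.underlying
  }
}
```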
  27. • = • Spark Streaming • MLlib • GraphX
