The Evolution of Apache Kylin by Luke Han

This deck has been presented by Luke Han at Apache Big Data 2016 NA on May 9, 2016.

Published in: Software

  1. The Evolution of Apache Kylin. Luke Han | 韩卿, 2016-05-09, Vancouver, Canada
  2. About me: Luke Han | 韩卿
     • Co-creator & VP of Apache Kylin
     • ASF Member
     • Co-founder & CEO at Kyligence Inc
     • Twitter: @lukehq
  3. Apache Kylin
  4. Why Happiness Latency 10s
  5. What we have tried? Kylin
  6. About Apache Kylin: Extreme OLAP Engine for Big Data. Apache Kylin is an open source Distributed Analytics Engine designed to provide a SQL interface and multi-dimensional analysis (OLAP) on Hadoop, supporting extremely large datasets and sub-second response times. kylin / ˈkiːˈlɪn / 麒麟 n. (in Chinese art) a mythical animal of composite form
  7. About Apache Kylin: OLAP / data mart (数据集市)
     • Born for Big Data Analytics
     • Sub-second Latency
     • ANSI SQL
     • Seamless Integration with BI Tools
     • Pluggable Architecture
  8. OLAP Cube: Balance Between Space and Time
     Cuboid lattice over the dimensions time, item, location, supplier:
       0-D (apex) cuboid: ()
       1-D cuboids: (time); (item); (location); (supplier)
       2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
       3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
       4-D (base) cuboid: (time, item, location, supplier)
     • Base vs. aggregate cells; ancestor vs. descendant cells; parent vs. child cells
       1. (9/15, milk, Urbana, Dairy_land) - <time, item, location, supplier>
       2. (9/15, milk, Urbana, *) - <time, item, location>
       3. (*, milk, Urbana, *) - <item, location>
       4. (*, milk, Chicago, *) - <item, location>
       5. (*, milk, *, *) - <item>
     • Cuboid = one combination of dimensions
     • Cube = all combinations of dimensions (all cuboids)
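The cuboid lattice on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Kylin's cubing code: the function names, the `sales` measure, and the sample rows are assumptions made up for the example. It enumerates every combination of the four dimensions and aggregates a measure for one chosen cuboid.

```python
from itertools import combinations

def enumerate_cuboids(dimensions):
    """Group every cuboid (subset of dimensions) by its number of dimensions."""
    return {
        k: list(combinations(dimensions, k))
        for k in range(len(dimensions) + 1)
    }

def aggregate(rows, cuboid, measure):
    """Sum `measure` over rows, grouped by the dimensions in `cuboid`."""
    totals = {}
    for row in rows:
        key = tuple(row[d] for d in cuboid)
        totals[key] = totals.get(key, 0) + row[measure]
    return totals

dims = ["time", "item", "location", "supplier"]
lattice = enumerate_cuboids(dims)

# Hypothetical fact rows; "sales" is an assumed measure, not from the slide.
rows = [
    {"time": "9/15", "item": "milk", "location": "Urbana",
     "supplier": "Dairy_land", "sales": 3},
    {"time": "9/15", "item": "milk", "location": "Chicago",
     "supplier": "Dairy_land", "sales": 2},
    {"time": "9/16", "item": "milk", "location": "Urbana",
     "supplier": "Farm_fresh", "sales": 4},
]

item_location = aggregate(rows, ("item", "location"), "sales")
apex = aggregate(rows, (), "sales")
```

With n dimensions the full cube holds 2^n cuboids (16 here), which is the space side of the space/time trade-off named in the slide title: each precomputed cuboid costs storage but lets a matching query skip the scan of the base data.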
  9. Architecture (diagram labels): MapReduce/Spark; Kylin; ANSI SQL; BI Tools, Web App…
  10. Apache Kylin Journey
     • Sept 2013: Project kickoff
     • Oct 2014: Go live at eBay & open source on GitHub
     • Nov 2014: Apache Incubator
     • June 2015: First Apache release v0.71
     • Sept 2015: InfoWorld Bossie Award, Best Open Source Big Data Tool; Apache release v1.0
     • Nov 2015: Apache Top Level Project
     • Mar 2016: Kyligence founded
  11. Apache Kylin Global Adoptions
  12. Use Case:
  13. Use Case: Baidu Map
  14. Use Case: NetEase
  15. Performance and Throughput (by NetEase)
  16. The Evolution
  17. Apache Kylin New Features
     • Pluggable architecture
     • New MR Cube Engine with fast cubing (1.5x faster)
     • New HBase Storage with parallel scan (2x faster)
     • Near real-time analysis
     • User-defined aggregations
     • Excel / Power BI / Zeppelin integration
  18. The Freedom, Extensibility, Flexibility
     • Freedom
       • Zoo break, not bound to Hadoop any more
       • Free to go to a better engine or storage
     • Extensibility
       • Accept any input, e.g. Kafka
       • Embrace next-gen distributed platforms, e.g. Spark
     • Flexibility
       • Choose a different engine for each data set
  19. New generation design (diagram labels)
     • Clients: 3rd Party App (Web App, Mobile…); SQL-Based Tool (BI Tools: Tableau…)
     • Interfaces: REST API; JDBC/ODBC; SQL
     • Kylin: REST Server; Query Engine; Routing; Metadata; Cube Builder (MapReduce…)
     • Abstractions: Data Source Abstraction; Engine Abstraction; Storage Abstraction
     • Data: Hadoop, Hive (Star Schema Data); OLAP Cubes in HBase (Key Value Data); Data Cube
     • Flows: Online Analysis Data Flow (Low Latency - Seconds); Offline Data Flow
     • Clients/Users interact with Kylin via SQL
     • OLAP Cube is transparent to users
  20. Pluggable architecture: Hive Source → IN → MR Engine → OUT → HBase Storage; Cube Metadata; SourceFactory, EngineFactory, StorageFactory
  21. Pluggable architecture: Hive Source → Hive Adapter (adapt to IN, load data) → MR Engine → HBase Adapter (adapt to OUT, save cube) → HBase Storage
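The source/engine/storage split on these two slides can be illustrated with a small Python sketch. It is not Kylin's Java API: every class and function name below is hypothetical, and the in-memory source and storage are stand-ins for the Hive and HBase adapters named on the slides. The point is only that the engine depends on the "load data" and "save cube" interfaces, so either end can be swapped.

```python
from abc import ABC, abstractmethod

class Source(ABC):
    """IN side: anything the engine can load rows from."""
    @abstractmethod
    def load_data(self):
        ...

class Storage(ABC):
    """OUT side: anything the engine can save a built cube to."""
    @abstractmethod
    def save_cube(self, cube):
        ...

class InMemorySource(Source):
    """Hypothetical stand-in for a Hive adapter."""
    def __init__(self, rows):
        self.rows = rows

    def load_data(self):
        return list(self.rows)

class InMemoryStorage(Storage):
    """Hypothetical stand-in for an HBase adapter."""
    def __init__(self):
        self.cubes = []

    def save_cube(self, cube):
        self.cubes.append(cube)

class CountingEngine:
    """Toy engine: the 'cube' is just a row count per item."""
    def build(self, source, storage):
        cube = {}
        for item, _amount in source.load_data():
            cube[item] = cube.get(item, 0) + 1
        storage.save_cube(cube)
        return cube

# A tiny registry playing the role of the factories on the slide.
SOURCES = {"memory": InMemorySource}
STORAGES = {"memory": InMemoryStorage}

def create_source(name, *args):
    return SOURCES[name](*args)

def create_storage(name, *args):
    return STORAGES[name](*args)

source = create_source("memory", [("milk", 3), ("milk", 2), ("bread", 1)])
storage = create_storage("memory")
built = CountingEngine().build(source, storage)
```

Registering another entry in `SOURCES` or `STORAGES` would change where data comes from or goes to without touching the engine, which is the property the adapters in the diagram are meant to provide.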
  22. Parallel Scan
     • Slow queries are 5-10x faster.
     • New HBase storage enables partitioning of cuboids that are big enough.
     • Overall query time is 2x faster than before, summed over 10,000+ queries.
     • Diagram labels: before, a query reads Cuboid A, Cuboid B, Cuboid C on Server 1, Server 2, Server 3; after, it reads partitions A1 B1, A2 B2, A3 C on Server 1, Server 2, Server 3
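A rough Python sketch of the parallel scan idea follows. It assumes a cuboid is a list of (key, value) rows, splits it round-robin into shards that stand in for servers, scans the shards concurrently, and sums the partial results. The function names, the sample cuboid, and the round-robin partitioning are illustrative assumptions, not how Kylin's HBase storage actually shards data.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_shards):
    """Spread rows round-robin over n_shards lists (one per 'server')."""
    shards = [[] for _ in range(n_shards)]
    for i, row in enumerate(rows):
        shards[i % n_shards].append(row)
    return shards

def scan_shard(shard, predicate):
    """Partial aggregate: sum the values whose key matches the predicate."""
    return sum(value for key, value in shard if predicate(key))

def parallel_scan(shards, predicate):
    """Scan all shards concurrently and combine the partial sums."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: scan_shard(s, predicate), shards))
    return sum(partials)

# Hypothetical <item, location> cuboid with a summed measure.
cuboid_a = [
    (("milk", "Urbana"), 7),
    (("milk", "Chicago"), 2),
    (("bread", "Urbana"), 5),
    (("bread", "Chicago"), 1),
]

shards = partition(cuboid_a, 3)
is_milk = lambda key: key[0] == "milk"
total_milk = parallel_scan(shards, is_milk)
serial_milk = scan_shard(cuboid_a, is_milk)
```

Because the measure here is a sum, partial results from each shard combine by simple addition, so the sharded scan returns the same answer as scanning the whole cuboid on one server.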
  23. Near Real-time Incremental Build
     • Minute-level micro cubes
     • Kafka source
     • In-mem cubing
     • Auto merge
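The micro cube and auto merge steps can be pictured with a short Python sketch. It assumes each micro batch is a list of (item, amount) events, builds a tiny in-memory aggregate per batch, and merges the aggregates by adding measures. The batches and function names are hypothetical, and a plain list stands in for the Kafka source.

```python
from collections import Counter

def build_micro_cube(events):
    """Aggregate one micro batch in memory: total amount per item."""
    cube = Counter()
    for item, amount in events:
        cube[item] += amount
    return cube

def merge_cubes(cubes):
    """Merge micro cubes into one larger cube by adding their measures."""
    merged = Counter()
    for cube in cubes:
        merged.update(cube)  # Counter.update adds counts, it does not replace
    return merged

# Hypothetical batches, e.g. a few minutes of events each.
batches = [
    [("milk", 2), ("bread", 1)],
    [("milk", 3)],
    [("bread", 1)],
]

micro_cubes = [build_micro_cube(batch) for batch in batches]
merged_cube = merge_cubes(micro_cubes)
```

Merging works here because sums are additive; measures that are not simply additive would need a mergeable intermediate state instead of a final number.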
  24. User Defined Aggregation Types
     • HyperLogLog Count Distinct
     • TopN
     • BitMap Precise Count Distinct
       • from Sun, Yerui (
     • Raw Records
       • from Wang, Xiaoyu (
  25. Support more BI & Visualization Tools
     • Supports Tableau 9.1
     • Supports MS Excel
     • Supports MS Power BI
     • Supports Zeppelin
  26. Roadmap
  27. Apache Kylin Roadmap
  28. 2016 Focus…
     • Streaming and Real Time
     • Performance, performance and performance
     • Support more BI & visualization tools
     • SQL & OLAP Functions
  29. Q & A
     • More…
     • Website:
     • Twitter: @ApacheKylin
     • Contact Me: @lukehq