Prithvi Prabhu + Shivam Bansal, H2O.ai - Building Blocks for AI Applications - #H2OWorld 2019 NYC

This session was recorded in NYC on October 22nd, 2019 and can be viewed here: https://www.youtube.com/watch?v=xAhQAYV5_PY&list=PLNtMya54qvOE3AvWRCNF2tybxNobUbAYp&index=3&t=2s

Bio: Prithvi is Chief of Technology, Applications at H2O.ai. Prithvi leads the design and development of “Q”, H2O.ai’s high scale exploratory data analysis and analytical application development platform.

Prithvi has been with H2O.ai since its early days and has been responsible for several products including Driverless AI (our flagship automatic machine learning platform), Steam (distributed cluster management, model management and deployment for H2O), H2O.js (Javascript transpiler for H2O’s distributed runtime), Play (on-demand cloud provisioning system for H2O), Flow (a hybrid GUI/REPL/Notebook for H2O) and Lightning (statistical graphics for H2O).

Bio: Shivam Bansal is a Data Scientist at H2O.ai and a Kaggle Grandmaster in the Kernels section. He is a three-time winner of Kaggle’s Data Science for Good competition and the winner of multiple other offline AI and data science competitions.

Shivam has extensive cross-industry, hands-on experience in building data science products. He has helped clients in the insurance, healthcare, banking, and retail domains solve unstructured data science problems by building end-to-end pipelines and solutions.

Published in: Technology

  1. H2O Q: Building Blocks for AI Applications
     Prithvi Prabhu, Chief of Technology, Applications
     Shivam Bansal, Data Scientist, Kaggle Grandmaster
  2. Team “Q”: 10 Contributors, 6 Countries
     - Leland Wilkinson: Statistics & Graphics, Chicago
     - Prithvi Prabhu: Systems & Graphics, Mountain View
     - Peter Szabo: Systems & Pipelines, Košice
     - Lena Rampula: Insights, Prague
     - SRK: Kaggle Grandmaster, NLP, Chennai
     - Shivam Bansal: Kaggle Grandmaster, Insights & Storytelling, Singapore
     - Ranjith Anantharaman: Insights, Chennai
     - Pramit Choudhary: Insights, OC
     - Mathias Müller: Kaggle Grandmaster, Forecasting, Dresden
     - Trushant Kalyanpur: Insights, Sacramento
  3. Agenda
     Part 1
     - Premise: Why Q?
     - What is Q?
     - Feature Tour
     - Extending Q
     Part 2
     - Automatic Insights with Q
  4. Why Q?
  5. What is analytics?
     Analytics = “Information resulting from the systematic analysis of data or statistics”
     Analytics / ML / AI Workflow: Visualize, Ingest, Prep, Model, Deploy, Refine
     The three levels of analytical information consumption:
     - AI / ML / Data Science: Data Scientist / ML Engineer
     - “BI” / Internal Applications: Internal / Business User
     - Consumer / End-user Applications: Customer / Data Consumer
  6. What does it take to build this?
     Every analytical application needs to:
     - Ingest, store and retrieve data.
     - Prepare or transform data.
     - Handle user inputs (forms / UI).
     - Filter or search through data.
     - Display visualizations.
     - Create or use ML models.
     - Allow collaboration / sharing.
     - Make all this fast, fun and easy!
     Applies to:
     - Data Science / ML / AI
     - Business Intelligence
     - End-user Applications
     [Diagram: Typical Analytical Application Architecture. Labels: Front End / User Interface; Visualization; Forms; Search / Filter; Collaboration; Database; Analytics / ML / AI; Ingest; Prep; Model; Score; Refine; Raw Data; Transformed Data; Operational Metrics; Model Metrics & Predictions; “Your application logic goes here.”]
  7. Needs specialized skills
     - Data Scientist
     - Data Visualization Specialist
     - Database Developer
     - Front-end Developer
     - Application Engineer
     - Data Engineer
     [Diagram: Typical Analytical Application Architecture, as on slide 6, annotated with the roles above.]
  8. Visualization everywhere
     Every stage of analytics requires interactive ad-hoc data exploration and visualization.
     Every analytical application is in fact a bespoke data visualization application!
     [Diagram: Analytics / ML / AI Workflow. Labels: Visualize; Ingest; Prep; Model; Deploy; Refine; Raw Data; Transformed Data; Operational Metrics; Model Metrics & Predictions.]
  9. Retrofit AI on BI? No! The state of the art has advanced!
     - Back-end: “BI” is too manual / reactive / Q&A driven
       - Reports / dashboards are not enough: they need to be live, proactive, predictive
       - ML algorithms are better, faster, cheaper at finding insights
     - Front-end: Drag-and-drop “BI” mental models are clunky to use
       - Search is a simpler paradigm: get to results quickly
       - Everyone understands and uses search every day
       - More powerful with predictive / recommendation capabilities
     AI + BI need to work as a cohesive whole, not as an afterthought.
  10. Conclusion
      Building beautiful, usable predictive apps is hard.
      Doing all this quickly is harder.
      Doing all this without a diverse set of skills is insanely hard!
  11. Questions
      - How do we simplify this process?
      - How do we ease development of AI/ML applications?
      - How do we rapidly experiment / prototype new ideas?
      - How do we lower development costs?
      - How do we reduce time to market?
      Can we empower data scientists to quickly prototype and deliver interactive predictive applications directly to business users?
  12. H2O Q
  13. H2O Q
      - Provides:
        - Large-scale analytical data storage
        - High-performance analytical search + superior UX
        - Beautiful, high-scale, ad-hoc, interactive, automatic visualizations
        - Point-and-click ad-hoc data prep
        - Automatic Machine Learning
        - Extensible back-end and front-end
      - Using 100% pure Python!
      - No front-end programming!
      - No need to reason about client-server / distributed architecture.
      - Deploy apps in hours or minutes, not months, weeks or days!
  14. H2O Q System Architecture
      [Diagram. Labels: Driverless AI; AI/ML; EDA; Formula Editor (fx); Transformation Editor; Notebook Editor; Automatic Insights; Formula Parser; Typeahead; Visualization; Tables, Views & Transformation Pipelines; Q Store (Tables, Views, Statistics, Typeahead Index (Cold)); External Data Sources; Typeahead Index (Hot); Fuzzy Matcher; Query Translator; Query Parser; Formula Translator; Q App Scheduler + Workflow Engine; Metadata Store (Tables, Notebooks, Visualizations, Pipelines, App Data); Q Server; Q Apps (Python): Pipelines, AutoInsights, Connectors, Your App; Frontend; Q App API; Q App UI; Text.]
  15. H2O Q System Architecture
      [Diagram: as on slide 14, divided into two parts.]
      - Q Core: Building Blocks for AI Applications.
      - Q Apps: Your AI Applications and Extensions.
  16. (1) Q Store
      [Diagram: H2O Q System Architecture, as on slide 14.]
      - Distributed analytical database
      - Column store
      - Parallel, vectorized query execution
      - Linearly scalable
      - Optimized for analytical queries
      - No pre-aggregation required
      - Fast!
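Slide 16 lists "column store" and "no pre-aggregation required" among Q Store's properties. The sketch below is not Q Store code and makes no claim about its internals; it only illustrates, in plain Python with hypothetical toy data and function names, why a column layout suits analytical queries: an aggregate reads a single column instead of touching every field of every record.

```python
from array import array

# Hypothetical toy data: three records in a row-oriented layout.
rows = [
    {"region": "east", "units": 3, "price": 2.5},
    {"region": "west", "units": 5, "price": 4.0},
    {"region": "east", "units": 2, "price": 1.0},
]

# The same data in a column-oriented layout: one sequence per column,
# with numeric columns stored as compact typed arrays.
columns = {
    "region": [r["region"] for r in rows],
    "units": array("q", (r["units"] for r in rows)),
    "price": array("d", (r["price"] for r in rows)),
}


def total_units_row_store(records):
    """Row layout: each full record is visited just to read one field."""
    return sum(record["units"] for record in records)


def total_units_column_store(cols):
    """Column layout: only the 'units' column is scanned."""
    return sum(cols["units"])


def units_by_region(cols):
    """Group-by computed at query time, without a pre-aggregated table."""
    totals = {}
    for region, units in zip(cols["region"], cols["units"]):
        totals[region] = totals.get(region, 0) + units
    return totals
```

A real analytical database adds the vectorized and parallel execution that the slide mentions; this example only shows the data layout.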
  17. Compression
      - Supports zlib and zstd
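Slide 17 names zlib as one of the supported compression formats. As a rough illustration of why compression pays off for repetitive column data, here is a minimal sketch using Python's standard `zlib` module. The column contents and helper names are hypothetical, and nothing here reflects how Q Store encodes data on disk. zstd is left out because it typically needs a third-party package or a recent Python release.

```python
import zlib


def compress_column(values, level=6):
    """Join string values with newlines and compress the UTF-8 bytes.

    Assumes no value contains a newline (a simplification for this sketch).
    """
    raw = "\n".join(values).encode("utf-8")
    return raw, zlib.compress(raw, level)


def decompress_column(blob):
    """Invert compress_column: decompress, decode, and split into values."""
    return zlib.decompress(blob).decode("utf-8").split("\n")


# Hypothetical low-cardinality column, the kind that compresses well.
column = ["east", "west", "east", "north"] * 2500

raw, packed = compress_column(column)
restored = decompress_column(packed)

print(f"raw bytes: {len(raw)}, compressed bytes: {len(packed)}")
```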
  18. (2) Search w/ Typeahead
      [Diagram: H2O Q System Architecture, as on slide 14.]
      - Low-latency keyword search
      - In-memory search index on tables and views
      - On-demand, optional
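Slide 18 describes typeahead backed by an in-memory search index. One simple way to get low-latency prefix suggestions is a sorted term list searched with binary search, sketched below. The `TypeaheadIndex` class and the sample terms are hypothetical; the slides do not say which data structure Q uses.

```python
from bisect import bisect_left


class TypeaheadIndex:
    """In-memory prefix index over a sorted list of lowercase terms."""

    def __init__(self, terms):
        # Deduplicate and sort once so every lookup can use binary search.
        self._terms = sorted({term.lower() for term in terms})

    def suggest(self, prefix, limit=5):
        """Return up to `limit` terms that start with `prefix`."""
        prefix = prefix.lower()
        # All terms sharing the prefix are contiguous, starting here.
        start = bisect_left(self._terms, prefix)
        matches = []
        for term in self._terms[start:]:
            if not term.startswith(prefix) or len(matches) == limit:
                break
            matches.append(term)
        return matches


# Hypothetical schema and data terms.
index = TypeaheadIndex(["revenue", "region", "returns", "sales", "segment"])
suggestions = index.suggest("re")
```

Here `suggestions` holds the three terms that begin with "re", in sorted order.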
  19. Typeahead and resilient parser
      - IDE-style fuzzy matching
      - Typeahead on keywords, schema, data and common NL phrases.
      - Resilient parser: better error reporting, correction suggestions.
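Slide 19 mentions IDE-style fuzzy matching. A common form of this, used in many code editors, accepts a candidate when the typed characters appear in it in order, and ranks tighter matches first. The sketch below implements that idea with a greedy leftmost match. The scoring rule and the function names are assumptions made for illustration, not Q's matcher.

```python
def fuzzy_score(query, candidate):
    """Score a subsequence match; lower is better, None means no match.

    The score is the number of unmatched characters inside the matched span,
    using a greedy leftmost match (simple, but not always the tightest span).
    """
    q, c = query.lower(), candidate.lower()
    positions = []
    start = 0
    for ch in q:
        idx = c.find(ch, start)
        if idx == -1:
            return None
        positions.append(idx)
        start = idx + 1
    if not positions:
        return 0
    span = positions[-1] - positions[0] + 1
    return span - len(q)


def fuzzy_filter(query, candidates):
    """Return matching candidates, best score first, ties by length then name."""
    hits = []
    for candidate in candidates:
        score = fuzzy_score(query, candidate)
        if score is not None:
            hits.append((score, len(candidate), candidate))
    return [candidate for _, _, candidate in sorted(hits)]


# Hypothetical column names.
names = ["customer_id", "country", "created_at", "discount"]
ranked = fuzzy_filter("cd", names)
```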
  20. Incremental parser
      - Forgiving: know what you did wrong, and always get an answer, unlike SQL!
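Slide 20 describes a forgiving parser that explains what went wrong yet still returns an answer. The sketch below shows the general idea on a toy keyword query: unknown terms are corrected when a close match exists, skipped otherwise, and reported either way. It uses `difflib.get_close_matches` from the standard library. The vocabulary and the `forgiving_parse` function are hypothetical, and Q's actual parser is certainly more elaborate.

```python
from difflib import get_close_matches

# Hypothetical schema vocabulary for the sketch.
KNOWN_COLUMNS = ("revenue", "region", "quarter")


def forgiving_parse(query, vocabulary=KNOWN_COLUMNS):
    """Return (accepted_terms, diagnostics) without ever raising on bad input."""
    accepted, diagnostics = [], []
    for token in query.lower().split():
        if token in vocabulary:
            accepted.append(token)
            continue
        # Look for the single closest known term above a similarity cutoff.
        guesses = get_close_matches(token, vocabulary, n=1, cutoff=0.6)
        if guesses:
            accepted.append(guesses[0])
            diagnostics.append(f"'{token}' not found; using '{guesses[0]}'")
        else:
            diagnostics.append(f"'{token}' not recognized; ignored")
    return accepted, diagnostics


terms, notes = forgiving_parse("revenu region zzz")
```

The caller always gets a usable list of terms, plus notes describing each correction or skipped token.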
  21. (3) Exploratory Data Analysis
      [Diagram: H2O Q System Architecture, as on slide 14.]
      - Not a charting library!
      - Tight integration with Q Store + Search
      - High-scale visualization (tested ~5M marks)
      - Unique 2-phase incremental rendering: fast static pass followed by JIT interactivity.
      - Based on Leland Wilkinson's Grammar of Graphics
      - Advised by Leland Wilkinson!
  22. Notebooks & Collaboration: Editable, Presentable, Free-form
  23. (4) Self-service Data Prep: Pipelines
      [Diagram: H2O Q System Architecture, as on slide 14.]
  24. Qlang stdlib: 125+ parallel/distributed functions
      - Math: abs acos asin atan atan2 cos cosh cot coth degrees div exp ln log pi power radians sign sin sinh sqrt square tan tanh ceiling floor round
      - Conditional: if
      - Aggregate: avg count countd corr covar covarp max median min percentile stdev stdevp sum var varp
      - String: contains endswith find findnth left len lower ltrim ltrim_this mid regexp_replace regexp_match regexp_extract regexp_extract_nth replace right rtrim space split startswith str trim upper
      - Date: now today date datetime dateadd day month year datediff datepart datename isdate usec_to_timestamp timestamp_to_usec
      - Conversion: makedate makedatetime maketime datetrunc ascii char float int
      - Misc: attr first ifnull index isnull last max min size zn host tld parse_url parse_url_query
  25. Use NLP in Pipelines!
  26. (5) Build Predictive Models
      [Diagram: H2O Q System Architecture, as on slide 14.]
      - H2O's flagship Driverless AI under the hood: industry-leading automatic machine learning
      - For everything else, just import your favorite ML libraries in Q Apps!
  27. Q Apps
      [Diagram: H2O Q System Architecture, as on slide 14.]
      - Q’s extensibility mechanism
      - Build interactive UI apps
      - Authored in 100% Python
      - No HTML/JavaScript required
  28. Built for extensibility
      [Diagram. Labels: UI; Q App; Q Store; Your favorite ML libraries!]
      1. Prompt for inputs
      2. Read Data
      3. Analyze
      4. Write Data
      5. Output Results
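The five steps on slide 28 can be sketched as a single Python function. This is not the Q App API, which these slides do not show. The `run_app` function, the dict standing in for a data store, and the `prompt` and `output` callbacks are all hypothetical names chosen only to make the flow concrete.

```python
from statistics import mean

# Hypothetical stand-in for a data store: table name -> list of row dicts.
store = {
    "sales": [
        {"region": "east", "units": 3},
        {"region": "west", "units": 5},
        {"region": "east", "units": 2},
    ]
}


def run_app(store, prompt, output):
    """Run the five-step flow once and return the result."""
    inputs = prompt()                                    # 1. Prompt for inputs
    rows = store[inputs["table"]]                        # 2. Read data
    units = [r["units"] for r in rows if r["region"] == inputs["region"]]
    result = {                                           # 3. Analyze
        "region": inputs["region"],
        "mean_units": mean(units),
    }
    store.setdefault("results", []).append(result)       # 4. Write data
    output(result)                                       # 5. Output results
    return result


# Stand-ins for UI interaction: a canned form response and a list as the display.
shown = []
result = run_app(store, lambda: {"table": "sales", "region": "east"}, shown.append)
```

The analysis step is where any ML library could be imported and called. Here it is a plain average, so the sketch stays dependency-free.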
  29. Q Apps
      - Apps run in parallel, managed by scheduler
      - Full-fledged workflow engine
        - App workflows can run for days/weeks/months
        - Hydrated/dehydrated just-in-time: cheap to run!
      - Apps run venv-isolated: no Docker / Kubernetes required
        - Light on resources
        - Runs on your laptop!
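Slide 29 says long-running app workflows are hydrated and dehydrated just-in-time. The general pattern is to persist a workflow's state between steps so that nothing needs to stay in memory while it waits. The sketch below shows that pattern with JSON serialization. The step names and helper functions are hypothetical, and the slides do not describe how Q's workflow engine stores state.

```python
import json

# Hypothetical workflow: an ordered list of step names.
STEPS = ["ingest", "train", "report"]


def new_workflow():
    """Initial state: nothing run yet."""
    return {"next_step": 0, "log": []}


def dehydrate(state):
    """Serialize state so no process has to stay alive between steps."""
    return json.dumps(state)


def hydrate(blob):
    """Restore state from its serialized form."""
    return json.loads(blob)


def run_next_step(blob):
    """Hydrate, run exactly one step, then dehydrate again."""
    state = hydrate(blob)
    if state["next_step"] < len(STEPS):
        state["log"].append(STEPS[state["next_step"]])  # placeholder for real work
        state["next_step"] += 1
    return dehydrate(state)


def is_done(blob):
    return hydrate(blob)["next_step"] >= len(STEPS)


# Each iteration could happen hours or days apart; only the blob persists.
blob = dehydrate(new_workflow())
while not is_done(blob):
    blob = run_next_step(blob)
```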
  30. Dogfooding!
      Data connectors, pipelines, automatic insights are all Q apps!
  31. Ready!
  32. 100% Python!
  33. H2O Q System Architecture
      [Diagram: as on slide 14, with the five feature areas marked 1 to 5.]
  34. Thank you!