3. Connecting the User to UBIX Services
[Diagram; labels: CLI tools, UBIX Platform, Developer IDE, UBIX SDK, AC]
4. Data Analytics Platform
Unified big data technology stack (Spark, Cassandra, Hadoop, Kafka, Elasticsearch, ...)
Cloud-agnostic architecture
Universal predictive interface (MLlib, VW, scikit-learn, H2O, TensorFlow, ...)
Extensibility and integration via a robust and expressive API (DSL)
Enterprise grade: scalability, performance, high availability, geo-replication,
resilience, security, manageability, interoperability, testability
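As an illustration of what a "universal predictive interface" can look like, here is a minimal sketch in Python. It is not the UBIX API: the `Predictor` protocol, the two toy backends, and `train_and_predict` are all hypothetical names invented for this example. The point is only that calling code depends on one fit/predict contract while the backend behind it is swappable.

```python
from typing import List, Protocol, Sequence


class Predictor(Protocol):
    """Hypothetical common contract; any backend exposing these two methods fits."""

    def fit(self, X: Sequence[float], y: Sequence[float]) -> None: ...

    def predict(self, X: Sequence[float]) -> List[float]: ...


class MeanPredictor:
    """Toy backend: always predicts the mean of the training targets."""

    def __init__(self) -> None:
        self.mean = 0.0

    def fit(self, X, y) -> None:
        self.mean = sum(y) / len(y)

    def predict(self, X):
        return [self.mean for _ in X]


class NearestNeighborPredictor:
    """Toy backend: 1-nearest-neighbour on a single numeric feature."""

    def __init__(self) -> None:
        self.points = []

    def fit(self, X, y) -> None:
        self.points = list(zip(X, y))

    def predict(self, X):
        # For each query value, return the target of the closest training point.
        return [min(self.points, key=lambda p: abs(p[0] - x))[1] for x in X]


def train_and_predict(model: Predictor, X_train, y_train, X_new):
    """Caller code is written once against the shared contract."""
    model.fit(X_train, y_train)
    return model.predict(X_new)
```

Swapping `MeanPredictor()` for `NearestNeighborPredictor()` changes the model but not the calling code, which is the property the slide attributes to the platform.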
7. DSL
A high-level abstraction in the form of a simple, concise language that gives power users advanced analytics functions and optimal use of the core technology stack
Enables AC to automate data science
Replaces person-years of big data/ML development and expertise: hundreds of lines of complex code sit behind a single DSL command
Enables the creation of full data pipelines (ingestion, transformation, aggregations and statistics, predictive modeling, visualization) with a few lines of code
Enables real-time/streaming, interactive and batch workloads under a unified “lambda” architecture
UBIX ships with an interactive editing/publishing tool (“DSL Workbench”) for building analytical pipelines and publishing their output (datasets, models, metrics API) for consumption by user-facing applications over HTTP (REST), WS (WebSocket) or Akka (TCP, AC-only)
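The idea of a full pipeline expressed in a few lines can be sketched in plain Python. This is not UBIX DSL syntax, which the slides do not show; the stage functions and the `pipeline` helper are hypothetical stand-ins for the ingestion, transformation and aggregation steps named above.

```python
from collections import defaultdict
from functools import reduce


def ingest(lines):
    """Parse 'key,value' text lines into (key, float) records."""
    return [(k, float(v)) for k, v in (line.split(",") for line in lines)]


def transform(records):
    """Keep only records with a non-negative value."""
    return [(k, v) for k, v in records if v >= 0]


def aggregate(records):
    """Sum values per key."""
    totals = defaultdict(float)
    for k, v in records:
        totals[k] += v
    return dict(totals)


def pipeline(*stages):
    """Compose stages left to right into a single callable."""
    def run(data):
        return reduce(lambda acc, stage: stage(acc), stages, data)
    return run


# The whole pipeline is declared in one line and run in another.
run = pipeline(ingest, transform, aggregate)
result = run(["a,1", "b,2", "a,3", "b,-5"])
```

Here `result` holds the per-key totals after the negative record is filtered out. A real DSL command would hide far more machinery (distributed execution, storage), but the compositional shape is the same.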
8. DSL Workbench Demo
Goal:
• unified lambda architecture in action: multiple workloads running concurrently
(streaming + interactive + batch) with persistence in different stores (H* + C*
lookup/time series + ES index + ws) serving different query access patterns (large
sequential aggregation/scan/offline training vs random access lookups/range queries
vs search)
• real-time processing in async mode (text + sparse + Python injection + predict)
• interactive visualisation (plot)
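For readers unfamiliar with the lambda architecture the demo exercises, the sketch below shows its core idea in Python under simplifying assumptions: events are appended to an immutable log, a batch job periodically recomputes a view from the full log, a speed layer covers events since the last batch run, and queries merge the two. The class and method names are hypothetical, and in-memory counters stand in for the separate stores mentioned on the slide.

```python
from collections import Counter


class LambdaView:
    """Toy event counter with a batch layer, a speed layer, and a merged query."""

    def __init__(self):
        self.master_log = []         # append-only record of every event
        self.batch_view = Counter()  # recomputed from the full log by run_batch()
        self.speed_view = Counter()  # incremental counts since the last batch run

    def on_event(self, key):
        """Streaming path: log the event and update the real-time view."""
        self.master_log.append(key)
        self.speed_view[key] += 1

    def run_batch(self):
        """Batch path: rebuild the view from scratch, then reset the speed layer."""
        self.batch_view = Counter(self.master_log)
        self.speed_view.clear()

    def query(self, key):
        """Serving path: merge the batch and real-time views."""
        return self.batch_view[key] + self.speed_view[key]


view = LambdaView()
for event in ["a", "a", "b"]:
    view.on_event(event)
count_before_batch = view.query("a")  # served entirely by the speed layer
view.run_batch()
view.on_event("a")
count_after_batch = view.query("a")   # batch view plus one fresh event
```

The query result stays correct whether a given event has been absorbed by the batch job yet or not, which is what lets streaming, interactive and batch workloads coexist.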
10. Auto Curious
Autonomous data science and analytics discovery
Powers insights for businesses at scale
Auto-curious question graph powers self-service advanced analytics
Harnesses user interactions and learns from them
Auto Curious composes DSL and directs both the engine and the presentation
16. UBIX Applications
Platform Workbench
Work directly with the data.
Low-level analytics for the power user.
AC Model Workbench
Test individual workflows and models.
Audit and adjust workflow parameters.
Data Exploration
Allow any user to visually explore and
analyze the data.
Driven by the Question Graph.
Editor's notes
Autocurious (AC) is designed to automate the construction of analytical (data science) workflows and associated analytical decision-making.
Assuming all has gone well, the workbenches and the airline demo will have been shown, so the audience will already have seen these and they need no further callout.
Logical Architecture Diagram
Physical Architecture Diagram
Analytical workflows can be thought of as a (non-linear) sequence of tasks that map to key distinct phases in a given workflow.
model construction phase: data is transformed (schematized) into a feature space suitable for training machine learning models
“task-to-task” loops
In AC, the resulting analytical workflow tasks reside in a goal hierarchy where goals
contain sub-goals. At the leaf nodes of the goal hierarchy are task execution “blocks”
that generate actual commands for the analysis. Each task can involve
one or more decisions that determine how to conduct the analysis.
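The goal hierarchy described above can be pictured as a tree whose leaves emit commands. The following Python sketch is only an illustration under assumptions: the `Goal` class, the decision keys, and the command strings are hypothetical placeholders, not AC internals or real UBIX DSL commands.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Goal:
    """A goal either contains sub-goals or, at a leaf, a task execution block."""
    name: str
    subgoals: List["Goal"] = field(default_factory=list)
    block: Optional[Callable[[dict], str]] = None  # leaf only: decisions -> command


def generate_commands(goal: Goal, decisions: dict) -> List[str]:
    """Depth-first walk that collects the commands emitted by leaf blocks."""
    if goal.block is not None:
        return [goal.block(decisions)]
    commands = []
    for sub in goal.subgoals:
        commands.extend(generate_commands(sub, decisions))
    return commands


# Hypothetical hierarchy; the command strings are placeholders, not real DSL.
workflow = Goal("build model", subgoals=[
    Goal("prepare data", subgoals=[
        Goal("sample", block=lambda d: f"sample fraction={d['sample_fraction']}"),
        Goal("normalize", block=lambda d: f"normalize method={d['normalization']}"),
    ]),
    Goal("train", block=lambda d: f"train algorithm={d['algorithm']}"),
])

decisions = {"sample_fraction": 0.1, "normalization": "zscore", "algorithm": "random_forest"}
commands = generate_commands(workflow, decisions)
```

Each leaf reads the decisions relevant to its task, so changing a decision changes the generated command without touching the hierarchy itself.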
This dual learning mechanism combines a knowledge-based expert system approach with a data-driven machine learning approach. Both learning mechanisms are used to inform AC’s data science decision-making at any given step in an analytical workflow.
For example, a “schematize model agent” exists for combining expert schematize decisions and data-driven schematize decisions. Similar agents
exist for sampling data, data normalization, training and test set construction, feature selection, algorithm selection, hyper-parameter selection, presentation, etc.
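One way to picture an agent that combines expert and data-driven decisions is a weighted blend of two score sources. The notes do not say how AC actually combines them, so the blending rule, the function names, the row-count threshold, and the algorithm names in this Python sketch are all assumptions made for illustration.

```python
def expert_scores(n_rows):
    """Knowledge-based side: a hypothetical hand-written rule on dataset size."""
    if n_rows < 1000:
        return {"logistic_regression": 1.0, "gradient_boosting": 0.0}
    return {"logistic_regression": 0.3, "gradient_boosting": 0.7}


def learned_scores(history):
    """Data-driven side: mean past accuracy per algorithm from (name, accuracy) pairs."""
    totals, counts = {}, {}
    for name, accuracy in history:
        totals[name] = totals.get(name, 0.0) + accuracy
        counts[name] = counts.get(name, 0) + 1
    return {name: totals[name] / counts[name] for name in totals}


def agent_decide(n_rows, history, expert_weight=0.5):
    """Blend both score sources and return the best-scoring algorithm.

    The linear blend is an assumed combination rule, not AC's documented one.
    """
    expert = expert_scores(n_rows)
    learned = learned_scores(history)
    combined = {
        name: expert_weight * expert.get(name, 0.0)
        + (1 - expert_weight) * learned.get(name, 0.0)
        for name in set(expert) | set(learned)
    }
    # Sort names first so ties are broken deterministically.
    return max(sorted(combined), key=combined.get)


history = [
    ("logistic_regression", 0.80),
    ("gradient_boosting", 0.90),
    ("gradient_boosting", 0.86),
]
small_data_choice = agent_decide(500, history)
large_data_choice = agent_decide(50_000, history)
```

With an equal weighting, the expert rule dominates on the small dataset while the two sources agree on the large one; setting `expert_weight=0.0` makes the choice purely data-driven.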