SlideShare una empresa de Scribd logo
1 de 26
Descargar para leer sin conexión
BigML Education
Flatline
July 2017
BigML Education Program 2Flatline
In This Video
• Introduction
• Define Flatline: Feature Engineering & Filtering
• Define Feature Engineering: why is it important?
• Examples: Automatic Feature Engineering / Filtering
• Basic Features
• Explore built-in functions for FE / Filtering
• Refresh options
• Advanced Features
• Formally introduce Flatline s-expressions
• Lisp-style expressions for FE and filtering.
• Examples of custom transformations
BigML Education Program 3Flatline
Flatline
Introduction
BigML Education Program 4Flatline
What is Flatline?
• DSL:
• Invented by BigML - Programmatic / Optimized for speed
• Transforms datasets into new datasets
• Adding new fields / Filtering
• Transformations are written in lisp-style syntax
• Programmatic Filtering:
• Filtering datasets according to functions that evaluate to
true/false using the row of data as an input.
• Feature Engineering???
Flatline: a domain specific language for feature
engineering and programmatic filtering
BigML Education Program 5Flatline
What is Feature Engineering?
• This is really, really important - more than algorithm selection!
• ML Algorithms have no deeper understanding of data
• Numerical: have a natural order, can be scaled, etc
• Categorical: have discrete values, etc.
• The "magic" is the ability to find patterns quickly and efficiently
• ML Algorithms only know what you tell/show it with data
• Medical: Kg and M, but BMI = Kg/M2 is better
• Lending: Debt and Income, but DTI is better
• Intuition can be risky: remember to prove it with an evaluation!
Feature Engineering: applying domain knowledge of
the data to create new features that allow ML
algorithms to work better, or to work at all.
BigML Education Program 6Flatline
Automatic Transformations
2013-09-25 10:02
DATE-TIME
Date-Time Fields
… year month day hour minute …
… 2013 Sep 25 10 2 …
… … … … … … …
NUM NUMCAT NUM NUM
• Date-Time fields have a lot of information "packed" into them
• Splitting out the time components allows ML algorithms to
discover time-based patterns.
BigML Education Program 7Flatline
Automatic Transformations
Categorical Fields for Clustering/LR
… alchemy_category …
… business …
… recreation …
… health …
… … …
CAT
business health recreation …
… 1 0 0 …
… 0 0 1 …
… 0 1 0 …
… … … … …
NUM NUM NUM
• Clustering and Logistic Regression require numeric fields for
inputs
• Categorical values are transformed to numeric vectors
automatically*
*Note: In BigML, clustering uses k-prototypes and the encoding used for LR can be configured.
BigML Education Program 8Flatline
Automatic Transformations
Be not afraid of greatness:
some are born great, some achieve
greatness, and some have greatness
thrust upon ‘em.
TEXT
Text Fields
… great afraid born achieve …
… 4 1 1 1 …
… … … … … …
NUM NUM NUM NUM
• Unstructured text contains a lot of potentially interesting patterns
• Bag-of-words analysis happens automatically and extracts the
"interesting" tokens in the text
• Another option is Topic Modeling to extract thematic meaning
BigML Education Program 9Flatline
Help ML to Work Better
{
“url":"cbsnews",
"title":"Breaking News Headlines
Business Entertainment World News “,
"body":" news covering all the latest
breaking national and world news
headlines, including politics, sports,
entertainment, business and more.”
}
TEXT
title body
Breaking News… news covering…
… …
TEXT TEXT
When text is not actually unstructured
• In this case, the text field has structure (key/value pairs)
• Extracting the structure as new features may allow the ML
algorithm to work better
BigML Education Program 10Flatline
Help ML to Work at all
When the pattern does not exist
Highway Number Direction Is Long
2 East-West FALSE
4 East-West FALSE
5 North-South TRUE
8 East-West FALSE
10 East-West TRUE
… … …
Goal: Predict principle direction from highway number
( = (mod (field "Highway Number") 2) 0)
BigML Education Program 11Flatline
Flatline
Basic Features
BigML Education Program 12Flatline
Built-ins for Filtering
• Built-in filtering:
• Common filtering options for each field type
• Internally the built-in filtering writes a Flatline expression
• Numeric Fields: Comparison / Equals / Missing / Statistics
• Date-Time: Comparison / Equals / Missing
• Categorical Fields: Specific values / Missing values
• Text Fields: Equals / Contains / Doesn’t contain / Missing
• Refresh Fields:
• Types: recomputes field types. Ex: #classes > 1000
• Preferred: recomputes preferred status
BigML Education Program 13Flatline
Built-ins for Filtering
• boilerplate contains "cup"
• alchemy_category is not "sports" nor "unknown"
• alchemy_category_score >= 0.70
StumbleUpon Filtering Example
BigML Education Program 14Flatline
Built-ins for FE
• Discretize: Converts a numeric value to categorical
• Replace missing values: fixed/max/mean/median/etc
• Normalize: Adjust a numeric value to a specific range of values
while preserving the distribution
• Math: Exponentiation, Logarithms, Squares, Roots, etc
• Types: Force a field value to categorical, integer, or real
• Random: Create random values for introducing noise
• Statistics: Mean, Population
• Refresh Fields:
• Types: recomputes field types. Ex: #classes > 1000
• Preferred: recomputes preferred status
BigML Education Program 15Flatline
Built-in Functions
Discretization
Total Spend
7.342,99
304,12
4,56
345,87
8.546,32
NUM
“Predict will spend
$3,521 with error
$1,232”
Spend Category
Top 33%
Bottom 33%
Bottom 33%
Middle 33%
Top 33%
CAT
“Predict customer
will be Top 33% in
spending”
BigML Education Program 16Flatline
Flatline
Advanced Features
BigML Education Program 17Flatline
Flatline
• Lisp style syntax: Operators come first
• Correct: (+ 1 2) : NOT Correct: (1 + 2)
• Dataset Fields are first-class citizens
• (field “diabetes pedigree”)
• Limited programming language structures
• let, cond, if, maps, list operators, */+-, etc.
• Built-in transformations
• statistics, strings, timestamps, windows
BigML Education Program 18Flatline
Flatline Add Fields
Computing with Existing Features
Debt Income
10.134 100.000
85.234 134.000
8.112 21.500
0 45.900
17.534 52.000
NUM NUM
(/ (field "Debt") (field "Income"))
Debt
Income
Debt to Income Ratio
0,10
0,64
0,38
0
0,34
NUM
BigML Education Program 19Flatline
Flatline Filtering
• Any flatline expression that evaluates to true/false
can be used to filter a dataset
• If the expression is true for the row, it will be
kept in the output dataset
• If the expression is false for the row, it will be
left out from the output dataset
• Filtering can be direct or sometimes in two steps:
1. Add field to dataset
2. Filter new field with a built-in function
BigML Education Program 20Flatline
Flatline Filtering
Filtering by Computation
Debt Income
10.134 100.000
85.234 134.000
8.112 21.500
0 45.900
17.534 52.000
NUM NUM
(< (/ (field "Debt") (field "Income")) 0.35)
Debt
Income
< 0.35
Debt Income
10.134 100.000
85.234 134.000
8.112 21.500
0 45.900
17.534 52.000
NUM NUM
BigML Education Program 21Flatline
Flatline s-expressions
(= 0 (+ (abs ( f "Month - 3" ) ) (abs ( f "Month - 2")) (abs ( f "Month - 1") ) ))
Name Month - 3 Month - 2 Month - 1
Joe Schmo 123,23 0 0
Jane Plain 0 0 0
Mary Happy 0 55,22 243,33
Tom Thumb 12,34 8,34 14,56
Un-Labelled Data
Labelled data
Name Month - 3 Month - 2 Month - 1 Default
Joe Schmo 123,23 0 0 FALSE
Jane Plain 0 0 0 TRUE
Mary Happy 0 55,22 243,33 FALSE
Tom Thumb 12,34 8,34 14,56 FALSE
Adding Simple Labels to Data
Define "default" as
missing three
payments in a row
BigML Education Program 22Flatline
Flatline s-expressions
date volume price
1 34353 314
2 44455 315
3 22333 315
4 52322 321
5 28000 320
6 31254 319
7 56544 323
8 44331 324
9 81111 287
10 65422 294
11 59999 300
12 45556 302
13 19899 301
Current - (4-day avg)
std dev
Shock: Deviations from a Trend
day-4 day-3 day-2 day-1 4davg
-
314 -
314 315 -
314 315 315 -
314 315 315 321 316,25
315 315 321 320 317,75
315 321 320 319 318,75
BigML Education Program 23Flatline
Flatline s-expressions
Current - (4-day avg)
std dev
Shock: Deviations from a Trend
Current : (field “price”)
4-day avg: (avg-window “price” -4 -1)
std dev: (standard-deviation “price”)
(/ (- ( f "price") (avg-window "price" -4, -1)) (standard-deviation "price"))
BigML Education Program 24Flatline
Advanced s-expressions
Moon Phase%
( / ( mod ( - ( / ( epoch ( field "date-field" )) 1000 ) 621300 ) 2551443 ) 2551442 )
Highway isEven?
( = (mod (field "Highway Number") 2) 0)
( let (R 6371000 latA (to-radians {lat-ref}) latB (to-radians ( field "LATITUDE" ) )
latD ( - latB latA ) longD ( to-radians ( - ( field "LONGITUDE" ) {long-ref} ) ) a (
+ ( square ( sin ( / latD 2 ) ) ) ( * (cos latA) (cos latB) (square ( sin ( / longD
2))) ) ) c ( * 2 ( asin ( min (list 1 (sqrt a)) ) ) ) ) ( * R c ) )
Distance Lat/Long <=> Ref (Haversine)
BigML Education Program 25Flatline
WhizzML + Flatline
HAVERSINE
FLATLINE
OUTPUT
DATASET
INPUT
DATASET
LONG Ref
LAT Ref
WHIZZML SCRIPT
GALLERY
BigML Education Program 26Flatline
Summary
• Flatline: programmatic feature engineering / filtering
• Feature Engineering: what is it / why it is important
• Automatic transformations: date-time, text, etc
• Built-in functions: filtering and feature engineering
• Discretization / Normalization / etc.
• Flatline expressions
• Structure
• Examples: Adding fields / filtering
• WhizzML
• Wrapping Flatline
• Quick start with scriptify

Más contenido relacionado

La actualidad más candente

“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...
“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...
“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...Edge AI and Vision Alliance
 
Data product thinking-Will the Data Mesh save us from analytics history
Data product thinking-Will the Data Mesh save us from analytics historyData product thinking-Will the Data Mesh save us from analytics history
Data product thinking-Will the Data Mesh save us from analytics historyRogier Werschkull
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 
Analysis of big data and analytics market in latin america
Analysis of big data and analytics market in latin americaAnalysis of big data and analytics market in latin america
Analysis of big data and analytics market in latin americaLeandro Scalize
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureDATAVERSITY
 
Master Data Management
Master Data ManagementMaster Data Management
Master Data ManagementLuis Ortiz
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
Inside open metadata—the deep dive
Inside open metadata—the deep diveInside open metadata—the deep dive
Inside open metadata—the deep diveDataWorks Summit
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Cloudera, Inc.
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationDenodo
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPranov Mishra
 
Applying Data Science and Analytics in Marketing
Applying Data Science and Analytics in MarketingApplying Data Science and Analytics in Marketing
Applying Data Science and Analytics in MarketingData Con LA
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudKent Graziano
 
Modern Data Architecture
Modern Data Architecture Modern Data Architecture
Modern Data Architecture Mark Hewitt
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as ProductDATAVERSITY
 
Lake Database Database Template Map Data in Azure Synapse Analytics
Lake Database  Database Template  Map Data in Azure Synapse AnalyticsLake Database  Database Template  Map Data in Azure Synapse Analytics
Lake Database Database Template Map Data in Azure Synapse AnalyticsErwin de Kreuk
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Kent Graziano
 

La actualidad más candente (20)

“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...
“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...
“Responsible AI and ModelOps in Industry: Practical Challenges and Lessons Le...
 
Data product thinking-Will the Data Mesh save us from analytics history
Data product thinking-Will the Data Mesh save us from analytics historyData product thinking-Will the Data Mesh save us from analytics history
Data product thinking-Will the Data Mesh save us from analytics history
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouse
 
Analysis of big data and analytics market in latin america
Analysis of big data and analytics market in latin americaAnalysis of big data and analytics market in latin america
Analysis of big data and analytics market in latin america
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Enterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data ArchitectureEnterprise Architecture vs. Data Architecture
Enterprise Architecture vs. Data Architecture
 
Master Data Management
Master Data ManagementMaster Data Management
Master Data Management
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
Inside open metadata—the deep dive
Inside open metadata—the deep diveInside open metadata—the deep dive
Inside open metadata—the deep dive
 
Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360Using Big Data to Drive Customer 360
Using Big Data to Drive Customer 360
 
Enabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data VirtualizationEnabling a Data Mesh Architecture with Data Virtualization
Enabling a Data Mesh Architecture with Data Virtualization
 
Prediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom IndustryPrediction of customer propensity to churn - Telecom Industry
Prediction of customer propensity to churn - Telecom Industry
 
Applying Data Science and Analytics in Marketing
Applying Data Science and Analytics in MarketingApplying Data Science and Analytics in Marketing
Applying Data Science and Analytics in Marketing
 
Balance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data CloudBalance agility and governance with #TrueDataOps and The Data Cloud
Balance agility and governance with #TrueDataOps and The Data Cloud
 
Modern Data Architecture
Modern Data Architecture Modern Data Architecture
Modern Data Architecture
 
Data science Framework
Data science FrameworkData science Framework
Data science Framework
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
The ABCs of Treating Data as Product
The ABCs of Treating Data as ProductThe ABCs of Treating Data as Product
The ABCs of Treating Data as Product
 
Lake Database Database Template Map Data in Azure Synapse Analytics
Lake Database  Database Template  Map Data in Azure Synapse AnalyticsLake Database  Database Template  Map Data in Azure Synapse Analytics
Lake Database Database Template Map Data in Azure Synapse Analytics
 
Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)Agile Data Engineering - Intro to Data Vault Modeling (2016)
Agile Data Engineering - Intro to Data Vault Modeling (2016)
 

Similar a BigML Education - Feature Engineering with Flatline

BSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBigML, Inc
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingBigML, Inc
 
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature EngineeringBigML, Inc
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature EngineeringBigML, Inc
 
MLSEV. Automating Decision Making
MLSEV. Automating Decision MakingMLSEV. Automating Decision Making
MLSEV. Automating Decision MakingBigML, Inc
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBigML, Inc
 
VSSML18. Introduction to Machine Learning and the BigML Platform
VSSML18. Introduction to Machine Learning and the BigML PlatformVSSML18. Introduction to Machine Learning and the BigML Platform
VSSML18. Introduction to Machine Learning and the BigML PlatformBigML, Inc
 
Practical data science
Practical data sciencePractical data science
Practical data scienceDing Li
 
BSSML17 - Basic Data Transformations
BSSML17 - Basic Data TransformationsBSSML17 - Basic Data Transformations
BSSML17 - Basic Data TransformationsBigML, Inc
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingDatabricks
 
AlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in PythonAlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in PythonMark Conway
 
VSSML18. Data Transformations
VSSML18. Data TransformationsVSSML18. Data Transformations
VSSML18. Data TransformationsBigML, Inc
 
VSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data TransformationsVSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data TransformationsBigML, Inc
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIMEAli Raza Anjum
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering BigML, Inc
 
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEFeng Zhu
 
VSSML16 LR2. Summary Day 2
VSSML16 LR2. Summary Day 2VSSML16 LR2. Summary Day 2
VSSML16 LR2. Summary Day 2BigML, Inc
 

Similar a BigML Education - Feature Engineering with Flatline (20)

BSSML17 - Feature Engineering
BSSML17 - Feature EngineeringBSSML17 - Feature Engineering
BSSML17 - Feature Engineering
 
DutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision MakingDutchMLSchool. Automating Decision Making
DutchMLSchool. Automating Decision Making
 
VSSML18. Feature Engineering
VSSML18. Feature EngineeringVSSML18. Feature Engineering
VSSML18. Feature Engineering
 
MLSD18. Feature Engineering
MLSD18. Feature EngineeringMLSD18. Feature Engineering
MLSD18. Feature Engineering
 
MLSEV. Automating Decision Making
MLSEV. Automating Decision MakingMLSEV. Automating Decision Making
MLSEV. Automating Decision Making
 
BSSML16 L7. Feature Engineering
BSSML16 L7. Feature EngineeringBSSML16 L7. Feature Engineering
BSSML16 L7. Feature Engineering
 
VSSML18. Introduction to Machine Learning and the BigML Platform
VSSML18. Introduction to Machine Learning and the BigML PlatformVSSML18. Introduction to Machine Learning and the BigML Platform
VSSML18. Introduction to Machine Learning and the BigML Platform
 
Practical data science
Practical data sciencePractical data science
Practical data science
 
BSSML17 - Basic Data Transformations
BSSML17 - Basic Data TransformationsBSSML17 - Basic Data Transformations
BSSML17 - Basic Data Transformations
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
 
AlphaPy
AlphaPyAlphaPy
AlphaPy
 
AlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in PythonAlphaPy: A Data Science Pipeline in Python
AlphaPy: A Data Science Pipeline in Python
 
VSSML18. Data Transformations
VSSML18. Data TransformationsVSSML18. Data Transformations
VSSML18. Data Transformations
 
VSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data TransformationsVSSML16 L5. Basic Data Transformations
VSSML16 L5. Basic Data Transformations
 
Citizen Data Science Training using KNIME
Citizen Data Science Training using KNIMECitizen Data Science Training using KNIME
Citizen Data Science Training using KNIME
 
Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering Web UI, Algorithms, and Feature Engineering
Web UI, Algorithms, and Feature Engineering
 
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIMEPredicting Azure Churn with Deep Learning and Explaining Predictions with LIME
Predicting Azure Churn with Deep Learning and Explaining Predictions with LIME
 
When Should I Use Simulation?
When Should I Use Simulation?When Should I Use Simulation?
When Should I Use Simulation?
 
VSSML16 LR2. Summary Day 2
VSSML16 LR2. Summary Day 2VSSML16 LR2. Summary Day 2
VSSML16 LR2. Summary Day 2
 
MLIntro_ADA.pptx
MLIntro_ADA.pptxMLIntro_ADA.pptx
MLIntro_ADA.pptx
 

Más de BigML, Inc

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingBigML, Inc
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationBigML, Inc
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceBigML, Inc
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesBigML, Inc
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionBigML, Inc
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLBigML, Inc
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsBigML, Inc
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleBigML, Inc
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIBigML, Inc
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object DetectionBigML, Inc
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image ProcessingBigML, Inc
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureBigML, Inc
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotBigML, Inc
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...BigML, Inc
 

Más de BigML, Inc (20)

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
 

Último

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxolyaivanovalion
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...amitlee9823
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...amitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Último (20)

CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 

BigML Education - Feature Engineering with Flatline

  • 2. BigML Education Program 2Flatline In This Video • Introduction • Define Flatline: Feature Engineering & Filtering • Define Feature Engineering: why is it important? • Examples: Automatic Feature Engineering / Filtering • Basic Features • Explore built-in functions for FE / Filtering • Refresh options • Advanced Features • Formally introduce Flatline s-expressions • Lisp-style expressions for FE and filtering. • Examples of custom transformations
  • 3. BigML Education Program 3Flatline Flatline Introduction
  • 4. BigML Education Program 4Flatline What is Flatline? • DSL: • Invented by BigML - Programmatic / Optimized for speed • Transforms datasets into new datasets • Adding new fields / Filtering • Transformations are written in lisp-style syntax • Programmatic Filtering: • Filtering datasets according to functions that evaluate to true/false using the row of data as an input. • Feature Engineering??? Flatline: a domain specific language for feature engineering and programmatic filtering
  • 5. BigML Education Program 5Flatline What is Feature Engineering? • This is really, really important - more than algorithm selection! • ML Algorithms have no deeper understanding of data • Numerical: have a natural order, can be scaled, etc • Categorical: have discrete values, etc. • The "magic" is the ability to find patterns quickly and efficiently • ML Algorithms only know what you tell/show it with data • Medical: Kg and M, but BMI = Kg/M2 is better • Lending: Debt and Income, but DTI is better • Intuition can be risky: remember to prove it with an evaluation! Feature Engineering: applying domain knowledge of the data to create new features that allow ML algorithms to work better, or to work at all.
  • 6. BigML Education Program 6Flatline Automatic Transformations 2013-09-25 10:02 DATE-TIME Date-Time Fields … year month day hour minute … … 2013 Sep 25 10 2 … … … … … … … … NUM NUMCAT NUM NUM • Date-Time fields have a lot of information "packed" into them • Splitting out the time components allows ML algorithms to discover time-based patterns.
  • 7. BigML Education Program 7Flatline Automatic Transformations Categorical Fields for Clustering/LR … alchemy_category … … business … … recreation … … health … … … … CAT business health recreation … … 1 0 0 … … 0 0 1 … … 0 1 0 … … … … … … NUM NUM NUM • Clustering and Logistic Regression require numeric fields for inputs • Categorical values are transformed to numeric vectors automatically* *Note: In BigML, clustering uses k-prototypes and the encoding used for LR can be configured.
  • 8. BigML Education Program 8Flatline Automatic Transformations Be not afraid of greatness: some are born great, some achieve greatness, and some have greatness thrust upon ‘em. TEXT Text Fields … great afraid born achieve … … 4 1 1 1 … … … … … … … NUM NUM NUM NUM • Unstructured text contains a lot of potentially interesting patterns • Bag-of-words analysis happens automatically and extracts the "interesting" tokens in the text • Another option is Topic Modeling to extract thematic meaning
  • 9. BigML Education Program 9Flatline Help ML to Work Better { “url":"cbsnews", "title":"Breaking News Headlines Business Entertainment World News “, "body":" news covering all the latest breaking national and world news headlines, including politics, sports, entertainment, business and more.” } TEXT title body Breaking News… news covering… … … TEXT TEXT When text is not actually unstructured • In this case, the text field has structure (key/value pairs) • Extracting the structure as new features may allow the ML algorithm to work better
  • 10. BigML Education Program 10Flatline Help ML to Work at all When the pattern does not exist Highway Number Direction Is Long 2 East-West FALSE 4 East-West FALSE 5 North-South TRUE 8 East-West FALSE 10 East-West TRUE … … … Goal: Predict principle direction from highway number ( = (mod (field "Highway Number") 2) 0)
  • 11. BigML Education Program 11Flatline Flatline Basic Features
  • 12. BigML Education Program 12Flatline Built-ins for Filtering • Built-in filtering: • Common filtering options for each field type • Internally the built-in filtering writes a Flatline expression • Numeric Fields: Comparison / Equals / Missing / Statistics • Date-Time: Comparison / Equals / Missing • Categorical Fields: Specific values / Missing values • Text Fields: Equals / Contains / Doesn’t contain / Missing • Refresh Fields: • Types: recomputes field types. Ex: #classes > 1000 • Preferred: recomputes preferred status
  • 13. BigML Education Program 13Flatline Built-ins for Filtering • boilerplate contains "cup" • alchemy_category is not "sports" nor "unknown" • alchemy_category_score >= 0.70 StumbleUpon Filtering Example
  • 14. BigML Education Program 14Flatline Built-ins for FE • Discretize: Converts a numeric value to categorical • Replace missing values: fixed/max/mean/median/etc • Normalize: Adjust a numeric value to a specific range of values while preserving the distribution • Math: Exponentiation, Logarithms, Squares, Roots, etc • Types: Force a field value to categorical, integer, or real • Random: Create random values for introducing noise • Statistics: Mean, Population • Refresh Fields: • Types: recomputes field types. Ex: #classes > 1000 • Preferred: recomputes preferred status
  • 15. BigML Education Program 15Flatline Built-in Functions Discretization Total Spend 7.342,99 304,12 4,56 345,87 8.546,32 NUM “Predict will spend $3,521 with error $1,232” Spend Category Top 33% Bottom 33% Bottom 33% Middle 33% Top 33% CAT “Predict customer will be Top 33% in spending”
  • 16. BigML Education Program 16Flatline Flatline Advanced Features
  • 17. BigML Education Program 17Flatline Flatline • Lisp style syntax: Operators come first • Correct: (+ 1 2) : NOT Correct: (1 + 2) • Dataset Fields are first-class citizens • (field “diabetes pedigree”) • Limited programming language structures • let, cond, if, maps, list operators, */+-, etc. • Built-in transformations • statistics, strings, timestamps, windows
  • 18. BigML Education Program 18Flatline Flatline Add Fields Computing with Existing Features Debt Income 10.134 100.000 85.234 134.000 8.112 21.500 0 45.900 17.534 52.000 NUM NUM (/ (field "Debt") (field "Income")) Debt Income Debt to Income Ratio 0,10 0,64 0,38 0 0,34 NUM
  • 19. BigML Education Program 19Flatline Flatline Filtering • Any flatline expression that evaluates to true/false can be used to filter a dataset • If the expression is true for the row, it will be kept in the output dataset • If the expression is false for the row, it will be left out from the output dataset • Filtering can be direct or sometimes in two steps: 1. Add field to dataset 2. Filter new field with a built-in function
  • 20. BigML Education Program 20Flatline Flatline Filtering Filtering by Computation Debt Income 10.134 100.000 85.234 134.000 8.112 21.500 0 45.900 17.534 52.000 NUM NUM (< (/ (field "Debt") (field "Income")) 0.35) Debt Income < 0.35 Debt Income 10.134 100.000 85.234 134.000 8.112 21.500 0 45.900 17.534 52.000 NUM NUM
  • 21. BigML Education Program 21Flatline Flatline s-expressions (= 0 (+ (abs ( f "Month - 3" ) ) (abs ( f "Month - 2")) (abs ( f "Month - 1") ) )) Name Month - 3 Month - 2 Month - 1 Joe Schmo 123,23 0 0 Jane Plain 0 0 0 Mary Happy 0 55,22 243,33 Tom Thumb 12,34 8,34 14,56 Un-Labelled Data Labelled data Name Month - 3 Month - 2 Month - 1 Default Joe Schmo 123,23 0 0 FALSE Jane Plain 0 0 0 TRUE Mary Happy 0 55,22 243,33 FALSE Tom Thumb 12,34 8,34 14,56 FALSE Adding Simple Labels to Data Define "default" as missing three payments in a row
  • 22. BigML Education Program 22Flatline Flatline s-expressions date volume price 1 34353 314 2 44455 315 3 22333 315 4 52322 321 5 28000 320 6 31254 319 7 56544 323 8 44331 324 9 81111 287 10 65422 294 11 59999 300 12 45556 302 13 19899 301 Current - (4-day avg) std dev Shock: Deviations from a Trend day-4 day-3 day-2 day-1 4davg - 314 - 314 315 - 314 315 315 - 314 315 315 321 316,25 315 315 321 320 317,75 315 321 320 319 318,75
  • 23. BigML Education Program 23Flatline Flatline s-expressions Current - (4-day avg) std dev Shock: Deviations from a Trend Current : (field “price”) 4-day avg: (avg-window “price” -4 -1) std dev: (standard-deviation “price”) (/ (- ( f "price") (avg-window "price" -4, -1)) (standard-deviation "price"))
  • 24. BigML Education Program 24Flatline Advanced s-expressions Moon Phase% ( / ( mod ( - ( / ( epoch ( field "date-field" )) 1000 ) 621300 ) 2551443 ) 2551442 ) Highway isEven? ( = (mod (field "Highway Number") 2) 0) ( let (R 6371000 latA (to-radians {lat-ref}) latB (to-radians ( field "LATITUDE" ) ) latD ( - latB latA ) longD ( to-radians ( - ( field "LONGITUDE" ) {long-ref} ) ) a ( + ( square ( sin ( / latD 2 ) ) ) ( * (cos latA) (cos latB) (square ( sin ( / longD 2))) ) ) c ( * 2 ( asin ( min (list 1 (sqrt a)) ) ) ) ) ( * R c ) ) Distance Lat/Long <=> Ref (Haversine)
  • 25. BigML Education Program 25Flatline WhizzML + Flatline HAVERSINE FLATLINE OUTPUT DATASET INPUT DATASET LONG Ref LAT Ref WHIZZML SCRIPT GALLERY
  • 26. BigML Education Program 26Flatline Summary • Flatline: programmatic feature engineering / filtering • Feature Engineering: what is it / why it is important • Automatic transformations: date-time, text, etc • Built-in functions: filtering and feature engineering • Discretization / Normalization / etc. • Flatline expressions • Structure • Examples: Adding fields / filtering • WhizzML • Wrapping Flatline • Quick start with scriptify