Feature Store
as a Data Foundation for ML
Presented by:
Stepan Pushkarev, CTO @ Provectus
Gandhi Raketla, Senior Solutions Architect @ AWS
1. Introductions
2. Modern Data Lakes and Modern ML Infrastructure
3. Emerging Architectural Shifts
4. Feature Store: 200 LOD overview and reference architecture on AWS
5. AWS Perspective on Feature Store
Agenda
Introductions
Stepan Pushkarev
Chief Technology
Officer, Provectus
Gandhi Raketla
Senior Solutions
Architect, AWS
German Osin
Senior Solutions
Architect, Provectus
Clients ranging from
fast-growing startups
to large enterprises
450 employees and
growing
Established in 2010
HQ in Palo Alto
Offices across the US,
Canada, and Europe
We are obsessed with leveraging cloud, data, and AI to reimagine the
way businesses operate, compete, and deliver customer value
AI-First Consultancy & Solutions
Provider
Innovative Tech Vendors
Seeking niche expertise to
differentiate and win the market
Midsize to Large Enterprises
Seeking to accelerate innovation,
achieve operational excellence
Our Clients
Challenges of Modern
Data Platforms
Modern Data Lakes You Know
Common Challenges:
Data Access and Discoverability
1. Data is scattered across multiple data sources
and technologies
2. Tedious process of managing AWS IAM roles,
Amazon S3 policies, API Gateways, Database
permissions
3. Gets even more complicated in an AWS multi-account
setup
4. Metadata is not discoverable
5. As a result, all the investments into Data and
ML are killed by data access issues
1. Lack of ownership and domain context —
A disconnect between data producers
and data consumers
2. Backlogged data team struggling to
keep pace with business demands
3. No Contracts between Data and ML
Engineering
4. As a result, fast end-to-end
experimentation is killed by complex
dependencies between teams
Common Challenges:
Monolithic Data Teams
https://martinfowler.com/articles/data-monolith-to-mesh.html
Common Challenges:
ML Experimentation Infrastructure
1. Inherited issues with Data Discovery and
Data Access
2. Reproducibility of datasets, ML pipelines,
ML Environments, and offline experiments
is still an issue
3. Production Experimentation frameworks
are still fairly immature
4. As a result, an end-to-end experiment
from data to production ML metric
takes 3-6 months
https://hbr.org/2020/03/building-a-culture-of-experimentation
Common Challenges:
Scaling ML Adoption in Production
1. Online serving. There is no unified and consistent
way to access features during model serving.
2. Impossible to reuse features between multiple
training pipelines and ML applications.
3. Monitoring and maintenance of ML Applications.
4. As a result, the time and cost to scale from 1 to 100
models in production grow exponentially.
What is your cost per
ML Model in Production?
Emerging Architectural
Shifts
Emerging Architectural Shifts
Data Lake -> Hudi/Delta Lakes
Hudi/Delta Lakes bring managed ingestion, ACID transactions,
and point-in-time queries to traditional Data Lakes
Data Lake -> Data Mesh
Ownership of data domains, data pipelines, metadata, and API
is shifting from centralized teams to product teams
Data Lake -> Data Infrastructure as a platform
Unified reusable platform components and frameworks across
enterprise
Endpoint Protection -> Global Data Governance
Data Security and privacy measures are becoming centralized
as part of Data Platform
Metadata Store -> Global Data Catalog
User Experience around data discovery, lineage, and versioning
requires investments into metadata-rich Data Catalog
Feature Store
Scaling ML Experimentation and Operations requires a
separate data management layer for ML Features
ML Toolkit -> Complete ML Infrastructure
ML capabilities are democratized for ML Engineers and citizen
Data Scientists
ACID Data Lakes
● Managed Ingestion
● Dataset versioning for ML training
● Cheap “Deletes” (common GDPR use case)
● Audit log of any changes in datasets
● Brings ACID transactions to your data lake
● “Upserts” strategy on data ingestion
● Enables schemas to enforce data quality
Delta/Hudi Lakes
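The properties listed on this slide (upserts, dataset versioning, audit log, reads of an older version) can be illustrated with a toy in-memory table. This is only a sketch of the idea, not the Delta Lake or Hudi API: the class VersionedTable and all of its method names are hypothetical, and every commit simply stores a full copy of the table state.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionedTable:
    """Toy table: every commit appends a new snapshot; old ones stay readable."""
    key: str
    snapshots: List[Dict[str, dict]] = field(default_factory=lambda: [{}])
    log: List[str] = field(default_factory=list)  # audit log, one entry per commit

    def _commit(self, new_state: Dict[str, dict], message: str) -> int:
        self.snapshots.append(new_state)
        self.log.append(message)
        return len(self.snapshots) - 1  # version number of the new snapshot

    def upsert(self, rows: List[dict]) -> int:
        """Insert new keys and overwrite existing ones in a single commit."""
        state = dict(self.snapshots[-1])  # copy, so the previous snapshot is untouched
        for row in rows:
            state[row[self.key]] = dict(row)
        return self._commit(state, f"upsert {len(rows)} row(s)")

    def delete(self, key_value: str) -> int:
        """Remove one key from the latest state (older snapshots keep it)."""
        state = dict(self.snapshots[-1])
        state.pop(key_value, None)
        return self._commit(state, f"delete {key_value}")

    def read(self, version: Optional[int] = None) -> List[dict]:
        """Read the latest snapshot, or an older one by version number."""
        snapshot = self.snapshots[-1 if version is None else version]
        return sorted(snapshot.values(), key=lambda r: r[self.key])


table = VersionedTable(key="user_id")
v1 = table.upsert([{"user_id": "u1", "country": "US"},
                   {"user_id": "u2", "country": "DE"}])
v2 = table.upsert([{"user_id": "u2", "country": "FR"},   # update
                   {"user_id": "u3", "country": "CA"}])  # insert
v3 = table.delete("u1")

print(table.read(v1))  # dataset exactly as it was at version 1
print(table.read())    # latest state
print(table.log)       # audit trail of commits
```

Pinning a training run to a version number such as v1 is what makes the dataset reproducible. Note that this toy delete keeps the row in older snapshots, so a real GDPR erasure would also have to purge history.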
Global Data Governance
Accelerate privacy operations with data you already
have.
Automate business processes, data mapping, and PI
discovery and classification for privacy workflows.
Operationalize policies in a central location.
Govern privacy policies to ensure policies are effectively
managed across the enterprise. Define and document
workflows, traceability views, and business process
registers.
Scale compliance across multiple regulations.
Use a platform designed and built with privacy in mind
that is easily extensible to support new regulations.
AWS Config
AWS Lake Formation
Global Data Catalog
Meta-metadata store:
● Does this data exist? Where is it?
● What is the source of truth of this data?
● Do I have access?
● Who is the owner?
● Who are the users of this data?
● Are there existing assets I can reuse?
● Can I trust this data?
* There are no established leaders in open
source
The Core of MLOps and Reproducible
Experimentation Pipelines
Model Code
ML Pipeline Code
Infrastructure
as Code
Versioned
Dataset
Production
Metrics & Alerts
Model Artifacts
Prediction
Service
ML Metrics
Automated Pipeline Execution
Pipeline Metadata
Alerts Reports
Feature Store
Orchestration: Idempotent Execution
Feedback Loop for Production Data
Feature Store
Feature Store Value Proposition
A data management layer for machine learning features.
1. Better ROI from feature engineering through reduction of
cost per model — Facilitates collaboration, sharing, and
reusing of features
2. Faster time to market for new models through increased
productivity of ML Engineers - Decoupled storage
implementation and feature-serving API
● Personalization & Recommendation
Engines
● Dynamic Pricing Optimization
● Supply Chain Optimization
● Logistics and Transportation
Optimization
Feature Store: Canonical Use Cases
● Fraud Detection
● Predictive Maintenance
● Demand Forecasting
* All the use cases where ML models need a
stateful, ever-changing representation of the
system
● Online Feature Store
Online applications look up a feature
vector that is sent to an ML model for
predictions
● ML specific Metadata
Enables feature discoverability and
reuse
Feature Store: Concepts
● ML Specific API and SDK
High level operations for fetching training
feature sets and online access
● Materialized Versioned Datasets
Maintains versions of feature sets used to
train ML models
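The four concepts above can be sketched in a few lines of Python. This is an illustrative toy, not the API of Feast, Hopsworks, or any other product: the class ToyFeatureStore, its method names, and the integer timestamps are all assumptions. It shows a metadata registry for discovery, an online lookup of the latest value, and a point-in-time lookup that returns what was known at training time.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ToyFeatureStore:
    def __init__(self) -> None:
        # ML-specific metadata: feature name -> description
        self.registry: Dict[str, str] = {}
        # (entity_id, feature) -> list of (timestamp, value), kept sorted
        self._history: Dict[Tuple[str, str], List[Tuple[int, float]]] = defaultdict(list)

    def register(self, feature: str, description: str) -> None:
        self.registry[feature] = description

    def search(self, keyword: str) -> List[str]:
        """Discovery: match the keyword against names and descriptions."""
        kw = keyword.lower()
        return sorted(name for name, desc in self.registry.items()
                      if kw in name.lower() or kw in desc.lower())

    def ingest(self, entity_id: str, feature: str, timestamp: int, value: float) -> None:
        if feature not in self.registry:
            raise KeyError(f"unregistered feature: {feature}")
        rows = self._history[(entity_id, feature)]
        rows.append((timestamp, value))
        rows.sort()

    def get_historical(self, entity_id: str, features: List[str],
                       as_of: int) -> Dict[str, Optional[float]]:
        """Training access: latest value at or before `as_of` (no future leakage)."""
        result: Dict[str, Optional[float]] = {}
        for feature in features:
            rows = self._history.get((entity_id, feature), [])
            idx = bisect_right([ts for ts, _ in rows], as_of)
            result[feature] = rows[idx - 1][1] if idx else None
        return result

    def get_online(self, entity_id: str, features: List[str]) -> Dict[str, Optional[float]]:
        """Serving access: the most recent value of each feature."""
        result: Dict[str, Optional[float]] = {}
        for feature in features:
            rows = self._history.get((entity_id, feature), [])
            result[feature] = rows[-1][1] if rows else None
        return result


store = ToyFeatureStore()
store.register("orders_7d", "Number of orders in the last 7 days")
store.ingest("u1", "orders_7d", timestamp=100, value=2.0)
store.ingest("u1", "orders_7d", timestamp=200, value=5.0)

print(store.search("orders"))                              # discovery
print(store.get_online("u1", ["orders_7d"]))               # serving
print(store.get_historical("u1", ["orders_7d"], as_of=150))  # training
```

The same ingested history answers both the serving question (what is the value now?) and the training question (what was the value when the label was observed?), which is the consistency between training and serving that a feature store is meant to provide.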
[Diagram components: Raw Data, Feature Engineering, Feature Store, Training, Serving, Discovery]
Platform                     | License          | Supported Platforms
Feast (now backed by Tecton) | Apache V2        | AWS (in roadmap), GCP
Uber Michelangelo            | In-house product | N/A
Hopsworks                    | AGPL-V3          | AWS, GCP, On-Premises
Tecton                       | Enterprise       | AWS, GCP & Azure (2021)
Airbnb Zipline               | In-house product | N/A
Comcast                      | In-house product | N/A
Netflix Metaflow             | In-house product | N/A
Twitter                      | In-house product | N/A
Facebook FBLearner           | In-house product | N/A
Pinterest Galaxy             | In-house product | N/A
Feature Store: Market
Pros:
● Battle-tested with GoJek, Farfetch,
Postmates, and Zulily
● Integrated with Kubeflow
● Good community
Cons (to be addressed in the roadmap):
● GCP only
● Infrastructure-heavy
● Lacks composability
● No Data Versioning
* Now backed by Tecton
* https://blog.feast.dev/post/a-state-of-feast
Feast
[Diagram components: Ingestion API, Ingestion, Offline Store (BigQuery), Online Store (Redis), Historical Serving, Online Serving, Feature Registry, Training, Discovery, Serving]
Pros:
● Integrates with most Python libs for
ingestion and training
● Supports offline store with time travel
● AWS / GCP / Azure / On-Prem Ready
Cons:
● Hard to use outside of the HopsML
infrastructure
● Online store might not fit all latency
requirements
* Online serving is part of the Enterprise version
Hopsworks
[Diagram components: Ingestion API, Spark, Pandas, Offline Store (Hudi/Hive), Online Store (MySQL), Historical Serving, Online Serving, Feature Registry, Training, Discovery, Serving]
[Diagram components: Raw Data, Event Data, Stream Processing, Batch Processing, Hot Storage, Cold Storage, BI Tools, API, Workflow Automation, Data Catalog, Data Quality, Data Security]
Modern Data Infrastructure
Feature Store
[Diagram components: Raw Data, Event Data, Stream Feature Processing, Batch Feature Processing, Hot Storage, Cold Storage, Training, Serving, Workflow Automation, Data Catalog, Data Quality, Data Security]
Data Infrastructure
1. Start with designing a consistent ACID Data Lake before investing
into Feature Store
2. Value from existing open source products does not justify
investments into integration and the dependencies they bring
3. Feature Store must not bring about new infrastructure and
data storage solutions. It has to be a lightweight API and SDK
integrated into your existing data infrastructure.
4. Data Catalog, Data Governance, and Data Quality components
are horizontal for the whole Data Infrastructure, including
Feature Store
5. There are no mature open source or cloud solutions for Global
Data Catalog and Data Quality monitoring.
Lessons Learned
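Lesson 3 can be made concrete with a small sketch: the Feature Store is a thin API whose storage is whatever hot and cold stores the platform already has. Everything named here is hypothetical (KeyValueBackend, InMemoryBackend, FeatureStoreAPI); in practice the two backends would be adapters over existing systems, and the in-memory class only stands in for them so the example runs on its own.

```python
from typing import Dict, List, Optional, Protocol


class KeyValueBackend(Protocol):
    """Minimal contract an existing storage system has to satisfy."""
    def get(self, key: str) -> Dict[str, float]: ...
    def put(self, key: str, values: Dict[str, float]) -> None: ...


class InMemoryBackend:
    """Stand-in for an adapter over an existing hot or cold store."""
    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, float]] = {}

    def get(self, key: str) -> Dict[str, float]:
        return dict(self._data.get(key, {}))

    def put(self, key: str, values: Dict[str, float]) -> None:
        self._data.setdefault(key, {}).update(values)


class FeatureStoreAPI:
    """Thin facade: no storage of its own, only routing to existing stores."""
    def __init__(self, hot: KeyValueBackend, cold: KeyValueBackend) -> None:
        self._hot = hot    # low-latency store used for serving
        self._cold = cold  # bulk store used for training

    def write(self, entity_id: str, values: Dict[str, float]) -> None:
        self._hot.put(entity_id, values)
        self._cold.put(entity_id, values)

    def serve(self, entity_id: str, features: List[str]) -> Dict[str, Optional[float]]:
        row = self._hot.get(entity_id)
        return {name: row.get(name) for name in features}

    def training_rows(self, entity_ids: List[str],
                      features: List[str]) -> List[Dict[str, object]]:
        rows: List[Dict[str, object]] = []
        for entity_id in entity_ids:
            stored = self._cold.get(entity_id)
            row: Dict[str, object] = {"entity_id": entity_id}
            row.update({name: stored.get(name) for name in features})
            rows.append(row)
        return rows


api = FeatureStoreAPI(hot=InMemoryBackend(), cold=InMemoryBackend())
api.write("u1", {"orders_7d": 5.0, "avg_basket": 31.5})
print(api.serve("u1", ["orders_7d"]))
print(api.training_rows(["u1", "u2"], ["orders_7d", "avg_basket"]))
```

Because the facade depends only on the two-method contract, swapping a backend for a managed service later changes one adapter rather than every training pipeline.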
Data Infrastructure with Feature Store
[Diagram components: Raw Data, Event Data, Stream Processing, Batch Processing, Hot Storage, Cold Storage, BI Tools, API, Workflow Automation, Feature Store API, Training, Serving, Data Catalog, Data Quality, Data Security]
Reference Architecture
[Diagram components: Raw Data, Event Data, Stream Processing, Batch Processing, Hot Storage, Cold Storage, BI Tools, API, Workflow Automation, Feature Store API, Training, Serving, Data Catalog, Data Quality, Data Security]
Reference Architecture:
Components
Cold Storage, Hot Storage, Data Catalog, Data Quality, Feature Store API
Great Expectations, Deequ, Glue Metadata, ?, ?, ?
Recommendations for going forward with Feature Store:
1. Make sure your existing Data Infrastructure covers
90% of Feature Store requirements (Streaming
Ingestion, Consistency, Catalog, Versioning)
2. Build a lightweight in-house Feature Store API on top of your
existing storage solutions
3. Collaborate with community and cloud vendors to
maintain compatibility with standards and the
state-of-the-art ecosystem
4. Be ready to migrate to a managed service or an open
source alternative as the market matures
Recommended Strategy
© 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and
Trademark 32
Gandhi Raketla, Senior Solutions Architect
Feature Store on AWS
AWS Feature Storage Capabilities
✔ Reuse - Use the existing feature store pipeline developed by data engineers
to re-compute and cache features in a feature store
✔ Store - Store the metadata of features such as a description, documentation,
and statistical measures of features in the feature store.
✔ Discover - Make the metadata searchable through an API to ML practitioners
✔ Govern - Add a data management layer on top of the feature store for
governance and access control
✔ Consume - Allow ML practitioners to query and consume features using an
API to export the features for training or real-time inference
Components Of Feature Store
Storage
• S3
• DynamoDB
• Redis
• Aurora
Catalog
• Glue Crawler
• Glue ETL
• Glue Catalog
Query/API
• Athena
• Lambda
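A minimal sketch of how two of these components could back the Consume capability: DynamoDB for online lookups and Athena for training exports from S3. The functions take the AWS clients as arguments, and the calls they make (get_item on a DynamoDB client, start_query_execution on an Athena client) follow the boto3 low-level client interface. The table names, the entity_id key, the numeric-only features, and the two Fake classes are assumptions added so the example runs without an AWS account.

```python
from typing import Any, Dict, List


def get_online_features(dynamodb: Any, table: str, entity_id: str,
                        features: List[str]) -> Dict[str, float]:
    """Look up one entity's feature vector in the online store (DynamoDB)."""
    response = dynamodb.get_item(
        TableName=table,
        Key={"entity_id": {"S": entity_id}},  # assumed partition key name
    )
    item = response.get("Item", {})
    # Low-level DynamoDB responses wrap numbers as {"N": "<string>"}.
    return {name: float(item[name]["N"]) for name in features if name in item}


def start_training_export(athena: Any, database: str, table: str,
                          features: List[str], output_location: str) -> str:
    """Start an Athena query over the offline store (S3) and return its id."""
    columns = ", ".join(["entity_id"] + features)  # trusted identifiers only
    response = athena.start_query_execution(
        QueryString=f"SELECT {columns} FROM {table}",
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


class FakeDynamoDB:
    """Offline stand-in that mimics the get_item response shape."""
    def get_item(self, TableName: str, Key: Dict[str, Any]) -> Dict[str, Any]:
        if Key["entity_id"]["S"] != "u1":
            return {}
        return {"Item": {"entity_id": {"S": "u1"}, "orders_7d": {"N": "5"}}}


class FakeAthena:
    """Offline stand-in that records the submitted SQL."""
    def __init__(self) -> None:
        self.queries: List[str] = []

    def start_query_execution(self, QueryString: str,
                              QueryExecutionContext: Dict[str, str],
                              ResultConfiguration: Dict[str, str]) -> Dict[str, str]:
        self.queries.append(QueryString)
        return {"QueryExecutionId": "fake-query-1"}


vector = get_online_features(FakeDynamoDB(), "features_online", "u1", ["orders_7d"])
athena = FakeAthena()
query_id = start_training_export(athena, "ml", "features_offline",
                                 ["orders_7d"], "s3://example-bucket/athena/")
print(vector, query_id, athena.queries)
```

Against real AWS, the fakes would be replaced by boto3.client("dynamodb") and boto3.client("athena"). Athena queries run asynchronously, so a caller would still have to poll for completion before reading the results from the output location.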
Storage
Performance
at scale
Consistent, single-digit
millisecond response times
at any scale; build
applications with virtually
unlimited throughput
Serverless architecture
No hardware provisioning,
software patching, or upgrades;
scales up or down
automatically; continuously
backs up your data
Global replication
You can build global
applications with fast access
to local data by easily
replicating tables across
multiple AWS Regions
Enterprise
security
Encrypts all data by
default and fully integrates
with AWS Identity and
Access Management for
robust security
Amazon DynamoDB
Fast and flexible key-value database service for any scale
Read scaling with replicas;
write and memory scaling with
sharding; nondisruptive scaling
Unlimited scale
AWS manages all hardware
and software setup,
configuration, and monitoring
Fully managed
In-memory data store
and cache for sub-millisecond
response times
Consistent high performance
Amazon ElastiCache
Managed, Redis- or Memcached-compatible in-memory data store
Performance
& scalability
5x throughput of standard
MySQL and 3x of standard
PostgreSQL; scale out up
to 15 read replicas
Availability
& durability
Fault-tolerant, self-healing
storage; 6 copies of data across 3
AZs; continuous backup to
Amazon S3
Highly
secure
Network
isolation,
encryption at
rest / in transit
Fully
managed
Managed by Amazon RDS:
On your part, no server provisioning,
software patching, setup,
configuration, or backups
Amazon Aurora
MySQL and PostgreSQL-compatible relational database built for the cloud
Catalog
AWS Glue: Components
Data Catalog
▪ Hive Metastore compatible with enhanced functionality
▪ Crawlers automatically extract metadata and create tables
▪ Integrated with Amazon Athena, Amazon Redshift Spectrum
Job Execution
▪ Run jobs on a serverless Spark platform
▪ Provides flexible scheduling
▪ Handles dependency resolution, monitoring and alerting
Job Authoring
▪ Auto-generates ETL code
▪ Built on open frameworks – Python and Spark
▪ Developer-centric – editing, debugging, sharing
Query
Amazon Athena
Pay per query
Pay only for queries run
Save 30–90% on per-query costs
through compression
Use S3 storage
ANSI SQL
JDBC/ODBC drivers
Multiple formats, compression
types, and complex joins and
data types
SQL
Serverless: zero infrastructure,
zero administration
Integrated with QuickSight
Easy
Query instantly
Zero setup cost
Point to S3 and start querying
Serverless, interactive query service
Analytics
Questions, details?
We would be happy to answer!
125 University Avenue
Suite 290, Palo Alto
California, 94301
provectus.com

Más contenido relacionado

La actualidad más candente

TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on AzureTrivadis
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageDATAVERSITY
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data ArchitectureGuido Schmutz
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflowDatabricks
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overviewJames Serra
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptxAlex Ivy
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPconfluent
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake OverviewJames Serra
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsDatabricks
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Cathrine Wilhelmsen
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleDatabricks
 
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020Timothy McAliley
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyNeo4j
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseDatabricks
 

La actualidad más candente (20)

Screw DevOps, Let's Talk DataOps
Screw DevOps, Let's Talk DataOpsScrew DevOps, Let's Talk DataOps
Screw DevOps, Let's Talk DataOps
 
TechEvent Databricks on Azure
TechEvent Databricks on AzureTechEvent Databricks on Azure
TechEvent Databricks on Azure
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
MLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive AdvantageMLOps – Applying DevOps to Competitive Advantage
MLOps – Applying DevOps to Competitive Advantage
 
Big Data Architecture
Big Data ArchitectureBig Data Architecture
Big Data Architecture
 
Databricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With DataDatabricks: A Tool That Empowers You To Do More With Data
Databricks: A Tool That Empowers You To Do More With Data
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
Azure data platform overview
Azure data platform overviewAzure data platform overview
Azure data platform overview
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Ml ops on AWS
Ml ops on AWSMl ops on AWS
Ml ops on AWS
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCPBridge to Cloud: Using Apache Kafka to Migrate to GCP
Bridge to Cloud: Using Apache Kafka to Migrate to GCP
 
Data Lake Overview
Data Lake OverviewData Lake Overview
Data Lake Overview
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
Getting Started with Databricks SQL Analytics
Getting Started with Databricks SQL AnalyticsGetting Started with Databricks SQL Analytics
Getting Started with Databricks SQL Analytics
 
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
Pipelines and Data Flows: Introduction to Data Integration in Azure Synapse A...
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020NOVA SQL User Group - Azure Synapse Analytics Overview -  May 2020
NOVA SQL User Group - Azure Synapse Analytics Overview - May 2020
 
Your Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph StrategyYour Roadmap for An Enterprise Graph Strategy
Your Roadmap for An Enterprise Graph Strategy
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 

Similar a Feature Store as a Data Foundation for Machine Learning

Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Dataconomy Media
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudDataWorks Summit
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantagePrecisely
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise AnalyticsDATAVERSITY
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceWilfried Hoge
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiFelicia Haggarty
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureDATAVERSITY
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduCloudera, Inc.
 
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...PwC
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewDataWorks Summit/Hadoop Summit
 
Infra Migration Proposal Draft from Oracle to Snowflake
Infra Migration Proposal Draft from Oracle to SnowflakeInfra Migration Proposal Draft from Oracle to Snowflake
Infra Migration Proposal Draft from Oracle to SnowflakeShruti Chaurasia
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureDmitry Anoshin
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseRizaldy Ignacio
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderDataconomy Media
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceKaran Sachdeva
 
Querona Presentation 2018
Querona Presentation 2018Querona Presentation 2018
Querona Presentation 2018Synergo!
 
Ashish_Maheshwari_Data_Analyst
Ashish_Maheshwari_Data_AnalystAshish_Maheshwari_Data_Analyst
Ashish_Maheshwari_Data_AnalystAshish Maheshwari
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketDremio Corporation
 

Similar a Feature Store as a Data Foundation for Machine Learning (20)

Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
Sudhir Rawat, Sr Techonology Evangelist at Microsoft SQL Business Intelligenc...
 
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the CloudBring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
Bring Your SAP and Enterprise Data to Hadoop, Kafka, and the Cloud
 
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data ArchitectureADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
ADV Slides: When and How Data Lakes Fit into a Modern Data Architecture
 
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive AdvantageFueling AI & Machine Learning: Legacy Data as a Competitive Advantage
Fueling AI & Machine Learning: Legacy Data as a Competitive Advantage
 
2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics2022 Trends in Enterprise Analytics
2022 Trends in Enterprise Analytics
 
InfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experienceInfoSphere BigInsights - Analytics power for Hadoop - field experience
InfoSphere BigInsights - Analytics power for Hadoop - field experience
 
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin MotgiWhither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
Whither the Hadoop Developer Experience, June Hadoop Meetup, Nitin Motgi
 
Amit_Kumar_CV
Amit_Kumar_CVAmit_Kumar_CV
Amit_Kumar_CV
 
When and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data ArchitectureWhen and How Data Lakes Fit into a Modern Data Architecture
When and How Data Lakes Fit into a Modern Data Architecture
 
Simplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache KuduSimplifying Real-Time Architectures for IoT with Apache Kudu
Simplifying Real-Time Architectures for IoT with Apache Kudu
 
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
Apache Hadoop Summit 2016: The Future of Apache Hadoop an Enterprise Architec...
 
The Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture ViewThe Future of Apache Hadoop an Enterprise Architecture View
The Future of Apache Hadoop an Enterprise Architecture View
 
Infra Migration Proposal Draft from Oracle to Snowflake
Infra Migration Proposal Draft from Oracle to SnowflakeInfra Migration Proposal Draft from Oracle to Snowflake
Infra Migration Proposal Draft from Oracle to Snowflake
 
Building Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft AzureBuilding Modern Data Platform with Microsoft Azure
Building Modern Data Platform with Microsoft Azure
 
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid WarehouseUsing the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
Using the Power of Big SQL 3.0 to Build a Big Data-Ready Hybrid Warehouse
 
Embedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern StaenderEmbedded-ml(ai)applications - Bjoern Staender
Embedded-ml(ai)applications - Bjoern Staender
 
ICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data ScienceICP for Data- Enterprise platform for AI, ML and Data Science
ICP for Data- Enterprise platform for AI, ML and Data Science
 
Querona Presentation 2018
Querona Presentation 2018Querona Presentation 2018
Querona Presentation 2018
 
Ashish_Maheshwari_Data_Analyst
Ashish_Maheshwari_Data_AnalystAshish_Maheshwari_Data_Analyst
Ashish_Maheshwari_Data_Analyst
 
Options for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current MarketOptions for Data Prep - A Survey of the Current Market
Options for Data Prep - A Survey of the Current Market
 


Feature Store as a Data Foundation for Machine Learning

  • 1. Feature Store as a Data Foundation for ML. Presented by: Stepan Pushkarev, CTO @ Provectus; Gandhi Raketla, Senior Solutions Architect @ AWS
  • 2. Agenda: 1. Introductions 2. Modern Data Lakes and Modern ML Infrastructure 3. Emerging Architectural Shifts 4. Feature Store: 200 LOD overview and reference architecture on AWS 5. AWS Perspective on Feature Store
  • 3. Introductions: Stepan Pushkarev, Chief Technology Officer, Provectus; Gandhi Raketla, Senior Solutions Architect, AWS; German Osin, Senior Solutions Architect, Provectus
  • 4. AI-First Consultancy & Solutions Provider. We are obsessed with leveraging cloud, data, and AI to reimagine the way businesses operate, compete, and deliver customer value. Clients ranging from fast-growing startups to large enterprises; 450 employees and growing; established in 2010; HQ in Palo Alto; offices across the US, Canada, and Europe
  • 5. Our Clients. Innovative Tech Vendors: seeking niche expertise to differentiate and win the market. Midsize to Large Enterprises: seeking to accelerate innovation and achieve operational excellence
  • 7. Modern Data Lakes You Know
  • 8. Common Challenges: Data Access and Discoverability 1. Data is scattered across multiple data sources and technologies 2. Managing AWS IAM roles, Amazon S3 policies, API Gateways, and database permissions is a tedious process 3. It gets even more complicated in an AWS multi-account setup 4. Metadata is not discoverable 5. As a result, all the investments in Data and ML are killed by data access issues
  • 9. Common Challenges: Monolithic Data Teams 1. Lack of ownership and domain context: a disconnect between data producers and data consumers 2. Backlogged data team struggling to keep pace with business demands 3. No contracts between Data and ML Engineering 4. As a result, fast end-to-end experimentation is killed by complex dependencies between teams https://martinfowler.com/articles/data-monolith-to-mesh.html
  • 10. Common Challenges: ML Experimentation Infrastructure 1. Inherited issues with Data Discovery and Data Access 2. Reproducibility of datasets, ML pipelines, ML environments, and offline experiments is still an issue 3. Production experimentation frameworks are still fairly immature 4. As a result, an end-to-end experiment from data to production ML metric takes 3-6 months https://hbr.org/2020/03/building-a-culture-of-experimentation
  • 11. Common Challenges: Scaling ML Adoption in Production 1. Online serving. There is no unified and consistent way to access features during model serving. 2. Impossible to reuse features between multiple training pipelines and ML applications. 3. Monitoring and maintenance of ML Applications. 4. As a result, time and cost to scale from 1 to 100 models in production is growing exponentially. What is your cost per ML Model in Production?
  • 13. Emerging Architectural Shifts
      ● Data Lake -> Hudi/Delta Lakes: Hudi/Delta Lakes bring managed ingestion, ACID transactions, and point-in-time queries into traditional Data Lakes
      ● Data Lake -> Data Mesh: Ownership of data domains, data pipelines, metadata, and API is shifting from centralized teams to product teams
      ● Data Lake -> Data Infrastructure as a platform: Unified reusable platform components and frameworks across the enterprise
      ● Endpoint Protection -> Global Data Governance: Data security and privacy measures are becoming centralized as part of the Data Platform
      ● Metadata Store -> Global Data Catalog: User experience around data discovery, lineage, and versioning requires investments into a metadata-rich Data Catalog
      ● Feature Store: Scaling ML Experimentation and Operations requires a separate data management layer for ML Features
      ● ML Toolkit -> Complete ML Infrastructure: ML capabilities are democratized for ML Engineers and citizen Data Scientists
  • 14. ACID Data Lakes (Delta/Hudi Lakes) ● Managed ingestion ● Dataset versioning for ML training ● Cheap “deletes” (common GDPR use case) ● Audit log of all changes to datasets ● Brings ACID transactions to your data lake ● “Upsert” strategy on data ingestion ● Enables schemas to enforce data quality
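The upsert, versioning, delete, and audit-log behaviour listed on slide 14 can be illustrated with a small in-memory sketch. This is not how Delta Lake or Hudi are implemented; `VersionedTable` and its methods are hypothetical names invented for the illustration, and every commit simply stores a full snapshot.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class VersionedTable:
    """Toy table (hypothetical): each commit appends a full snapshot."""

    snapshots: List[Dict[str, dict]] = field(default_factory=lambda: [{}])
    audit_log: List[str] = field(default_factory=list)

    def _commit(self, rows: Dict[str, dict], message: str) -> int:
        # A commit is atomic here because the new snapshot is built
        # completely before it becomes visible as the latest version.
        self.snapshots.append(rows)
        version = len(self.snapshots) - 1
        self.audit_log.append(f"v{version}: {message}")
        return version

    def upsert(self, records: List[dict], key: str) -> int:
        rows = dict(self.snapshots[-1])  # copy-on-write of the latest snapshot
        for rec in records:
            # Update the row if the key exists, insert it otherwise.
            rows[rec[key]] = {**rows.get(rec[key], {}), **rec}
        return self._commit(rows, f"upsert {len(records)} record(s)")

    def delete(self, key_value: str) -> int:
        rows = {k: v for k, v in self.snapshots[-1].items() if k != key_value}
        return self._commit(rows, f"delete {key_value}")

    def read(self, version: Optional[int] = None) -> Dict[str, dict]:
        # version=None reads the latest snapshot; a number reads that version.
        return self.snapshots[-1 if version is None else version]


table = VersionedTable()
v1 = table.upsert([{"user_id": "u1", "clicks": 3}], key="user_id")
v2 = table.upsert(
    [{"user_id": "u1", "clicks": 5}, {"user_id": "u2", "clicks": 1}],
    key="user_id",
)
v3 = table.delete("u2")
```

Reading `table.read(v1)` returns the dataset as it was when a model was trained on version 1, which is the property dataset versioning is meant to provide. Note that this toy keeps old snapshots after a delete, so it does not model physical erasure of the deleted record.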
  • 15. Global Data Governance. Accelerate privacy operations with data you already have. Automate business processes, data mapping, and PI discovery and classification for privacy workflows. Operationalize policies in a central location. Govern privacy policies to ensure policies are effectively managed across the enterprise. Define and document workflows, traceability views, and business process registers. Scale compliance across multiple regulations. Use a platform designed and built with privacy in mind that is easily extensible to support new regulations. AWS Config; AWS Lake Formation
  • 16. Global Data Catalog Meta-metadata store: ● Does this data exist? Where is it? ● What is the source of truth of this data? ● Do I have access? ● Who is the owner? ● Who are the users of this data? ● Are there existing assets I can reuse? ● Can I trust this data? * There are no established leaders in open source
  • 17. The Core of MLOps and Reproducible Experimentation Pipelines. Diagram labels: Model Code; ML Pipeline Code; Infrastructure as Code; Versioned Dataset; Production Metrics & Alerts; Model Artifacts; Prediction Service; ML Metrics; Automated Pipeline Execution; Pipeline Metadata; Alerts; Reports; Feature Store; Orchestration: Idempotent Execution; Feedback Loop for Production Data
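One way to read "Idempotent Execution" on slide 17 is that re-running a pipeline step with the same inputs must not repeat the work or change the result. The sketch below shows that idea only; `IdempotentRunner`, `run_key`, and the `train` step are hypothetical names, and a real orchestrator would persist results rather than hold them in memory.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class IdempotentRunner:
    """Hypothetical runner: a step runs once per distinct (name, params)."""

    def __init__(self) -> None:
        self._results: Dict[str, Any] = {}
        self.executions = 0  # counts real executions, not cache hits

    @staticmethod
    def run_key(step_name: str, params: dict) -> str:
        # sort_keys makes the key independent of parameter ordering.
        payload = json.dumps({"step": step_name, "params": params}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, step_name: str, fn: Callable[..., Any], params: dict) -> Any:
        key = self.run_key(step_name, params)
        if key not in self._results:
            self._results[key] = fn(**params)
            self.executions += 1
        return self._results[key]


def train(dataset_version: int, learning_rate: float) -> str:
    # Stand-in for a training step; returns a fake artifact name.
    return f"model(dataset=v{dataset_version}, lr={learning_rate})"


runner = IdempotentRunner()
first = runner.run("train", train, {"dataset_version": 3, "learning_rate": 0.1})
again = runner.run("train", train, {"learning_rate": 0.1, "dataset_version": 3})
```

Keying the run on the dataset version is what ties this to the "Versioned Dataset" box in the diagram: a new dataset version produces a new key and therefore a new execution.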
  • 19. Feature Store Value Proposition: a data management layer for machine learning features. 1. Better ROI from feature engineering through reduction of cost per model: facilitates collaboration, sharing, and reuse of features 2. Faster time to market for new models through increased productivity of ML Engineers: decoupled storage implementation and feature-serving API
  • 20. Feature Store: Canonical Use Cases ● Personalization & Recommendation Engines ● Dynamic Pricing Optimization ● Supply Chain Optimization ● Logistics and Transportation Optimization ● Fraud Detection ● Predictive Maintenance ● Demand Forecasting * All the use cases where ML models need a stateful, ever-changing representation of the system
  • 21. Feature Store: Concepts ● Online Feature Store: online applications look up a feature vector that is sent to an ML model for predictions ● ML-Specific Metadata: enables feature discoverability and reuse ● ML-Specific API and SDK: high-level operations for fetching training feature sets and online access ● Materialized Versioned Datasets: maintains versions of the feature sets used to train ML models. Diagram labels: Raw Data; Feature Engineering; Feature Store; Training; Serving; Discovery
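The concepts on slide 21 (metadata for discovery, online lookup for serving, and historical lookup for training sets) can be sketched in a few lines. This is an in-memory illustration, not the API of Feast, Hopsworks, or any other product; `MiniFeatureStore` and its method names are hypothetical, and timestamps are plain integers.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Tuple


class MiniFeatureStore:
    """Hypothetical toy store keeping the full history of each feature."""

    def __init__(self) -> None:
        self.registry: Dict[str, str] = {}  # feature name -> description
        self._history: Dict[Tuple[str, str], List[Tuple[int, float]]] = {}

    def register(self, name: str, description: str) -> None:
        self.registry[name] = description

    def search(self, text: str) -> List[str]:
        # Discovery: match against feature names and descriptions.
        needle = text.lower()
        return sorted(
            name
            for name, desc in self.registry.items()
            if needle in f"{name} {desc}".lower()
        )

    def ingest(self, entity_id: str, feature: str, ts: int, value: float) -> None:
        if feature not in self.registry:
            raise KeyError(f"unregistered feature: {feature}")
        rows = self._history.setdefault((entity_id, feature), [])
        rows.append((ts, value))
        rows.sort(key=lambda row: row[0])  # keep history ordered by time

    def lookup_online(self, entity_id: str, features: List[str]) -> Dict[str, Optional[float]]:
        # Serving: latest known value of each feature.
        out: Dict[str, Optional[float]] = {}
        for f in features:
            rows = self._history.get((entity_id, f), [])
            out[f] = rows[-1][1] if rows else None
        return out

    def lookup_as_of(self, entity_id: str, features: List[str], as_of: int) -> Dict[str, Optional[float]]:
        # Training: value that was valid at time `as_of` (no future leakage).
        out: Dict[str, Optional[float]] = {}
        for f in features:
            rows = self._history.get((entity_id, f), [])
            idx = bisect_right([ts for ts, _ in rows], as_of)
            out[f] = rows[idx - 1][1] if idx else None
        return out


store = MiniFeatureStore()
store.register("avg_basket_7d", "Average basket value over the last 7 days")
store.ingest("user-1", "avg_basket_7d", ts=100, value=20.0)
store.ingest("user-1", "avg_basket_7d", ts=200, value=35.0)
online = store.lookup_online("user-1", ["avg_basket_7d"])
training = store.lookup_as_of("user-1", ["avg_basket_7d"], as_of=150)
```

Here `online` holds the latest value (35.0) while `training` holds the value that was valid at time 150 (20.0). Serving and training read the same feature definition, which is the consistency a feature store is meant to give.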
  • 22. Feature Store: Market
      Platform | License | Supported Platforms
      Feast (now backed by Tecton) | Apache V2 | AWS (in roadmap), GCP
      Uber Michelangelo | In-house product | N/A
      Hopsworks | AGPL-V3 | AWS, GCP, On-Premises
      Tecton | Enterprise | AWS, GCP & Azure (2021)
      Airbnb Zipline | In-house product | N/A
      Comcast | In-house product | N/A
      Netflix Metaflow | In-house product | N/A
      Twitter | In-house product | N/A
      Facebook FBLearner | In-house product | N/A
      Pinterest Galaxy | In-house product | N/A
  • 23. Feast. Pros: ● Battle-tested with GoJek, Farfetch, Postmates, and Zulily ● Integrated with Kubeflow ● Good community Cons (to be addressed in the roadmap): ● GCP only ● Infrastructure-heavy ● Lacks composability ● No data versioning * Now backed by Tecton * https://blog.feast.dev/post/a-state-of-feast Diagram labels: Ingestion API; Ingestion; Offline Store (BigQuery); Online Store (Redis); Feature Registry; Historical Serving; Online Serving; Training; Discovery; Serving
  • 24. Hopsworks. Pros: ● Integrates with most Python libs for ingestion and training ● Supports offline store with time travel ● AWS / GCP / Azure / On-Prem ready Cons: ● Hard to use outside of the HopsML infrastructure ● Online store might not fit all latency requirements * Online serving is part of the Enterprise version. Diagram labels: Ingestion API; Spark; Pandas; Offline Store (Hudi/Hive); Online Store (MySQL); Feature Registry; Historical Serving; Online Serving; Training; Discovery; Serving
  • 25. Modern Data Infrastructure. Diagram labels: Raw Data; Event Data; Stream Processing; Batch Processing; Hot Storage; Cold Storage; BI Tools; API; Workflow Automation; Data Catalog; Data Quality; Data Security
  • 26. Feature Store Data Infrastructure. Diagram labels: Raw Data; Event Data; Stream Feature Processing; Batch Feature Processing; Hot Storage; Cold Storage; Training; Serving; Workflow Automation; Data Catalog; Data Quality; Data Security
  • 27. Lessons Learned 1. Start by designing a consistent ACID Data Lake before investing in a Feature Store 2. The value of existing open source products does not justify the investment in integration and the dependencies they bring 3. A Feature Store must not introduce new infrastructure and data storage solutions. It has to be a lightweight API and SDK integrated into your existing data infrastructure. 4. Data Catalog, Data Governance, and Data Quality components are horizontal for the whole Data Infrastructure, including the Feature Store 5. There are no mature open source or cloud solutions for Global Data Catalog and Data Quality monitoring.
  • 28. Data Infrastructure with Feature Store. Diagram labels: Raw Data; Event Data; Stream Processing; Batch Processing; Hot Storage; Cold Storage; BI Tools; API; Workflow Automation; Training; Serving; Feature Store API; Data Catalog; Data Quality; Data Security
  • 29. Reference Architecture. Diagram labels: Raw Data; Event Data; Stream Processing; Batch Processing; Hot Storage; Cold Storage; BI Tools; API; Workflow Automation; Training; Serving; Feature Store API; Data Catalog; Data Quality; Data Security
  • 30. Reference Architecture: Components Cold Storage Hot Storage Data Catalog Data Quality Great Expectations DEEQU Feature Store API ?Glue Metadata ? ?
  • 31. Recommended Strategy. Recommendations for going forward with a Feature Store: 1. Make sure your existing Data Infrastructure covers 90% of Feature Store requirements (Streaming Ingestion, Consistency, Catalog, Versioning) 2. Build a lightweight in-house Feature Store API on top of your existing storage solutions 3. Collaborate with the community and cloud vendors to maintain compatibility with standards and the state-of-the-art ecosystem 4. Be ready to migrate to a managed service or an open source alternative as the market matures
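Recommendation 2 on slide 31, a lightweight Feature Store API over storage you already run, could look roughly like the facade below. It is a sketch under stated assumptions: the `HotStore` and `ColdStore` protocols, the in-memory stand-ins, and `FeatureStoreAPI` are all hypothetical, and in practice the two backends would wrap your existing hot and cold storage rather than Python dicts and lists.

```python
from typing import Dict, Iterable, List, Optional, Protocol


class HotStore(Protocol):
    """Low-latency key-value store used for online serving (assumed interface)."""

    def put(self, key: str, row: dict) -> None: ...
    def get(self, key: str) -> Optional[dict]: ...


class ColdStore(Protocol):
    """Append-only store used for training data (assumed interface)."""

    def append(self, row: dict) -> None: ...
    def scan(self) -> Iterable[dict]: ...


class DictHotStore:
    # In-memory stand-in for an existing hot storage system.
    def __init__(self) -> None:
        self._rows: Dict[str, dict] = {}

    def put(self, key: str, row: dict) -> None:
        self._rows[key] = row

    def get(self, key: str) -> Optional[dict]:
        return self._rows.get(key)


class ListColdStore:
    # In-memory stand-in for an existing cold storage system.
    def __init__(self) -> None:
        self._rows: List[dict] = []

    def append(self, row: dict) -> None:
        self._rows.append(row)

    def scan(self) -> Iterable[dict]:
        return list(self._rows)


class FeatureStoreAPI:
    """Thin facade: it owns no storage, it only routes reads and writes."""

    def __init__(self, hot: HotStore, cold: ColdStore, entity_key: str) -> None:
        self.hot, self.cold, self.entity_key = hot, cold, entity_key

    def write(self, row: dict) -> None:
        self.cold.append(dict(row))                        # full history for training
        self.hot.put(str(row[self.entity_key]), dict(row))  # latest row for serving

    def serve(self, entity_id: str) -> Optional[dict]:
        return self.hot.get(str(entity_id))

    def training_rows(self, features: List[str]) -> List[dict]:
        columns = [self.entity_key, *features]
        return [{c: r.get(c) for c in columns} for r in self.cold.scan()]


api = FeatureStoreAPI(DictHotStore(), ListColdStore(), entity_key="user_id")
api.write({"user_id": "u1", "clicks_7d": 4, "spend_7d": 10.0})
api.write({"user_id": "u1", "clicks_7d": 9, "spend_7d": 25.0})
latest = api.serve("u1")
history = api.training_rows(["clicks_7d"])
```

Because the facade depends only on the two small protocols, swapping a backend (or later migrating to a managed service, as recommendation 4 anticipates) should not change the code that trains or serves models.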
  • 32. Feature Store on AWS. Gandhi Raketla, Senior Solutions Architect. © 2020, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Amazon Confidential and Trademark
  • 33. AWS Feature Storage Capabilities ✔ Reuse - Use the existing feature store pipeline developed by data engineers to re-compute and cache features in a feature store ✔ Store - Store the metadata of features such as a description, documentation, and statistical measures of features in the feature store. ✔ Discover - Make the metadata searchable through an API to ML practitioners ✔ Govern - Add a data management layer on top of the feature store for governance and access control ✔ Consume - Allow ML practitioners to query and consume features using an API to export the features for training or real-time inference
  • 34. Components Of Feature Store. Storage: S3, DynamoDB, Redis, Aurora. Catalog: Glue Crawler, Glue ETL, Glue Catalog. Query/API: Athena, Lambda
  • 35. Storage
  • 36. Amazon DynamoDB: Fast and flexible key-value database service for any scale. Performance at scale: Consistent, single-digit millisecond response times at any scale; build applications with virtually unlimited throughput. Serverless architecture: No hardware provisioning, software patching, or upgrades; scales up or down automatically; continuously backs up your data. Global replication: You can build global applications with fast access to local data by easily replicating tables across multiple AWS Regions. Enterprise security: Encrypts all data by default and fully integrates with AWS Identity and Access Management for robust security
  • 37. Amazon ElastiCache: Managed, Redis or Memcached-compatible in-memory data store. Consistent high performance: In-memory data store and cache for sub-millisecond response times. Fully managed: AWS manages all hardware and software setup, configuration, and monitoring. Unlimited scale: Read scaling with replicas; write and memory scaling with sharding; nondisruptive scaling
  • 38. Amazon Aurora: MySQL and PostgreSQL-compatible relational database built for the cloud. Performance & scalability: 5x throughput of standard MySQL and 3x of standard PostgreSQL; scale out up to 15 read replicas. Availability & durability: Fault-tolerant, self-healing storage; 6 copies of data across 3 AZs; continuous backup to Amazon S3. Highly secure: Network isolation, encryption at rest / in transit. Fully managed: Managed by Amazon RDS; no server provisioning, software patching, setup, configuration, or backups on your part
  • 40. AWS Glue: Components. Data Catalog ▪ Hive Metastore compatible with enhanced functionality ▪ Crawlers automatically extract metadata and create tables ▪ Integrated with Amazon Athena, Amazon Redshift Spectrum. Job Execution ▪ Run jobs on a serverless Spark platform ▪ Provides flexible scheduling ▪ Handles dependency resolution, monitoring, and alerting. Job Authoring ▪ Auto-generates ETL code ▪ Built on open frameworks: Python and Spark ▪ Developer-centric: editing, debugging, sharing
  • 41. Query
  • 42. Amazon Athena: Serverless, interactive query service (Analytics). Serverless: zero infrastructure, zero administration; integrated with QuickSight. Easy: query instantly; zero setup cost; point to S3 and start querying. SQL: ANSI SQL; JDBC/ODBC drivers; multiple formats, compression types, and complex joins and data types. Pay per query: pay only for queries run; save 30–90% on per-query costs through compression; use S3 storage
  • 43. Questions, details? We would be happy to answer! 125 University Avenue, Suite 290, Palo Alto, California, 94301. provectus.com