https://tag.bio • spadhi@tag.bio
Join us: Tag.bio community on Slack
Tag.bio: Self Service Data Mesh Platform
Your questions. Your data. Your answers.
NSF Big Data Hub: Data Sharing and Cyberinfrastructure Meeting
Sanjay Padhi
Chief Technologist
Executive Vice President
Abstract:
The global need to derive insights securely (and instantly) has driven the evolution of data architectures from distributed storage to data lakes, data warehouses, and lakehouses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as domain centricity/ownership, data as products, and a self-serve architecture with a federated computational layer. Tag.bio data products combine datasets, smart APIs, and statistical and machine learning algorithms into decentralized data products that let users discover insights following the FAIR principles. Researchers can use its point-and-click (no-code) system to perform analyses instantly and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook-based developer environments with individual workspaces.
Join us for a talk/demo session on the Tag.bio data mesh platform and learn how major pharmaceutical companies and university health systems are using this technology to promote value-based and precision healthcare, find cures for disease, and collaborate without explicitly moving data around. The talk also outlines Tag.bio's secure data exchange features for real-world evidence datasets, privacy-centric data products (confidential computing), and integration with cloud services.
Agenda
● Introduction
● Data as a Product
● Data Products in a Mesh
● Platform for Collaboration
● Platform for Developers and Integrators
● Demo: Analysis Platform and Developer Studio
● Partnerships with Cloud providers and NIH STRIDES
● Q&A
Source: Computing Perspectives: 25th International Conference on Computing in High-Energy and Nuclear Physics, 2021
CERN: Project Approach with Distributed Storage
Distributed data management and storage is expensive in both hardware and operations.
Source: Robert L. Grossman (2020): The Road from Data Commons to Data Ecosystems: Challenges, Opportunities, and Emerging Best Practices
NIH Data Commons: Project approach with Data Lake(s)
Research projects are not cheap; the average award for an NIH grant is about half a million dollars.
Source: Robert L. Grossman (2020): The Road from Data Commons to Data Ecosystems: Challenges, Opportunities, and Emerging Best Practices
Data-lake-based approach with workspaces and Jupyter Notebooks for analysis
Source: Susan Gregurick (2020): STRIDES and NIH-supported biomedical data sharing
As of July 2020
It takes months to years to derive insights.
NIH STRIDES (since 2018): Turning Research Data Into Knowledge and Discovery
[Figure: Life Sciences: data growth during the drug development process]
Data Warehouse(s)
Source: Databricks
Structured data
In use for 40+ years
Coupled compute and storage into a single entity: multiple data warehouses
- Metadata layer: where data is located
- Data model: an abstraction in the data warehouse
- Data lineage: the tale of the origins and transformations of data in the warehouse
- Summarization: algorithmic work designed to create the data
- KPIs: where key performance indicators are found
- ETL: enables application data to be transformed into corporate data
Limitations:
- AI/ML introduces iterative algorithms that need direct data access (not always SQL-based)
- A variety of datasets that are not always structured (text, IoT, objects, binary)
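To make the first limitation concrete, here is a minimal Python sketch using toy data and only the standard library. A warehouse-style aggregate is one declarative SQL query, while even a simple iterative learner (plain gradient descent here, chosen for illustration) has to scan the raw rows many times. The table and function names are hypothetical and are not part of any Databricks or Tag.bio API.

```python
import sqlite3

# Toy "warehouse" table with hypothetical data: y = 2x + 1 for x = 0..9.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (x REAL, y REAL)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [(float(x), 2.0 * x + 1.0) for x in range(10)],
)

# Classic warehouse workload: one declarative, single-pass aggregate query.
avg_y = conn.execute("SELECT AVG(y) FROM samples").fetchone()[0]

# ML-style workload: gradient descent on mean squared error for y ~ w*x + b.
# It needs many passes over the raw rows, which does not map onto one SQL query.
def fit_linear(data, lr=0.01, epochs=2000):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = 2.0 / n * sum((w * x + b - y) * x for x, y in data)
        grad_b = 2.0 / n * sum((w * x + b - y) for x, y in data)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rows = conn.execute("SELECT x, y FROM samples").fetchall()
w, b = fit_linear(rows)
print(f"AVG(y) = {avg_y:.2f}; fitted w = {w:.3f}, b = {b:.3f}")
```

The 2,000 full scans in `fit_linear` are the point: iterative algorithms want direct, repeated access to the rows rather than a single query result.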
Data Lakes and Lakehouses
Source: Databricks
Data Architecture(s)
● Data Warehouse(s) - Direct coupling between compute and storage
● Distributed to Centralized Data Storage and Compute
● Data Lakes
● Data Lakehouse
● Data Products and Mesh
Ways to communicate (share information) via APIs also evolved:
● Salesforce (2000) - added APIs on top of applications
● Facebook (2006) - gave developers access to user information (photos, profiles, events)
● Google (2006) - shared massive geographical data via APIs
● Twilio (2008) - created an API for its entire product line (calls, texts)
Project vs. Product Approach to Data Architecture
Data Product
Data products represent a harmonized, decentralized application layer on top of disparate data sources.
Along with employing a universal "smart" API, they present a simple, clean, standardized data model that apps and data scientists can query to extract data frames.
[Figure: from data to data product, consumed by apps]
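The idea of a harmonized layer with a standardized data model can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Tag.bio's implementation: the source records, the `Patient` model, and the `PatientDataProduct.query` method are hypothetical, and a plain dict of lists stands in for a data frame so the example needs only the standard library.

```python
from dataclasses import dataclass

# Two disparate sources with different field names and units (hypothetical).
CLINIC_A = [{"patient": "A1", "age_years": 54, "sex": "F"}]
CLINIC_B = [{"pid": "B7", "age_months": 720, "gender": "male"}]
SEX_CODES = {"female": "F", "male": "M"}

@dataclass(frozen=True)
class Patient:
    """Standardized data model that the data product exposes to every consumer."""
    patient_id: str
    age: int  # years
    sex: str  # "F" or "M"

class PatientDataProduct:
    """Harmonized layer: one adapter per source maps it onto the shared model."""

    def __init__(self, clinic_a, clinic_b):
        self._records = [Patient(r["patient"], r["age_years"], r["sex"]) for r in clinic_a]
        self._records += [
            Patient(r["pid"], r["age_months"] // 12, SEX_CODES[r["gender"]])
            for r in clinic_b
        ]

    def query(self, predicate=lambda p: True):
        """Return matching records as a column-oriented data frame (dict of lists)."""
        rows = [p for p in self._records if predicate(p)]
        return {
            "patient_id": [p.patient_id for p in rows],
            "age": [p.age for p in rows],
            "sex": [p.sex for p in rows],
        }

frame = PatientDataProduct(CLINIC_A, CLINIC_B).query(lambda p: p.age >= 50)
print(frame)  # {'patient_id': ['A1', 'B7'], 'age': [54, 60], 'sex': ['F', 'M']}
```

Consumers only ever see `Patient` fields; the per-source quirks (months vs. years, `gender` vs. `sex`) stay inside the product.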
Data as a Product - Tag.bio
Tag.bio Data Products: bringing together 3 things and 3 groups
1. Data (data engineers)
2. Algorithms (data scientists)
3. Analysis apps (domain experts)
[Figure: data map, algorithms, and analysis apps exposed through a Smart API]
Components
All data products are built with 4 components:
1. Source data in a schema
2. Runtime business logic that can be performed on source data upon request
3. Smart API to invoke requests and return responses
4. SDKs/clients which enable communication between other systems and the API
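The four components can be mapped onto a toy, in-process Python sketch. Everything in it is hypothetical (the schema, `mean_expression`, the JSON request shape, and the `Client` class); it is not the Tag.bio Smart API or SDK, only an illustration of how the four layers divide responsibilities.

```python
import json

# 1. Source data in a schema (hypothetical toy schema and rows).
SCHEMA = {"sample_id": str, "gene": str, "expression": float}
RAW = [
    {"sample_id": "S1", "gene": "TP53", "expression": 5.2},
    {"sample_id": "S2", "gene": "TP53", "expression": 7.9},
    {"sample_id": "S3", "gene": "EGFR", "expression": 3.1},
    {"sample_id": "S4", "gene": "EGFR", "expression": "n/a"},  # violates the schema
]

def conforms(row):
    return all(isinstance(row.get(k), t) for k, t in SCHEMA.items())

SOURCE = [r for r in RAW if conforms(r)]

# 2. Runtime business logic, performed on the source data upon request.
def mean_expression(gene):
    values = [r["expression"] for r in SOURCE if r["gene"] == gene]
    return {"gene": gene, "n": len(values), "mean": sum(values) / len(values) if values else None}

OPERATIONS = {"mean_expression": mean_expression}

# 3. API: accepts a JSON request, dispatches it, and returns a JSON response.
def handle_request(body):
    request = json.loads(body)
    operation = OPERATIONS.get(request.get("operation"))
    if operation is None:
        return json.dumps({"error": "unknown operation"})
    return json.dumps({"result": operation(**request.get("params", {}))})

# 4. SDK/client: hides the wire format from other systems.
class Client:
    def __init__(self, transport=handle_request):
        self._transport = transport  # a real client would send an HTTP request here

    def call(self, operation, **params):
        reply = json.loads(self._transport(json.dumps({"operation": operation, "params": params})))
        if "error" in reply:
            raise ValueError(reply["error"])
        return reply["result"]

print(Client().call("mean_expression", gene="EGFR"))  # {'gene': 'EGFR', 'n': 1, 'mean': 3.1}
```

Swapping the in-process `transport` for a network call would change nothing for callers of `Client.call`, which is the role the SDK layer plays.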
Domain-driven, harmonized, decentralized application layer
Tag.bio Data Product
Cross-Functional Data Team/Role:
A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps.
B. Data Scientist
C. Researcher
Data sources: siloed data, data warehouses, data lakes, data products
Data types: DNA-Seq, RNA-Seq, proteomics, flow cytometry, clinical trials, machine behavior & maintenance, other and emerging data types
Data formats: CSV, JSON, Spark, XML, SQL
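What "mapping data into the data product" involves can be illustrated generically: source extracts in different formats are parsed and mapped onto one harmonized schema. This is a plain Python sketch, not Tag.bio's mapping mechanism; the extracts, field names, and the C-reactive protein (CRP) measurement are hypothetical.

```python
import csv
import io
import json

# Hypothetical raw extracts from two source systems, in two different formats.
CSV_EXTRACT = "subject,visit_day,crp_mg_l\nP01,0,12.5\nP02,0,3.0\n"
JSON_EXTRACT = '[{"subjectId": "P03", "day": 0, "CRP": {"value": 8.1, "unit": "mg/L"}}]'

def from_csv(text):
    """Map a CSV extract onto the harmonized schema."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "subject_id": row["subject"],
            "day": int(row["visit_day"]),
            "crp_mg_l": float(row["crp_mg_l"]),
        }

def from_json(text):
    """Map a nested JSON extract onto the same schema, checking units on the way."""
    for rec in json.loads(text):
        if rec["CRP"]["unit"] != "mg/L":
            raise ValueError(f"unexpected unit: {rec['CRP']['unit']!r}")
        yield {
            "subject_id": rec["subjectId"],
            "day": int(rec["day"]),
            "crp_mg_l": float(rec["CRP"]["value"]),
        }

harmonized = list(from_csv(CSV_EXTRACT)) + list(from_json(JSON_EXTRACT))
for row in harmonized:
    print(row)
```

Rejecting an unexpected unit, rather than silently converting it, keeps the harmonized rows trustworthy for the data scientists downstream.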
Domain-driven, harmonized, decentralized application layer
Tag.bio Data Product
Cross-Functional Data Team:
A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps.
B. Data Scientist: Integrates (ML) algorithms, with interfaces to R, Python, and ML/AI, as analysis apps that researchers can use.
C. Researcher
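One way to picture a data scientist "integrating an algorithm as an analysis app" is a registry that stores a function together with metadata a no-code UI could render. The decorator, `AnalysisApp`, and `run_app` below are hypothetical; Tag.bio's actual R and Python plugin interface is not shown on the slide, so treat this as a sketch of the pattern only.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Callable

@dataclass
class AnalysisApp:
    name: str
    description: str
    params: dict  # parameter name -> allowed values; a UI could render these as a form
    run: Callable

REGISTRY = {}

def analysis_app(name, description, **params):
    """Hypothetical decorator that publishes a function as a named analysis app."""
    def register(fn):
        REGISTRY[name] = AnalysisApp(name, description, params, fn)
        return fn
    return register

@analysis_app("summary_stats", "Mean and SD of a numeric column", column=["age", "crp_mg_l"])
def summary_stats(rows, column):
    values = [r[column] for r in rows]
    return {"n": len(values), "mean": mean(values), "sd": stdev(values) if len(values) > 1 else 0.0}

def run_app(name, rows, **choices):
    """What a no-code front end might do: validate the user's choices, then call the app."""
    app = REGISTRY[name]
    for key, value in choices.items():
        if value not in app.params.get(key, []):
            raise ValueError(f"{key}={value!r} is not an allowed option")
    return app.run(rows, **choices)

rows = [{"age": 40, "crp_mg_l": 1.0}, {"age": 60, "crp_mg_l": 3.0}]
print(run_app("summary_stats", rows, column="age"))
```

Declaring the allowed options alongside the function is what lets a researcher choose from a menu instead of writing code.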
Domain-driven, harmonized, decentralized application layer
Tag.bio Data Product
Cross-Functional Data Team:
A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps.
B. Data Scientist: Integrates R, Python, and ML/AI algorithms as analysis apps that researchers can use.
C. Researcher: Uses no-code, guided analysis apps to ask & answer their own questions.
Example analysis apps: Single Cell Gene Expression, RMarkdown Gene Signature Report, Elastic Net Cross-Validation
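A guided analysis for the researcher typically starts by selecting a cohort. The sketch below assumes that a point-and-click cohort builder emits simple field/operator/value rules that are combined with logical AND; the patient rows, `criterion`, `build_cohort`, and `response_rate` are all hypothetical and not Tag.bio's cohort builder.

```python
# Hypothetical patient-level rows exposed by a clinical-trial data product.
PATIENTS = [
    {"id": "P01", "arm": "treatment", "age": 62, "biomarker": 1.8, "responded": True},
    {"id": "P02", "arm": "treatment", "age": 45, "biomarker": 0.4, "responded": False},
    {"id": "P03", "arm": "control", "age": 70, "biomarker": 2.1, "responded": False},
    {"id": "P04", "arm": "treatment", "age": 58, "biomarker": 2.5, "responded": True},
]

OPERATORS = {
    "==": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
}

def criterion(field, op, value):
    """One cohort rule, i.e. the kind of filter a point-and-click builder could emit."""
    compare = OPERATORS[op]
    return lambda row: compare(row[field], value)

def build_cohort(rows, *criteria):
    """Keep the rows that satisfy every criterion (logical AND)."""
    return [row for row in rows if all(rule(row) for rule in criteria)]

def response_rate(cohort):
    """A tiny guided 'app' run on the selected cohort."""
    return sum(row["responded"] for row in cohort) / len(cohort) if cohort else None

cohort = build_cohort(
    PATIENTS,
    criterion("arm", "==", "treatment"),
    criterion("biomarker", ">=", 1.0),
)
print([row["id"] for row in cohort], response_rate(cohort))  # ['P01', 'P04'] 1.0
```

Because rules are data (field, operator, value), a UI can build, save, and replay them without the researcher writing code.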
Tag.bio Data Product: Maximize the value of your data with domain-driven data products
[Figure: data in JSON, XML, CSV, SQL, Spark, and key-value formats feeds a Module API; a client calls the Analysis (API) for sequence/graph comparison, stats, clustering, regression, prediction, exploration, data extraction, and machine learning]
How does it work?
- Data Product: a data map plus algorithms (R & Python plugins), exposed through a Smart API. The data product is a source of versioned, immutable, integrated data.
- Developer Studio (coder): notebook integration for data scientists building AI/ML.
- Analysis Platform (domain expert UI): point-and-click analysis apps for domain experts.
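"Versioned, immutable" data can be illustrated with content-addressed snapshots: the version identifier is a hash of the data, and published rows are read-only. This is a minimal sketch assuming SHA-256 over canonical JSON as the versioning scheme; it is not how Tag.bio assigns versions, and `DataProductVersion` and `publish` are hypothetical names.

```python
import hashlib
import json
from dataclasses import dataclass
from types import MappingProxyType

@dataclass(frozen=True)
class DataProductVersion:
    """An immutable snapshot of a data product's integrated data (hypothetical model)."""
    version: str  # derived from the content, so identical data gets an identical version
    rows: tuple   # tuple of read-only row mappings

def publish(rows):
    # Canonical JSON (sorted keys, fixed separators) makes the hash independent of key order.
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    version = hashlib.sha256(canonical).hexdigest()[:12]
    frozen_rows = tuple(MappingProxyType(dict(r)) for r in rows)
    return DataProductVersion(version=version, rows=frozen_rows)

v1 = publish([{"id": "S1", "value": 1.0}])
v2 = publish([{"id": "S1", "value": 1.0}, {"id": "S2", "value": 2.0}])
print(v1.version != v2.version)                                     # True: new data, new version
print(publish([{"value": 1.0, "id": "S1"}]).version == v1.version)  # True: same content, same version
```

An analysis result that records `v1.version` can later be rerun against exactly the same data, which is what makes shared results reproducible.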
Data Mesh
It's a paradigm shift to treat data as a product.
Data mesh encompasses data products that are oriented around domains and owned by cross-functional data teams.
Zhamak Dehghani: Data Mesh: A Paradigm Shift in Data Architecture
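Domain orientation and team ownership imply some way to discover which products exist and who owns them. Below is a minimal in-memory catalog sketch; `ProductDescriptor`, `MeshCatalog`, and the `example.org` endpoints are hypothetical, and real platforms add search, schemas, and access control on top.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProductDescriptor:
    name: str
    domain: str
    owner_team: str
    endpoint: str  # hypothetical address of the product's API

class MeshCatalog:
    """Minimal discovery index: each domain team registers the products it owns."""

    def __init__(self):
        self._products = {}

    def register(self, desc):
        existing = self._products.get(desc.name)
        if existing is not None and existing.owner_team != desc.owner_team:
            raise PermissionError(f"{desc.name} is owned by {existing.owner_team}")
        self._products[desc.name] = desc

    def by_domain(self, domain):
        return sorted(p.name for p in self._products.values() if p.domain == domain)

    def owner_of(self, name):
        return self._products[name].owner_team

catalog = MeshCatalog()
catalog.register(ProductDescriptor("phase3-trial", "clinical-trials", "oncology-data-team", "https://example.org/phase3"))
catalog.register(ProductDescriptor("rna-seq-atlas", "omics", "genomics-team", "https://example.org/rnaseq"))
catalog.register(ProductDescriptor("phase1-trial", "clinical-trials", "oncology-data-team", "https://example.org/phase1"))
print(catalog.by_domain("clinical-trials"))  # ['phase1-trial', 'phase3-trial']
```

Only the owning team can re-register (update) a product, which is a small stand-in for the "owned by cross-functional data teams" principle.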
Pharma: Domain-Driven Workloads
Drug Development Process
Disparate data types slow the drug development process.
[Figure: drug development stages (Basic Research, Preclinical, Clinical Trials, Regulatory Review, RWE & Patient Care) and their data: biomarkers, omics, model organisms, Phase I-IV trials, patient registries, regulatory submissions]
What Happens When You Apply Data Mesh to Pharma?
Drug Development Process
Harmonized, connected data sources accelerate drug development.
[Figure: the same drug development stages and data types (biomarkers, omics, model organisms, Phase I-IV trials, patient registries, regulatory data), now connected]
What Happens When You Turn Data into a Product?
Streamlined data analysis process
[Figure: before, researchers wait months while data engineers and data scientists work across data warehouses, data lakes, and siloed data; after, researchers get answers in minutes from data products in a data mesh via the Analysis Platform]
What Is Tag.bio?
A customizable self-service (end-to-end) data mesh platform
1. Data Product: domain-driven, harmonized & decentralized application layer
2. Data Mesh: distributed data products connected into a data mesh
3. Analysis Environment: data analysis environment for researchers & data scientists
Data products can be deployed on any cloud.
Tag.bio Data Mesh
Data products deployed in an interoperable data mesh
Data mesh enables organizations to:
● Connect data sources without moving data
● Rapidly add new data types
● Connect all data sources to accelerate the drug development cycle
[Figure: multiple data products, each with a data map, algorithms, analysis apps, and a Smart API]
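"Connect data sources without moving data" usually means sending the computation to each data product and combining only aggregates. The sketch below assumes a pooled mean and sample variance as the analysis; `SiteDataProduct` and `federated_mean_var` are hypothetical names, not Tag.bio's federation protocol.

```python
from dataclasses import dataclass, field

@dataclass
class SiteDataProduct:
    """A data product that stays in its owner's environment (hypothetical)."""
    name: str
    values: list = field(repr=False)  # row-level data; never returned to the caller

    def partial_stats(self, threshold):
        """Run the analysis where the data lives and return only aggregates."""
        selected = [v for v in self.values if v >= threshold]
        return {"n": len(selected), "sum": sum(selected), "sum_sq": sum(v * v for v in selected)}

def federated_mean_var(products, threshold):
    """Pool per-product aggregates into an overall mean and sample variance."""
    parts = [p.partial_stats(threshold) for p in products]
    n = sum(p["n"] for p in parts)
    if n < 2:
        raise ValueError("need at least two observations across products")
    total = sum(p["sum"] for p in parts)
    total_sq = sum(p["sum_sq"] for p in parts)
    mean = total / n
    var = (total_sq - n * mean * mean) / (n - 1)
    return n, mean, var

site_a = SiteDataProduct("hospital-a", [2.0, 4.0, 6.0])
site_b = SiteDataProduct("hospital-b", [8.0, 1.0])
print(federated_mean_var([site_a, site_b], threshold=2.0))  # (4, 5.0, 6.666666666666667)
```

The sum-of-squares formula is the simplest to read but can lose precision for large values; a production version would more likely combine per-site (count, mean, M2) triples with Chan et al.'s parallel update.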
Analysis Environment
Data analysis environment to access the data mesh & use data products
Analysis Platform for Researchers: Use data products with no-code analysis apps that speak their language. Collaborate with data scientists on how apps should work.
Developer Studio for Data Scientists: Build data products in a familiar, Jupyter notebook-based setting. Plug them into the Analysis Platform for researchers to use.
How Are Organizations Using Tag.bio's Data Mesh Platform?
Top 5 Pharma: Translational Oncology
Harmonized 10+ clinical trials
Working toward comprehensive harmonization of all past & future trials
Example: Phase III biomarker analysis
Reference: https://www.nature.com/articles/s41591-020-1044-8 (Figure 1b)
https://demo.tag.bio/node/fc-nct02684006-refined/cox_survival_protocol/results?param_reconfig=2784
Example: TCGA Pan-Cancer Atlas - UMAP Expression Clustering
Reference: https://pubmed.ncbi.nlm.nih.gov/29628290/
The Jackson Laboratory: Analyzing 1000s of Flow Cytometry Samples
Enabling users to analyze samples from various immunocompromised mouse strains with xenografts from human donors
Data Products in Action at UCSD
HIPAA- and CMIA-compliant environment (California's Confidentiality of Medical Information Act)
https://medschool.ucsd.edu/research/actri/Informatics/Health-Data-Portal/services/Pages/Virtual-Research-Desktop-VRD.aspx
https://campuslisa.ucsd.edu/_files/2020%20Campus%20LISA_HC_Data_mesh_.pdf
More Examples
Analyzing Phase IV & RWE Data (Top 50 Pharma): Looking at both drug & medication-adherence device clinical trials in relation to schizophrenia
Immunotherapy & Single-Cell Omics (Cell Therapy Biotech): Deploying an array of proprietary & public-domain data products, enabling users to investigate & discover gene expression markers with respect to cell types
How Our Customers Fit into the Drug Development Lifecycle
● Biotechs: cell therapy, transplant
● Large pharma: immunology, oncology, and neurology
● CROs: RWE; omics, IHC, TCR
● Basic research: mouse models and other
● AMCs (UCSD, UCSF): value-based healthcare and patient registries
Next Stage of Data Evolution
1. Harmonize Data: map data from data warehouses, data lakes, siloed data, and flat files into data products.
2. Connect Data Products: FAIR data (findable, accessible, interoperable, reusable).
3. Accelerate Outputs: real-time answers and self-service analysis; validations, publications, submissions. Saved, shareable, reproducible, full QC.
Tag.bio for Collaboration
Collaboration within an organization
The data mesh connects groups (clinical trials, population health, clinical decisions, discovery biology) to collaborative analysis resources (data products 1-5) to form a data-driven culture.
Different types of data products act together as a functional data mesh:
● Annotation: e.g., gene, variant, demographic, and identifying data; proprietary annotation
● Domain-specific analysis: e.g., Pan-Cancer TCGA, patient healthcare
● Usage: full history of all user activity
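How the three product types can cooperate is sketched below: a domain analysis ranks genes, enriches its output from an annotation product, and records the request in a usage product. The scores are made-up inputs, the annotation is a two-entry excerpt, and all function names are hypothetical; this illustrates the composition pattern, not Tag.bio's product interfaces.

```python
from datetime import datetime, timezone

# Annotation data product: reference facts keyed by gene symbol (small excerpt).
ANNOTATION = {
    "TP53": {"chromosome": "17", "role": "tumor suppressor"},
    "EGFR": {"chromosome": "7", "role": "oncogene"},
}

# Usage data product: append-only history of user activity.
USAGE_LOG = []

def log_usage(user, action, **details):
    USAGE_LOG.append({"time": datetime.now(timezone.utc).isoformat(), "user": user, "action": action, **details})

def top_genes(scores, k, user):
    """Domain analysis product: rank genes by score, then enrich with annotation."""
    ranked = sorted(scores, key=lambda r: r["score"], reverse=True)[:k]
    log_usage(user, "top_genes", k=k)
    return [{**r, **ANNOTATION.get(r["gene"], {})} for r in ranked]

# Hypothetical scores; BRCA1 is deliberately absent from the annotation excerpt.
scores = [{"gene": "TP53", "score": 0.7}, {"gene": "EGFR", "score": 0.9}, {"gene": "BRCA1", "score": 0.2}]
for row in top_genes(scores, k=2, user="researcher-1"):
    print(row)
```

Genes without an annotation entry pass through unchanged, so a gap in one product does not break the analysis in another.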
How organizations collaborate via data products
Organization 1 and Organization 2: governed access to selected data products and apps in the data mesh (data products 1-5), across domains such as clinical research, COVID, patient registries, oncology, chronic inflammation, and autoimmunity
Data products (in the cloud account of the organization)
Collaborator (VPC/Private Link access to data products)
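"Governed access to selected data products and apps" can be modeled as an owner-side policy that every collaborator request passes through before any computation runs. This sketch is hypothetical (`Grant`, `AccessPolicy`, `run_for`, and the product/app names are illustrative) and covers only authorization; network isolation such as VPC/Private Link is a separate layer.

```python
from dataclasses import dataclass, field

@dataclass
class Grant:
    products: frozenset
    apps: frozenset  # "*" allows every app on the granted products

@dataclass
class AccessPolicy:
    """Owner-side policy: which partner organization may run which apps on which products."""
    grants: dict = field(default_factory=dict)  # organization -> Grant

    def allow(self, org, products, apps):
        self.grants[org] = Grant(frozenset(products), frozenset(apps))

    def is_allowed(self, org, product, app):
        grant = self.grants.get(org)
        if grant is None:
            return False
        return product in grant.products and ("*" in grant.apps or app in grant.apps)

def run_for(policy, org, product, app):
    """Gate every request from a collaborator before any computation runs."""
    if not policy.is_allowed(org, product, app):
        raise PermissionError(f"{org} may not run {app!r} on {product!r}")
    return f"running {app} on {product} for {org}"  # placeholder for the real analysis

policy = AccessPolicy()
policy.allow("organization-2", ["data-product-3", "data-product-4"], ["survival"])
print(run_for(policy, "organization-2", "data-product-3", "survival"))
```

Granting apps rather than raw tables means a collaborator can analyze the shared products without being able to export them.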
Tag.bio data exchange: Collaboration with the Parkinson's Foundation to provide data products to researchers
Tag.bio for Developers
A two-sided data environment to enable real-time collaboration
● Analysis Platform for Domain Experts: no-code analysis apps that speak your language
● Developer Studio for Data Scientists: a familiar, Jupyter notebook-based environment
Integration with Cloud Services: AI/ML
Integration with AWS Services: AI/ML
FHIR: Integration with Amazon HealthLake
Monitoring and Auto-deployment of Products
Demo
Data Portal (domain expert): http://demo.tag.bio/
Developer Studio (data scientist): https://jupyter-aws-demo.tag.bio/
How to Get Started
Tag.bio Resource Center: Knowledge base, training & tutorials (to build apps and data products)
https://tag.bio/company/contact-us/
AWS Marketplace Offerings
https://aws.amazon.com/marketplace/pp/prodview-dld5ezl4nh6us
How can (NIH-funded) researchers access Tag.bio?
https://cloud.nih.gov/enrollment/ ; email: FPT_NIH_STRIDES@4points.com
How can NIH ICOs access Tag.bio?
https://cloud.nih.gov/enrollment/ ; email: FPT_NIH_STRIDES@4points.com
Data Products · Data Mesh · Self-Service Platform
● Real-time questions to answers
● Connect proprietary and public data
● Fully versioned and reproducible
● Cross-study comparison
● Pull in annotation automatically
● Auto-deployed, tested, and scalable
● UIs for coders and clickers (apps)
● Bring the analysis to the data
● Collaboration between users, groups, and organizations
Tag.bio is a "datamesh in a box"
Thank You! Questions?

More Related Content

What's hot

Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
Databricks
 

What's hot (20)

[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
[DSC Europe 22] Lakehouse architecture with Delta Lake and Databricks - Draga...
 
Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)Building a Logical Data Fabric using Data Virtualization (ASEAN)
Building a Logical Data Fabric using Data Virtualization (ASEAN)
 
Enterprise guide to building a Data Mesh
Enterprise guide to building a Data MeshEnterprise guide to building a Data Mesh
Enterprise guide to building a Data Mesh
 
Neo4j: The path to success with Graph Database and Graph Data Science
Neo4j: The path to success with Graph Database and Graph Data ScienceNeo4j: The path to success with Graph Database and Graph Data Science
Neo4j: The path to success with Graph Database and Graph Data Science
 
Introducing Databricks Delta
Introducing Databricks DeltaIntroducing Databricks Delta
Introducing Databricks Delta
 
Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3Webinar Data Mesh - Part 3
Webinar Data Mesh - Part 3
 
Introducation to metadata
Introducation to metadataIntroducation to metadata
Introducation to metadata
 
Building Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta LakeBuilding Data Quality pipelines with Apache Spark and Delta Lake
Building Data Quality pipelines with Apache Spark and Delta Lake
 
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from RealityBuilding an Enterprise Knowledge Graph @Uber: Lessons from Reality
Building an Enterprise Knowledge Graph @Uber: Lessons from Reality
 
Azure+Databricks+Course+Slide+Deck+V4.pdf
Azure+Databricks+Course+Slide+Deck+V4.pdfAzure+Databricks+Course+Slide+Deck+V4.pdf
Azure+Databricks+Course+Slide+Deck+V4.pdf
 
Considerations for Data Access in the Lakehouse
Considerations for Data Access in the LakehouseConsiderations for Data Access in the Lakehouse
Considerations for Data Access in the Lakehouse
 
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbtSiligong.Data - May 2021 - Transforming your analytics workflow with dbt
Siligong.Data - May 2021 - Transforming your analytics workflow with dbt
 
data-analytics-strategy-ebook.pptx
data-analytics-strategy-ebook.pptxdata-analytics-strategy-ebook.pptx
data-analytics-strategy-ebook.pptx
 
Data as a Product by Wayne Eckerson
Data as a Product by Wayne EckersonData as a Product by Wayne Eckerson
Data as a Product by Wayne Eckerson
 
Building an integrated data strategy
Building an integrated data strategyBuilding an integrated data strategy
Building an integrated data strategy
 
Data Modeling on Azure for Analytics
Data Modeling on Azure for AnalyticsData Modeling on Azure for Analytics
Data Modeling on Azure for Analytics
 
Databricks Delta Lake and Its Benefits
Databricks Delta Lake and Its BenefitsDatabricks Delta Lake and Its Benefits
Databricks Delta Lake and Its Benefits
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Databricks Platform.pptx
Databricks Platform.pptxDatabricks Platform.pptx
Databricks Platform.pptx
 
Peek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and RoadmapPeek into Neo4j Product Strategy and Roadmap
Peek into Neo4j Product Strategy and Roadmap
 

Similar to Tag.bio: Self Service Data Mesh Platform

Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Shirshanka Das
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Yael Garten
 

Similar to Tag.bio: Self Service Data Mesh Platform (20)

Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021 Tag.bio aws public jun 08 2021
Tag.bio aws public jun 08 2021
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
Platform for Big Data Analytics and Visual Analytics: CSIRO use cases. Februa...
 
The NIH Data Commons - BD2K All Hands Meeting 2015
The NIH Data Commons -  BD2K All Hands Meeting 2015The NIH Data Commons -  BD2K All Hands Meeting 2015
The NIH Data Commons - BD2K All Hands Meeting 2015
 
Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2Bonazzi commons bd2 k ahm 2016 v2
Bonazzi commons bd2 k ahm 2016 v2
 
Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19Big Data Driven Solutions to Combat Covid' 19
Big Data Driven Solutions to Combat Covid' 19
 
A Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate DataA Gen3 Perspective of Disparate Data
A Gen3 Perspective of Disparate Data
 
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
Strata 2017 (San Jose): Building a healthy data ecosystem around Kafka and Ha...
 
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
Building a healthy data ecosystem around Kafka and Hadoop: Lessons learned at...
 
BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands BD2K and the Commons : ELIXR All Hands
BD2K and the Commons : ELIXR All Hands
 
Advanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data VirtualizationAdvanced Analytics and Machine Learning with Data Virtualization
Advanced Analytics and Machine Learning with Data Virtualization
 
NIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data CommonsNIH Data Summit - The NIH Data Commons
NIH Data Summit - The NIH Data Commons
 
Ss eb29
Ss eb29Ss eb29
Ss eb29
 
Big Data Analytics
Big Data AnalyticsBig Data Analytics
Big Data Analytics
 
Hughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication RepositoriesHughes RDAP11 Data Publication Repositories
Hughes RDAP11 Data Publication Repositories
 
ODSC and iRODS
ODSC and iRODSODSC and iRODS
ODSC and iRODS
 
Unlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data VirtualizationUnlock Your Data for ML & AI using Data Virtualization
Unlock Your Data for ML & AI using Data Virtualization
 
Managing R&D Data on Parallel Compute Infrastructure
Managing R&D Data on Parallel Compute InfrastructureManaging R&D Data on Parallel Compute Infrastructure
Managing R&D Data on Parallel Compute Infrastructure
 
Qo Introduction V2
Qo Introduction V2Qo Introduction V2
Qo Introduction V2
 
Analytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data PlatformAnalytical Innovation: How to Build the Next Generation Data Platform
Analytical Innovation: How to Build the Next Generation Data Platform
 

Recently uploaded

+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
cnajjemba
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
vexqp
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
chadhar227
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
gajnagarg
 

Recently uploaded (20)

+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt7. Epi of Chronic respiratory diseases.ppt
7. Epi of Chronic respiratory diseases.ppt
 
Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...Sequential and reinforcement learning for demand side management by Margaux B...
Sequential and reinforcement learning for demand side management by Margaux B...
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
 
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
怎样办理圣地亚哥州立大学毕业证(SDSU毕业证书)成绩单学校原版复制
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
 
PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Dubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls DubaiDubai Call Girls Peeing O525547819 Call Girls Dubai
Dubai Call Girls Peeing O525547819 Call Girls Dubai
 
Gartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptxGartner's Data Analytics Maturity Model.pptx
Gartner's Data Analytics Maturity Model.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
Top profile Call Girls In Vadodara [ 7014168258 ] Call Me For Genuine Models ...
 

Tag.bio: Self Service Data Mesh Platform

  • 1. https://tag.bio • spadhi@tag.bio Join us: Tag.bio community on Slack Tag.bio: Self Service Data Mesh Platform Your questions. Your data. Your answers. NSF Big Data Hub: Data Sharing and Cyberinfrastructure Meeting Sanjay Padhi Chief Technologist Executive Vice President
  • 2. Abstract: The global need to securely derive (instant) insights, have motivated data architectures from distributed storage, to data lakes, data warehouses and lake-houses. In this talk we describe Tag.bio, a next generation data mesh platform that embeds vital elements such as domain centricity/ownership, Data as Products, Self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, statistical and machine learning algorithms into decentralized data products for users to discover insights using FAIR Principles. Researchers can use its point and click (no-code) system to instantly perform analysis and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook based developer environments with individual workspaces. Join us for a talk/demo session on Tag.bio data mesh platform and learn how major pharma industries and university health systems are using this technology to promote value based healthcare, precision healthcare, find cures for disease, and promote collaboration (without explicitly moving data around). The talk also outlines Tag.bio secure data exchange features for real world evidence datasets, privacy centric data products (confidential computing) as well as integration with cloud services 2
  • 3. Agenda ● Introduction ● Data as a Product ● Data Products in a Mesh ● Platform for Collaboration ● Platform for Developers and Integrators ● Demo: Analysis Platform and Developer Studio ● Partnerships with Cloud Providers and NIH STRIDES ● Q&A
  • 4. CERN: Project Approach with Distributed Storage. Distributed data management and storage is expensive in both hardware and operations. Source: Computing Perspectives, 25th International Conference on Computing in High-Energy and Nuclear Physics, 2021
  • 5. NIH Data Commons: Project Approach with Data Lake(s). Research projects are not cheap: the average award for an NIH grant is about half a million dollars. Source: Robert L. Grossman (2020): The Road from Data Commons to Data Ecosystems: Challenges, Opportunities, and Emerging Best Practices
  • 6. Data lake based approach with workspaces and Jupyter Notebooks for analysis. Source: Robert L. Grossman (2020): The Road from Data Commons to Data Ecosystems: Challenges, Opportunities, and Emerging Best Practices
  • 7. NIH STRIDES (2018 onward): Turning Research Data Into Knowledge and Discovery. As of July 2020, it takes months to years to derive insights. Source: Susan Gregurick (2020): STRIDES and NIH-supported biomedical data sharing
  • 8. Life Sciences: Data Growth During the Drug Development Process [figure]
  • 9. Data Warehouse(s) (Source: Databricks). Structured data, historically used for 40+ years. Compute and storage are coupled into a single entity, often across multiple data warehouses, each with: a metadata layer (where data is located); a data model (an abstraction in the data warehouse); data lineage (the tale of the origins and transformations of data in the warehouse); summarization (algorithmic work designed to create the data); KPIs (where key performance indicators are found); and ETL (which enables application data to be transformed into corporate data). Limitations: AI/ML introduces iterative algorithms with direct data access (not always SQL-based), and a variety of datasets are not always structured (text, IoT, objects, binary).
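To make the warehouse vocabulary (ETL, data lineage, summarization/KPIs) concrete, here is a minimal Python sketch under the assumption of small in-memory tables. The `Table` and `transform` names and the vitals records are hypothetical and are not taken from Databricks or Tag.bio; the point is only that every transformation produces a new table and appends a lineage entry describing where it came from.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Table:
    name: str
    rows: list
    lineage: list = field(default_factory=list)  # ordered history of transformations


def transform(table: Table, step: str, fn: Callable[[dict], dict]) -> Table:
    """Apply fn to a copy of every row and record the step in the lineage."""
    return Table(
        name=f"{table.name}:{step}",
        rows=[fn(dict(r)) for r in table.rows],  # copies, so the source table is untouched
        lineage=table.lineage + [f"{step} <- {table.name}"],
    )


# Extract: raw application data (hypothetical example records)
raw = Table("vitals_raw", [{"patient": "p1", "weight_g": 72500},
                           {"patient": "p2", "weight_g": 81000}])


# Transform: application units (grams) -> corporate data model (kilograms)
def to_kg(r: dict) -> dict:
    r["weight_kg"] = r.pop("weight_g") / 1000
    return r


clean = transform(raw, "to_kg", to_kg)

# Summarize: a simple KPI computed from the transformed table
mean_weight = sum(r["weight_kg"] for r in clean.rows) / len(clean.rows)
```

Returning a new table instead of mutating in place is what keeps the lineage trustworthy: the raw extract is still available for audit.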
  • 10. Data Lakes and Lakehouses (Source: Databricks)
  • 11. Data Architecture(s) ● Data warehouse(s): direct coupling between compute and storage ● Distributed to centralized data storage and compute ● Data lakes ● Data lakehouse ● Data products and mesh. Ways to communicate (share information) via APIs also evolved: ● Salesforce (2000): added APIs on top of applications ● Facebook (2006): gave developers access to user information (photos, profiles, events) ● Google (2006): shared massive geographical data via APIs ● Twilio (2008): created an API for its entire product line (calls, texts)
  • 12. Project vs. Product Approach towards Data Architecture (diagram: Data Product)
  • 13. Data as a Product - Tag.bio. Data products represent a harmonized, decentralized application layer on top of disparate data sources. In addition to a universal “smart” API, they present a simple, clean, standardized data model that apps and data scientists can query to extract data frames. (Diagram: Data → Data Product → Apps)
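The idea of one standardized data model over disparate sources can be sketched as follows, assuming two hypothetical sites that export patient records in different shapes. The mapper functions, `STANDARD_FIELDS`, and the `DataProduct` class are illustrative names, not Tag.bio's API; a "data frame" is approximated by a column-oriented dict of lists so the sketch needs only the standard library.

```python
# Standardized data model exposed by this hypothetical data product.
STANDARD_FIELDS = ("patient_id", "sex", "age")


def from_site_a(rec):
    """Site A (hypothetical) exports flat, string-typed records."""
    return {"patient_id": f"A-{rec['pid']}",
            "sex": rec["gender"][0].upper(),
            "age": int(rec["age_years"])}


def from_site_b(rec):
    """Site B (hypothetical) exports nested JSON-style records."""
    demo = rec["demographics"]
    return {"patient_id": f"B-{rec['id']}", "sex": demo["sex"], "age": demo["age"]}


class DataProduct:
    def __init__(self):
        self._rows = []

    def ingest(self, records, mapper):
        """Harmonize source records into the standard model before storing them."""
        for rec in records:
            row = mapper(rec)
            if set(row) != set(STANDARD_FIELDS):
                raise ValueError(f"mapper produced non-standard fields: {sorted(row)}")
            self._rows.append(row)

    def query(self, where=lambda r: True, columns=STANDARD_FIELDS):
        """Return matching rows as a column-oriented 'data frame' (dict of lists)."""
        selected = [r for r in self._rows if where(r)]
        return {c: [r[c] for r in selected] for c in columns}


dp = DataProduct()
dp.ingest([{"pid": "7", "gender": "female", "age_years": "54"}], from_site_a)
dp.ingest([{"id": 9, "demographics": {"sex": "M", "age": 61}},
           {"id": 12, "demographics": {"sex": "F", "age": 38}}], from_site_b)
frame = dp.query(where=lambda r: r["age"] > 50)
```

Consumers only ever see `STANDARD_FIELDS`, so adding a third site means writing one more mapper rather than changing every downstream app.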
  • 14. Tag.bio Data Products: Bringing Together 3 Things and 3 Groups. 1. Data (data engineers) 2. Algorithms (data scientists) 3. Analysis apps (domain experts). (Diagram: Data Map, Algorithms and Analysis apps connected through the Smart API)
  • 15. Components. All data products are built with 4 components: 1. Source data in a schema 2. Runtime business logic that can be performed on source data upon request 3. Smart API to invoke requests and return responses 4. SDKs/clients which enable communication between other systems and the API. (Diagram: Data Map, Algorithms, Smart API)
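The four components can be illustrated with a small, self-contained Python sketch. Everything here is hypothetical: the schema, the `mean_expression` operation, the JSON request shape (`operation`/`params`), and the `Client` class are assumptions made for illustration, not Tag.bio's actual Smart API or SDK. The in-process `transport` stands in for what would be a network call in a real deployment.

```python
import json

# 1. Source data in a schema (hypothetical example data)
SCHEMA = {"sample_id": str, "cell_type": str, "expression": float}
DATA = [
    {"sample_id": "s1", "cell_type": "T cell", "expression": 2.0},
    {"sample_id": "s2", "cell_type": "B cell", "expression": 5.0},
    {"sample_id": "s3", "cell_type": "T cell", "expression": 4.0},
]


def validate(rows, schema):
    """Reject rows whose fields do not match the declared types."""
    for r in rows:
        for key, typ in schema.items():
            if not isinstance(r.get(key), typ):
                raise TypeError(f"{key!r} in {r} is not {typ.__name__}")


validate(DATA, SCHEMA)


# 2. Runtime business logic, performed on the source data upon request
def mean_expression(cell_type: str) -> dict:
    values = [r["expression"] for r in DATA if r["cell_type"] == cell_type]
    return {"cell_type": cell_type, "n": len(values),
            "mean": sum(values) / len(values) if values else None}


OPERATIONS = {"mean_expression": mean_expression}


# 3. Smart API: accepts a JSON request, dispatches it, returns a JSON response
def handle_request(body: str) -> str:
    req = json.loads(body)
    op = OPERATIONS.get(req.get("operation"))
    if op is None:
        return json.dumps({"status": "error", "message": "unknown operation"})
    try:
        result = op(**req.get("params", {}))
    except TypeError as exc:  # wrong or missing parameters
        return json.dumps({"status": "error", "message": str(exc)})
    return json.dumps({"status": "ok", "result": result})


# 4. SDK/client: hides the wire format from other systems
class Client:
    def __init__(self, transport=handle_request):
        self.transport = transport  # in-process here; an HTTP call in practice

    def call(self, operation: str, **params):
        request = json.dumps({"operation": operation, "params": params})
        response = json.loads(self.transport(request))
        if response["status"] != "ok":
            raise RuntimeError(response["message"])
        return response["result"]
```

For example, `Client().call("mean_expression", cell_type="T cell")` goes through the same JSON request/response path an external system would use.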
  • 16. Tag.bio Data Product: domain-driven, harmonized, decentralized application layer (Smart API, Data Map, Algorithms, Analysis apps). Cross-functional data team/roles: A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps. B. Data Scientist. C. Researcher. Data sources: siloed data, data warehouses, data lakes, data products. Data types: DNA-Seq, RNA-Seq, proteomics, flow cytometry, clinical trials, machine behavior & maintenance, other data types, emerging data types. Data formats: CSV, JSON, Spark, XML, SQL.
  • 17. Tag.bio Data Product: domain-driven, harmonized, decentralized application layer (Smart API, Data Map, Algorithms, Analysis apps). Cross-functional data team: A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps. B. Data Scientist: Integrates (ML) algorithms, with interfaces to R, Python and ML/AI, as analysis apps that researchers can use. C. Researcher.
  • 18. Tag.bio Data Product: domain-driven, harmonized, decentralized application layer (Smart API, Data Map, Algorithms, Analysis apps). Cross-functional data team: A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps. B. Data Scientist: Integrates R, Python, ML/AI as analysis apps that researchers can use. C. Researcher: Uses no-code, guided analysis apps to ask & answer their own questions. Example apps: Single Cell Gene Expression, Rmarkdown Gene Signature Report, Elastic Net Cross Validation.
  • 19. Maximize the value of your data with domain-driven data products. Tag.bio Data Product: domain-driven, harmonized, decentralized application layer (Smart API, Data Map, Algorithms, Analysis apps). Cross-functional data team: A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps. B. Data Scientist: Integrates R, Python, ML/AI as analysis apps that researchers can use. C. Researcher: Uses no-code, guided analysis apps to ask & answer their own questions.
  • 20. How does it work? Data (JSON, XML, CSV, SQL, Spark, key-value) is mapped into the Data Map; algorithms, including R & Python plugins, provide analysis modules (sequence/graph comparison, stats, clustering, regression, prediction, exploration, data extraction, machine learning); and the Smart API, with a module API client, exposes them. The data product is a source of versioned, immutable, integrated data. It serves the Analysis Platform (domain expert UI), where domain experts use point-and-click analysis apps, and the Developer Studio (coder, AI/ML), where data scientists work with notebook integration.
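One common way to get "versioned, immutable" data is content addressing: the version identifier is a hash of a canonical serialization of the data, and snapshots are frozen once committed. The sketch below assumes rows are flat dicts of scalar values; `Snapshot` and `VersionStore` are hypothetical names, and this is one possible design, not a description of how Tag.bio stores versions.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: str
    rows: tuple  # tuple of (key, value) tuples; immutable as long as values are scalars


def make_snapshot(rows: list) -> Snapshot:
    # Canonical JSON (sorted keys, fixed separators) so equal data -> equal version id
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    version = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    frozen_rows = tuple(tuple(sorted(r.items())) for r in rows)
    return Snapshot(version, frozen_rows)


class VersionStore:
    def __init__(self):
        self._versions = {}  # version id -> Snapshot

    def commit(self, rows: list) -> str:
        snap = make_snapshot(rows)
        self._versions.setdefault(snap.version, snap)  # never overwrite an existing version
        return snap.version

    def get(self, version: str) -> Snapshot:
        return self._versions[version]


store = VersionStore()
v1 = store.commit([{"gene": "TP53", "count": 3}])
v2 = store.commit([{"gene": "TP53", "count": 4}])
```

Because the identifier is derived from content, committing the same data twice (even with keys in a different order) yields the same version, and any analysis can cite the exact version it ran against.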
  • 21. Data Mesh. It’s a paradigm shift to treat data as a product. Data mesh encompasses data products that are oriented around domains & owned by cross-functional data teams. (Zhamak Dehghani: Data Mesh: A Paradigm Shift in Data Architecture)
  • 22. Pharma: Domain-Driven Workloads. Disparate data types slow the drug development process. Drug development stages: Basic Research, Preclinical, Clinical Trials, Regulatory Review, RWE & Patient Care. Data involved: omics, biomarkers, model organisms, Phase I, Phase II and Phase III trials, regulatory submissions, Phase IV, patient registries.
  • 23. What Happens When You Apply Data Mesh to Pharma? Harmonized, connected data sources accelerate drug development. Drug development stages: Basic Research, Preclinical, Clinical Trials, Regulatory Review, RWE & Patient Care, connected across omics, biomarkers, model organisms, Phase I-IV trials, regulatory data and patient registries.
  • 24. What Happens When You Turn Data into a Product? A streamlined data analysis process. Without data products, researchers, data scientists and data engineers work across data warehouses, data lakes and siloed data, and questions take months to answer. With data products connected in a data mesh and an analysis platform, researchers get answers in minutes.
  • 25. What Is Tag.bio? A customizable self-service (end-to-end) data mesh platform: 1. Data Product: domain-driven, harmonized & decentralized application layer 2. Data Mesh: distributed data products connected into a data mesh 3. Analysis Environment: data analysis environment for researchers & data scientists
  • 26. Tag.bio Data Mesh: data products deployed in an interoperable data mesh on any cloud. Each data product combines a Smart API, Data Map, Algorithms and Analysis apps. Data mesh enables organizations to: ● Connect data sources without moving data ● Rapidly add new data types ● Connect all data sources to accelerate the drug development cycle
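"Connect data sources without moving data" can be illustrated with a federated aggregate: each data product computes a partial summary where its data lives, and only the summaries travel. This is a minimal sketch assuming a simple mean over two hypothetical trials; `LocalDataProduct` and `federated_mean` are illustrative names, and real federated systems add authentication, minimum-count thresholds and similar safeguards that are not shown.

```python
from dataclasses import dataclass


@dataclass
class LocalDataProduct:
    """Stands in for a data product running inside its owner's environment."""
    name: str
    _rows: list  # stays local; raw rows are never returned to the caller

    def aggregate(self, column: str, predicate) -> dict:
        values = [r[column] for r in self._rows if predicate(r)]
        return {"source": self.name, "n": len(values), "sum": sum(values)}


def federated_mean(products, column, predicate):
    """Combine per-site (n, sum) summaries into a global mean."""
    partials = [p.aggregate(column, predicate) for p in products]
    n = sum(p["n"] for p in partials)
    total = sum(p["sum"] for p in partials)
    return (total / n if n else None), partials


trial_a = LocalDataProduct("trial_a", [{"age": 50, "arm": "drug"}, {"age": 60, "arm": "placebo"}])
trial_b = LocalDataProduct("trial_b", [{"age": 70, "arm": "drug"}, {"age": 60, "arm": "drug"}])
mean_age, partials = federated_mean([trial_a, trial_b], "age", lambda r: r["arm"] == "drug")
```

Sending `(n, sum)` rather than per-site means is the design choice that matters: averaging two site means would weight a one-patient site the same as a large one.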
  • 27. Analysis Environment: a data analysis environment to access the data mesh & use data products. Analysis Platform for Researchers: use data products with no-code analysis apps that speak their language, and collaborate with data scientists on how apps should work. Developer Studio for Data Scientists: build data products in a familiar, Jupyter notebook-based setting, then plug them into the Analysis Platform for researchers to use.
  • 28. How Are Organizations Using Tag.bio’s Data Mesh Platform?
  • 29. Top 5 Pharma: Translational Oncology. Harmonized 10+ clinical trials; working towards comprehensive harmonization of all past & future trials.
  • 30. Example: Phase III biomarker analysis. Reference: https://www.nature.com/articles/s41591-020-1044-8 (Figure 1b). Demo: https://demo.tag.bio/node/fc-nct02684006-refined/cox_survival_protocol/results?param_reconfig=2784
  • 31. Example: TCGA Pan-Cancer ATLAS - UMAP Expression Clustering Reference: https://pubmed.ncbi.nlm.nih.gov/29628290/
  • 32. Analyzing 1000s of Flow Cytometry Samples: The Jackson Laboratory. Enabling users to analyze samples from various immunocompromised mouse strains with xenografts from human donors.
  • 33. Data Products in Action at UCSD: an environment compliant with HIPAA and California’s Confidentiality of Medical Information Act (CMIA). References: https://medschool.ucsd.edu/research/actri/Informatics/Health-Data-Portal/services/Pages/Virtual-Research-Desktop-VRD.aspx ; https://campuslisa.ucsd.edu/_files/2020%20Campus%20LISA_HC_Data_mesh_.pdf
  • 34. More Examples. Analyzing Phase IV & RWE Data (Top 50 Pharma): looking at both drug & medication-adherence device clinical trials in relation to schizophrenia. Immunotherapy & Single-Cell Omics (Cell Therapy Biotech): deploying an array of proprietary & public-domain data products, enabling users to investigate & discover gene expression markers with respect to cell types.
  • 35. How our customers fit into the drug development lifecycle: biotechs (cell therapy, transplant); large pharma (immunology, oncology and neurology); CROs (RWE); CROs (omics, IHC, TCR); basic research (mouse models and other); AMCs such as UCSD and UCSF (value-based healthcare and patient registries).
  • 36. Next Stage of Data Evolution: 1. Harmonize data: map data from data warehouses, data lakes, siloed data and flat files into data products, producing FAIR data (findable, accessible, interoperable, reusable). 2. Connect data products (each with a Smart API, Data Map, Algorithms and Analysis apps): real-time answers, self-service analysis. 3. Accelerate outputs: validations, publications, submissions; saved, shareable, reproducible, full QC.
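"Saved, shareable, reproducible" results can be modeled by keying each saved analysis on everything needed to rerun it: the data version, the protocol name and its parameters. This sketch assumes JSON-serializable parameters; `ResultStore`, `count_above` and the version labels are hypothetical and do not reflect how Tag.bio persists results.

```python
import hashlib
import json


class ResultStore:
    """Hypothetical store of saved analyses, keyed by their full provenance."""

    def __init__(self):
        self._results = {}

    @staticmethod
    def result_id(data_version, protocol, params):
        key = json.dumps({"data_version": data_version, "protocol": protocol,
                          "params": params}, sort_keys=True)
        return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]

    def run(self, data_version, protocol, params, fn):
        rid = self.result_id(data_version, protocol, params)
        if rid not in self._results:  # compute once, then serve the saved copy
            self._results[rid] = {"data_version": data_version, "protocol": protocol,
                                  "params": dict(params), "output": fn(**params)}
        return rid, self._results[rid]


calls = []  # records how often the analysis actually executes


def count_above(threshold):
    calls.append(threshold)
    return {"n_above": sum(v > threshold for v in [1, 4, 6, 9])}


store = ResultStore()
rid1, res1 = store.run("v1", "count_above", {"threshold": 5}, count_above)
rid2, res2 = store.run("v1", "count_above", {"threshold": 5}, count_above)
rid3, res3 = store.run("v2", "count_above", {"threshold": 5}, count_above)
```

A shared result ID is then enough for a collaborator to see exactly which data version and parameters produced a figure.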
  • 38. Collaboration within an organization: the data mesh (data products 1-5) connects groups such as clinical trials, population health, clinical decisions and discovery biology to collaborative analysis resources to form a data-driven culture.
  • 39. Different types of data products act together as a functional data mesh: Annotation (e.g. gene, variant, demographic, identifying data; proprietary annotation); Domain-specific analysis (Pan-Cancer TCGA, patient healthcare); Usage (full history of all user activity).
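A usage data product that keeps the "full history of all user activity" is essentially an append-only event log. The sketch below adds hash chaining (each event stores the hash of the previous one) so that later edits to history are detectable; that chaining is an assumed design choice for illustration, and `UsageLog` and its fields are hypothetical names.

```python
import hashlib
import json
from datetime import datetime, timezone

GENESIS = "0" * 64  # placeholder "previous hash" for the first event


def _digest(event: dict) -> str:
    return hashlib.sha256(json.dumps(event, sort_keys=True).encode("utf-8")).hexdigest()


class UsageLog:
    def __init__(self):
        self._events = []  # append-only; no update or delete methods

    def record(self, user, action, target, ts=None):
        prev = self._events[-1]["hash"] if self._events else GENESIS
        event = {"user": user, "action": action, "target": target,
                 "ts": ts or datetime.now(timezone.utc).isoformat(), "prev": prev}
        event["hash"] = _digest(event)  # hash covers every field above, including prev
        self._events.append(event)
        return event["hash"]

    def history(self, user=None):
        """Copies of events, optionally filtered by user."""
        return [dict(e) for e in self._events if user is None or e["user"] == user]

    def verify(self) -> bool:
        """Recompute the chain; False if any stored event was altered."""
        prev = GENESIS
        for e in self._events:
            body = {k: v for k, v in e.items() if k != "hash"}
            if e["prev"] != prev or _digest(body) != e["hash"]:
                return False
            prev = e["hash"]
        return True


log = UsageLog()
log.record("alice", "run_app", "cox_survival_protocol", ts="2021-01-01T00:00:00Z")
log.record("bob", "view", "results", ts="2021-01-01T00:05:00Z")
```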
  • 40. How organizations collaborate via data products: data products live in the cloud account of the owning organization and form a data mesh (domains shown: clinical research, COVID, patient registries, oncology, chronic inflammation, autoimmunity); collaborators get VPC/Private Link access. Organization 1 and Organization 2 each have governed access to selected data products and apps.
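"Governed access to selected data products and apps" can be expressed as explicit grants: the owning organization has full access, and each collaborator may run only the apps it was granted on specific data products. The classes and names below (`Grant`, `AccessPolicy`, `survival_analysis`) are hypothetical; network-level controls such as VPC/Private Link are outside the scope of this sketch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    org: str
    data_product: str
    apps: frozenset  # analysis apps the collaborator may run on that product


@dataclass
class AccessPolicy:
    owner: str  # organization whose cloud account hosts the data products
    grants: list = field(default_factory=list)

    def allow(self, org, data_product, apps):
        self.grants.append(Grant(org, data_product, frozenset(apps)))

    def can_run(self, org, data_product, app) -> bool:
        if org == self.owner:
            return True
        # Deny by default: a collaborator needs a matching grant
        return any(g.org == org and g.data_product == data_product and app in g.apps
                   for g in self.grants)


policy = AccessPolicy(owner="org1")
policy.allow("org2", "oncology", ["survival_analysis"])
```

Deny-by-default keeps the policy auditable: every permission a collaborator holds corresponds to one explicit `Grant` entry.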
  • 41. Tag.bio Data Exchange: collaboration with the Parkinson’s Foundation to provide data products to researchers
  • 43. A two-sided data environment to enable real-time collaboration: Analysis Platform for domain experts (no-code analysis apps that speak your language) and Developer Studio for data scientists (a familiar Jupyter notebook-based environment).
  • 44. Integration with Cloud Services: AI/ML
  • 46. Integration with AWS Services: AI/ML
  • 47. FHIR: Integration with Amazon HealthLake
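Amazon HealthLake stores health data as FHIR resources, so mapping it into a data product mostly means flattening FHIR JSON into rows. The sketch below assumes FHIR R4 `Patient` resources delivered inside a `Bundle` (how the bundle is exported or fetched is not shown and no HealthLake API calls are implied); `patients_from_bundle` and the output column names are hypothetical. It uses only the standard `resourceType`, `id`, `gender` and `birthDate` fields, and treats partial birth dates (e.g. `"1985"`) as unknown age.

```python
import json
from datetime import date


def age_on(birth: date, ref: date) -> int:
    """Whole years between birth and ref."""
    return ref.year - birth.year - ((ref.month, ref.day) < (birth.month, birth.day))


def patients_from_bundle(bundle_json: str, ref: date) -> list:
    bundle = json.loads(bundle_json)
    if bundle.get("resourceType") != "Bundle":
        raise ValueError("expected a FHIR Bundle")
    rows = []
    for entry in bundle.get("entry", []):
        res = entry.get("resource", {})
        if res.get("resourceType") != "Patient":
            continue  # this sketch ignores Observations, Conditions, etc.
        birth = res.get("birthDate")
        full_date = birth is not None and len(birth) == 10  # FHIR allows YYYY or YYYY-MM
        rows.append({
            "patient_id": res.get("id"),
            "gender": res.get("gender", "unknown"),
            "age": age_on(date.fromisoformat(birth), ref) if full_date else None,
        })
    return rows


bundle = json.dumps({
    "resourceType": "Bundle",
    "type": "collection",
    "entry": [
        {"resource": {"resourceType": "Patient", "id": "p1", "gender": "female", "birthDate": "1970-06-15"}},
        {"resource": {"resourceType": "Patient", "id": "p2", "birthDate": "1985"}},
        {"resource": {"resourceType": "Observation", "id": "o1", "status": "final"}},
    ],
})
rows = patients_from_bundle(bundle, ref=date(2021, 6, 14))
```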
  • 49. Demo: Data Portal (domain expert): http://demo.tag.bio/ • Developer Studio (data scientist): https://jupyter-aws-demo.tag.bio/
  • 50. How to Get Started?
  • 51. Tag.bio Resource Center: Knowledge base, Training & Tutorials (to build apps and data products) https://tag.bio/company/contact-us/
  • 53. How can (NIH-funded) researchers access Tag.bio? https://cloud.nih.gov/enrollment/ ; email: FPT_NIH_STRIDES@4points.com
  • 54. How can NIH ICOs access Tag.bio? https://cloud.nih.gov/enrollment/ ; email: FPT_NIH_STRIDES@4points.com
  • 55. Tag.bio is a “data mesh in a box”: data products, data mesh and a self-service platform. ● Real-time questions to answers ● Connect proprietary and public data ● Fully versioned and reproducible ● Cross-study comparison ● Pull in annotation automatically ● Auto-deployed, tested and scalable ● UIs for coders and clickers ● Bring the analysis to the data ● Collaboration between users, groups, and organizations ● Apps. Thank You! Questions?