The global need to securely derive (instant) insights has motivated data architectures ranging from distributed storage to data lakes, data warehouses, and lakehouses. In this talk we describe Tag.bio, a next-generation data mesh platform that embeds vital elements such as domain centricity/ownership, data as products, and self-serve architecture, with a federated computational layer. Tag.bio data products combine data sets, smart APIs, and statistical and machine learning algorithms into decentralized data products that let users discover insights following FAIR principles. Researchers can use its point-and-click (no-code) system to instantly perform analysis and share versioned, reproducible results. The platform combines a dynamic cohort builder with analysis protocols and applications (low-code) to drive complex analysis workflows. Applications within data products are fully customizable via R and Python plugins (pro-code), and the platform supports notebook-based developer environments with individual workspaces.
Join us for a talk/demo session on the Tag.bio data mesh platform and learn how major pharma companies and university health systems are using this technology to promote value-based healthcare and precision healthcare, find cures for disease, and promote collaboration (without explicitly moving data around). The talk also outlines Tag.bio secure data exchange features for real-world evidence datasets, privacy-centric data products (confidential computing), and integration with cloud services.
Tag.bio: Self Service Data Mesh Platform
1. https://tag.bio • spadhi@tag.bio
Join us: Tag.bio community on Slack
Your questions. Your data. Your answers.
NSF Big Data Hub: Data Sharing and Cyberinfrastructure Meeting
Sanjay Padhi
Chief Technologist
Executive Vice President
3. Agenda
● Introduction
● Data as a Product
● Data Products in a Mesh
● Platform for Collaboration
● Platform for Developers and Integrators
● Demo: Analysis Platform and Developer Studio
● Partnerships with Cloud providers and NIH STRIDES
● Q&A
4. Source: Computing Perspectives: 25th International Conference on Computing in High-Energy and Nuclear Physics, 2021
CERN: Project Approach with Distributed Storage
Distributed data management and storage is expensive – hardware and operations
5. Source: Robert L. Grossman (2020): The Road from Data Commons to Data Ecosystems: Challenges, Opportunities, and Emerging Best Practices
NIH Data Commons: Project approach with Data Lake(s)
Research projects ain’t cheap; the average award for an NIH grant is about half a million dollars.
6. Source: Robert L. Grossman (2020): The Road from Data Commons to Data Ecosystems: Challenges, Opportunities, and Emerging Best Practices
Data Lake based approach with workspaces and Jupyter Notebooks for analysis
7. Source: Susan Gregurick (2020): STRIDES and NIH-supported biomedical data sharing
As of July 2020
It takes months-to-years to derive insights
NIH STRIDES (2018 - ): Turning Research Data Into Knowledge and Discovery
9. Data Warehouse(s)
Source: Databricks
Structured Data
Historical - used 40+ years
Coupled Compute and Storage into a single entity: Multiple Data Warehouses
- Metadata layer (where data is located)
- A data model – an abstraction in the data warehouse
- Data lineage – the tale of the origins and transformations of data in the warehouse
- Summarization – algorithmic work designed to create the data
- KPIs – where are key performance indicators found
- ETL – enabled application data to be transformed into corporate data
Limitations:
- AI/ML introduce iterative algorithms with direct data access (not always SQL based)
- variety of datasets that are not always structured (text, IoT, Objects, Binary)
11. Data Architecture(s)
● Data Warehouse(s) - Direct coupling between compute and storage
● Distributed to Centralized Data Storage and Compute
● Data Lakes
● Data Lakehouse
● Data Products and Mesh
Ways to communicate (information sharing) via APIs also evolved:
● Salesforce (2000) - added APIs on top of applications
● Facebook (2006) - gave developers access to user information (photos, profiles, events)
● Google (2006) - share massive geographical data via APIs
● Twilio (2008) - Created an API for their entire product line (Calls, Texts)
13. Data as a Product - Tag.bio
Data products represent a harmonized, decentralized application layer on top of disparate data sources.
Along with employing a universal “smart” API, they also present a simple, clean, standardized data model for apps and for data scientists who run queries and extract data frames.
[Figure: Data to Data Product, serving Apps]
14. Tag.bio Data Products
Bringing together 3 things and 3 groups:
1. Data (data engineers)
2. Algorithms (data scientists)
3. Analysis apps (domain experts)
[Figure: data product layers (Data, Map, Algorithms, Smart API, Analysis apps) numbered 1 to 3]
15. Components
All data products are built with 4 components:
1. Source data in a schema
2. Runtime business logic that can be performed on source data upon request
3. Smart API to invoke requests and return responses
4. SDKs/Clients which enable communication between other systems and the API
[Figure: data product layers (Data, Map, Algorithms, Smart API) numbered 1 to 4]
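The four components listed above can be illustrated with a small Python sketch. This is not Tag.bio's implementation or API: the names `DataProduct`, `Client`, `handle`, and `column_mean` are hypothetical, and the "smart API" is reduced to a single in-process method so the example stays self-contained.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, List

Row = Dict[str, float]
Algorithm = Callable[[List[Row], str], float]


@dataclass
class DataProduct:
    """Hypothetical toy data product (not the Tag.bio API)."""
    schema: List[str]                                             # 1. source data in a schema
    algorithms: Dict[str, Algorithm] = field(default_factory=dict)  # 2. runtime business logic
    rows: List[Row] = field(default_factory=list)

    def load(self, row: Row) -> None:
        # Reject rows whose keys do not match the declared schema.
        if set(row) != set(self.schema):
            raise ValueError(f"row keys {sorted(row)} do not match schema")
        self.rows.append(row)

    def handle(self, request: dict) -> dict:
        # 3. API entry point: takes a request dict, returns a response dict.
        name, column = request["algorithm"], request["column"]
        if name not in self.algorithms:
            return {"ok": False, "error": f"unknown algorithm: {name}"}
        if column not in self.schema:
            return {"ok": False, "error": f"unknown column: {column}"}
        return {"ok": True, "result": self.algorithms[name](self.rows, column)}


class Client:
    """4. Minimal SDK/client that hides the request/response format."""

    def __init__(self, product: DataProduct) -> None:
        self.product = product

    def run(self, algorithm: str, column: str) -> float:
        response = self.product.handle({"algorithm": algorithm, "column": column})
        if not response["ok"]:
            raise RuntimeError(response["error"])
        return response["result"]


def column_mean(rows: List[Row], column: str) -> float:
    return mean(r[column] for r in rows)


product = DataProduct(schema=["age", "dose_mg"], algorithms={"mean": column_mean})
product.load({"age": 40.0, "dose_mg": 10.0})
product.load({"age": 60.0, "dose_mg": 30.0})
client = Client(product)
print(client.run("mean", "age"))
```

In a real deployment the API would sit behind a network boundary; keeping it in-process here only serves to show how the schema, logic, API, and client relate to each other.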
16. Tag.bio Data Product
Domain-driven, harmonized, decentralized application layer
Cross-Functional Data Team/Role:
A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps.
B. Data Scientist
C. Researcher
Data Sources: Siloed data, Data warehouses, Data lakes, Data products
Data Types: DNA-Seq, RNA-Seq, Proteomics, Flow cytometry, Clinical trials, Machine behavior & maintenance, Other data types, Emerging data types
Data Formats: CSV, JSON, SPARK, XML, SQL
[Figure: data product layers (Data, Map, Algorithms, Smart API, Analysis apps) annotated with roles A, B, C]
17. Tag.bio Data Product
Domain-driven, harmonized, decentralized application layer
Cross-Functional Data Team:
A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps.
B. Data Scientist: Integrates (ML) algorithms, with an interface to R, Python, and ML/AI, as analysis apps that researchers can use.
C. Researcher
[Figure: data product layers (Data, Map, Algorithms, Smart API, Analysis apps) annotated with roles A, B, C]
18. Tag.bio Data Product
Domain-driven, harmonized, decentralized application layer
Cross-Functional Data Team:
A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps.
B. Data Scientist: Integrates R, Python, ML/AI as analysis apps that researchers can use.
C. Researcher: Uses no-code, guided analysis apps to ask & answer their own questions.
Example analysis apps: Single Cell Gene Expression, Rmarkdown Gene Signature Report, Elastic Net Cross Validation
[Figure: data product layers (Data, Map, Algorithms, Smart API, Analysis apps) annotated with roles A, B, C]
19. Tag.bio Data Product
Domain-driven, harmonized, decentralized application layer
Cross-Functional Data Team:
A. Data Engineer: Maps data into the data product that data scientists can use to build analysis apps.
B. Data Scientist: Integrates R, Python, ML/AI as analysis apps that researchers can use.
C. Researcher: Uses no-code, guided analysis apps to ask & answer their own questions.
Maximize the value of your data with domain-driven data products
[Figure: data product layers (Data, Map, Algorithms, Smart API, Analysis apps) annotated with roles A, B, C]
21. Data Mesh
It’s a paradigm shift to treat data as a product
Data mesh encompasses data products that are oriented around domains & owned by cross-functional data teams
Zhamak Dehghani: Data Mesh: A Paradigm Shift in Data Architecture
22. Pharma: Domain Driven Workloads
Disparate data types slow the drug development process
Drug Development Process: Basic Research, Preclinical, Clinical Trials, Regulatory Review, RWE & Patient care
Data types: Biomarkers, Omics, Model Organisms, Phase I, Phase II, Phase III, Patient Registries, Phase IV, Regulatory Submissions
23. What Happens When You Apply Data Mesh To Pharma?
Harmonized, connected data sources accelerate drug development
Drug Development Process: Basic Research, Preclinical, Clinical Trials, Regulatory Review, RWE & Patient care
Data types: Biomarkers, Omics, Model Organisms, Phase I, Phase II, Phase III, Patient Registries, Phase IV, Regulatory
24. What Happens When You Turn Data into a Product?
Streamlined data analysis process
[Figure: Months vs. Minutes. One side: Researchers, Data Scientist, Data Engineer, Data Warehouses, Data Lakes, Siloed Data. Other side: Researchers, Analysis Platform, Data Products in a Data Mesh.]
25. What Is Tag.bio?
A customizable self service (end-to-end) data mesh platform
1. Data Product: Domain-driven, harmonized & decentralized application layer
2. Data Mesh: Distributed data products connected into a data mesh
3. Analysis Environment: Data analysis environment for researchers & data scientists
26. Tag.bio Data Mesh
Data products deployed in an interoperable data mesh (any cloud)
Data mesh enables organizations to:
● Connect data sources without moving data
● Rapidly add new data types
● Connect all data sources to accelerate the drug development cycle
[Figure: several Data Products (Data, Map, Algorithms, Smart API, Analysis apps) connected in a mesh]
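One way to picture "connect data sources without moving data" is a federated computation in which each data product returns only aggregate summaries and row-level values stay where they are. The sketch below is an illustration under that assumption, not a description of how Tag.bio's federated layer works; `LocalDataProduct`, `summarize`, and `federated_mean` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class LocalDataProduct:
    """Hypothetical data product that keeps its row-level values private."""
    name: str
    _values: List[float]

    def summarize(self) -> Tuple[int, float]:
        # Only the count and the sum leave the data product, never the rows.
        return len(self._values), sum(self._values)


def federated_mean(products: Iterable[LocalDataProduct]) -> float:
    """Combine per-product summaries into a pooled mean."""
    total_n = 0
    total_sum = 0.0
    for product in products:
        n, s = product.summarize()
        total_n += n
        total_sum += s
    if total_n == 0:
        raise ValueError("no observations in any data product")
    return total_sum / total_n


site_a = LocalDataProduct("site_a", [1.0, 2.0, 3.0])
site_b = LocalDataProduct("site_b", [5.0])
pooled = federated_mean([site_a, site_b])
print(pooled)
```

The pooled mean equals the mean over all rows combined, yet the caller never sees an individual row. Statistics that do not decompose into simple sums need more elaborate protocols than this.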
27. Analysis Environment
Data analysis environment to access data mesh & use data products
Analysis Platform for Researchers: Use data products with no-code analysis apps that speak their language. Collaborate with data scientists on how apps should work.
Developer Studio for Data Scientists: Build data products using a familiar, Jupyter notebook-based setting. Plug them into the Analysis Platform for researchers to use.
32. Analyzing 1000s of Flow Cytometry Samples
The Jackson Laboratory
Enabling users to analyze samples from various immunocompromised mouse strains with xenografts from human donors
33. HIPAA and California’s Confidentiality of Medical Information Act (CMIA) Compliant environment
https://medschool.ucsd.edu/research/actri/Informatics/Health-Data-Portal/services/Pages/Virtual-Research-Desktop-VRD.aspx
https://campuslisa.ucsd.edu/_files/2020%20Campus%20LISA_HC_Data_mesh_.pdf
Data Products in action at UCSD
34. More Examples
Analyzing Phase IV & RWE Data (Top 50 Pharma): Looking at both drug & medication-adherence device clinical trials in relation to schizophrenia
Immunotherapy & Single-Cell Omics (Cell Therapy Biotech): Deploying an array of proprietary & public-domain data products, enabling users to investigate & discover gene expression markers with respect to cell types
35. Showing how our customers fit into the Drug Dev lifecycle
Biotechs: Cell Therapy, Transplant
Large Pharmas: Immunology, Oncology, and Neurology
CRO: RWE
CRO: Omics, IHC, TCR
Basic Research: Mouse Models and Other
AMCs (UCSD, UCSF): Value-based Healthcare and Patient Registries
36. Next Stage Of Data Evolution
1. Harmonize Data: Map data (Data Warehouses, Data Lakes, Siloed Data, Flat Files) into data products
2. Connect Data Products: FAIR data (findable, accessible, interoperable, reusable)
3. Accelerate Outputs: Real-time answers, self-service analysis. Validations, publications, submissions. Saved, shareable, reproducible, full QC
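"Saved, shareable, reproducible" results imply that an analysis can be identified by exactly what went into it. A minimal sketch, assuming an analysis is fully described by a data version, a protocol name, and its parameters: hash a canonical JSON form of those inputs so that identical requests share one identifier. The function and field names here are hypothetical and not taken from Tag.bio.

```python
import hashlib
import json


def analysis_fingerprint(data_version: str, protocol: str, parameters: dict) -> str:
    """Return a deterministic SHA-256 identifier for an analysis request."""
    record = {
        "data_version": data_version,
        "protocol": protocol,
        "parameters": parameters,
    }
    # sort_keys and fixed separators make the JSON text independent of dict order.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = analysis_fingerprint("v1", "elastic_net_cv", {"alpha": 0.5, "folds": 5})
b = analysis_fingerprint("v1", "elastic_net_cv", {"folds": 5, "alpha": 0.5})
c = analysis_fingerprint("v2", "elastic_net_cv", {"alpha": 0.5, "folds": 5})
print(a == b, a == c)
```

The same parameters in a different order give the same fingerprint, while a new data version gives a different one, which is the property a versioned result store would rely on.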
38. Collaboration within an organization
The data mesh connects groups to collaborative analysis resources to form a data-driven culture
Groups: Clinical Trials, Population Health, Clinical Decisions, Discovery Biology
[Figure: Data Products 1 to 5 connected in a Data Mesh]
39. Different types of data products act together as a functional data mesh
Annotation: e.g. Gene, Variant, Demographic, Identifying data; Proprietary annotation
Domain Specific Analysis: (Pan-Cancer TCGA Patient Healthcare)
Usage: Full history of all user activity
40. How organizations collaborate via data products
Organization 1: Governed access to selected data products and apps
Organization 2: Governed access to selected data products and apps
Domains: Clinical Research, COVID, Patient Registries, Oncology, Chronic Inflammation, Autoimmunity
Data Products (in cloud account of organization)
Collaborator (VPC/Private Link access to data products)
[Figure: Data Products 1 to 5 connected in a Data Mesh]
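"Governed access to selected data products" can be pictured as an explicit grant table that the data owner controls. The sketch below is a simplified illustration with hypothetical names (`AccessPolicy`, `grant`, `can_access`). It does not model the VPC/Private Link networking or Tag.bio's actual governance mechanism.

```python
from typing import Dict, List, Set


class AccessPolicy:
    """Hypothetical grant table: which organization may use which data product."""

    def __init__(self) -> None:
        self._grants: Dict[str, Set[str]] = {}

    def grant(self, organization: str, product: str) -> None:
        self._grants.setdefault(organization, set()).add(product)

    def revoke(self, organization: str, product: str) -> None:
        # Revoking a grant that does not exist is a no-op.
        self._grants.get(organization, set()).discard(product)

    def can_access(self, organization: str, product: str) -> bool:
        # Default deny: access requires an explicit grant.
        return product in self._grants.get(organization, set())

    def visible_products(self, organization: str) -> List[str]:
        return sorted(self._grants.get(organization, set()))


policy = AccessPolicy()
policy.grant("organization_1", "oncology")
policy.grant("organization_1", "patient_registries")
policy.grant("organization_2", "oncology")
print(policy.visible_products("organization_1"))
print(policy.can_access("organization_2", "patient_registries"))
```

Each organization sees only the products it was granted, so the same mesh can serve several collaborators with different views.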
41. Tag.bio data exchange: Collaboration with Parkinson’s Foundation to provide data products to researchers
43. A two sided data environment to enable real time collaboration
Analysis Platform for Domain Experts: No-code analysis apps that speak your language
Developer Studio for Data Scientists: Familiar Jupyter Notebook-based Developer Studio
53. How can (NIH funded) researchers access Tag.bio?
https://cloud.nih.gov/enrollment/ ; email: FPT_NIH_STRIDES@4points.com
54. How can NIH ICOs access Tag.bio?
https://cloud.nih.gov/enrollment/ ; email: FPT_NIH_STRIDES@4points.com
55. Tag.bio is a “datamesh in a box”
Data Products: Real time questions to answers. Cross study comparison. UIs for coders and clickers.
Data Mesh: Connect proprietary and public data. Pull in annotation automatically. Bring the analysis to the data.
Self-Service Platform: Fully versioned and reproducible. Auto-deployed, tested and scalable. Collaboration between users, groups, and organizations.
Thank You! Questions?