SlideShare una empresa de Scribd logo
1 de 22
Descargar para leer sin conexión
Fine-Grained, Secure and Efficient Data
Provenance on Blockchain Systems
Pingcheng RUAN, Gang CHEN, Tien Tuan Anh DINH,
Qian LIN, Beng Chin OOI, Meihui ZHANG
Blockchain Is a Class of Database
Blockchain
Distributed
Database
Database
Bitcoin
Distributed
Transactional
Systems
Blockchain Basics
• P2P network
– Asynchronous transaction
• Byzantine environment
– Mutual distrusting setup
• Distributed ledger
– Smart contract
• Inherent provenance-preserving
– ONLY for offline analytical query
Contract Txn 1 Txn 2
Token.Transfer( A, B, 10)
Token.Transfer( C, D, 20)Token.Transfer( A, B, 20)
Motivation
Expose provenance information
to smart contracts both
Efficiently Securely
Enabler for provenance-
dependent smart contracts
Enrich the transaction semantics
Provenance-dependent Contracts
• Previous transfer precondition:
– Enough balance from the sender
– CURRENT STATE ONLY
• New transfer precondition:
– Historical balance > threshold
– Recipient not transacted with certain blacklisted addresses recently
– HISTORICAL STATE & PROVENANCE INFO
Workarounds
Workaround 1:
•Dump every thing into current
state
•Effort-needed, expensive,
error-prune
Workaround 2:
• Offline analytics + Online
transactions
• Break of serializability
• Transaction-ordering attacks
Workaround 3:
• Minimum system
instrumentation
• NOT protocol level (e.g.,
Hyperledger Fabric v1.0+)
• Data tampering
Holistic Approach:
• Protocol-level enhancement
 Secure
• Performance-aware
 Efficient
Account1_v1: 10
Account1_v2: 20
Account1_v3: 15
Account2_v2: 12
Challenges
• With clearly-defined transformation semantics
• E.g
• Map and reduce in Hadoop
• Select, join and aggregation in SQL
NO standardized operations
• Tamper evidence
• Integrity proof
Byzantine environment
• Gas mechanism
• Verifier’s dilemma
Ever-growing ledger
Block Structure
Block Header
Prev Hash hash Txn Digest
State Digest PoW Nonce
Txn List
Enhancement Basis (Merkle Tree Variants)
• Limitation
– Latest State only
• Tamper evidence
– Succinct digest (root hash)
– Integrity proof (access path)
Block Header
Account Address and Assoicated
Balance in Global State:
0xABC: 10 0xABCD: 15
0xABCE: 20 0xBC: 25
Previous Block Hash
Transaction MPT Root Hash
Nonce
Receipt MPT Root Hash
State MPT Root Hash = H(Z)
nilA: H(X) B: H(Y)Z
BC: H(V) C: 25X Y
10D: 15 E: 20V
<Updated Chaincode ID>_<Key>:
ccid1_k1 ccid2_k1
ccid3_k1 ccid3_k2
Block Header
State Root Hash
Previous Block Hash
Transaction Root Hash
G
E F
A B C
ccid1_k1
ccid2_k1
ccid3_k1
ccid3_k2
D
Bucket List
(a) (b)
Merkle Patricia Trie Merkle Bucket Tree
LineageChain Overview
Application Layer
• Provenance specification
– User-defined input-output dependency
• Provenance query handler
– Hist(stateID, [blockNum])
 (val, blkStart, txnID)
– Backward(stateID, blkNum)
 List<(depStateID, depBlkNum)>
– Forward(stateID, blkNum)
 List<(depStateID, depBlkNum)>
InputID1
InputID2
OutputID1
OutputID2
OutputID3
Backward
Dependency
Forward
Dependency
Application Layer
Recipient -> Sender
Execution Layer
• Receive
– Contract invocation context
– Provenance specification
• Compute
– Transaction results
– Concrete dependency
• Prepare Merkle DAG
– Introduce one layer of direction
– Hash reference to encode
provenance backward
dependency
Execution Layer
• Forward tracking
– Problem: Undecided forward dependency during state update
• Solution
– Lazily store forward dependency on the successor state entry
Storage Layer
• Problem
– Efficient version-based (historical) query for a state ID
• Solution:
– Deterministic Append-only Skip List
– Hash-based reference
After appending
versions 12 and 16
Evaluation
• MICRO benchmarking (vs. flat storage)
– Preference to recent version query (with DASL)
– More efficient BFS enabled by backtrack (with ForkBase)
• MACRO benchmarking (applied to Hyperledger Fabric v0.6 and
v1.3)
– Negligible runtime overhead
o Tiny proportion of latency
– Negligible storage overhead
o >70% of space for blocks
o 25% for historical states
o 2~4% for DASL indexes and hash pointers
Performance of Provenance Query
• vs. Workaround 2
– Compute data provenance offline and conditionally trigger online transaction
Micro Performance of Provenance Query
• vs. Workaround 1
– Dump everything into the current state
• vs. Workaround 3
– Use Hyperledger Fabric’s built-in HistoryDB
Runtime Overhead
• Transaction processing
Hyperledger Fabric v0.6 Hyperledger Fabric v1.3
Storage Overhead
Conclusion
• LineageChain
– Enabler for provenance-dependent blockchain applications
– Protocol-level enhancement w.r.t. efficiency and security
– Negligible performance and storage overhead
• Key designs
– User-defined dependency specification
– Merkle DAG with dependency tracking
– DASL index to accelerate data provenance query
– Adoption in Hyperledger Fabric (v0.6 & v1.3)
ThankYou!

Más contenido relacionado

La actualidad más candente

KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
Simplilearn
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
Lukas Masuch
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Lior Rokach
 

La actualidad más candente (20)

Reinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners TutorialReinforcement Learning : A Beginners Tutorial
Reinforcement Learning : A Beginners Tutorial
 
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
 
Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014 Apache Mahout Tutorial - Recommendation - 2013/2014
Apache Mahout Tutorial - Recommendation - 2013/2014
 
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
[PR12] PR-050: Convolutional LSTM Network: A Machine Learning Approach for Pr...
 
Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331Recent Progress on Object Detection_20170331
Recent Progress on Object Detection_20170331
 
Machine Learning - Object Detection and Classification
Machine Learning - Object Detection and ClassificationMachine Learning - Object Detection and Classification
Machine Learning - Object Detection and Classification
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Introduction to Boosted Trees by Tianqi Chen
Introduction to Boosted Trees by Tianqi ChenIntroduction to Boosted Trees by Tianqi Chen
Introduction to Boosted Trees by Tianqi Chen
 
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
KNN Algorithm - How KNN Algorithm Works With Example | Data Science For Begin...
 
Deep learning - A Visual Introduction
Deep learning - A Visual IntroductionDeep learning - A Visual Introduction
Deep learning - A Visual Introduction
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
K Nearest Neighbors
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Faster R-CNN - PR012
Faster R-CNN - PR012Faster R-CNN - PR012
Faster R-CNN - PR012
 
Cours integrale riemann
Cours integrale riemannCours integrale riemann
Cours integrale riemann
 
Collaborative filtering
Collaborative filteringCollaborative filtering
Collaborative filtering
 
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methodsData Mining: Concepts and techniques classification _chapter 9 :advanced methods
Data Mining: Concepts and techniques classification _chapter 9 :advanced methods
 
Content based filtering
Content based filteringContent based filtering
Content based filtering
 
SVM
SVM SVM
SVM
 
Vision and Language: Past, Present and Future
Vision and Language: Past, Present and FutureVision and Language: Past, Present and Future
Vision and Language: Past, Present and Future
 
Intro to Neural Networks
Intro to Neural NetworksIntro to Neural Networks
Intro to Neural Networks
 

Similar a Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems

Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
Flink Forward
 

Similar a Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems (20)

Part 4 : reliable transport and sharing resources
Part 4 : reliable transport and sharing resourcesPart 4 : reliable transport and sharing resources
Part 4 : reliable transport and sharing resources
 
Zeus Locality aware distributed transaction upload.pptx
Zeus Locality aware distributed transaction upload.pptxZeus Locality aware distributed transaction upload.pptx
Zeus Locality aware distributed transaction upload.pptx
 
4 transport-sharing
4 transport-sharing4 transport-sharing
4 transport-sharing
 
Blockchain
BlockchainBlockchain
Blockchain
 
Tribeflow on bitcoin data
Tribeflow on bitcoin dataTribeflow on bitcoin data
Tribeflow on bitcoin data
 
Part3-reliable.pptx
Part3-reliable.pptxPart3-reliable.pptx
Part3-reliable.pptx
 
TXGX 2019_Albert_High Availability Architecture of Klaytn Service Chain
TXGX 2019_Albert_High Availability Architecture of Klaytn Service ChainTXGX 2019_Albert_High Availability Architecture of Klaytn Service Chain
TXGX 2019_Albert_High Availability Architecture of Klaytn Service Chain
 
Architecture of the Hyperledger Blockchain Fabric - Christian Cachin - IBM Re...
Architecture of the Hyperledger Blockchain Fabric - Christian Cachin - IBM Re...Architecture of the Hyperledger Blockchain Fabric - Christian Cachin - IBM Re...
Architecture of the Hyperledger Blockchain Fabric - Christian Cachin - IBM Re...
 
Quality of service
Quality of serviceQuality of service
Quality of service
 
Redesigning MPTCP in Edge clouds
Redesigning MPTCP in Edge cloudsRedesigning MPTCP in Edge clouds
Redesigning MPTCP in Edge clouds
 
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
Flink Forward Berlin 2018: Nico Kruber - "Improving throughput and latency wi...
 
Chap 03
Chap 03Chap 03
Chap 03
 
Chap 03
Chap 03Chap 03
Chap 03
 
Transport Layer Description By Varun Tiwari
Transport Layer Description By Varun TiwariTransport Layer Description By Varun Tiwari
Transport Layer Description By Varun Tiwari
 
qos-f05.pdf
qos-f05.pdfqos-f05.pdf
qos-f05.pdf
 
qos-f05.ppt
qos-f05.pptqos-f05.ppt
qos-f05.ppt
 
qos-f05 (2).ppt
qos-f05 (2).pptqos-f05 (2).ppt
qos-f05 (2).ppt
 
qos-f05 (3).ppt
qos-f05 (3).pptqos-f05 (3).ppt
qos-f05 (3).ppt
 
Thaker q3 2008
Thaker q3 2008Thaker q3 2008
Thaker q3 2008
 
BigchainDB: A Scalable Blockchain Database, In Python
BigchainDB: A Scalable Blockchain Database, In PythonBigchainDB: A Scalable Blockchain Database, In Python
BigchainDB: A Scalable Blockchain Database, In Python
 

Más de Qian Lin

Trinity: A Distributed Graph Engine on a Memory Cloud
Trinity: A Distributed Graph Engine on a Memory CloudTrinity: A Distributed Graph Engine on a Memory Cloud
Trinity: A Distributed Graph Engine on a Memory Cloud
Qian Lin
 
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesPresto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Qian Lin
 
Adaptive Execution Support for Malleable Computation
Adaptive Execution Support for Malleable ComputationAdaptive Execution Support for Malleable Computation
Adaptive Execution Support for Malleable Computation
Qian Lin
 
C-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the CloudC-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the Cloud
Qian Lin
 
Kineograph: Taking the Pulse of a Fast-Changing and Connected World
Kineograph: Taking the Pulse of a Fast-Changing and Connected WorldKineograph: Taking the Pulse of a Fast-Changing and Connected World
Kineograph: Taking the Pulse of a Fast-Changing and Connected World
Qian Lin
 
Optimizing Virtual Machines Using Hybrid Virtualization
Optimizing Virtual Machines Using Hybrid VirtualizationOptimizing Virtual Machines Using Hybrid Virtualization
Optimizing Virtual Machines Using Hybrid Virtualization
Qian Lin
 
Virtual Machine Performance
Virtual Machine PerformanceVirtual Machine Performance
Virtual Machine Performance
Qian Lin
 
Be an Explorer, Be a Coder, Be a Writer
Be an Explorer, Be a Coder, Be a WriterBe an Explorer, Be a Coder, Be a Writer
Be an Explorer, Be a Coder, Be a Writer
Qian Lin
 
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data FormatsSciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
Qian Lin
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
Qian Lin
 

Más de Qian Lin (13)

PaxosStore: High-availability Storage Made Practical in WeChat
PaxosStore: High-availability Storage Made Practical in WeChatPaxosStore: High-availability Storage Made Practical in WeChat
PaxosStore: High-availability Storage Made Practical in WeChat
 
Trinity: A Distributed Graph Engine on a Memory Cloud
Trinity: A Distributed Graph Engine on a Memory CloudTrinity: A Distributed Graph Engine on a Memory Cloud
Trinity: A Distributed Graph Engine on a Memory Cloud
 
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse MatricesPresto: Distributed Machine Learning and Graph Processing with Sparse Matrices
Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices
 
Adaptive Execution Support for Malleable Computation
Adaptive Execution Support for Malleable ComputationAdaptive Execution Support for Malleable Computation
Adaptive Execution Support for Malleable Computation
 
C-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the CloudC-Cube: Elastic Continuous Clustering in the Cloud
C-Cube: Elastic Continuous Clustering in the Cloud
 
Kineograph: Taking the Pulse of a Fast-Changing and Connected World
Kineograph: Taking the Pulse of a Fast-Changing and Connected WorldKineograph: Taking the Pulse of a Fast-Changing and Connected World
Kineograph: Taking the Pulse of a Fast-Changing and Connected World
 
Optimizing Virtual Machines Using Hybrid Virtualization
Optimizing Virtual Machines Using Hybrid VirtualizationOptimizing Virtual Machines Using Hybrid Virtualization
Optimizing Virtual Machines Using Hybrid Virtualization
 
Virtual Machine Performance
Virtual Machine PerformanceVirtual Machine Performance
Virtual Machine Performance
 
Be an Explorer, Be a Coder, Be a Writer
Be an Explorer, Be a Coder, Be a WriterBe an Explorer, Be a Coder, Be a Writer
Be an Explorer, Be a Coder, Be a Writer
 
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data FormatsSciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
SciMATE: A Novel MapReduce-Like Framework for Multiple Scientific Data Formats
 
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
A Survey of Advanced Non-relational Database Systems: Approaches and Applicat...
 
In-situ MapReduce for Log Processing
In-situ MapReduce for Log ProcessingIn-situ MapReduce for Log Processing
In-situ MapReduce for Log Processing
 
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core ProcessorsC-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
C-MR: Continuously Executing MapReduce Workflows on Multi-Core Processors
 

Último

Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering Developers
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by Anitaraj
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Introduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDMIntroduction to use of FHIR Documents in ABDM
Introduction to use of FHIR Documents in ABDM
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 

Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems

  • 1. Fine-Grained, Secure and Efficient Data Provenance on Blockchain Systems Pingcheng RUAN, Gang CHEN, Tien Tuan Anh DINH, Qian LIN, Beng Chin OOI, Meihui ZHANG
  • 2. Blockchain Is a Class of Database Blockchain Distributed Database Database Bitcoin Distributed Transactional Systems
  • 3. Blockchain Basics • P2P network – Asynchronous transaction • Byzantine environment – Mutual distrusting setup • Distributed ledger – Smart contract • Inherent provenance-preserving – ONLY for offline analytical query Contract Txn 1 Txn 2 Token.Transfer( A, B, 10) Token.Transfer( C, D, 20)Token.Transfer( A, B, 20)
  • 4. Motivation Expose provenance information to smart contracts both Efficiently Securely Enabler for provenance- dependent smart contracts Enrich the transaction semantics
  • 5. Provenance-dependent Contracts • Previous transfer precondition: – Enough balance from the sender – CURRENT STATE ONLY • New transfer precondition: – Historical balance > threshold – Recipient not transacted with certain blacklisted addresses recently – HISTORICAL STATE & PROVENANCE INFO
  • 6. Workarounds Workaround 1: •Dump every thing into current state •Effort-needed, expensive, error-prune Workaround 2: • Offline analytics + Online transactions • Break of serializability • Transaction-ordering attacks Workaround 3: • Minimum system instrumentation • NOT protocol level (e.g., Hyperledger Fabric v1.0+) • Data tampering Holistic Approach: • Protocol-level enhancement  Secure • Performance-aware  Efficient Account1_v1: 10 Account1_v2: 20 Account1_v3: 15 Account2_v2: 12
  • 7. Challenges • With clearly-defined transformation semantics • E.g • Map and reduce in Hadoop • Select, join and aggregation in SQL NO standardized operations • Tamper evidence • Integrity proof Byzantine environment • Gas mechanism • Verifier’s dilemma Ever-growing ledger
  • 8. Block Structure Block Header Prev Hash hash Txn Digest State Digest PoW Nonce Txn List
  • 9. Enhancement Basis (Merkle Tree Variants) • Limitation – Latest State only • Tamper evidence – Succinct digest (root hash) – Integrity proof (access path) Block Header Account Address and Assoicated Balance in Global State: 0xABC: 10 0xABCD: 15 0xABCE: 20 0xBC: 25 Previous Block Hash Transaction MPT Root Hash Nonce Receipt MPT Root Hash State MPT Root Hash = H(Z) nilA: H(X) B: H(Y)Z BC: H(V) C: 25X Y 10D: 15 E: 20V <Updated Chaincode ID>_<Key>: ccid1_k1 ccid2_k1 ccid3_k1 ccid3_k2 Block Header State Root Hash Previous Block Hash Transaction Root Hash G E F A B C ccid1_k1 ccid2_k1 ccid3_k1 ccid3_k2 D Bucket List (a) (b) Merkle Patricia Trie Merkle Bucket Tree
  • 11. Application Layer • Provenance specification – User-defined input-output dependency • Provenance query handler – Hist(stateID, [blockNum])  (val, blkStart, txnID) – Backward(stateID, blkNum)  List<(depStateID, depBlkNum)> – Forward(stateID, blkNum)  List<(depStateID, depBlkNum)> InputID1 InputID2 OutputID1 OutputID2 OutputID3 Backward Dependency Forward Dependency
  • 13. Execution Layer • Receive – Contract invocation context – Provenance specification • Compute – Transaction results – Concrete dependency • Prepare Merkle DAG – Introduce one layer of direction – Hash reference to encode provenance backward dependency
  • 14. Execution Layer • Forward tracking – Problem: Undecided forward dependency during state update • Solution – Lazily store forward dependency on the successor state entry
  • 15. Storage Layer • Problem – Efficient version-based (historical) query for a state ID • Solution: – Deterministic Append-only Skip List – Hash-based reference After appending versions 12 and 16
  • 16. Evaluation • MICRO benchmarking (vs. flat storage) – Preference to recent version query (with DASL) – More efficient BFS enabled by backtrack (with ForkBase) • MACRO benchmarking (applied to Hyperledger Fabric v0.6 and v1.3) – Negligible runtime overhead o Tiny proportion of latency – Negligible storage overhead o >70% of space for blocks o 25% for historical states o 2~4% for DASL indexes and hash pointers
  • 17. Performance of Provenance Query • vs. Workaround 2 – Compute data provenance offline and conditionally trigger online transaction
  • 18. Micro Performance of Provenance Query • vs. Workaround 1 – Dump everything into the current state • vs. Workaround 3 – Use Hyperledger Fabric’s built-in HistoryDB
  • 19. Runtime Overhead • Transaction processing Hyperledger Fabric v0.6 Hyperledger Fabric v1.3
  • 21. Conclusion • LineageChain – Enabler for provenance-dependent blockchain applications – Protocol-level enhancement w.r.t. efficiency and security – Negligible performance and storage overhead • Key designs – User-defined dependency specification – Merkle DAG with dependency tracking – DASL index to accelerate data provenance query – Adoption in Hyperledger Fabric (v0.6 & v1.3)