A Social Content Delivery Network
for Scientific Cooperation:
Vision, Design, and Architecture
Kyle Chard, Simon Caton, Omer Rana, Daniel S. Katz




                                                     www.ci.anl.gov
                                                     www.ci.uchicago.edu
Introduction
• Collaboration is increasingly data intensive
• To avoid research bottlenecks we need data...
    –   At the right place, at the right time, with appropriate
        access permissions
•   Challenges
    –   Distribution, storage, replication, budget, security, performance,
        locality, reliability, availability, ...
•   Current approaches to data distribution/sharing?



2       Social CDN -- DataCloud 2012
Data (Content) Distribution
•   Other domains use CDNs
    –   E.g., web objects, downloads, streaming media, social networks
•   But, scientific data is often
    –   Big Data
    –   Long tail
    –   Private
    –   Geographically distributed
•   Commercial CDNs infeasible
    and unaffordable for scientific
    data.
Social Content Delivery Network (S-CDN)
•   Utilizes the resources of community members
    –   Low cost, distributed infrastructure
•   Social network identifies locations to distribute and store
    subsets of data
    –   Algorithms to partition and distribute data based on
        relationships with others
•   Built upon the concept of a Social (Data) Cloud
•   Layers shown in the figure: Social Layer, Resource Layer,
    Content Delivery Layer

Trust
•   Types of trust for a S-CDN
    1. Infrastructure trust via appropriate security and
       authentication mechanisms as well as policies
    2. Inter-personal trust as an enabler of social
       collaboration.
        –   “a positive expectation or assumption on future outcomes that
            results from proven contextualized personal interaction-histories”
•   In the context of a S-CDN
    – Leverage trust to select interaction partners
    – Develop “trust models” to aid CDN management
      algorithms
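
The slide does not specify a trust model, so the following is only a minimal sketch of how trust derived from interaction histories might be used to select interaction partners. The class name `TrustModel`, the smoothed success-ratio score, and the 0.6 threshold are all assumptions made for illustration; they are not taken from the presentation.

```python
from collections import defaultdict


class TrustModel:
    """Hypothetical trust model: smoothed share of positive past interactions."""

    def __init__(self):
        # (truster, peer) -> [positive interactions, total interactions]
        self._history = defaultdict(lambda: [0, 0])

    def record(self, truster, peer, positive):
        """Store the outcome of one interaction between truster and peer."""
        entry = self._history[(truster, peer)]
        if positive:
            entry[0] += 1
        entry[1] += 1

    def score(self, truster, peer):
        """Laplace-smoothed ratio; yields 0.5 when there is no history."""
        positive, total = self._history.get((truster, peer), (0, 0))
        return (positive + 1) / (total + 2)

    def partners(self, truster, candidates, threshold=0.6):
        """Candidates whose score meets the threshold, most trusted first."""
        scored = [(self.score(truster, c), c) for c in candidates]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [c for s, c in scored if s >= threshold]


# Toy interaction history (invented names).
model = TrustModel()
for _ in range(3):
    model.record("ann", "bob", positive=True)
model.record("ann", "cy", positive=False)

trusted = model.partners("ann", ["bob", "cy", "dee"])
```

A CDN management algorithm could then restrict candidate storage nodes to the returned partners; how scores should be contextualized per dataset or project is left open here, as it is on the slide.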

Motivating Use Case – Medical Imaging (1)




Motivating Use Case – Challenges

    Data Privacy
     • Storage and transfer
     • Regulations (HIPAA)
     • Research IP
     • Trust

    Data Access
     • Many researchers
     • Geographically distributed
     • Different institutions

    Big Data?
     • Multiple centers
     • Multiple subjects
     • Multiple scans
     • Multiple analyses/reconstructions




Motivating Use Case – S-CDN
          •    Trustworthiness: Relationships encoded within a
               real world social/collaboration network and
               previous scientific interactions or institutional
               affiliations

          •    Data availability: Access to those who are
               permitted to view (and need) data when required

          •    Reduced barriers: Collaborative infrastructure and
               potential to aggregate other middleware such as
               authentication, job submission, data staging

          •    Access and data placement: Algorithms that
               leverage properties of the social graph

Architecture
•   Storage Servers
    –   CDN edge nodes on which research datasets (or fragments
        thereof) reside
    –   Shared folder used for CDN and local storage
    –   Client to manage and transfer datasets

•   Social Middleware
    –   Adds a layer of abstraction between users and the S-CDN
    –   Provides authentication and authorization

•   Allocation Servers
    –   Centralized catalogs for global datasets
    –   Maintain a list of current replicas and place, move, update, and
        maintain replicas

•   Implementation?

[Figure: architecture diagram; labels: Trust relationship, Trusted third party]
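
As a rough illustration of the allocation server role described above (a catalog that tracks current replicas and places or moves them), here is a minimal in-memory sketch. The class `AllocationServer`, its method names, and the dataset and node identifiers are hypothetical; the slides leave the implementation open.

```python
class AllocationServer:
    """Hypothetical catalog: dataset id -> set of nodes holding a replica."""

    def __init__(self):
        self._replicas = {}

    def place(self, dataset, node):
        """Register a new replica of a dataset on a storage node."""
        self._replicas.setdefault(dataset, set()).add(node)

    def move(self, dataset, src, dst):
        """Relocate a replica from one storage node to another."""
        nodes = self._replicas.get(dataset, set())
        if src not in nodes:
            raise KeyError(f"{dataset} has no replica on {src}")
        nodes.remove(src)
        nodes.add(dst)

    def remove(self, dataset, node):
        """Drop a replica; silently ignore unknown datasets or nodes."""
        self._replicas.get(dataset, set()).discard(node)

    def locate(self, dataset):
        """Return the current replica nodes, sorted for determinism."""
        return sorted(self._replicas.get(dataset, ()))


# Toy usage with invented identifiers.
catalog = AllocationServer()
catalog.place("scan-001", "node-a")
catalog.place("scan-001", "node-b")
catalog.move("scan-001", "node-a", "node-c")
locations = catalog.locate("scan-001")
```

A real deployment would also need the authentication and authorization checks that the slide assigns to the social middleware, plus the actual data transfer performed by the storage server client; both are omitted here.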

Preliminary Investigation
•    Explore data availability using a S-CDN
     –   Based on researcher relationships in a collaboration
•    How can we extract a representation of
     scientific (data) collaboration?
     –   Extrapolate collaborative research from the
         publication history of a scientist
•    Analysis
     – Extract communities with different levels of trust
     – Investigate simple CDN placement using social
       algorithms
Community Graphs




                             Baseline    Double Coauthorship   Number of Authors
Authors                        2335             811                  604
Publications                   1163             881                  435
Edges                          17973            5123                 1988

• Baseline: DBLP publications, 3 Degrees, 2009-2010
• Double Coauthorship: At least 2 publications
• Number of Authors: < 6 authors per publication
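
The three graphs could be derived from a publication list roughly as sketched below. This is an assumption-laden sketch, not the authors' pipeline: it reads "Double Coauthorship" as keeping an edge only when two authors share at least two publications, and "Number of Authors" as dropping publications with six or more authors. The function name `coauthor_graph` and the toy data are invented.

```python
from collections import Counter
from itertools import combinations


def coauthor_graph(publications, min_shared=1, max_authors=None):
    """Build an undirected coauthorship graph as {author: set(neighbours)}.

    publications: iterable of author lists, one list per paper.
    min_shared:   keep an edge only if the pair shares this many papers.
    max_authors:  if set, skip papers with more authors than this.
    """
    shared = Counter()
    for authors in publications:
        unique = sorted(set(authors))
        if max_authors is not None and len(unique) > max_authors:
            continue
        for a, b in combinations(unique, 2):
            shared[(a, b)] += 1

    graph = {}
    for (a, b), count in shared.items():
        if count >= min_shared:
            graph.setdefault(a, set()).add(b)
            graph.setdefault(b, set()).add(a)
    return graph


# Toy publication list (invented; not DBLP data).
papers = [
    ["ann", "bob"],
    ["ann", "bob", "cy"],
    ["cy", "dee"],
    ["ann", "bob", "cy", "dee", "eve", "fay", "gus"],
]

baseline = coauthor_graph(papers)
double_coauthorship = coauthor_graph(papers, min_shared=2)
number_of_authors = coauthor_graph(papers, max_authors=5)  # < 6 authors
```

Both filters shrink the graph, which matches the direction of the author and edge counts in the table, on the reasoning that repeated or small-group coauthorship is a stronger signal of a working relationship.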

Replica Selection
•    Random
     –   Avg Hops: 2.23
•    Node Degree
     –   Highest number of edges
     –   Avg Hops: 1.54
•    Community Node Degree
     –   Highest degree within a community
         (i.e. no adjacent placement)
     –   Avg Hops: 1.38
•    Clustering Coefficient
     – Highest likelihood that an author’s
       coauthors are also connected
     – Avg Hops: 2.62
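
The four strategies can be sketched on a plain adjacency-set graph as follows. Several details are assumptions: "Community Node Degree" is approximated by a greedy pass that skips any node adjacent to an already chosen replica (no community detection is performed), ties are broken alphabetically, and the average hop count includes the replica nodes themselves at distance 0. The toy graph is invented and the numbers it produces are unrelated to those on the slide.

```python
import random
from collections import deque


def clustering_coefficient(graph, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in graph[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def select_replicas(graph, n, strategy, seed=0):
    """Pick up to n replica nodes using one of the four strategies."""
    nodes = sorted(graph)  # alphabetical order gives stable tie-breaking
    if strategy == "random":
        return random.Random(seed).sample(nodes, n)
    if strategy == "clustering":
        ranked = sorted(nodes, key=lambda v: -clustering_coefficient(graph, v))
        return ranked[:n]
    by_degree = sorted(nodes, key=lambda v: -len(graph[v]))
    if strategy == "degree":
        return by_degree[:n]
    if strategy == "community_degree":
        chosen = []
        for v in by_degree:
            if len(chosen) == n:
                break
            # Assumed reading of "no adjacent placement".
            if not any(v in graph[c] for c in chosen):
                chosen.append(v)
        return chosen
    raise ValueError(f"unknown strategy: {strategy}")


def average_hops(graph, replicas):
    """Mean BFS distance from every reachable node to its nearest replica."""
    dist = {r: 0 for r in replicas}
    queue = deque(replicas)
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return sum(dist.values()) / len(dist)


# Toy graph: two triangles joined by the edge a-e (invented).
graph = {
    "a": {"b", "c", "e"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "e": {"a", "f", "g"},
    "f": {"e", "g"},
    "g": {"e", "f"},
}

for name in ("random", "degree", "community_degree", "clustering"):
    replicas = select_replicas(graph, 2, name)
    print(name, replicas, round(average_hops(graph, replicas), 2))
```

On this toy graph the clustering-coefficient strategy places both replicas inside one triangle, which illustrates why a high clustering coefficient alone can leave the rest of the network far from any replica.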


Results
[Figure: Replica Hit Rate (%) versus Number of Replicas (1 to 10), one panel
each for the Baseline, Double Coauthorship, and Number of Authors graphs;
series: Random, Node Degree, Community Node Degree, Clustering Coefficient]
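
The slides do not define "replica hit rate". One plausible reading, assumed here purely for illustration, is the percentage of publications that have at least one author hosting a replica. The function `hit_rate` and the toy data below are hypothetical and do not reproduce the plotted results.

```python
def hit_rate(publications, replicas):
    """Percentage of publications with at least one author hosting a replica.

    This definition is an assumption; the slides do not state one.
    """
    replica_set = set(replicas)
    pubs = list(publications)
    if not pubs:
        return 0.0
    hits = sum(1 for authors in pubs if replica_set & set(authors))
    return 100.0 * hits / len(pubs)


# Toy publication list (invented).
papers = [["ann", "bob"], ["cy", "dee"], ["eve"], ["ann", "cy"]]

one_replica = hit_rate(papers, ["ann"])
two_replicas = hit_rate(papers, ["ann", "cy"])
```

Under this reading, adding replicas can only raise or maintain the hit rate, which is consistent with plotting it against the number of replicas.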

Target users of a Social CDN
1.   Large collaborative project with multiple
     distributed participants
2.   Participants are able to provide some resources to
     the project
3.   Good overall connectivity between participants
4.   Different data set requirements for members of
     the collaboration
5.   Availability of data sets that can be co-hosted by
     other participants
6.   Varying sized data sets – not all of which may be
     able to fit in one place.
Summary
•    Data management across collaborations is difficult
     – Right place, right time, accessible to the right people
     – Complicated by size, security, availability, distance …
•    Social CDN
     – Builds upon the proven CDN model from other domains
     – Relies on user contributed edge nodes
     – Social overlay to incorporate trust and social replica selection
•    Future work
     –   Analysis and formalization of trust as an enabler of collaboration
          o   Further investigation into mechanisms to extract trustworthiness from
              scientific networks.
     – Simulation of a wider range of attributes, such as data access
       algorithms, different research networks, and indicators of trust.
     – Proof of concept implementation



Thanks

•    Questions?
•    1,000,000,000 Users
•    On average 190 friends
•    Resources are idle 40-95%
•    Users contribute to “good” causes




• Kyle Chard: kyle@ci.uchicago.edu
• http://www.facebook.com/SocialCloudComputing

DBX First Quarter 2024 Investor PresentationDropbox
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 

Último (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

A Social Content Delivery Network for Scientific Cooperation: Vision, Design, and Architecture

  • 1. A Social Content Delivery Network for Scientific Cooperation: Vision, Design, and Architecture Kyle Chard, Simon Caton, Omer Rana, Daniel S. Katz www.ci.anl.gov www.ci.uchicago.edu
  • 2. Introduction
      • Collaboration is increasingly data intensive
      • To avoid research bottlenecks we need data...
          – At the right place, at the right time, with appropriate access permissions
      • Challenges
          – Distribution, storage, replication, budget, security, performance, locality, reliability, availability...
      • Current approaches to data distribution/sharing?
      Social CDN -- DataCloud 2012
  • 3. Data (Content) Distribution
      • Other domains use CDNs
          – E.g. web objects, downloads, streaming media, social networks
      • But, scientific data is often
          – Big Data
          – Long tail
          – Private
          – Geographically distributed
      • Commercial CDNs infeasible and unaffordable for scientific data.
  • 4. Social Content Delivery Network (S-CDN)
      • Utilizes the resources of community members
          – Low cost, distributed infrastructure
      • Social network identifies locations to distribute and store subsets of data
          – Algorithms to partition and distribute data based on relationships with others
      • Built upon the concept of a Social (Data) Cloud
      [Figure labels: Social Layer, Resource Layer, Content Delivery Layer]
  • 5. Trust
      • Types of trust for an S-CDN
          1. Infrastructure trust via appropriate security and authentication mechanisms as well as policies
          2. Inter-personal trust as an enabler of social collaboration
              – “a positive expectation or assumption on future outcomes that results from proven contextualized personal interaction-histories”
      • In the context of an S-CDN
          – Leverage trust to select interaction partners
          – Develop “trust models” to aid CDN management algorithms
  • 6. Motivating Use Case – Medical Imaging (1)
  • 7. Motivating Use Case – Challenges
      • Data Privacy
          – Storage and transfer
          – Regulations (HIPAA)
          – Research IP
          – Trust
      • Data Access
          – Many researchers
          – Geographically distributed
          – Different institutions
      • Big Data?
          – Multiple centers
          – Multiple subjects
          – Multiple scans
          – Multiple analyses/reconstructions
  • 8. Motivating Use Case – S-CDN
      • Trustworthiness: Relationships encoded within a real world social/collaboration network and previous scientific interactions or institutional affiliations
      • Data availability: Access to those who are permitted to view (and need) data when required
      • Reduced barriers: Collaborative infrastructure and potential to aggregate other middleware such as authentication, job submission, data staging
      • Access and data placement: Algorithms that leverage properties of the social graph
  • 9. Architecture
      • Storage Servers
          – CDN edge nodes on which research datasets (or fragments thereof) reside
          – Shared folder used for CDN and local storage
          – Client to manage and transfer datasets
      • Social Middleware
          – Adds a layer of abstraction between users and the S-CDN
          – Provides authentication and authorization
      • Allocation Servers
          – Centralized catalogs for global datasets
          – Maintain a list of current replicas and place, move, update, and maintain replicas
      • Implementation?
      [Figure labels: Trust relationship, Trusted third party]
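The allocation server role on slide 9 (a centralized catalog that tracks where replicas live and places or moves them) can be illustrated with a small in-memory sketch. This is not the authors' implementation, which the slide itself leaves open; the class name `ReplicaCatalog`, its methods, and the dataset and server identifiers are all hypothetical.

```python
class ReplicaCatalog:
    """Hypothetical in-memory catalog: dataset id -> set of storage server ids."""

    def __init__(self):
        self._replicas = {}

    def place(self, dataset_id, server_id):
        # Record a new replica of the dataset on the given storage server.
        self._replicas.setdefault(dataset_id, set()).add(server_id)

    def move(self, dataset_id, source_id, target_id):
        # Relocate one replica; fail loudly if the source holds no copy.
        servers = self._replicas.get(dataset_id, set())
        if source_id not in servers:
            raise KeyError(f"{dataset_id} has no replica on {source_id}")
        servers.discard(source_id)
        servers.add(target_id)

    def remove(self, dataset_id, server_id):
        # Drop one replica and forget the dataset once no replicas remain.
        servers = self._replicas.get(dataset_id, set())
        servers.discard(server_id)
        if not servers:
            self._replicas.pop(dataset_id, None)

    def locate(self, dataset_id):
        # Return the current replica locations in a stable order.
        return sorted(self._replicas.get(dataset_id, ()))


catalog = ReplicaCatalog()
catalog.place("scan-001", "server-a")
catalog.place("scan-001", "server-b")
catalog.move("scan-001", "server-a", "server-c")
print(catalog.locate("scan-001"))
```

A real allocation server would also need persistence and the authorization checks that the slide assigns to the social middleware; both are omitted here.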
  • 10. Preliminary Investigation
      • Explore data availability using an S-CDN
          – Based on researcher relationships in a collaboration
      • How can we extract a representation of scientific (data) collaboration?
          – Extrapolate collaborative research from the publication history of a scientist
      • Analysis
          – Extract communities with different levels of trust
          – Investigate simple CDN placement using social algorithms
  • 11. Community Graphs
                        Baseline    Double Coauthorship    Number of Authors
      Authors           2335        811                    604
      Publications      1163        881                    435
      Edges             17973       5123                   1988
      • Baseline: DBLP publications, 3 Degrees, 2009-2010
      • Double Coauthorship: At least 2 publications
      • Number of Authors: < 6 authors per publication
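The three community graphs on slide 11 can be sketched as filters over a list of publications. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes "Double Coauthorship" keeps an edge only when two authors share at least two publications, and that "Number of Authors" drops any publication with six or more authors. The function and parameter names (`coauthor_graph`, `min_shared_pubs`, `author_limit`) and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coauthor_graph(publications, min_shared_pubs=1, author_limit=None):
    """Build an undirected coauthorship graph as {author: set(coauthors)}.

    publications: iterable of author-name lists, one list per paper.
    min_shared_pubs: keep an edge only if the pair shares this many papers.
    author_limit: if set, keep only papers with fewer authors than this.
    """
    pair_counts = Counter()
    for authors in publications:
        unique = sorted(set(authors))
        if author_limit is not None and len(unique) >= author_limit:
            continue  # assumed reading of "< 6 authors per publication"
        for pair in combinations(unique, 2):
            pair_counts[pair] += 1

    graph = {}
    for (u, v), shared in pair_counts.items():
        if shared >= min_shared_pubs:
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    return graph


def edge_count(graph):
    # Each undirected edge appears in two adjacency sets.
    return sum(len(neighbours) for neighbours in graph.values()) // 2


# Made-up sample data, only to exercise the filters.
papers = [
    ["A", "B"],
    ["A", "B", "C"],
    ["C", "D"],
    ["A", "B", "C", "D", "E", "F"],
]
baseline = coauthor_graph(papers)
double_coauthorship = coauthor_graph(papers, min_shared_pubs=2)
number_of_authors = coauthor_graph(papers, author_limit=6)
print(edge_count(baseline), edge_count(double_coauthorship), edge_count(number_of_authors))
```

Both filters shrink the graph, which matches the direction of the author and edge counts in the table, although the sample data here is far too small to say anything about the DBLP numbers.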
  • 12. Replica Selection
      • Random
          – Avg Hops: 2.23
      • Node Degree
          – Highest number of edges
          – Avg Hops: 1.54
      • Community Node Degree
          – Highest degree within a community (i.e. no adjacent placement)
          – Avg Hops: 1.38
      • Clustering Coefficient
          – Highest likelihood that an author’s coauthors are also connected
          – Avg Hops: 2.62
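The four placement strategies on slide 12 can be sketched on a graph stored as `{author: set(coauthors)}`. Several details are assumptions rather than the authors' definitions: "Community Node Degree" is approximated as a greedy pick by degree that skips any node adjacent to an already chosen replica, and "Avg Hops" is taken to be the mean shortest-path distance from every reachable author to the nearest replica, with replica hosts counted as zero hops. All function names and the toy graph are hypothetical.

```python
import random
from collections import deque


def clustering_coefficient(graph, node):
    # Fraction of a node's neighbour pairs that are themselves connected.
    neighbours = sorted(graph[node])
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if neighbours[j] in graph[neighbours[i]]
    )
    return 2.0 * links / (k * (k - 1))


def select_replicas(graph, count, strategy, seed=0):
    nodes = sorted(graph)
    by_degree = sorted(nodes, key=lambda v: (-len(graph[v]), v))
    if strategy == "random":
        return random.Random(seed).sample(nodes, count)
    if strategy == "node_degree":
        return by_degree[:count]
    if strategy == "community_node_degree":
        chosen = []
        for candidate in by_degree:
            # Assumed reading of "no adjacent placement".
            if all(candidate not in graph[c] for c in chosen):
                chosen.append(candidate)
            if len(chosen) == count:
                break
        return chosen  # may hold fewer than count nodes on dense graphs
    if strategy == "clustering_coefficient":
        return sorted(nodes, key=lambda v: (-clustering_coefficient(graph, v), v))[:count]
    raise ValueError(f"unknown strategy: {strategy}")


def average_hops(graph, replicas):
    # Multi-source breadth-first search from all replica hosts at once.
    if not replicas:
        raise ValueError("at least one replica is required")
    distance = {r: 0 for r in replicas}
    queue = deque(replicas)
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distance:
                distance[neighbour] = distance[current] + 1
                queue.append(neighbour)
    return sum(distance.values()) / len(distance)


# Toy graph: "a" and "e" are adjacent hubs, "g" is a hub two hops from "a".
toy = {
    "a": {"b", "c", "d", "e"},
    "b": {"a"}, "c": {"a"}, "d": {"a"},
    "e": {"a", "f", "g"},
    "f": {"e"},
    "g": {"e", "h", "i"},
    "h": {"g"}, "i": {"g"},
}
for name in ("node_degree", "community_node_degree"):
    picked = select_replicas(toy, 2, name)
    print(name, picked, round(average_hops(toy, picked), 2))
```

On this toy graph the non-adjacent placement spreads replicas further apart and lowers the average hop count, which is the same direction as the slide's 1.54 versus 1.38, though a nine-node example is no evidence for the reported figures.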
  • 13. Results
      [Figure: three panels (Baseline, Double Coauthorship, Number of Authors) plotting Replica Hit Rate (%) against Number of Replicas (1 to 10) for Random, Node Degree, Community Node Degree, and Clustering Coefficient]
  • 14. Target users of a Social CDN
      1. Large collaborative project with multiple distributed participants
      2. Participants are able to provide some resources to the project
      3. Good overall connectivity between participants
      4. Different data set requirements for members of the collaboration
      5. Availability of data sets that can be co-hosted by other participants
      6. Varying sized data sets, not all of which may be able to fit in one place
  • 15. Summary
      • Data management across collaborations is difficult
          – Right place, right time, accessible to the right people
          – Complicated by size, security, availability, distance...
      • Social CDN
          – Builds upon the proven CDN model from other domains
          – Relies on user contributed edge nodes
          – Social overlay to incorporate trust and social replica selection
      • Future work
          – Analysis and formalization of trust as an enabler of collaboration
              o Further investigation into mechanisms to extract trustworthiness from scientific networks
          – Simulation of a wider range of attributes, such as data access algorithms, different research networks, and indicators of trust
          – Proof of concept implementation
  • 16. Thanks
      • Questions?
      • Kyle Chard: kyle@ci.uchicago.edu
      • http://www.facebook.com/SocialCloudComputing
      [Figure text: Resources are idle 40-95%; 1,000,000,000 Users; On average 190 friends; Users contribute to “good” causes]