Data source registration in the Virtual Laboratory
Marek Pomocka
Major: applied computer science. Specialisation: computer techniques in science and technology.
Faculty of Physics and Applied Computer Science, AGH University of Science and Technology
Supervisor: Marian Bubak, Ph.D.
Consultants: Piotr Nowakowski, M.Sc., and Daniel Harężlak, M.Sc.
Master’s thesis defense, November 13, 2009
Outline: Introduction to Grid technologies and Virtual Laboratories; Motivation and Objectives; Conceptual view of the solution; Challenges and solutions; Applications; Future work; Summary; References
Grid technologies and Virtual Laboratories
A Grid is a distributed computing architecture with cross-organizational access, providing nontrivial quality of service for participating actors.
Notable applications include high-energy physics (LHC), weather forecasting, natural disaster modelling, complex parameter studies in biomedicine and biochemistry, and digital image archives.
A Grid is a computer infrastructure dedicated to conducting in-silico research, created by many partners who share supercomputers, computer clusters, storage and research instruments to create a common space for e-Science. Centres shown on the slide: TASK, PCSS, ICM, WCSS, CYFRONET.
Grid users are Virtual Organizations (VOs), which are dynamic by nature. The VO approach simplifies access management. Centres shown on the slide: CYFRONET, PSNC.
Examples of Grids: EGEE, DEISA, TeraGrid, Open Science Grid.
Virtual Laboratories (VLs) supply higher-level services and abstract low-level details related to Grid service invocations, security, etc. away from end users. Many VLs endeavor to be general-purpose in-silico (or virtual) experiment design and execution environments, e.g. the GridSpace Virtual Laboratory. Layers shown on the slide: Virtual Laboratory, Grid middleware, Grid infrastructure.
Others are designed for a specific purpose, such as remote access to scientific instruments (e.g. VLAB), supporting research in meteorology (LEAD), or research and decision support in virology (ViroLab).
Virtual experiments in VLs are expressed using script-based languages (e.g. in GridSpace, Athena, Geodise), illustrated on the slide by an "if (condition) then … else … end" fragment, or using workflow languages (e.g. in VL-e, VLAB, myExperiment, myGrid Taverna, Kepler, Triana, Pegasus). VLs made Grids available to non-computer scientists. Diagram labels: Users, Virtual Laboratory, Grid.
Motivation and Objectives
"Hello, I’m a chemist. I use the Gaussian program and work mostly with files. I’d like to use Grids, but the filesystem is far too complex for me."
"... and the security system is complicated too."
"Yes, I agree. We won’t use Grids until there is an easy way of using Grid file catalogues from virtual experiments."
Objectives: The objective of the dissertation is to meet these needs by enabling access to LFC data sources from GridSpace scripts while concealing most interactions with the Grid Security Infrastructure (GSI). This goal entails several other objectives: reorganization of the Data Source Registry (DSR), integration with the GridSpace Engine (GSEngine), and extension of the DSR EPE plug-in. Components shown on the slide: DAC2, GSEngine, LFC DS.
Conceptual view of the solution
Challenges and solutions
Challenge: do not compromise GSEngine portability. Platforms named on the slide: Windows, Linux, Scientific Linux 4 (SL4), UNIX, Mac OS X. Solution: isolate platform-dependent code in a remote service. Components shown on the slide: GScript LFC integration, GSEngine LFC connector, LFC client library, LFC DS Server; the diagram divides them into a platform-independent part and a platform-dependent part.
Challenge: serve multiple users with the inherently single-user gLite libraries. Solution: ChemPo command wrappers; each command is run in a new JVM with a prepared UNIX environment. Instead of a permanent location for credentials (e.g. ~/.globus/), temporary files are used and their paths are specified dynamically in the UNIX environment of the created JVM processes. Diagram: the LFC DS Server (server JVM) starts Worker 1 JVM with Cert1 and Key1, and Worker 2 JVM with Cert2 and Key2.
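The per-user isolation idea can be sketched in Ruby, the language of the deck's script examples. This is only an illustration under stated assumptions, not the ChemPo wrapper code: the helper name run_with_credentials is hypothetical, a generic child process stands in for the worker JVM, and the variable names X509_USER_CERT and X509_USER_KEY are the ones conventionally read by Globus-based tools, assumed here rather than confirmed by the slides.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# Hypothetical helper: runs `command` in a fresh child process whose
# environment points at temporary, per-user credential files.
# The files are deleted as soon as the command has finished.
def run_with_credentials(cert_pem, key_pem, command)
  Tempfile.create("usercert") do |cert|
    Tempfile.create("userkey") do |key|
      cert.write(cert_pem)
      cert.flush
      key.write(key_pem)
      key.flush
      # Assumed variable names; only the child process sees them.
      env = { "X509_USER_CERT" => cert.path, "X509_USER_KEY" => key.path }
      output, status = Open3.capture2(env, *command)
      raise "command failed with status #{status.exitstatus}" unless status.success?
      output
    end
  end
end

# Usage: the child prints the name of the certificate file it was given.
child = [RbConfig.ruby, "-e", 'print File.basename(ENV.fetch("X509_USER_CERT"))']
puts run_with_credentials("dummy certificate", "dummy key", child)
```

Because every call builds its own environment and its own temporary files, two users served at the same time never share credential paths, which is the property the slide relies on.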
Challenge: enable access to Grid files without downloading them to the GSEngine machine. ChemPo command wrappers do not support such a mode of operation (streaming to the client). Solution: first download the file to the LFC DS Server, then stream it to the client. Sending a file to the Grid works the other way round: stream the file to the LFC DS Server, then send it to the Grid. The slide also names the Grid File Access Library (GFAL).
Streaming representation in GridSpace scripts. Solution: the user receives a modified version of the Ruby IO object. Sending a file to the Grid happens on the file close operation, while retrieving a file from the Grid happens during object initialization.

Reading a Grid file:

    ds.open("mpomocka/test_file", "r") do |file|
      file.each {|line| puts line}
    end

    f = ds.open("mpomocka/test_file", :r)
    f.each {|line| puts line}
    f.close

Writing to a Grid file:

    f = ds.open("mpomocka/test_file", :write)
    f.puts "First line of the file test_file"
    f.puts "Second line of the file test_file"
    f.close

Alternatively:

    ds.open("mpomocka/test_file", :w) do |f|
      f.puts "Another way to write to a file"
      f.puts "Note that close is not necessary"
    end
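One way such an IO-like object could behave is sketched below. It is a minimal illustration, not the DAC2 implementation: FakeGridStore, GridFile and FakeDataSource are hypothetical names, and an in-memory hash stands in for the LFC DS Server. The sketch only reproduces the two rules stated on the slide: content is fetched when the object is created and sent when it is closed.

```ruby
require "stringio"

# Hypothetical in-memory stand-in for remote Grid storage.
class FakeGridStore
  def initialize
    @files = {}
  end

  def fetch(path)
    @files.fetch(path) { raise Errno::ENOENT, path }
  end

  def store(path, data)
    @files[path] = data.dup
  end
end

# IO-like object: reads are served from content retrieved at
# initialization; written content is sent to the store on close.
class GridFile < StringIO
  def initialize(store, path, mode)
    @store = store
    @path = path
    @writing = %w[w write].include?(mode.to_s)
    super(@writing ? +"" : store.fetch(path).dup)
  end

  def close
    @store.store(@path, string) if @writing && !closed?
    super
  end
end

# Hypothetical data source offering the block and non-block forms of open.
class FakeDataSource
  def initialize(store = FakeGridStore.new)
    @store = store
  end

  def open(path, mode = "r")
    file = GridFile.new(@store, path, mode)
    return file unless block_given?

    begin
      yield file
    ensure
      file.close unless file.closed?
    end
  end
end

ds = FakeDataSource.new
ds.open("mpomocka/test_file", :w) { |f| f.puts "First line of the file test_file" }
ds.open("mpomocka/test_file", "r") { |f| f.each { |line| puts line } }
```

Subclassing StringIO keeps the familiar methods (puts, each, read) available without reimplementing them; a real client would replace the hash with remote calls.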
Need for a descriptive and intuitive API: method names mimic Ruby file operations (e.g. exist?, file?) and prefer descriptive names (e.g. create_directory instead of mkdir).

DAC2 LFC DS methods

    Method name                       Aliases
    createDirectory(parent, child)    create_directory
    createDirectory(path)             create_directory
    delete(path)                      delete_file, deleteFile
    deleteFile(filename)
    directory?(filename)              isDirectory, is_directory
    exist?(path)                      exist, exists, exist?
    file?(path)                       isFile, is_file
    getFile(filename)                 get_file
    getSize(path)                     size, size?, get_size
    listFiles(path)                   list_files
    openFile(path, mode, &b)          open, open_file
    storeFile(payload, filename)      store_file
    zero?(path)
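The alias scheme in the table maps naturally onto Ruby's alias_method. The sketch below is an assumption about how a few of the listed names could be exposed, not the DAC2 source: SketchDataSource is a hypothetical class backed by an in-memory hash of path to content, and only a subset of the methods is shown.

```ruby
# Hypothetical data source backed by a hash of path => content.
class SketchDataSource
  def initialize(files = {})
    @files = files
  end

  def file?(path)
    @files.key?(path)
  end

  # A path counts as a directory when some stored file lives beneath it.
  def directory?(path)
    prefix = path.end_with?("/") ? path : "#{path}/"
    @files.keys.any? { |stored| stored.start_with?(prefix) }
  end

  def exist?(path)
    file?(path) || directory?(path)
  end

  def getSize(path)
    @files.fetch(path).bytesize
  end

  def zero?(path)
    file?(path) && getSize(path).zero?
  end

  # Aliases as listed in the method table above.
  alias_method :exist, :exist?
  alias_method :exists, :exist?
  alias_method :isDirectory, :directory?
  alias_method :is_directory, :directory?
  alias_method :isFile, :file?
  alias_method :is_file, :file?
  alias_method :size, :getSize
  alias_method :size?, :getSize
  alias_method :get_size, :getSize
end

ds = SketchDataSource.new("mpomocka/test_file" => "abc")
puts ds.exists("mpomocka/test_file")
puts ds.is_directory("mpomocka")
puts ds.get_size("mpomocka/test_file")
```

Offering camelCase, snake_case and predicate-style names for the same operation lets script authors use whichever convention they already know.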
Security. Secure communication: tunnelling is simpler, whereas Transport Layer Security requires managing keystores. Credentials management: proxy certificates are generated with the Java CoG Kit; credentials are stored in the Data Source Registry (DSR) and can be set static, i.e. shared with other authenticated users.
The proxy certificate is generated automatically during initialization.
Information needs: the previous DSR structure did not allow storage of LFC data source information or gLite credentials. Solution: a reorganized schema. Schema entities shown on the slide: RelationalDataSources, DataSources, LFCDataSources, LFCCertData, LFCDSConnections. The DSR access modules of DAC2 and of the DSR EPE Plug-in were changed as well.
GUI for registering a data source of a new type: created as a new form in the EPE DSR Plug-in. In addition, some new DSR access methods were created in the DSR EPE Plug-in.
Selection of a distributed computing approach
Exchanging large files: how to avoid OutOfMemory errors? Solution: employ the RMIIO library (RemoteInputStream[Server] and RemoteOutputStream[Server] classes). The figure illustrates downloading a file to the client.
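The memory argument behind stream-based transfer can be shown with a short Ruby sketch. It does not use RMIIO (a Java library) and is not the thesis code; copy_in_chunks and the chunk size are hypothetical. The point is only that a transfer which moves one fixed-size chunk at a time needs memory proportional to the chunk, not to the file.

```ruby
require "stringio"

CHUNK_SIZE = 64 * 1024 # bytes moved per step (assumed value)

# Copies everything from `source` to `sink` one chunk at a time and
# returns the number of bytes copied. A single buffer is reused, so
# memory use stays bounded by the chunk size.
def copy_in_chunks(source, sink, chunk_size = CHUNK_SIZE)
  total = 0
  buffer = +""
  while source.read(chunk_size, buffer)
    sink.write(buffer)
    total += buffer.bytesize
  end
  total
end

# Usage with in-memory streams; File objects or sockets work the same way.
source = StringIO.new("example payload " * 1000)
sink = StringIO.new
puts copy_in_chunks(source, sink, 1024)
```

Any object that responds to read(length, buffer) and write can take part, which is why the same loop serves both the download and the upload direction.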
Figure: sending a file from the client to the server. Additional benefits of using RMIIO: compressed socket-based communication and automatic retry.
The solution scales linearly. Figure: download and upload times for files of up to 2 GB when tested locally on the ChemPo server.
Applications: PL-Grid (Polish Infrastructure for Information Science Support in the European Research Space); Chemistry Portal (ChemPo).
Future work: finer-grained security; pseudo memory-mapped file API (Pseudo MMAP).
Summary
Delivered components: LFC DS Server, LFC DS client Java library, new DAC2 API, DAC2 LFC connector. DAC2 LFC DS methods (method name, then aliases): createDirectory(parent, child): create_directory; createDirectory(path): create_directory; delete(path): delete_file, deleteFile; deleteFile(filename); directory?(filename): isDirectory, is_directory; ….
Automated and transparent handling of Grid credentials; extended EPE DSR Plug-in; reorganized DSR schema.
References
[1] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. Poster presented as part of the Cracow Grid Workshop ’09, Krakow, Poland, 12-14 October 2009.
[2] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop – CGW’09, October 2009, Krakow, Poland. ACC-Cyfronet AGH. To appear.
[3] Lana Abadie et al., Grid-Enabled Standards-based Data Management. In Mass Storage Systems and Technologies, 2007. MSST 2007. 24th IEEE Conference on, pages 60–71, Sept. 2007.
[4] Marian Bubak et al., Virtual Laboratory for Collaborative Applications. In M. Cannataro (Ed.), Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, IGI Global, 2009.
[5] Matthias Assel et al., A Collaborative Environment Allowing Clinical Investigations on Integrated Biomedical Databases. In Tony Solomonides et al. (Ed.), Healthgrid Research, Innovation and Business Case; Proceedings of HealthGrid 2009, Studies in Health Technology and Informatics, vol. 147, IOS Press, ISSN 0926-9630, pp. 51-61.
[6] M. Malawski, T. Bartynski, and M. Bubak, "Invocation of operations from script-based grid applications," Future Generation Computer Systems, vol. In Press, Accepted Manuscript, 2009.
