Dissertation title and final project: Data source registration in the Virtual Laboratory. The subject of the thesis and the related project was the integration of EGEE/WLCG data sources into the GridSpace Virtual Laboratory (http://gs.cyfronet.pl/).
Poster presentation entitled Integrating EGEE Storage Services with the Virtual Laboratory:
http://www.plgrid.pl/en/pr_materials/posters
Dissertation available at http://virolab.cyfronet.pl/trac/vlvl#MasterofScienceThesesrelatedtoViroLab
Dissertation defense
1. Data source registration in the Virtual Laboratory. Marek Pomocka; major: applied computer science; specialisation: computer techniques in science and technology. Faculty of Physics and Applied Computer Science, AGH University of Science and Technology. Supervisor: Marian Bubak, Ph.D. Consultants: Piotr Nowakowski, M.Sc.; Daniel Harężlak, M.Sc. Master’s thesis defense, November 13, 2009.
2. Outline: Introduction to Grid technologies and Virtual Laboratories; Motivation and Objectives; Conceptual view of the solution; Challenges and solutions; Applications; Future work; Summary; References.
5. Notable applications include: high-energy physics (LHC); weather forecasting; natural disaster modelling; complex parameter studies in biomedicine and biochemistry; digital image archives.
6. The Grid is a computer infrastructure dedicated to conducting in-silico research, created by many partners who share supercomputers, computer clusters, storage and research instruments. (Figure labels: TASK, PCSS, ICM, WCSS, CYFRONET.)
10. Virtual Laboratories (VLs) supply higher-level services and abstract low-level details related to Grid service invocations, security etc. away from end users. Many VLs endeavor to be a general-purpose in-silico (or virtual) experiment design and execution environment, e.g. GridSpace Virtual Laboratory. (Figure layers: Virtual Laboratory, Grid middleware, Grid infrastructure.)
11. Others are often designed for a specific purpose, such as: remote access to scientific instruments (e.g. VLAB); supporting research in meteorology (LEAD); research and decision support in virology (ViroLab).
12. Virtual experiments in VLs are expressed using script-based languages (e.g. in GridSpace, Athena, Geodise), for example: if (condition) then … else … end …, or using workflow languages (e.g. in VL-e, VLAB, myExperiment, myGrid Taverna, Kepler, Triana, Pegasus). VLs made Grids available to non-computer scientists. (Figure labels: Virtual Laboratory, Grid, Users.)
14. Hello, I’m a chemist. I use the Gaussian program and work mostly with files. I’d like to use Grids, but the filesystem is far too complex for me. ... the security system is complicated too. Yes, I do agree. We won’t use Grids until there is an easy way of using Grid file catalogues from virtual experiments.
15. Objectives. The objective of the dissertation is to meet these needs by enabling access to LFC data sources from GridSpace scripts while concealing most interactions with the Grid Security Infrastructure (GSI). This goal entails several other objectives: Data Source Registry reorganization; integration with GridSpace Engine; extending the DSR EPE plug-in. (Figure labels: DAC2, GSEngine, LFC DS.)
18. Not to compromise GSEngine portability. Platforms shown: Windows, Linux, Scientific Linux 4 (SL4), UNIX, Mac OS X. Solution: isolation of platform-dependent code into a remote service. (Figure labels: GScript, LFC integration, GSEngine, LFC connector, LFC client library, LFC DS Server; platform independent, platform dependent.)
19. Serve multiple users while utilizing inherently single-user gLite libraries. Solution: ChemPo command wrappers; each command is run in a new JVM with a prepared UNIX environment. Instead of a permanent place for credentials (e.g. ~/.globus/), use temporary files and specify their paths dynamically in the UNIX environment of the created JVM processes. (Figure: LFC DS Server (Server JVM); Worker 1 JVM with Cert1, Key1; Worker 2 JVM with Cert2, Key2.)
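The per-user isolation described on this slide can be illustrated with a minimal Ruby sketch. It is not the ChemPo wrapper code: a plain child process stands in for the new JVM, and the helper name run_with_credentials and the CHILD command are hypothetical. X509_USER_CERT and X509_USER_KEY are the environment variables conventionally read by GSI tools; whether the wrappers use exactly these is an assumption.

```ruby
require "open3"
require "rbconfig"
require "tempfile"

# Hypothetical helper: runs one command in a fresh child process whose
# environment points at temporary copies of a single user's credentials.
def run_with_credentials(cert_pem, key_pem, command)
  output = nil
  Tempfile.create("usercert") do |cert|
    Tempfile.create("userkey") do |key|
      cert.write(cert_pem)
      cert.flush
      key.write(key_pem)
      key.flush
      # Only the child sees these paths; the parent environment is untouched.
      env = { "X509_USER_CERT" => cert.path, "X509_USER_KEY" => key.path }
      output, status = Open3.capture2(env, *command)
      raise "command failed: #{command.inspect}" unless status.success?
    end
  end
  # Both temporary files are deleted once the blocks above have finished.
  output
end

# Stand-in for a gLite command: it prints the certificate it was handed.
CHILD = [RbConfig.ruby, "-e", 'print File.read(ENV["X509_USER_CERT"])'].freeze

puts run_with_credentials("CERT-OF-USER-1", "KEY-OF-USER-1", CHILD)
puts run_with_credentials("CERT-OF-USER-2", "KEY-OF-USER-2", CHILD)
```

Because the paths travel only in the child's environment, two requests served at the same time never share a credentials directory.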
20. Enabling access to Grid files without downloading them to the GSEngine machine. First, download the file to the LFC DS Server; then, stream it to the client. Vice versa for sending a file to the Grid, i.e. stream the file to the LFC DS Server, then send it to the Grid. Grid File Access Library (GFAL); ChemPo command wrappers do not support such a mode of operation (streaming to the client).
21. Streaming representation in GridSpace scripts. Solution: the user receives a modified version of the Ruby IO object (sending a file to the Grid happens on the file close operation, while retrieving a file from the Grid happens during object initialization).

Reading a Grid file:

    ds.open("mpomocka/test_file", "r") do |file|
      file.each {|line| puts line}
    end

    f = ds.open("mpomocka/test_file", :r)
    f.each {|line| puts line}
    f.close

Writing to a Grid file:

    f = ds.open("mpomocka/test_file", :write)
    f.puts "First line of the file test_file"
    f.puts "Second line of the file test_file"
    f.close

Alternatively:

    ds.open("mpomocka/test_file", :w) do |f|
      f.puts "Another way to write to a file"
      f.puts "Note that close is not necessary"
    end
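One way such an IO-like object could behave is sketched below. This is an illustration only, not the DAC2 implementation: GridFileSketch, open_grid_file and the FAKE_GRID hash (which stands in for the LFC DS Server) are hypothetical names, and StringIO replaces whatever buffer the real object uses.

```ruby
require "stringio"

# Hypothetical in-memory stand-in for the Grid storage behind the LFC DS Server.
FAKE_GRID = {}

# IO-like object: the content is fetched when the object is created for
# reading, and pushed back to the store when it is closed after writing.
class GridFileSketch < StringIO
  def initialize(store, path, mode)
    writing = %w[w write].include?(mode.to_s)
    # The "download" happens here, during object initialization.
    super(writing ? +"" : store.fetch(path).dup)
    @store = store
    @path = path
    @writing = writing
  end

  def close
    # The "upload" happens here, on the close operation.
    @store[@path] = string.dup if @writing && !closed?
    super
  end
end

# Block form closes the file automatically, as File.open does.
def open_grid_file(store, path, mode)
  file = GridFileSketch.new(store, path, mode)
  return file unless block_given?
  begin
    yield file
  ensure
    file.close
  end
end

open_grid_file(FAKE_GRID, "mpomocka/test_file", :w) { |f| f.puts "First line" }
open_grid_file(FAKE_GRID, "mpomocka/test_file", "r") { |f| f.each { |line| puts line } }
```

Deferring the transfer to close means a script can issue many small writes while only one upload is performed.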
22. Need for a descriptive (e.g. create_directory instead of mkdir) and intuitive API mimicking Ruby file operations (e.g. exist?, file?).

DAC2 LFC DS methods

    Method name                       Aliases
    createDirectory(parent, child)    create_directory
    createDirectory(path)             create_directory
    delete(path)                      delete_file, deleteFile
    deleteFile(filename)
    directory?(filename)              isDirectory, is_directory
    exist?(path)                      exist, exists, exist?
    file?(path)                       isFile, is_file
    getFile(filename)                 get_file
    getSize(path)                     size, size?, get_size
    listFiles(path)                   list_files
    openFile(path, mode, &b)          open, open_file
    storeFile(payload, filename)      store_file
    zero?(path)
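In Ruby, aliases like those in the table can be declared with alias_method. The sketch below shows the idea for three of the listed methods; LfcDataSourceSketch is a hypothetical class backed by a Hash, not the DAC2 connector, and folding the two createDirectory signatures into one optional argument is an assumption.

```ruby
# Hypothetical stand-in for the LFC data source object; a Hash plays the catalogue.
class LfcDataSourceSketch
  def initialize
    @entries = { "/" => :directory }
  end

  # Covers both createDirectory(path) and createDirectory(parent, child).
  def createDirectory(parent, child = nil)
    path = child ? File.join(parent, child) : parent
    @entries[path] = :directory
    path
  end
  alias_method :create_directory, :createDirectory

  def directory?(path)
    @entries[path] == :directory
  end
  alias_method :isDirectory, :directory?
  alias_method :is_directory, :directory?

  def exist?(path)
    @entries.key?(path)
  end
  alias_method :exist, :exist?
  alias_method :exists, :exist?
end

ds = LfcDataSourceSketch.new
ds.create_directory("/grid", "vo")
puts ds.directory?("/grid/vo")
puts ds.exists("/no/such/path")
```

Each alias resolves to the same method body, so Java-style and Ruby-style names stay in step.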
23. Security. Secure communication: tunnelling is simpler; Transport Layer Security requires managing keystores. Credentials management: proxy certificate generation (Java CoG Kit); credentials are stored in the Data Source Registry (DSR); credentials can be set static, i.e. shared with other authenticated users.
25. Information needs: the previous DSR structure did not enable storage of LFC data source information or gLite credentials. Solution (schema figure; table labels: RelationalDataSources, DataSources, LFCDataSources, LFCCertData, LFCDSConnections). Also, changes to the DAC2 and DSR EPE Plug-in DSR access modules.
26. GUI for registering data sources of the new type: created as a new form in the EPE DSR Plug-in. In addition, some new DSR access methods were created in the DSR EPE Plug-in.
28. Exchanging large files: how to avoid OutOfMemory errors? Solution: employ the RMIIO library (RemoteInputStream[Server] and RemoteOutputStream[Server] classes). The figure illustrates downloading a file to the client.
29. Figure: sending a file from client to server. Additional benefits of using RMIIO: compressed socket-based communication; automatic retry.
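The reason streaming avoids out-of-memory failures is that only one fixed-size chunk is held in memory at a time. RMIIO is a Java library and is not used here; the Ruby sketch below only illustrates that underlying idea, with a hypothetical helper copy_in_chunks and an arbitrarily chosen chunk size.

```ruby
require "stringio"

# Assumed chunk size; any bounded value keeps memory use constant.
CHUNK_SIZE = 64 * 1024

# Copies everything from source to sink, one chunk at a time, and returns
# the number of bytes transferred. The same buffer is reused for each read.
def copy_in_chunks(source, sink, chunk_size = CHUNK_SIZE)
  total = 0
  buffer = +""
  # read(length, buffer) returns nil at end of stream, which ends the loop.
  while source.read(chunk_size, buffer)
    sink.write(buffer)
    total += buffer.bytesize
  end
  total
end

source = StringIO.new("x" * 200_000)
sink = StringIO.new
puts copy_in_chunks(source, sink)
```

The same loop works with File or socket objects in place of StringIO; Ruby's built-in IO.copy_stream offers a similar service.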
30. The solution scales linearly. Figure: download and upload times for files up to 2 GB, tested locally on the ChemPo server.
31. Applications. PL-Grid: Polish Infrastructure for Information Science Support in the European Research Space; Chemistry Portal (ChemPo).
34. LFC DS Server; LFC DS client Java library; new DAC2 API; DAC2 LFC connector. (Figure: excerpt of the DAC2 LFC DS methods table: createDirectory(parent, child), createDirectory(path), delete(path), deleteFile(filename), directory?(filename), ….)
35. Automated and transparent handling of Grid credentials; extended EPE DSR Plug-in; reorganized DSR schema.
36. References
[1] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. Poster presented as part of the Cracow Grid Workshop ’09, Krakow, Poland, 12-14 October 2009.
[2] M. Pomocka, P. Nowakowski, and M. Bubak, Integrating EGEE Storage Services with the Virtual Laboratory. In Marian Bubak, Michał Turała, and Kazimierz Wiatr, editors, Proceedings of Cracow Grid Workshop – CGW’09, October 2009, Krakow, Poland. ACC-Cyfronet AGH. To appear.
[3] Lana Abadie et al., Grid-Enabled Standards-based Data Management. In Mass Storage Systems and Technologies, 2007. MSST 2007. 24th IEEE Conference on, pages 60–71, Sept. 2007.
[4] Marian Bubak et al., Virtual Laboratory for Collaborative Applications. In M. Cannataro (Ed.), Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine and Healthcare, Information Science Reference, IGI Global, 2009.
[5] Matthias Assel et al., A Collaborative Environment Allowing Clinical Investigations on Integrated Biomedical Databases. In Tony Solomonides et al. (Ed.), Healthgrid Research, Innovation and Business Case; Proceedings of HealthGrid 2009, Studies in Health Technology and Informatics, vol. 147, IOS Press, ISSN 0926-9630, pp. 51–61.
[6] M. Malawski, T. Bartynski, and M. Bubak, "Invocation of operations from script-based grid applications," Future Generation Computer Systems, vol. In Press, Accepted Manuscript, 2009.