1. Ferrara, Tuesday, January 27, 2015
Doctoral Programme in Mathematics and Computer Science
Università degli studi di Ferrara
2nd Year PhD Activities Report (2009)
Alfredo Pagano
Advisor: Prof. Eleonora Luppi
Coadvisor: Dr. Mario Reale
2. Present employment
Sysadmin in charge of installing, supporting, and maintaining servers
and core services at GARR
The aim of Consortium GARR is to plan, manage, and operate the
Italian National Research and Education Network,
implementing the most advanced technical solutions and services.
Networking Support Activity (EGEE-III SA2)
Enabling Grids for E-sciencE (EGEE) is Europe's leading grid
computing project, providing a computing support infrastructure for over
10,000 researchers world-wide, from fields as diverse as high energy
physics, earth and life sciences.
3. GARR-G: Backbone Physical Infrastructure
GARR-G: PoP-level topology
About 45 GARR PoPs
90% on university and research institution premises
the rest at telco operators and Internet eXchange premises
About 60 backbone links
mainly leased lines from 8 telco operators
Core links: 10 Gbps (STM-64) and 2.5 Gbps (STM-16)
Edge links: 34 Mbps, 155 Mbps and 622 Mbps
Peering links: 10 Gbps (STM-64 and 10 Gigabit Ethernet), 2.5 Gbps (STM-16) and 1 Gbps (Gigabit Ethernet)
Backbone capacity: 120 Gbps
Peering capacity: 40 Gbps
5. PhD subject
Motivation, Mission and Scope
The idea: pros and cons
Metrics collected
First results
Next steps
6. PhD subject
1st year activity – The research activity focused on exploiting and testing network monitoring tools to gather relevant metrics related to end-to-end performance in Grid infrastructures.
2nd year activity – Design and build a first working version of a Grid network monitoring tool based on Grid jobs.
7. Motivation (1/2)
Debugging networks for efficiency
an essential step for those wishing to run data-intensive applications
Optimizing the performance of Grid middleware and applications to make intelligent use of the network
adapting to changing network conditions
Supporting the Grid “utility computing” model
ensuring that the differing network performance required by particular Grid applications is provided, via measurable SLAs
8. Motivation (2/2)
A help for Site and Grid operations
Help diagnose performance problems among sites
This transfer is slow, what’s broken? – the network, the server, the middleware…
I can’t see site X – has the network gone down, or is it just a particular service or machine?
My application’s performance varies with the time of day – is there a network bottleneck?
Help diagnose problems within sites
Most network problems, especially performance issues, are not backbone related; they are in the “last mile”
Help with planning and provisioning decisions
Is an SLA I’ve arranged being adhered to by my providers?
For Grid services and middleware
I want to increase the performance of file transfers between sites
I want to know which compute site is “closest” to my data, to submit a job to it
9. The idea: pros and cons
Instead of installing a probe at each site, run a Grid job!
Added value:
No installation needed at the sites
Monitoring system running on a proven system (the Grid) & possibility to use Grid services
Direct use of Grid AuthN and AuthZ
Limits:
The job does not run with root privileges on the Worker Node (WN)
Some low-level operations are not permitted
Heterogeneity of the WN environments (OS, 32/64 bits, etc.)
Ex: making the job download and run an external binary may be tricky (unless it is written in an OS-independent programming language)
The system has to deal with the overhead of the Grid mechanisms (delays…)
10. The system in action
[Diagram] The central monitoring server program (CMSP) runs on a UI at site paris-urec-ipv6. Through the WMS, a job is submitted to the CE of each monitored site (A, B, C) and runs on a WN there. Each job opens a socket connection back to the CMSP, signals “Ready!”, and then receives probe requests such as “RTT test to site A” or “BW test to site B”.
11. Some remarks
The chosen design is more efficient than starting a job for each probe (considering delays)
The TCP connection is initiated by the job
No open port needed on the WN -> better for site security
An authentication mechanism is implemented between the job and the server
High scalability (back-end and front-end can be easily decoupled)
Since a job cannot run forever (GlueCEPolicyMaxWallClockTime), there are two jobs running at each site:
A ‘main’ one
A ‘redundant’ one, which waits and becomes ‘main’ when the other one ends
12. Round-Trip time, MTU and hop count tests
[Diagram] Over the job’s socket connection, the CMSP (on the UI at paris-urec-ipv6) sends a probe request such as “RTT test to site C” to the job running on a WN at site B; the job performs the test against the CE of site C and returns the probe result.
13. Round-Trip time, MTU and hop count tests
The ‘RTT’ measure is the time a TCP connect() call takes, because connect() involves a round trip of packets:
SYN -> 
SYN-ACK <- (round trip)
ACK -> (just sent, adds no network delay)
Results are similar to those of ‘ping’
The MTU is given by the IP_MTU socket option
The number of hops is calculated in an iterative way
All these measures require:
connecting to an accessible port (1) on a machine of the remote site
closing the connection (no data is sent)
Note: this connect/disconnect is detected in the application log
(1): we use the port of the gatekeeper of the CE, since it is known to be accessible (it is used by gLite)
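A minimal sketch of such a connect()-based probe, assuming Linux (the IP_MTU socket option is Linux-specific); the host, port and timeout are illustrative:

```python
import socket
import time

def rtt_and_mtu(host, port, timeout=5.0):
    # RTT estimate: a blocking connect() returns after one SYN/SYN-ACK
    # round trip; the final ACK is just sent and adds no network delay.
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    start = time.monotonic()
    sock.connect((host, port))
    rtt_ms = (time.monotonic() - start) * 1000.0
    # Path MTU of the connected route, via the Linux-only IP_MTU option.
    mtu = None
    if hasattr(socket, "IP_MTU"):
        mtu = sock.getsockopt(socket.IPPROTO_IP, socket.IP_MTU)
    sock.close()                      # no payload is ever sent
    return rtt_ms, mtu
```

Pointing this at the CE gatekeeper port, as above, gives ping-like RTTs without needing ICMP or root privileges on the WN.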
14. WN-to-WN BW test (may be obsoleted)
[Diagram] The CMSP first asks the job at site C to open a TCP port <p> on its WN; it then asks the job at site A to run “BW test to wn-site-C:<p>”. The job at site A opens a socket connection to that port and sends a big amount of data, and the probe result is returned to the CMSP.
15. GridFTP BW test
[Diagram] The CMSP asks the job at site A, over its socket connection, to run a “GridFTP BW test to site C”. The job triggers the replication of a big grid file between the SE of site A and the SE of site C, reads the GridFTP log file to obtain the transfer rate, and returns the probe result.
16. WN-to-WN BW test (under discussion)
It requires the remote site to allow incoming TCP connections to the WN
Not a best-practice security policy
Not always possible (WNs behind a NAT)
Workarounds are sometimes possible
WN-to-WN transfers do not reflect a real use case
The WN network connectivity may not be adequate
17. Metrics collected & scheduling
Latency test: ping, every 5 minutes
Hop list: traceroute, every 5 minutes
MTU size: socket (IP_MTU socket option), every 5 minutes
Achievable bandwidth: TCP throughput via a GridFTP transfer between two Storage Elements, every 8 hours
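One way to drive this mixed-interval schedule, sketched with the intervals listed above (the probe names are illustrative):

```python
# Intervals in seconds, taken from the schedule above.
SCHEDULE = {
    "latency": 300,         # ping, every 5 minutes
    "hop_list": 300,        # traceroute, every 5 minutes
    "mtu": 300,             # IP_MTU socket option, every 5 minutes
    "bandwidth": 8 * 3600,  # GridFTP transfer, every 8 hours
}

def due_probes(schedule, last_run, now):
    # Return the probes whose interval has elapsed since their last run;
    # probes that have never run are always due.
    return sorted(name for name, interval in schedule.items()
                  if now - last_run.get(name, float("-inf")) >= interval)
```

A driver loop would call due_probes() periodically, dispatch the corresponding requests to the jobs, and record the completion times in last_run.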
18. 8 Sites involved
A. Paris Urec CNRS
B. IN2P3 Lyon
C. INFN-CNAF
D. INFN-ROMA1
E. INFN-ROMA-CMS
F. GRISU-ENEA-GRID
G. INFN-BARI
H. INFN-CATANIA
19. Traceroute Paris-Catania
[pagano@ui-ipv6-testbed ~]$ traceroute grid005.ct.infn.it
traceroute to grid005.ct.infn.it (193.206.208.18), 30 hops max, 38 byte packets
1 194.57.137.190 (194.57.137.190) 1.589 ms 1.479 ms 2.696 ms
2 r-interco-urec.reseau.jussieu.fr (134.157.247.38) 0.331 ms 0.273 ms 0.348 ms
3 r-jusrap-reel.reseau.jussieu.fr (134.157.254.124) 0.368 ms 0.320 ms 0.318 ms
4 interco-6.01-jussieu.rap.prd.fr (195.221.127.181) 0.258 ms 0.330 ms 0.255 ms
5 * * *
6 te1-2-paris1-rtr-021.noc.renater.fr (193.51.189.230) 1.224 ms 1.127 ms 1.121 ms
MPLS Label=489 CoS=6 TTL=1 S=0
7 te0-0-0-3-paris1-rtr-001.noc.renater.fr (193.51.189.37) 1.182 ms 1.298 ms 1.150 ms
8 renater.rt1.par.fr.geant2.net (62.40.124.69) 1.154 ms 1.153 ms 1.128 ms
9 so-7-3-0.rt1.gen.ch.geant2.net (62.40.112.29) 9.950 ms 9.958 ms 10.018 ms
10 so-3-3-0.rt1.mil.it.geant2.net (62.40.112.210) 17.264 ms 17.314 ms 17.372 ms
11 garr-gw.rt1.mil.it.geant2.net (62.40.124.130) 17.369 ms 21.514 ms 17.368 ms
12 rt1-mi1-rt-mi2.mi2.garr.net (193.206.134.190) 17.555 ms 17.533 ms 17.646 ms
13 rt-mi2-rt-rm2.rm2.garr.net (193.206.134.230) 26.996 ms 27.071 ms 27.183 ms
14 rt-rm2-rt-rm1-l1.rm1.garr.net (193.206.134.117) 27.050 ms 27.160 ms 27.062 ms
15 rt-rm1-rt-ct1.ct1.garr.net (193.206.134.6) 44.854 ms 44.882 ms 44.820 ms
16 rt-ct1-ru-infngrid.ct1.garr.net (193.206.137.186) 45.115 ms 45.013 ms 45.009 ms
17 grid005.ct.infn.it (193.206.208.18) 45.014 ms 44.967 ms 44.913 ms
20. Frontend view:
LDAP authentication, based on the Google Web Toolkit (GWT) framework
21. Next steps:
1. Triggering system to alert site and network admins
2. Frontend improvements (plotting graphs)
3. Not only scheduled, but also on-demand measurements
…
24. GridFTP BW test
This test shows good results
If the GridFTP log file is not accessible (cf. dCache?)
we just do the transfer via globus-url-copy and measure the time it takes
this is slightly less precise
How many streams should we request on the command line?
globus-url-copy -p <num_streams> […]
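A sketch of this timing fallback, assuming the transfer size is known in advance; the throughput arithmetic is separated out so it can be reused:

```python
import subprocess
import time

def throughput_mbps(size_bytes, seconds):
    # Achieved throughput in Mbit/s for a transfer of size_bytes.
    return size_bytes * 8 / seconds / 1e6

def timed_gridftp_copy(src_url, dst_url, size_bytes, num_streams=4):
    # Fallback when the GridFTP log is unreadable: time the whole
    # globus-url-copy run. Slightly less precise, since process start-up
    # and authentication overhead are included in the measurement.
    start = time.monotonic()
    subprocess.run(["globus-url-copy", "-p", str(num_streams),
                    src_url, dst_url], check=True)
    return throughput_mbps(size_bytes, time.monotonic() - start)
```

-p sets the number of parallel TCP streams; its best value is exactly the open question raised above.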
25. Network Performance Factors
End System Issues
Network Interface Card and Driver, and their configuration
TCP and its configuration
Operating System and its configuration
Disk System
Processor speed
Bus speed and capability
Application, e.g. old versions of scp
Network Infrastructure Issues
Obsolete network equipment
Configured bandwidth restrictions
Topology
Security restrictions (e.g., firewalls)
Sub-optimal routing
Transport Protocols
Network Capacity and the influence of Others!
Many, many TCP connections
Congestion