Make some working directories for this exercise. For the rest of this exercise, all your work should be done in there.
First make a directory on terminable.ci.uchicago.edu.
$ mkdir dataex
$ cd dataex
Then use GRAM to create a working directory on osg-edu.cs.wisc.edu:
$ globus-job-run osg-edu.cs.wisc.edu mkdir /nfs/osgedu/YOURLOGIN
This will give you a directory on osg-edu.cs.wisc.edu which you can use for storing files in this exercise.
Next create some files of different sizes, to use for exercises:
$ dd if=/dev/zero of=smallfile-YOURLOGIN bs=1M count=10
$ dd if=/dev/zero of=mediumfile-YOURLOGIN bs=1M count=50
$ dd if=/dev/zero of=largefile-YOURLOGIN bs=1M count=200
$ ls -sh
total 261M
201M largefile-YOURLOGIN
51M mediumfile-YOURLOGIN
11M smallfile-YOURLOGIN
Use globus-url-copy to copy your small file from your dataex directory on terminable.ci.uchicago.edu to the working directory you created on osg-edu.cs.wisc.edu.
$ globus-url-copy file:///home/YOURLOGIN/dataex/smallfile-YOURLOGIN gsiftp://osg-edu.cs.wisc.edu/nfs/osgedu/YOURLOGIN/ex1
$ echo $?
0
The command echo $? checks to see what the return value was for the previous command. If you see a 0 (zero), then globus-url-copy succeeded. A different number indicates a problem. In that case you should also see an error message.
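The exit-status check described above can be wrapped in a small shell function so that a script notices when a transfer fails. This is only a sketch: run_checked is a hypothetical helper name, not part of Globus, and the demonstration uses the standard true and false commands as stand-ins for globus-url-copy so that it runs on any machine.

```shell
# run_checked: hypothetical helper (not part of Globus).
# Runs the given command, reports its exit status (the value that
# `echo $?` would print), and returns that status to the caller.
run_checked() {
    "$@"
    status=$?
    if [ "$status" -eq 0 ]; then
        echo "OK: $1 exited with status 0"
    else
        echo "FAILED: $1 exited with status $status" >&2
    fi
    return "$status"
}

# Stand-ins for a transfer that succeeds and one that fails.
run_checked true
run_checked false || echo "the previous command reported a problem"
```

In the exercise you would call it as run_checked globus-url-copy followed by the source and destination URLs.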
See how fast the file transfer is by using the -vb flag when copying the large file. Since this is a transfer over a local network [1] that should not be too busy, it should be fairly quick:
$ globus-url-copy -vb file:///home/YOURLOGIN/dataex/largefile-YOURLOGIN gsiftp://osg-edu.cs.wisc.edu/nfs/osgedu/YOURLOGIN/ex1
Source: file:///home/YOURLOGIN/dataex/
Dest: gsiftp://gridlab2.ci.uchicago.edu/home/YOURLOGIN/
largefile-YOURLOGIN -> ex1
207618048 bytes 8.81 MB/sec avg 9.09 MB/sec inst
A quick reminder on URL formats: we've seen two kinds of URLs so far.
file:///home/YOURLOGIN/dataex/largefile - a file called largefile on the local file system, in directory /home/YOURLOGIN/dataex/.
gsiftp://osg-edu.cs.wisc.edu/scratch/YOURLOGIN/ - a directory accessible via gsiftp on the host called osg-edu.cs.wisc.edu in directory /scratch/YOURLOGIN.
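To make the two URL forms concrete, the sketch below composes them from their parts. The function names file_url and gsiftp_url are hypothetical helpers introduced only for illustration. A file URL has an empty host part, which is why it shows three slashes in a row.

```shell
# Hypothetical helpers for composing the two URL forms described above.

file_url() {
    # $1 = absolute local path, e.g. /home/YOURLOGIN/dataex/largefile
    # "file://" plus a path starting with "/" gives the three slashes.
    printf 'file://%s\n' "$1"
}

gsiftp_url() {
    # $1 = GridFTP host name, $2 = absolute path on that host
    printf 'gsiftp://%s%s\n' "$1" "$2"
}

file_url /home/YOURLOGIN/dataex/largefile
gsiftp_url osg-edu.cs.wisc.edu /scratch/YOURLOGIN/
```

The two calls print the same URLs that are listed in the reminder.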
Try using 4 parallel data streams by adding the -p flag with an argument of 4.
Now that you're logged in to osg-edu.cs.wisc.edu, use the dd commands you issued earlier in the exercise to create the small, medium, and large files in your home directory on the Teragrid site.
Use the following globus-url-copy command to transfer the file from terminable.ci.uchicago.edu to osg-edu.cs.wisc.edu:
$ globus-url-copy -p 4 -vb file:///home/YOURLOGIN/dataex/smallfile-YOURLOGIN gsiftp://osg-edu.cs.wisc.edu/home/etrain99/data/ex1
Experiment with transferring different file sizes and numbers of parallel streams, to both local and remote sites, and see how the speed varies.
Next try a third-party transfer. You do this by specifying two gsiftp URLs, instead of one gsiftp URL and one file URL.
globus-url-copy will control the transfers but data will not pass through the local machine. Instead, it will go directly between the source and destination machines.
Transfer a file between two remote sites, and see if it is faster than if you had transferred it to terminable.ci.uchicago.edu and then back out again.
Try to make up a command line for this yourself. As noted above, it should use two gsiftp URLs instead of a file URL and a gsiftp URL.
Next use RFT, the reliable file transfer service, to transfer a block of files between two sites.
First, create a transfer job file, which lists some RFT parameters and all of the files to transfer. You can get an example from /soft/globus-4.0.3-r1/share/globus_wsrf_rft_client/transfer.xfr. Read through this and change the URLs at the end to refer to some files of your choice.
The RFT command and transfer job file documentation is here.
The example above lists one transfer in the last two lines of the file: from the local machine to itself, transferring the file /tmp/rftTest.tmp to rftTest_Done.tmp. You should change the two gsiftp URLs to two other gsiftp URLs. For example, you could use the URLs that were used in the previous GridFTP exercise.
You can launch an RFT transfer as follows. The client will periodically output the transfer status. You can watch jobs move from the pending state to the active state and then to the finished state.
$ cp /soft/globus-4.0.3-r1/share/globus_wsrf_rft_client/transfer.xfr rft.xfr
$ vi rft.xfr
... make your changes ...
$ rft -h terminable.ci.uchicago.edu -f ./rft.xfr
Number of transfers in this request: 3
Subscribed for overall status
Termination time to set: 60 minutes
Overall status of transfer: Finished/Active/Failed/Retrying/Pending 0/1/0/0/2
Overall status of transfer: Finished/Active/Failed/Retrying/Pending 1/0/0/0/2
Overall status of transfer: Finished/Active/Failed/Retrying/Pending 1/1/0/0/1
Overall status of transfer: Finished/Active/Failed/Retrying/Pending 2/0/0/0/1
Overall status of transfer: Finished/Active/Failed/Retrying/Pending 2/1/0/0/0
Overall status of transfer: Finished/Active/Failed/Retrying/Pending 3/0/0/0/0
All Transfers are completed
Initially all jobs start in the pending state, move to the active state, and then hopefully reach the finished state. If a transfer fails, it goes to the failed state instead.
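The five slash-separated counters in each status line are easier to follow once they are labelled. The sketch below is one possible way to do that with awk. rft_counts is a hypothetical helper name, and it assumes only that the counters appear as a single field of the form N/N/N/N/N, in the order Finished/Active/Failed/Retrying/Pending shown in the output above.

```shell
# rft_counts: hypothetical helper (not part of Globus).
# Reads rft client output on stdin and, for every field that looks like
# five slash-separated integers, prints the counters with their labels.
rft_counts() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+\/[0-9]+$/) {
                split($i, c, "/")
                printf "finished=%s active=%s failed=%s retrying=%s pending=%s\n", c[1], c[2], c[3], c[4], c[5]
            }
        }
    }'
}

# Sample status line copied from the output above.
echo "Overall status of transfer: Finished/Active/Failed/Retrying/Pending 2/1/0/0/0" | rft_counts
```

In the exercise you could pipe the output of the rft command through rft_counts in the same way.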
The transfer file has a number of options, documented in-line. You can experiment with changing them. Interesting things to try:
Add more URLs to transfer
Transfer between two remote sites
Use parallel streams
Increase the transfer concurrency
In particular you should check that you understand the difference between parallel streams (the number of streams used when transferring one file) and concurrency (the number of files that can be transferred at once).
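One way to see how the two settings combine is to work out how many data streams could be open at once. The sketch below assumes that every concurrently transferred file uses the full number of parallel streams, so the upper bound is simply the product of the two values. max_data_streams is a hypothetical helper name used only for this illustration.

```shell
# max_data_streams: hypothetical helper for illustration only.
# $1 = concurrency (number of files transferred at once)
# $2 = parallel streams used for each file
# Prints the upper bound on data streams open at the same time,
# assuming each concurrent file uses all of its parallel streams.
max_data_streams() {
    echo $(( $1 * $2 ))
}

# 3 files at once, 4 streams per file
max_data_streams 3 4
```

With a concurrency of 3 and 4 parallel streams, this prints 12.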
The above sections have dealt with moving data around, and always made the assumption that you knew where the files you wanted were located.
Next we will deal with the Replica Location Service (RLS). Before querying it, use globus-rls-admin to ping the RLS server on terminable.ci.uchicago.edu and check that it is online:
$ globus-rls-admin -p rls://terminable.ci.uchicago.edu
ping rls://terminable.ci.uchicago.edu: 0 seconds
Now perform a simple query for an example logical filename that has been placed in the RLS by the instructors:
$ globus-rls-cli rls://terminable.ci.uchicago.edu
rls> query lrc lfn example
example: gsiftp://terminable.ci.uchicago.edu/scratch/example
example: gsiftp://gridlab2.ci.uchicago.edu/scratch/example
This queries for a logical filename example. The results show that this file can be retrieved via either of two URLs (one in scratch space on terminable.ci.uchicago.edu, and one in scratch space on gridlab2.ci.uchicago.edu).
Now try querying for logical filename another-example.
You can also publish your own logical filename into the RLS, with mappings to physical files, using the create command:
rls> create YOURLOGIN-first-lfn gsiftp://terminable.ci.uchicago.edu/home/YOURLOGIN/dataex/largefile-YOURLOGIN
This creates an LFN called YOURLOGIN-first-lfn and then adds a mapping to gsiftp://terminable.ci.uchicago.edu/home/YOURLOGIN/dataex/largefile-YOURLOGIN.
rls> query lrc lfn YOURLOGIN-first-lfn
YOURLOGIN-first-lfn: gsiftp://terminable.ci.uchicago.edu/home/YOURLOGIN/dataex/largefile-YOURLOGIN
Now copy largefile to another place (on another gridlab machine or on one of the remote sites), and register it into the RLS with the same LFN. You will need to use the add command instead of the create command, because the LFN already exists and you just need to add a new mapping.
Get a neighbour to query the RLS for your logical filename, and see that the mappings you have made are public for everyone to see.
So far, you have only been using the RLS server on terminable.ci.uchicago.edu. There are servers running on other machines.
Use globus-rls-admin to ping the RLS server on gridlab2.ci.uchicago.edu and check that it is online.
Then, connect to one of the other servers using globus-rls-cli and query for the example LFN that we used above. You should see that there are some other locations from which you can get the example file.
Try adding your own LFN into one of the other servers, using globus-rls-cli.
So far, you have interacted with the Local Replica Catalog on each installation. This is where LFNs and mappings are created.
RLS has another component, called a Replica Location Index. This gathers information from several LRCs, so that you can find LFN mappings from all of those LRCs in one place.
There is an RLI on terminable.ci.uchicago.edu that gathers information from all of the tutorial LRCs.
You can query it like this:
rls> query rli lfn example
example: rls://terminable.ci.uchicago.edu:39281
What comes back from the RLI is not a list of physical files. Instead, it is a list of LRCs that have some information about the requested LFN.
To find all the replicas, you need to query each of the listed LRCs in turn. In the output above, only terminable.ci.uchicago.edu is listed. During the tutorial, you will hopefully find that other LRCs also know something about the example LFN.
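The first half of that two-step lookup, pulling the list of LRCs out of the RLI answer, can be sketched with awk. lrcs_for_lfn is a hypothetical helper name. It assumes the RLI output consists of lines of the form "LFN: rls://host:port", as in the output above. The second sample line below, naming gridlab2.ci.uchicago.edu, is invented input for the demonstration and is not real tutorial output.

```shell
# lrcs_for_lfn: hypothetical helper (not part of Globus).
# Reads "query rli lfn ..." output on stdin and prints each distinct
# LRC URL listed for the logical filename given as $1.
lrcs_for_lfn() {
    awk -v lfn="$1" '$1 == (lfn ":") && $2 ~ /^rls:\/\// { print $2 }' | sort -u
}

# Sample input: the first line is from the output above, the second is
# made up for illustration, and the third is a repeat to show de-duplication.
printf '%s\n' \
    'example: rls://terminable.ci.uchicago.edu:39281' \
    'example: rls://gridlab2.ci.uchicago.edu:39281' \
    'example: rls://terminable.ci.uchicago.edu:39281' |
    lrcs_for_lfn example
```

Each URL printed is an LRC that you would then query with query lrc lfn example to obtain the physical file locations.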
Although it may not happen immediately, the RLIs will also learn about the logical filenames that you have created for yourself.
Query the RLI to see if the logical names that you added above have appeared in the RLI. If they haven't yet, wait a while and try again.
Next use the -S option to check the status/statistics of each of the two servers. You should see output similar to that below:
$ globus-rls-admin -S rls://terminable.ci.uchicago.edu
Version: 2.1.5
Uptime: 00:28:15
LRC stats
update method: lfnlist
update method: bloomfilter
updates bloomfilter: rls://gridlab2.ci.uchicago.edu:39281 last 06/21/04 22:44:45
lfnlist update interval: 86400
bloomfilter update interval: 900
numlfn: 1
numpfn: 1
nummap: 1
RLI stats
updated by: rls://gridlab2.ci.uchicago.edu:39281 last 06/21/04 22:44:35
updated via bloomfilters
$ globus-rls-admin -S rls://gk2
Version: 2.1.5
Uptime: 00:32:33
LRC stats
update method: lfnlist
update method: bloomfilter
updates bloomfilter: rls://gridlab2.ci.uchicago.edu:39281 last 06/21/04 22:44:40
lfnlist update interval: 86400
bloomfilter update interval: 900
numlfn: 2
numpfn: 2
nummap: 2
RLI stats
updated by: rls://gridlab2.ci.uchicago.edu:39281 last 06/21/04 22:44:49
updated via bloomfilters