2 GridFTP and file transfer
To use GridFTP for file transfer, you need a GridFTP client program that provides the interface between the user and a remote GridFTP server. Several GridFTP clients are available. One of them is globus-url-copy, a command line tool that can transfer files using the GridFTP protocol as well as other protocols such as http and ftp. globus-url-copy is distributed with the Globus Toolkit and is usually available on machines that have the Globus Toolkit installed.
The basic syntax of the globus-url-copy command is:
globus-url-copy [options] sourceURL destinationURL
where the arguments are described in the following table.
|[options]||The optional command line switches, as described in 2.3 Command line options for globus-url-copy.|
|sourceURL||The URL of the file(s) to be copied. If it is a directory, it must end with a slash (/), and all files within that directory will be copied.|
|destinationURL||The URL to which to copy the file(s). To copy several files to one destination URL, destinationURL must be a directory and must be terminated with a slash (/).|
globus-url-copy supports multiple protocols, so the source and destination URLs can take either of two forms:
file://<path>
when you refer to a local file or directory, or
gsiftp://<hostname>[:<port>]/<path>
when you refer to a remote file or directory. Although globus-url-copy also supports other protocols such as http, https and ftp, in the DEISA infrastructure it is only possible to use the GridFTP protocol: gsiftp://
The port number can be omitted if the GridFTP server listens on the default port, 2811.
The path:
- must be an absolute path for file://
- can be relative to the user’s home directory for gsiftp://, in which case it must start with ~
- must be terminated with a slash (/) if it refers to a directory.
To transfer data with globus-url-copy using the gsiftp:// protocol, the user must have valid credentials, as will be described below. Normally you will use file:// to address a local file or directory and gsiftp:// to address a remote one. Note, however, that the GridFTP protocol supports so-called third-party transfers, in which data is transferred between two remote servers. In this case you have to use gsiftp:// for both the source and the destination URL.
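As a sketch of the URL rules above, the following shell fragment assembles gsiftp:// URLs and prints (rather than runs) the resulting globus-url-copy command lines for an upload and for a third-party transfer. The helper gsiftp_url, the host names gridftp.site-a.example.org and gridftp.site-b.example.org, the port 2812 and all file paths are hypothetical placeholders, not real DEISA values.

```shell
#!/bin/sh
# Hypothetical helper: build a gsiftp:// URL from host, path and optional port.
# The port is omitted when it equals the GridFTP default (2811).
gsiftp_url() {
    host=$1
    path=${2#/}            # drop one leading slash; it is re-added below
    port=${3:-2811}
    if [ "$port" = "2811" ]; then
        printf 'gsiftp://%s/%s\n' "$host" "$path"
    else
        printf 'gsiftp://%s:%s/%s\n' "$host" "$port" "$path"
    fi
}

# Placeholder hosts and paths, not real DEISA servers.
SITE_A=$(gsiftp_url gridftp.site-a.example.org '~/target.txt')
SITE_B=$(gsiftp_url gridftp.site-b.example.org /scratch/target.txt 2812)

# Dry run: print the commands instead of executing them.
echo "globus-url-copy file:///tmp/source.txt $SITE_A"   # local to remote
echo "globus-url-copy $SITE_A $SITE_B"                  # third-party transfer
```

Removing the echo in front of a line would run the transfer, provided valid credentials are in place and the placeholders are replaced with real servers and paths.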
We present the most important command line options. For a much more comprehensive description of available options, see the documentation on the Globus website http://www.globus.org/.
The optional parameters in the table below give you additional information:
|-help||Prints usage information for the globus-url-copy program.|
|-version||Prints the version of the globus-url-copy program.|
|-vb||During the transfer, displays: (1) the number of bytes transferred, (2) the performance since the last update (every 5 seconds), and (3) the average performance for the whole transfer.|
The following table lists parameters which you can set to optimize the performance of your data transfer:
|-tcp-bs <size>||Specifies the size (in bytes) of the TCP buffer to be used by the underlying GridFTP data channels.|
|-p <number of parallel streams>||Specifies the number of parallel streams to be used in the GridFTP transfer.|
|-stripe||Use this parameter to initiate a “striped” GridFTP transfer that uses more than one node at the source and destination. As multiple nodes contribute to the transfer, each using its own network interface, more of the network bandwidth can be used than with a single system. Thus, at least for “big” (> 100 MB) files, striping can considerably improve performance.|
How to choose values for these parameters?
For the first two parameters, TCP buffer size and parallelism, the optimal values generally depend on factors such as the latency between the source and destination sites, the available bandwidth and the network traffic. Some of these factors are fixed and can be measured (for instance, you can measure the latency yourself using ping); others, such as the limiting bandwidth, are known only to the network administrators at the various DEISA sites. As a rule of thumb, however, we recommend the following values:
- four parallel streams should be enough.
- for the typical latencies that occur in the DEISA network use 4MB for the TCP buffer size.
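Put together, the rule-of-thumb values give a command line like the one this sketch prints. Because -tcp-bs takes a size in bytes, 4MB is written here as 4194304, which assumes 1 MB = 1024 × 1024 bytes. The server name gridftp.example.org and the file paths are hypothetical placeholders, and the command is echoed rather than run.

```shell
#!/bin/sh
STREAMS=4                        # rule-of-thumb number of parallel streams
TCP_BS=$((4 * 1024 * 1024))      # 4 MB expressed in bytes (assumes 1 MB = 2^20 bytes)

SRC="file:///tmp/bigfile.dat"                       # placeholder local file
DST="gsiftp://gridftp.example.org/~/bigfile.dat"    # hypothetical server

# Dry run: build and print the command instead of executing it.
CMD="globus-url-copy -p $STREAMS -tcp-bs $TCP_BS $SRC $DST"
echo "$CMD"
```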
If you plan many transfers of big files, it may be worth varying these values to see how they influence performance. For instance, a TCP buffer larger than the recommended size could improve throughput between sites with a higher latency. However, a larger buffer also uses more memory, which may in turn affect the transfer.
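One common starting point when experimenting with the buffer size is the bandwidth-delay product, that is, the bandwidth of the path multiplied by the round-trip time. This is a general TCP tuning heuristic rather than something prescribed by this guide. The sketch below does the arithmetic in shell; the function name bdp_bytes and the sample figures (1000 Mbit/s, 32 ms) are illustrative assumptions, not measured DEISA values.

```shell
#!/bin/sh
# bdp_bytes <bandwidth in Mbit/s> <round-trip time in ms>
# bytes = (Mbit/s * 1,000,000 / 8) * (ms / 1000) = Mbit/s * ms * 125
bdp_bytes() {
    echo $(( $1 * $2 * 125 ))
}

# Assumed example: a 1000 Mbit/s path with a 32 ms round-trip time.
bdp_bytes 1000 32   # prints 4000000
```

With these assumed figures the result is about 4 MB, the same order as the rule-of-thumb value; the round-trip time itself can be read from the output of ping.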
Striping is currently supported at the following DEISA sites: CINECA, IDRIS, RZG and SARA.
To make GridFTP easier to use for DEISA users, we have deployed a wrapper script around the globus-url-copy command, called gscp, on all DEISA sites. The tool reads static parameters such as server names, port numbers and the optimal TCP buffer size from a configuration file. It pre-sets most of the values for you, but leaves you free to override them. You can use the same site names as in the deisa_service script. For example, to copy a file (source.txt) from the current directory to the home directory on SARA's GridFTP server, the command would be:
gscp source.txt sara:target.txt
To copy a file (source.txt) from SARA to the current directory, the command would be:
gscp sara:source.txt target.txt
For more options please see