Condor-G NMI Release April 2002

Overview
--------

Condor-G is a program that manages both a queue of jobs and the
resources from one or more sites where those jobs can execute. It
communicates with these resources and transfers files to and from these
resources using Globus mechanisms. (In particular, Condor-G uses the GRAM      
protocol for job submission, and runs a local GASS server for file transfers).  

It may seem like Condor-G is a simple replacement for the Globus toolkit's
"globusrun" command. However, Condor-G allows you to submit many jobs at
once and then to monitor those jobs with a convenient interface, receive
notification when jobs complete or fail, and maintain your Globus
credentials which may expire while a job is running. On top of this,
Condor-G is a fault-tolerant system--if your machine crashes, you can still
perform all of these functions when your machine returns to life.

Condor-G is derived from the Condor system, and all of the tools that have
been built on top of Condor will work with Condor-G. For example, the Condor
DAGMan can manage job inter-dependance and provide fault-tolerance at a
higher level.

Installation
------------
This distribution is packaged using the Grid Packaging Technology
(GPT). To install it, you must first install the GPT. You can obtain GPT
2.0 from the following locations:

ftp://ftp.nsf-middleware.org/pub/tools/src/gpt-2.0.tar.gz

GPT Requirements:

The Grid Packaging Tools are written in perl. An installation of Perl
5.005 or greater is required on your system. Perl can be downloaded from 
www.perl.com. 

Setting Up a Packaging Environment:

There is one environment variable (GLOBUS_LOCATION) required to use the
packaging framework. This is the location where the output of 
builds/install of Globus packages will be placed. You can switch between
multiple installations by changing the value of $GLOBUS_LOCATION.

There is an optional environment variable (GPT_LOCATION) that will affect
the packaging infrastructure. This can be used to have the packaging tools
installed in a separate location from Globus packages. You do not have to
however, define GPT_LOCATION. If you do not, the packaging tools will be
installed behind GLOBUS_LOCATION, as is most commonly done. 

In the instructions below, we show how to build if you have set both
GPT_LOCATION and GLOBUS_LOCATION. However, we do not expect that every
user will want to maintain separate directories for Globus and its
packaging toolkit. If you would like, simply replace GPT_LOCATION with
GLOBUS_LOCATION throughout the instructions below to install both the
packaging toolkit as well as the Globus toolkit in the same location, as
is most commonly done. That way, you will only have to set one environment
variable, GLOBUS_LOCATION. 

Installing GPT:

This step is required for both binary and source bundles. Untar the
distribution and enter the following commands.

    % cd gpt-2.0
    % ./build_gpt

Note: If your perl 5.005 executable is not named "perl" or is not in your
command search path, add --with-perl={perl-cmd} to the build_gpt command
to identify the perl executable to be used by the packaging tools. 

All of the perl libraries will be installed in $GPT_LOCATION/lib/perl. All
of the scripts will be installed into $GPT_LOCATION/sbin. (If 
$GPT_LOCATION is not set, $GLOBUS_LOCATION will be used for both of the
above locations.) 

Installing Condor-G Basic
-------------------------
Once the GPT is installed and the GLOBUS_LOCATION (and possibly 
GPT_LOCATION) environment variable is set, you can install the Condor-G
Basic bundle. The following commands assume you're installing the linux
version. Change names as appropriate for other platforms.

    % $GPT_LOCATION/sbin/gpt-install \  
        Condor-G_linux_bundle.tar.gz

Note: Remember to replace GPT_LOCATION with GLOBUS_LOCATION if you did not
create a separate packaging directory.

    % $GPT_LOCATION/sbin/gpt-postinstall

Condor-G includes daemons that must be running on the local machine for
proper operation. Run the following command as root to start the daemons.
Condor-G can be run as a non-root service if it will only be used by a
single user. In that case, run this command as that user.

    % $GLOBUS_LOCATION/sbin/SXXcondor start

To ensure that the Condor-G daemons are restarted automatically when the
machine reboots, copy the script $GLOBUS_LOCATION/sbin/SXXcondor into the
system's startup scripts directory (e.g. /etc/rc.d).

A simple way to tell if Condor-G is running [properly] is to try to access
the job queue.

    % $GLOBUS_LOCATION/bin/condor_q
    
You should see something like the following output:
    
-- Submitter: nostos.cs.wisc.edu : <128.105.165.29:34896> : nostos.cs.wisc.edu
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               

0 jobs; 0 idle, 0 running, 0 held


Examples
--------

A job is submitted for execution to Condor using the condor_submit command.
condor_submit takes as an argument the name of a file called a submit
description file. The following sample submit description file will run a job
on the Origin2000 at NCSA.

    executable = test-progam
    globusscheduler = modi4.ncsa.uiuc.edu/jobmanager
    universe = globus
    output = test.out
    log = test.log
    queue

The executable for this example is transferred from the local machine to the
remote machine. By default, Condor-G transfers the executable. Note that this
executable must be compiled for the correct platform.

The universe is set to globus. A "universe" tells Condor-G how to start and run 
the job. By setting it to globus, Condor-G will use the Globus Toolkit(tm)
to run your job.
 
The globusscheduler tells Condor-G which Globus resource it should use to
run your job. In the example above, it will use the default jobmanager on
the Origin2000 at NCSA.

No input file is specified for the job. Condor transfers the standard output
produced from the remote machine to the local machine, and puts it into 
the file test.out  after the job completes.  The log file is maintained on 
the local machine. Condor-G will write periodic updates about the progress of 
your job into this file.

To submit this job to Condor-G for execution on the remote machine, use

    % condor_submit test.submit

Where test.submit is the name of the submit description file. Example output
from condor_q for this submission looks like:


    % condor_q

-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD               
   7.0   epaulson        3/26 14:08   0+00:00:00 I  0   0.0  test

1 jobs; 1 idle, 0 running, 0 held


After a short time, Globus accepts the job. Again running condor_q will now
result in


    % condor_q

-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
   7.0   epaulson        3/26 14:08   0+00:00:00 R  0   0.0  test

1 jobs; 0 idle, 1 running, 0 held


Then, very shortly after that, the queue will be empty again:


    % condor_q

-- Submitter: wireless48.cs.wisc.edu : <128.105.48.148:33012> : wireless48.cs.wi
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD

0 jobs; 0 idle, 0 running, 0 held


A second example of a submit description file runs the Unix ls program on a
different Globus resource.

    executable = /bin/ls
    Transfer_Executable = false
    globusscheduler = vulture.cs.wisc.edu/jobmanager
    universe = globus
    output = ls-test.out
    log = ls-test.log
    queue

In this example, the executable (the binary) is pre-staged. The executable is
on the remote machine, and it is not to be transferred before execution. The
command

    Transfer_Executable = FALSE

within the submit description file identifies the executable as being
pre-staged. In this case, the executable command gives the pathname to the
executable on the remote machine.

Condor-G can set environment variables for your job. Save the following
perl script as "env-test.pl", and run 'chmod 755 env-test.pl' to make it
executable.

#!/usr/bin/env perl

foreach $key (sort keys(%ENV)) 
{
  print "$key = $ENV{$key}\n";
}

exit 0;


Now create the following submit file:

	executable = env-test.pl
	globusscheduler = biron.cs.wisc.edu/jobmanager
	universe = globus
	environment = foo=bar; zot=qux
	output = env-test.out
	log = env-test.log
	queue

(Replace biron.cs.wisc.edu/jobmanager with a resource you're authorized to
use)

When the job has completed, env-test.out should contain something like 
this:

GLOBUS_GRAM_JOB_CONTACT = https://biron.cs.wisc.edu:36213/30905/1020633947/
GLOBUS_GRAM_MYJOB_CONTACT = URLx-nexus://biron.cs.wisc.edu:36214/
GLOBUS_LOCATION = /usr/local/globus
GLOBUS_REMOTE_IO_URL = /home/epaulson/.globus/.gass_cache/globus_gass_cache_1020633948
HOME = /home/epaulson
LANG = en_US
LOGNAME = epaulson
X509_USER_PROXY = /home/epaulson/.globus/.gass_cache/globus_gass_cache_1020633951
foo = bar
zot = qux


Of particular interest is the GLOBUS_REMOTE_IO_URL environment variable. 
Condor-G automatically starts up a GASS remote I/O server on the submitting
machine. Because of the potential for either side of the connection to
fail, the URL for this server cannot be passed directly to the job. Instead,
it is put into a file, and the GLOBUS_REMOTE_IO_URL environment variable points
to this file. Remote jobs can read this file and use the URL it contains
to access the remote GASS server running inside Condor-G. If the location
of the GASS server changes (for example, if Condor-G restarts) Condor-G will
contact the Globus gatekeeper and update this file on the machine where the
job is running. It is therefore important that all accesses to the remote
GASS server check this file for the latest location.

The following perl script will use the GASS server in Condor-G to copy
input files to execute machine. (In our case, our remote job is just going
to count the number of lines in a file. Hopefully your job will be a bit
more productive.)

#!/usr/bin/env perl

use FileHandle;
use Cwd;

STDOUT->autoflush();
$gassUrl = `cat $ENV{GLOBUS_REMOTE_IO_URL}`;
chomp $gassUrl;

$ENV{LD_LIBRARY_PATH} = $ENV{GLOBUS_LOCATION}. "/lib";
$urlCopy = $ENV{GLOBUS_LOCATION}."/bin/globus-url-copy";

#globus-url-copy needs a full pathname
$pwd = getcwd();
print "$urlCopy  $gassUrl/etc/hosts file://$pwd/temporary.hosts\n\n";
`$urlCopy  $gassUrl/etc/hosts file://$pwd/temporary.hosts`;

open(file, "temporary.hosts");
while(<file>) {
        print $_;
}

exit 0;


Our submit file looks like this:

	executable = gass-example.pl
	globusscheduler = biron.cs.wisc.edu/jobmanager
	universe = globus
	output = gass.out
	log = gass.log
	queue

