Swift tutorial exercises at Mardi Gras Conference 2008


Introduction

Introduction to Swift

This hands-on tutorial introduces new users to the basics of a grid workflow system called Swift.

The slides for the presentation part of this tutorial are at http://www.ci.uchicago.edu/~wilde/SwiftTutorial.2008.0131.ppt

Notes information

These notes will guide you through a number of exercises at your own pace. You will be given commands to type, expected output and notes highlighting the key aspects of a particular step.

There are lab assistants to help you with problems or to answer any questions that you have. Do not hesitate to talk to them.

These notes have transcripts from a machine called 192.168.1.104. You might be using a different machine, in which case you should be careful to replace 192.168.1.104 with the name of the machine you are logged in to.

The exercise notes were prepared by running as user train99. The teaching assistants will give you your own login name and number. Make sure to use that in the exercises instead of train99 throughout the school.

You will see various styles of text in the tutorial notes.

Text like this represents output from your computer.

Text like this is input that you should type.

Text like this is a listing of the content of a file. Sometimes you will
need to copy this content into a file in your account as part of the
exercise.

Note

Some notes are highlighted to draw your attention to them. You should pay special attention to text like this.

Caution

Sometimes we have warnings that indicate where even more attention is required because harmful mistakes often occur here.

Connecting to the Linux training hosts

You will be doing all the lab exercises on a set of Linux computers (hosts) named workshop1 and 192.168.1.103.

Each host has a fully qualified host name which uniquely identifies it on the internet (for example: 192.168.1.104).

To access workshop1 from your computer, use secure shell.

SSH from a Windows machine

On a Windows machine, use the PuTTY ssh client, which is available as a free download. Open PuTTY and enter the hostname of the computer that you will use.

SSH from a Linux or Macintosh machine

On a Mac, Linux or other unix machine, use a command-line terminal and the ssh command-line tool. Open a terminal and type:

$ ssh train99@192.168.1.104

Note

Make sure to replace the login name with your login name, as assigned by the instructors.

The authenticity of host '192.168.1.104 (192.168.1.104)' can't be established.
RSA key fingerprint is 36:74:78:a8:ed:6b:38:96:63:20:01:df:46:9b:59:3b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.1.104,192.168.1.104' (RSA) to the list of known hosts.
train99@192.168.1.104's password: PASSWORD # not echoed
workshop1$

Note

Now you're talking to a shell on the workshop1 server.

After the first time you do this, you won't get the "Are you sure..." prompt. Some of you will never see this, as your computers were used for testing this material, and the "yes" reply was already supplied by a tester. So it will look like:

$ ssh train99@192.168.1.104
Password: PASSWORD 
workshop1$

You should be able to reach the other lab host 192.168.1.103 in this way too.

Cut-and-paste practice

To start, practice cutting text from this page to your terminal window. Cut the pwd command from the box below and paste it into your terminal window to execute it. Using cut and paste is a good way to avoid making typos while entering commands from the examples, but make sure that you read and understand the command that you are copying - often there will be parameters that you must change for the command to work properly (such as your username).

$ pwd
/home/train99

A first workflow

The first example program uses an image processing utility to perform a visual special effect on a supplied image file.

Here is the SwiftScript program that we will use:

type imagefile;

(imagefile output) flip(imagefile input) {
  app {
    convert "-rotate" "180" @input @output;
  }
}

imagefile girl <"Gridworkshop3.jpg">;
imagefile flipped <"output.jpg">;

flipped = flip(girl);

When this workflow is run, it has the effect of running this command:

convert -rotate 180 Gridworkshop3.jpg output.jpg

which uses ImageMagick to rotate a supplied image.

ACTION: First prepare your working environment:

$ cp /sw/workflow/Gridworkshop3.jpg .
$ ls *.jpg
Gridworkshop3.jpg

ACTION: Open a new window in your web browser and go to http://192.168.1.104/~train99. You will see a list of files in your training account. Choose Gridworkshop3.jpg.

You should see a picture. This is the picture that we will modify in our first workflow.

ACTION: use your favourite text editor to put the above SwiftScript program into a file called flipper.swift.

ACTION: Execute the workflow like this:

$ swift flipper.swift

Swift v0.1-dev

RunID: e1bupgygrzn12
convert started
convert completed

$ ls *.jpg
Gridworkshop3.jpg
output.jpg

A new jpeg has appeared - output.jpg.

ACTION: Open output.jpg in your web browser. You should see that the image is different from the input image - it has been rotated 180 degrees.

The structure of this SwiftScript program is a type definition, a procedure definition, a variable definition and then a call to the procedure. We will go over each of those in a bit more detail now:

All data in SwiftScript must have a type. This line defines a new type called imagefile, which will be the type for all of our images.

type imagefile;

Next we define a procedure called flip. This procedure will use the ImageMagick convert application to rotate a picture around by 180 degrees.

(imagefile output) flip(imagefile input) {
  app {
    convert "-rotate" "180" @input @output;
  }
}

To achieve this, it executes the ImageMagick utility 'convert', passing in the appropriate command-line option and the names of the input and output files.

In SwiftScript, the output of a program looks like a return value. It has a type, and also has a variable name (unlike in most other programming languages).

imagefile girl <"Gridworkshop3.jpg">;
imagefile flipped <"output.jpg">;

We define two variables, called girl and flipped. These variables will contain our input and output images, respectively.

We tell Swift that the contents of the variables will be stored on disk (rather than in memory) in the files Gridworkshop3.jpg and output.jpg. This is called mapping.

In this case, the file Gridworkshop3.jpg already exists. It is an input to the workflow. output.jpg does not exist to begin with. It will be created by the workflow.

flipped = flip(girl);

Now we call the flip procedure, with the variable 'girl' as its input and its output going into the variable 'flipped'.

Over the following exercises, we will use this relatively simple SwiftScript program as a base to demonstrate other Swift features.

A second program

Our next example program uses some more SwiftScript syntax to produce images that are rotated by different angles, instead of flipped over all the way.

Here is the program in full. We'll go over it section by section.

type imagefile;

(imagefile output) rotate(imagefile input, int angle) {
  app {
    convert "-rotate" angle @input @output;
  }
}

imagefile girl <"Gridworkshop3.jpg">;

int angles[] = [45, 90, 120];

foreach a in angles {
    imagefile output <single_file_mapper;file=@strcat("rotated-",a,".jpeg")>;
    output = rotate(girl, a);
}

type imagefile;

We keep the same type definition for image files that was used in the previous program.

(imagefile output) rotate(imagefile input, int angle) {
  app {
    convert "-rotate" angle @input @output;
  }
}

This rotate procedure looks very much like the flip procedure from the previous program, but we have added another parameter, called angle. The angle parameter is of type 'int', which is a built-in SwiftScript type for integers. We use it on the command line instead of a hard-coded 180 degrees.

imagefile girl <"Gridworkshop3.jpg">;

Our input image is the same as before.

int angles[] = [45, 90, 120];

Now we define an array of integers, and initialise it with three angles.

foreach a in angles {

Now we have a foreach loop. This loop will execute the loop body once for each element in the angles array. In each iteration, the element will be put in the variable 'a'.

It is important to realise that this is not a sequential loop like a C or Java for() loop. Execution happens in parallel where possible. This loop, for example, will run rotate three times in parallel.
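As a rough analogy only (this is plain shell, not how Swift schedules work internally), the difference from a sequential loop can be sketched with background jobs. The function name run_in_parallel is hypothetical, and echo stands in for the real rotate job.

```shell
# Start one background job per angle, then wait for all of them.
# echo is a stand-in for the real work; the jobs may finish in any order.
run_in_parallel() {
    for a in "$@"; do
        ( echo "rotate by $a degrees: done" ) &
    done
    wait    # returns once every background job has finished
}

run_in_parallel 45 90 120
```

The three lines of output can appear in any order, which is the point: nothing in the loop makes one iteration wait for another.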

    imagefile output <single_file_mapper;file=@strcat("rotated-",a,".jpeg")>;

Inside the loop body, we have an output variable that is mapped differently for each iteration. We use the single file mapper and the @strcat function to construct a filename and then map that filename to our output variable.

The single file mapper provides a more advanced way of mapping files. Earlier we mapped files by specifying the filename inside of angle brackets. The single file mapper allows us to specify the filename as a SwiftScript expression, so that we do not need to list the filenames explicitly inside the SwiftScript program.

@strcat is a function that can be used in a swift expression to join together several strings into a single string. We use it here to construct the filename for the output of each iteration.

    output = rotate(girl, a);
}

Now we invoke rotate, passing in our input image and the desired rotation angle. We assign the result to the mapped output variable. This will happen three times, with a different output filename and a different angle each time.

ACTION: Put the program source into a file called rotate.swift and execute it with the swift command, like we did for flipper.swift above.

$ ls rotated*
rotated-120.jpeg rotated-45.jpeg  rotated-90.jpeg
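As with flipper.swift, the effect of this run can be pictured as a set of convert commands, one per angle. The shell sketch below only prints those command lines. The function name rotate_commands is hypothetical, and Swift itself stages files and launches the jobs in its own way.

```shell
# Print the convert command line that each foreach iteration amounts to.
# The output name mirrors @strcat("rotated-", a, ".jpeg") in rotate.swift.
rotate_commands() {
    for a in "$@"; do
        echo "convert -rotate $a Gridworkshop3.jpg rotated-$a.jpeg"
    done
}

rotate_commands 45 90 120
```

Passing two more angles to rotate_commands shows the two extra output names to expect once the angles array has been extended.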

ACTION: pick two more angles and modify the swift program so that it generates output for angles of 45 degrees, 90 degrees, 120 degrees, and your two chosen new angles. Run the SwiftScript to check that this works.

Third example

Our third example will introduce some more concepts: complex data types, another mapper and the transformation catalog.

We will use convert from the previous exercise, and add another program, slicer, which will generate 2-dimensional images from 3-dimensional fMRI brain scan data.

Here's the complete listing:


type imagefile;
type pgmfile;

type voxelfile;
type headerfile;

type volume {
    voxelfile img;
    headerfile hdr;
};

volume reference <simple_mapper;location="Raw/",prefix="reference.">;

(pgmfile outslice) slicer(volume input, string axis, string position)
{
    app {
        slicer @input.img axis position @outslice;
    }
}

(imagefile output) convert(pgmfile inpgm)
{
    app {
        convert @inpgm @output;
    }
}

pgmfile slice;

imagefile slicejpeg <"slice.jpeg">;

slice = slicer(reference, "-x", ".5");

slicejpeg = convert(slice);

Warning

We need to make some changes to other files in addition to putting the above source into a file. Read the following notes carefully to find out what to change.

type imagefile;
type pgmfile;
type voxelfile;
type headerfile;

We define some simple types - imagefile as before, as well as three new ones.

voxelfile and headerfile are files which make up a 3-dimensional fMRI brain scan.

pgmfile is to represent images in PGM format (we've been using imagefile to refer to files in JPEG format).

type volume {
    voxelfile img;
    headerfile hdr;
};

Now we define a complex type to represent a brain scan. Our programs store brain data in two files - a .img file and a .hdr file. A .img file (with type voxelfile) and a .hdr file (with type headerfile) always come in pairs. This complex type defines a volume type, to represent those pairs.

When we pass around variables of type volume, we'll automatically be dealing with both the header file and the image file without having to list them separately.

volume reference <simple_mapper;location="Raw/",prefix="reference.">;

Now we map our input 3-d brain volume. A volume is made of several files, so we can't use the single file mapper any more.

Instead, we will use the simple mapper. This maps data structures into files based on their filename. In the above example, the filenames will be in the Raw/ directory and prefixed with "reference.". The rest of the filename will be the structure member name (img or hdr).

This mapper will map the reference volume like this:

    Swift expression         Filename
    reference.hdr     --->   Raw/reference.hdr
    reference.img     --->   Raw/reference.img
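The naming rule in this table (location, then prefix, then the structure member name) can be sketched in shell. This is only an illustration of the rule as described here, not Swift's mapper code, and the function name mapped_files is hypothetical.

```shell
# Print the file name that each member of a volume maps to,
# following the rule: location + prefix + member name.
mapped_files() {
    location=$1
    prefix=$2
    for member in hdr img; do
        echo "${location}${prefix}${member}"
    done
}

# Prints Raw/reference.hdr and Raw/reference.img, one per line.
mapped_files "Raw/" "reference."
```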

We need to put the appropriate brain scan files into the Raw/ directory so that swift can find them:

ACTION REQUIRED: Type the following:

$ mkdir Raw
$ cp /sw/workflow/data/reference.* Raw/

Now you will have a copy of the reference files in the Raw/ directory.

(imagefile output) convert(pgmfile inpgm)
{
    app {
        convert @inpgm @output;
    }
}

This procedure is similar to the previous flip and rotate procedures. It uses convert to change a file from one file format (.pgm format) to another format (.jpeg format).

(pgmfile outslice) slicer(volume input, string axis, string position)
{
    app {
        slicer @input.img axis position @outslice;
    }
}

Now we define another procedure that uses a new application called slicer. Slicer will take a slice through a supplied brain scan volume and produce a 2d image in PGM format.

We must tell Swift where to find the slicer program by making an entry in the transformation catalog. The transformation catalog maps logical transformation names (such as convert or slicer) into unix executable paths (like /usr/bin/convert).

The transformation catalog is in your home directory, in a file called tc.data. There is already one entry there, for convert:

localhost    convert    /usr/bin/convert    INSTALLED INTEL32::LINUX null

ACTION REQUIRED: Open tc.data in your favourite unix text editor, and add a new line to configure the location of slicer. Note that you must use TABS and not spaces to separate the fields:

localhost    slicer    /sw/workflow/app/slicer-swift    INSTALLED INTEL32::LINUX null

For now, ignore all of the fields except the second and the third. The second field, slicer, specifies a logical transformation name, and the third specifies the location of an executable to perform that transformation.
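Because a space in place of a tab is easy to miss, a quick check can help. The sketch below assumes that every entry has exactly six tab-separated fields, as the two entries shown above do, and that blank lines and lines starting with # can be skipped (an assumption, not something stated in these notes). The function name check_tc is hypothetical. It is demonstrated on a temporary sample file; on the training host you would pass the path to your own tc.data instead.

```shell
# Report every entry that does not have exactly six tab-separated fields.
# Exit status is 0 if all entries look well formed, 1 otherwise.
check_tc() {
    awk -F'\t' '
        BEGIN { bad = 0 }
        /^#/ || /^[ \t]*$/ { next }      # skip comments and blank lines
        NF != 6 {
            printf "line %d: expected 6 tab-separated fields, found %d\n", NR, NF
            bad = 1
        }
        END { exit bad }
    ' "$1"
}

# Demonstration on a sample file: the first entry uses tabs, the second spaces.
sample=$(mktemp)
printf 'localhost\tconvert\t/usr/bin/convert\tINSTALLED\tINTEL32::LINUX\tnull\n' > "$sample"
printf 'localhost slicer /sw/workflow/app/slicer-swift INSTALLED INTEL32::LINUX null\n' >> "$sample"
check_tc "$sample" || echo "tc.data needs fixing"
rm -f "$sample"
```

In the demonstration the second entry is reported, because with spaces the whole line counts as a single field.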

pgmfile slice;

Now we define a variable which will store the sliced PGM image. This will be used to pass the output of slicer as the input to convert.

This is a temporary file internal to the program and we do not care what the filename is. So we do not define a mapping (i.e. we do not have anything in angle brackets after the declaration). This means that swift will choose a unique filename automatically and map slice to that file.

imagefile slicejpeg <"slice.jpeg">;

Now we declare a variable for our output and map it to a filename.

slice = slicer(reference, "-x", ".5");

slicejpeg = convert(slice);

Finally we invoke the two procedures to slice the brain volume and then convert that slice into a jpeg.

ACTION: Place the source above into a file called third.swift and make the other modifications discussed above. Then run the workflow with the swift command, as before.

What output file was created? You can tell from the source code by looking at the line which defines the slicejpeg variable.

Running on another site

So far everything has been run on the local training machines. Swift can run jobs remotely. It will handle both the transfer of files to and from remote storage, and the execution of jobs.

We will run the first example program (flipper.swift) again, but this time on a remote site located in Chicago.

First clear away the output from the first program:

$ rm output.jpg
$ ls output.jpg
ls: output.jpg: No such file or directory

Now we must tell Swift about the other site. This is done through another catalog file, the site catalog.

The site catalog is found in sites.xml.

Open sites.xml. There is one entry in there in XML defining the local site. Because this is the only site defined, all execution will happen locally.

The instructors have prepared another site catalog, containing all the details necessary for running on the remote resource. You can find this new site catalog in /sw/workflow/sites-chicago.xml.

ACTION: Copy /sw/workflow/sites-chicago.xml to your home directory and look inside. See how it differs from sites.xml.

In addition to telling swift about the other site in the sites file, we need to tell Swift where to find transformations on the new site.

ACTION: Edit the transformation catalog and add a line to tell Swift where it can find convert on the remote site. Conveniently, in this case convert has the same path when running locally as when running on the Chicago site. Here is the line to add (don't forget to use tabs):

chicago  convert  /usr/bin/convert   INSTALLED   INTEL32::LINUX  null

Note the difference between this line and the existing convert definition in the file. All fields are the same except for the first column, which is the site column. We say 'chicago' here instead of 'localhost'. This matches the site name 'chicago' defined in the new site catalog, and identifies the remote site.
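Since only the first column changes, the new line can also be derived from the existing one, which avoids retyping the tabs. This is a minimal sketch, assuming tab-separated fields. The function name remote_entry is hypothetical, and it reads catalog lines on standard input.

```shell
# Copy the localhost entry for one transformation, replacing the site column.
# Usage: remote_entry SITE TRANSFORMATION < tc.data
remote_entry() {
    awk -F'\t' -v OFS='\t' -v site="$1" -v name="$2" '
        $1 == "localhost" && $2 == name { $1 = site; print }
    '
}

# Demonstration on a single sample line rather than the real catalog.
printf 'localhost\tconvert\t/usr/bin/convert\tINSTALLED\tINTEL32::LINUX\tnull\n' |
    remote_entry chicago convert
```

The printed line keeps its tabs and can be pasted into tc.data as it stands.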

Now use the same swift command as before, but with an extra parameter to tell swift to use a different sites file:

$ swift -sites.file ./sites-chicago.xml flipper.swift

If this runs successfully, you should now have an output.jpg file with a flipped picture in it. It should look exactly the same as when run locally. You have used the same program to produce the same output, but used a remote resource to do it.

A final big workflow example

Now we'll make a bigger workflow that will execute 15 jobs to do more complicated processing using more input files.

As before, here is the entire program listing. Afterwards, we will go through the listing step by step.

type voxelfile;
type headerfile;

type pgmfile;
type imagefile;

type warpfile;

type volume {
    voxelfile img;
    headerfile hdr;
};

(warpfile warp) align_warp(volume reference, volume subject, string model, string quick) {
    app {
        align_warp @reference.img @subject.img @warp "-m " model quick;
    }
}

(volume sliced) reslice(warpfile warp, volume subject)
{
    app {
        reslice @warp @sliced.img;
    }
}

(volume sliced) align_and_reslice(volume reference, volume subject, string model, string quick) {
    warpfile warp;
    warp = align_warp(reference, subject, model, quick);
    sliced = reslice(warp, subject);
}


(volume atlas) softmean(volume sliced[])
{
    app {
        softmean @atlas.img "y" "null" @filenames(sliced[*].img);
    }
}


(pgmfile outslice) slicer(volume input, string axis, string position)
{
    app {
        slicer @input.img axis position @outslice;
    }
}

(imagefile outimg) convert(pgmfile inpgm)
{
    app {
        convert @inpgm @outimg;
    }
}

(imagefile outimg) slice_to_jpeg(volume inp, string axis, string position)
{
    pgmfile outslice;
    outslice = slicer(inp, axis, position);
    outimg = convert(outslice);
}

(volume s[]) all_align_reslices(volume reference, volume subjects[]) {

    foreach subject, i in subjects {
        s[i] = align_and_reslice(reference, subjects[i], "12", "-q");
    }

}


volume reference <simple_mapper;location="Raw/",prefix="reference.">;

volume subjects[] <simple_mapper;location="Raw/",prefix="anatomy">;

volume slices[] <simple_mapper;prefix="slice.">;

slices = all_align_reslices(reference, subjects);

volume atlas <simple_mapper;prefix="atlas.">;
atlas = softmean(slices);

string directions[] = [ "x", "y", "z"];

foreach direction in directions {
    imagefile o <single_file_mapper;file=@strcat("atlas-",direction,".jpeg")>;
    string option = @strcat("-",direction);
    o = slice_to_jpeg(atlas, option, ".5");
}

As before, there are some other changes to make to the environment in addition to running the program. These are discussed inline below.

type voxelfile;
type headerfile;

type pgmfile;
type imagefile;

type warpfile;

We keep all of the previous simple types and add a definition for a new kind of intermediate file - warpfile - which will be used by some new applications that we will invoke.

type volume {
    voxelfile img;
    headerfile hdr;
};

The same complex type as before, a volume consisting of a pair of files - the voxel data and the header data.

(warpfile warp) align_warp(volume reference, volume subject, string model, string quick) {
    app {
        align_warp @reference.img @subject.img @warp "-m " model quick;
    }
}

Now we define a new transformation called align_warp. We haven't used align_warp before, so we need to add in a transformation catalog entry for it. We will be adding some other transformations too, so add those entries now too.

ACTION: Edit the transformation catalog (like in the third exercise). Add entries for the following transformations. The list below gives the path for each one. You must write the appropriate syntax for transformation catalog entries yourself, using the existing entries as examples.

Here is the list of transformations to add:

align_warp (the path is /sw/workflow/app/AIR/bin/align_warp)
reslice   (the path is /sw/workflow/app/AIR/bin/reslice)
softmean  (the path is /sw/workflow/app/softmean-swift)

These programs come from several software packages: the AIR (Automated Image Registration) suite and FSL.

Make sure you have added three entries to the transformation catalog, listing the above three transformations and the appropriate paths.


(volume sliced) reslice(warpfile warp, volume subject)
{
    app {
        reslice @warp @sliced.img;
    }
}

This adds another transformation, called reslice. We already added the transformation catalog entry for this, in the previous step.


(volume sliced) align_and_reslice(volume reference, volume subject, string model, string quick) {
    warpfile warp;
    warp = align_warp(reference, subject, model, quick);
    sliced = reslice(warp, subject);
}

This is a new kind of procedure, called a compound procedure. A compound procedure does not call applications directly. Instead it calls other procedures, connecting them together with variables. This procedure above calls align_warp and then reslice.


(volume atlas) softmean(volume sliced[])
{
    app {
        softmean @atlas.img "y" "null" @filenames(sliced[*].img);
    }
}

Yet another application procedure. Again, we added the transformation catalog entry for this above. Note the special @filenames ... [*] syntax.
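In effect, @filenames(sliced[*].img) stands for the file names of the img member of every element of the sliced array, so softmean receives one argument per resliced volume. The shell sketch below prints a command line of that shape. The file names used in the demonstration are hypothetical placeholders, since the real names are chosen by the mapper, and softmean_command is likewise a made-up helper.

```shell
# Print a softmean command line: the atlas image first, then the fixed
# "y" and "null" arguments, then one input image per remaining argument.
softmean_command() {
    atlas=$1
    shift
    echo "softmean $atlas y null $*"
}

# Placeholder input names, for illustration only.
softmean_command atlas.img first.img second.img third.img
```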


(pgmfile outslice) slicer(volume input, string axis, string position)
{
    app {
        slicer @input.img axis position @outslice;
    }
}

(imagefile outimg) convert(pgmfile inpgm)
{
    app {
        convert @inpgm @outimg;
    }
}

These are two more straightforward application transforms.


(imagefile outimg) slice_to_jpeg(volume inp, string axis, string position)
{
    pgmfile outslice;
    outslice = slicer(inp, axis, position);
    outimg = convert(outslice);
}

(volume s[]) all_align_reslices(volume reference, volume subjects[]) {

    foreach subject, i in subjects {
        s[i] = align_and_reslice(reference, subjects[i], "12", "-q");
    }

}

slice_to_jpeg and all_align_reslices are compound procedures. They call other procedures, like align_and_reslice did above. Note how all_align_reslices uses foreach to run the same procedure on each element in an array.

volume reference <simple_mapper;location="Raw/",prefix="reference.">;

The same mapping we used in the previous exercise to map a pair of reference files into the reference variable using a complex type.

volume subjects[] <simple_mapper;location="Raw/",prefix="anatomy">;

Now we map a number of subject images into the subjects array.

ACTION REQUIRED: Copy the subjects data files into your working directory, like this:

$ cp /sw/workflow/data/anatomy* Raw/
$ ls Raw/
anatomy1.hdr  anatomy2.hdr  anatomy3.hdr  anatomy4.hdr  reference.hdr
anatomy1.img  anatomy2.img  anatomy3.img  anatomy4.img  reference.img

volume slices[] <simple_mapper;prefix="slice.">;

Slices will hold intermediate volumes that have been processed by some of our tools. We need a mapping to tell Swift where to put these intermediate files.

We use the simple mapper, which will construct filenames from the structure, using slice. as a prefix.

slices = all_align_reslices(reference, subjects);

volume atlas <simple_mapper;prefix="atlas.">;
atlas = softmean(slices);

string directions[] = [ "x", "y", "z"];

foreach direction in directions {
    imagefile o <single_file_mapper;file=@strcat("atlas-",direction,".jpeg")>;
    string option = @strcat("-",direction);
    o = slice_to_jpeg(atlas, option, ".5");
}

Finally we make a number of actual procedure invocations (and declare a few more variables). The ultimate output of our workflow comes from the o variable inside the foreach loop. This is mapped to a different filename in each iteration, similar to exercise two.

ACTION: Put the workflow into a file called final.swift, and then run the workflow with the swift command. Then open the resulting files - atlas-x.jpeg, atlas-y.jpeg and atlas-z.jpeg.

You should see three brain images, along three different axes.
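The figure of 15 jobs quoted at the start of this exercise can be checked with a little arithmetic, assuming one job per application procedure call: each of the four anatomy volumes goes through align_warp and reslice, softmean runs once, and each of the three directions needs slicer and convert.

```shell
# Count the application jobs in final.swift.
subjects=4      # anatomy1 to anatomy4
directions=3    # x, y and z

align_jobs=$((subjects * 2))       # align_warp + reslice per subject
softmean_jobs=1                    # one atlas
image_jobs=$((directions * 2))     # slicer + convert per direction

total=$((align_jobs + softmean_jobs + image_jobs))
echo "$total"    # prints 15
```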

About these notes

These notes were produced from the Open Science Grid Education, Outreach and Training group SVN repository, at this location and revision:

Path: .
URL: https://svn.ci.uchicago.edu/svn/osgedu/schools/2008/mardisgras
Repository Root: https://svn.ci.uchicago.edu/svn/osgedu
Repository UUID: b4a0e4a1-be33-0410-93ba-8605a86001b8
Revision: 339
Node Kind: directory
Schedule: normal
Last Changed Author: benc@CI.UCHICAGO.EDU
Last Changed Rev: 292
Last Changed Date: 2008-01-31 10:42:15 -0600 (Thu, 31 Jan 2008)