Cheetah installation
1. Getting a copy of cheetah
The latest releases and updates of Cheetah are hosted on GitHub:
https://github.com/antonbarty/cheetah
To get a copy:
> git clone https://github.com/antonbarty/cheetah.git
Cheetah is open-source and has been released under the GNU GPL v3 license.
Main branches are:
master = stable version, maybe slightly behind the cutting edge
testing = latest developments, but still under internal testing to make sure improvements haven’t broken anything
An old and infrequently updated version is still found on DESY Stash
2. Dependencies
External libraries
Cheetah requires several external libraries and programs, notably CMake and the HDF5 library, to be installed before Cheetah itself can be compiled. Please make sure these are available before building.
If you are compiling on the SLAC machines (e.g. psexport.slac.stanford.edu) you can get a copy of cmake this way:
export PATH=~filipe/cmake/bin:${PATH}
If the HDF5 library is not installed in a standard location set the HDF5_ROOT environment variable to point to it, e.g.:
export HDF5_ROOT=${HOME}/local
XTC file reader
In order to read XTC data files produced at SLAC you will also need a copy of the psana front end. This can be obtained from SLAC using the scripts/download_psana.py script.
psana itself is maintained by SLAC and changes from time to time. More information on psana can be found on the Confluence web site at SLAC.
SLAC can translate XTC files into the more portable HDF5 format. Support for reading SLAC-format HDF5 files will be added in the near future. Although slower than reading XTC files, this will relieve the need to port software to read XTC files.
3. Building Cheetah with psana
With all dependencies installed we can start the build:
- Create and go into a build directory:
$ mkdir build
$ cd build
- Run ccmake
$ ccmake ..
- Press "c" to configure.
- You may have to specify ANA_RELEASE manually. It should point to the
ana-current directory; on psexport, for example, it is
/reg/g/psdm/sw/releases/ana-current/
- You can also specify the CMAKE_INSTALL_PREFIX. I set mine to ~/usr
- If everything went well you should be able to press "g" to generate the Makefiles.
- Now just run make. This compiles Cheetah and places the result in the
build directory.
$ make
- If you want to install, run:
$ make install
This will place an executable copy of psana in the specified directory, ready for use.
4. The psana environment
Reading of XTC files is supported through the use of the psana framework, which is provided and maintained by SLAC. In case of errors with psana make sure to follow the setup instructions on the SLAC web site. The following links may be of use:
•Installing psana at home
Command line script setup
Cheetah is not omniscient - it is necessary to tell Cheetah about the location and format of data files, detector calibration, and what analysis to perform. Moreover, the need to use third-party frameworks for reading data files in XTC format complicates the setup procedure, as Cheetah has to work within workflow constraints and computing environments imposed by others.
The easiest way to hide all of this mess is to use scripts to coordinate the data analysis workflow. Several such scripts can be found in the examples folder. This page will guide you through the key steps in setting up these scripts for an experiment.
1. What the scripts do
The purpose of the workflow scripts is to hide as much execution complexity as possible so that analysis can be started using the single command:
> process <run> <configuration.ini>
Achieving this simplicity of course requires a little bit of setup work, which we describe below.
2. Finding the workflow scripts
Have a look in the examples folder of the cheetah distribution, where you will find examples of the Cheetah analysis workflow. These scripts need to be set up once per experiment, so copy these scripts to somewhere related to your analysis project where they can be edited and retained for later use (for example: a scratch or data directory).
This page describes the analysis-pipeline-anton, although analysis-pipeline-rick is very similar and uses some of the same scripts. The choice is a matter of preference.
3. Setting up the workflow scripts
In order to automate execution and hide as much complexity as possible, Cheetah needs to know the following information:
•Location of the XTC data files
•Where to put the output data files
•Where your copy of Cheetah is installed
•Where to find the Cheetah and psana configuration files
•Which batch queue jobs should be submitted to (if any)
3.1 The process script
Locate the script called ‘process’. This script specifies important things such as data locations and calls the more complicated hitfinder script. The good news is that once everything is working, this is often the only script which needs modifying when changing from one experiment to another.
Edit the following configuration variables to match your experiment:
•XTCDIR points to the data directory containing XTC data files
•H5DIR points to the directory into which Cheetah output will be saved (make sure you have write access and plenty of available space - terabytes of output are not uncommon!)
•CONFIGDIR points to the folder containing your process script. Specifying this as a full path enables this script to be called from other directories.
•PSANA points to the location of the psana executable (usually just psana, but can be a fully expanded path). Use ‘which psana’ to find out which version will be called by default. If in doubt specify a complete path.
The HITFINDER and PSANA_CONFIG fields can probably be left alone - they are there in case your cheetah.ini and psana configuration files are in other locations. Hitfinder itself is a separate script that takes care of finding all XTC files for a given run, creating the data destination directory, putting configuration files in the right location, and calling psana with the appropriate arguments to begin processing. Hitfinder is probably best left alone for now.
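As a rough sketch, the top of a process script might define these variables like so (all values below are hypothetical examples for a CXI experiment, not defaults):

```shell
#!/bin/sh
# Hypothetical example values -- edit all of these for your own experiment.
XTCDIR=/reg/d/psdm/cxi/cxi12345/xtc   # XTC data files (example experiment path)
H5DIR=$HOME/cheetah-hdf5              # Cheetah output; needs plenty of space
CONFIGDIR=$HOME/analysis/process      # full path to the folder holding this script
PSANA=psana                           # or a fully expanded path ('which psana')
```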
Now try executing the process script on a test run. After some time it will probably fall in a heap with an error similar to the one shown here:
Excellent! Getting this error at this point is actually success because it means that all the files are in the right place and Cheetah is being called without error. The hard part is now over and it is time to set up the cheetah.ini file that configures how Cheetah itself runs.
3.2 psana.cfg
XTC files must be read through psana, so there is a chance that the psana.cfg file may also need modification. This file specifies the Cheetah module to load for processing, as well as which detectors are read into Cheetah. The detector information in particular may need to be changed for different experiment configurations and different instruments. Hopefully this file does not need to be modified, but now you know it’s there just in case.
4. Configuring Cheetah
It is now time to set up what analysis Cheetah should perform. This is done by editing the cheetah.ini file.
Creating separate ini files to go with different samples is often a good idea as parameters for different samples may be slightly different (see the section on optimisation later on). Using the process script it is very easy to specify different configurations on the command line.
Locate the sample cheetah.ini file in the analysis-pipeline/process directory, create a copy to work with, and open it in your favourite text editor. All fields are specified in a tag=value format, with ‘#’ at the start of a line denoting a commented-out field. The order of parameters is not important. In case of duplicate tags, the later tag overrides the earlier value.
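These format rules look like this in practice (the file names are illustrative; the tags shown are ones described on this page):

```ini
# Lines beginning with '#' are ignored, so no darkcal is loaded here:
#darkcal=old-darkcal.h5
geometry=CxiDs1-geometry.h5
# Duplicate tags: the second badpixmap value overrides the first.
badpixmap=badpixels.h5
badpixmap=badpixels-v2.h5
```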
Note that certain mistakes in the .ini file cause Cheetah to halt with an error. The most common causes of this are:
•An unknown tag name (usually a typo); or
•A configuration file that cannot be found (e.g. the requested detector geometry or mask file)
These errors are reported to the command line so should be relatively easy to identify.
Assuming we are working with hitfinding nanocrystal data collected using the cspad detector on the CXI instrument at LCLS, here are some key parameters which must be properly set in order for Cheetah to run.
geometry=
Specify the location of a geometry pixelmap file.
There is an example file located in config/geometry/CxiDs1-geometry.h5. Cheetah will run without a geometry file (comment out the line with a #) - but image assembly and anything involving radial averages will not work without knowing the detector geometry.
darkcal=
Specify the location of a dark-frame calibration for the detector.
There is an example file located in config/darkcal/CxiDs1-darkcal.h5. Cheetah will run without a darkcal file (comment out the line with a #). When using local or persistent background subtraction, darkcals can be ignored, as both of these background subtraction methods provide their own offset estimation. However, when saving raw detector data a dark calibration is required, or static offsets in the detector readout will not be subtracted.
badpixmap=
Specify the location of a pixel map marking known bad pixels on the detector.
For example, with cspad data the pixels at the edge of each ASIC consistently read unreliable values. Pixels flagged in this map are set to zero at the start of analysis. Cheetah will run fine without a bad pixel map (comment out the line with a #).
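Putting the three parameters together, the relevant part of a minimal cheetah.ini might read as follows (the geometry and darkcal paths are the example files mentioned above; the badpixmap path is hypothetical):

```ini
geometry=config/geometry/CxiDs1-geometry.h5
darkcal=config/darkcal/CxiDs1-darkcal.h5
# Hypothetical bad-pixel map, commented out until you have one:
#badpixmap=config/badpix/CxiDs1-badpix.h5
```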
Once all of this is set up, you should see Cheetah start to run, and data should start to appear in the target directory. Success!
Now comes the question of optimising the analysis for your particular data set.
5. Using the batch farm
Running Cheetah from the command line is fine for debugging, but rapidly gets tedious once multiple jobs must be processed at once. To distribute the processing load SLAC has a batch processing queue for use with CXI data.
Once everything is debugged, hitfinder can submit jobs to this batch queue using the ‘-q’ option. In the process script, simply comment out the second line and uncomment the last line to switch to batch queue operation.
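The switch might look roughly like this inside the process script (a sketch only: the run name and ini file are placeholders, and 'echo' stands in for the real hitfinder script so the lines can be executed anywhere):

```shell
#!/bin/sh
HITFINDER=echo       # placeholder for the path to the real hitfinder script
RUN=r0042            # example run
INI=lysozyme.ini     # example cheetah.ini file
# Local execution, fine for debugging (commented out here):
#$HITFINDER "$RUN" "$INI"
# Batch operation: the '-q' option tells hitfinder to submit to the queue:
$HITFINDER -q "$RUN" "$INI"
```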
6. Automatic data processing
Data can be processed automatically as soon as it is available on the analysis disk system.
This is coordinated using the autorun.pro script. This script does the following:
•Checks for the last XTC file that has finished copying (no .inprogress tag);
•Checks whether Cheetah is already running on that data set;
•If not already being processed, starts a hitfinder job on the newly available data set.
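The copy-completeness check can be sketched in shell (autorun.pro itself is an IDL script; the directory and file names below are synthetic so the sketch runs anywhere):

```shell
#!/bin/sh
XTCDIR=$(mktemp -d)                              # stand-in for the XTC directory
touch "$XTCDIR/e1-r0001-s00-c00.xtc"             # copy finished
touch "$XTCDIR/e1-r0002-s00-c00.xtc.inprogress"  # still copying -- must be skipped
for f in "$XTCDIR"/*.xtc; do
    # The glob matches only files without the .inprogress suffix, i.e. runs
    # whose copy has finished.  A real script would now check whether Cheetah
    # is already running on this run and, if not, start a hitfinder job.
    echo "ready: $(basename "$f")"
done
```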
Have a look at the autorun.pro script to see how it operates. This system could probably be refined in the future.
Programming with libCheetah
Cheetah is modular and implemented as a callable library of routines. There is one function which reads custom data formats and passes data on to Cheetah for processing. To implement Cheetah on a data file format that is currently not supported, take a look at the SACLA file reader (as it is much easier to understand than the psana-dependent code required to run at SLAC).
If only certain functionality is required, the underlying routines within Cheetah are, as far as practical, implemented at the lowest level as normal C routines. This makes it comparatively easy to extract individual routines for use elsewhere, should this be useful in any way.