Worksheet 1: File locations and pre-processing





The following exercises demonstrate some of the tools available for data analysis, and how to prepare PRECIS output for analysis. Preparing large amounts of data can be time-consuming, so this worksheet uses a small subset to demonstrate the steps involved. The worksheets that follow use data that has already been processed.

PRECIS output files are in PP format, a Met Office binary data format. This worksheet converts the data to NetCDF format (a standard data format in climate science) so that it can be used in post-processing packages such as CDO or visualised with NASA Panoply.

Contents


1.1 Data locations and file names

1.2 Quick Visualization

1.3 Remove the rim from PP fields

1.4 Select variables and convert PP files to NetCDF


Note: Please ignore the % sign when typing the commands.

1.1 Data locations and file names


Identify and list the names of PRECIS output data in PP format using standard Linux commands.

The dataset used here is a three-year subset (1981-1983) of monthly PRECIS data over the CORDEX Africa domain, driven by the HadCM3Q0 GCM.

1. a) First, set a variable holding the path to the directory where the data is located. Change this path to suit your computer.

Remember, the % sign is used to indicate the command line prompt. Yours may be $. Do not type % .

Remember also, Linux is case sensitive. DATA is different from Data or data or dAtA, etc.

% DATA=$LOCALDATA/WS
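
Before moving on, it can help to confirm that the variable holds what you expect. The lines below are a minimal sketch, shown without the % prompt. They assume LOCALDATA is already defined on your system; the fallback value /tmp is purely illustrative and is not part of the worksheet setup.

```shell
# Assumption: LOCALDATA is normally defined by your PRECIS setup.
# The fallback /tmp is illustrative only, so the snippet runs anywhere.
LOCALDATA="${LOCALDATA:-/tmp}"

# No spaces are allowed around '=' in a shell assignment
DATA="$LOCALDATA/WS"

# Print the value to confirm it was set
echo "DATA is set to: $DATA"

# Report whether the directory used in the next step exists
if [ -d "$DATA/ppfiles/akyjy" ]; then
    status="found"
else
    status="missing"
fi
echo "$DATA/ppfiles/akyjy: $status"
```

If the directory is reported as missing, check the spelling and case of the path before continuing.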

1. b) Move to the directory (i.e. folder) called $DATA/ppfiles/akyjy. This directory contains monthly output data from the experiment with RUNID akyjy (RUNID is the unique identifier for PRECIS experiments).

cd stands for 'change directory' and pwd stands for 'print working directory'. If you are not sure where in your directory tree you are, pwd will tell you.

% cd $DATA/ppfiles/akyjy
% pwd

1. c) List the contents of this directory; ls stands for 'list' and using the -l option gives a 'longer' listing with more information, such as file size and modification date.

% ls
% ls -l

2. a) List all the files containing data for September.

% ls *sep*
% ls

How many files contain data for September?


_______________________________________________________

Note: The character * translates as 'any characters'

2. b) List all the files containing data from 1982 (i.e. all files whose names begin with akyjya.pmi2).

% ls akyjya.pmi2???.pp

Note: The character ? translates as 'any single character'
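
The two wildcards can be tried out safely in a throwaway directory. The sketch below (shown without the % prompt) creates empty placeholder files whose names follow the pattern seen in this worksheet; they are not real PP data, and the scratch directory and the variable names are assumptions made for illustration. It also pipes the listing to wc -l to count the matches rather than counting by eye.

```shell
# Work in a throwaway directory so no real data is touched
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Create empty placeholder files (illustrative names only) for three
# year codes and three months
for year in i1 i2 i3; do
    for month in jul sep dec; do
        touch "akyjya.pm${year}${month}.pp"
    done
done

# '*' matches any run of characters: every file with 'sep' in its name
ls *sep*

# Count the matches: ls prints one name per line when piped, wc -l counts lines
n_sep=$(ls *sep* | wc -l)
echo "Placeholder files for September: $n_sep"

# '?' matches exactly one character, so '???' stands for a three-letter month
n_i2=$(ls akyjya.pmi2???.pp | wc -l)
echo "Placeholder files for year code i2: $n_i2"

# Return to where we started and tidy up
cd "$OLDPWD" && rm -rf "$scratch"
```

The same ls ... | wc -l pattern can be used in the real data directory to answer the question in step 2. a).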

3. ) Move up two levels in the directory tree and list the directories.

% cd ..
% cd ..
% pwd
% ls

The directories daily and monthly contain data used in the worksheets which follow this one.
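
As a side note, the two cd .. commands can be combined into one. The following sketch (shown without the % prompt) uses a scratch directory tree that only mimics the ppfiles/akyjy layout; the scratch location and variable names are assumptions for illustration.

```shell
# Build a throwaway tree that mimics the ppfiles/akyjy layout
scratch=$(mktemp -d)
mkdir -p "$scratch/ppfiles/akyjy"
cd "$scratch/ppfiles/akyjy" || exit 1

# One command covering two levels, equivalent to running 'cd ..' twice
cd ../..
here=$(pwd)
echo "Now in: $here"

# 'cd -' returns to the previous directory and prints its name
cd -
back=$(pwd)

# Leave the scratch tree and tidy up
cd / && rm -rf "$scratch"
```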

1.2 Quick Visualization


xconv -i is a utility which allows PP and NetCDF fields to be visualized very quickly. It is very easy to use.

1. ) View the contents of the monthly PRECIS file for July 1983 (i3jul). What variables does it contain? What are the latitude and longitude dimensions (in number of grid boxes)?

Note: Anything after the character # is just a comment and does not affect the command being run. Do not type # or anything after it.

% cd $DATA/ppfiles/akyjy
% xconv -i akyjya.pmi3jul.pp #monthly file


Variables (Field Title):

__________________________________________________

__________________________________________________

Latitude (ny) and longitude (nx) dimensions:

__________________________________________________

__________________________________________________



1.3 Remove the rim from PP fields

The outer eight grid boxes on each edge of all raw PRECIS output data are not suitable for analysis and must be excluded. This rim area is where the input data from the driving GCM is gradually interpolated to the resolution of the regional model.

1a. ) Remove the 8-point rim from all data in the $DATA/ppfiles/akyjy directory (with the option of automatically deleting the original full-sized files).

pprr removes a rim from PP fields. Type pprr -helpx for its full functionality.

Note: The -X option to pprr allows files with particular names to be ignored.

% cd $DATA/ppfiles
% pprr -r 8 -X ".rr8.pp" akyjy

1b. ) What are the latitude (ny) and longitude (nx) dimensions (in number of grid boxes) of July 1983 now?

% ls akyjy/*
% xconv -i akyjy/akyjya.pmi3jul.rr8.pp

Latitude and longitude dimensions:

__________________________________________________


Note: In all further exercises, it is assumed that the 8-point rim has been removed from all of the data being used and that the original, full-sized fields are no longer present.
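
The dimensions reported after rim removal can be sanity-checked with a little shell arithmetic, since an 8-point rim is stripped from both edges of each dimension. The grid sizes below are hypothetical placeholders, not the dimensions of the akyjy data; substitute the nx and ny values that xconv reported for the full-sized file.

```shell
# Hypothetical full-grid dimensions: replace with the values from xconv
nx=100
ny=90
rim=8

# The rim is removed from both edges, so each dimension shrinks by 2 * rim
nx_inner=$((nx - 2 * rim))
ny_inner=$((ny - 2 * rim))

echo "Expected after rim removal: nx=$nx_inner ny=$ny_inner"
```

If the values shown by xconv for the .rr8.pp file differ from this calculation, check that the rim was removed only once.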

1.4 Select variables and convert PP files to NetCDF

The monthly data we are using has multiple variables in each file, so we use a PRECIS tool to separate them. The rest of the worksheets use CDO, a post-processing package which needs files to be in NetCDF format, so we must also convert our PP files to NetCDF.

1a. ) Separate the variables in all of the monthly files into separate directories.

ppss splits the PP fields in input PP files into subdirectories based on each field's STASH code and processing code. Type ppss -helpx for its full functionality.

% cd $DATA/ppfiles/akyjy
% ppss akyjya.pm?????.rr8.pp

1b. ) Change into the directory containing the files for temperature at 1.5m and view the December 1981 file.

% stash 03236
% ls
% cd 03236
% xconv -i akyjya.pmi1dec.rr8.03236.pp

Note: 03236 is the STASH code of temperature at 1.5m. What is the STASH code of precipitation?

2. ) Put the December monthly mean temperature files into a single file. This process is necessary for later CDO commands to function properly.

% cat akyjya.pm??dec.rr8.03236.pp > akyjya.pm.i1i3.dec.rr8.03236.pp
% ls -lrt

Note: the cat command concatenates multiple files together.
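
To see what the concatenation step does, it can be reproduced on small placeholder files. The sketch below (shown without the % prompt) uses file names that follow the worksheet's pattern, but the contents are a few text characters rather than real PP data, and the scratch directory and variable names are assumptions for illustration.

```shell
# Work in a throwaway directory so no real data is touched
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Placeholder files with known contents (not real PP data)
printf 'AAAA'   > akyjya.pmi1dec.rr8.03236.pp
printf 'BBBBBB' > akyjya.pmi2dec.rr8.03236.pp
printf 'CC'     > akyjya.pmi3dec.rr8.03236.pp

# Add up the sizes of the input files in bytes
total_in=0
for f in akyjya.pm??dec.rr8.03236.pp; do
    total_in=$((total_in + $(wc -c < "$f")))
done

# The shell expands the wildcard in sorted order, so i1, i2, i3 are joined
# in that order. The output name does not match 'pm??dec', so re-running
# the command would not feed the output back in as an input.
cat akyjya.pm??dec.rr8.03236.pp > akyjya.pm.i1i3.dec.rr8.03236.pp

# The combined file should be exactly as large as the inputs together
size_out=$(($(wc -c < akyjya.pm.i1i3.dec.rr8.03236.pp)))
contents=$(cat akyjya.pm.i1i3.dec.rr8.03236.pp)
echo "input bytes: $total_in, output bytes: $size_out"

# -l long listing, -r reverse order, -t sort by modification time
ls -lrt

# Return to where we started and tidy up
cd "$OLDPWD" && rm -rf "$scratch"
```

Comparing the total input size with the output size is a quick way to confirm that no file was missed by the wildcard.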

3. ) Convert the 1981-1983 December monthly mean temperature files to NetCDF format so that we may carry out analysis with CDO.

cfa converts a PP file to NetCDF format (a standard format in climate science). See http://cfpython.bitbucket.org for further details.

% /project/precis/hadts/pputilities/cfa akyjya.pm.i1i3.dec.rr8.03236.pp
% xconv -i akyjya.pm.i1i3.dec.rr8.03236.nc