{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Worksheet 1: File locations and pre-processing\n", "\n", "The following exercises demonstrate some of the tools available for data analysis, and how to prepare PRECIS output for analysis. This can be time-consuming for large amounts of data, so this worksheet uses a small subset to demonstrate the steps involved. The worksheets that follow use data that have already been processed.\n", "\n", "PRECIS output data are stored in PP format, a Met Office binary data format. This worksheet converts the data to NetCDF format (a standard format in climate science) so that they can be used with a wide range of post-processing tools. In these worksheets we use Python and the Python library Iris.\n", "\n", "__Note:__ In the boxes that contain code, or where you are asked to type code, click in the box and press 'ctrl' + 'enter' to run the code.\n", "\n", "__Note:__ An exclamation mark ('!') is needed to run shell commands, and is noted where needed.\n", "\n", "__Note:__ Anything after the character `#` is a comment and does not affect the command being run." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 1.1 Data locations and file names\n", "\n", "__Identify and list the names of PRECIS output data files in PP format using standard Linux commands.__\n", "\n", "__The datasets used in these worksheets are made available through the notebook to provide quick and easy access for this training. However, the commands learned in this worksheet will also be useful for future work in a Linux/Unix scripting environment.__\n", "\n", "__The dataset used here is a thirty-year subset of monthly PRECIS data over Southeast Asia driven by the HadCM3Q0 GCM.__\n", "\n", "__1. 
a) First, find out which directory you are currently in by using the 'pwd' command; pwd stands for 'print working directory'.__\n", "\n", "Type '!pwd' in the box below (without the quotes).\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Move to the training data directory (provided for you)\n", "%cd /home/precis/share/UK\n", "# Type !pwd below and press 'ctrl' + 'enter'\n", "!pwd\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__1. b) List the contents of this directory; ls stands for 'list', and the -l option gives a 'long' listing with more information, such as file size and modification date.__\n", "\n", "In the box below, type '!ls' on one line and '!ls -l' on the next, then press 'ctrl' + 'enter'.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Type !ls on one line and then !ls -l and press 'ctrl' + 'enter'.\n", "!ls\n", "!ls -l\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "__2. a) Move to the directory (i.e. folder) called 'practise_ppfiles/cahpa'. This directory contains monthly output data from the experiment with RUNID cahpa; 'cd' stands for 'change directory'.__\n", "\n", "In the box below, type '%cd practise_ppfiles/cahpa' and press 'ctrl' + 'enter'. In notebooks, '%cd' is used instead of '!cd' because each '!' command runs in its own temporary shell, so a directory change made with '!cd' would not persist to the next command.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Type %cd practise_ppfiles/cahpa and press 'ctrl' + 'enter'.\n", "%cd /home/precis/share/UK/practise_ppfiles/cahpa" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__2. b) List the contents of this directory; ls stands for 'list', and the -l option gives a 'long' listing with more information, such as file size and modification date.__\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Type the shell command here, not forgetting the '!', 
and press 'ctrl' + 'enter'.\n", "!ls -l\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__2. c) List all the files containing data for September.__\n", "\n", "In the box below, type `!ls *sep*`.\n", "\n", "How many files contain data for September?\n", "\n", "__Note:__ The wildcard character `*` matches any sequence of characters (including none)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Type !ls *sep* and press 'ctrl' + 'enter'.\n", "!ls *sep*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__2. d) List all the files containing data from 1982 (i.e. all files whose names begin with cahpaa.pmi2).__\n", "\n", "In the box below, type '!ls cahpaa.pmi2???.pp'.\n", "\n", "__Note:__ The wildcard character `?` matches exactly one character, so `???` matches the three-letter month abbreviation.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Type !ls cahpaa.pmi2???.pp and press 'ctrl' + 'enter'.\n", "!ls cahpaa.pmi2???.pp\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__3) Move up two levels in the directory tree and list the directories.__\n", "\n", "Type the following commands one at a time, running each one with 'ctrl' + 'enter' before typing the next:\n", "\n", "%cd ../../\n", "\n", "!pwd\n", "\n", "!ls\n", "\n", "The directories 'daily' and 'monthly' contain the data used in the worksheets that follow this one." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Type one command at a time and press 'ctrl' + 'enter' each time.\n", "%cd ../../\n", "!pwd\n", "!ls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.2 Introduction to Python and Iris\n", "\n", "Python is an interpreted, object-oriented, high-level programming language. Python supports modules and packages, which encourage program modularity and code reuse. 
\n", "\n", "Python is the programming language used throughout these worksheets. We also use the Python library Iris, which was developed and is maintained by the Met Office. \"Iris seeks to provide a powerful, easy to use, and community-driven Python library for analysing and visualising meteorological and oceanographic data sets.\" \n", "\n", "For a brief introduction to Iris and its cube data structure, please read the introduction page here: http://scitools.org.uk/iris/docs/latest/userguide/iris_cubes.html\n", "\n", "For further reference, see the Iris documentation: http://scitools.org.uk/iris/docs/latest/index.html\n", "\n", "__An example of loading a file into Iris is shown below in section 1.3. Click in the box and press 'ctrl' + 'enter' to run the code, and take a look at the printed cube. Please ask questions about the output.__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.3 Visualisation\n", "[Iris](http://scitools.org.uk/iris/docs/latest/index.html) is a Python library for visualising and manipulating data stored in PP and NetCDF files.\n", "\n", "Below is an example of loading data into an Iris cube and printing a summary of it. Click in the box and press 'ctrl' + 'enter' to run the code.\n", "\n", "Make a note of: \n", "a) How many latitude data points are there? 
\n", "b) How many longitude data points are there?\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Ensure you have Iris 1.10 or greater\n", "import iris\n", "\n", "# filename = iris.sample_data_path('air_temp.pp') # alternative: the Iris sample data file\n", "filename = '/home/precis/share/UK/air_temp.pp' # the file to load\n", "cube = iris.load_cube(filename) # load the PP file into a single cube\n", "\n", "print(cube) # print a summary of the cube: name, units, dimensions and coordinates\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.4 Remove the rim from PP fields\n", "The edges (or rim) of RCM outputs are biased because of the linear relaxation applied to certain variables to impose the GCM lateral boundary conditions. This rim, 8 grid points wide on each edge, needs to be excluded from any analysis.\n", "\n", "__4) Remove the 8-point rim from all data in the `practise_ppfiles/cahpa` directory. The trimmed files are written to a new sub-directory, `rr8_removed`, and the original full-sized files are kept.__\n", "\n", "First, change to the 'practise_ppfiles/cahpa' directory: click in the code box below and press 'ctrl' + 'enter'." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Click in the box and press 'ctrl' + 'enter' to change directory\n", "%cd practise_ppfiles/cahpa\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The next code box removes the 8-point rim from all the data in that directory. Click in the box and press 'ctrl' + 'enter'." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import iris\n", "import glob\n", "import os\n", "\n", "# Find all the PP files from which to remove the rim\n", "flist = glob.glob('*.pp')\n", "\n", "# Create the output directory if it does not already exist\n", "if not os.path.exists('rr8_removed'):\n", "    os.mkdir('rr8_removed')\n", "\n", "for fn in flist:\n", "    print(fn)\n", "    # Load all the variables in the file into a CubeList\n", "    datacubes = iris.load(fn)\n", "    tempcubelist = []\n", "    for cube in datacubes:\n", "        temprimcubelist = []\n", "        # In case the data have more than 2 dimensions, trim one 2D (y, x) slice at a time\n", "        for yx_slice in cube.slices(['grid_latitude', 'grid_longitude']):\n", "            # Drop 8 grid points from each edge of the domain\n", "            norimcube = yx_slice[8:-8, 8:-8]\n", "            temprimcubelist.append(norimcube)\n", "        # Merge the individual 2D slices back into a single cube\n", "        trimmedcube = iris.cube.CubeList(temprimcubelist).merge_cube()\n", "        tempcubelist.append(trimmedcube)\n", "    # Write out the trimmed data file; without append=True, re-running this cell\n", "    # overwrites the file instead of adding duplicate fields to it\n", "    iris.save(tempcubelist, os.path.join('rr8_removed', fn))\n", "    print('The 8-point rim has been removed for: ' + fn)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1.5 Select variables and convert PP files to NetCDF\n", "\n", "The monthly data we are using have multiple variables in each file; we can use Iris to separate the variables and save each one as a NetCDF file.\n", "\n", "In this example you will see a reference to the cube attribute 'STASH'. STASH is the storage handling system used by the PRECIS model and the Met Office Unified Model (UM), and each variable they output is identified by a STASH code. In the example below, 03236 is air temperature, 16222 is mean sea level pressure and 05216 is precipitation. You will notice that in this example the files are saved with the relevant STASH code in their names. 
However, you could instead name the files after the variables (e.g. temp, precip, mslp) for convenience.\n", "\n", "__5) Split each monthly file into its individual variables, and save each variable as a NetCDF file in its own directory, named after its STASH code.__\n", "\n", "Click in the box below and press 'ctrl' + 'enter' to run the code.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import iris\n", "import glob\n", "import os\n", "\n", "flist = glob.glob(os.path.join('rr8_removed', '*.pp'))\n", "\n", "for fn in flist:\n", "    # Load all the variables in the file into a CubeList\n", "    datacubes = iris.load(fn)\n", "\n", "    for cube in datacubes:\n", "        # Get the STASH code of this variable\n", "        cubeSTASH = cube.attributes['STASH']\n", "\n", "        # Build a directory name from the STASH code, e.g. section 3, item 236 -> '03236'\n", "        dirname = str(cubeSTASH.section).zfill(2) + str(cubeSTASH.item).zfill(3)\n", "        # Create the directory if it does not already exist\n", "        if not os.path.exists(dirname):\n", "            os.mkdir(dirname)\n", "\n", "        # Drop the 'rr8_removed/' prefix and replace the .pp extension with .<STASH>.nc\n", "        outfile = os.path.basename(fn).replace('.pp', '.' + dirname + '.nc')\n", "\n", "        # Save this single-variable cube as NetCDF\n", "        iris.save(cube, os.path.join(dirname, outfile))\n", "\n", "print('All NetCDF files have been saved to the relevant STASH code directory')\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "__6
) For each variable (temperature, precipitation and mean sea level pressure), combine the monthly files into a single cube and save it as a NetCDF file.__\n", "\n", "The monthly files cover the years 1981, 1982 and 1983, hence the output file name includes 1981_1983.\n", "\n", "Click in the box below and press 'ctrl' + 'enter' to run the code.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "import iris\n", "import glob\n", "import os\n", "\n", "stash_codes = ['03236', '05216', '16222']\n", "\n", "# Loop over each STASH code directory\n", "for stash in stash_codes:\n", "    # Name of the combined output file, e.g. 03236/cahpaa.pm.1981_1983.rr8.03236.nc\n", "    outfile = os.path.join(stash, 'cahpaa.pm.1981_1983.rr8.' + stash + '.nc')\n", "\n", "    # Collect the monthly file names, excluding any combined file from an earlier run\n", "    flist = [fn for fn in glob.glob(os.path.join(stash, '*.nc')) if fn != outfile]\n", "\n", "    cubelist = []  # an empty list to append the monthly cubes to\n", "    for fn in flist:\n", "        # There is only one cube in each file, so use the single-cube load function\n", "        datacube = iris.load_cube(fn)\n", "        cubelist.append(datacube)\n", "\n", "    # Merge all the monthly cubes in the list into a single cube\n", "    mergedcube = iris.cube.CubeList(cubelist).merge_cube()\n", "\n", "    # Save the merged data directly in NetCDF format\n", "    iris.save(mergedcube, outfile)\n", "\n", "print('Each variable has been saved into a single NetCDF file in its STASH code directory.')" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/media/sf_share/UK/practise_ppfiles/cahpa/03236\n", "/media/sf_share/UK/practise_ppfiles/cahpa/03236\n", "cahpaa.pm.1981_1983.rr8.03236.nc cahpaa.pmi2jun.03236.nc\n", "cahpaa.pmi1apr.03236.nc\t\t cahpaa.pmi2mar.03236.nc\n", "cahpaa.pmi1aug.03236.nc\t\t cahpaa.pmi2may.03236.nc\n", "cahpaa.pmi1dec.03236.nc\t\t cahpaa.pmi2nov.03236.nc\n", "cahpaa.pmi1feb.03236.nc\t\t cahpaa.pmi2oct.03236.nc\n", "cahpaa.pmi1jan.03236.nc\t\t cahpaa.pmi2sep.03236.nc\n", "cahpaa.pmi1jul.03236.nc\t\t cahpaa.pmi3apr.03236.nc\n", "cahpaa.pmi1jun.03236.nc\t\t cahpaa.pmi3aug.03236.nc\n", "cahpaa.pmi1mar.03236.nc\t\t cahpaa.pmi3dec.03236.nc\n", "cahpaa.pmi1may.03236.nc\t\t cahpaa.pmi3feb.03236.nc\n", "cahpaa.pmi1nov.03236.nc\t\t cahpaa.pmi3jan.03236.nc\n", "cahpaa.pmi1oct.03236.nc\t\t cahpaa.pmi3jul.03236.nc\n", "cahpaa.pmi1sep.03236.nc\t\t cahpaa.pmi3jun.03236.nc\n", "cahpaa.pmi2apr.03236.nc\t\t cahpaa.pmi3mar.03236.nc\n", "cahpaa.pmi2aug.03236.nc\t\t cahpaa.pmi3may.03236.nc\n", "cahpaa.pmi2dec.03236.nc\t\t cahpaa.pmi3nov.03236.nc\n", "cahpaa.pmi2feb.03236.nc\t\t cahpaa.pmi3oct.03236.nc\n", "cahpaa.pmi2jan.03236.nc\t\t cahpaa.pmi3sep.03236.nc\n", "cahpaa.pmi2jul.03236.nc\n" ] } ], "source": [ "# Move into the air temperature (STASH 03236) directory and list its contents\n", "%cd 03236\n", "!ls\n" ] } ], "metadata": { "celltoolbar": "Edit Metadata", "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.13" }, "widgets": { "state": {}, "version": "1.1.2" } }, "nbformat": 4, "nbformat_minor": 2 }