# Read ODB files with Python

**What is ODB?**
ODB (Observation DataBase) is a file-based database-like system developed at ECMWF to store and retrieve large volumes of meteorological observational and feedback data efficiently for use within the IFS.

Currently, ODB files come in two flavours:

- ODB-1 (the original hierarchical table format, capable of running in a parallel environment within IFS)
- ODB-2 (a flat format with a modern API, used for archiving in MARS)

Data from ODB can be extracted using the ODB/SQL query language, which is generally a small subset of SQL with some useful extensions.

More information on ODB: [Metview - ODB](https://confluence.ecmwf.int/display/METV/ODB+Overview)

Reading with Python: [PyODC](https://pyodc.readthedocs.io/en/latest/)

Library for reading ODB: [ODC](https://odc.readthedocs.io/en/latest/)

## Using Python to read ODB files

`pip install --user pyodc`

There are two interfaces: `pyodc`, a pure-Python implementation that is slow, and `codc`, which is fast but requires the odc library to be installed.
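
Because `codc` depends on the odc library being found, a notebook can try the fast interface first and fall back to the pure-Python one. The following is a minimal sketch of that idea. The helper name `load_odb_backend` is hypothetical, and it catches `Exception` broadly because the exact error raised when the library is missing is an assumption that may differ between versions.

```python
import importlib


def load_odb_backend(candidates=("codc", "pyodc")):
    """Return the first importable module from `candidates`, or None.

    `load_odb_backend` is a hypothetical helper, not part of pyodc.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except Exception:
            # Assumption: a missing odc library surfaces as an exception
            # at import time; try the next candidate instead.
            continue
    return None


odc_backend = load_odb_backend()
print("using:", odc_backend.__name__ if odc_backend else "no ODB reader available")
```

Both modules expose `read_odb`, so later cells could call `odc_backend.read_odb(...)` regardless of which one was loaded.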

In [3]:
!module av --no-pager odc

-------- /home/swd/spack/share/spack/modules/linux-rhel8-skylake_avx512 --------
odc/1.4.5-gcc-8.5.0  


In [6]:
!module show --no-pager odc

-------------------------------------------------------------------
/home/swd/spack/share/spack/modules/linux-rhel8-skylake_avx512/odc/1.4.5-gcc-8.5.0:

module-whatis	{ECMWF encoding and decoding of observational data in ODB2 format.}
module		load eckit/1.24.4-gcc-8.5.0
conflict	odc
prepend-path	--delim : LIBRARY_PATH /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/lib64
prepend-path	--delim : LD_LIBRARY_PATH /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/lib64
prepend-path	--delim : CPATH /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/include
prepend-path	--delim : INCLUDE /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/include
prepend-path	--delim : PATH /home/swd/spack/opt/

We need to set the environment variable `ODC_DIR` to the installation prefix so that the library can be found and `codc` can be used.
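
Outside a notebook, the same variable can be set from Python before `codc` is imported. This is a minimal sketch, assuming the prefix contains a `lib` or `lib64` directory as in the module output above. The helper `set_odc_dir` is hypothetical and the path in the usage comment is a placeholder.

```python
import os
from pathlib import Path


def set_odc_dir(prefix):
    """Set ODC_DIR to `prefix` and return the library directories found there."""
    prefix = Path(prefix)
    os.environ["ODC_DIR"] = str(prefix)
    # The module file above uses lib64; lib is checked as a common alternative.
    return [d for d in (prefix / "lib", prefix / "lib64") if d.is_dir()]


# Usage (placeholder path): call this before `import codc`
# lib_dirs = set_odc_dir("/path/to/odc/prefix")
# if not lib_dirs:
#     print("no lib or lib64 directory under the given prefix")
```

Returning the directories that exist gives a quick sanity check that the prefix points at an actual installation rather than a mistyped path.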

In [1]:
%env ODC_DIR=/home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny

env: ODC_DIR=/home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny


Install the package:

In [9]:
!pip install --user pyodc

Collecting pyodc
  Downloading pyodc-1.3.0.tar.gz (28 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: pyodc
  Building wheel for pyodc (pyproject.toml) ... done
  Created wheel for pyodc: filename=pyodc-1.3.0-py3-none-any.whl size=29866 sha256=a9e092f59bbe53178efcf5a95a88e4682068f9bb36e4113b529d6615fc9ce488
  Stored in directory: /mnt/users/staff/mblaschek/.cache/pip/wheels/9a/08/f0/7fde07980857fb4bec365d72c929d91d7a512c903ae6847e1c
Successfully built pyodc
Installing collected packages: pyodc
Successfully installed pyodc-1.3.0


In [2]:
# import both interfaces: pyodc (pure Python) and codc (backed by the odc library)
import pyodc
import codc

Reading an example file of 190 MB from an Aeolus experiment with the fast `codc` interface:

In [3]:
%%time
df_decoded = codc.read_odb('../scratch/data/Aeolus/test20201201.odb', single=True)

CPU times: user 3.75 s, sys: 1.2 s, total: 4.95 s
Wall time: 3.28 s


In [4]:
display(df_decoded)

Unnamed: 0,type,class,stream,andate,antime,reportype,restricted@hdr,enda_member@desc,numtsl@desc,timeslot@timeslot_index,...,arg_lat@sat,t_ref@aeolus_l2b,p_ref@aeolus_l2b,beta@aeolus_l2b,dhlos_dt@aeolus_l2b,dhlos_dp@aeolus_l2b,dhlos_dbeta@aeolus_l2b,horiz_length@aeolus_l2b,vert_length@aeolus_l2b,expver
0,263,2,1247,20201201,0,45001,0,0,25,1,...,5.679873,,,,,,,11243.0,1010.0,hls0
1,263,2,1247,20201201,0,45001,0,0,25,1,...,5.679873,,,,,,,11243.0,1261.0,hls0
2,263,2,1247,20201201,0,45001,0,0,25,1,...,5.679873,,,,,,,14053.0,1009.0,hls0
3,263,2,1247,20201201,0,45001,0,0,25,1,...,5.679873,,,,,,,14053.0,757.0,hls0
4,263,2,1247,20201201,0,45001,0,0,25,1,...,5.679873,,,,,,,8432.0,757.0,hls0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1513516,263,2,1247,20201201,120000,45001,0,0,25,24,...,2.268997,,,,,,,5618.0,1008.0,hls0
1513517,263,2,1247,20201201,120000,45001,0,0,25,24,...,2.268997,,,,,,,2809.0,1008.0,hls0
1513518,263,2,1247,20201201,120000,45001,0,0,25,24,...,2.268997,,,,,,,2809.0,1007.0,hls0
1513519,263,2,1247,20201201,120000,45001,0,0,25,24,...,2.267844,,,,,,,8426.0,504.0,hls0
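
The column names in the output above follow a `column@table` pattern (for example `restricted@hdr` or `arg_lat@sat`), while columns such as `type` or `expver` carry no table suffix. As a small illustration, the sketch below groups a few of these names by table using plain Python. The helper `group_by_table` is hypothetical, and on the real data frame one would pass `df_decoded.columns` instead of the hard-coded list.

```python
from collections import defaultdict


def group_by_table(columns):
    """Group 'name@table' column names by table; names without '@' go under ''."""
    groups = defaultdict(list)
    for col in columns:
        name, _, table = col.partition("@")
        groups[table].append(name)
    return dict(groups)


# A few column names copied from the output above.
sample = ["type", "expver", "restricted@hdr", "numtsl@desc",
          "arg_lat@sat", "t_ref@aeolus_l2b", "p_ref@aeolus_l2b"]
tables = group_by_table(sample)
print(tables["aeolus_l2b"])  # ['t_ref', 'p_ref']
```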


This is the pure-Python version of reading ODB files. On the same file it needs about 1 min 5 s of wall time, compared with 3.28 s for `codc`:

In [5]:
%%time
df_decoded = pyodc.read_odb('../scratch/data/Aeolus/test20201201.odb', single=True)

CPU times: user 1min 4s, sys: 1.79 s, total: 1min 5s
Wall time: 1min 5s


## Using ODC

odc is also a [command line tool](https://odc.readthedocs.io/en/latest/content/tools.html) that can query an ODB file or create a subset of it.
Load the module first; you can then run commands using the odc syntax.


In [7]:
%%bash
# just load the module in this cell.
module load odc
# show help
odc help

Loading odc/1.4.5-gcc-8.5.0
  Loading requirement: eckit/1.24.4-gcc-8.5.0


compare:	Compares two ODB files
Usage:
	compare [-excludeColumns <list-of-columns>] [-excludeColumnsTypes <list-of-columns>] [-dontCheckMissing] <file1.odb> <file2.odb>

count:	Counts number of rows in files
Usage:
	count <file.odb>

header:	Shows header(s) and metadata(s) of file
Usage:
	header [-offsets] [-ddl] [-table <table-name-in-the-generated-ddl>] <file-name>

import:	Imports data from a text file
Usage:
	import	[-d delimiter] <input.file> <output.file>

	delimiter can be a single character (e.g.: ',') or TAB. As a data example:

	col1:INTEGER,col2:REAL
	1,2.0
	3,4.0


index:	Creates index of reports for a given file
Usage:
	index <file.odb> [<file.odb.idx>] 

	Specifically the index file is an ODB file with (INTEGER) columns: block_begin, block_length, seqno, n_rows
	One entry is made for each unique seqno - block pair within the source ODB file.


ls:	Shows file's contents
Usage:
	ls [-o <output-file>] <file-name>



mdset:	Creates a new file resetting types or values (consta

In [8]:
%%bash
module load odc
# select only analysis time 0
odc sql 'select * where antime=0' -i ../scratch/data/Aeolus/test20201201.odb -f ascii | head

Loading odc/1.4.5-gcc-8.5.0
  Loading requirement: eckit/1.24.4-gcc-8.5.0


          type	expver    	         class	        stream	        andate	        antime	     reportype	restricted@hdr	enda_member@desc	   numtsl@desc	timeslot@timeslot_index	     seqno@hdr	  bufrtype@hdr	   subtype@hdr	   groupid@hdr	   obstype@hdr	  codetype@hdr	    sensor@hdr	      date@hdr	      time@hdr	   rdbdate@hdr	   rdbtime@hdr	report_status@hdr	report_event1@hdr	             report_rdbflag@hdr	       lat@hdr	       lon@hdr	   lsm@modsurf	seaice@modsurf	  entryno@body	 obsvalue@body	    varno@body	vertco_type@body	vertco_reference_1@body	            datum_anflag@body	datum_status@body	            datum_event1@body	      datum_rdbflag@body	 biascorr@body	biascorr_fg@body	   qc_pge@body	 an_depar@body	 fg_depar@body	obs_error@errstat	final_obs_error@errstat	fg_error@errstat	eda_spread@errstat	   azimuth@sat	  retrtype@hdr	    zenith@sat	     range@sat	   arg_lat@sat	t_ref@aeolus_l2b	p_ref@aeolus_l2b	beta@aeolus_l2b	dhlos_dt@aeolus_l2b	dhlos_dp@aeolus_l2b	dhlos_dbeta@aeolus_l2b	hor
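
The same query can be launched from Python with `subprocess`, which is convenient when `odc` is already on the `PATH` of the process running the notebook. This is a minimal sketch that reuses only the `sql`, `-i` and `-f ascii` arguments shown above. The helper names `odc_sql_command` and `run_odc_sql` are hypothetical.

```python
import shutil
import subprocess


def odc_sql_command(query, odb_path):
    """Build the argument list for `odc sql <query> -i <file> -f ascii`."""
    return ["odc", "sql", query, "-i", odb_path, "-f", "ascii"]


def run_odc_sql(query, odb_path):
    """Run the query and return its text output, or None if odc is not on PATH."""
    if shutil.which("odc") is None:
        return None
    result = subprocess.run(odc_sql_command(query, odb_path),
                            capture_output=True, text=True, check=True)
    return result.stdout


cmd = odc_sql_command("select * where antime=0",
                      "../scratch/data/Aeolus/test20201201.odb")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with the SQL string, at the cost of not being able to pipe into `head` directly; the returned text can be sliced in Python instead.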