Read ODB files with Python
What is ODB? ODB (Observation DataBase) is a file-based database-like system developed at ECMWF to store and retrieve large volumes of meteorological observational and feedback data efficiently for use within the IFS.
Currently, ODB files come in two flavours:
- ODB-1 (the original hierarchical table format capable of running in a parallel environment within IFS)
- ODB-2 (a flat format with a modern API used for archiving in MARS).
Data from ODB can be extracted using the ODB/SQL query language, which is generally a small subset of SQL with some useful extensions.
More information on ODB: Metview - ODB
Reading with Python: PyODC
Library for reading ODB: ODC
Using Python to read ODB files
pip install --user pyodc
The package provides two interfaces: pyodc, a pure-Python implementation that is slow, and codc, which is fast but requires the odc library to be installed.
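Because codc only works when the odc library can be loaded, a script may want to fall back to pyodc. The following is a minimal sketch, assuming both modules offer the same read_odb function (as the calls later in this notebook suggest). The helper name first_importable is hypothetical and not part of pyodc, and the broad except is deliberate because the exact exception raised by a failed codc import is not confirmed here.

```python
import importlib


def first_importable(module_names):
    """Return the first module in module_names that imports without error.

    Hypothetical helper, not part of pyodc. A broad except is used because
    a missing shared library may surface as something other than ImportError.
    """
    for name in module_names:
        try:
            return importlib.import_module(name)
        except Exception:
            continue
    raise ImportError(f"none of {list(module_names)} could be imported")


# Prefer the fast C-backed interface, fall back to the pure-Python one.
# Guarded so the sketch also runs where pyodc is not installed at all.
try:
    odc_backend = first_importable(["codc", "pyodc"])
    print("using", odc_backend.__name__)
except ImportError as err:
    odc_backend = None
    print(err)
```

With a backend selected this way, the rest of a script can call odc_backend.read_odb(...) without caring which of the two interfaces is active.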
!module av --no-pager odc
-------- /home/swd/spack/share/spack/modules/linux-rhel8-skylake_avx512 --------
odc/1.4.5-gcc-8.5.0
!module show --no-pager odc
-------------------------------------------------------------------
/home/swd/spack/share/spack/modules/linux-rhel8-skylake_avx512/odc/1.4.5-gcc-8.5.0:
module-whatis {ECMWF encoding and decoding of observational data in ODB2 format.}
module load eckit/1.24.4-gcc-8.5.0
conflict odc
prepend-path --delim : LIBRARY_PATH /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/lib64
prepend-path --delim : LD_LIBRARY_PATH /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/lib64
prepend-path --delim : CPATH /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/include
prepend-path --delim : INCLUDE /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/include
prepend-path --delim : PATH /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/bin
prepend-path --delim : PKG_CONFIG_PATH /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/lib64/pkgconfig
prepend-path --delim : CMAKE_PREFIX_PATH /home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny/.
-------------------------------------------------------------------
We need to set the environment variable ODC_DIR to the installation prefix of odc (the CMAKE_PREFIX_PATH entry in the module output), so that the library can be found and codc can be used.
%env ODC_DIR=/home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny
env: ODC_DIR=/home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny
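The %env magic only exists in IPython and Jupyter. In a plain Python script the same effect can be sketched with os.environ. This assumes that the variable has to be set before codc is imported. The prefix is the one from the module output above, and the helper name set_odc_dir is hypothetical.

```python
import os

# Installation prefix of odc on this system (taken from the module output above).
ODC_PREFIX = (
    "/home/swd/spack/opt/spack/linux-rhel8-skylake_avx512/gcc-8.5.0/"
    "odc-1.4.5-2jkj7xe2uu672npnmxjiw2z7q5gvqvny"
)


def set_odc_dir(prefix):
    """Set ODC_DIR for the current process and return the stored value.

    Hypothetical helper; it only touches the environment of this process.
    """
    os.environ["ODC_DIR"] = prefix
    return os.environ["ODC_DIR"]


set_odc_dir(ODC_PREFIX)
# import codc  # assumption: the import has to come after ODC_DIR is set
```

On a different machine the prefix will differ, so it should be read from the local module or package manager output rather than copied from here.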
Install the package:
!pip install --user pyodc
Collecting pyodc
  Downloading pyodc-1.3.0.tar.gz (28 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: pandas in /home/swd/manual/nwp/2023.1/lib/python3.10/site-packages (from pyodc) (1.5.3)
Requirement already satisfied: cffi in /home/swd/manual/nwp/2023.1/lib/python3.10/site-packages (from pyodc) (1.15.1)
Requirement already satisfied: pycparser in /home/swd/manual/nwp/2023.1/lib/python3.10/site-packages (from cffi->pyodc) (2.21)
Requirement already satisfied: pytz>=2020.1 in /home/swd/manual/nwp/2023.1/lib/python3.10/site-packages (from pandas->pyodc) (2022.7.1)
Requirement already satisfied: python-dateutil>=2.8.1 in /home/swd/manual/nwp/2023.1/lib/python3.10/site-packages (from pandas->pyodc) (2.8.2)
Requirement already satisfied: numpy>=1.21.0 in /home/swd/manual/nwp/2023.1/lib/python3.10/site-packages (from pandas->pyodc) (1.23.5)
Requirement already satisfied: six>=1.5 in /home/swd/manual/nwp/2023.1/lib/python3.10/site-packages (from python-dateutil>=2.8.1->pandas->pyodc) (1.16.0)
Building wheels for collected packages: pyodc
  Building wheel for pyodc (pyproject.toml) ... done
  Created wheel for pyodc: filename=pyodc-1.3.0-py3-none-any.whl size=29866 sha256=a9e092f59bbe53178efcf5a95a88e4682068f9bb36e4113b529d6615fc9ce488
  Stored in directory: /mnt/users/staff/mblaschek/.cache/pip/wheels/9a/08/f0/7fde07980857fb4bec365d72c929d91d7a512c903ae6847e1c
Successfully built pyodc
Installing collected packages: pyodc
Successfully installed pyodc-1.3.0
# import
import pyodc
import codc
Reading an example file of 190 MB from an Aeolus experiment with codc, the fast interface:
%%time
df_decoded = codc.read_odb('../scratch/data/Aeolus/test20201201.odb', single=True)
CPU times: user 3.75 s, sys: 1.2 s, total: 4.95 s
Wall time: 3.28 s
display(df_decoded)
| | type | class | stream | andate | antime | reportype | restricted@hdr | enda_member@desc | numtsl@desc | timeslot@timeslot_index | ... | arg_lat@sat | t_ref@aeolus_l2b | p_ref@aeolus_l2b | beta@aeolus_l2b | dhlos_dt@aeolus_l2b | dhlos_dp@aeolus_l2b | dhlos_dbeta@aeolus_l2b | horiz_length@aeolus_l2b | vert_length@aeolus_l2b | expver |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 263 | 2 | 1247 | 20201201 | 0 | 45001 | 0 | 0 | 25 | 1 | ... | 5.679873 | NaN | NaN | NaN | NaN | NaN | NaN | 11243.0 | 1010.0 | hls0 |
1 | 263 | 2 | 1247 | 20201201 | 0 | 45001 | 0 | 0 | 25 | 1 | ... | 5.679873 | NaN | NaN | NaN | NaN | NaN | NaN | 11243.0 | 1261.0 | hls0 |
2 | 263 | 2 | 1247 | 20201201 | 0 | 45001 | 0 | 0 | 25 | 1 | ... | 5.679873 | NaN | NaN | NaN | NaN | NaN | NaN | 14053.0 | 1009.0 | hls0 |
3 | 263 | 2 | 1247 | 20201201 | 0 | 45001 | 0 | 0 | 25 | 1 | ... | 5.679873 | NaN | NaN | NaN | NaN | NaN | NaN | 14053.0 | 757.0 | hls0 |
4 | 263 | 2 | 1247 | 20201201 | 0 | 45001 | 0 | 0 | 25 | 1 | ... | 5.679873 | NaN | NaN | NaN | NaN | NaN | NaN | 8432.0 | 757.0 | hls0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1513516 | 263 | 2 | 1247 | 20201201 | 120000 | 45001 | 0 | 0 | 25 | 24 | ... | 2.268997 | NaN | NaN | NaN | NaN | NaN | NaN | 5618.0 | 1008.0 | hls0 |
1513517 | 263 | 2 | 1247 | 20201201 | 120000 | 45001 | 0 | 0 | 25 | 24 | ... | 2.268997 | NaN | NaN | NaN | NaN | NaN | NaN | 2809.0 | 1008.0 | hls0 |
1513518 | 263 | 2 | 1247 | 20201201 | 120000 | 45001 | 0 | 0 | 25 | 24 | ... | 2.268997 | NaN | NaN | NaN | NaN | NaN | NaN | 2809.0 | 1007.0 | hls0 |
1513519 | 263 | 2 | 1247 | 20201201 | 120000 | 45001 | 0 | 0 | 25 | 24 | ... | 2.267844 | NaN | NaN | NaN | NaN | NaN | NaN | 8426.0 | 504.0 | hls0 |
1513520 | 263 | 2 | 1247 | 20201201 | 120000 | 45001 | 0 | 0 | 25 | 24 | ... | 2.267844 | NaN | NaN | NaN | NaN | NaN | NaN | 14044.0 | 503.0 | hls0 |
1513521 rows × 61 columns
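The decoded result is an ordinary pandas DataFrame whose column names follow the ODB column@table pattern, so columns are picked with bracket indexing rather than attribute access. The sketch below uses a small hand-made stand-in frame (the values are illustrative, not read from the file) to show two common selections.

```python
import pandas as pd

# Tiny stand-in for df_decoded; values are illustrative, not read from the file.
df = pd.DataFrame(
    {
        "antime": [0, 0, 120000],
        "lat@hdr": [-5.70661, -5.70509, 2.1],
        "lon@hdr": [131.284409, 131.276031, 10.5],
        "obsvalue@body": [64.73, None, 12.0],
    }
)

# '@' is not valid in a Python identifier, so use df["name"], not df.name.
latitudes = df["lat@hdr"]

# Select every column whose name contains "@hdr", i.e. the hdr table.
hdr_columns = df.filter(like="@hdr")

print(latitudes.tolist())
print(list(hdr_columns.columns))
```

The same two calls should work unchanged on df_decoded, since they rely only on the column labels.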
For comparison, this is the pure-Python version (pyodc) reading the same ODB file:
%%time
df_decoded = pyodc.read_odb('../scratch/data/Aeolus/test20201201.odb', single=True)
CPU times: user 1min 4s, sys: 1.79 s, total: 1min 5s
Wall time: 1min 5s
Using ODC
odc is also a command-line tool for querying an ODB file or creating a subset. Load the module first, then run a command using odc syntax.
%%bash
# just load the module in this cell.
module load odc
# show help
odc help
Loading odc/1.4.5-gcc-8.5.0
  Loading requirement: eckit/1.24.4-gcc-8.5.0

compare: Compares two ODB files
Usage: compare [-excludeColumns <list-of-columns>] [-excludeColumnsTypes <list-of-columns>] [-dontCheckMissing] <file1.odb> <file2.odb>

count: Counts number of rows in files
Usage: count <file.odb>

header: Shows header(s) and metadata(s) of file
Usage: header [-offsets] [-ddl] [-table <table-name-in-the-generated-ddl>] <file-name>

import: Imports data from a text file
Usage: import [-d delimiter] <input.file> <output.file>
  delimiter can be a single character (e.g.: ',') or TAB. As a data example:
  col1:INTEGER,col2:REAL
  1,2.0
  3,4.0

index: Creates index of reports for a given file
Usage: index <file.odb> [<file.odb.idx>]
  Specifically the index file is an ODB file with (INTEGER) columns: block_begin, block_length, seqno, n_rows
  One entry is made for each unique seqno - block pair within the source ODB file.

ls: Shows file's contents
Usage: ls [-o <output-file>] <file-name>

mdset: Creates a new file resetting types or values (constants only) of columns.
Usage: mdset <update-list> <input.odb> <output.odb>
  <update-list> is a comma separated list of expressions of the form: <column-name> : <type> = <value>
  <type> can be one of: integer, real, double, string. If ommited, the existing type of the column will not be changed.
  Both type and value are optional; at least one of the two should be present. For example:
  odb mdset "expver=' 0008'" input.odb patched.odb

merge: Merges rows from files
Usage: merge -o <output-file.odb> <input1.odb> <input2.odb> ...
  or
  merge -S -o <output-file.odb> <input1.odb> <sql-select1> <input2.odb> <sql-select2> ...

set: Creates a new file setting columns to given values
Usage: set <update-list> <input.odb> <output.odb>

split: Splits file according to given template
Usage: split [-no_verification] [-maxopenfiles <N>] <input.odb> <output_template.odb>

sql: Executes SQL statement
Usage: sql <select-statement> | <script-filename>
  [-T] Disables printing of column names
  [-offset <offset>] Start processing file at a given offset
  [-length <length>] Process only given bytes of data
  [-N] Do not write NULLs, but proper missing data values
  [-i <inputfile>] ODB input file
  [-o <outputfile>] ODB output file
  [-f default|wide|ascii|odb] ODB output format (odb is binary ODB, ascii and wide are ascii formatted with bitfield definitions in header. Default is ascii on stdout and odb to file)
  [-delimiter <delim>] Changes the default values' delimiter (TAB by default) delim can be any character or string
  [--binary|--bin] Print bitfields in binary notation
  [--no_alignment] Do not align columns
  [--full_precision] Print with full precision
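The sql sub-command listed in the help output can also be driven from a Python script instead of a %%bash cell. The following is a rough sketch using only the -i and -f flags documented above. The helper name odc_sql_command is hypothetical, and the sketch assumes that odc is already on PATH (for example because the module was loaded before Python was started). If odc is not found, the command is only printed.

```python
import shutil
import subprocess


def odc_sql_command(select_statement, input_path, output_format="ascii"):
    """Build the argument list for `odc sql`; hypothetical helper.

    Only the -i and -f flags from the help output are used.
    """
    return ["odc", "sql", select_statement, "-i", input_path, "-f", output_format]


cmd = odc_sql_command(
    "select * where antime=0",
    "../scratch/data/Aeolus/test20201201.odb",
)

if shutil.which("odc") is not None:
    # Passing a list avoids shell quoting issues with the SQL statement.
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    print(result.stdout[:500])
else:
    print("odc not found on PATH; would run:", " ".join(cmd))
```

Passing the arguments as a list keeps the SQL statement as a single argument, which the shell version achieves with single quotes.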
%%bash
module load odc
# select only analysis time 0
odc sql 'select * where antime=0' -i ../scratch/data/Aeolus/test20201201.odb -f ascii | head
Loading odc/1.4.5-gcc-8.5.0
  Loading requirement: eckit/1.24.4-gcc-8.5.0
type expver class stream andate antime reportype restricted@hdr enda_member@desc numtsl@desc timeslot@timeslot_index seqno@hdr bufrtype@hdr subtype@hdr groupid@hdr obstype@hdr codetype@hdr sensor@hdr date@hdr time@hdr rdbdate@hdr rdbtime@hdr report_status@hdr report_event1@hdr report_rdbflag@hdr lat@hdr lon@hdr lsm@modsurf seaice@modsurf entryno@body obsvalue@body varno@body vertco_type@body vertco_reference_1@body datum_anflag@body datum_status@body datum_event1@body datum_rdbflag@body biascorr@body biascorr_fg@body qc_pge@body an_depar@body fg_depar@body obs_error@errstat final_obs_error@errstat fg_error@errstat eda_spread@errstat azimuth@sat retrtype@hdr zenith@sat range@sat arg_lat@sat t_ref@aeolus_l2b p_ref@aeolus_l2b beta@aeolus_l2b dhlos_dt@aeolus_l2b dhlos_dp@aeolus_l2b dhlos_dbeta@aeolus_l2b horiz_length@aeolus_l2b vert_length@aeolus_l2b conf_flag@aeolus_l2b
263 ' hls0' 2 1247 20201201 0 45001 0 0 25 1 1 23 251 46 15 187 NULL 20201130 210238 20201130 214455 12 2 0 -5.706610 131.284409 0.000000 0.000000 1 64.730003 187 1 5820.886719 0 12 131584 0 0.000000 0.000000 NULL 63.164753 64.002914 37.153870 37.153870 1.076056 0.976365 1.750565 210065741 0.927293 363476.000000 5.679873 NULL NULL NULL NULL NULL NULL 11243.000000 1010.000000 1
263 ' hls0' 2 1247 20201201 0 45001 0 0 25 1 2 23 251 46 15 187 NULL 20201130 210238 20201130 214455 12 2 0 -5.705090 131.276031 0.000000 0.000000 1 NULL 187 1 7051.859375 0 12 131590 0 0.000000 0.000000 NULL NULL NULL 122.078880 122.078880 1.201507 1.122098 1.750565 210065742 0.927119 364895.000000 5.679873 NULL NULL NULL NULL NULL NULL 11243.000000 1261.000000 1
263 ' hls0' 2 1247 20201201 0 45001 0 0 25 1 3 23 251 46 15 187 NULL 20201130 210238 20201130 214455 12 2 0 -5.703870 131.269333 0.000000 0.000000 1 105.169998 187 1 8552.428711 48 12 131712 0 0.000000 0.000000 NULL 94.133278 95.480774 12.658989 12.658989 1.487877 1.433541 1.750565 210065743 0.927119 366315.000000 5.679873 NULL NULL NULL NULL NULL NULL 14053.000000 1009.000000 1
263 ' hls0' 2 1247 20201201 0 45001 0 0 25 1 4 23 251 46 15 187 NULL 20201130 210238 20201130 214455 12 2 0 -5.702960 131.264297 0.000000 0.000000 1 0.770000 187 1 9995.666016 0 12 131584 0 0.000000 0.000000 NULL -17.179775 -16.547409 9.256155 9.256155 1.853304 1.855463 1.750565 210065744 0.926944 367419.000000 5.679873 NULL NULL NULL NULL NULL NULL 14053.000000 757.000000 1
263 ' hls0' 2 1247 20201201 0 45001 0 0 25 1 5 23 251 46 15 187 NULL 20201130 210238 20201130 214455 12 2 0 -5.702040 131.259262 0.000000 0.000000 1 16.790001 187 1 11442.984375 0 12 131584 0 0.000000 0.000000 NULL -3.046090 -1.749628 5.420001 5.420001 1.909041 1.867477 1.750565 210065745 0.926944 368365.000000 5.679873 NULL NULL NULL NULL NULL NULL 8432.000000 757.000000 1
263 ' hls0' 2 1247 20201201 0 45001 0 0 25 1 6 23 251 46 15 187 NULL 20201130 210238 20201130 214455 12 2 0 -5.701120 131.254227 0.000000 0.000000 1 15.060000 187 1 13047.831055 0 12 131584 0 0.000000 0.000000 NULL -1.049524 0.238602 7.846665 7.846665 1.912062 1.824771 1.750565 210065746 0.926770 369311.000000 5.679873 NULL NULL NULL NULL NULL NULL 2811.000000 757.000000 1
263 ' hls0' 2 1247 20201201 0 45001 0 0 25 1 7 23 251 46 15 187 NULL 20201130 210238 20201130 214455 12 2 0 -5.723190 131.234482 0.000000 0.000000 1 -2.730000 187 1 18914.039062 0 12 131584 0 0.000000 0.000000 NULL -11.962290 -11.183959 24.755920 24.755920 1.733299 1.617335 1.750565 210065747 0.926595 372150.000000 5.679873 NULL NULL NULL NULL NULL NULL 8431.000000 757.000000 1
263 ' hls0' 2 1247 20201201 0 45001 0 0 25 1 8 23 251 46 15 187 NULL 20201130 210238 20201130 214455 12 2 0 -5.697460 131.234085 0.000000 0.000000 1 10.090000 187 1 21243.236328 0 12 131584 0 0.000000 0.000000 NULL 1.059619 2.259655 8.975661 8.975661 1.543846 1.432204 1.750565 210065748 0.926421 373096.000000 5.679873 NULL NULL NULL NULL NULL NULL 8432.000000 757.000000 0
263 ' hls0' 2 1247 20201201 0 45001 0 0 25 1 9 23 251 46 15 187 NULL 20201130 210238 20201130 214455 12 2 0 -5.721360 131.224396 0.000000 0.000000 1 25.500000 187 1 23785.876953 0 12 131584 0 0.000000 0.000000 NULL 18.075672 19.681799 11.561797 11.561797 1.464802 1.366932 1.750565 210065749 0.926421 374042.000000 5.679873 NULL NULL NULL NULL NULL NULL 2811.000000 757.000000 1
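The same selection can be made on a DataFrame that has already been decoded in Python. The sketch below uses a made-up stand-in frame and assumes that antime is decoded as an integer column, as in the table displayed earlier. It mirrors select * where antime=0 followed by head.

```python
import pandas as pd

# Made-up stand-in for the decoded DataFrame; values are illustrative only.
df = pd.DataFrame(
    {
        "antime": [0, 0, 120000, 120000],
        "seqno@hdr": [1, 2, 3, 4],
        "obsvalue@body": [64.730003, None, 1.5, 2.5],
    }
)

# Boolean mask: keep only rows whose analysis time is 0.
analysis_00 = df[df["antime"] == 0]

# head(10) plays the role of `| head` in the shell pipeline.
print(analysis_00.head(10))
print(len(analysis_00), "of", len(df), "rows selected")
```

Filtering with odc before decoding avoids loading the full file into memory, whereas the pandas variant is convenient once the data is already loaded.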
Created: February 29, 2024