Metadata-Version: 2.4
Name: pyodc
Version: 1.6.0
Summary: A Python interface to odc for encoding/decoding ODB-2 files.
Home-page: https://github.com/ecmwf/pyodc
Author: European Centre for Medium-Range Weather Forecasts (ECMWF)
Author-email: software.support@ecmwf.int
License: Apache License Version 2.0
Keywords: odc odb
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cffi
Requires-Dist: odclib<1.7.0,>=1.6.0
Requires-Dist: findlibs>=0.1.0
Requires-Dist: pandas
Requires-Dist: packaging
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: pytest-flakes; extra == "dev"
Dynamic: license-file

# pyodc

[![PyPI](https://img.shields.io/pypi/v/pyodc)](https://pypi.org/project/pyodc/)
[![Build Status](https://img.shields.io/github/workflow/status/ecmwf/pyodc/Continuous%20Integration/develop)](https://github.com/ecmwf/pyodc/actions/workflows/ci.yml)
[![Documentation Status](https://readthedocs.org/projects/pyodc/badge/?version=latest)](https://pyodc.readthedocs.io/en/latest/?badge=latest)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![Licence](https://img.shields.io/github/license/ecmwf/pyodc)](https://github.com/ecmwf/pyodc/blob/develop/LICENSE)

A Python interface to `odc` for encoding/decoding ODB\-2 files.

The package contains two different implementations of the same library:

* `pyodc` is a pure-python encoder and decoder for ODB\-2 data, which encodes data from, and decodes it into pandas data frames
* `codc` is an implementation of the same API as `pyodc` that depends on the ECMWF `odc` library, and comes with _much_ better performance.

Both libraries are installed by running `pip install pyodc`. Since version 1.6.0, a pre-built wheel of `odc` is installed automatically, so `codc` can be used without any additional steps.

[Documentation] [Changelog]

## Dependencies

### Required

* Python 3.8 or later

### Optional

* [odc]
* [pytest]
* [pandoc]
* [Jupyter Notebook]

For `codc` to work, the `odc` library must be compiled and installed on the system and made available to Python. Typically this happens automatically, as described above, through the dependency on `odclib`, which bundles a precompiled version of `odc` as a wheel. If for some reason this doesn't work, there are several other ways to make the library visible to pyodc:
* It can be installed as a system library.
* The installation prefix can be passed in the `odc_DIR` or `ODC_DIR` environment variables.
* The library directory can be included in `LD_LIBRARY_PATH`.
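
The lookup options above can also be exercised from Python. The following is a minimal sketch, not part of pyodc: `load_odc_module` and its `prefix` argument are hypothetical names, and it assumes that a failed `codc` import surfaces as `ImportError` or `OSError`. It optionally exports `ODC_DIR` before importing, then falls back to the pure-Python `pyodc` if `codc` cannot be loaded.

```python
import importlib
import os


def load_odc_module(prefix=None):
    """Return codc if it can be imported, otherwise pyodc, otherwise None.

    `prefix` is a hypothetical argument: an odc installation prefix that is
    exported as ODC_DIR before the import is attempted.
    """
    if prefix is not None:
        os.environ["ODC_DIR"] = prefix

    # Prefer the faster C++-backed module, then the pure-Python one.
    for name in ("codc", "pyodc"):
        try:
            return importlib.import_module(name)
        except (ImportError, OSError):
            # Assumption: a missing odc library shows up as one of these.
            continue
    return None


odc = load_odc_module()
print("Using:", odc.__name__ if odc is not None else "no odc module found")
```

Because both modules implement the same API, code written against the returned module should not need to know which one was loaded.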

## Installation

```sh
pip install pyodc
```

Check if the module was installed correctly:

```sh
python
>>> import pyodc as odc # pure python
>>> import codc as odc # faster
```
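
The same check can be scripted. This is a small sketch using only the standard library's `importlib.metadata`; the helper name `installed_version` is hypothetical. It reports which versions of `pyodc` and `odclib` pip has installed, without importing either package.

```python
from importlib import metadata


def installed_version(dist_name="pyodc"):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print("pyodc:", installed_version("pyodc"))
print("odclib:", installed_version("odclib"))
```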

## Usage

An introductory Jupyter Notebook with helpful usage examples is provided in the root of this repository:

```sh
git clone git@github.com:ecmwf/pyodc.git
cd pyodc
jupyter notebook Introduction.ipynb
```
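
For a quick look at the API before opening the notebook, the following sketch round-trips a data frame through ODB-2 in memory. The helper `roundtrip` is a hypothetical name; `encode_odb` and `read_odb(..., single=True)` are the functions named in the changelog, and passing a file-like object rather than a path is an assumption here. The module is passed in as an argument so the same code works with either `pyodc` or `codc`.

```python
import io


def roundtrip(odc, df):
    """Encode `df` to ODB-2 in memory and decode it back as a single frame.

    `odc` is either the `pyodc` or the `codc` module.
    """
    buffer = io.BytesIO()
    odc.encode_odb(df, buffer)
    buffer.seek(0)  # rewind so the decoder reads from the start
    return odc.read_odb(buffer, single=True)


# Usage (requires pandas and pyodc to be installed):
#   import pandas as pd
#   import pyodc
#   df = pd.DataFrame({"station": [1, 2, 3], "value": [0.5, 1.5, 2.5]})
#   decoded = roundtrip(pyodc, df)
```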

Note that **codc is not thread safe**, so care should be taken when using it with dask. You can set dask to use processes rather than threads as follows:

```python
import dask

with dask.config.set(scheduler='processes'):
    dask.compute(...)
```

## Development

### Run Unit Tests

To run the unit tests, make sure that the `pytest` module is installed first:

```sh
python -m pytest
```

### Run Unit Tests across multiple python versions with Tox

Tox is a useful tool to quickly run pytest across multiple Python versions by managing a set of Python environments for you. A tox.ini file is provided that targets Python 3.8 to 3.12. Note that this also installs older versions of libraries like numpy, which helps to catch incompatibilities with those versions too.

To run tox, [install it](https://tox.wiki/), then modify the `ODC_HOME = ../build` line in tox.ini to point to a build of odc. This build will be reused for all the tests. Then run:
```sh
tox
```
The first run takes a while because tox has to install all the environments; subsequent runs are much faster.

### Build Documentation

To build the documentation locally, please install the Python dependencies first:

```sh
cd docs
pip install -r requirements.txt
make html
```

The built HTML documentation will be available under the `docs/_build/html/index.html` path.

## License

This software is licensed under the terms of the Apache Licence Version 2.0 which can be obtained at http://www.apache.org/licenses/LICENSE-2.0.

In applying this licence, ECMWF does not waive the privileges and immunities granted to it by virtue of its status as an intergovernmental organisation nor does it submit to any jurisdiction.

[Documentation]: https://pyodc.readthedocs.io/en/latest/
[Changelog]: ./CHANGELOG.md
[odc]: https://github.com/ecmwf/odc
[pytest]: https://pytest.org
[pandoc]: https://pandoc.org/
[Jupyter Notebook]: https://jupyter.org


# Changelog for pyodc

## 1.6.0

* `pip install pyodc` will now install the C++ backend so `codc` will work immediately.
    * The C++ backend is now installable with pip from `odclib`.
    * Added `findlibs` and `odclib` as dependencies.
    * To force the use of a different `odc` shared library, set the environment variable `ODC_DIR` to the directory containing the shared library. See the [findlibs] documentation for more information.


## 1.5.0

* Add a new LongConstantString codec which permits encoding constant columns where the constant is a string > 8 characters in length.
    * This saves 1 byte per row compared to the previous way these columns were encoded.
    * A C++ implementation was added to ODC at the same time, in version 1.6.0.
    * Bumped required ODC version to 1.6.0 for feature parity.
    * Decoding data using this codec will work straight away.
    * Encoding data with the new codec is disabled by default and can be enabled with the environment variable "ODC_ENABLE_WRITING_LONG_STRING_CODEC=1".
    * At some point in a future release, encoding will be enabled by default.

* Accept various new datatypes and tighten datatype selection logic (fixes [ODB-559]):
    * Unsigned Integers: uint8 - uint32 (note uint64 is not supported).
    * Signed Integers: int8 - int64.
    * Float32 in addition to float64.
    * Fixed the selection logic for ShortReal2 and ShortReal codecs so the smallest positive normal float32 number `struct.unpack("<f", b"\x00\x00\x80\x00")[0]` can now be used in data.

* Converted to a pyproject.toml based package.

* Fix various warnings:
    * Pandas Deprecation warning about `df.dtypes[0]` needing to become `df.dtypes.iloc[0]`.
    * Pandas Deprecation warning about implicitly converting dataframe column dtype.
    * Pandas Future Warning about concatenation with empty or all-NA dataframes.
    * "pkg_resources is deprecated as an API."
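
For reference, the float32 value mentioned in the ShortReal note above can be inspected with the standard library alone. This is only an illustration of the expression quoted there; it says nothing about how the codecs use the value internally.

```python
import struct

# Little-endian bytes 00 00 80 00 are the bit pattern 0x00800000:
# biased exponent 1, mantissa 0, i.e. the smallest positive normal float32.
smallest_normal = struct.unpack("<f", b"\x00\x00\x80\x00")[0]

# The same value expressed as a power of two (exactly representable).
assert smallest_normal == 2.0 ** -126
print(smallest_normal)
```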

## 1.4.1

* Use findlibs instead of custom finder for odc
* Support constant bitfields
* Correct encoding with constant strings > 8 characters in length
* Support pandas native string type
* Fix access to exploded bitfield columns

## 1.1.3

* Improved github/ci integration

## 1.1.2

* Fixed [#6]: pip install breaks codc

## 1.1.1

* Fixed [ODB-534]: PyPI package is missing CHANGELOG

## 1.1.0

* Fixed [ODB-533]: Decode data starting with missing values correctly
* Fixed [ODB-530]: Bitfield column inspection returns incomplete data in pure-Python implementation
* Bumped up required `odc` version number to 1.4.0
* Added missing frame properties accessor to `codc` interface
* Fixed [ODB-525]: Setting odc prefix variable (`odc_DIR`) does not work as expected on macOS
* Fixed [ODB-524]: Keys and values in decoded frame properties are switched on older Python version
* Added test flag to skip `codc` tests on demand (`PYODC_SKIP_CODC`)
* Fixed [ODB-523]: Additional properties parameter is omitted in encode_odb() when string is passed as file
* Fixed package setup metadata
* Added documentation

## 1.0.4

* Correct support for constant codecs
* Decoding by column short name

## 1.0.3

* Specify `odc` library location with `odc/ODC_DIR`
* Correct `setup.py` dependencies to include pandas
* Support missing ConstantString values encoded from ODB1 using the `odb_migrator`

## 1.0.2

* String missing values should be `None` not `NaN`
* Refactor oneshot behaviour (`read_odb_oneshot` --> `read_odb(..., single=True)`)
* Raise correct error on `odc` not found
* Split `codb.py` into a full `codc` module
* Fix miscellaneous bugs

## 1.0.1

* Fixed automatic selection of integral codecs

## 1.0.0

* Initial version


[findlibs]: https://github.com/ecmwf/findlibs/
[#6]: https://github.com/ecmwf/pyodc/issues/6
[ODB-559]: https://jira.ecmwf.int/browse/ODB-559
[ODB-534]: https://jira.ecmwf.int/browse/ODB-534
[ODB-533]: https://jira.ecmwf.int/browse/ODB-533
[ODB-530]: https://jira.ecmwf.int/browse/ODB-530
[ODB-525]: https://jira.ecmwf.int/browse/ODB-525
[ODB-524]: https://jira.ecmwf.int/browse/ODB-524
[ODB-523]: https://jira.ecmwf.int/browse/ODB-523
