Quickstart

Authenticate

To use the ODP SDK, you need to authenticate with the API key you were given. Pass it as the api_key argument when instantiating ODPClient:

from odp_sdk import ODPClient
client = ODPClient(api_key="<my-api-key>")

You can also set the COGNITE_API_KEY environment variable:

$ export COGNITE_API_KEY=<my-api-key>
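If you want a script to fail early when no key is configured, you can resolve the key yourself before creating the client. The sketch below is illustrative only: resolve_api_key is a hypothetical helper, not part of odp_sdk, and it assumes the key is stored in COGNITE_API_KEY as described above.

```python
import os
from typing import Optional


def resolve_api_key(explicit_key: Optional[str] = None) -> str:
    """Return the explicit key if given, otherwise the COGNITE_API_KEY variable.

    Hypothetical helper for illustration; not part of odp_sdk.
    """
    key = explicit_key or os.environ.get("COGNITE_API_KEY")
    if not key:
        raise RuntimeError("No API key: pass one explicitly or set COGNITE_API_KEY")
    return key
```

The result could then be passed on as ODPClient(api_key=resolve_api_key()).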

Download Ocean Data

Once you have instantiated the ODPClient, you can download ocean data with a single call. The data is returned as a Pandas DataFrame:

df = client.casts(longitude=[-25, 35], latitude=[50, 80], timespan=["2018-06-01", "2018-06-30"])

You can also specify which parameters to download:

df = client.casts(
    longitude = [-25, 35],
    latitude = [50, 80],
    timespan = ["2018-06-01", "2018-06-30"],
    parameters = ["date", "lon", "lat", "z", "Temperature", "Salinity"]
)

Sometimes you need to filter the casts before downloading the data. To do so, first list the available casts:

casts = client.get_available_casts(
    longitude = [-25, 35],
    latitude = [50, 80],
    timespan = ["2018-06-01", "2018-06-30"],
    meta_parameters = ["extId", "date", "time", "lat", "lon", "country", "Platform", "dataset_code"]
)

Then apply any desired filters before downloading the data:

casts_norway = casts[casts.country == "NORWAY"]
df = client.download_data_from_casts(casts_norway.extId.tolist(),
                                     parameters=["date", "lat", "lon", "z", "Temperature", "Salinity"])

You can also download the cast metadata:

df = client.get_metadata(casts_norway.extId.tolist())

API

ODPClient

class odp_sdk.ODPClient(api_key: str = None, project: str = 'odp', client_name: str = 'ODPPythonSDK', base_url: str = None, max_workers: int = None, headers: Dict[str, str] = None, timeout: int = None, token: Union[str, Callable[[], str], None] = None, disable_pypi_version_check: Optional[bool] = None, debug: bool = False, info_odp: bool = True)

Main entrypoint into the Ocean Data Platform SDK. All services are made available through this object.

Download cast data, containing ocean measurements through the water column around the globe.

Example:

from odp_sdk import ODPClient

client = ODPClient(api_key=MY_API_KEY)

df = client.casts(longitude=[-10,35],
                  latitude=[50,80],
                  timespan=['2018-03-01','2018-09-01'])

casts(longitude: Tuple[float, float] = (-180.0, 180.0), latitude: Tuple[float, float] = (-90.0, 90.0), timespan: Tuple[str, str] = ('1700-01-01', '2050-01-01'), n_threads: int = 35, include_flagged_data: bool = True, parameters: List[str] = None) → Optional[pandas.core.frame.DataFrame]

Download cast data within search criteria

Parameters:
  • longitude – List of min and max longitude, e.g. [-10,35]
  • latitude – List of min and max latitude, e.g. [50,80]
  • timespan – List of min and max date strings in 'YYYY-MM-DD' format, e.g. ['2018-03-01','2018-09-01']
  • n_threads – Number of threads to use
  • include_flagged_data – Whether flagged data should be included
  • parameters – List of parameters to be included in the DataFrame. If None, all columns are included. E.g. parameters=['date','lon','lat','Temperature','Oxygen']
Returns:

Pandas DataFrame with cast data

filter_casts(casts: pandas.core.frame.DataFrame, longitude: Tuple[int, int], latitude: Tuple[int, int], timespan: Tuple[pandas._libs.tslibs.timestamps.Timestamp, pandas._libs.tslibs.timestamps.Timestamp]) → Optional[pandas.core.frame.DataFrame]

Filter a DataFrame of casts by longitude, latitude and time

Parameters:
  • casts – DataFrame containing at least cast id, longitude, latitude and time
  • longitude – Tuple of min and max longitude, e.g. (-10,35)
  • latitude – Tuple of min and max latitude, e.g. (50,80)
  • timespan – Tuple of min and max pd.Timestamp
Returns:

DataFrame of filtered casts
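To make the filtering criteria concrete, here is a minimal sketch of the same bounding-box and time-range test on plain Python records. It is not the SDK implementation: filter_cast_records is a hypothetical name, the "datetime" key is an assumed column name, and treating the bounds as inclusive is an assumption.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def filter_cast_records(
    casts: List[Dict],
    longitude: Tuple[float, float],
    latitude: Tuple[float, float],
    timespan: Tuple[datetime, datetime],
) -> List[Dict]:
    """Keep records whose lon, lat and datetime lie inside the given ranges.

    Bounds are treated as inclusive (an assumption, not confirmed by the SDK).
    """
    return [
        c for c in casts
        if longitude[0] <= c["lon"] <= longitude[1]
        and latitude[0] <= c["lat"] <= latitude[1]
        and timespan[0] <= c["datetime"] <= timespan[1]
    ]


# Made-up records for illustration
casts = [
    {"extId": "cast_a", "lon": 5.0, "lat": 60.0, "datetime": datetime(2018, 6, 10)},
    {"extId": "cast_b", "lon": 40.0, "lat": 60.0, "datetime": datetime(2018, 6, 10)},
    {"extId": "cast_c", "lon": 5.0, "lat": 60.0, "datetime": datetime(2019, 1, 1)},
]
kept = filter_cast_records(
    casts, (-10, 35), (50, 80), (datetime(2018, 3, 1), datetime(2018, 9, 1))
)
print([c["extId"] for c in kept])  # ['cast_a']
```

Here cast_b is dropped because its longitude is outside the range, and cast_c because its date is.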

get_available_casts(longitude: Tuple[float, float], latitude: Tuple[float, float], timespan: Tuple[str, str], n_threads: int = 35, meta_parameters: List[str] = None) → pandas.core.frame.DataFrame

Retrieves the available casts within search criteria

Parameters:
  • longitude – Tuple of min and max longitude, e.g. (-10.11,35.33)
  • latitude – Tuple of min and max latitude, e.g. (50,80)
  • timespan – Tuple of min and max date strings in 'YYYY-MM-DD' format, e.g. ('2018-03-01','2018-09-01')
  • n_threads – Number of threads to use
  • meta_parameters – List of column names to be returned. None returns all. E.g. meta_parameters=['extId','lat','lon','date','country','equipment','Platform']
Returns:

DataFrame of the casts matching the search criteria

download_data_from_casts(cast_names: List[str], n_threads: int = 35, parameters: List[str] = None) → pandas.core.frame.DataFrame

Retrieves data from a list of level 3 casts

Parameters:
  • cast_names – List of cast external IDs ('extId')
  • n_threads – Number of threads to be used for retrieving each cast
  • parameters – List of parameters to be downloaded. If None, all columns are included. E.g. parameters=['date','lon','lat','Temperature','Oxygen']
Returns:

Pandas data frame with cast data

get_metadata(cast_names: List[str]) → Union[None, pandas.core.frame.DataFrame]

Returns the metadata associated with the particular cast

Parameters:
  • cast_names – List of cast names (externalId in ODP)
Returns:

DataFrame of casts with metadata

Utilities

Advanced Helper Functions

Interpolate Casts to Z

UtilityFunctions.interpolate_casts_to_z(variable, z_int, max_z_extrapolation=3, max_z_copy_single_value=1, kind='linear')

Interpolate profiles in dataframe to prescribed depth level.

Takes a complete dataframe from ODP and interpolates each cast by filtering out the values from each unique cast

Parameters:
  • df – Pandas DataFrame from ODP
  • variable – Name of the variable to be interpolated, as it appears in the DataFrame (Temperature, Oxygen, etc.)
  • z_int – List of the desired depth levels to return, e.g. [0,10,20]
  • max_z_extrapolation – The maximum distance over which extrapolation is allowed. Values beyond this distance are set to NaN.
  • max_z_copy_single_value – If only one row is present in the cast, this is the maximum distance between the point and the interpolation level for copying the value
  • kind – Type of interpolation as in interpolate_profile
Returns:

DataFrame of parameter values at prescribed depth levels.

Interpolate Casts to grid

UtilityFunctions.interpolate_to_grid(values, int_points, interp_type='linear', minimum_neighbors=3, gamma=0.25, kappa_star=5.052, search_radius=0.1, rbf_func='linear', rbf_smooth=0.001, rescale=True)

Interpolate unstructured N-D data to an N-D grid

Powered by the metpy library

Parameters:
  • points – (N,D) array of points, typically latitude and longitude
  • values – (N,1) array of corresponding values, i.e Temperature, Oxygen etc
  • int_points – List of arrays defining the grid, e.g. a lat/lon grid: (np.linspace(-25,35,60*10+1), np.linspace(50,80,30*10+1))
  • interp_type – What type of interpolation to use. Available options include: 1) “linear”, “nearest”, “cubic”, or “rbf” from scipy.interpolate. 2) “natural_neighbor”, “barnes”, or “cressman” from metpy.interpolate. Default “linear”.
  • minimum_neighbors – Minimum number of neighbors needed to perform barnes or cressman interpolation for a point. Default is 3.
  • gamma – Adjustable smoothing parameter for the barnes interpolation. Default 0.25.
  • kappa_star – Response parameter for barnes interpolation, specified nondimensionally in terms of the Nyquist. Default 5.052
  • search_radius – A search radius to use for the barnes and cressman interpolation schemes. If search_radius is not specified, it will default to the average spacing of observations.
  • rbf_func – Specifies which function to use for Rbf interpolation. Options include: 'multiquadric', 'inverse', 'gaussian', 'linear', 'cubic', 'quintic', and 'thin_plate'. Default 'linear'. See scipy.interpolate.Rbf for more information.
  • rbf_smooth – Smoothing value applied to rbf interpolation. Higher values result in more smoothing.
  • rescale
Returns:

Array representing the interpolated values for each input point

Return type:

values_interpolated

Interpolate profile

UtilityFunctions.interpolate_profile(z_int, max_z_extrapolation=10, max_z_copy_single_value=1, kind='linear')

Interpolate profile zv (depth, parameter) to a user defined depth.

Parameters:
  • zv – 2-D array of depth and a parameter (temperature, oxygen, …)
  • z_int – 1-D array of depth levels to interpolate to
  • max_z_extrapolation – Maximum distance to extrapolate outside profile. Use 0 for no extrapolation.
  • max_z_copy_single_value – Maximum distance for copying the value of a single value profile.
  • kind – Specifies the kind of interpolation as a string: 'linear', 'nearest', 'zero', 'slinear', 'quadratic', 'cubic', 'previous' or 'next'
Returns:

Array of interpolated values

Example:

import numpy as np

# Each row is (depth, value)
zv = np.array(
   [[ 0.        , 21.64599991],
   [ 9.93530941, 21.54500008],
   [19.87013626, 20.96299934],
   [29.80448341, 20.40699959],
   [49.67173004, 19.36800003],
   [74.50308228, 18.8010006 ],
   [99.3314209 , 18.27400017]]
)

z_int = [0, 25, 50, 75, 100, 125]

v_int = interpolate_profile(zv, z_int)

print(v_int)
# >>> array([21.64599991, 20.67589412, 19.36050431, 18.79045314, 18.25980907, nan])
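To show where these numbers come from, the following is a minimal pure-Python sketch of linear interpolation with a capped extrapolation distance. It is not the SDK implementation: interpolate_profile_sketch is a hypothetical name, only kind='linear' is covered, the single-value case (max_z_copy_single_value) is left out, and the profile is assumed to have at least two rows ordered as (depth, value) with strictly increasing depth.

```python
import math
from typing import List, Sequence, Tuple


def interpolate_profile_sketch(
    zv: Sequence[Tuple[float, float]],
    z_int: Sequence[float],
    max_z_extrapolation: float = 10.0,
) -> List[float]:
    """Linearly interpolate (depth, value) pairs to the depths in z_int.

    Illustrative re-implementation, not the SDK code.
    """
    depths = [row[0] for row in zv]
    values = [row[1] for row in zv]
    result = []
    for z in z_int:
        # Distance outside the sampled depth range (0 when inside it)
        outside = max(depths[0] - z, z - depths[-1], 0.0)
        if outside > max_z_extrapolation:
            result.append(math.nan)
            continue
        # Find the segment containing z; the first and last segments
        # are reused for extrapolation beyond the ends
        i = 0
        while i < len(depths) - 2 and z > depths[i + 1]:
            i += 1
        slope = (values[i + 1] - values[i]) / (depths[i + 1] - depths[i])
        result.append(values[i] + slope * (z - depths[i]))
    return result


# Profile from the example above, every row ordered as (depth, value)
zv = [
    (0.0, 21.64599991),
    (9.93530941, 21.54500008),
    (19.87013626, 20.96299934),
    (29.80448341, 20.40699959),
    (49.67173004, 19.36800003),
    (74.50308228, 18.8010006),
    (99.3314209, 18.27400017),
]
v_int = interpolate_profile_sketch(zv, [0, 25, 50, 75, 100, 125])
print(v_int)
```

The value at depth 100 is a short extrapolation past the deepest sample (99.33), which is within the default limit of 10. Depth 125 is more than 10 beyond the deepest sample, so it yields NaN.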

Plot Casts

UtilityFunctions.plot_casts(df, longitude, latitude, cmap='viridis', vrange=[None, None])

Plot casts

Parameters:
  • variable – Name of the oceanographic variable, e.g. 'Temperature'
  • df – Pandas DataFrame from ODP with lat, lon, and variable columns
  • longitude – List of min and max longitude, e.g. [-10,35]
  • latitude – List of min and max latitude, e.g. [50,80]
  • cmap – Colormap specification
  • vrange – Range of variable values to be shown, e.g. [0,20]
Returns:

Map with variable measurements plotted as points

Plot Grid

UtilityFunctions.plot_grid(latitude, int_lon, int_lat, g, cmap='viridis', vrange=[None, None], crs_latlon=<sphinx.ext.autodoc.importer._MockObject object>, variable_name='')

Plot Grid

Parameters:
  • int_lon – (M,N) array of longitude grid
  • int_lat – (M,N) array of latitude grid
  • g – (M,N) grid to be shown
  • cmap – Colormap
  • vrange – Range of grid values to be shown, e.g. [0,35]
  • crs_latlon
  • variable_name
Returns:

Map with interpolated values

Get Units

UtilityFunctions.get_units()

Get dict describing the units of the different columns

Returns:

Dict of units

Plot percentage of nulls for each variable in variable list

UtilityFunctions.plot_nulls(var_list=None)

Plot percentage of nulls for each variable in variable list.

Takes a dataframe from ODP and a list of variables and plots the percentage of missing values

Parameters:
  • df – Pandas dataframe from ODP
  • var_list – List of variables (column names) of interest. Defaults to all columns.
Returns:

Plot of the percentage of values missing at each measurement (lat, lon, depth)

Plot metadata-statistics

UtilityFunctions.plot_meta_stats(variable)

Get bar graph of percentage of data belonging to a specific variable subset in the metadata

Parameters:
  • df – Pandas DataFrame with extId-column
  • variable – Variable in subset of metadata
Returns:

Bar graph with percentage of data belonging to variable subset (i.e. data belonging to different modes of data collection (‘dataset’))

Plot distribution of values

UtilityFunctions.plot_distributions(var_list)

Plot the distributions of the values for a list of variables

Parameters:
  • df – Pandas DataFrame from ODP containing oceanographic variables and values
  • var_list – list of variables (column names) that should be plotted
Returns:

Plots of distributions of values for each variable in variable list

Plot casts belonging to specific dataset

UtilityFunctions.plot_datasets(variable, latitude, longitude)

Plots on a map casts belonging to specific dataset (mode of data collection, i.e. ctd, xbt)

Parameters:
  • df – Pandas DataFrame
  • variable – Variable of choice
  • latitude – Bounding box latitude
  • longitude – Bounding box longitude
Returns:

Map with color coded casts based on dataset_code

Internal Helper Functions

UtilityFunctions.geo_map()

Helper function for mapping

Parameters:
  • ax – Matplotlib axis

UtilityFunctions.missing_values(var_list)

Get dataframe of nulls for each variable in variable list.

Takes a DataFrame from ODP and a list of variables and returns a DataFrame of missing values

Parameters:
  • df – Pandas DataFrame from ODP
  • var_list – List of variables (column names) of interest. Defaults to all columns.
Returns:

DataFrame with the percentage of values missing at each measurement (lat, lon, depth)
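As a rough illustration of the underlying statistic, the sketch below computes the percentage of missing values per variable over plain Python records. It is a simplification under stated assumptions, not the SDK code: percent_missing is a hypothetical name, the data is made up, and the result is aggregated over all rows rather than per measurement location.

```python
import math
from typing import Dict, List, Optional


def percent_missing(rows: List[Dict], var_list: Optional[List[str]] = None) -> Dict[str, float]:
    """Percentage of rows in which each variable is None, NaN or absent."""
    if not rows:
        return {}
    if var_list is None:
        # Default to every column seen in the data, in first-seen order
        var_list = list(dict.fromkeys(k for row in rows for k in row))

    def is_missing(value) -> bool:
        return value is None or (isinstance(value, float) and math.isnan(value))

    return {
        var: 100.0 * sum(is_missing(row.get(var)) for row in rows) / len(rows)
        for var in var_list
    }


# Made-up measurements for illustration
rows = [
    {"Temperature": 7.5, "Oxygen": None},
    {"Temperature": float("nan"), "Oxygen": 6.1},
    {"Temperature": 8.0, "Oxygen": None},
    {"Temperature": 8.2, "Oxygen": 5.9},
]
print(percent_missing(rows))  # {'Temperature': 25.0, 'Oxygen': 50.0}
```

With a Pandas DataFrame, the same per-column figure can be obtained with df[var_list].isna().mean() * 100.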

Geographic Utilities

Convert Latitude and Longitude to Geo-Index

utils.gcs_to_index(lon: Union[float, List[float], numpy.ndarray], res: float = 1.0) → numpy.ndarray

Convert lat/lon to ODP index

Parameters:
  • lat – Latitude
  • lon – Longitude
  • res – Resolution in degrees. The default is 1x1 degrees per index. For half-degree grids, use 0.5
Returns:

ODP-index

Return type:

float

Convert Latitude and Longitude to grid-coordinates

utils.gcs_to_grid(lon: Union[float, List[float], numpy.ndarray], res=1.0) → Union[Tuple[int, int], numpy.ndarray]

Convert lat/lon to grid

Parameters:
  • lat – Latitude
  • lon – Longitude
  • res – Resolution in degrees. The default is 1x1 degrees per index. For half-degree grids, use 0.5
Returns:

Grid index

Return type:

tuple(int, int)

Convert Geo-Index to grid-coordinates

utils.index_to_grid(res: float = 1.0) → Union[Tuple[int, int], numpy.ndarray]

Convert ODP-index to grid-coordinates

Parameters:
  • index – ODP-index in the range [1, 64800] when res=1
  • res – Resolution in degrees. The default is 1x1 degrees per index. For half-degree grids, use 0.5
Returns:

Grid-coordinates

Return type:

tuple(int, int)
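The upper bound of 64800 for res=1 equals the number of 1x1 degree cells covering the globe (360 x 180). The snippet below only checks that arithmetic. It makes no assumption about how the SDK orders its indices, n_grid_cells is a hypothetical name, and the figure for other resolutions is an extrapolation of the same reasoning rather than something stated by the SDK.

```python
def n_grid_cells(res: float = 1.0) -> int:
    """Number of res x res degree cells needed to cover the globe."""
    n_lon = round(360 / res)  # cells along longitude
    n_lat = round(180 / res)  # cells along latitude
    return n_lon * n_lat


print(n_grid_cells(1.0))  # 64800
print(n_grid_cells(0.5))  # 259200
```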

Convert Geo-Index to Latitude and Longitude

utils.index_to_gcs(res: float = 1.0) → Union[Tuple[float, float], numpy.ndarray]

Convert ODP-index to lat/lon

Parameters:
  • index – ODP-index in the range [1, 64800] when res=1
  • res – Resolution in degrees. The default is 1x1 degrees per index. For half-degree grids, use 0.5
Returns:

longitude, latitude

Return type:

tuple(float, float)

Get all grid-coordinates within a rectangle

utils.grid_rect_members(p2: Tuple[int, int], compensate_dateline: bool = False) → numpy.ndarray

Fill a rectangle, defined by two corner grid-coordinates, with all grid-coordinates contained in it

Parameters:
  • p1 – First corner of rectangle
  • p2 – Second corner of rectangle
  • compensate_dateline – Compensate for the international dateline. If true, two points close to each other near the international dateline or south pole will define a rectangle across the dateline, instead of going all the way around the globe
Returns:

2D-array of all grid-coordinates contained within the rectangle.

Return type:

np.array

Note

The ends are included. For example, if p1 and p2 are equal, the returned array is not empty but contains a single point, p1.
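The inclusive behaviour described in the note can be sketched as follows. This is an illustration only, not the SDK code: rect_members_sketch is a hypothetical name, it ignores dateline compensation, and it returns a list of tuples instead of a NumPy array.

```python
from typing import List, Tuple


def rect_members_sketch(p1: Tuple[int, int], p2: Tuple[int, int]) -> List[Tuple[int, int]]:
    """All integer grid-coordinates in the rectangle spanned by p1 and p2, ends included."""
    # Sort each axis so the corners can be given in any order
    x_lo, x_hi = sorted((p1[0], p2[0]))
    y_lo, y_hi = sorted((p1[1], p2[1]))
    return [(x, y) for x in range(x_lo, x_hi + 1) for y in range(y_lo, y_hi + 1)]


print(rect_members_sketch((3, 7), (3, 7)))  # [(3, 7)]
print(len(rect_members_sketch((0, 0), (2, 1))))  # 6
```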

Get all Geo-Indices within a rectangle

utils.index_rect_members(p2: int, res: float = 1, compensate_dateline: bool = False) → numpy.array

Fill a rectangle, defined by two corner geo-indices, with all geo-indices contained in it

Parameters:
  • p1 – Geo-Index of first corner of rectangle
  • p2 – Geo-Index of second corner of rectangle
  • res – Resolution in degrees. The default is 1x1 degrees per index. For half-degree grids, use 0.5
  • compensate_dateline – Compensate for the international dateline. If true, two points close to each other near the international dateline or south pole will define a rectangle across the dateline, instead of going all the way around the globe
Returns:

1D-array of all geo-indices contained within the rectangle.

Return type:

np.array

Note

The ends are included. For example, if p1 and p2 are equal, the returned array is not empty but contains a single point, p1.