Quickstart¶
Authenticate¶
To use the ODP SDK, you must authenticate with the API key you were given. Pass it as the api_key argument when instantiating ODPClient:
from odp_sdk import ODPClient
client = ODPClient(api_key="<my-api-key>")
You can also set the COGNITE_API_KEY environment variable:
$ export COGNITE_API_KEY=<my-api-key>
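The two ways of supplying the key can be combined in a small helper. The sketch below is not part of the SDK: `resolve_api_key` is a hypothetical name, and the choice to prefer an explicit key over the environment variable is an assumption made for illustration.

```python
import os


def resolve_api_key(api_key=None, env_var="COGNITE_API_KEY"):
    """Return the explicit key if given, otherwise the value of env_var.

    Hypothetical helper, not part of odp_sdk.
    """
    key = api_key or os.environ.get(env_var)
    if not key:
        # Fail early with a readable message instead of a failed request later.
        raise RuntimeError(f"No API key given and {env_var} is not set")
    return key


# Possible usage (requires odp_sdk to be installed):
#   from odp_sdk import ODPClient
#   client = ODPClient(api_key=resolve_api_key())
```

Failing early like this makes a missing key visible before any request is sent.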
Download Ocean Data¶
Once you have instantiated ODPClient, you can download ocean data with a single call. The data is returned as a Pandas DataFrame:
df = client.casts(longitude=[-25, 35], latitude=[50, 80], timespan=["2018-06-01", "2018-06-30"])
It is also possible to specify what parameters to download:
df = client.casts(
    longitude=[-25, 35],
    latitude=[50, 80],
    timespan=["2018-06-01", "2018-06-30"],
    parameters=["date", "lon", "lat", "z", "Temperature", "Salinity"],
)
In some instances, some filtering is necessary before downloading the data. This is achieved by first listing the available casts:
casts = client.get_available_casts(
    longitude=[-25, 35],
    latitude=[50, 80],
    timespan=["2018-06-01", "2018-06-30"],
    meta_parameters=["extId", "date", "time", "lat", "lon", "country", "Platform", "dataset_code"],
)
Then apply any desirable filters before downloading the data:
casts_norway = casts[casts.country == "NORWAY"]
df = client.download_data_from_casts(
    casts_norway.extId.tolist(),
    parameters=["date", "lat", "lon", "z", "Temperature", "Salinity"],
)
You can also download the cast metadata:
df = client.get_metadata(casts_norway.extId.tolist())
API¶
ODPClient¶
-
class odp_sdk.ODPClient(api_key: str = None, project: str = 'odp', client_name: str = 'ODPPythonSDK', base_url: str = None, max_workers: int = None, headers: Dict[str, str] = None, timeout: int = None, token: Union[str, Callable[[], str], None] = None, disable_pypi_version_check: Optional[bool] = None, debug: bool = False, info_odp: bool = True)¶ Main entrypoint into the Ocean Data Platform SDK. All services are made available through this object.
Download cast data, containing ocean measurements through the water column around the globe.
Example:
from odp_sdk import ODPClient
client = ODPClient(api_key=MY_API_KEY)
df = client.casts(longitude=[-10,35], latitude=[50,80], timespan=['2018-03-01','2018-09-01'])
-
casts(longitude: Tuple[float, float] = (-180.0, 180.0), latitude: Tuple[float, float] = (-90.0, 90.0), timespan: Tuple[str, str] = ('1700-01-01', '2050-01-01'), n_threads: int = 35, include_flagged_data: bool = True, parameters: List[str] = None) → Optional[pandas.core.frame.DataFrame]¶ Download cast data within search criteria
Parameters: - longitude – List of min and max longitude, e.g. [-10, 35]
- latitude – List of min and max latitude, e.g. [50, 80]
- timespan – List of min and max date strings in 'YYYY-MM-DD' format, e.g. ['2018-03-01', '2018-09-01']
- n_threads – Number of threads to use
- include_flagged_data – Boolean, whether flagged data should be included
- parameters – List of parameters to include in the DataFrame. If None, all columns are included. E.g. parameters=['date', 'lon', 'lat', 'Temperature', 'Oxygen']
Returns: Pandas DataFrame with cast data
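Since each search criterion is a [min, max] pair, it can help to validate the pairs before starting a download. The following is a minimal sketch, not SDK code: `check_search_box` is a hypothetical helper, and it assumes that min must not exceed max and that the bounds match the defaults in the signature above.

```python
from datetime import date


def check_search_box(longitude, latitude, timespan):
    """Validate [min, max] pairs for longitude, latitude and timespan.

    Hypothetical helper; returns the parsed pairs or raises ValueError.
    """
    lon_min, lon_max = longitude
    lat_min, lat_max = latitude
    # Dates are expected as 'YYYY-MM-DD' strings, as in the parameter list.
    start, end = (date.fromisoformat(s) for s in timespan)

    if not -180.0 <= lon_min <= lon_max <= 180.0:
        raise ValueError(f"bad longitude range: {longitude}")
    if not -90.0 <= lat_min <= lat_max <= 90.0:
        raise ValueError(f"bad latitude range: {latitude}")
    if start > end:
        raise ValueError(f"timespan starts after it ends: {timespan}")
    return (lon_min, lon_max), (lat_min, lat_max), (start, end)
```

A call such as `check_search_box([-25, 35], [50, 80], ["2018-06-01", "2018-06-30"])` returns the three pairs with the dates parsed into `date` objects.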
-
filter_casts(casts: pandas.core.frame.DataFrame, longitude: Tuple[int, int], latitude: Tuple[int, int], timespan: Tuple[pandas._libs.tslibs.timestamps.Timestamp, pandas._libs.tslibs.timestamps.Timestamp]) → Optional[pandas.core.frame.DataFrame]¶ Filtering a DataFrame of casts based on longitude, latitude and time
Parameters: - casts – DataFrame containing at least cast id, longitude, latitude and time
- longitude – Tuple of min and max longitude, e.g. (-10, 35)
- latitude – Tuple of min and max latitude, e.g. (50, 80)
- timespan – Tuple of min and max pd.Timestamp
Returns: DataFrame of filtered casts
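The filtering logic can be illustrated without Pandas on a list of plain records. This is a sketch under assumptions rather than the SDK's implementation: the record keys (`extId`, `lon`, `lat`, `date`), the sample cast IDs, and the inclusive bounds are all chosen for illustration.

```python
from datetime import datetime


def filter_cast_records(casts, longitude, latitude, timespan):
    """Keep records whose lon, lat and date fall inside the given ranges.

    Bounds are treated as inclusive (an assumption).
    """
    lon_min, lon_max = longitude
    lat_min, lat_max = latitude
    t_min, t_max = timespan
    return [
        c for c in casts
        if lon_min <= c["lon"] <= lon_max
        and lat_min <= c["lat"] <= lat_max
        and t_min <= c["date"] <= t_max
    ]


# Made-up sample records for illustration only.
records = [
    {"extId": "cast_a", "lon": 5.0, "lat": 60.0, "date": datetime(2018, 6, 10)},
    {"extId": "cast_b", "lon": 40.0, "lat": 60.0, "date": datetime(2018, 6, 10)},
    {"extId": "cast_c", "lon": 5.0, "lat": 60.0, "date": datetime(2018, 7, 10)},
]
kept = filter_cast_records(
    records, (-10, 35), (50, 80), (datetime(2018, 6, 1), datetime(2018, 6, 30))
)
print([c["extId"] for c in kept])
```

Only `cast_a` satisfies all three criteria: `cast_b` lies outside the longitude range and `cast_c` outside the timespan.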
-
get_available_casts(longitude: Tuple[float, float], latitude: Tuple[float, float], timespan: Tuple[str, str], n_threads: int = 35, meta_parameters: List[str] = None) → pandas.core.frame.DataFrame¶ Retrieves the available casts within search criteria
Parameters: - longitude – Tuple of min and max longitude, e.g. (-10.11, 35.33)
- latitude – Tuple of min and max latitude, e.g. (50, 80)
- timespan – Tuple of min and max date strings in 'YYYY-MM-DD' format, e.g. ('2018-03-01', '2018-09-01')
- n_threads – Number of threads to use
- meta_parameters – List of column names to be returned. None returns all. E.g. meta_parameters=['extId', 'lat', 'lon', 'date', 'country', 'equipment', 'Platform']
Returns: DataFrame of available casts
-
download_data_from_casts(cast_names: List[str], n_threads: int = 35, parameters: List[str] = None) → pandas.core.frame.DataFrame¶ Retrieving data from list of level 3 casts
Parameters: - cast_names – List of external IDs of the casts ('extId')
- n_threads – Number of threads to be used for retrieving each cast
- parameters – List of parameters to be downloaded. If None, all columns are included. E.g. parameters=['date', 'lon', 'lat', 'Temperature', 'Oxygen']
Returns: Pandas DataFrame with cast data
-
get_metadata(cast_names: List[str]) → Union[None, pandas.core.frame.DataFrame]¶ Returns the metadata associated with the particular cast
Parameters: cast_names – List of cast names (externalId in ODP)
Returns: DataFrame of casts with metadata
Utilities¶
Advanced Helper Functions¶
Interpolate Casts to Z¶
-
UtilityFunctions.interpolate_casts_to_z(variable, z_int, max_z_extrapolation=3, max_z_copy_single_value=1, kind='linear')¶ Interpolate profiles in dataframe to prescribed depth level.
Takes a complete dataframe from ODP and interpolates each cast by filtering out the values from each unique cast
Parameters: - df – Pandas DataFrame from ODP
- variable – Variable name to be interpolated as in the dataframe (Temperature, Oxygen, etc)
- z_int – List of the desired depth levels to return, e.g. [0, 10, 20]
- max_z_extrapolation – The maximum distance to allow extrapolating. Values beyond this distance are set to NaN.
- max_z_copy_single_value – If only one row is present in the cast, this is the maximum distance between the point and the interpolation level for copying the value
- kind – Type of interpolation as in interpolate_profile
Returns: DataFrame of parameter values at prescribed depth levels.
Interpolate Casts to grid¶
-
UtilityFunctions.interpolate_to_grid(values, int_points, interp_type='linear', minimum_neighbors=3, gamma=0.25, kappa_star=5.052, search_radius=0.1, rbf_func='linear', rbf_smooth=0.001, rescale=True)¶ Interpolate unstructured N-D data to an N-D grid
Powered by the metpy library
Parameters: - points – (N,D) array of points, typically latitude and longitude
- values – (N,1) array of corresponding values, e.g. Temperature, Oxygen etc
- int_points – List of arrays for gridding, e.g. a lat/lon grid: (np.linspace(-25,35,60*10+1), np.linspace(50,80,30*10+1))
- interp_type – What type of interpolation to use. Available options include: 1) “linear”, “nearest”, “cubic”, or “rbf” from scipy.interpolate. 2) “natural_neighbor”, “barnes”, or “cressman” from metpy.interpolate. Default “linear”.
- minimum_neighbors – Minimum number of neighbors needed to perform barnes or cressman interpolation for a point. Default is 3.
- gamma – Adjustable smoothing parameter for the barnes interpolation. Default 0.25.
- kappa_star – Response parameter for barnes interpolation, specified nondimensionally in terms of the Nyquist. Default 5.052
- search_radius – A search radius to use for the barnes and cressman interpolation schemes. If search_radius is not specified, it will default to the average spacing of observations.
- rbf_func – Specifies which function to use for Rbf interpolation. Options include: ‘multiquadric’, ‘inverse’, ‘gaussian’, ‘linear’, ‘cubic’, ‘quintic’, and ‘thin_plate’. Default ‘linear’. See scipy.interpolate.Rbf for more information.
- rbf_smooth – Smoothing value applied to rbf interpolation. Higher values result in more smoothing.
- rescale –
Returns: Array representing the interpolated values for each input point
Return type: values_interpolated
Interpolate profile¶
-
UtilityFunctions.interpolate_profile(z_int, max_z_extrapolation=10, max_z_copy_single_value=1, kind='linear')¶ Interpolate profile zv (depth, parameter) to a user defined depth.
Parameters: - zv – 2-D array of depth and a parameter (temperature, oxygen, …)
- z_int – 1-D array of depth levels to interpolate to
- max_z_extrapolation – Maximum distance to extrapolate outside profile. Use 0 for no extrapolation.
- max_z_copy_single_value – Maximum distance for copying the value of a single value profile.
- kind – Specifies the kind of interpolation as a string (‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’, ‘previous’ or ‘next’)
Returns: Returns array of interpolated values
Example:
from numpy import array
# Each row of zv is (depth, value)
zv = array([[ 0.        , 21.64599991],
            [ 9.93530941, 21.54500008],
            [19.87013626, 20.96299934],
            [29.80448341, 20.40699959],
            [49.67173004, 19.36800003],
            [74.50308228, 18.8010006 ],
            [99.3314209 , 18.27400017]])
z_int = [0, 25, 50, 75, 100, 125]
v_int = interpolate_profile(zv, z_int)
print(v_int)
# >>> array([21.64599991, 20.67589412, 19.36050431, 18.79045314, 18.25980907, nan])
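To make the role of max_z_extrapolation and max_z_copy_single_value concrete, here is a pure-Python sketch of linear profile interpolation. It is not the library's implementation: `interpolate_profile_sketch` is a hypothetical name, only the 'linear' kind is covered, the profile is assumed to be non-empty with strictly increasing depths, and extrapolating along the nearest end segment is an assumption (one that reproduces the values in the example above when the profile is read as rows of depth and value).

```python
import math
from bisect import bisect_left


def interpolate_profile_sketch(zv, z_int, max_z_extrapolation=10,
                               max_z_copy_single_value=1):
    """Linearly interpolate (depth, value) rows onto the depths in z_int."""
    rows = sorted(zv)                      # order by depth
    depths = [r[0] for r in rows]
    values = [r[1] for r in rows]
    out = []
    for z in z_int:
        if len(rows) == 1:
            # Single-value profile: copy the value only if z is close enough.
            near = abs(z - depths[0]) <= max_z_copy_single_value
            out.append(values[0] if near else math.nan)
            continue
        # Distance by which z lies outside the sampled depth range.
        outside = max(depths[0] - z, z - depths[-1], 0.0)
        if outside > max_z_extrapolation:
            out.append(math.nan)
            continue
        # Segment bracketing z, or the nearest end segment when extrapolating.
        i = min(max(bisect_left(depths, z), 1), len(depths) - 1)
        z0, z1 = depths[i - 1], depths[i]
        v0, v1 = values[i - 1], values[i]
        out.append(v0 + (z - z0) * (v1 - v0) / (z1 - z0))
    return out


zv = [(0.0, 21.64599991), (9.93530941, 21.54500008), (19.87013626, 20.96299934),
      (29.80448341, 20.40699959), (49.67173004, 19.36800003),
      (74.50308228, 18.8010006), (99.3314209, 18.27400017)]
v_int = interpolate_profile_sketch(zv, [0, 25, 50, 75, 100, 125])
print(v_int)
```

The value at depth 100 lies 0.67 below the deepest sample and is extrapolated, while depth 125 is more than max_z_extrapolation away and becomes NaN.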
Plot Casts¶
-
UtilityFunctions.plot_casts(df, longitude, latitude, cmap='viridis', vrange=[None, None])¶ Plot casts
Parameters: - variable – Name of the oceanographic variable as a string, e.g. ‘Temperature’
- df – Pandas DataFrame from ODP with lat, lon, and variable columns
- longitude – List of min and max longitude, e.g. [-10, 35]
- latitude – List of min and max latitude, e.g. [50, 80]
- cmap – Colormap specification
- vrange – Range of variable values to be shown, e.g. [0, 20]
Returns: Map with variable measurements plotted as points
Plot Grid¶
-
UtilityFunctions.plot_grid(latitude, int_lon, int_lat, g, cmap='viridis', vrange=[None, None], crs_latlon=<sphinx.ext.autodoc.importer._MockObject object>, variable_name='')¶ Plot Grid
Parameters: - int_lon – (M,N) array of longitude grid
- int_lat – (M,N) array of latitude grid
- g – (M,N) grid to be shown
- cmap – Colormap
- vrange – Range of grid values to be shown, e.g. [0, 35]
- crs_latlon –
- variable_name –
Returns: Map with interpolated values
Get Units¶
-
UtilityFunctions.get_units()¶ Get dict describing the units of the different columns
Returns: Dict of units
Plot percentage of nulls for each variable in variable list¶
-
UtilityFunctions.plot_nulls(var_list=None)¶ Plot percentage of nulls for each variable in variable list.
Takes a dataframe from ODP and a list of variables and plots the percentage of missing values
Parameters: - df – Pandas dataframe from ODP
- var_list – List of variables (column names) that the user is interested in. The default is all columns.
Returns: Plot of percentage of values missing at each measurement (lat, lon, depth)
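The quantity being plotted is simply the share of missing entries per column. The sketch below computes it on plain records rather than a DataFrame. `null_percentages` is a hypothetical helper, not part of the SDK, and it treats both None and NaN as missing (an assumption).

```python
import math


def null_percentages(rows, var_list=None):
    """Return {variable: percentage of rows where it is missing}."""
    if not rows:
        raise ValueError("rows must not be empty")
    if var_list is None:
        # Default: every column that appears in any row.
        var_list = sorted({key for row in rows for key in row})

    def is_null(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    n = len(rows)
    return {
        var: 100.0 * sum(is_null(row.get(var)) for row in rows) / n
        for var in var_list
    }


# Made-up measurements for illustration only.
rows = [
    {"Temperature": 7.1, "Oxygen": 6.3},
    {"Temperature": math.nan, "Oxygen": None},
    {"Temperature": 6.8, "Oxygen": None},
    {"Temperature": 6.5, "Oxygen": 6.1},
]
print(null_percentages(rows))
```

With a DataFrame the same numbers could be obtained from `df[var_list].isnull().mean() * 100`.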
Plot metadata-statistics¶
-
UtilityFunctions.plot_meta_stats(variable)¶ Get bar graph of percentage of data belonging to a specific variable subset in the metadata
Parameters: - df – Pandas DataFrame with extId-column
- variable – Variable in subset of metadata
Returns: Bar graph with percentage of data belonging to variable subset (e.g. data belonging to different modes of data collection (‘dataset’))
Plot distribution of values¶
-
UtilityFunctions.plot_distributions(var_list)¶ Plot the distributions of the values for a list of variables
Parameters: - df – Pandas DataFrame from ODP containing oceanographic variables and values
- var_list – list of variables (column names) that should be plotted
Returns: Plots of distributions of values for each variable in variable list
Plot casts belonging to specific dataset¶
-
UtilityFunctions.plot_datasets(variable, latitude, longitude)¶ Plot casts belonging to a specific dataset on a map (mode of data collection, e.g. ctd, xbt)
Parameters: - df – Pandas DataFrame
- variable – Variable of choice
- latitude – Bounding box latitude
- longitude – Bounding box longitude
Returns: Map with color coded casts based on dataset_code
Internal Helper Functions¶
-
UtilityFunctions.geo_map()¶ Helper function for mapping
Parameters: ax – Matplotlib axis
-
UtilityFunctions.missing_values(var_list)¶ Get dataframe of nulls for each variable in variable list.
Takes a dataframe from ODP and a list of variables and return dataframe of missing values
Parameters: - df – Pandas DataFrame from ODP
- var_list – List of variables (column names) that the user is interested in. The default is all columns.
Returns: DataFrame with percentage of values missing at each measurement (lat, lon, depth)
Geographic Utilities¶
Convert Latitude and Longitude to Geo-Index¶
-
utils.gcs_to_index(lon: Union[float, List[float], numpy.ndarray], res: float = 1.0) → numpy.ndarray¶ Convert lat/lon to ODP index
Parameters: - lat – Latitude
- lon – Longitude
- res – Resolution in degrees per index. The default is 1x1 degree. For half-degree grids, use 0.5
Returns: ODP-index
Return type: float
Convert Latitude and Longitude to grid-coordinates¶
-
utils.gcs_to_grid(lon: Union[float, List[float], numpy.ndarray], res=1.0) → Union[Tuple[int, int], numpy.ndarray]¶ Convert lat/lon to grid
Parameters: - lat – Latitude
- lon – Longitude
- res – Resolution in degrees per index. The default is 1x1 degree. For half-degree grids, use 0.5
Returns: Grid index
Return type: tuple(int, int)
Convert Geo-Index to grid-coordinates¶
-
utils.index_to_grid(res: float = 1.0) → Union[Tuple[int, int], numpy.ndarray]¶ Convert ODP-index to grid-coordinates
Parameters: - index – ODP-index in the range [1, 64800] when res=1
- res – Resolution in degrees per index. The default is 1x1 degree. For half-degree grids, use 0.5
Returns: Grid-coordinates
Return type: tuple(int, int)
Convert Geo-Index to Latitude and Longitude¶
-
utils.index_to_gcs(res: float = 1.0) → Union[Tuple[float, float], numpy.ndarray]¶ Convert ODP-index to lat/lon
Parameters: - index – ODP-index in the range [1, 64800] when res=1
- res – Resolution in degrees per index. The default is 1x1 degree. For half-degree grids, use 0.5
Returns: longitude, latitude
Return type: tuple(float, float)
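The index range [1, 64800] at res=1 corresponds to 360 x 180 one-degree cells. The sketch below shows that count and one possible 1-based mapping between grid coordinates and a flat index. The cell ordering used here (longitude-major, 0-based grid coordinates) is purely an assumption for illustration and may differ from the ordering the SDK uses, and all function names are hypothetical.

```python
def n_cells(res=1.0):
    """Number of grid cells covering the globe at the given resolution."""
    return round(360 / res) * round(180 / res)


def grid_to_index_sketch(x, y, res=1.0):
    """Map 0-based (longitude cell, latitude cell) to a 1-based flat index.

    Assumed layout: all latitude cells of one longitude column are numbered
    before moving to the next column.
    """
    n_lat = round(180 / res)
    return x * n_lat + y + 1


def index_to_grid_sketch(index, res=1.0):
    """Inverse of grid_to_index_sketch."""
    n_lat = round(180 / res)
    return divmod(index - 1, n_lat)


print(n_cells(1.0), n_cells(0.5))
print(grid_to_index_sketch(359, 179), index_to_grid_sketch(64800))
```

Whatever the real ordering is, halving res quadruples the number of cells, so indices computed at one resolution are not valid at another.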
Get all grid-coordinates within a rectangle¶
-
utils.grid_rect_members(p2: Tuple[int, int], compensate_dateline: bool = False) → numpy.ndarray¶ Fill a rectangle, defined by two corner grid-coordinates, with all grid-coordinates contained in it
Parameters: - p1 – First corner of rectangle
- p2 – Second corner of rectangle
- compensate_dateline – Compensate for international dateline. If true, then two points close to each other near the international dateline or south pole will define a rectangle across the dateline, instead of going all the way around the globe
Returns: 2D-array of all grid-coordinates contained within the rectangle.
Return type: np.array
Note
The ends are included. For example, if p1 and p2 are equal, the returned array is not empty but contains a single point, p1
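The inclusive-ends behaviour and the dateline option can be sketched in a few lines. This is an illustration under assumptions, not the SDK's code: `rect_members_sketch` is a hypothetical name, grid columns are assumed to run from 0 to n_cols - 1, and the rule "wrap when the two corners are more than half the globe apart" is an assumed interpretation of compensate_dateline.

```python
def rect_members_sketch(p1, p2, compensate_dateline=False, n_cols=360):
    """List all (x, y) grid coordinates in the rectangle spanned by p1 and p2.

    Both corners are included, so equal corners give exactly one point.
    """
    (x1, y1), (x2, y2) = p1, p2
    x_lo, x_hi = min(x1, x2), max(x1, x2)
    if compensate_dateline and x_hi - x_lo > n_cols // 2:
        # The short way round crosses the dateline: go from x_hi to the last
        # column, then wrap around to column 0 and continue up to x_lo.
        xs = list(range(x_hi, n_cols)) + list(range(0, x_lo + 1))
    else:
        xs = list(range(x_lo, x_hi + 1))
    ys = range(min(y1, y2), max(y1, y2) + 1)
    return [(x, y) for x in xs for y in ys]


print(rect_members_sketch((3, 4), (3, 4)))
print(rect_members_sketch((358, 10), (1, 10), compensate_dateline=True))
```

Without compensation, the corners (358, 10) and (1, 10) span 358 columns. With it, the rectangle covers only the four columns on either side of the wrap.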
Get all Geo-Indices within a rectangle¶
-
utils.index_rect_members(p2: int, res: float = 1, compensate_dateline: bool = False) → numpy.array¶ Fill a rectangle, defined by two corner geo-indices, with all geo-indices contained in it
Parameters: - p1 – Geo-Index of first corner of rectangle
- p2 – Geo-Index of second corner of rectangle
- res – Resolution in degrees per index. The default is 1x1 degree. For half-degree grids, use 0.5
- compensate_dateline – Compensate for international dateline. If true, then two points close to each other near the international dateline or south pole will define a rectangle across the dateline, instead of going all the way around the globe
Returns: 1D-array of all geo-indices contained within the rectangle.
Return type: np.array
Note
The ends are included. For example, if p1 and p2 are equal, the returned array is not empty but contains a single index, p1