profinder#

profinder is a collection of algorithms for finding profiles in ocean pressure data.

Usage#

Currently the package contains just one algorithm for identifying profiles, called find_profiles, which you can import from the main package.

from profinder import get_example_data, find_profiles

The function operates on pressure or depth time series data with no explicit need for time information. It assumes uniformly spaced data, but will still mostly work in the case that data are not uniform. The pressure data in the examples come from a 32 Hz RBR Concerto profiling in an icy Alaskan fjord. You can get the example data using:

pressure = get_example_data()
print(pressure[:10])

[0.01519012 0.02872944 0.01291084 0.02024746 0.01770782 0.02277946
 0.02475262 0.03238297 0.03802967 0.02418041]

The find_profiles function will identify up and down pairs in the data. Each up or down part in a profile is defined by start and end indexes, (down_start, down_end, up_start, up_end). The down_end and up_start will sometimes be identical for sharp profiles, or if speed thresholding options are not selected.

segments = find_profiles(pressure)
print(segments)

[(18289, 21825, 21900, 22456), (22490, 23312, 23389, 24086), (24137, 25140, 25207, 26016), (26089, 27531, 27567, 28840), (28970, 30345, 30353, 32115), (40664, 46835, 46894, 52390), (55882, 57125, 57199, 60976), (78102, 81833, 81927, 82613), (84138, 85574, 85711, 86931)]

The default parameters may need to be changed to find the profiles properly. Note that some smaller profiles are not identified in the following example. Zoom in to see the different between down end and up start locations.

find_profiles accepts a number of arguments for fine-tuning profile identification. Underneath the hood it is applying scipy functions find_peaks and savgol_filter. Consequently, a lot of the arguments to find_profiles alter the behaviour of these functions and it is helpful to be familiar with their operation.

The profile finding algorithm roughly follows the steps below. The action of each step is modified by a set of arguments.

Step	Arguments modifying the step
1. Pressure data are smoothed to remove noise (optional)	`apply_smoothing`, `window_length`, `polyorder`
2. Pressure maxima are identified	`peaks_kwargs`
3. Pressure minima are identified, equivalent to searching for maxima in negative pressure	`troughs_kwargs`
4. Segments are cleaned up to get better estimates of where the profiles start and end	`min_pressure`, `run_length`, `min_pressure_change`, `apply_speed_threshold`, `time`, `velocity`, `min_speed`, `direction`

The results from the example above can be improved by modifying the peak finding steps. We identify by eye that some smaller peaks are missed, suggesting that we should decrease the minimum height and prominance thresholds when finding peaks and troughs.

peaks_kwargs = {"height": 15, "distance": 200, "width": 200, "prominence": 15}
segments = find_profiles(pressure, apply_smoothing=True, peaks_kwargs=peaks_kwargs)

We can also set a minimum pressure threshold for profiles to start and end. Note that start and end pressures may not match this minimum exactly when applying smoothing. If it is important that the restriction be met exactly while smoothing, add some buffer into the value (e.g. choose 3.5 dbar instead of 3 dbar) or don’t smooth.

You can also increase the run_length to make sure that the instrument is going up/down for a sufficient number of points to consider the profile started and ended. This argument is coupled with min_pressure_change, which defines the minimum pressure changes between each point in the run. For example, if your profile starts with pressure values [1, 2, 5, 8, …] and you have a run length of 2 and minimum pressure change of 2, the profile would start at the value 2, because the although the pressure is always increasing, the change between 1 and 2 does not exceed the minimum. This removes some of the flatter regions in data.

segments = find_profiles(
  pressure, 
  apply_smoothing=True, 
  window_length=9, 
  min_pressure=3.0, 
  run_length=10, 
  peaks_kwargs=peaks_kwargs
  )

Glider example#

Gliders return decimated (low resolution) real-time pressure data and may undertake complex dive plans. The example below illustrates how to extract profiles in this case.

from profinder import synthetic_glider_pressure

pressure = synthetic_glider_pressure()
peaks_kwargs = {"height": 100, "distance": 5, "width": 5, "prominence": 100}
segments = find_profiles(pressure, peaks_kwargs=peaks_kwargs)

Microstructure example#

Microstructure instruments, such as the vertical microstructure profiler (VMP) or glider-based MicroRider require additional constraints to identify profiles. The sensors needs to be moving fast enough through the water to collect useful data and data that fall outside of this speed threshold need to be discarded. Additionally, the MicroRider may only be turned on during a climb or dive, meaning that only half a profile of data are collected.

Below, we create a toy model of a VMP that initially falls in free-fall, before being pulled up by a winch (constant force), with the goal of simulating velocity changes experienced by a real instrument.

A possible system of differential equations for the instrument motion is:

\[\begin{split} \begin{align*} \frac{dz}{dt} &= w \\ \frac{dw}{dt} &= g \frac{m_v - m_w}{m_v} - \frac{m_w}{m_v} \frac{C_d}{L} w |w| + \frac{T(t)}{m_v} \end{align*} \end{split}\]

where \(z\) is the height, \(w\) is the vertical velocity, \(g\) is gravity, \(m_v\) is the instrument mass, \(m_w\) is the mass of water displaced, \(C_d\) is the drag coefficient, \(L\) is the hull length, and \(T(t)\) is the time-dependent tension from the winch. These equations are solved in the code to produce a synthetic depth profile. We also mimic the sampling rate of a real VMP (60 Hz).

We can apply find_profiles to this synthetic data like before, but data when the instrument is moving slowly may not be properly excluded.

segments_speed = find_profiles(-z, apply_speed_threshold=True, velocity=-w, min_speed=0.9, direction="down")

Handling missing data#

Returning to the glider example, what if there are missing data? One supported approach is to drop the missing data.

np.random.seed(14123)

pressure = synthetic_glider_pressure(n_points=1000)

pressure[::8] = np.nan
pressure[::9] = np.nan
indices = np.random.choice(pressure.size, 50, replace=False)
pressure[indices] = np.nan

peaks_kwargs = {"height": 100, "distance": 5, "width": 5, "prominence": 100}
segments = find_profiles(pressure, peaks_kwargs=peaks_kwargs, missing="drop")