Gaussian Processes
This module houses Gaussian process wrappers.
The wrappers below call a library written in Rust that runs Gaussian process smoothing and forecasting. Multithreaded functions are denoted by the prefix “multiple”. More will be added here in future, but for now this is sufficient for most EO applications.
- EOkit.gaussian_processes.multiple_gps(x_inputs, y_inputs, forecast_spacing, forecast_amount, length_scale=30, amplitude=0.5, noise=0.1, n_threads=-1)
Run multiple RBF kernel GPs on 1D data.
This is a wrapper function that lets you specify RBF kernel parameters to run GP smoothers. The code also supports forecasting: set forecast_spacing to the desired step size. For example, if x_inputs is in days and a weekly forecast is needed, forecast_spacing would be 7. Then set forecast_amount, which is the number of forecast steps of size forecast_spacing to produce.
A list of NumPy arrays should be used as inputs. A list is used so that the inputs can be of different lengths, which makes it easier to remove cloud cover or other types of null value beforehand.
Notes
NO NANS/INFS SHOULD ENTER THIS FUNCTION.
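One way to satisfy the requirement above is to filter each series before it is passed in. The following is a minimal sketch using only NumPy; the helper name `drop_non_finite` is hypothetical and is not part of EOkit.

```python
import numpy as np

def drop_non_finite(x_inputs, y_inputs):
    """Hypothetical helper: drop samples where x or y is NaN/inf, series by series."""
    clean_x, clean_y = [], []
    for x, y in zip(x_inputs, y_inputs):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        keep = np.isfinite(x) & np.isfinite(y)  # True where both values are usable
        clean_x.append(x[keep])
        clean_y.append(y[keep])
    return clean_x, clean_y

# Two example series; the second has one masked (NaN) observation.
x_inputs = [np.array([0.0, 8.0, 16.0]), np.array([0.0, 8.0, 16.0, 24.0])]
y_inputs = [np.array([0.2, 0.3, 0.4]), np.array([0.5, np.nan, 0.7, 0.6])]
clean_x, clean_y = drop_non_finite(x_inputs, y_inputs)
```

Because each series is filtered independently, the cleaned arrays can end up with different lengths, which is the case the list-of-arrays input is meant to handle.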
- Parameters:
x_inputs (list of ndarrays of type float, size (N)) – A list of NumPy arrays containing the x_input variable.
y_inputs (list of ndarrays of type float, size (N)) – A list of NumPy arrays containing the y input (the variable to be forecast/smoothed). Remove NaNs first.
forecast_spacing (float) – The spacing of the forecast. E.g. the temporal resolution of the forecast.
forecast_amount (float) – The number of forecasts at resolution forecast_spacing. Set to 0 for no forecasts (just smoothing).
length_scale (float, optional) – The length scale of the RBF kernel. Larger = smoother, by default 30
amplitude (float, optional) – The amplitude of the RBF kernel, by default 0.5
noise (float, optional) – Noise of the GP regression, by default 0.1
n_threads (int, optional) – Number of worker threads spawned to complete the task. The default is -1, which uses all logical processor cores. To reduce this, use a value between 1 and the number of processor cores you have. Setting this value to a number larger than the number of logical cores you have will most likely degrade performance.
- Returns:
A list of NumPy arrays containing the smoothed/forecast values. In the future this may also include the X variable for convenience.
- Return type:
list of ndarrays of type float, size (N)
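The n_threads guidance above can be applied on the caller's side before calling multiple_gps. The sketch below is an illustration only: `resolve_threads` is a hypothetical helper, and it does not describe how EOkit itself interprets the argument internally.

```python
import os

def resolve_threads(n_threads=-1):
    """Hypothetical helper: turn an n_threads request into a concrete worker count."""
    logical_cores = os.cpu_count() or 1  # os.cpu_count() can return None
    if n_threads == -1:
        return logical_cores  # same meaning as the documented default
    if n_threads < 1:
        raise ValueError("n_threads must be -1 or a positive integer")
    # Never ask for more workers than there are logical cores.
    return min(n_threads, logical_cores)

# Example: leave roughly half of the machine free for other work.
half_the_machine = resolve_threads(max(1, (os.cpu_count() or 1) // 2))
```

The resulting value could then be passed as n_threads.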
- EOkit.gaussian_processes.single_gp(x_input, y_input, forecast_spacing, forecast_amount, length_scale=50.0, amplitude=0.5, noise=0.01)
Run a single RBF kernel GP on 1D data.
This is a wrapper function that lets you specify RBF kernel parameters to run a GP smoother. The code also supports forecasting: set forecast_spacing to the desired step size. For example, if x_input is in days and a weekly forecast is needed, forecast_spacing would be 7. Then set forecast_amount, which is the number of forecast steps of size forecast_spacing to produce.
Notes
NO NANS/INFS SHOULD ENTER THIS FUNCTION.
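To picture how forecast_spacing and forecast_amount combine, the sketch below builds the x positions that a weekly forecast would cover. It assumes the first forecast lies one forecast_spacing after the last observation; that layout is an assumption made for illustration and is not confirmed by this page. `forecast_positions` is a hypothetical helper, not an EOkit function.

```python
import numpy as np

def forecast_positions(x_input, forecast_spacing, forecast_amount):
    """Hypothetical helper. Assumed layout: steps start one spacing after the last x."""
    steps = np.arange(1, int(forecast_amount) + 1, dtype=float)
    return x_input[-1] + steps * forecast_spacing

days = np.arange(0.0, 100.0, 1.0)           # daily observations starting at 0
weekly = forecast_positions(days, 7, 4)     # four weekly forecast steps
no_forecast = forecast_positions(days, 7, 0)  # forecast_amount=0: smoothing only
```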
- Parameters:
x_input (ndarray of type float, size (N)) – The x inputs. In most cases this will be time in some format. It is best to subtract the start value from this array so that it starts from 0.
y_input (ndarray of type float, size (N)) – The y_inputs (the variable to be forecast/smoothed). Remove NaNs first.
forecast_spacing (float) – The spacing of the forecast. E.g. the temporal resolution of the forecast.
forecast_amount (float) – The number of forecasts at resolution forecast_spacing. Set to 0 for no forecasts (just smoothing).
length_scale (float, optional) – The length scale of the RBF kernel. Larger = smoother, by default 50.0
amplitude (float, optional) – The amplitude of the RBF kernel, by default 0.5
noise (float, optional) – Noise of the GP regression, by default 0.01
- Returns:
A NumPy array containing the smoothed/forecast values. In the future this may also include the X variable for convenience.
- Return type:
ndarray of type float, size (N)
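To give a feel for what length_scale, amplitude and noise control, here is a small NumPy-only sketch of RBF-kernel GP smoothing. It is not the Rust implementation that EOkit wraps. It uses the textbook form of the kernel (amplitude squared, noise added to the diagonal as a variance, zero prior mean), and the library's exact parameterisation may differ. The function names are illustrative.

```python
import numpy as np

def rbf_kernel(xa, xb, length_scale=50.0, amplitude=0.5):
    """Squared-exponential (RBF) covariance between two 1D arrays of inputs."""
    sq_dist = (xa[:, None] - xb[None, :]) ** 2
    return amplitude**2 * np.exp(-0.5 * sq_dist / length_scale**2)

def gp_posterior_mean(x, y, x_new, length_scale=50.0, amplitude=0.5, noise=0.01):
    """Posterior mean of a zero-mean GP with an RBF kernel, evaluated at x_new."""
    # Assumption: noise is treated as a variance on the diagonal.
    k_xx = rbf_kernel(x, x, length_scale, amplitude) + noise * np.eye(x.size)
    k_nx = rbf_kernel(x_new, x, length_scale, amplitude)
    return k_nx @ np.linalg.solve(k_xx, y)

def roughness(values):
    """Mean absolute second difference: smaller means a smoother curve."""
    return np.abs(np.diff(values, n=2)).mean()

rng = np.random.default_rng(0)
x = np.arange(0.0, 200.0, 1.0)
y = np.sin(x / 20.0) + rng.standard_normal(x.size) * 0.3

wiggly = gp_posterior_mean(x, y, x, length_scale=2.0)   # short length scale
smooth = gp_posterior_mean(x, y, x, length_scale=30.0)  # long length scale
```

Comparing `roughness(wiggly)` with `roughness(smooth)` shows the "larger = smoother" behaviour described for length_scale: the longer length scale suppresses more of the point-to-point variation.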
Examples
Below is a simple example of how to use Gaussian Processes.
>>> import numpy as np
>>> from EOkit import gaussian_processes
>>> data_len = 1000
>>> days = np.arange(0, data_len, 1., dtype=float)
>>> vci = np.sin(days) + np.random.standard_normal(data_len) * 2
>>> rust_smoothed_data = gaussian_processes.single_gp(days, vci, 0, 0)  # forecast_amount=0: smoothing only