Contributed Code

Nested Sampling

Nested Sampling is a non-MCMC approach that works for arbitrary probability models, and is particularly well suited to complex posteriors:

NestedSampler offers a wrapper for jaxns. See JAXNS’s readthedocs for examples and Nested Sampling for Gaussian Shells example for how to apply the sampler on numpyro models. Can handle arbitrary models, including ones with discrete RVs, and non-invertible transformations.

class NestedSampler(model, *, constructor_kwargs=None, termination_kwargs=None)[source]

Bases: object

(EXPERIMENTAL) A wrapper for jaxns , a nested sampling package based on JAX.

See reference [1] for details on the meaning of each parameter. Please consider citing this reference if you use the nested sampler in your research.

Note

To enumerate over a discrete latent variable, you can add the keyword infer={“enumerate”: “parallel”} to the corresponding sample statement.

Note

To improve the performance, please consider enabling x64 mode at the beginning of your NumPyro program numpyro.enable_x64().

References

JAXNS: a high-performance nested sampling package based on JAX, Joshua G. Albert (https://arxiv.org/abs/2012.15286)

Parameters:

model (callable) – a call with NumPyro primitives
constructor_kwargs (dict) – additional keyword arguments to construct an upstream jaxns.NestedSampler instance.
termination_kwargs (dict) – keyword arguments to terminate the sampler. Please refer to the upstream jaxns.NestedSampler.__call__() method.

Example:

>>> from jax import random
>>> import jax.numpy as jnp
>>> import numpyro
>>> import numpyro.distributions as dist
>>> from numpyro.contrib.nested_sampling import NestedSampler

>>> true_coefs = jnp.array([1., 2., 3.])
>>> data = random.normal(random.key(0), (2000, 3))
>>> labels = dist.Bernoulli(logits=(true_coefs * data).sum(-1)).sample(random.key(1))
>>>
>>> def model(data, labels):
...     coefs = numpyro.sample('coefs', dist.Normal(0, 1).expand([3]))
...     intercept = numpyro.sample('intercept', dist.Normal(0., 10.))
...     return numpyro.sample('y', dist.Bernoulli(logits=(coefs * data + intercept).sum(-1)),
...                           obs=labels)
>>>
>>> ns = NestedSampler(model)
>>> ns.run(random.key(2), data, labels)
>>> samples = ns.get_samples(random.key(3), num_samples=1000)
>>> assert jnp.mean(jnp.abs(samples['intercept'])) < 0.05
>>> print(jnp.mean(samples['coefs'], axis=0))
[0.93661342 1.95034876 2.86123884]

run(rng_key, *args, **kwargs)[source]

Run the nested samplers and collect weighted samples.

Parameters:

rng_key (Array) – Random number generator key to be used for the sampling.
args – The arguments needed by the model.
kwargs – The keyword arguments needed by the model.

get_samples(rng_key, num_samples, *, group_by_chain=False)[source]

Draws samples from the weighted samples collected from the run.

Parameters:

rng_key (Array) – Random number generator key to be used to draw samples.
num_samples (int) – The number of samples.
group_by_chain (bool) – If True, a leading chain dimension of 1 is added to the output arrays.

Returns:

a dict of posterior samples

get_weighted_samples()[source]: Gets weighted samples and their corresponding log weights.

print_summary()[source]: Print summary of the result. This is a wrapper of jaxns.utils.summary().

diagnostics()[source]: Plot diagnostics of the result. This is a wrapper of jaxns.plotting.plot_diagnostics() and jaxns.plotting.plot_cornerplot().

Stein Variational Inference

Stein variational inference (SteinVI) is a family of VI techniques for approximate Bayesian inference based on Stein’s method (see [1] for an overview). It is gaining popularity as it combines the scalability of traditional VI with the flexibility of non-parametric particle-based methods.

Stein variational gradient descent (SVGD) [2] is a recent SteinVI technique which uses iteratively moves a set of particles \(\{z_i\}_{i=1}^N\) to approximate a distribution p(z). SVGD is well suited for capturing correlations between latent variables as a particle-based method. The technique preserves the scalability of traditional VI approaches while offering the flexibility and modeling scope of methods such as Markov chain Monte Carlo (MCMC). SVGD is good at capturing multi-modality [3].

numpyro.contrib.einstein is a framework for particle-based inference using the Stein mixture inference algorithm [4]. The framework works on Stein mixtures, a restricted mixture of guide programs parameterized by Stein particles. Similarly to how SVGD works, Stein mixtures can approximate model posteriors by moving the Stein particles according to the Stein forces. Because the Stein particles parameterize a guide, they capture a neighborhood rather than a single point.

numpyro.contrib.einstein mimics the interface from numpyro.infer.svi, so trying SteinVI requires minimal change to the code for existing models inferred with SVI. For primary usage, see the Bayesian neural network example.

The framework currently supports several kernels, including:

RBFKernel
LinearKernel
RandomFeatureKernel
MixtureKernel
GraphicalKernel
RadialGaussNewtonKernel

SteinVI based examples include:

The Bayesian neural network example.
The deep Markov example.

References

Stein’s Method Meets Statistics: A Review of Some Recent Developments. 2021.
Andreas Anastasiou, Alessandro Barp, François-Xavier Briol, Bruno Ebner, Robert E. Gaunt, Fatemeh Ghaderinezhad, Jackson Gorham, Arthur Gretton, Christophe Ley, Qiang Liu, Lester Mackey, Chris. J. Oates, Gesine Reinert, Yvik Swan.
Stein Variational Gradient Descent: A General-Purpose Bayesian Inference Algorithm. 2016.
Qiang Liu, Dilin Wang. NeurIPS
Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models. 2019.
Dilin Wang, Qiang Liu. PMLR
ELBOing Stein: Variational Bayes with Stein Mixture Inference. 2024.
Ola Rønning, Eric Nalisnick, Christophe Ley, Padhraic Smyth, and Thomas Hamelryck. arXiv:2410.22948.

SteinVI Interface

class SteinVI(model, guide, optim, kernel_fn, num_stein_particles=10, num_elbo_particles=10, loss_temperature=1.0, repulsion_temperature=1.0, non_mixture_guide_params_fn=<function SteinVI.<lambda>>, **static_kwargs)[source]

Variational inference with Stein mixtures inference [1].

Example:

>>> from jax import random, numpy as jnp

>>> from numpyro import sample, param, plate
>>> from numpyro.distributions import Beta, Bernoulli
>>> from numpyro.distributions.constraints import positive

>>> from numpyro.optim import Adagrad
>>> from numpyro.contrib.einstein import MixtureGuidePredictive, SteinVI, RBFKernel

>>> def model(data):
...     f = sample("fairness", Beta(10, 10))
...     n = data.shape[0] if data is not None else 1
...     with plate("N", n):
...         sample("obs", Bernoulli(f), obs=data)

>>> def guide(data):
...     # Initialize all particles in the same point.
...     alpha_q = param("alpha_q", 15., constraint=positive)
...     # Initialize particles by sampling an Exponential distribution.
...     beta_q = param("beta_q",
...                     lambda rng_key: random.exponential(rng_key),
...                     constraint=positive)
...     sample("fairness", Beta(alpha_q, beta_q))

>>> data = jnp.concatenate([jnp.ones(6), jnp.zeros(4)])

>>> opt = Adagrad(step_size=0.05)
>>> k = RBFKernel()
>>> stein = SteinVI(model, guide, opt, k, num_stein_particles=2)

>>> stein_result = stein.run(random.key(0), 200, data)
>>> params = stein_result.params

>>> # Use guide to make predictions.
>>> predictive = MixtureGuidePredictive(model, guide, params, num_samples=10, guide_sites=stein.guide_sites)
>>> samples = predictive(random.key(1), data=None)

Parameters:

model (Callable) – Python callable with NumPyro primitives for the model.
guide (Callable) – Python callable with NumPyro primitives for the guide.
optim (_NumPyroOptim) – An instance of _NumpyroOptim. Adagrad should be preferred over Adam [1].
kernel_fn (SteinKernel) – Function that computes the reproducing kernel to use with Stein mixture inference. We currently recommend RBFKernel. This may change as criteria for kernel selection are not well understood yet.
num_stein_particles – Number of particles (i.e., mixture components) in the mixture approximation. Default is 10.
num_elbo_particles – Number of Monte Carlo draws used to approximate the attractive force gradient. More particles give better gradient approximations. Default is 10.
loss_temperature (Float) – Scaling factor of the attractive force. Default is 1.
repulsion_temperature (Float) – Scaling factor of the repulsive force [2]. We recommend not scaling the repulsion. Default is 1.
non_mixture_guide_param_fn (Callable) – Predicate on names of parameters in the guide which should be optimized using one particle. This could be parameters for large normal networks or other transformation. Default excludes all parameters from this option.
static_kwargs – Static keyword arguments for the model and guide. These arguments cannot change during inference.

References:

Rønning, Ola, et al. “ELBOing Stein: Variational Bayes with Stein Mixture Inference.”
arXiv preprint arXiv:2410.22948 (2024).
Liu, Chang, et al. “Understanding and Accelerating Particle-Based Variational Inference.”
International Conference on Machine Learning. PMLR, 2019.
Wang, Dilin, and Qiang Liu. “Nonlinear Stein Variational Gradient Descent for Learning Diversified Mixture Models.”
International Conference on Machine Learning. PMLR, 2019.

class SVGD(model, optim, kernel_fn, num_stein_particles=10, guide_kwargs={}, **static_kwargs)[source]

Stein variational gradient descent [1].

Example:

>>> from jax import random, numpy as jnp

>>> from numpyro import sample, param, plate
>>> from numpyro.distributions import Beta, Bernoulli
>>> from numpyro.distributions.constraints import positive

>>> from numpyro.optim import Adagrad
>>> from numpyro.contrib.einstein import SVGD, RBFKernel
>>> from numpyro.infer import Predictive

>>> def model(data):
...     f = sample("fairness", Beta(10, 10))
...     n = data.shape[0] if data is not None else 1
...     with plate("N", n):
...         sample("obs", Bernoulli(f), obs=data)

>>> data = jnp.concatenate([jnp.ones(6), jnp.zeros(4)])

>>> opt = Adagrad(step_size=0.05)
>>> k = RBFKernel()
>>> svgd = SVGD(model, opt, k, num_stein_particles=2)

>>> svgd_result = svgd.run(random.key(0), 200, data)

>>> params = svgd_result.params
>>> predictive = Predictive(model, guide=svgd.guide, params=params, num_samples=10, batch_ndims=1)
>>> samples = predictive(random.key(1), data=None)

Parameters:

model (Callable) – Python callable with NumPyro primitives for the model.
guide (Callable) – Python callable with NumPyro primitives for the guide.
optim (_NumPyroOptim) – An instance of _NumpyroOptim. Adagrad should be preferred over Adam [1].
kernel_fn (SteinKernel) – Function that computes the reproducing kernel to use with SVGD. We currently recommend RBFKernel. This may change as criteria for kernel selection are not well understood yet.
num_stein_particles – Number of particles (i.e., mixture components) in the mixture approximation. Default is 10.

guide_kwargs (Dict) –

Keyword arguments for AutoDelta. Default behaviour is the same as the default for AutoDelta.

Usage:

opt = Adagrad(step_size=0.05)
k = RBFKernel()
svgd = SVGD(model, opt, k, guide_kwargs={'init_loc_fn': partial(init_to_uniform, radius=0.1)})

static_kwargs (Dict) – Static keyword arguments for the model and guide. These arguments cannot change during inference.

References:

Liu, Qiang, and Dilin Wang. “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.”
Advances in neural information processing systems 29 (2016).

class ASVGD(model, optim, kernel_fn, num_stein_particles=10, num_cycles=10, transition_speed=10, guide_kwargs={}, **static_kwargs)[source]

Annealing Stein variational gradient descent [1].

Example:

>>> from jax import random, numpy as jnp

>>> from numpyro import sample, param, plate
>>> from numpyro.distributions import Beta, Bernoulli
>>> from numpyro.distributions.constraints import positive

>>> from numpyro.optim import Adagrad
>>> from numpyro.contrib.einstein import ASVGD, RBFKernel
>>> from numpyro.infer import Predictive

>>> def model(data):
...     f = sample("fairness", Beta(10, 10))
...     n = data.shape[0] if data is not None else 1
...     with plate("N", n):
...         sample("obs", Bernoulli(f), obs=data)

>>> data = jnp.concatenate([jnp.ones(6), jnp.zeros(4)])

>>> opt = Adagrad(step_size=0.05)
>>> k = RBFKernel()
>>> asvgd = ASVGD(model, opt, k, num_stein_particles=2)

>>> asvgd_result = asvgd.run(random.key(0), 200, data)

>>> params = asvgd_result.params
>>> predictive = Predictive(model, guide=asvgd.guide, params=params, num_samples=10, batch_ndims=1)
>>> samples = predictive(random.key(1), data=None)

Parameters:

model (Callable) – Python callable with NumPyro primitives for the model.
guide (Callable) – Python callable with NumPyro primitives for the guide.
optim (_NumPyroOptim) – An instance of _NumpyroOptim. Adagrad should be preferred over Adam [1].
kernel_fn (SteinKernel) – Function that computes the reproducing kernel to use with ASVGD. We currently recommend RBFKernel. This may change as criteria for kernel selection are not well understood yet.
num_stein_particles – Number of particles (i.e., mixture components) in the mixture approximation. Default is 10.
num_cycles – The total number of cycles during inference. This corresponds to \(C\) in eq. 4 of [1]. Default is 10.
trans_speed – Speed of transition between two phases during inference. This corresponds to \(p\) in eq. 4 of [1]. Default is 10.

guide_kwargs (Dict) –

Keyword arguments for AutoDelta. Default behaviour is the same as the default for AutoDelta.

Usage:

opt = Adagrad(step_size=0.05)
k = RBFKernel()
asvgd = ASVGD(model, opt, k, guide_kwargs={'init_loc_fn': partial(init_to_uniform, radius=0.1)})

static_kwargs (Dict) – Static keyword arguments for the model and guide. These arguments cannot change during inference.

References:

D’Angelo, Francesco, and Vincent Fortuin. “Annealed Stein Variational Gradient Descent.”
Third Symposium on Advances in Approximate Bayesian Inference, 2021.

SteinVI Kernels

class RBFKernel(mode='norm', matrix_mode='norm_diag', bandwidth_factor: ~collections.abc.Callable[[float], float] = <function RBFKernel.<lambda>>)[source]

Calculates the Gaussian RBF kernel function used in [1]. The kernel is given by

\(k(x,y) = \exp(\frac{-1}{h} \|x-y\|^2)\),

where the bandwidth \(h\) is computed using the median heuristic

\(h = \frac{1}{\log(m)} \text{med}(\|x-y\|)\).

In the above \(m\) is the number of particles.

Parameters:

mode (str) – Either ‘norm’ (default) specifying to take the norm of each particle, ‘vector’ to return a component-wise kernel or ‘matrix’ to return a matrix-valued kernel
matrix_mode (str) – Either ‘norm_diag’ (default) for diagonal filled with the norm kernel or ‘vector_diag’ for diagonal of vector-valued kernel
bandwidth_factor – A multiplier to the bandwidth based on data size n (default 1/log(n))

References:

Liu, Qiang, and Dilin Wang. “Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm.”
Advances in neural information processing systems 29 (2016).

class LinearKernel(mode='norm')[source]

Calculates the linear kernel from Theorem 3.3 in [1]. The kernel is given by

\(k(x,y) = x^T y + 1\).

References:

Liu, Qiang, and Dilin Wang. “Stein Variational Gradient Descent as Moment Matching.”
Advances in Neural Information Processing Systems 31 (2018).

class RandomFeatureKernel(mode='norm', bandwidth_subset=None, bandwidth_factor: ~collections.abc.Callable[[float], float] = <function RandomFeatureKernel.<lambda>>)[source]

Calculates the Gaussian variate of random kernel in eq. 5 and 6 of [1]. The kernel is given by

\(k(x,y)= \frac{1}{m}\sum_{l=1}^{m}\phi(x,w_l)\phi(y,w_l)\),

where \(\phi(\cdot, w)\) are the Gaussian random feature maps in eq. 6. The maps are given by

\(\phi(z, w) = \sqrt{2}\left(h^{-1}w_1^Tz + w_0\right)\),

where \(h\) is the bandwidth computed using the median trick from RBFKernel, \(w_0\sim\text{Uni}([0,2\pi])\) and \(w_1\sim\mathcal{N}(0,I)\).

Parameters:

bandwidth_subset – How many particles should be used to calculate the bandwidth? (default None, meaning all particles)
random_indices – The set of indices which to do random feature expansion on. (default None, meaning all indices)
bandwidth_factor – A multiplier to the bandwidth based on data size n (default 1/log(n))

References:

Liu, Qiang, and Dilin Wang. “Stein Variational Gradient Descent as Moment Matching.”
Advances in Neural Information Processing Systems 31 (2018).

class MixtureKernel(ws: list[float], kernel_fns: list[SteinKernel], mode='norm')[source]

Calculates a mixture of multiple kernels from eq. 1 of [1]. The kernel is given by

\(k(x,y) = \sum_i w_ik_i(x,y)\),

where \(k_i\) is a reproducing kernel and \(w_i \in (0,\infty)\).

Parameters:

ws – Weight of each kernel in the mixture.
kernel_fns – Different kernel functions to mix together.

References:

Ai, Qingzhong, et al. “Stein variational gradient descent with multiple kernels.”
Cognitive Computation 15.2 (2023): 672-682.

class GraphicalKernel(mode='matrix', local_kernel_fns: dict[str, ~numpyro.contrib.einstein.stein_kernels.SteinKernel]=None, default_kernel_fn: SteinKernel = <numpyro.contrib.einstein.stein_kernels.RBFKernel object>)[source]

Calculates the graphical kernel, also called the coordinate-wise kernel, from Theorem 1 in [1]. The kernel is given by

\(k(x,y) = diag({k_l(x_l,y_l)})\),

for coordinate-wise kernels \(k_l\).

Parameters:

local_kernel_fns – A mapping between parameters and a choice of kernel function for that parameter (default to default_kernel_fn for each parameter)
default_kernel_fn – The default choice of kernel function when none is specified for a particular parameter

References:

Wang, Dilin, Zhe Zeng, and Qiang Liu. “Stein variational message passing for continuous graphical models.”
International Conference on Machine Learning. PMLR, 2018.

class ProbabilityProductKernel(guide, scale=1.0, mode='norm')[source]

EXPERIMENTAL Compute the unormalized probability product kernel for Gaussians given by eq. 5 in [1].

References:

Jebara, Tony, Risi Kondor, and Andrew Howard. “Probability product kernels.”
The Journal of Machine Learning Research 5 (2004): 819-844.

class RadialGaussNewtonKernel[source]

The Radial Gauss-Newton Kernel [1,2], also called the scaled Hessian kernel, is a scalar kernel defined as

\[k(x,y) = \exp\left(-\frac{1}{2d}(x-y)^T M (x-y)\right),\]

where \(x,y \in R^d\) are particles and \(M\) is a metric matrix. \(M\) approximates the expected curvature of the log posterior using Hessian approximations \(A(x)\).

The matrix \(M\) is computed using \(m\) particles as follows [2, Eq. 19, p.5]:

\[M= \frac{1}{m} \sum_{i=1}^m A(x_i)\]

with the Hessian approximation given by:

\[A(x) = J(x) J(x)^T,\]

where \(J(x)\) is the Jacobian of an ELBO at \(x\).

References:

Maken, Fahira Afzal, Fabio Ramos, and Lionel Ott. “Stein Particle Filter for Nonlinear,
Non-Gaussian State Estimation.” IEEE Robotics and Automation Letters 7.2 (2022).
Detommaso, Gianluca, et al. “A Stein variational Newton method.”
Advances in Neural Information Processing Systems 31 (2018).

Stochastic Support

class StochasticSupportInference(model: Callable, num_slp_samples: int, max_slps: int)[source]

Bases: ABC

Base class for running inference in programs with stochastic support. Each subclass decomposes the input model into so called straight-line programs (SLPs) which are the different control-flow paths in the model. Inference is then run in each SLP separately and the results are combined to produce an overall posterior.

Note

This implementation assumes that all stochastic branching is done based on the outcomes of discrete sampling sites that are annotated with infer={"branching": True}. For example,

def model():
    model1 = numpyro.sample("model1", dist.Bernoulli(0.5), infer={"branching": True})
    if model1 == 0:
        mean = numpyro.sample("a1", dist.Normal(0.0, 1.0))
    else:
        mean = numpyro.sample("a2", dist.Normal(1.0, 1.0))
    numpyro.sample("obs", dist.Normal(mean, 1.0), obs=0.2)

Parameters:

model – Python callable containing Pyro primitives primitives. local inference. Defaults to NUTS.
num_slp_samples (int) – Number of samples to draw from the prior to discover the straight-line programs (SLPs).
max_slps (int) – Maximum number of SLPs to discover. DCC will not run inference on more than max_slps.

run(rng_key: Array, *args: Any, **kwargs: Any) → DCCResult | SDVIResult[source]

Run inference on each SLP separately and combine the results.

Parameters:

rng_key (jax.random.key) – Random number generator key.
args – Arguments to the model.
kwargs – Keyword arguments to the model.

class DCC(model: ~typing.Callable, mcmc_kwargs: dict[str, ~typing.Any], kernel_cls: type[~numpyro.infer.mcmc.MCMCKernel] = <class 'numpyro.infer.hmc.NUTS'>, num_slp_samples: int = 1000, max_slps: int = 124, proposal_scale: float = 1.0)[source]

Bases: StochasticSupportInference

Implements the Divide, Conquer, and Combine (DCC) algorithm for models with stochastic support from [1].

References:

Divide, Conquer, and Combine: a New Inference Strategy for Probabilistic Programs with Stochastic Support, Yuan Zhou, Hongseok Yang, Yee Whye Teh, Tom Rainforth

Example:

def model():
    model1 = numpyro.sample("model1", dist.Bernoulli(0.5), infer={"branching": True})
    if model1 == 0:
        mean = numpyro.sample("a1", dist.Normal(0.0, 1.0))
    else:
        mean = numpyro.sample("a2", dist.Normal(1.0, 1.0))
    numpyro.sample("obs", dist.Normal(mean, 1.0), obs=0.2)

mcmc_kwargs = dict(
    num_warmup=500, num_samples=1000
)
dcc = DCC(model, mcmc_kwargs=mcmc_kwargs)
dcc_result = dcc.run(random.key(0))

Parameters:

model – Python callable containing Pyro primitives primitives.
mcmc_kwargs (dict) – Dictionary of arguments passed to MCMC.
kernel_cls (numpyro.infer.mcmc.MCMCKernel) – MCMC kernel class that is used for local inference. Defaults to NUTS.
num_slp_samples (int) – Number of samples to draw from the prior to discover the straight-line programs (SLPs).
max_slps (int) – Maximum number of SLPs to discover. DCC will not run inference on more than max_slps.
proposal_scale (float) – Scale parameter for the proposal distribution for estimating the normalization constant of an SLP.

class SDVI(model: Callable, optimizer, svi_num_steps: int = 1000, combine_elbo_particles: int = 1000, guide_init: Callable = <class 'numpyro.infer.autoguide.AutoNormal'>, loss: ELBO = <numpyro.infer.elbo.Trace_ELBO object>, svi_progress_bar: bool = False, num_slp_samples: int = 1000, max_slps: int = 124)[source]

Bases: StochasticSupportInference

Implements the Support Decomposition Variational Inference (SDVI) algorithm for models with stochastic support from [1]. This implementation creates a separate guide for each SLP, trains the guides separately, and then combines the guides by weighting them proportional to their ELBO estimates.

References:

Rethinking Variational Inference for Probabilistic Programs with Stochastic Support, Tim Reichelt, Luke Ong, Tom Rainforth

Example:

def model():
    model1 = numpyro.sample("model1", dist.Bernoulli(0.5), infer={"branching": True})
    if model1 == 0:
        mean = numpyro.sample("a1", dist.Normal(0.0, 1.0))
    else:
        mean = numpyro.sample("a2", dist.Normal(1.0, 1.0))
    numpyro.sample("obs", dist.Normal(mean, 1.0), obs=0.2)

sdvi = SDVI(model, numpyro.optim.Adam(step_size=0.001))
sdvi_result = sdvi.run(random.key(0))

Parameters:

model – Python callable containing Pyro primitives primitives.
optimizer – An instance of _NumpyroOptim, a jax.example_libraries.optimizers.Optimizer or an Optax GradientTransformation. Gets passed to SVI.
svi_num_steps (int) – Number of steps to run SVI for each SLP.
combine_elbo_particles (int) – Number of particles to estimate ELBO for computing SLP weights.
guide_init – A constructor for the guide. This should be a callable that returns a AutoGuide instance. Defaults to AutoNormal.
loss – ELBO loss for SVI. Defaults to Trace_ELBO.
svi_progress_bar (bool) – Whether to use a progress bar for SVI.
num_slp_samples (int) – Number of samples to draw from the prior to discover the straight-line programs (SLPs).
max_slps (int) – Maximum number of SLPs to discover. DCC will not run inference on more than max_slps.

Hilbert Space Gaussian Processes Approximation

This module contains helper functions for use in the Hilbert Space Gaussian Process (HSGP) approximation method described in [1] and [2].

Warning

This module is experimental.

Why do we need an approximation?

Gaussian processes do not scale well with the number of data points. Recall we had to invert the kernel matrix! The computational complexity of the Gaussian process model is \(\mathcal{O}(n^3)\), where \(n\) is the number of data points. The HSGP approximation method is a way to reduce the computational complexity of the Gaussian process model to \(\mathcal{O}(mn + m)\), where \(m\) is the number of basis functions used in the approximation.

Approximation Strategy Steps:

We strongly recommend reading [1] and [2] for a detailed explanation of the approximation method. In [3] you can find a practical approach using NumPyro and PyMC.

Here we provide the main steps and ingredients of the approximation method:

Each stationary kernel \(k\) has an associated spectral density \(S(\omega)\). There are closed formulas for the most common kernels. These formulas depend on the hyperparameters of the kernel (e.g. amplitudes and length scales).

We can approximate the spectral density \(S(\omega)\) as a polynomial series in \(||\omega||\). We call \(\omega\) the frequency.

We can interpret these polynomial terms as “powers” of the Laplacian operator. The key observation is that the Fourier transform of the Laplacian operator is \(||\omega||^2\).

Next, we impose Dirichlet boundary conditions on the Laplacian operator which makes it self-adjoint and with discrete spectrum.

We identify the expansion in (2) with the sum of powers of the Laplacian operator in the eigenbasis of (4).

Let \(m^\star = \prod_{d=1}^D m_d\) be the total number of terms of the approximation, where \(m_d\) is the number of basis functions used in the approximation for the \(d\)-th dimension. Then, the approximation formula, in the non-centered parameterization, is:

\[f(x) \approx \sum_{j = 1}^{m^\star} \overbrace{\color{red}{\left(S(\sqrt{\boldsymbol{\lambda}_j})\right)^{1/2}}}^{\text{all hyperparameters are here!}} \times \underbrace{\color{blue}{\phi_{j}(\boldsymbol{x})}}_{\text{easy to compute!}} \times \overbrace{\color{green}{\beta_{j}}}^{\sim \: \text{Normal}(0,1)}\]

where \(\boldsymbol{x}\) is a \(D\) vector of inputs, \(\boldsymbol{\lambda}_j\) are the eigenvalues of the Laplacian operator, \(\phi_{j}(\boldsymbol{x})\) are the eigenfunctions of the Laplacian operator, and \(\beta_{j}\) are the coefficients of the expansion (see Eq. (8) in [2]). We expect this to be a good approximation for a finite number of \(m^\star\) terms in the series as long as the inputs values \(x\) are not too close to the boundaries \(-L_d\) and \(L_d\).

Note

Even though the periodic kernel is not stationary, one can still adapt and find a similar approximation formula. However, these kernels are not supported for multidimensional inputs. See Appendix B in [2] for more details.

Example:

Here is an example of how to use the HSGP approximation method with NumPyro. We will use the squared exponential kernel. Other kernels can be used similarly.

>>> from jax import random
>>> import jax.numpy as jnp

>>> import numpyro
>>> from numpyro.contrib.hsgp.approximation import hsgp_squared_exponential
>>> import numpyro.distributions as dist
>>> from numpyro.infer import MCMC, NUTS


>>> def generate_synthetic_data(rng_key, start, stop: float, num, scale):
...     """Generate synthetic data."""
...     x = jnp.linspace(start=start, stop=stop, num=num)
...     y = jnp.sin(4 * jnp.pi * x) + jnp.sin(7 * jnp.pi * x)
...     y_obs = y + scale * random.normal(rng_key, shape=(num,))
...     return x, y_obs


>>> rng_key = random.key(seed=42)
>>> rng_key, rng_subkey = random.split(rng_key)
>>> x, y_obs = generate_synthetic_data(
...     rng_key=rng_subkey, start=0, stop=1, num=80, scale=0.3
>>> )


>>> def model(x, ell, m, non_centered, y=None):
...     # --- Priors ---
...     alpha = numpyro.sample("alpha", dist.InverseGamma(concentration=12, rate=10))
...     length = numpyro.sample("length", dist.InverseGamma(concentration=6, rate=1))
...     noise = numpyro.sample("noise", dist.InverseGamma(concentration=12, rate=10))
...     # --- Parametrization ---
...     f = hsgp_squared_exponential(
...         x=x, alpha=alpha, length=length, ell=ell, m=m, non_centered=non_centered
...     )
...     # --- Likelihood ---
...     with numpyro.plate("data", x.shape[0]):
...         numpyro.sample("likelihood", dist.Normal(loc=f, scale=noise), obs=y)


>>> sampler = NUTS(model)
>>> mcmc = MCMC(sampler=sampler, num_warmup=500, num_samples=1_000, num_chains=2)

>>> rng_key, rng_subkey = random.split(rng_key)

>>> ell = 1.3
>>> m = 20
>>> non_centered = True

>>> mcmc.run(rng_subkey, x, ell, m, non_centered, y_obs)

>>> mcmc.print_summary()

              mean       std    median      5.0%     95.0%     n_eff     r_hat
   alpha      1.24      0.34      1.18      0.72      1.74   1804.01      1.00
 beta[0]     -0.10      0.66     -0.10     -1.24      0.92   1819.91      1.00
 beta[1]      0.00      0.71     -0.01     -1.09      1.26   1872.82      1.00
 beta[2]     -0.05      0.69     -0.03     -1.09      1.16   2105.88      1.00
 beta[3]      0.25      0.74      0.26     -0.98      1.42   2281.30      1.00
 beta[4]     -0.17      0.69     -0.17     -1.21      1.00   2551.39      1.00
 beta[5]      0.09      0.75      0.10     -1.13      1.30   3232.13      1.00
 beta[6]     -0.49      0.75     -0.49     -1.65      0.82   3042.31      1.00
 beta[7]      0.42      0.75      0.44     -0.78      1.65   2885.42      1.00
 beta[8]      0.69      0.71      0.71     -0.48      1.82   2811.68      1.00
 beta[9]     -1.43      0.75     -1.40     -2.63     -0.21   2858.68      1.00
beta[10]      0.33      0.71      0.33     -0.77      1.51   2198.65      1.00
beta[11]      1.09      0.73      1.11     -0.23      2.18   2765.99      1.00
beta[12]     -0.91      0.72     -0.91     -2.06      0.31   2586.53      1.00
beta[13]      0.05      0.70      0.04     -1.16      1.12   2569.59      1.00
beta[14]     -0.44      0.71     -0.44     -1.58      0.73   2626.09      1.00
beta[15]      0.69      0.73      0.70     -0.45      1.88   2626.32      1.00
beta[16]      0.98      0.74      0.98     -0.15      2.28   2282.86      1.00
beta[17]     -2.54      0.77     -2.52     -3.82     -1.29   3347.56      1.00
beta[18]      1.35      0.66      1.35      0.30      2.46   2638.17      1.00
beta[19]      1.10      0.54      1.09      0.25      2.01   2428.37      1.00
  length      0.07      0.01      0.07      0.06      0.09   2321.67      1.00
   noise      0.33      0.03      0.33      0.29      0.38   2472.83      1.00

Number of divergences: 0

Note

Additional examples with code can be found in [3], [4] and [5].

References:

Solin, A., Särkkä, S. Hilbert space methods for reduced-rank Gaussian process regression. Stat Comput 30, 419-446 (2020).

Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023).

Orduz, J., A Conceptual and Practical Introduction to Hilbert Space GPs Approximation Methods.

Example: Hilbert space approximation for Gaussian processes.

Gelman, Vehtari, Simpson, et al., Bayesian workflow book - Birthdays.

Note

The code of this module is based on the code of the example Example: Hilbert space approximation for Gaussian processes by Omar Sosa Rodríguez.

eigenindices

eigenindices(m: list[int] | int, dim: int) → Array[source]

Returns the indices of the first \(D \times m^\star\) eigenvalues of the laplacian operator.

\[m^\star = \prod_{i=1}^D m_i\]

For more details see Eq. (10) in [1].

References:

Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023).

Parameters:

m (list[int] | int) – The number of desired eigenvalue indices in each dimension. If an integer, the same number of eigenvalues is computed in each dimension.
dim (int) – The dimension of the space.

Returns:

An array of the indices of the first \(D \times m^\star\) eigenvalues.

Return type:

Array

Examples:

>>> import jax.numpy as jnp

>>> from numpyro.contrib.hsgp.laplacian import eigenindices

>>> m = 10
>>> S = eigenindices(m, 1)
>>> assert S.shape == (1, m)
>>> S
Array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]], dtype=int32)

>>> m = 10
>>> S = eigenindices(m, 2)
>>> assert S.shape == (2, 100)

>>> m = [2, 2, 3]  # Riutort-Mayol et al eq (10)
>>> S = eigenindices(m, 3)
>>> assert S.shape == (3, 12)
>>> S
Array([[1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
       [1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
       [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]], dtype=int32)

sqrt_eigenvalues

The first \(m^\star \times D\) square root of eigenvalues of the laplacian operator in \([-L_1, L_1] \times ... \times [-L_D, L_D]\). See Eq. (56) in [1].

References:

Solin, A., Särkkä, S. Hilbert space methods for reduced-rank Gaussian process regression. Stat Comput 30, 419-446 (2020)

Parameters:

ell (int | float | list[int | float]) – The length of the interval in each dimension divided by 2. If a float, the same length is used in each dimension.
m (list[int] | int) – The number of eigenvalues to compute in each dimension. If an integer, the same number of eigenvalues is computed in each dimension.
dim (int) – The dimension of the space.

Returns:

An array of the first \(m^\star \times D\) square root of eigenvalues.

Return type:

Array

eigenfunctions

The first \(m^\star\) eigenfunctions of the laplacian operator in \([-L_1, L_1] \times ... \times [-L_D, L_D]\) evaluated at values of x. See Eq. (56) in [1]. If x is 1D, the problem is assumed unidimensional. Otherwise, the dimension of the input space is inferred as the size of the last dimension of x. Other dimensions are treated as batch dimensions.

Example:

>>> import jax.numpy as jnp

>>> from numpyro.contrib.hsgp.laplacian import eigenfunctions

>>> n = 100
>>> m = 10

>>> x = jnp.linspace(-1, 1, n)

>>> basis = eigenfunctions(x=x, ell=1.2, m=m)

>>> assert basis.shape == (n, m)

>>> x = jnp.ones((n, 3))  # 2d input
>>> basis = eigenfunctions(x=x, ell=1.2, m=[2, 2, 3])
>>> assert basis.shape == (n, 12)

References:

Solin, A., Särkkä, S. Hilbert space methods for reduced-rank Gaussian process regression. Stat Comput 30, 419-446 (2020)

Parameters:

x (ArrayLike) – The points at which to evaluate the eigenfunctions. If x is 1D the problem is assumed unidimensional. Otherwise, the dimension of the input space is inferred as the last dimension of x. Other dimensions are treated as batch dimensions.
ell (float | list[float]) – The length of the interval in each dimension divided by 2. If a float, the same length is used in each dimension.
m (int | list[int]) – The number of eigenvalues to compute in each dimension. If an integer, the same number of eigenvalues is computed in each dimension.

Returns:

An array of the first \(m^\star \times D\) eigenfunctions evaluated at x.

Return type:

Array

eigenfunctions_periodic

Basis functions for the approximation of the periodic kernel.

Parameters:

x (ArrayLike) – The points at which to evaluate the eigenfunctions.
w0 (ArrayLike) – The frequency of the periodic kernel.
m (int) – The number of eigenfunctions to compute.

Note

If you want to parameterize it with respect to the period use w0 = 2 * jnp.pi / period.

Warning

Multidimensional inputs are not supported.

spectral_density_squared_exponential

Spectral density of the squared exponential kernel.

See Section 4.2 in [1] and Section 2.1 in [2].

\[S(\boldsymbol{\omega}) = \alpha (\sqrt{2\pi})^D \ell^D \exp\left(-\frac{1}{2} \ell^2 \boldsymbol{\omega}^{T} \boldsymbol{\omega}\right)\]

References:

Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning.

Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023).

Parameters:

dim (int) – dimension
w (ArrayLike) – frequency
alpha (ArrayLike) – amplitude
length (ArrayLike) – length scale

Returns:

spectral density value

Return type:

Array

spectral_density_matern

Spectral density of the Matérn kernel.

See Eq. (4.15) in [1] and Section 2.1 in [2].

\[S(\boldsymbol{\omega}) = \alpha \frac{2^{D} \pi^{D/2} \Gamma(\nu + D/2) (2 \nu)^{\nu}}{\Gamma(\nu) \ell^{2 \nu}} \left(\frac{2 \nu}{\ell^2} + \boldsymbol{\omega}^{T} \boldsymbol{\omega}\right)^{-\nu - D/2}\]

References:

Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning.

Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023).

Parameters:

dim (int) – dimension
nu (float) – smoothness
w (ArrayLike) – frequency
alpha (ArrayLike) – amplitude
length (ArrayLike) – length scale

Returns:

spectral density value

Return type:

Array

diag_spectral_density_squared_exponential

Evaluates the spectral density of the squared exponential kernel at the first \(D \times m^\star\) square root eigenvalues of the laplacian operator in \([-L_1, L_1] \times ... \times [-L_D, L_D]\).

Parameters:

alpha (ArrayLike) – amplitude of the squared exponential kernel
length (ArrayLike) – length scale of the squared exponential kernel
ell (float | int | list[float | int]) – The length of the interval divided by 2 in each dimension. If a float or int, the same length is used in each dimension.
m (int | list[int]) – The number of eigenvalues to compute for each dimension. If an integer, the same number of eigenvalues is computed in each dimension.
dim (int) – The dimension of the space

Returns:

spectral density vector evaluated at the first \(D \times m^\star\) square root eigenvalues

Return type:

Array

diag_spectral_density_matern

Evaluates the spectral density of the Matérn kernel at the first \(D \times m^\star\) square root eigenvalues of the laplacian operator in \([-L_1, L_1] \times ... \times [-L_D, L_D]\).

Parameters:

nu (float) – smoothness parameter
alpha (ArrayLike) – amplitude of the Matérn kernel
length (ArrayLike) – length scale of the Matérn kernel
ell (float | int | list[float | int]) – The length of the interval divided by 2 in each dimension. If a float or int, the same length is used in each dimension.
m (int | list[int]) – The number of eigenvalues to compute for each dimension. If an integer, the same number of eigenvalues is computed in each dimension.
dim (int) – The dimension of the space

Returns:

spectral density vector evaluated at the first \(D \times m^\star\) square root eigenvalues

Return type:

Array

spectral_density_rational_quadratic

Spectral density of the Rational Quadratic kernel.

The Rational Quadratic kernel can be seen as a scale mixture (an infinite sum) of squared exponential kernels with different length scales. As the scale mixture parameter approaches infinity, the kernel converges to the squared exponential kernel.

The spectral density involves the modified Bessel function of the second kind. For the kernel \(k(r) = (1 + r^2/(2 \alpha_{\text{mix}} \ell^2))^{-\alpha_{\text{mix}}}\), the spectral density in \(D\) dimensions is:

\[S(\boldsymbol{\omega}) = \sigma^2 (2\pi)^{D/2} \cdot 2^{1-\alpha_{\text{mix}}} \cdot a \cdot (2\alpha_{\text{mix}})^{(D-1)/2} \cdot \frac{(a|\boldsymbol{\omega}|)^{\alpha_{\text{mix}}-D/2} K_{\alpha_{\text{mix}}-D/2}(a|\boldsymbol{\omega}|)}{\Gamma(\alpha_{\text{mix}})}\]

where \(a = \sqrt{2 \alpha_{\text{mix}}} \cdot \ell\), \(\sigma^2\) is the amplitude, and \(K_\nu\) is the modified Bessel function of the second kind.

For \(\boldsymbol{\omega} \to 0\), we use the asymptotic expansion \(z^\nu K_\nu(z) \to \Gamma(\nu) 2^{\nu-1}\) as \(z \to 0\), giving:

\[S(0) = \sigma^2 \pi^{D/2} a^D (2\alpha_{\text{mix}})^{(D-1)/2} \frac{\Gamma(\alpha_{\text{mix}} - D/2)}{\Gamma(\alpha_{\text{mix}})}\]

Note

The S(0) formula requires \(\alpha_{\text{mix}} > D/2\) to be well-defined. For \(\alpha_{\text{mix}} \leq D/2\), the spectral density diverges at \(\omega = 0\). The HSGP approximation for the RQ kernel may require larger ell values compared to the Squared Exponential kernel due to the heavier tails of the RQ kernel.

References:

Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning.

Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023).

Parameters:

dim (int) – dimension
w (ArrayLike) – frequency
alpha (ArrayLike) – amplitude (σ² in the spectral density formula)
length (ArrayLike) – length scale (scalar, isotropic)
scale_mixture (ArrayLike) – scale mixture parameter (α_mix in the RQ kernel formula). Controls the relative weighting of small-scale and large-scale variations. As scale_mixture → ∞, the kernel converges to the squared exponential kernel.

Returns:

spectral density value

Return type:

Array

diag_spectral_density_rational_quadratic

Evaluates the spectral density of the Rational Quadratic kernel at the first \(D \times m^\star\) square root eigenvalues of the laplacian operator in \([-L_1, L_1] \times ... \times [-L_D, L_D]\).

Parameters:

alpha (ArrayLike) – amplitude of the Rational Quadratic kernel
length (ArrayLike) – length scale of the Rational Quadratic kernel
scale_mixture (ArrayLike) – scale mixture parameter (α in the RQ kernel formula). Controls the relative weighting of small-scale and large-scale variations. As scale_mixture → ∞, the kernel converges to the squared exponential kernel.
ell (float | int | list[float | int]) – The length of the interval divided by 2 in each dimension. If a float or int, the same length is used in each dimension.
m (int | list[int]) – The number of eigenvalues to compute for each dimension. If an integer, the same number of eigenvalues is computed in each dimension.
dim (int) – The dimension of the space

Returns:

spectral density vector evaluated at the first \(D \times m^\star\) square root eigenvalues

Return type:

Array

diag_spectral_density_periodic

Not actually a spectral density but these are used in the same way. These are simply the first m coefficients of the low rank approximation for the periodic kernel. See Appendix B in [1].

References:

Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023).

Parameters:

alpha (ArrayLike) – amplitude
length (ArrayLike) – length scale
m (int) – number of eigenvalues

Returns:

“spectral density” vector

Return type:

Array

hsgp_squared_exponential

Hilbert space Gaussian process approximation using the squared exponential kernel.

The main idea of the approach is to combine the associated spectral density of the squared exponential kernel and the spectrum of the Dirichlet Laplacian operator to obtain a low-rank approximation of the Gram matrix. For more details see [1, 2].

References:

Solin, A., Särkkä, S. Hilbert space methods for reduced-rank Gaussian process regression. Stat Comput 30, 419-446 (2020).

Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023).

Parameters:

x (ArrayLike) – input data
alpha (ArrayLike) – amplitude of the squared exponential kernel
length (ArrayLike) – length scale of the squared exponential kernel
ell (float | int | list[float | int]) – positive value that parametrizes the length of the D-dimensional box so that the input data lies in the interval \([-L_1, L_1] \times ... \times [-L_D, L_E]\). We expect the approximation to be valid within this interval
m (int | list[m]) – number of eigenvalues to compute and include in the approximation for each dimension (\(\left\{1, ..., D\right\}\)). If an integer, the same number of eigenvalues is computed in each dimension.
non_centered (bool) – whether to use a non-centered parameterization. By default, it is set to True

Returns:

the low-rank approximation linear model

Return type:

Array

hsgp_matern

Hilbert space Gaussian process approximation using the Matérn kernel.

The main idea of the approach is to combine the associated spectral density of the Matérn kernel kernel and the spectrum of the Dirichlet Laplacian operator to obtain a low-rank approximation of the Gram matrix. For more details see [1, 2].

References:

Solin, A., Särkkä, S. Hilbert space methods for reduced-rank Gaussian process regression. Stat Comput 30, 419-446 (2020).

Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023).

Parameters:

x (ArrayLike) – input data
nu (float) – smoothness parameter
alpha (ArrayLike) – amplitude of the squared exponential kernel
length (ArrayLike) – length scale of the squared exponential kernel
ell (float | int | list[float | int]) – positive value that parametrizes the length of the D-dimensional box so that the input data lies in the interval \([-L_1, L_1] \times ... \times [-L_D, L_D]\). We expect the approximation to be valid within this interval
m (int | list[m]) – number of eigenvalues to compute and include in the approximation for each dimension (\(\left\{1, ..., D\right\}\)). If an integer, the same number of eigenvalues is computed in each dimension.
non_centered (bool) – whether to use a non-centered parameterization. By default, it is set to True.

Returns:

the low-rank approximation linear model

Return type:

Array

hsgp_rational_quadratic

Hilbert space Gaussian process approximation using the Rational Quadratic kernel.

The Rational Quadratic kernel can be seen as a scale mixture (an infinite sum) of squared exponential kernels with different length scales. As the scale mixture parameter approaches infinity, the kernel converges to the squared exponential kernel.

The main idea of the approach is to combine the associated spectral density of the Rational Quadratic kernel and the spectrum of the Dirichlet Laplacian operator to obtain a low-rank approximation of the Gram matrix. For more details see [1, 2].

Note

Due to the heavier tails of the RQ kernel compared to the Squared Exponential kernel, the HSGP approximation may require larger ell values (e.g., 10 instead of 5) for accurate results, especially for small scale_mixture values.
The spectral density requires scale_mixture > dim/2 for the approximation at \(\omega = 0\) to be well-defined. For example, scale_mixture > 0.5 for 1D, scale_mixture > 1 for 2D, and scale_mixture > 1.5 for 3D.

References:

Solin, A., Särkkä, S. Hilbert space methods for reduced-rank Gaussian process regression. Stat Comput 30, 419-446 (2020).

Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023).

Parameters:

x (ArrayLike) – input data
alpha (ArrayLike) – amplitude of the Rational Quadratic kernel
length (ArrayLike) – length scale of the Rational Quadratic kernel (scalar, isotropic only)
scale_mixture (ArrayLike) – scale mixture parameter (α in the RQ kernel formula). Controls the relative weighting of small-scale and large-scale variations. As scale_mixture → ∞, the kernel converges to the squared exponential kernel.
ell (float | int | list[float | int]) – positive value that parametrizes the length of the D-dimensional box so that the input data lies in the interval \([-L_1, L_1] \times ... \times [-L_D, L_D]\). We expect the approximation to be valid within this interval
m (int | list[m]) – number of eigenvalues to compute and include in the approximation for each dimension (\(\left\{1, ..., D\right\}\)). If an integer, the same number of eigenvalues is computed in each dimension.
non_centered (bool) – whether to use a non-centered parameterization. By default, it is set to True.

Returns:

the low-rank approximation linear model

Return type:

Array

hsgp_periodic_non_centered

Low rank approximation for the periodic squared exponential kernel in the non-centered parametrization.

See Appendix B in [1].

References:

Riutort-Mayol, G., Bürkner, PC., Andersen, M.R. et al. Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. Stat Comput 33, 17 (2023).

Parameters:

x (ArrayLike) – input data
alpha (ArrayLike) – amplitude
length (ArrayLike) – length scale
w0 (ArrayLike) – frequency of the periodic kernel
m (int) – number of eigenvalues to compute and include in the approximation

Returns:

the low-rank approximation linear model

Return type:

Array