Metrics

Functions:

`cov_mmd`(real_jets, gen_jets[, ...])	Calculate coverage and MMD between real and generated jets, using the Energy Mover's Distance as the distance metric.
`fpd`(real_features, gen_features[, ...])	Calculates the value and error of the Fréchet physics distance (FPD) between a set of real and generated features, as defined in https://arxiv.org/abs/2211.10295.
`fpnd`(jets, jet_type[, dataset_name, device, ...])	Calculates the Fréchet ParticleNet Distance, as defined in https://arxiv.org/abs/2106.11535, for input `jets` of type `jet_type`.
`get_fpd_kpd_jet_features`(jets[, efp_jobs])	Get recommended jet features (36 EFPs) for the FPD and KPD metrics from an input sample of jets.
`kpd`(real_features, gen_features[, ...])	Calculates the median and error of the kernel physics distance (KPD) between a set of real and generated features, as defined in https://arxiv.org/abs/2211.10295.
`w1efp`(jets1, jets2[, use_particle_masses, ...])	Get 1-Wasserstein distances between Energy Flow Polynomials (Komiske et al. 2017 https://arxiv.org/abs/1712.07124) of `jets1` and `jets2`.
`w1m`(jets1, jets2[, num_eval_samples, ...])	Get 1-Wasserstein distance between masses of `jets1` and `jets2`.
`w1p`(jets1, jets2[, mask1, mask2, ...])	Get 1-Wasserstein distances between particle features of `jets1` and `jets2`.

jetnet.evaluation.cov_mmd(real_jets: Tensor | np.ndarray, gen_jets: Tensor | np.ndarray, num_eval_samples: int = 100, num_batches: int = 10, use_tqdm: bool = True) → tuple[float, float]

Calculate coverage and MMD between real and generated jets, using the Energy Mover’s Distance as the distance metric.

Parameters

real_jets (Tensor | np.ndarray) – Tensor or array of jets, of shape [num_jets, num_particles, num_features] with features in order [eta, phi, pt].
gen_jets (Tensor | np.ndarray) – tensor or array of generated jets, same format as real_jets.
num_eval_samples (int) – number of jets sampled from each of the real and generated sets in every batch, between which COV and MMD are evaluated. Defaults to 100.
num_batches (int) – number of different batches over which to calculate COV and MMD and average the results. Defaults to 10.
use_tqdm (bool) – show a tqdm progress bar while calculating over the num_batches batches. Defaults to True.

Returns

float: coverage, averaged over num_batches.
float: MMD, averaged over num_batches.

Return type

Tuple[float, float]
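COV and MMD are both read off a matrix of pairwise distances between the real and generated jets in a batch. The sketch below assumes that matrix has already been computed (in cov_mmd it is built from Energy Mover's Distances, which this sketch does not compute) and uses the definitions commonly applied to point-cloud generative models: COV is the fraction of real jets that are the nearest real neighbour of at least one generated jet, and MMD is the mean distance from each real jet to its closest generated jet. The function name coverage_and_mmd is hypothetical and not part of jetnet.

```python
import numpy as np


def coverage_and_mmd(dists):
    """Hypothetical helper: COV and MMD from a precomputed distance matrix.

    dists[i, j] is the distance between real jet i and generated jet j.
    """
    dists = np.asarray(dists, dtype=float)
    num_real = dists.shape[0]
    # COV: for each generated jet find its closest real jet, then count how many
    # distinct real jets were matched at least once.
    nearest_real = np.argmin(dists, axis=0)
    cov = len(np.unique(nearest_real)) / num_real
    # MMD: distance from each real jet to its closest generated jet, averaged.
    mmd = float(np.mean(np.min(dists, axis=1)))
    return cov, mmd


dists = np.array([[0.1, 0.5, 0.2],
                  [0.9, 0.8, 0.7],
                  [0.4, 0.3, 0.6]])
cov, mmd = coverage_and_mmd(dists)
```

Here real jets 0 and 2 are matched but jet 1 is not, so COV is 2/3; cov_mmd repeats such a computation over num_batches batches of num_eval_samples jets and averages the results.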

jetnet.evaluation.fpd(real_features: Tensor | np.ndarray, gen_features: Tensor | np.ndarray, min_samples: int = 20000, max_samples: int = 50000, num_batches: int = 20, num_points: int = 10, normalise: bool = True, seed: int = 42) → tuple[float, float]

Calculates the value and error of the Fréchet physics distance (FPD) between a set of real and generated features, as defined in https://arxiv.org/abs/2211.10295.

It is recommended to use input sample sizes of at least 50,000, and the default values for other input parameters for consistency with other measurements.

Similarly, for jets, it is recommended to use the set of EFPs returned by the get_fpd_kpd_jet_features function.
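FPD compares Gaussians fitted to the real and generated feature distributions using the Fréchet distance, d² = ||μ_r - μ_g||² + Tr(Σ_r + Σ_g - 2(Σ_r Σ_g)^(1/2)). The sketch below computes this quantity with NumPy only, taking the matrix square root through an eigendecomposition. It omits the feature normalisation and the multi-batch-size measurement that fpd performs, and the function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    mu1, mu2 = np.atleast_1d(np.asarray(mu1, float)), np.atleast_1d(np.asarray(mu2, float))
    sigma1 = np.atleast_2d(np.asarray(sigma1, float))
    sigma2 = np.atleast_2d(np.asarray(sigma2, float))
    # Tr((S1 S2)^(1/2)) equals Tr((S1^(1/2) S2 S1^(1/2))^(1/2)); the inner matrix is
    # symmetric PSD, so its eigenvalues are real and non-negative.
    sqrt_s1 = _sqrtm_psd(sigma1)
    inner = sqrt_s1 @ sigma2 @ sqrt_s1
    inner = 0.5 * (inner + inner.T)  # remove round-off asymmetry
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def frechet_distance_from_features(x, y):
    """Fit a Gaussian to each [num_samples, num_features] array and compare them."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return frechet_distance(x.mean(axis=0), np.cov(x, rowvar=False),
                            y.mean(axis=0), np.cov(y, rowvar=False))
```

For two unit-covariance Gaussians whose means differ by (3, 4), only the mean term contributes and the distance is 25.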

Parameters

real_features (Tensor | np.ndarray) – set of real features of shape [num_samples, num_features].
gen_features (Tensor | np.ndarray) – set of generated features of shape [num_samples, num_features].
min_samples (int, optional) – minimum batch size at which to measure FPD. Defaults to 20,000.
max_samples (int, optional) – maximum batch size at which to measure FPD. Defaults to 50,000.
num_batches (int, optional) – number of batches to average over for each batch size. Defaults to 20.
num_points (int, optional) – number of batch sizes to sample between min_samples and max_samples. Defaults to 10.
normalise (bool, optional) – normalise the individual features over the full sample to have the same scaling. Defaults to True.
seed (int, optional) – random seed. Defaults to 42.

Returns

value and error of FPD.

Return type

Tuple[float, float]
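The min_samples, max_samples and num_points parameters indicate that FPD is measured at several batch sizes N. Assuming, in the spirit of the infinite-sample extrapolation used for image metrics, that the mean measured value depends linearly on 1/N, the reported value can be obtained as the intercept of a straight-line fit in 1/N and its error as the fitted standard error of that intercept. This is a sketch under that assumption; the exact fitting procedure and error definition used by fpd may differ, and extrapolate_to_infinite_samples is a hypothetical name.

```python
import numpy as np


def extrapolate_to_infinite_samples(batch_sizes, mean_values):
    """Fit mean_values ≈ intercept + slope / N; return (intercept, its standard error)."""
    x = 1.0 / np.asarray(batch_sizes, dtype=float)
    y = np.asarray(mean_values, dtype=float)
    # np.polyfit returns coefficients from the highest power down: [slope, intercept].
    (slope, intercept), cov = np.polyfit(x, y, deg=1, cov=True)
    return float(intercept), float(np.sqrt(cov[1, 1]))


# Synthetic example: values that shrink as 100 / N towards an asymptote of 0.5.
sizes = np.linspace(20_000, 50_000, 10)
value, err = extrapolate_to_infinite_samples(sizes, 0.5 + 100.0 / sizes)
```

Because the intercept corresponds to 1/N = 0, it estimates the value the metric would take with unlimited samples, removing the finite-sample bias of the Fréchet distance.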

jetnet.evaluation.fpnd(jets: Tensor | np.ndarray, jet_type: str, dataset_name: str = 'jetnet', device: str | None = None, batch_size: int = 16, use_tqdm: bool = True) → float

Calculates the Fréchet ParticleNet Distance, as defined in https://arxiv.org/abs/2106.11535, for input jets of type jet_type.

The input jets are passed through our pretrained ParticleNet module, and the resulting activations are compared with cached activations from real jets. The recommended, and maximum, number of jets is 50,000.

torch_geometric must be installed separately for running inference with ParticleNet.

Currently, FPND is supported only for the JetNet dataset with 30 particles; support for other datasets, and the ability for users to use their own version, is in development.

Parameters

jets (Tensor | np.ndarray) – Tensor or array of jets, of shape [num_jets, num_particles, num_features] with features in order [eta, phi, pt, (optional) mask]
jet_type (str) – jet type, out of ['g', 't', 'q'].
dataset_name (str) – Dataset to use. Currently only JetNet is supported. Defaults to “jetnet”.
device (str) – ‘cpu’ or ‘cuda’. If not specified, defaults to cuda if available else cpu.
batch_size (int) – Batch size for ParticleNet inference. Defaults to 16.
use_tqdm (bool) – use tqdm bar while getting ParticleNet activations. Defaults to True.

Returns

the measured FPND.

Return type

float
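A minimal sketch of preparing input for fpnd, assuming that zero-padded particles have pt exactly 0 so that a mask can be derived from the pt feature. The helper prepare_fpnd_input is hypothetical and not part of jetnet; it only enforces the constraints stated above (30 particles, features [eta, phi, pt], jet_type in ['g', 't', 'q'], at most 50,000 jets) and appends the optional mask as a fourth feature. The commented-out call uses the fpnd signature documented above.

```python
import numpy as np

VALID_JET_TYPES = ("g", "t", "q")


def prepare_fpnd_input(jets, jet_type, num_particles=30, max_jets=50_000):
    """Hypothetical helper: check FPND input constraints and append a particle mask.

    jets: [num_jets, num_particles, 3] with features [eta, phi, pt].
    Returns an array of shape [min(num_jets, max_jets), num_particles, 4].
    """
    if jet_type not in VALID_JET_TYPES:
        raise ValueError(f"jet_type must be one of {VALID_JET_TYPES}, got {jet_type!r}")
    jets = np.asarray(jets, dtype=np.float32)
    if jets.ndim != 3 or jets.shape[1] != num_particles or jets.shape[2] != 3:
        raise ValueError(f"expected shape [num_jets, {num_particles}, 3], got {jets.shape}")
    # Keep at most the recommended (and maximum) number of jets.
    jets = jets[:max_jets]
    # Assumption: zero-padded particles have pt == 0, so pt > 0 marks real particles.
    mask = (jets[..., 2] > 0).astype(np.float32)[..., np.newaxis]
    return np.concatenate([jets, mask], axis=-1)


# Example usage (requires jetnet and torch_geometric, so it is left commented out):
# import jetnet
# score = jetnet.evaluation.fpnd(prepare_fpnd_input(gen_jets, "g"), jet_type="g")
```

Truncating to the first 50,000 jets is one design choice; raising an error for larger samples would be equally reasonable.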

jetnet.evaluation.get_fpd_kpd_jet_features(jets: Tensor | np.ndarray, efp_jobs: int | None = None) → np.ndarray

Get recommended jet features (36 EFPs) for the FPD and KPD metrics from an input sample of jets.

Parameters

jets (Tensor | np.ndarray) – Tensor or array of jets, of shape [num_jets, num_particles, num_features] with features in order [eta, phi, pt].
efp_jobs (int, optional) – number of jobs to use for energyflow’s EFP batch computation. None means as many processes as there are CPUs.

Returns

array of EFPs of shape [num_jets, 36].

Return type

np.ndarray
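EFPs are sums over particles weighted by momentum fractions z_i and pairwise angular distances θ_ij. As an illustration only, the sketch below evaluates the simplest non-trivial EFP, the two-vertex single-edge graph Σ_ij z_i z_j θ_ij^β, for a single jet in NumPy. It assumes z_i = pt_i / Σ_k pt_k and θ_ij = sqrt(Δη² + Δφ²) with β = 1, which to our understanding matches the default hadronic measure of the energyflow package. Which 36 EFPs get_fpd_kpd_jet_features returns is not stated here, and efp_single_edge is a hypothetical name.

```python
import numpy as np


def efp_single_edge(jet, beta=1.0):
    """Single-edge EFP sum_ij z_i z_j theta_ij**beta for one jet.

    jet: [num_particles, 3] with features [eta, phi, pt]; zero-pt particles
    get z_i = 0 and therefore do not contribute.
    """
    jet = np.asarray(jet, dtype=float)
    eta, phi, pt = jet[:, 0], jet[:, 1], jet[:, 2]
    z = pt / pt.sum()  # momentum fractions
    deta = eta[:, None] - eta[None, :]
    dphi = phi[:, None] - phi[None, :]
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi  # wrap azimuthal differences into [-pi, pi)
    theta = np.sqrt(deta**2 + dphi**2)
    return float(np.sum(z[:, None] * z[None, :] * theta**beta))


# Two equal-pt particles separated by theta = 0.5: 2 * 0.5 * 0.5 * 0.5 = 0.25.
jet = np.array([[0.0, 0.0, 1.0], [0.3, 0.4, 1.0]])
value = efp_single_edge(jet)
```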

jetnet.evaluation.kpd(real_features: Tensor | np.ndarray, gen_features: Tensor | np.ndarray, num_batches: int = 10, batch_size: int = 5000, normalise: bool = True, seed: int = 42, num_threads: int | None = None) → tuple[float, float]

Calculates the median and error of the kernel physics distance (KPD) between a set of real and generated features, as defined in https://arxiv.org/abs/2211.10295.

It is recommended to use input sample sizes of at least 50,000, and the default values for other input parameters for consistency with other measurements.

Similarly, for jets, it is recommended to use the set of EFPs returned by the get_fpd_kpd_jet_features function.

Parameters

real_features (Tensor | np.ndarray) – set of real features of shape [num_samples, num_features].
gen_features (Tensor | np.ndarray) – set of generated features of shape [num_samples, num_features].
num_batches (int, optional) – number of batches over which MMD is measured and the median taken. Defaults to 10.
batch_size (int, optional) – size of each batch for which MMD is measured. Defaults to 5,000.
normalise (bool, optional) – normalise the individual features over the full sample to have the same scaling. Defaults to True.
seed (int, optional) – random seed. Defaults to 42.
num_threads (int, optional) – parallelize KPD through numba using this many threads. 0 means numba’s default number of threads, based on the number of available cores. Defaults to None, i.e. no parallelization.

Returns

median and error of KPD.

Return type

Tuple[float, float]
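KPD is a maximum mean discrepancy (MMD) measured on batches of features and summarised by the median over batches. The sketch below assumes a cubic polynomial kernel k(x, y) = (x·y/d + 1)³, as used by the Kernel Inception Distance, together with the standard unbiased MMD² estimator. The kernel choice, the function names, and the omission of normalisation, the error estimate and numba parallelisation are assumptions of this sketch rather than a description of kpd's internals.

```python
import numpy as np


def poly_kernel(a, b, degree=3, coef0=1.0):
    """Polynomial kernel k(a, b) = (a . b / d + coef0) ** degree, d = number of features."""
    return (a @ b.T / a.shape[1] + coef0) ** degree


def mmd2_unbiased(x, y):
    """Unbiased estimate of the squared MMD between samples x [m, d] and y [n, d]."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    m, n = len(x), len(y)
    kxx, kyy, kxy = poly_kernel(x, x), poly_kernel(y, y), poly_kernel(x, y)
    # Within-sample averages exclude the i == j (diagonal) terms.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    term_xy = kxy.sum() / (m * n)
    return float(term_xx + term_yy - 2.0 * term_xy)


def kpd_like(real, gen, num_batches=10, batch_size=5000, seed=42):
    """Median of the unbiased MMD^2 over random batches (no normalisation, no error)."""
    real, gen = np.asarray(real, dtype=float), np.asarray(gen, dtype=float)
    rng = np.random.default_rng(seed)
    values = []
    for _ in range(num_batches):
        real_idx = rng.choice(len(real), size=min(batch_size, len(real)), replace=False)
        gen_idx = rng.choice(len(gen), size=min(batch_size, len(gen)), replace=False)
        values.append(mmd2_unbiased(real[real_idx], gen[gen_idx]))
    return float(np.median(values))
```

Taking the median rather than the mean makes the summary less sensitive to an occasional outlying batch.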

jetnet.evaluation.w1efp(jets1: Tensor | np.ndarray, jets2: Tensor | np.ndarray, use_particle_masses: bool = False, efpset_args: list | None = None, num_eval_samples: int = 50000, num_batches: int = 5, return_std: bool = True, efp_jobs: int | None = None)

Get 1-Wasserstein distances between Energy Flow Polynomials (Komiske et al. 2017 https://arxiv.org/abs/1712.07124) of jets1 and jets2.

Parameters

jets1 (Tensor | np.ndarray) – Tensor or array of jets of shape [num_jets, num_particles, num_features], with features in order [eta, phi, pt, (optional) mass]. If no particle masses are given (use_particle_masses should then be False), they are assumed to be 0.
jets2 (Tensor | np.ndarray) – Tensor or array of jets, of same format as jets1.
use_particle_masses (bool) – Whether jets1 and jets2 have particle masses as their 4th particle features. Defaults to False.
efpset_args (List) – Args for the energyflow.efpset function to specify which EFPs to use, as defined here https://energyflow.network/docs/efp/#efpset. Defaults to the n=4, d=5, prime EFPs.
num_eval_samples (int) – Number of jets out of the total to use for W1 measurement. Defaults to 50,000.
num_batches (int) – Number of different batches to average W1 scores over. Defaults to 5.
return_std (bool) – Return the standard deviation as well of the W1 scores over the num_batches batches. Defaults to True.
efp_jobs (int) – number of jobs to use for energyflow’s EFP batch computation. None means as many processes as there are CPUs.

Returns

np.ndarray: array of average W1 scores for each EFP.
np.ndarray (optional, only if return_std is True): array of standard deviations of the W1 scores for each EFP.

Return type

Tuple[np.ndarray, np.ndarray]
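Once EFPs have been computed for both samples, the comparison reduces to a per-feature 1-Wasserstein distance averaged over random batches. For two one-dimensional samples of equal size, W1 equals the mean absolute difference between their sorted values, which is what the sketch uses; scipy.stats.wasserstein_distance handles the general case. Sampling without replacement, the seed, the population standard deviation and the name batched_feature_w1 are assumptions of this sketch.

```python
import numpy as np


def w1_equal_size(a, b):
    """W1 between two equal-size 1D samples: mean absolute difference of sorted values."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


def batched_feature_w1(feats1, feats2, num_eval_samples=50_000, num_batches=5, seed=42):
    """Per-feature W1 over random batches; returns (means, stds), each [num_features]."""
    feats1, feats2 = np.asarray(feats1, dtype=float), np.asarray(feats2, dtype=float)
    rng = np.random.default_rng(seed)
    n = min(num_eval_samples, len(feats1), len(feats2))
    scores = np.empty((num_batches, feats1.shape[1]))
    for b in range(num_batches):
        idx1 = rng.choice(len(feats1), size=n, replace=False)
        idx2 = rng.choice(len(feats2), size=n, replace=False)
        for f in range(feats1.shape[1]):
            scores[b, f] = w1_equal_size(feats1[idx1, f], feats2[idx2, f])
    return scores.mean(axis=0), scores.std(axis=0)
```

For example, passing two [num_jets, num_efps] arrays whose features differ by a constant shift of 2 yields per-feature means close to 2.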

jetnet.evaluation.w1m(jets1: Tensor | np.ndarray, jets2: Tensor | np.ndarray, num_eval_samples: int = 50000, num_batches: int = 5, return_std: bool = True)

Get 1-Wasserstein distance between masses of jets1 and jets2.

Parameters

jets1 (Tensor | np.ndarray) – Tensor or array of jets, of shape [num_jets, num_particles, num_features] with features in order [eta, phi, pt, (optional) mass]
jets2 (Tensor | np.ndarray) – Tensor or array of jets, of same format as jets1.
num_eval_samples (int) – Number of jets out of the total to use for W1 measurement. Defaults to 50,000.
num_batches (int) – Number of different batches to average W1 scores over. Defaults to 5.
return_std (bool) – Return the standard deviation as well of the W1 scores over the num_batches batches. Defaults to True.

Returns

float: W1 mass score, averaged over num_batches.
float (optional, only if return_std is True): standard deviation of W1 mass scores over num_batches.

Return type

Tuple[float, float]
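Jet masses are obtained by summing the particles' four-momenta. The sketch assumes features [eta, phi, pt, (optional) mass], treats particles as massless when no mass is given, and uses px = pt cos φ, py = pt sin φ, pz = pt sinh η. If the inputs are relative coordinates (for example pt as a fraction of the jet pt), the result is correspondingly a relative mass. jet_masses is a hypothetical helper; the W1 distance between the two resulting mass arrays can then be computed as for any one-dimensional feature.

```python
import numpy as np


def jet_masses(jets):
    """Invariant mass per jet from [num_jets, num_particles, 3 or 4] particle arrays."""
    jets = np.asarray(jets, dtype=float)
    eta, phi, pt = jets[..., 0], jets[..., 1], jets[..., 2]
    # Particles are treated as massless when no 4th (mass) feature is given.
    mass = jets[..., 3] if jets.shape[-1] > 3 else np.zeros_like(pt)
    px, py, pz = pt * np.cos(phi), pt * np.sin(phi), pt * np.sinh(eta)
    energy = np.sqrt(px**2 + py**2 + pz**2 + mass**2)
    # Sum the four-momenta over particles, then m^2 = E^2 - |p|^2 (clipped at 0).
    e, sx, sy, sz = (arr.sum(axis=1) for arr in (energy, px, py, pz))
    m2 = e**2 - sx**2 - sy**2 - sz**2
    return np.sqrt(np.clip(m2, 0.0, None))


# Two massless unit-pt particles back to back in phi, plus one zero-padded particle.
masses = jet_masses(np.array([[[0.0, 0.0, 1.0], [0.0, np.pi, 1.0], [0.0, 0.0, 0.0]]]))
```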

jetnet.evaluation.w1p(jets1: Tensor | np.ndarray, jets2: Tensor | np.ndarray, mask1: Tensor | np.ndarray = None, mask2: Tensor | np.ndarray = None, exclude_zeros: bool = True, num_particle_features: int = 0, num_eval_samples: int = 50000, num_batches: int = 5, return_std: bool = True)

Get 1-Wasserstein distances between particle features of jets1 and jets2.

Parameters

jets1 (Tensor | np.ndarray) – Tensor or array of jets, of shape [num_jets, num_particles_per_jet, num_features_per_particle].
jets2 (Tensor | np.ndarray) – Tensor or array of jets, of same format as jets1.
mask1 (Tensor | np.ndarray) – Optional tensor or array of binary particle masks, of shape [num_jets, num_particles_per_jet] or [num_jets, num_particles_per_jet, 1]. If given, 0-masked particles will be excluded from the W1 calculation.
mask2 (Tensor | np.ndarray) – Optional tensor or array of the same format as mask1.
exclude_zeros (bool) – Ignore zero-padded particles, i.e. those whose feature norms are exactly 0. Defaults to True.
num_particle_features (int) – Return W1 scores for only the first num_particle_features particle features. If 0, scores are calculated for all features. Defaults to 0.
num_eval_samples (int) – Number of jets out of the total to use for W1 measurement. Defaults to 50,000.
num_batches (int) – Number of different batches to average W1 scores over. Defaults to 5.
return_std (bool) – Return the standard deviation as well of the W1 scores over the num_batches batches. Defaults to True.

Returns

Union[float, np.ndarray]: array of length num_particle_features containing average W1 scores for each feature.
Union[float, np.ndarray] (optional, only if return_std is True): array of length num_particle_features containing the standard deviations of the W1 scores for each feature.

Return type

Tuple[Union[float, np.ndarray], Union[float, np.ndarray]]
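A minimal sketch of the particle-level selection described above: particles are kept if their mask is non-zero and, with exclude_zeros, if their feature norm is non-zero; the remaining particles from all jets are pooled and W1 is computed per feature. Because masking leaves samples of unequal size, wasserstein_1d integrates the absolute difference between the two empirical CDFs, the same approach used by scipy.stats.wasserstein_distance. Batching over num_batches and the num_particle_features cut are omitted, and the function names are hypothetical.

```python
import numpy as np


def wasserstein_1d(u, v):
    """W1 between two 1D empirical distributions of possibly different sizes."""
    u, v = np.sort(np.asarray(u, dtype=float)), np.sort(np.asarray(v, dtype=float))
    all_values = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_values)
    # Empirical CDFs of u and v evaluated on each interval between consecutive values.
    u_cdf = np.searchsorted(u, all_values[:-1], side="right") / len(u)
    v_cdf = np.searchsorted(v, all_values[:-1], side="right") / len(v)
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))


def w1_particle_features(jets1, jets2, mask1=None, mask2=None, exclude_zeros=True):
    """Per-feature W1 between the pooled (selected) particles of two jet samples."""

    def select(jets, mask):
        jets = np.asarray(jets, dtype=float)
        keep = np.ones(jets.shape[:2], dtype=bool)
        if mask is not None:
            # Accept masks of shape [num_jets, num_particles] or [..., 1].
            keep &= np.asarray(mask).reshape(jets.shape[:2]).astype(bool)
        if exclude_zeros:
            keep &= np.linalg.norm(jets, axis=-1) != 0
        return jets[keep]  # [num_selected_particles, num_features]

    p1, p2 = select(jets1, mask1), select(jets2, mask2)
    return np.array([wasserstein_1d(p1[:, f], p2[:, f]) for f in range(p1.shape[1])])


jets1 = np.array([[[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]],
                  [[5.0, 6.0], [0.0, 0.0], [0.0, 0.0]]])
jets2 = np.where(jets1 != 0, jets1 + 1.0, 0.0)  # shift real particles, keep padding
scores = w1_particle_features(jets1, jets2)
```

Excluding the zero-padded entries matters here: without it, the padding would be pooled with the real particles and bias both feature distributions towards 0.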