Metrics
Functions:
| cov_mmd | Calculate coverage and MMD between real and generated jets, using the Energy Mover's Distance as the distance metric. |
| fpd | Calculates the value and error of the Fréchet physics distance (FPD) between a set of real and generated features, as defined in https://arxiv.org/abs/2211.10295. |
| fpnd | Calculates the Fréchet ParticleNet Distance, as defined in https://arxiv.org/abs/2106.11535, for input jets of type jet_type. |
| get_fpd_kpd_jet_features | Get recommended jet features (36 EFPs) for the FPD and KPD metrics from an input sample of jets. |
| kpd | Calculates the median and error of the kernel physics distance (KPD) between a set of real and generated features, as defined in https://arxiv.org/abs/2211.10295. |
| w1efp | Get 1-Wasserstein distances between Energy Flow Polynomials (Komiske et al. 2017 https://arxiv.org/abs/1712.07124) of jets1 and jets2. |
| w1m | Get 1-Wasserstein distance between masses of jets1 and jets2. |
| w1p | Get 1-Wasserstein distances between particle features of jets1 and jets2. |
- jetnet.evaluation.cov_mmd(real_jets: Tensor | np.ndarray, gen_jets: Tensor | np.ndarray, num_eval_samples: int = 100, num_batches: int = 10, use_tqdm: bool = True) -> tuple[float, float]
Calculate coverage and MMD between real and generated jets, using the Energy Mover’s Distance as the distance metric.
- Parameters
real_jets (Tensor | np.ndarray) – tensor or array of real jets, of shape [num_jets, num_particles, num_features], with features in order [eta, phi, pt].
gen_jets (Tensor | np.ndarray) – tensor or array of generated jets, in the same format as real_jets.
num_eval_samples (int) – number of jets to sample from each of the real and generated sets, between which COV and MMD are evaluated. Defaults to 100.
num_batches (int) – number of batches over which COV and MMD are calculated and averaged. Defaults to 10.
use_tqdm (bool) – show a tqdm progress bar while iterating over the num_batches batches. Defaults to True.
- Returns
float: coverage, averaged over num_batches.
float: MMD, averaged over num_batches.
- Return type
Tuple[float, float]
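As a rough illustration of what the two returned numbers mean, the sketch below computes coverage and MMD from a precomputed distance matrix. It assumes the usual definitions (coverage: the fraction of real samples that are the nearest neighbour of at least one generated sample; MMD: the mean distance from each real sample to its closest generated sample). A toy absolute-difference distance stands in for the Energy Mover's Distance, and `cov_mmd_from_dists` is a hypothetical helper, not part of jetnet.

```python
from typing import Tuple

import numpy as np


def cov_mmd_from_dists(dists: np.ndarray) -> Tuple[float, float]:
    """Coverage and MMD from a [num_gen, num_real] distance matrix.

    Hypothetical helper for illustration only; the real function builds the
    matrix from Energy Mover's Distances and averages over several batches.
    """
    num_real = dists.shape[1]
    # Coverage: fraction of real samples that are the nearest neighbour
    # of at least one generated sample.
    nearest_real = np.argmin(dists, axis=1)
    cov = np.unique(nearest_real).size / num_real
    # MMD: distance from each real sample to its closest generated sample,
    # averaged over the real samples.
    mmd = float(np.mean(np.min(dists, axis=0)))
    return cov, mmd


# Toy 1D "jets": |gen - real| stands in for the Energy Mover's Distance.
real = np.array([0.0, 1.0, 2.0, 3.0])
gen = np.array([0.1, 0.2, 2.9])
dists = np.abs(gen[:, None] - real[None, :])
cov, mmd = cov_mmd_from_dists(dists)
```

Here only two of the four real samples are matched by a generated sample, so the coverage is 0.5. Higher coverage and lower MMD indicate better agreement.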
- jetnet.evaluation.fpd(real_features: Tensor | np.ndarray, gen_features: Tensor | np.ndarray, min_samples: int = 20000, max_samples: int = 50000, num_batches: int = 20, num_points: int = 10, normalise: bool = True, seed: int = 42) -> tuple[float, float]
Calculates the value and error of the Fréchet physics distance (FPD) between a set of real and generated features, as defined in https://arxiv.org/abs/2211.10295.
It is recommended to use input sample sizes of at least 50,000, and the default values for other input parameters for consistency with other measurements.
Similarly, for jets, it is recommended to use the set of EFPs as provided by the get_fpd_kpd_jet_features method.
- Parameters
real_features (Tensor | np.ndarray) – set of real features of shape [num_samples, num_features].
gen_features (Tensor | np.ndarray) – set of generated features of shape [num_samples, num_features].
min_samples (int, optional) – min batch size to measure FPD for. Defaults to 20,000.
max_samples (int, optional) – max batch size to measure FPD for. Defaults to 50,000.
num_batches (int, optional) – number of batches to average over for each batch size. Defaults to 20.
num_points (int, optional) – number of batch sizes to sample between min_samples and max_samples. Defaults to 10.
normalise (bool, optional) – normalise the individual features over the full sample to have the same scaling. Defaults to True.
seed (int, optional) – random seed. Defaults to 42.
- Returns
value and error of FPD.
- Return type
Tuple[float, float]
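To make the roles of min_samples, max_samples, num_points and num_batches concrete, here is a minimal NumPy sketch of the idea. It assumes the Fréchet distance between Gaussians fitted to each feature set, measured at several batch sizes N and extrapolated to infinite sample size with a linear fit in 1/N, as described in the cited paper. `frechet_distance` and `extrapolated_fd` are hypothetical names, the sketch skips feature normalisation and the error estimate, and it is not the library's exact procedure.

```python
import numpy as np


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to x and y ([num_samples, num_features])."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Tr(sqrt(cov_x cov_y)) is the sum of the square roots of the eigenvalues
    # of the product, which are real and non-negative for covariance matrices.
    eigvals = np.linalg.eigvals(cov_x @ cov_y).real
    tr_sqrt = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


def extrapolated_fd(real, gen, min_samples, max_samples, num_batches, num_points, seed=42):
    """Fit the batch-averaged distance against 1/N and return the N -> infinity intercept."""
    rng = np.random.default_rng(seed)
    sizes = np.linspace(min_samples, max_samples, num_points).astype(int)
    means = []
    for n in sizes:
        vals = [
            frechet_distance(
                real[rng.choice(len(real), n, replace=False)],
                gen[rng.choice(len(gen), n, replace=False)],
            )
            for _ in range(num_batches)
        ]
        means.append(np.mean(vals))
    slope, intercept = np.polyfit(1.0 / sizes, means, 1)
    return float(intercept)


# Toy features: the generated sample has its mean shifted by 1 in one feature.
rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 3))
gen = rng.normal(loc=[1.0, 0.0, 0.0], size=(2000, 3))
fd_inf = extrapolated_fd(real, gen, 500, 2000, num_batches=5, num_points=4)
```

For this toy input the extrapolated value should land near 1, the squared mean shift, since the covariances agree.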
- jetnet.evaluation.fpnd(jets: Tensor | np.ndarray, jet_type: str, dataset_name: str = 'jetnet', device: str | None = None, batch_size: int = 16, use_tqdm: bool = True) -> float
Calculates the Fréchet ParticleNet Distance (FPND), as defined in https://arxiv.org/abs/2106.11535, for input jets of type jet_type.
jets are passed through our pretrained ParticleNet module and the activations are compared with the cached activations from real jets. The recommended, and maximum, number of jets is 50,000.
torch_geometric must be installed separately to run inference with ParticleNet.
FPND is currently only supported for the JetNet dataset with 30 particles; support for other datasets, and the ability for users to use their own version, is in development.
- Parameters
jets (Tensor | np.ndarray) – Tensor or array of jets, of shape [num_jets, num_particles, num_features], with features in order [eta, phi, pt, (optional) mask].
jet_type (str) – jet type, out of ['g', 't', 'q'].
dataset_name (str) – Dataset to use. Currently only JetNet is supported. Defaults to “jetnet”.
device (str) – ‘cpu’ or ‘cuda’. If not specified, defaults to cuda if available else cpu.
batch_size (int) – Batch size for ParticleNet inference. Defaults to 16.
use_tqdm (bool) – use tqdm bar while getting ParticleNet activations. Defaults to True.
- Returns
the measured FPND.
- Return type
float
- jetnet.evaluation.get_fpd_kpd_jet_features(jets: Tensor | np.ndarray, efp_jobs: int | None = None) -> np.ndarray
Get recommended jet features (36 EFPs) for the FPD and KPD metrics from an input sample of jets.
- Parameters
jets (Tensor | np.ndarray) – Tensor or array of jets, of shape [num_jets, num_particles, num_features], with features in order [eta, phi, pt].
efp_jobs (int, optional) – number of jobs to use for energyflow’s EFP batch computation. None means as many processes as there are CPUs.
- Returns
array of EFPs of shape [num_jets, 36].
- Return type
np.ndarray
- jetnet.evaluation.kpd(real_features: Tensor | np.ndarray, gen_features: Tensor | np.ndarray, num_batches: int = 10, batch_size: int = 5000, normalise: bool = True, seed: int = 42, num_threads: int | None = None) -> tuple[float, float]
Calculates the median and error of the kernel physics distance (KPD) between a set of real and generated features, as defined in https://arxiv.org/abs/2211.10295.
It is recommended to use input sample sizes of at least 50,000, and the default values for other input parameters for consistency with other measurements.
Similarly, for jets, it is recommended to use the set of EFPs as provided by the get_fpd_kpd_jet_features method.
- Parameters
real_features (Tensor | np.ndarray) – set of real features of shape [num_samples, num_features].
gen_features (Tensor | np.ndarray) – set of generated features of shape [num_samples, num_features].
num_batches (int, optional) – number of batches to average over. Defaults to 10.
batch_size (int, optional) – size of each batch for which MMD is measured. Defaults to 5,000.
normalise (bool, optional) – normalise the individual features over the full sample to have the same scaling. Defaults to True.
seed (int, optional) – random seed. Defaults to 42.
num_threads (int, optional) – parallelize KPD through numba using this many threads. 0 means numba’s default number of threads, based on the number of cores available. Defaults to None, i.e. no parallelization.
- Returns
median and error of KPD.
- Return type
Tuple[float, float]
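The batching described by num_batches and batch_size can be sketched as follows. The sketch assumes, following the cited paper, that the per-batch quantity is an unbiased estimate of the squared MMD with a fourth-order polynomial kernel, and that the reported value is the median over batches. The function names are hypothetical and half the interquartile range is only an assumed stand-in for the error, so this is an illustration rather than the library's implementation.

```python
import numpy as np


def poly_kernel(x: np.ndarray, y: np.ndarray, degree: int = 4) -> np.ndarray:
    """Polynomial kernel (x.y / num_features + 1)^degree for all pairs of rows."""
    return (x @ y.T / x.shape[-1] + 1.0) ** degree


def mmd2_unbiased(x: np.ndarray, y: np.ndarray, degree: int = 4) -> float:
    """Unbiased estimate of the squared MMD between samples x and y."""
    kxx = poly_kernel(x, x, degree)
    kyy = poly_kernel(y, y, degree)
    kxy = poly_kernel(x, y, degree)
    m, n = len(x), len(y)
    # Drop the diagonal (self-similarity) terms for the unbiased estimator.
    return float(
        (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
        + (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
        - 2.0 * kxy.mean()
    )


def median_mmd2(real, gen, num_batches=10, batch_size=500, seed=42):
    """Median of the per-batch squared MMD, plus half the interquartile range."""
    rng = np.random.default_rng(seed)
    vals = [
        mmd2_unbiased(
            real[rng.choice(len(real), batch_size, replace=False)],
            gen[rng.choice(len(gen), batch_size, replace=False)],
        )
        for _ in range(num_batches)
    ]
    spread = 0.5 * (np.percentile(vals, 75) - np.percentile(vals, 25))
    return float(np.median(vals)), float(spread)


# Toy features: one matching and one mean-shifted generated sample.
rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 2))
gen_same = rng.normal(size=(2000, 2))
gen_shift = rng.normal(loc=1.0, size=(2000, 2))
kpd_same, err_same = median_mmd2(real, gen_same)
kpd_shift, err_shift = median_mmd2(real, gen_shift)
```

Because the estimator is unbiased, the value for matching distributions fluctuates around zero and can be slightly negative, while the shifted sample gives a clearly positive value.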
- jetnet.evaluation.w1efp(jets1: Tensor | np.ndarray, jets2: Tensor | np.ndarray, use_particle_masses: bool = False, efpset_args: list | None = None, num_eval_samples: int = 50000, num_batches: int = 5, return_std: bool = True, efp_jobs: int | None = None)
Get 1-Wasserstein distances between Energy Flow Polynomials (Komiske et al. 2017 https://arxiv.org/abs/1712.07124) of jets1 and jets2.
- Parameters
jets1 (Tensor | np.ndarray) – Tensor or array of jets, of shape [num_jets, num_particles, num_features], with features in order [eta, phi, pt, (optional) mass]. If no particle masses are given (use_particle_masses should then be False), they are assumed to be 0.
jets2 (Tensor | np.ndarray) – Tensor or array of jets, in the same format as jets1.
use_particle_masses (bool) – Whether jets1 and jets2 have particle masses as their 4th particle feature. Defaults to False.
efpset_args (List) – Args for the energyflow.efpset function to specify which EFPs to use, as defined at https://energyflow.network/docs/efp/#efpset. Defaults to the n=4, d=5, prime EFPs.
num_eval_samples (int) – Number of jets out of the total to use for W1 measurement. Defaults to 50,000.
num_batches (int) – Number of different batches to average W1 scores over. Defaults to 5.
average_over_efps (bool) – Average over the EFPs to return a single W1-EFP score. Defaults to True.
return_std (bool) – Also return the standard deviation of the W1 scores over the num_batches batches. Defaults to True.
efp_jobs (int) – number of jobs to use for energyflow’s EFP batch computation. None means as many processes as there are CPUs.
- Returns
np.ndarray: array of average W1 scores for each EFP.
np.ndarray (optional, only if return_std is True): array of std of W1 scores for each feature.
- Return type
Tuple[np.ndarray, np.ndarray]
- jetnet.evaluation.w1m(jets1: Tensor | np.ndarray, jets2: Tensor | np.ndarray, num_eval_samples: int = 50000, num_batches: int = 5, return_std: bool = True)
Get 1-Wasserstein distance between masses of jets1 and jets2.
- Parameters
jets1 (Tensor | np.ndarray) – Tensor or array of jets, of shape [num_jets, num_particles, num_features], with features in order [eta, phi, pt, (optional) mass].
jets2 (Tensor | np.ndarray) – Tensor or array of jets, in the same format as jets1.
num_eval_samples (int) – Number of jets out of the total to use for W1 measurement. Defaults to 50,000.
num_batches (int) – Number of different batches to average W1 scores over. Defaults to 5.
return_std (bool) – Also return the standard deviation of the W1 scores over the num_batches batches. Defaults to True.
- Returns
float: W1 mass score, averaged over num_batches.
float (optional, only if return_std is True): standard deviation of the W1 mass scores over num_batches.
- Return type
Tuple[float, float]
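A minimal sketch of the quantities involved, assuming massless particles and [eta, phi, pt] features: each jet's invariant mass comes from the summed particle four-momenta, and the W1 score is averaged over randomly drawn batches. All three function names are hypothetical, and the equal-size sorting formula is a simplification chosen to keep the example NumPy-only.

```python
import numpy as np


def jet_masses(jets: np.ndarray) -> np.ndarray:
    """Invariant mass of each jet from [eta, phi, pt] features, assuming massless particles."""
    eta, phi, pt = jets[..., 0], jets[..., 1], jets[..., 2]
    px, py, pz = pt * np.cos(phi), pt * np.sin(phi), pt * np.sinh(eta)
    energy = pt * np.cosh(eta)
    # Sum the particle four-momenta within each jet, then take m^2 = E^2 - |p|^2.
    m2 = energy.sum(-1) ** 2 - px.sum(-1) ** 2 - py.sum(-1) ** 2 - pz.sum(-1) ** 2
    return np.sqrt(np.clip(m2, 0.0, None))


def w1_equal_size(a: np.ndarray, b: np.ndarray) -> float:
    """1-Wasserstein distance between two equal-size 1D samples."""
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))


def w1_mass(jets1, jets2, num_eval_samples, num_batches, seed=0):
    """Mean and standard deviation of the mass W1 score over random batches."""
    rng = np.random.default_rng(seed)
    m1, m2 = jet_masses(jets1), jet_masses(jets2)
    scores = [
        w1_equal_size(rng.choice(m1, num_eval_samples), rng.choice(m2, num_eval_samples))
        for _ in range(num_batches)
    ]
    return float(np.mean(scores)), float(np.std(scores))


jets = np.array([
    [[0.0, 0.0, 1.0], [0.0, np.pi, 1.0]],  # two back-to-back particles
    [[0.3, 0.1, 5.0], [0.0, 0.0, 0.0]],    # one particle plus zero padding
])
masses = jet_masses(jets)
score, std = w1_mass(jets, jets, num_eval_samples=2, num_batches=3)
```

Zero-padded particles have pt = 0 and so contribute nothing to the four-momentum sum. A single massless particle gives a jet mass of approximately zero.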
- jetnet.evaluation.w1p(jets1: Tensor | np.ndarray, jets2: Tensor | np.ndarray, mask1: Tensor | np.ndarray = None, mask2: Tensor | np.ndarray = None, exclude_zeros: bool = True, num_particle_features: int = 0, num_eval_samples: int = 50000, num_batches: int = 5, return_std: bool = True)
Get 1-Wasserstein distances between particle features of jets1 and jets2.
- Parameters
jets1 (Tensor | np.ndarray) – Tensor or array of jets, of shape [num_jets, num_particles_per_jet, num_features_per_particle].
jets2 (Tensor | np.ndarray) – Tensor or array of jets, in the same format as jets1.
mask1 (Tensor | np.ndarray) – Optional tensor or array of binary particle masks, of shape [num_jets, num_particles_per_jet] or [num_jets, num_particles_per_jet, 1]. If given, 0-masked particles will be excluded from the W1 calculation.
mask2 (Tensor | np.ndarray) – Optional tensor or array, in the same format as mask1.
exclude_zeros (bool) – Ignore zero-padded particles, i.e. those whose feature norms are exactly 0. Defaults to True.
num_particle_features (int) – Will return W1 scores of the first num_particle_features particle features. If 0, will calculate for all.
num_eval_samples (int) – Number of jets out of the total to use for W1 measurement. Defaults to 50,000.
num_batches (int) – Number of different batches to average W1 scores over. Defaults to 5.
return_std (bool) – Also return the standard deviation of the W1 scores over the num_batches batches. Defaults to True.
- Returns
Union[float, np.ndarray]: array of length num_particle_features containing the average W1 score for each feature.
Union[float, np.ndarray] (optional, only if return_std is True): array of length num_particle_features containing the standard deviation of the W1 scores for each feature.
- Return type
Tuple[Union[float, np.ndarray], Union[float, np.ndarray]]
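The masking options can be illustrated with a small NumPy sketch: particles are pooled across all jets, masked or zero-padded entries are dropped, and a 1D Wasserstein distance is computed for each feature. `w1_1d`, `select_particles` and `particle_feature_w1` are hypothetical helpers. The sketch omits the batching over num_eval_samples and num_batches, and the library's handling of masks may differ in detail.

```python
import numpy as np


def w1_1d(u: np.ndarray, v: np.ndarray) -> float:
    """1-Wasserstein distance between two 1D samples of possibly different sizes."""
    u, v = np.sort(u), np.sort(v)
    all_vals = np.sort(np.concatenate([u, v]))
    deltas = np.diff(all_vals)
    # Empirical CDFs of both samples, evaluated on the merged grid of values.
    u_cdf = np.searchsorted(u, all_vals[:-1], side="right") / u.size
    v_cdf = np.searchsorted(v, all_vals[:-1], side="right") / v.size
    # W1 is the area between the two CDFs.
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))


def select_particles(jets, mask=None, exclude_zeros=True):
    """Flatten jets to [num_particles_total, num_features], dropping masked or padded rows."""
    parts = jets.reshape(-1, jets.shape[-1])
    keep = np.ones(len(parts), dtype=bool)
    if mask is not None:
        # Accepts masks of shape [num_jets, n] or [num_jets, n, 1].
        keep &= np.asarray(mask).reshape(-1).astype(bool)
    if exclude_zeros:
        keep &= np.linalg.norm(parts, axis=1) != 0
    return parts[keep]


def particle_feature_w1(jets1, jets2, mask1=None, mask2=None, exclude_zeros=True):
    """W1 distance per particle feature between the pooled particles of two jet samples."""
    p1 = select_particles(jets1, mask1, exclude_zeros)
    p2 = select_particles(jets2, mask2, exclude_zeros)
    return np.array([w1_1d(p1[:, i], p2[:, i]) for i in range(p1.shape[1])])


# One jet per sample, each with one real particle and one zero-padded particle.
jets1 = np.array([[[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]])
jets2 = np.array([[[1.0, 0.0, 2.0], [0.0, 0.0, 0.0]]])
scores = particle_feature_w1(jets1, jets2)
```

With the padded particles excluded, each sample reduces to a single particle, so the per-feature scores are simply the absolute feature differences.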