Function and class documentation

class pyidr.IDR(mean: float = 2.6, sigma: float = 1.3, rho: float = 0.8, prob: float = 0.7, eps: float = 0.001, maxiter: int = 30)

Compute the irreproducible discovery rate.

Compute the irreproducible discovery rate (idr for short) for two datasets.

mean

Mean of the estimated Gaussian.

Type:	float

sigma

Standard deviation (sigma) of the estimated Gaussian.

Type:	float

rho

Correlation coefficient.

Type:	float

prob

Mixing proportion of the reproducible component.

Type:	float

eps

Small epsilon used in calculations.

Type:	float

fit(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → pyidr._base.IDR

Fit the parameters to the two datasets using a copula mixture model.

Parameters:	dataset1 (np.ndarray) – The first dataset.
	dataset2 (np.ndarray) – The second dataset.
Returns:	The fitted IDR object (self).
Return type:	IDR
Raises:	`AssertionError` – If the two datasets have unequal length.

fit_predict(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → numpy.ndarray

Fit the model and predict in one step. Same as calling ‘fit’ first and then ‘predict’.

Parameters:	dataset1 (np.ndarray) – The first dataset.
	dataset2 (np.ndarray) – The second dataset.
Returns:	The local idr for each observation, i.e. the estimated conditional probability that the observation belongs to the irreproducible component.
Return type:	np.ndarray
Raises:	`AssertionError` – If the two datasets have unequal length.

static get_correspondence(dataset1: numpy.ndarray, dataset2: numpy.ndarray, right_percent: numpy.ndarray) → pyidr._base.CorrespondenceProfile

Compute the correspondence profile.

Ranks both datasets and computes the correspondence profile.

Parameters:	dataset1 (np.ndarray) – The first dataset.
	dataset2 (np.ndarray) – The second dataset.
	right_percent (np.ndarray) – The right-tail percentage. A numeric vector of values between 0 and 1 in ascending order.
Returns:	The correspondence profile.
Return type:	CorrespondenceProfile
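
As a rough illustration of what a correspondence profile measures, the sketch below counts, for each right-tail fraction t, the share of observations that fall in the top t of both rankings. It is plain Python written for this page: the helper name `correspondence_profile`, the rounding of t * n, and the tie handling are assumptions, not pyidr's code, and the library may differ in all of these details.

```python
import math


def correspondence_profile(dataset1, dataset2, right_percent):
    """Toy correspondence profile (hypothetical helper, not pyidr's code)."""
    if len(dataset1) != len(dataset2):
        raise ValueError("datasets must have equal length")
    n = len(dataset1)

    # Indices ordered from largest to smallest value; ties keep input order.
    order1 = sorted(range(n), key=lambda i: -dataset1[i])
    order2 = sorted(range(n), key=lambda i: -dataset2[i])

    psi = []
    for t in right_percent:
        # Number of top-ranked observations covered by the fraction t.
        k = math.floor(t * n + 1e-9)
        in_both = set(order1[:k]) & set(order2[:k])
        psi.append(len(in_both) / n)
    return psi


# Identical rankings: every top-t observation is shared, so psi(t) = t.
agree = correspondence_profile([5, 4, 3, 2, 1], [50, 40, 30, 20, 10], [0.2, 0.4, 1.0])

# Reversed rankings: nothing is shared until the whole list is included.
oppose = correspondence_profile([5, 4, 3, 2, 1], [1, 2, 3, 4, 5], [0.2, 0.4, 1.0])
```

In this toy version, rankings that agree give a profile close to the diagonal psi(t) = t, while rankings that disagree stay near zero until t approaches 1.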

get_params() → Dict[str, float]

Get the parameters of the model.

Returns:	All model parameters, keyed by name.
Return type:	Dict[str, float]

predict(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → numpy.ndarray

Predict the local idr for the two datasets.

Parameters:	dataset1 (np.ndarray) – The first dataset.
	dataset2 (np.ndarray) – The second dataset.
Returns:	The local idr for each observation, i.e. the estimated conditional probability that the observation belongs to the irreproducible component.
Return type:	np.ndarray
Raises:	`AssertionError` – If the two datasets have unequal length.
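
To make the phrase "conditional probability of belonging to the irreproducible component" concrete, here is a minimal sketch of the posterior in a two-component mixture. It assumes the usual formulation of the IDR copula mixture model: on a latent scale, the irreproducible component is a standard bivariate Gaussian with zero correlation, and the reproducible component is a bivariate Gaussian with a shared mean, shared sigma, and correlation rho. The function names are hypothetical, the inputs are taken to be on that latent scale already (the transformation from observed values is not shown), and pyidr's internals may differ.

```python
import math


def bivariate_normal_pdf(z1, z2, mean, sigma, rho):
    """Density of a bivariate Gaussian with equal means and sigmas."""
    a = (z1 - mean) / sigma
    b = (z2 - mean) / sigma
    quad = (a * a - 2.0 * rho * a * b + b * b) / (1.0 - rho * rho)
    norm = 2.0 * math.pi * sigma * sigma * math.sqrt(1.0 - rho * rho)
    return math.exp(-0.5 * quad) / norm


def local_idr(z1, z2, mean=2.6, sigma=1.3, rho=0.8, prob=0.7):
    """Posterior probability of the irreproducible component (toy sketch).

    Assumption: z1 and z2 are latent-scale values; prob is the mixing
    proportion of the reproducible component.
    """
    reproducible = prob * bivariate_normal_pdf(z1, z2, mean, sigma, rho)
    irreproducible = (1.0 - prob) * bivariate_normal_pdf(z1, z2, 0.0, 1.0, 0.0)
    return irreproducible / (reproducible + irreproducible)


high_signal = local_idr(3.0, 3.0)  # both replicates far from zero
low_signal = local_idr(0.0, 0.0)   # both replicates at the noise centre
```

With the default parameter values from the class signature, a pair near the reproducible mean gets a local idr close to 0, while a pair near the origin is attributed mostly to the irreproducible component.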

predict_global(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → numpy.ndarray

Compute the expected idr for all observations.

Parameters:	dataset1 (np.ndarray) – The first dataset.
	dataset2 (np.ndarray) – The second dataset.
Returns:	The expected irreproducible discovery rate for observations that are as irreproducible or more irreproducible than the given observations.
Return type:	np.ndarray
Raises:	`AssertionError` – If the two datasets have unequal length.
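
One common way to turn local idr values into an expected (global) rate is to average the local idr over every observation whose local idr is no larger than that of the observation in question. The sketch below assumes that convention and gives tied observations the same value. The helper name `global_idr` is hypothetical, and whether pyidr follows exactly this rule is not stated here.

```python
def global_idr(local_idr):
    """Running mean of sorted local idr values (toy sketch, assumed convention)."""
    n = len(local_idr)
    order = sorted(range(n), key=lambda i: local_idr[i])
    sorted_vals = [local_idr[i] for i in order]

    # Prefix sums so that each mean is a single division.
    prefix = []
    total = 0.0
    for value in sorted_vals:
        total += value
        prefix.append(total)

    result = [0.0] * n
    start = 0
    while start < n:
        # Extend the block over all observations tied with sorted_vals[start].
        end = start
        while end + 1 < n and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        block_mean = prefix[end] / (end + 1)
        for pos in range(start, end + 1):
            result[order[pos]] = block_mean
        start = end + 1
    return result


rates = global_idr([0.1, 0.5, 0.3])
```

Here the observation with local idr 0.1 keeps 0.1, the one with 0.3 gets the mean of 0.1 and 0.3, and the one with 0.5 gets the mean of all three values. Under this convention the global value never exceeds the local one.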

proportion

Return the proportion of the reproducible component for a fitted model.

Returns:	The proportion of the reproducible component.
Return type:	float

set_params(**params) → pyidr._base.IDR

Set the parameters of the model.

Allowed parameters are: ‘prob’, ‘rho’, ‘mean’, ‘sigma’, ‘eps’ and ‘maxiter’.

Returns:	Updated IDR object (self).
Return type:	IDR
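
The get_params / set_params pair follows a familiar estimator pattern: parameters come back as a dictionary, and the setter returns the object itself so calls can be chained. The stand-in class below is hypothetical and only illustrates that pattern with the parameter names listed above. Raising `ValueError` for an unknown name is an assumption, since how pyidr reacts to unknown names is not documented here.

```python
class ToyModel:
    """Hypothetical stand-in used only to illustrate the get/set pattern."""

    _allowed = ("prob", "rho", "mean", "sigma", "eps", "maxiter")

    def __init__(self, mean=2.6, sigma=1.3, rho=0.8, prob=0.7, eps=0.001, maxiter=30):
        self.mean = mean
        self.sigma = sigma
        self.rho = rho
        self.prob = prob
        self.eps = eps
        self.maxiter = maxiter

    def get_params(self):
        # Return a fresh dictionary so callers cannot mutate the model by accident.
        return {name: getattr(self, name) for name in self._allowed}

    def set_params(self, **params):
        for name, value in params.items():
            if name not in self._allowed:
                # Assumed behaviour for this sketch only.
                raise ValueError(f"unknown parameter: {name!r}")
            setattr(self, name, value)
        return self  # returning self allows chained calls


model = ToyModel().set_params(rho=0.5, maxiter=50)
params = model.get_params()
```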

class pyidr.CorrespondenceProfile(ranking_1: numpy.ndarray, ranking_2: numpy.ndarray, right_percent: numpy.ndarray)

Compute the correspondence profile.

Compute the correspondence profile for two ranked datasets with a given right-tail probability.

get_dpsi(scale: bool = False) → Dict[str, Union[Dict[str, numpy.ndarray], numpy.ndarray, float]]

Compute the derivative of the correspondence profile.

Returns the derivative of the correspondence profile as a percentage, or as a number of observations when ‘scale=True’.

Parameters:	scale (bool) – Whether to scale by the total number of samples. Defaults to ‘False’.
Returns:	A dictionary with the following entries:
	t – upper percentage for dpsi.
	value – dpsi.
	smoothed_line – smoothing spline.
	jump_point – the index into t at which dpsi(t[jump_point]) jumps up due to ties at the low values. This only happens when the data contain a large number of discrete values, e.g. values imputed for observations appearing in only one replicate.
Return type:	Dict[str, Union[Dict[str, np.ndarray], np.ndarray, float]]
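
For intuition about what the derivative of a correspondence profile looks like, the sketch below uses plain finite differences between neighbouring points. This is a deliberate simplification: the entries above mention a smoothing spline, which is not reproduced here, and the helper name `finite_difference` is hypothetical.

```python
def finite_difference(t, psi):
    """Slope of psi between consecutive values of t (toy sketch, no smoothing)."""
    if len(t) != len(psi) or len(t) < 2:
        raise ValueError("t and psi need equal length and at least two points")
    slopes = []
    for i in range(1, len(t)):
        slopes.append((psi[i] - psi[i - 1]) / (t[i] - t[i - 1]))
    return slopes


t = [0.0, 0.25, 0.5, 0.75, 1.0]

# Perfectly matching rankings: psi(t) = t, so the slope is 1 everywhere.
perfect = finite_difference(t, t)

# Independent rankings: psi(t) = t * t, so the slope grows linearly with t.
independent = finite_difference(t, [x * x for x in t])
```

A slope that stays near 1 suggests strong agreement between the top-ranked observations, while a slope that starts near 0 and rises is what unrelated rankings would produce.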

get_psi(scale: bool = False) → Dict[str, Union[Dict[str, numpy.ndarray], numpy.ndarray, float]]

Compute the correspondence profile.

Returns the correspondence profile as a percentage, or as a number of observations when ‘scale=True’.

Parameters:	scale (bool) – Whether to scale by the total number of samples. Defaults to ‘False’.
Returns:	A dictionary with the following entries:
	t – upper percentage for psi, or number of top-ranked observations.
	value – psi.
	smoothed_line – smoothing spline.
	jump_point – the index into t at which psi(t[jump_point]) jumps up due to ties at the low values. This only happens when the data contain a large number of discrete values, e.g. values imputed for observations appearing in only one replicate.
Return type:	Dict[str, Union[Dict[str, np.ndarray], np.ndarray, float]]

plot_diagnostics() → Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes]

Plot Psi and Psi’ against the right-tail percentage given as user input.

Returns:	The produced figure and its axes.
Return type:	Tuple[matplotlib.figure.Figure, matplotlib.axes.Axes]