Function and class documentation

class pyidr.IDR(mean: float = 2.6, sigma: float = 1.3, rho: float = 0.8, prob: float = 0.7, eps: float = 0.001, maxiter: int = 30)

Compute the irreproducible discovery rate.

Compute the irreproducible discovery rate (idr for short) for two datasets.

mean

Mean value for the estimated Gaussian.

Type: float
sigma

Sigma value for the estimated Gaussian.

Type: float
rho

Correlation coefficient.

Type: float
prob

Mixing proportion of the reproducible component.

Type: float
eps

Small epsilon used in calculations.

Type: float
fit(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → pyidr._base.IDR

Fit the parameters to the two datasets using copula mixture models.

Parameters:
  • dataset1 (np.ndarray) – The first dataset.
  • dataset2 (np.ndarray) – The second dataset.
Returns:

The fitted IDR object (self).

Return type:

IDR

Raises:

AssertionError – Raised if the two datasets have unequal length.
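
Copula models work on the marginal ranks of the data rather than on the raw values. The sketch below illustrates only that idea: it converts each dataset to rescaled empirical-CDF values strictly inside (0, 1). The helper name rescaled_ecdf and the n / (n + 1) rescaling are assumptions made for illustration, not documented pyidr internals, and plain lists stand in for NumPy arrays.

```python
from bisect import bisect_right


def rescaled_ecdf(values):
    """Empirical CDF of each value, shrunk by n / (n + 1) so it stays below 1."""
    n = len(values)
    ordered = sorted(values)
    # bisect_right counts how many observations are <= v.
    return [bisect_right(ordered, v) / (n + 1) for v in values]


dataset1 = [0.3, 2.5, 1.1, 4.0]
dataset2 = [0.1, 3.0, 0.9, 3.5]

# Both replicates must describe the same observations, hence equal length.
assert len(dataset1) == len(dataset2)

u1 = rescaled_ecdf(dataset1)
u2 = rescaled_ecdf(dataset2)
```

Only the ordering within each dataset matters here, so two replicates measured on different scales can still be compared.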

fit_predict(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → numpy.ndarray

Call fit and then predict. Equivalent to calling ‘fit’ followed by ‘predict’ on the same datasets.

Parameters:
  • dataset1 (np.ndarray) – The first dataset.
  • dataset2 (np.ndarray) – The second dataset.
Returns:

The local idr for each observation, i.e. the estimated conditional probability that the observation belongs to the irreproducible component.

Return type:

np.ndarray

Raises:

AssertionError – Raised if the two datasets have unequal length.

static get_correspondence(dataset1: numpy.ndarray, dataset2: numpy.ndarray, right_percent: numpy.ndarray) → pyidr._base.CorrespondenceProfile

Compute the correspondence profile.

Ranks both datasets and computes the correspondence profile.

Parameters:
  • dataset1 (np.ndarray) – The first dataset.
  • dataset2 (np.ndarray) – The second dataset.
  • right_percent (np.ndarray) – The right-tail percentage. A numeric vector of values between 0 and 1 in ascending order.
Returns:

The correspondence profile.

Return type:

CorrespondenceProfile
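
As a rough illustration of what a correspondence profile measures, the sketch below ranks both datasets and, for each right-tail percentage t, reports the fraction of observations that fall in the top t of both rankings. It is a simplified stand-in under stated assumptions: rank_descending and correspondence_profile are hypothetical helpers, ties are broken by input order, and the result is a plain list rather than a CorrespondenceProfile object.

```python
def rank_descending(values):
    """Rank observations so the largest value gets rank 1 (ties keep input order)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def correspondence_profile(dataset1, dataset2, right_percent):
    """Fraction of observations in the top t of both rankings, for each t."""
    n = len(dataset1)
    ranks1 = rank_descending(dataset1)
    ranks2 = rank_descending(dataset2)
    psi = []
    for t in right_percent:
        # Number of top-ranked observations considered at this percentage;
        # the small offset guards against floating-point round-off in t * n.
        top_k = int(t * n + 1e-9)
        shared = sum(
            1 for r1, r2 in zip(ranks1, ranks2) if r1 <= top_k and r2 <= top_k
        )
        psi.append(shared / n)
    return psi


dataset1 = [9.0, 7.5, 6.0, 2.0, 1.0]
dataset2 = [8.0, 9.5, 1.5, 3.0, 0.5]
right_percent = [0.2, 0.4, 0.6, 0.8, 1.0]
psi = correspondence_profile(dataset1, dataset2, right_percent)
```

For perfectly corresponding rankings this fraction equals t at every percentage; values below t indicate that the two datasets disagree about which observations rank highest.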

get_params() → Dict[str, float]

Get the parameters of the model.

Returns: All parameters.
Return type: Dict[str, float]
predict(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → numpy.ndarray

Predict the local idr for the two datasets.

Parameters:
  • dataset1 (np.ndarray) – The first dataset.
  • dataset2 (np.ndarray) – The second dataset.
Returns:

The local idr for each observation, i.e. the estimated conditional probability that the observation belongs to the irreproducible component.

Return type:

np.ndarray

Raises:

AssertionError – Raised if the two datasets have unequal length.
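
To make the meaning of the local idr concrete, the sketch below evaluates it for a single pair of latent scores under a two-component Gaussian mixture. It assumes the usual formulation of the copula mixture model: the reproducible component is a bivariate Gaussian with common mean, common sigma and correlation rho, and the irreproducible component is a pair of independent standard normals. The functions bivariate_normal_pdf and local_idr are hypothetical helpers, and the inputs are latent scores, not the raw datasets that predict accepts.

```python
import math


def bivariate_normal_pdf(z1, z2, mean, sigma, rho):
    """Density of a bivariate normal with equal means, equal sigmas, correlation rho."""
    a = (z1 - mean) / sigma
    b = (z2 - mean) / sigma
    norm = 2.0 * math.pi * sigma ** 2 * math.sqrt(1.0 - rho ** 2)
    exponent = -(a * a - 2.0 * rho * a * b + b * b) / (2.0 * (1.0 - rho ** 2))
    return math.exp(exponent) / norm


def local_idr(z1, z2, mean=2.6, sigma=1.3, rho=0.8, prob=0.7):
    """Posterior probability that (z1, z2) comes from the irreproducible component."""
    reproducible = prob * bivariate_normal_pdf(z1, z2, mean, sigma, rho)
    # Assumed irreproducible component: independent standard normals.
    irreproducible = (1.0 - prob) * bivariate_normal_pdf(z1, z2, 0.0, 1.0, 0.0)
    return irreproducible / (reproducible + irreproducible)


strong_pair = local_idr(3.0, 3.0)  # both scores high and in agreement
weak_pair = local_idr(0.0, 0.0)    # both scores near the noise level
```

With the default parameter values from the class signature, a pair of high, concordant scores receives a local idr close to 0, while a pair near zero is attributed mostly to the irreproducible component.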

predict_global(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → numpy.ndarray

Compute the expected idr for all observations.

Parameters:
  • dataset1 (np.ndarray) – The first dataset.
  • dataset2 (np.ndarray) – The second dataset.
Returns:

The expected irreproducible discovery rate for observations that are as irreproducible or more irreproducible than the given observations.

Return type:

np.ndarray

Raises:

AssertionError – Raised if the two datasets have unequal length.
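
An expected idr is typically an average of local idr values over a set of observations. The sketch below assumes the convention of the original IDR method, in which the set for each observation contains every observation whose local idr is no larger than its own. Whether predict_global uses exactly this convention, and how it treats ties, is an assumption; expected_idr is a hypothetical helper operating on a plain list.

```python
def expected_idr(local_idr):
    """Running mean of the sorted local idr values, mapped back to input order."""
    order = sorted(range(len(local_idr)), key=lambda i: local_idr[i])
    result = [0.0] * len(local_idr)
    running_sum = 0.0
    for count, index in enumerate(order, start=1):
        running_sum += local_idr[index]
        # Mean local idr over this observation and all better-ranked ones.
        # Tied values are not treated specially in this sketch.
        result[index] = running_sum / count
    return result


local = [0.02, 0.60, 0.10, 0.90]
global_idr = expected_idr(local)
```

Under this convention the expected idr never exceeds the local idr of the same observation, which is why thresholding on it selects a larger set than thresholding on the local idr at the same level.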

proportion

Return the proportion of the reproducible component for a fitted model.

Returns: The proportion of the reproducible component.
Return type: float
set_params(**params) → pyidr._base.IDR

Set the parameters of the model.

Allowed parameters are: ‘prob’, ‘rho’, ‘mean’, ‘sigma’, ‘eps’ and ‘maxiter’.

Returns: Updated IDR object (self).
Return type: IDR
class pyidr.CorrespondenceProfile(ranking_1: numpy.ndarray, ranking_2: numpy.ndarray, right_percent: numpy.ndarray)

Compute the correspondence profile.

Compute the correspondence profile for two ranked datasets with a given right-tail probability.

get_dpsi(scale: bool = False) → Dict[str, Union[Dict[str, numpy.ndarray], numpy.ndarray, float]]

Compute the derivative of the correspondence profile.

Derivative of the correspondence profile, expressed as a percentage or, when ‘scale=True’, as a number of observations.

Parameters:
  • scale (bool) – Scale by the total number of samples. Defaults to ‘False’.
Returns:
  • t – Upper percentage for dpsi.
  • value – dpsi.
  • smoothed_line – Smoothing spline.
  • jump_point – The index into t at which dpsi(t[jump_point]) jumps up because of ties among the low values. This only happens when the data contain a large number of discrete values, e.g. values imputed for observations that appear in only one replicate.
Return type: Dict[str, Union[Dict[str, np.ndarray], np.ndarray, float]]
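
The derivative describes how quickly the correspondence profile grows as more top-ranked observations are included. The sketch below approximates it with finite differences between consecutive t values. This is a crude stand-in for illustration only: the documented output also carries a smoothing spline, which this sketch does not reproduce, and finite_difference is a hypothetical helper.

```python
def finite_difference(t, psi):
    """Slope of psi between consecutive t values (one value per interval)."""
    return [
        (psi[k + 1] - psi[k]) / (t[k + 1] - t[k]) for k in range(len(t) - 1)
    ]


t = [0.2, 0.4, 0.6, 0.8, 1.0]
# Perfect correspondence: the shared fraction equals t at every percentage.
psi_perfect = [0.2, 0.4, 0.6, 0.8, 1.0]
dpsi_perfect = finite_difference(t, psi_perfect)
```

For perfectly corresponding rankings the slope stays at 1 across the whole range; a slope that drops below 1 marks the region where the two rankings stop agreeing.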
get_psi(scale: bool = False) → Dict[str, Union[Dict[str, numpy.ndarray], numpy.ndarray, float]]

Compute the correspondence profile.

Correspondence profile, expressed as a percentage or, when ‘scale=True’, as a number of observations.

Parameters:
  • scale (bool) – Scale by the total number of samples. Defaults to ‘False’.
Returns:
  • t – Upper percentage for psi, or the number of top-ranked observations.
  • value – psi.
  • smoothed_line – Smoothing spline.
  • jump_point – The index into t at which psi(t[jump_point]) jumps up because of ties among the low values. This only happens when the data contain a large number of discrete values, e.g. values imputed for observations that appear in only one replicate.
Return type: Dict[str, Union[Dict[str, np.ndarray], np.ndarray, float]]
plot_diagnostics() → Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes]

Plot Psi and its derivative Psi' against the right-tail percentage supplied by the user.

Returns: The produced figure and its axes.
Return type: Tuple[matplotlib.figure.Figure, matplotlib.axes.Axes]