Function and class documentation
-
class
pyidr.IDR(mean: float = 2.6, sigma: float = 1.3, rho: float = 0.8, prob: float = 0.7, eps: float = 0.001, maxiter: int = 30)
Compute the irreproducible discovery rate (IDR) for two datasets.
-
mean
Mean value for the estimated Gaussian.
Type: float
-
sigma
Sigma value for the estimated Gaussian.
Type: float
-
rho
Correlation coefficient.
Type: float
-
prob
Mixing proportion of the reproducible component.
Type: float
-
eps
Small epsilon used in calculations.
Type: float
-
fit(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → pyidr._base.IDR
Fit the parameters to the two datasets using a copula mixture model.
Parameters: - dataset1 (np.ndarray) – The first dataset.
- dataset2 (np.ndarray) – The second dataset.
Returns: The fitted IDR object (self).
Return type: pyidr._base.IDR
Raises: AssertionError – If the two datasets have unequal length.
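Copula models work on ranks rather than raw values. The sketch below shows one common way to turn a dataset into rank-based pseudo-observations in (0, 1). The helper name `pseudo_observations` is hypothetical, and whether pyidr performs exactly this transform internally is an assumption.

```python
import numpy as np


def pseudo_observations(values):
    """Map values to rank-based pseudo-observations in the open interval (0, 1)."""
    values = np.asarray(values, dtype=float)
    # argsort of argsort yields ordinal ranks 0..n-1; shift to 1..n.
    ranks = np.argsort(np.argsort(values)) + 1
    # Dividing by n + 1 keeps every value strictly between 0 and 1.
    return ranks / (len(values) + 1)


u = pseudo_observations([10.0, 30.0, 20.0])
print(u)
```

Ties are broken by position here; a real implementation may average tied ranks instead.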
-
fit_predict(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → numpy.ndarray
Fit the model, then predict. Equivalent to calling ‘fit’ followed by ‘predict’ on the same datasets.
Parameters: - dataset1 (np.ndarray) – The first dataset.
- dataset2 (np.ndarray) – The second dataset.
Returns: The local idr for each observation (i.e. the estimated conditional probability that the observation belongs to the irreproducible component).
Return type: np.ndarray
Raises: AssertionError – If the two datasets have unequal length.
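A minimal usage sketch, assuming pyidr is installed and that the signatures documented here are accurate. The synthetic replicates (`rep1`, `rep2`) are made up for illustration: half of the observations share a common signal and half are independent noise. The import is guarded so the data preparation still runs without the package.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical data: a reproducible half (shared signal) and an irreproducible half (noise).
signal = rng.normal(2.5, 1.0, size=n // 2)
rep1 = np.concatenate([signal + rng.normal(0, 0.3, n // 2), rng.normal(0, 1, n // 2)])
rep2 = np.concatenate([signal + rng.normal(0, 0.3, n // 2), rng.normal(0, 1, n // 2)])

try:
    from pyidr import IDR
except ImportError:
    IDR = None  # pyidr is not available; skip the model calls below

if IDR is not None:
    model = IDR()  # default starting values from the constructor signature
    local_idr = model.fit_predict(rep1, rep2)  # one value per observation
    print(model.get_params())
    print(local_idr[:5])
```

Both arrays must have the same length, otherwise the documented AssertionError is raised.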
-
static
get_correspondence(dataset1: numpy.ndarray, dataset2: numpy.ndarray, right_percent: numpy.ndarray) → pyidr._base.CorrespondenceProfile
Compute the correspondence profile.
Ranks both datasets and computes the correspondence profile.
Parameters: - dataset1 (np.ndarray) – The first dataset.
- dataset2 (np.ndarray) – The second dataset.
- right_percent (np.ndarray) – The right-tail percentage. A numeric vector of values between 0 and 1, in ascending order.
Returns: The correspondence profile.
Return type: pyidr._base.CorrespondenceProfile
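To make the idea concrete, the sketch below computes a plain correspondence profile with NumPy only: for each right-tail fraction t, it reports the share of observations ranked in the top t of both datasets. The function `correspondence_profile` is a hypothetical illustration and is not pyidr's implementation, which also returns a smoothed curve and handles ties.

```python
import numpy as np


def correspondence_profile(x, y, right_percent):
    """Fraction of observations that fall in the top-t tail of both x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Rank 0 is the largest value, rank n - 1 the smallest.
    rank_x = np.argsort(np.argsort(-x))
    rank_y = np.argsort(np.argsort(-y))
    psi = []
    for t in right_percent:
        k = int(np.floor(n * t))  # number of observations in the top-t tail
        psi.append(np.mean((rank_x < k) & (rank_y < k)))
    return np.array(psi)


scores = np.arange(10.0)
t = np.array([0.2, 0.5, 1.0])
same = correspondence_profile(scores, scores, t)       # identical rankings
opposite = correspondence_profile(scores, -scores, t)  # reversed rankings
print(same, opposite)
```

With identical rankings the profile follows t itself. With reversed rankings it stays at zero until the tails overlap.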
-
get_params() → Dict[str, float]
Get the parameters of the model.
Returns: All parameters.
Return type: Dict[str, float]
-
predict(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → numpy.ndarray
Predict the local idr for the two datasets.
Parameters: - dataset1 (np.ndarray) – The first dataset.
- dataset2 (np.ndarray) – The second dataset.
Returns: The local idr for each observation (i.e. the estimated conditional probability that the observation belongs to the irreproducible component).
Return type: np.ndarray
Raises: AssertionError – If the two datasets have unequal length.
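Because the local idr is the probability of belonging to the irreproducible component, smaller values indicate more reproducible observations. A short sketch of selecting observations below a cutoff follows. The `local_idr` values and the 0.05 threshold are made up for illustration and are not recommendations from the library.

```python
import numpy as np

# Hypothetical local idr values, standing in for the output of predict().
local_idr = np.array([0.01, 0.40, 0.03, 0.90, 0.20])

threshold = 0.05  # illustrative cutoff only
reproducible = local_idr < threshold        # boolean mask, one entry per observation
selected = np.flatnonzero(reproducible)     # indices of the selected observations
print(selected)
```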
-
predict_global(dataset1: numpy.ndarray, dataset2: numpy.ndarray) → numpy.ndarray
Compute the expected idr for all observations.
Parameters: - dataset1 (np.ndarray) – The first dataset.
- dataset2 (np.ndarray) – The second dataset.
Returns: The expected irreproducible discovery rate for observations that are as irreproducible or more irreproducible than the given observations.
Return type: np.ndarray
Raises: AssertionError – If the two datasets have unequal length.
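By analogy with the false discovery rate, an expected rate of this kind is commonly derived from the local idr values as a running average: sort the local idr in ascending order, take the cumulative mean, and map the result back to the original order. The sketch below illustrates that construction. The helper `expected_idr` is hypothetical, and whether pyidr computes its result in exactly this way (including its treatment of ties) is an assumption.

```python
import numpy as np


def expected_idr(local_idr):
    """Running mean of the sorted local idr values, returned in the original order."""
    local_idr = np.asarray(local_idr, dtype=float)
    order = np.argsort(local_idr)               # ascending local idr
    sorted_idr = local_idr[order]
    running_mean = np.cumsum(sorted_idr) / np.arange(1, len(sorted_idr) + 1)
    result = np.empty_like(running_mean)
    result[order] = running_mean                # undo the sort
    return result


global_idr = expected_idr([0.5, 0.1, 0.3])
print(global_idr)
```

Each output value is the average local idr over the selection that ends at that observation, so it never exceeds the observation's own local idr.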
-
proportion
Return the proportion of the reproducible component for a fitted model.
Returns: The proportion of the reproducible component.
Return type: float
-
-
class
pyidr.CorrespondenceProfile(ranking_1: numpy.ndarray, ranking_2: numpy.ndarray, right_percent: numpy.ndarray)
Compute the correspondence profile for two ranked datasets at the given right-tail percentages.
-
get_dpsi(scale: bool = False) → Dict[str, Union[Dict[str, numpy.ndarray], numpy.ndarray, float]]
Compute the derivative of the correspondence profile.
Derivative of the correspondence profile as a percentage, or as a number of observations when ‘scale=True’.
Parameters: scale (bool) – Scale by the total number of samples. Defaults to ‘False’.
Returns: - t: upper percentage for dpsi
- value: dpsi
- smoothed_line: smoothing spline
- jump_point: the index into t at which dpsi(t[jump_point]) jumps up due to ties at the low values. This only happens when the data contain a large number of discrete values, e.g. values imputed for observations appearing in only one replicate.
Return type: Dict[str, Union[Dict[str, np.ndarray], np.ndarray, float]]
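As a point of reference for reading the derivative, two limiting cases are well known: under perfect correspondence psi(t) = t, so the derivative is 1, while under independence psi(t) = t², so the derivative is 2t. The sketch below checks the second case with a finite-difference derivative. It uses `np.gradient` on a hypothetical profile and is not how pyidr derives dpsi, which also involves a smoothing spline.

```python
import numpy as np

t = np.linspace(0.1, 1.0, 10)   # right-tail fractions
psi_independent = t ** 2        # hypothetical profile for independent rankings

# Second-order finite differences; exact for a quadratic at interior points.
dpsi = np.gradient(psi_independent, t)
print(dpsi)
```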
-
get_psi(scale: bool = False) → Dict[str, Union[Dict[str, numpy.ndarray], numpy.ndarray, float]]
Compute the correspondence profile.
Correspondence profile as a percentage, or as a number of observations when ‘scale=True’.
Parameters: scale (bool) – Scale by the total number of samples. Defaults to ‘False’.
Returns: - t: upper percentage for psi, or number of top-ranked observations
- value: psi
- smoothed_line: smoothing spline
- jump_point: the index into t at which psi(t[jump_point]) jumps up due to ties at the low values. This only happens when the data contain a large number of discrete values, e.g. values imputed for observations appearing in only one replicate.
Return type: Dict[str, Union[Dict[str, np.ndarray], np.ndarray, float]]
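A usage sketch for obtaining a profile and inspecting the returned dictionary, assuming pyidr is installed and behaves as documented here. The data and the `right_percent` grid are made up for illustration. The import is guarded so the setup runs without the package.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = x + rng.normal(scale=0.5, size=200)  # hypothetical, partially reproducible replicate

# Ascending values between 0 and 1, as the right_percent parameter requires.
right_percent = np.linspace(0.01, 0.99, 50)

try:
    from pyidr import IDR
except ImportError:
    IDR = None  # pyidr is not available; skip the calls below

if IDR is not None:
    profile = IDR.get_correspondence(x, y, right_percent)  # static method
    psi = profile.get_psi()
    print(sorted(psi))  # documented keys: jump_point, smoothed_line, t, value
```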
-
plot_diagnostics() → Tuple[matplotlib.figure.Figure, matplotlib.axes._axes.Axes]
Plot Psi and Psi’ against the right-tail percentage given as user input.
Returns: The produced figure and its axes.
Return type: Tuple[matplotlib.figure.Figure, matplotlib.axes.Axes]
-