Models

Mlp Classification and Regression

We have implemented an MLP classifier and an MLP regressor with an interface similar to the one in scikit-learn. Because our implementation is based on PyTorch, it can run on GPUs, which generally makes it much faster than the scikit-learn implementation. Moreover, we have reimplemented the method described in Unsupervised Label Noise Modeling and Loss Correction for classification with noisy labels. The use case for MlpRegressor and MlpClassifier is to check whether the extracted features can be used to recapitulate biological annotations. For example, can the features be used to classify the tissue condition (“wild-type” vs “disease”)? Or can the features be used to regress the Moran’s I score?

class BaseEstimator(hidden_dims: List[int] = None, hidden_activation: str = 'relu', batch_size: int = 256, solver: str = 'adam', alpha: float = 0.99, momentum: float = 0.9, betas: Tuple[float, float] = (0.9, 0.999), warm_up_epochs: int = 0, warm_down_epochs: int = 0, max_epochs: int = 200, min_learning_rate: float = 0.0001, max_learning_rate: float = 0.001, min_weight_decay: float = 0.0001, max_weight_decay: float = 0.0001, **kargs)

Abstract base class that implements an interface similar to the MLP classifier/regressor in scikit-learn. The classes MlpRegressor and MlpClassifier inherit from this class.

class MlpRegressor(output_activation: torch.nn.Module = torch.nn.Identity, **kargs)

Bases: BaseEstimator

MLP regressor with an interface similar to scikit-learn but able to run on GPUs.

fit(X, y) → None

Fit the model.

Parameters:
  • X – independent variable of shape \((n, *)\)

  • y – dependent variable of shape \((n)\)

property is_classifier

Returns False. Provided for compatibility with the scikit-learn interface.

property is_regressor

Returns True. Provided for compatibility with the scikit-learn interface.

predict(X) → numpy.ndarray

Run the model forward to obtain the predictions, i.e. \(y_\text{pred} = \text{model}(X)\).

Parameters:

X – independent variable of shape \((n, *)\)

Returns:

y – the predicted values of shape \((n)\)

score(X, y) → float

Compute the predictions, i.e. \(y_\text{pred} = \text{model}(X)\), and score them against the true values y.

Parameters:
  • X – independent variable of shape \((n, *)\)

  • y – dependent variable of shape \((n)\)

Returns:

score – \(R^2\) (coefficient of determination) between \(y_\text{pred}\) and y.
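As a point of reference, the coefficient of determination can be computed by hand as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below uses NumPy only; the function name coefficient_of_determination is hypothetical and is not part of the library, and the library's own score method may differ in details such as how constant targets are handled.

```python
import numpy as np

def coefficient_of_determination(y_true, y_pred):
    """Textbook R^2 = 1 - SS_res / SS_tot (hypothetical helper, not the library API)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
r2 = coefficient_of_determination(y_true, y_pred)    # about 0.949
```

A perfect prediction gives a score of 1, while always predicting the mean of y gives a score of 0.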

class MlpClassifier(noisy_labels: bool = False, bootstrap_epoch_start: int = 100, lambda_reg: float = 1.0, hard_bootstrapping: bool = False, **kargs)

Bases: BaseEstimator

MLP classifier with an interface similar to scikit-learn but able to run on GPUs.

It can perform classification with noisy labels following the method described in Unsupervised Label Noise Modeling and Loss Correction. In this method, the labels are dynamically corrected with the formula:

\(l_\text{new} = (1.0-w) \times l_\text{old} + w \times p_\text{net}\)

where \(l_\text{old}\) are the noisy (one-hot) original labels, \(p_\text{net}\) are the probabilities computed by the neural network, and \(w\) is the probability that the label is incorrect. \(w\) is computed by solving the assignment problem for a 2-component mixture model. This is based on the idea that correct (incorrect) labels lead to small (large) losses. Therefore, correct labels belong to the low-loss component and incorrect labels belong to the high-loss component.
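The correction rule can be illustrated with a small NumPy sketch. Two things here are assumptions rather than the library's implementation: the helper names high_loss_posterior and correct_labels are hypothetical, and a 1D Gaussian mixture fitted by EM is swapped in as the 2-component mixture purely for brevity, since the text above does not specify the mixture family.

```python
import numpy as np

def high_loss_posterior(losses, n_iter=50):
    """Fit a 2-component 1D Gaussian mixture to per-sample losses with EM and
    return w, the posterior probability of the high-mean (high-loss) component.
    Hypothetical stand-in for the mixture model used by the library."""
    losses = np.asarray(losses, dtype=float)
    mu = np.array([losses.min(), losses.max()])    # deterministic initialization
    var = np.full(2, losses.var() + 1e-6)
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space for stability
        log_pdf = -0.5 * (losses[:, None] - mu) ** 2 / var - 0.5 * np.log(2 * np.pi * var)
        log_r = np.log(pi) + log_pdf
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means and variances
        nk = r.sum(axis=0) + 1e-12
        mu = (r * losses[:, None]).sum(axis=0) / nk
        var = (r * (losses[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / nk.sum()
    return r[:, np.argmax(mu)]

def correct_labels(l_old, p_net, w):
    """Apply l_new = (1 - w) * l_old + w * p_net row by row."""
    w = np.asarray(w, dtype=float)[:, None]
    return (1.0 - w) * l_old + w * p_net

# Toy data: samples 3 and 4 have large losses, so their labels are suspect.
losses = np.array([0.05, 0.10, 0.08, 2.50, 3.00, 0.12])
l_old = np.eye(3)[[0, 1, 2, 0, 1, 2]]              # noisy one-hot labels
p_net = np.tile([0.1, 0.2, 0.7], (6, 1))           # made-up network probabilities
w = high_loss_posterior(losses)
l_new = correct_labels(l_old, p_net, w)
```

Samples in the low-loss component keep labels close to the original one-hot vectors, while samples in the high-loss component are pulled towards the network's prediction. Each corrected label remains a valid probability vector because it is a convex combination of two probability vectors.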

fit(X, y)

Fit the model.

Parameters:
  • X – independent variable of shape \((n, *)\)

  • y – dependent variable of shape \((n)\)

property is_classifier

Returns True. Provided for compatibility with the scikit-learn interface.

property is_regressor

Returns False. Provided for compatibility with the scikit-learn interface.

predict(X) → numpy.ndarray

Run the model forward to obtain the predictions, i.e. \(y_\text{pred} = \text{model}(X)\).

Parameters:

X – independent variable of shape \((n, *)\)

Returns:

y – the predicted values of shape \((n)\)

predict_log_proba(X) → numpy.ndarray

Compute the log-probabilities for all the classes.

Parameters:

X – independent variable of shape \((n, *)\)

Returns:

log_p – Log-probability of all the classes, of shape \((n, C)\) where C is the number of classes.

predict_proba(X) → numpy.ndarray

Compute the probabilities for all the classes.

Parameters:

X – independent variable of shape \((n, *)\)

Returns:

prob – Probability of all the classes, of shape \((n, C)\) where C is the number of classes.

score(X, y) → float
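The relationship between the outputs of predict_log_proba, predict_proba and predict can be sketched as follows, assuming the network's last layer produces unnormalized logits (an assumption; this page does not say how the probabilities are produced). The helper log_softmax is written here in NumPy for illustration and is not the library's code.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the class axis (illustrative helper)."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max(axis=1, keepdims=True)   # avoid overflow in exp
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

# Made-up logits for n=2 samples and C=3 classes.
logits = np.array([[2.0, 0.5, -1.0],
                   [0.0, 0.0, 3.0]])
log_p = log_softmax(logits)      # shape (n, C), analogous to predict_log_proba
prob = np.exp(log_p)             # shape (n, C), analogous to predict_proba
y_pred = prob.argmax(axis=1)     # shape (n,), most probable class per sample
```

Each row of prob sums to 1, and taking the argmax of either prob or log_p yields the same predicted class.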

Compute the predictions, i.e. \(y_\text{pred} = \text{model}(X)\), and score them against the true values y.

Parameters:
  • X – independent variable of shape \((n, *)\)

  • y – dependent variable of shape \((n)\)

Returns:

accuracy – Accuracy classification score
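For reference, the accuracy classification score is the fraction of samples whose predicted label equals the true label. A minimal NumPy sketch follows; the function name accuracy is hypothetical.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of exact label matches (illustrative helper, not the library API)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

acc = accuracy([0, 1, 1, 2], [0, 1, 0, 2])   # 3 of 4 labels match, so 0.75
```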

Self Supervised Models

We have implemented multiple self-supervised learning (SSL) models. All of these models ingest image patches. The data augmentation strategy and the loss function depend on the SSL framework chosen. After training, these models can be used to compute features for new image patches. All SSL models inherit from the base class tissuemosaic.models.ssl_models.SslModelBase, which is responsible for the validation (which is common to all SSL models) and logging.

Patch Analyzers

We have implemented two classes, tissuemosaic.models.patch_analyzer.patch_analyzer.Composition and tissuemosaic.models.patch_analyzer.patch_analyzer.SpatialAutocorrelation, which can be used to extract annotations from image patches. Together with the other models described in Self Supervised Models and Mlp Classification and Regression, these make it possible to answer interesting questions such as: “Can the patch embedding be used to predict the cellular composition of a patch?” or “Can the patch embedding be used to predict the Moran’s I score of a patch?”.

class Composition(return_fraction: bool = True)

Counts the number of elements in every channel and returns either their raw values or their normalized frequencies.

__call__(data: torch.Tensor | torch.sparse.Tensor | SparseImage | List[torch.sparse.Tensor] | List[SparseImage], windows: Tuple[float, float, float, float] = None) → torch.Tensor

Count the intensity for each channel in a 2D window.

Parameters:
  • data – torch.Tensor or torch.sparse.Tensor or SparseImage (or list thereof) corresponding to spatial data of shape \((C, W, H)\)

  • windows – tuple with (min_row, min_col, max_row, max_col). If None (default) the entire image is considered.

Returns:

composition – A vector of size C with the count for each channel (or list thereof).
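The per-channel counting can be sketched on a dense array as follows. This is a NumPy illustration under stated assumptions, not the library's implementation: the function name channel_composition is hypothetical, the window is treated as half-open (max_row and max_col excluded), and the window bounds are cast to integers for slicing.

```python
import numpy as np

def channel_composition(image, window=None, return_fraction=True):
    """Sum the intensity of each channel of a (C, W, H) array inside a window.

    window is (min_row, min_col, max_row, max_col); None means the whole image.
    Hypothetical helper: the half-open window convention is an assumption.
    """
    image = np.asarray(image, dtype=float)
    if window is not None:
        min_row, min_col, max_row, max_col = (int(v) for v in window)
        image = image[:, min_row:max_row, min_col:max_col]
    counts = image.sum(axis=(1, 2))                  # one value per channel
    if return_fraction:
        total = counts.sum()
        return counts / total if total > 0 else counts
    return counts

# Toy image with C=2 channels on a 4x4 grid.
image = np.zeros((2, 4, 4))
image[0, :2, :2] = 1.0    # channel 0: four elements in the top-left corner
image[1] = 1.0            # channel 1: one element at every location
raw = channel_composition(image, return_fraction=False)       # [4, 16]
frac = channel_composition(image)                             # [0.2, 0.8]
frac_window = channel_composition(image, window=(0, 0, 2, 2)) # [0.5, 0.5]
```

Restricting the window to the top-left corner changes the composition because both channels contribute four elements there.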

class SpatialAutocorrelation(modality: str = 'moran', n_neighbours: int | None = None, radius: float | None = None, neigh_correct: bool = True)

Compute the Moran’s I or Geary’s C score of a sparse torch tensor. If the sparse tensor has C channels, it produces C scores, each indicating how each channel is dispersed in all the others.

Note

The results of Moran’s I and Geary’s C tests depend on the choice of the weight matrix (see Moran and Geary for details). The input parameters determine the strategy used for the construction of the weight matrix.

__call__(data: SparseImage | List[SparseImage] | torch.sparse.Tensor | List[torch.sparse.Tensor])

Parameters:

data – A (list of) sparse tensor with C channels

Returns:

score – A (list of) torch.Tensor of size C with the score (either Moran’s I or Geary’s C) indicating how each channel is dispersed in all the others.
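To make the dependence on the weight matrix concrete, the sketch below computes the textbook Moran’s I and Geary’s C for a single set of values with a binary, radius-based weight matrix. It is a NumPy illustration, not the library's implementation: the function names are hypothetical, and the library's handling of multiple channels, n_neighbours and neigh_correct is not reproduced here.

```python
import numpy as np

def binary_radius_weights(coords, radius):
    """w_ij = 1 if points i and j are within `radius` of each other (i != j), else 0."""
    coords = np.asarray(coords, dtype=float)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    w = (dist <= radius).astype(float)
    np.fill_diagonal(w, 0.0)      # a point is not its own neighbour
    return w

def morans_i(x, w):
    """Moran's I = (N / W) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = x - mean(x)."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    return float((x.size / w.sum()) * (z @ w @ z) / (z @ z))

def gearys_c(x, w):
    """Geary's C = ((N - 1) / (2 W)) * sum_ij w_ij (x_i - x_j)^2 / sum_i z_i^2."""
    x = np.asarray(x, dtype=float)
    z = x - x.mean()
    diff2 = (x[:, None] - x[None, :]) ** 2
    return float(((x.size - 1) / (2.0 * w.sum())) * (w * diff2).sum() / (z @ z))

# 4x4 grid of locations; a radius of 1.1 links each point to its rook neighbours.
rows, cols = np.divmod(np.arange(16), 4)
coords = np.stack([rows, cols], axis=1)
w = binary_radius_weights(coords, radius=1.1)

checker = ((rows + cols) % 2).astype(float)   # alternating pattern
halves = (cols >= 2).astype(float)            # left half 0, right half 1

i_checker = morans_i(checker, w)   # -1.0: perfectly dispersed
i_halves = morans_i(halves, w)     # about 0.667: spatially clustered
c_checker = gearys_c(checker, w)   # 1.875: above 1 indicates negative autocorrelation
```

Changing the radius (for example to 1.5, which also links diagonal neighbours) changes both scores for the same data, which is why the weight-matrix parameters should be reported alongside the results.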