In this Review, we discuss the current state of digital pathology, from data selection and management to feature selection, model training and validation, focusing on prostate cancer. We also consider some of the ethical and societal implications of the use of AI in prostate cancer care, and outline potential opportunities and future directions for research in this field. Last, we propose a checklist for evaluating computational pathology projects, from inception to post-market surveillance. Our hope is to facilitate meaningful research that will lead to practical applications of digital pathology in prostate cancer.
The development and implementation of AI tools in digital pathology follows a complex, multi-step process that requires careful attention at each stage. From initial specimen acquisition to final clinical use, successful implementation demands coordination between pathologists, data scientists and clinical teams. The workflow can be divided into two parallel streams: the physical processing of pathology specimens (Fig. 2), and the computational development of AI models (Fig. 3). As our focus is on applications of AI in prostate cancer, we describe only the basics of AI approaches in this Review, as more detailed descriptions of AI methods have been published elsewhere.
The technical foundation of digital pathology combines the digitization of histological specimens with robust computational workflows to enable accurate model development. WSI scanners generate gigapixel-level images whose quality is influenced by pre-analytical variables, such as tissue fixation, sectioning, staining protocols and scanner settings, that can introduce heterogeneity and affect downstream analysis. To address these challenges, preprocessing pipelines apply stain-normalization techniques and artefact detection models to standardize colour profiles and remove markings. On the computational side, supervised and unsupervised machine learning paradigms leverage either expert-engineered features or end-to-end deep neural networks to extract diagnostically and prognostically relevant information directly from digital slides. Integrating these components establishes a scalable, reproducible infrastructure essential for promoting digital pathology applications in the evaluation of prostate cancer.
ML techniques for pathology image analysis are primarily classified as supervised learning (including weakly supervised) and unsupervised learning. In supervised ML, models are trained on expert-labelled datasets. The model learns from these labels, considered the 'ground truth', to make predictions. When given new slides, the model predicts categories based on learned features, a classic example of classification. In regression applications, such as predicting patient survival, the model produces numerical outputs rather than categories, adjusting its predictions to minimize error according to a specified loss function.
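The supervised paradigm can be sketched in a few lines: labelled examples are used to learn a decision rule, which is then applied to unseen cases. This is a minimal illustration using a nearest-centroid classifier; the feature names and values are hypothetical stand-ins for slide-derived measurements, not features from any published model.

```python
# Minimal supervised classifier: learn per-class feature centroids from
# expert-labelled examples (the 'ground truth'), then assign new cases
# to the class whose centroid is closest.

def train_centroids(features, labels):
    """Average the feature vectors belonging to each class label."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return the class with the nearest centroid (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: dist(centroids[y]))

# Hypothetical labelled data: [gland_roundness, nuclear_density]
X = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.9], [0.1, 0.8]]
y = ["benign", "benign", "tumour", "tumour"]
model = train_centroids(X, y)
print(predict(model, [0.85, 0.15]))  # -> benign
```

A regression model would follow the same train-then-predict structure but would output a number (for example, a survival estimate) instead of a class label.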
Unsupervised learning takes a different approach, using unlabelled data to discover inherent patterns within images. These models group instances based on detected patterns in the dataset, rather than relying on predefined categories. For prostate cancer pathology, this process could involve categorizing digital slides into different grades based on intrinsic image characteristics, similar to the principles underlying the Gleason grading system.
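The unsupervised principle can be shown with a one-dimensional k-means sketch: unlabelled values are grouped purely by intrinsic similarity, with no predefined categories. The scores below are hypothetical image-derived values used only for illustration.

```python
# Minimal unsupervised grouping (one-dimensional k-means): values are
# clustered by similarity alone, without any expert-provided labels.

def kmeans_1d(values, k, iters=20):
    lo, hi = min(values), max(values)
    # spread the initial centres evenly across the value range
    centres = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:  # assign each value to its nearest centre
            j = min(range(k), key=lambda i: abs(v - centres[i]))
            groups[j].append(v)
        # move each centre to the mean of its group
        centres = [sum(g) / len(g) if g else centres[i]
                   for i, g in enumerate(groups)]
    return centres, groups

scores = [0.1, 0.15, 0.2, 0.8, 0.85, 0.9]  # hypothetical slide scores
centres, groups = kmeans_1d(scores, k=2)
```

In a pathology setting, each value would be replaced by a high-dimensional feature vector per slide, but the grouping logic is the same.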
Hand-crafted ML models depend on expert pathologists to identify and select the same visual characteristics that humans already use to make diagnostic decisions. These models then apply linear or non-linear transformations and combinations of these features, or decision trees learned from them, to derive diagnostic information. By incorporating these established diagnostic features, hand-crafted models attempt to replicate the same analytical process that pathologists use in clinical practice.
Deep learning models operate differently, automatically deconstructing images into fundamental elements such as edges, curves or shapes. This process occurs through multiple hidden layers (processing stages that learn increasingly complex features) that progressively abstract features from the input data. Each layer performs non-linear transformations, with successive layers representing increasing levels of abstraction. Convolutional neural networks (CNNs), a type of deep learning model that systematically analyses images region by region, have become particularly valuable; they process data through layers of nodes to recognize spatial features and optimize predictions through back-propagation (a learning process in which the model adjusts itself based on its errors). Several neural network architectures, including U-Net and ResNet, have proven especially effective for medical imaging applications. These approaches have shown impressive results in computer-aided diagnosis, as exemplified by Paige Prostate, a deep learning-based platform that uses CNNs trained on thousands of annotated prostate biopsy whole-slide images to automatically identify and highlight regions suspicious for malignancy for pathologist review; this assistance improves pathologists' sensitivity in detecting prostate cancer on whole-slide images. Deep learning excels at extracting features from large datasets but operates as a 'black box' system, and therefore faces challenges with interpretability. The choice between hand-crafted and deep learning approaches often depends on specific uses and available resources.
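The convolution operation that lets a CNN's early layers pick out low-level cues such as edges can be sketched directly: a small kernel is slid across the image, and its response is largest where the pattern it encodes appears. The tiny "image" and the vertical-edge (Sobel-style) kernel below are illustrative.

```python
# One convolution step, the basic operation behind a CNN's first layers.
# A 3x3 vertical-edge kernel is slid over a small grayscale image; large
# responses mark positions where intensity changes from left to right.

def conv2d(image, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            row.append(sum(image[r + i][c + j] * kernel[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

# 3x6 image: dark left half (0), bright right half (1)
img = [[0, 0, 0, 1, 1, 1]] * 3
sobel_x = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # vertical-edge detector
print(conv2d(img, sobel_x))  # -> [[0, 4, 4, 0]]: peaks at the edge
```

In a real CNN, the kernel weights are not hand-chosen as here but are learned through back-propagation, and hundreds of such kernels operate in parallel at each layer.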
Data processing in digital pathology encompasses the end-to-end handling of whole-slide images, from initial acquisition and secure storage, through normalization and artefact correction, to augmentation, to prepare robust, high-quality datasets for downstream analysis. Effective processing pipelines are essential to mitigate variability introduced by pre-analytical factors, maintain consistency across multicentre repositories, and maximize the value of limited clinical data.
The traditional pathology workflow begins with the preparation of histology slides, which are stained with haematoxylin and eosin (H&E) or another stain, analysed visually for quality under a microscope, and then physically stored. Trained pathologists analyse these slides to provide diagnostic assessments that guide management and prognosis. This conventional process is effective but presents several challenges: the workflow is cumbersome, physical storage is impractical, and collaboration between pathologists and the broader clinical team, including oncologists, surgeons and radiologists, is hindered by the need to physically retrieve slides.
Digital pathology constitutes a considerable evolution from this traditional approach. Histopathology slides are digitized at high resolution using whole-slide scanners to generate virtual slides. These digital images provide comprehensive visual information about tissue architecture and staining characteristics across various magnifications and focal planes. This transformation creates a dynamic, image-based process encompassing image acquisition, storage, management, interpretation and virtual staining. An increasing push in the field is observed towards multiplexed WSI, in which immunohistochemistry (IHC) and immunofluorescence information can be paired for digital slide interpretation, further extending the richness of data in digital pathology.
Notably, the virtual slides are derived from glass slides that can be prepared using either conventional (regular) cassettes or large-format cassettes. Regular cassettes process standard-size tissue sections, whereas large-format histology, in which larger cassettes are used, enables the entire gland (or large tissue specimen) and the whole tumour to be mounted on a single slide. This comprehensive coverage preserves crucial spatial context and architectural information, which is particularly advantageous for quantitative analysis, and enhances the performance of AI algorithms by providing a complete view of the tumour and its microenvironment.
Slide scanning is typically done with bright-field microscopy, but fluorescence confocal microscopy (FCM) is being increasingly applied to digital pathology for prostate cancer. FCM uses a laser-based scanning system with a pinhole aperture to restrict light collection to the in-focus plane, in turn reducing out-of-focus blur and yielding images with exceptional clarity and contrast. In contrast to bright-field microscopy, which requires fixation, embedding, sectioning and staining, FCM enables visualization of fresh, unfixed and unsectioned tissue, bypassing several labour-intensive and time-intensive steps involved in H&E-stained slide preparation. This approach enables rapid, near-real-time acquisition of diagnostic-quality images that emulate the morphological appearance on H&E staining, facilitating fast clinical workflows and enabling point-of-care pathology in both operative and remote settings. AI research has not extensively used FCM data yet, but FCM's strength in digital pathology is being increasingly shown, particularly in the diagnostic field, which benefits from FCM's high sample processing speed.
AI implementation in this field is associated with both opportunities and challenges, as the large amount of complex information generated through digital pathology exceeds human capacity for comprehensive analysis.
AI models generate predictions based on training data, with performance directly dependent on the quality and quantity of the training dataset. High-quality training data (digitized slides) in health care present unique challenges owing to cost, privacy concerns and limited availability. Data quality is substantially influenced by tissue processing techniques, scanner type, image magnification, stain variation and artefacts. Even with standardized protocols, slide preparation varies across laboratories. Variations in specimen fixation, sectioning, mounting, staining and digitization parameters can disrupt three-dimensional tissue architecture and create substantial variability in slide colour and intensity, potentially leading to suboptimal performance on unseen data. These variations affect model development, validation, testing and commercial adoption.
Standardizing all steps before analysis would be ideal, but is rarely feasible. To address these quality and standardization challenges, several preprocessing approaches have become fundamental in computational pathology. Stain normalization, which addresses variations that arise from differences in slice thickness, staining procedures and digitization processes, has emerged as a crucial preprocessing step. Popular normalization methods include the approaches of Reinhard et al., Macenko et al. and Vahadane et al. The value of these techniques has been shown in different studies. For instance, improved prostate cancer detection accuracy with colour-normalized histology images -- compared with unnormalized images -- was shown using the method of Macenko et al., whereas in another study, a 6% improvement in detection of prostate cancer on H&E-stained slides in terms of area under the curve (AUC) for the receiver operating characteristic curve was observed with stain normalization. Advances in stain normalization using generative adversarial networks (GANs) have resulted in performance that matches or exceeds that of these classic methods. Beyond stain normalization, histological slides often require additional processing to address artefacts. Pathologists frequently mark slides with permanent ink to indicate areas of interest, such as high-grade tumour or extracapsular extension. These markings can contaminate virtual slides and lead to misclassification errors. Advanced deep learning models have been developed to remove these markings while preserving tissue data and image resolution.
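The core idea behind Reinhard-style normalization is statistical matching: each channel of a source image is shifted and rescaled so that its mean and standard deviation match those of a reference slide. The sketch below applies this to a single channel of hypothetical pixel intensities; the published method additionally converts images into a perceptual (LAB) colour space first, which is omitted here for brevity.

```python
# Reinhard-style normalization sketch: match one channel's mean and
# standard deviation to reference-slide statistics. The pixel values
# and reference statistics below are hypothetical.

import statistics

def reinhard_channel(src, ref_mean, ref_std):
    mu = statistics.mean(src)
    sd = statistics.pstdev(src) or 1.0  # guard against flat channels
    return [(v - mu) / sd * ref_std + ref_mean for v in src]

source = [120, 140, 160, 180]          # darker slide, wider spread
target_mean, target_std = 200.0, 10.0  # reference slide statistics
normalized = reinhard_channel(source, target_mean, target_std)
```

After this transform, slides scanned or stained under different conditions share a common colour distribution, which is what reduces the downstream variability described above.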
Virtual staining through GANs has emerged as a promising approach to augment traditional staining processes. The potential of this approach has been shown through the development of a model that achieves 95% pixel-by-pixel overlap between computationally stained images and images of traditional H&E-stained slides from prostate biopsy samples. By learning non-linear mappings between whole-slide image pairs before and after H&E staining, virtual staining through GANs can effectively reproduce structural features including gland shape, nuclei, stroma and blood vessels in a clinically interpretable manner.
Data augmentation techniques, which increase available data through colour transformation, rotation, flipping and distortion of existing images, have become invaluable in digital pathology, where large medical datasets are scarce. Similarly, transfer learning, in which models trained on large datasets from related domains are fine-tuned for specific tasks, reduces the need for extensive new datasets while improving model performance.
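Geometric augmentations are simple to express: each transform produces a new training example from an existing image without changing its label. The 2x2 "tile" below is a toy stand-in for an image patch.

```python
# Two basic geometric augmentations applied to an image represented as
# a list of rows. Each output is a valid new training example with the
# same label as the original.

def flip_horizontal(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

tile = [[1, 2],
        [3, 4]]
augmented = [tile, flip_horizontal(tile), rotate_90(tile)]
```

Because tissue has no canonical orientation on a slide, rotations and flips are label-preserving for most pathology tasks, which is what makes them such cheap and safe ways to enlarge a dataset.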
The steps in the model development process include feature extraction, algorithm training, validation and performance benchmarking. Upon development, AI models must undergo internal and external validation to assess generalizability across diverse populations and clinical settings. The evaluation of performance using clinically meaningful metrics that extend beyond technical accuracy is equally important, ensuring that model outputs are interpretable and aligned with the real-world demands of prostate cancer care.
The development of AI models in pathology requires careful consideration of how image features are extracted and used. Two primary approaches have emerged: hand-crafted feature selection and deep learning-based feature extraction. In hand-crafted models, human experts with substantial field experience and domain knowledge manually select targeted features for the model to train on. This approach in prostate cancer relies on several characteristics: tissue architecture and collagen fibre orientation; gland morphology (including roundness, smoothness, compactness and size ratios); nuclear features (shape, size and spatial distribution); and textural features. This approach benefits from direct expert input, but can be time-consuming and might inadvertently translate human biases into the models.
Deep learning models take a fundamentally different approach, automatically deconstructing images into low-level cues, such as edges, curves or shapes, without human input. These elements are then aggregated to form high-order structural relationships to identify features of interest. This process occurs through multiple hidden layers that progressively abstract features from the input data, with each layer performing non-linear transformations for further data abstraction. CNNs have emerged as particularly powerful tools in this domain, processing data through node layers to recognize spatial features and optimize predictions through back-propagation.
Once features are extracted, ML algorithms identify patterns in the data to produce useful predictions. The principal advantage of deep learning models is their ability to extract and learn relevant features from large numbers of instances without requiring expert input. However, this approach faces challenges with interpretability, as the reasoning behind predictions is not entirely clear, creating a black box that presents challenges for regulatory approval. Nevertheless, several techniques have been proposed to address this issue and enable interpretation of these predictions.
Model validation is a crucial step in model development. Internal validation assesses model performance on held-out data from the same source, whereas external validation evaluates performance on completely independent datasets. This multi-step validation process helps ensure model robustness and generalizability across different clinical settings.
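The internal-validation principle is straightforward to sketch: part of the dataset is held out before training, and performance is measured only on the unseen portion. External validation repeats the measurement on data from an entirely different institution, which no split of a single dataset can substitute for.

```python
# Internal validation sketch: shuffle the cases, hold out a fraction for
# testing, and verify that no case appears in both partitions (no data
# leakage). Case identifiers here are illustrative integers.

import random

def train_test_split(items, test_fraction=0.2, seed=0):
    shuffled = items[:]                    # leave the input untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

cases = list(range(100))
train, held_out = train_test_split(cases)
assert not set(train) & set(held_out)  # the partitions are disjoint
```

In practice the split should be done at the patient level rather than the slide level, so that multiple slides from one patient never straddle the train/test boundary.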
The assessment of model performance must consider multiple metrics beyond simple accuracy. For diagnostic models, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV) and AUC provide crucial insights into the model's ability to correctly identify true disease instances, avoid false-positives and maintain predictive reliability across diverse clinical scenarios. However, clinical relevance and practical utility should ultimately guide model development and implementation. Models must show statistical significance but, more importantly, clinical significance to warrant integration into pathology workflows.
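These diagnostic metrics all derive from the four cells of a confusion matrix, and computing them directly makes their definitions concrete. The counts below are hypothetical.

```python
# Diagnostic performance metrics from confusion-matrix counts:
# tp/fp/fn/tn are true-positive, false-positive, false-negative and
# true-negative calls against the reference diagnosis.

def diagnostic_metrics(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),  # fraction of true disease found
        "specificity": tn / (tn + fp),  # fraction of non-disease cleared
        "ppv": tp / (tp + fp),          # reliability of a positive call
        "npv": tn / (tn + fn),          # reliability of a negative call
    }

# Hypothetical counts for a cancer-detection model on 200 biopsies
m = diagnostic_metrics(tp=80, fp=20, fn=10, tn=90)
```

Note that PPV and NPV, unlike sensitivity and specificity, depend on disease prevalence in the tested population, which is one reason a model validated in a high-prevalence referral cohort can disappoint in screening use.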