CCRL: Contrastive Cell Representation Learning

Cell identification within H&E slides is an essential prerequisite that paves the way for further pathology analyses, including tissue classification, cancer grading, and phenotype prediction. However, performing such a task using deep learning techniques requires a large cell-level annotated dataset. Although previous studies have investigated the performance of contrastive self-supervised methods in tissue classification, the utility of this class of algorithms in cell identification and clustering is still unknown. In this work, we investigated the utility of Self-Supervised Learning (SSL) in cell clustering by proposing the Contrastive Cell Representation Learning (CCRL) model. Through comprehensive comparisons, we show that this model can outperform all currently available cell clustering models by a large margin across two datasets from different tissue types. More interestingly, the results show that our proposed model worked well with a small number of cell categories, whereas the utility of SSL models has mainly been shown on natural image datasets with large numbers of classes (e.g., ImageNet). The unsupervised representation learning approach proposed in this research eliminates the time-consuming step of data annotation in cell classification tasks, which enables us to train our model on a much larger dataset than previous methods could use. Given this promising outcome, the approach can open a new avenue to automatic cell representation learning.
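As a rough illustration of what a contrastive self-supervised objective looks like, the sketch below computes an NT-Xent (SimCLR-style) loss over two augmented views of each cell. The abstract does not state which loss CCRL uses, so the choice of NT-Xent, the function names, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nt_xent_loss(views_a, views_b, temperature=0.5):
    """NT-Xent loss (assumed here; not confirmed as the CCRL objective).

    views_a[i] and views_b[i] are embeddings of two augmentations of the
    same cell image. Every other embedding in the batch is a negative.
    """
    z = list(views_a) + list(views_b)
    n = len(views_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same cell
        logits = [
            cosine(z[i], z[j]) / temperature for j in range(2 * n) if j != i
        ]
        positive_logit = cosine(z[i], z[positive]) / temperature
        # Cross-entropy of picking the positive among all other embeddings.
        total += math.log(sum(math.exp(x) for x in logits)) - positive_logit
    return total / (2 * n)


# Toy embeddings (hypothetical): two cells, two views each.
views_a = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # each view matches its own cell
swapped = [[0.1, 0.9], [0.9, 0.1]]   # each view matches the other cell
loss_aligned = nt_xent_loss(views_a, aligned)
loss_swapped = nt_xent_loss(views_a, swapped)
```

The loss is lower when the two views of the same cell are embedded close together, which is the property that lets the learned representation be clustered without labels.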

PDF Presentation

Deep Learning-Based Histotype Diagnosis of Ovarian Carcinoma Whole-Slide Pathology Images

We compared four different deep convolutional neural networks for classifying H&E-stained images of epithelial ovarian carcinoma histotypes using the largest training dataset to date (948 slides corresponding to 485 patients), exploring techniques such as color normalization and partially balancing the histotypes. The best performing model, assessed on an independent test set of 60 patients from another institution, achieved a mean diagnostic concordance of 80.97% (Cohen’s kappa 0.7547). In addition, in 4 of the 8 cases misclassified by the model on the external test set, two expert subspecialty pathologists, based on blind review of the whole-slide images (WSIs), rendered diagnoses that agreed with the AI rather than the integrated reference diagnosis. Our results indicate that color normalization can reliably improve AI-based diagnosis of WSIs sourced from multiple centers, and specifically that an ML-based ovarian carcinoma classifier is ready for clinical validation studies as an adjunct for informing histotype diagnosis.
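For readers unfamiliar with the agreement statistic quoted above, the following is a minimal sketch of how Cohen's kappa can be computed from two sets of labels (for example, model predictions versus reference diagnoses). The function name and the toy labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of cases where both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement by chance, from each rater's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one label")
    return (observed - expected) / (1 - expected)


# Toy example (hypothetical labels, not study data).
reference = ["A", "A", "B", "B"]
predicted = ["A", "B", "B", "B"]
kappa = cohens_kappa(reference, predicted)  # observed 0.75, expected 0.5
```

A kappa of 1 means perfect agreement and 0 means agreement no better than chance, which is why it is reported alongside raw concordance.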


Identification of a Novel Subtype of Endometrial Cancer with Unfavourable Outcome Using Artificial Intelligence-based Histopathology Image Analysis

Using an AI-based approach for histopathology image analysis, we have discovered ‘p53abn-like’ NSMPs, a novel subtype of no specific molecular profile (NSMP) endometrial cancers (ECs) with morphological features similar to p53-abnormal (p53abn) cases. ‘p53abn-like’ NSMPs exhibit clinical behavior similar to that of p53abn cases, with noticeably inferior outcomes.


The utility of color normalization for AI-based diagnosis of hematoxylin and eosin-stained pathology images

The color variation of hematoxylin and eosin (H&E)-stained tissues has presented a challenge for applications of artificial intelligence (AI) in digital pathology. In this study, we investigated eight color normalization algorithms for AI-based classification of H&E-stained histopathology slides, in the context of both using images from one center and from multiple centers. Our results show that color normalization does not consistently improve classification performance when both training and testing data are from a single center. However, using four multi-center datasets of two cancer types, we show that color normalization can significantly improve the classification of images from external datasets (ovarian cancer 0.25 AUC increase, p=1.6e-05, pleural cancer 0.21 AUC increase, p=1.4e-10). Furthermore, we introduce a novel augmentation strategy by mixing color-normalized images using three easily accessible algorithms that consistently improves the diagnosis of test images from external centers, even when the individual normalization methods had varied results.
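The AUC figures above summarize how well a classifier ranks positive cases above negative ones. As a small, self-contained sketch (with a hypothetical function name and toy scores, not study data), AUC can be computed directly from its pairwise-ranking definition:

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    AUC equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example (hypothetical scores).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)  # 3 of 4 positive/negative pairs ranked correctly
```

Under this definition, an increase of 0.25 in AUC means that an additional quarter of all positive/negative pairs are ranked correctly after normalization.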


Synthesis of Clinical Grade Cancer Pathology Images Using Generative Adversarial Networks

The purpose of this project is to showcase the capacity of generative adversarial networks (GANs) to synthesize realistic-looking histopathology images. Our findings have important applications in proficiency testing of medical practitioners and quality assurance in clinical laboratories. Furthermore, training of computer-aided diagnostic systems can benefit from synthetic images where labeled datasets are limited (e.g., rare cancers). We have created a publicly available demo website where clinicians and researchers can attempt questions from the image survey.

PDF Demo

Classification of Epithelial Ovarian Carcinoma Whole-Slide Pathology Images Using Deep Transfer Learning

Ovarian cancer is the most lethal cancer of the female reproductive organs. There are 5 major histological subtypes of epithelial ovarian cancer, each with distinct morphological, genetic, and clinical features. Currently, these histotypes are determined by a pathologist's microscopic examination of tumor whole-slide images (WSIs). This process has been hampered by poor inter-observer agreement (Cohen’s kappa 0.54-0.67). We utilized a two-stage deep transfer learning algorithm based on convolutional neural networks (CNNs) and progressive resizing for automatic classification of epithelial ovarian carcinoma WSIs. The proposed algorithm achieved a mean accuracy of 87.54% and a Cohen's kappa of 0.8106 in the slide-level classification of 305 WSIs, performing better than a standard CNN and pathologists without gynecology-specific training.

PDF GitHub

cPathPortal: a Web-Based Tool for Viewing and Annotation of Histopathology Images

To build and evaluate machine learning (ML) models, histopathology slide images need to be annotated by pathologists. Furthermore, the evaluation of ML models needs a convenient platform for viewing the results. Our group has developed a web-based application (named “cPathPortal”) for quick and intuitive annotation of histopathology images that works on tablets with a stylus. cPathPortal is built on top of OMERO and allows users to load histopathology images, zoom into different sections, and annotate them. Our intention is to release cPathPortal as an open-source platform in the near future. Please contact us if you would like to deploy and test cPathPortal on your own servers.


Machine Learning-Driven Biomarker Discovery Platform for Gynecological Cancers: Towards Precision Medicine

Despite the revolution in our understanding of genetic and molecular drivers of different cancers, standard clinical management of gynecological cancers (specifically, ovarian and endometrial) and patient outcomes have not seen major improvements in the past decades. For example, recent findings, including ours, in the classification of ovarian cancer based on genetic markers have important clinical implications, including the potential for identifying new treatments. However, until surrogate biomarkers are discovered, these sub-classifications rely on labor-intensive assays with long turnaround times, limiting their routine use. We are building a platform for characterizing ovarian and endometrial cancers, based on advanced machine learning techniques, by combining genetic data and histopathology images.

Spatial Genetic and Micro-Environmental Heterogeneity in Cancer

Cancer therapeutic resistance occurs as tumors develop resistance to treatments such as chemotherapy, radiotherapy, and targeted therapies through many different mechanisms. There is growing evidence that (epi)genetic and phenotypic heterogeneity within a tumor, as well as the tumor microenvironment, contributes to resistance. While many studies in the past years (including ours) have characterized the degree of intra-tumor genetic heterogeneity and microenvironmental factors in different cancers, characterizing the spatial heterogeneity at single-cell resolution is critical to identifying the driving factors of tumor heterogeneity. Using AI techniques, we are integrating a variety of assays including histopathology (hematoxylin & eosin (H&E), multi-spectral IHC), in-situ, and single-cell RNA sequencing to objectively characterize cancer–microenvironment interactions and genetic heterogeneity at different spatial scales in ovarian, endometrial, and synovial sarcoma tumors.

Genomic Signatures for Ovarian Cancer Histotypes Stratification

We stratified ovarian cancer histotypes into novel genomic subtypes. This major discovery was built upon a novel approach that integrates the computation of point mutation and structural variation signatures, providing a potent, genome-based discriminant biomarker across the spectrum of ovarian cancer histotypes. The most striking finding was a previously unreported poor-outcome group representing ~50% of high-grade serous ovarian cancer (HGSOC) cases. The poor-outcome group was characterized by specific types of genomic structural rearrangement reflective of microhomology-mediated end joining (MMEJ) as an active double-strand break repair process that was notably absent from the other HGSOC cases.


TMEM30A Loss of Function Mutations in Diffuse Large B-Cell Lymphoma

We conducted an integrative genomic analysis on a large cohort of patients with de novo DLBCL, drawn from a population registry, who were treated uniformly with modern immuno-chemotherapy (R-CHOP). This study provides unprecedented insight into mutation-associated changes in membrane physiology linked to cancer. We identified and characterized TMEM30A as a novel tumor suppressor gene whose loss is exclusively found in DLBCL and is associated with favourable outcome after immuno-chemotherapy.


Kronos: a Workflow Assembler for Genome Analytics and Informatics

Kronos is a software platform for facilitating the development and execution of modular, auditable, and distributable bioinformatics workflows. Kronos obviates the need for explicit coding of workflows by compiling a text configuration file into executable Python applications. The Kronos platform provides a standard framework for developers to implement custom tools, reuse existing tools, and contribute to the community at large.


Evolutionary Trajectories of Primary High‐Grade Serous Ovarian Cancers

This was the first study worldwide to quantify the degree of clonal diversity and evolution in primary untreated high-grade serous ovarian cancers (HGSOC). In this study, we measured the degree of genomic diversity within primary, untreated HGSCs to examine the natural state of tumour evolution prior to therapy. Our results demonstrate extensive intra-tumoral mutational, copy number, and expression heterogeneity in HGSCs, thus illuminating a challenge for the application of tumor genome sequencing in the context of personalized, precision medicine. Although HGSCs are considered a single disease, they exhibit individual evolutionary trajectories that will require consideration in future therapeutic solutions.


DriverNet: Measuring the Impact of Somatic Mutations on Transcriptional Networks in Cancer

We introduce a novel computational framework, DriverNet, to identify likely driver mutations by virtue of their effect on mRNA expression networks. Application to four cancer datasets reveals the prevalence of rare candidate driver mutations associated with disrupted transcriptional networks and a simultaneous modulation of oncogenic and metabolic networks, induced by copy number co-modification of adjacent oncogenic and metabolic drivers.