Introduction¶

CRMap enhances the transparency of the COMPASS model through a concept-bottleneck architecture. It maps transcriptomic features onto 44 high-level tumor immune microenvironment (TIME) concepts and traces the flow of information from gene expression, through concept projection, to the final prediction. This personalized interpretability framework supports accurate immunotherapy response prediction and provides patient-specific explanatory maps, which in turn support mechanistic interpretation, biomarker discovery, and therapeutic target identification.


Workflow of CRMap¶

Step 1. Gene-level representation¶

  • Input: bulk RNA-seq TPM expression matrix.
  • A transformer encoder extracts contextual representations for each gene.

Step 2. Projection into concepts¶

  • Concepts: 44 predefined TIME concepts (e.g., CD8+ T cells, exhausted T cells, IFN-γ response, TGF-β signaling).
  • The concept projector learns to aggregate gene embeddings into concept-level representations.
  • The result is a set of concept scores representing the activity of each TIME concept for a given patient sample.

Step 3. Response prediction¶

  • Concept scores are passed into a decoder to predict patient response to ICI (immune checkpoint inhibitors).
  • Training involves comparing predictions against true labels using cross-entropy loss or survival modeling.
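
The three steps above can be sketched as a toy forward pass. This is a minimal illustration under stated assumptions, not the COMPASS implementation: the random embeddings stand in for the transformer encoder, and the softmax pooling, linear read-out, and linear decoder are all placeholders chosen for brevity. Every name and size here is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes (assumption); CRMap itself uses 44 TIME concepts.
n_genes, d_model, n_concepts = 6, 4, 3

# Step 1 stand-in: one contextual embedding per gene.
# In the real model these come from a transformer encoder.
gene_emb = rng.normal(size=(n_genes, d_model))

# Step 2 stand-in: each concept pools genes with softmax-normalized weights,
# then a linear read-out turns the pooled embedding into a scalar score.
pool_logits = rng.normal(size=(n_concepts, n_genes))
gene_to_concept = np.exp(pool_logits)
gene_to_concept /= gene_to_concept.sum(axis=1, keepdims=True)
concept_emb = gene_to_concept @ gene_emb          # (n_concepts, d_model)
readout = rng.normal(size=d_model)
concept_scores = concept_emb @ readout            # (n_concepts,)

# Step 3 stand-in: linear decoder over concept scores, softmax over {NR, R}.
decoder_w = rng.normal(size=(2, n_concepts))
logits = decoder_w @ concept_scores
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# Because the decoder is linear, each class logit decomposes exactly into
# per-concept contributions (decoder weight times concept score).
contributions = decoder_w * concept_scores        # (2, n_concepts)
```

The point of the bottleneck is visible in the last line: because the prediction depends on the input only through the concept scores, each logit can be split into one term per concept. Whether the real decoder admits such an exact decomposition is not stated in this tutorial.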

Features of CRMap¶

a. Personalized interpretability

  • Each patient has an individualized CRMap, revealing the causal chain from TIME features to treatment response.

b. Hierarchical interpretability

  • Enables tracing from genes → concepts → predictions.
  • Can be refined to the level of single-gene contributions, supporting discovery of new therapeutic targets (e.g., PPP2R1A).

Example Use Cases¶

  • Biomarker discovery: For example, if CRMap consistently shows elevated TGF-β signaling in non-responders, this suggests a possible resistance mechanism.
  • Patient stratification: Patients can be clustered into distinct immune subtypes based on their CRMap, enabling biomarker-driven stratification.
  • Drug target prioritization: By simulating gene perturbations and observing shifts in CRMap contributions, novel actionable targets can be identified.
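
As a rough illustration of the biomarker-discovery use case, the sketch below compares mean concept scores between non-responders and responders. The table layout, patient IDs, and all values are made up for the example; only the concept names come from this tutorial.

```python
import pandas as pd

# Hypothetical concept scores (rows: patients, columns: concepts).
scores = pd.DataFrame(
    {
        "TGFb_pathway": [0.9, 0.7, -0.4, -0.6],
        "IFNg_pathway": [-0.8, -0.5, 0.6, 0.9],
    },
    index=["p1", "p2", "p3", "p4"],
)
# Hypothetical response labels aligned to the same patient index.
response = pd.Series(["NR", "NR", "R", "R"], index=scores.index, name="response")

# Mean score per concept within each response group.
group_means = scores.groupby(response).mean()

# Positive values: concept is higher in non-responders than in responders.
diff = (group_means.loc["NR"] - group_means.loc["R"]).sort_values(ascending=False)
print(diff)
```

A real analysis would need more patients and a proper statistical test before a concept is called a resistance marker. The difference of means is only a first screening step.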
In [1]:
from compass.utils import prepare_crmap_data, personal_crmap_data, draw_personal_crmap
import pandas as pd
%matplotlib inline

01. Prepare CRMap Data (All patients)¶

Load bulk RNA-seq TPM matrices and clinical annotations for the cohort of interest. This dataset provides the foundation for building both cohort-level and patient-level CRMaps.

Purpose: Ensure that all necessary transcriptomic and clinical data are preprocessed and available for downstream analysis.

In [2]:
dfcx = pd.read_csv('https://www.immuno-compass.com/download/other/compass_gide_tpm.tsv', sep = '\t', index_col = 0)
In [3]:
# Download and load a COMPASS model pretrained with a leave-one-cohort-out (LOCO) strategy (e.g., `pft_leave_Gide.pt`).
# Note: `dfcx` is rebound below to the matrix returned by `prepare_crmap_data`.

(celltype2output, geneset2celltype, gene2geneset, genetpm2gene,
 dfgn, dfgs, dfct, dfpred, dfcx) = prepare_crmap_data(dfcx, 
                                                      model_path_name = 'https://www.immuno-compass.com/download/model/LOCO/pft_leave_Gide.pt',
                                                      map_location = 'cpu',
                                                      z_scale=True)
Downloading...
From: https://www.immuno-compass.com/download/model/LOCO/pft_leave_Gide.pt
To: /tmp/tmp0v3tbk3o
Downloading model from https://www.immuno-compass.com/download/model/LOCO/pft_leave_Gide.pt...
100%|████████████████████████████████████████████████████████████████████████████████████████████| 35.1M/35.1M [00:53<00:00, 662kB/s]
Model downloaded to: /tmp/tmp0v3tbk3o
100%|##################################################################################################| 1/1 [00:04<00:00,  4.22s/it]
100%|##################################################################################################| 1/1 [00:04<00:00,  4.19s/it]

02. Prepare Personal CRMap Data¶

Select a single patient (e.g., 2_ipiPD1_PRE) and extract their transcriptomic features. Define the immune-related concepts to visualize (e.g., IFN-γ pathway, Cytotoxic T cells, Endothelial cells, TGF-β pathway).

Purpose: Narrow down the focus to one patient, enabling personalized interpretability.
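
Before building a personal CRMap, it can help to confirm that the chosen patient ID is actually present in the sample index. The helper below is a hypothetical convenience, not part of `compass.utils`, and the toy frame only mimics the shape of the expression matrix.

```python
import pandas as pd

# Toy stand-in for the expression matrix; the real one has thousands of gene columns.
toy = pd.DataFrame(
    {"A1BG": [-0.69, -0.36], "A2M": [-0.86, -0.31]},
    index=["1_ipiPD1_PRE", "2_ipiPD1_PRE"],
)

def select_patient(df: pd.DataFrame, patient_id: str) -> pd.Series:
    """Return one patient's expression row, failing loudly on a bad ID."""
    if df.index.duplicated().any():
        raise ValueError("sample index contains duplicate IDs")
    if patient_id not in df.index:
        raise KeyError(f"{patient_id!r} not found; first IDs: {list(df.index[:5])}")
    return df.loc[patient_id]

row = select_patient(toy, "2_ipiPD1_PRE")
print(row)
```

Checking the ID up front gives a clearer error than a failure deep inside a plotting or attribution call.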

In [4]:
dfcx.head(5)
Out[4]:
cancer_code A1BG A1CF A2M A2ML1 A4GALT A4GNT AAAS AACS AADAC ... ZWILCH ZWINT ZXDA ZXDB ZXDC ZYG11A ZYG11B ZYX ZZEF1 ZZZ3
Index
1_ipiPD1_PRE 0.0 -0.689347 -0.249150 -0.856038 -0.293410 -0.748553 -0.757140 -0.919124 -0.674271 -0.436682 ... 0.078115 -0.802111 -1.077512 -0.870153 -1.170595 -0.296469 -1.612454 -1.085344 -1.284846 -1.157044
2_ipiPD1_PRE 0.0 -0.356076 -0.501760 1.935266 -0.307191 -0.671181 -0.757140 -0.241104 -0.853583 -0.470015 ... 1.601781 0.104700 -1.243284 -1.489081 0.568941 -0.567675 -1.211916 0.623320 -0.659467 -0.450424
6_ipiPD1_PRE 0.0 -0.893012 -0.501760 -0.634128 -0.292032 -0.837532 -0.838842 -1.216831 -0.856756 -0.481127 ... -1.225617 -1.240743 -1.160398 -1.290140 -1.489681 -0.513434 -1.649759 -1.517999 -1.243852 -1.203239
7_ipiPD1_PRE 0.0 -1.210854 -0.375455 -0.862134 -0.307191 -0.957459 -0.757140 -1.641944 -1.053523 -0.414460 ... -1.375262 -1.231653 -1.243284 -1.783809 -1.973458 -0.567675 -1.804869 -1.238387 -1.642455 -1.343537
8_ipiPD1_PRE 0.0 -0.664660 -0.501760 -0.874144 -0.304435 -0.725342 -0.512035 -0.224582 -0.563193 -0.481127 ... 0.932910 0.729695 -0.956372 -0.663844 -0.851508 -0.404951 0.081980 -0.637837 -1.140058 1.250255

5 rows × 15673 columns

In [5]:
## We will generate the CRMap for patient `2_ipiPD1_PRE`
concept2plot=["IFNg_pathway", "Cytotoxic_Tcell",  'Endothelial', 'TGFb_pathway']
patient_id = "2_ipiPD1_PRE"
crmap_df = personal_crmap_data(
            patient_id=patient_id,
            concept2plot=concept2plot,
            TopK_gene=5,
            celltype2output=celltype2output,
            geneset2celltype=geneset2celltype,
            gene2geneset=gene2geneset,
            genetpm2gene=genetpm2gene,
            dfgn=dfgn, dfgs=dfgs, dfct=dfct, dfpred=dfpred, dfcx=dfcx)
In [6]:
crmap_df.head()
Out[6]:
target source weights group concept source_color target_color source_value target_value
0 NR IFNg_pathway -0.512413 celltype->output IFNg_pathway #9e9d93 #9e9d93 -1.185459 0.999955
1 NR Cytotoxic_Tcell -0.406077 celltype->output Cytotoxic_Tcell #9e9d93 #9e9d93 -1.563227 0.999955
2 NR Endothelial 0.093199 celltype->output Endothelial #9e9d93 #9e9d93 0.406011 0.999955
3 NR TGFb_pathway 0.402654 celltype->output TGFb_pathway #9e9d93 #9e9d93 0.512698 0.999955
4 R IFNg_pathway 0.512413 celltype->output IFNg_pathway #9e9d93 #9e9d93 -1.185459 0.000045
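
One way to read this table is to filter the concept-to-output edges for a single target and rank them by weight. The sketch below re-types rows 0 to 3 of the output above into a small frame, keeping only four of the columns, so it runs without the model. Treating a larger weight as a stronger push toward the target class is an interpretation, not something this tutorial states.

```python
import pandas as pd

# Rows 0-3 of the crmap_df.head() output, restricted to four columns.
edges = pd.DataFrame(
    {
        "target": ["NR", "NR", "NR", "NR"],
        "source": ["IFNg_pathway", "Cytotoxic_Tcell", "Endothelial", "TGFb_pathway"],
        "weights": [-0.512413, -0.406077, 0.093199, 0.402654],
        "group": ["celltype->output"] * 4,
    }
)

# Keep only concept -> output edges that point at the NR node.
to_nr = edges[(edges["group"] == "celltype->output") & (edges["target"] == "NR")]

# Rank concepts from most positive to most negative weight.
ranked = to_nr.sort_values("weights", ascending=False)[["source", "weights"]]
print(ranked.to_string(index=False))
```

The same filter works on the full `crmap_df`, since it only relies on the `target`, `source`, `weights`, and `group` columns shown above.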

03. Draw Personal CRMap¶

Visualize the concept contribution scores as a layered map that runs from genes through concepts to the predicted outcome. The CRMap highlights positive and negative drivers of response prediction (e.g., IFN-γ pathway as a positive contributor, exhausted T cells as a negative contributor).

Purpose: Provide a patient-specific explanatory map that links TIME concepts to predicted response outcomes.

In [7]:
fig = draw_personal_crmap(crmap_df, 
                        concept2plot=concept2plot,
                        figsize=(16, 15),
                        fontsize=15,
                        layer_node_sizes=[0.5, 0.5, 1, 3, 5],
                        max_rad=0.25,
                        show_geneset_name=False,
                        layer_node_gaps={'celltype': 0.15, 'geneset': 0.1, 'output': 0.2},
                        layer_spacing=[1.2, 1, 1, 0.8],)
fig
Out[7]:
[Figure: personal CRMap for patient 2_ipiPD1_PRE showing the IFNg_pathway, Cytotoxic_Tcell, Endothelial, and TGFb_pathway concepts]
In [8]:
# Draw the CRMap for a single concept (showing all available genes under the TGFb pathway concept)
concept2plot=['TGFb_pathway']
patient_id = "2_ipiPD1_PRE"
crmap_df = personal_crmap_data(
            patient_id=patient_id,
            concept2plot=concept2plot,
            TopK_gene=100000,
            celltype2output=celltype2output,
            geneset2celltype=geneset2celltype,
            gene2geneset=gene2geneset,
            genetpm2gene=genetpm2gene,
            dfgn=dfgn, dfgs=dfgs, dfct=dfct, dfpred=dfpred, dfcx=dfcx)

fig = draw_personal_crmap(crmap_df, 
                        concept2plot=concept2plot, show_geneset_name = True)
fig
Out[8]:
[Figure: personal CRMap for patient 2_ipiPD1_PRE restricted to the TGFb_pathway concept, with gene set names shown]