The American Association for Cancer Research Project Genomics Evidence Neoplasia Information Exchange Biopharma Collaborative (GENIE BPC) is an effort to aggregate comprehensive clinical data linked to genomic sequencing data to create a pan-cancer, publicly available data repository. These data detail clinical characteristics and drug regimen treatment information, along with high-throughput sequencing data and clinical outcomes, for cancer patients across multiple institutions. The GENIE BPC data repository forms a unique observational database of comprehensive clinical annotation with molecularly characterized tumors that can be used to advance precision medicine research in oncology. Linking multiple clinical and genomic datasets that vary in structure introduces an inherent complexity for data users. Therefore, use of the GENIE BPC data requires a rigorous process for preparing and merging the data to build analytic models. The {genieBPC} package is a user-friendly data processing pipeline to streamline the process for developing analytic cohorts that are ready for clinico-genomic analyses.


Install {genieBPC} from CRAN:
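
A minimal sketch of the CRAN installation using base R's install.packages(). The requireNamespace() guard is an addition here, not part of the original instructions; it skips the download when the package is already installed.

```r
# Install genieBPC from CRAN only if it is not already available
if (!requireNamespace("genieBPC", quietly = TRUE)) {
  install.packages("genieBPC")
}
```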


Install the development version of {genieBPC} with:


Overview of {genieBPC} Functions

  • Data import: pull_data_synapse() imports GENIE BPC data from ‘Synapse’ into the R environment

  • Data processing

    • create_analytic_cohort() selects an analytic cohort based on cancer diagnosis information and/or cancer-directed drug regimen information
    • select_unique_ngs() selects a unique next generation sequencing (NGS) test corresponding to the selected diagnoses
  • Data visualization: drug_regimen_sunburst() creates a sunburst figure of drug regimen information corresponding to the selected diagnoses in the order that the regimens were administered

Data Access & Authentication

Access to the GENIE BPC data release folders on ‘Synapse’ is required to import data with pull_data_synapse(). To obtain access:

For public data releases:

  1. Register for a ‘Synapse’ account. Accept the Synapse account terms of use.

  2. Navigate to the data release and accept its terms of use (e.g., for the NSCLC 2.0-public data release, navigate to the ‘Synapse’ page for that release). Towards the top of the page, there is information including the ‘Synapse’ ID, DOI, Item count, and Access. Next to Access is a link that reads Request Access.

  3. Select Request Access, review the terms of data use, and select Accept.

Note that permissions for Synapse and permissions for each data release are distinct. Both permissions must be accepted to successfully access the data.

For consortium data releases (restricted to GENIE consortium members & BPC pharmaceutical partners):

  1. Register for a ‘Synapse’ account.

  2. Use this link to access the GENIE BPC team list and request to join the team. Please include your full name and affiliation in the message before sending out the request.

  3. Once the request is accepted, you may access the data in the GENIE Biopharma Collaborative projects.

Note: Please allow up to a week for your request to be reviewed and access to be granted.

Authenticate yourself

  1. Whether you are using public or consortium data, you will need to authenticate yourself at the beginning of each R session in which you use {genieBPC} to pull data (see set_synapse_credentials()), or store your credentials as environment variables. See Tutorial: pull_data_synapse for more details.
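
As a rough sketch of the environment-variable approach described above, the snippet below reads a ‘Synapse’ personal access token from an environment variable and passes it to set_synapse_credentials(). The variable name SYNAPSE_PAT and the pat argument are assumptions based on recent package versions; check ?set_synapse_credentials for the arguments your installed version accepts.

```r
# Assumption: the token is stored in an environment variable named
# SYNAPSE_PAT (for example, via a line in ~/.Renviron).
synapse_pat <- Sys.getenv("SYNAPSE_PAT", unset = "")

if (nzchar(synapse_pat)) {
  # Assumption: set_synapse_credentials() accepts a `pat` argument
  genieBPC::set_synapse_credentials(pat = synapse_pat)
} else {
  message("SYNAPSE_PAT is not set; add it to ~/.Renviron and restart R.")
}
```

Keeping the token in an environment variable rather than in the script avoids committing credentials to version control.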

Analytic Data Guides

The analytic data guides provide details on each analytic dataset and its corresponding variables for each data release.

The following example creates an analytic cohort of patients diagnosed with Stage IV adenocarcinoma NSCLC.

Pull data for NSCLC version 2.0-public:

nsclc_2_0 <- pull_data_synapse(cohort = "NSCLC", version = "v2.0-public")

Select stage IV adenocarcinoma NSCLC diagnoses:

nsclc_stg_iv_adeno <- create_analytic_cohort(data_synapse = nsclc_2_0$NSCLC_v2.0, 
                                             stage_dx = "Stage IV", 
                                             histology = "Adenocarcinoma")

Select one unique metastatic lung adenocarcinoma genomic sample per patient in the analytic cohort returned above:

nsclc_stg_iv_adeno_unique_sample <- select_unique_ngs(
  data_cohort = nsclc_stg_iv_adeno$cohort_ngs)

Create a visualization of the treatment patterns for the first 3 regimens received by patients diagnosed with stage IV adenocarcinoma:

sunplot <- drug_regimen_sunburst(data_synapse = nsclc_2_0$NSCLC_v2.0,
                                 data_cohort = nsclc_stg_iv_adeno,
                                 max_n_regimens = 3)
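
To view the result, a sketch along these lines may help. It assumes the object returned by drug_regimen_sunburst() is a list that stores the interactive figure under the name sunburst_plot; that element name is an assumption here, so the snippet prints the element names first and only draws the plot if the name is present.

```r
# Assumes `sunplot` was created by the drug_regimen_sunburst() call above
if (exists("sunplot")) {
  # List the elements that were returned
  print(names(sunplot))

  # Assumption: the interactive figure is stored as `sunburst_plot`
  if ("sunburst_plot" %in% names(sunplot)) {
    print(sunplot$sunburst_plot)
  }
}
```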

Example of a sunburst plot showing 3 treatment regimens, highlighting the first treatment regimen: