Tutorial: select_unique_ngs
Source:vignettes/select_unique_ngs_vignette.Rmd
select_unique_ngs_vignette.Rmd
Introduction
The genieBPC
package allows users to link the clinical
data to patients’ corresponding genomic samples.
Modifying Function Arguments
`select_unique_ngs()` Function Arguments | |
Argument | Description |
---|---|
data_cohort |
CPT (NGS) dataframe returned from the create_analytic_cohort function. |
oncotree_code |
Character vector specifying which sample OncoTree codes to keep. See 'cpt_oncotree_code' column of data_cohort argument above to get options. |
sample_type |
Character specifying which type of genomic sample to prioritize, options are 'Primary', 'Local' and 'Metastasis'. Default is to not select a NGS sample based on the sample type. |
min_max_time |
Character specifying if the first or last genomic sample recorded should be kept. Options are 'min' (first) and 'max' (last). |
The select_unique_ngs()
function returns the
‘cohort_ngs’ object of the create_analytic_cohort with a unique genomic
sample per patient for each diagnosis for the purpose of creating an
analytic dataset based on user-defined criterion, including OncoTree
code, primary vs. metastatic tumor sample, and earliest vs. most recent
sample. If multiple reports for a patient remain available after the
user-defined specifications, or if no specifications are provided, the
panel with the largest number of genes is selected by default. Sample
optimization is performed in the order that the arguments are specified
in the function, regardless of the arguments’ order provided by the
user. Namely the OncoTree code is prioritized first, sample type is
prioritized second and finally the time is prioritized last. For
patients with exactly one genomic sample, that unique genomic sample
will be returned regardless of whether it meets the user-specified
parameters; the purpose of this function is to specify criteria for
selecting a single NGS in the case that there are multiple NGS reports
associated with a single cancer diagnosis. Running the
select_unique_ngs() function will ensure that the resulting dataset
returned by merging the next generation sequencing report data onto the
analytic dataset returned by create_analytic_cohort() will maintain a
structure of one record per patient.
Note that the NGS dataset serves as the link between the clinical and genomic data, where the NGS dataset includes one record per NGS report per patient, including the NGS sample ID that is used to link to the genomic data files. Merging data from the NGS report onto the analytic cohort returned from create_analytic_cohort() therefore allows users to utilize all clinical and genomic data available.
Setup
Before going through the tutorial, load the {genieBPC} library and
log into Synapse using the set_synapse_credentials()
function. For more information on
set_synapse_credentials()
, refer to the
Tutorial: pull_data_synapse()
vignette.
This tutorial will utilize the data downloaded in the
Tutorial: pull_data_synapse()
vignette, as shown below:
nsclc_2_0 = pull_data_synapse("NSCLC", version = "v2.0-public")
Examples
Example 1
Select unique genomic samples from patients with stage IV NSCLC of histology adenocarcinoma. For patients with multiple samples, select the sample with OncoTree code LUAD. For patients with multiple samples with OncoTree code LUAD, select the metastatic genomic sample. If any patients have multiple metastatic samples with OncoTree code LUAD, take the later of the samples.
Note: for patients with exactly one genomic sample, that unique genomic sample will be returned regardless of whether it meets the argument criteria specified below.
ex1 <- create_analytic_cohort(
data_synapse = nsclc_2_0$NSCLC_v2.0,
stage_dx = c("Stage IV"),
histology = "Adenocarcinoma"
)
samples_data1 <- select_unique_ngs(
data_cohort = ex1$cohort_ngs,
oncotree_code = "LUAD",
sample_type = "Metastasis",
min_max_time = "max"
)
Example 2
Create a cohort of all NSCLC patients who received Cisplatin, Pemetrexed Disodium or Cisplatin, Etoposide as their first drug regimen, and select corresponding genomic samples.
For patients with multiple samples, first select samples with OncoTree code LUAD. If multiple samples per patient remain, prioritize samples from the primary tumor. Finally, prioritize the sample taken earliest.
Note: for patients with exactly one genomic sample, that unique genomic sample will be returned regardless of whether it meets the argument criteria specified below.
ex2 <- create_analytic_cohort(
data_synapse = nsclc_2_0$NSCLC_v2.0,
regimen_drugs = c("Cisplatin, Pemetrexed Disodium", "Cisplatin, Etoposide"),
regimen_order = 1,
regimen_order_type = "within regimen"
)
samples_data <- select_unique_ngs(
data_cohort = ex2$cohort_ngs,
oncotree_code = "LUAD",
sample_type = "Primary",
min_max_time = "min"
)