Obtain clinical & genomic data files for GENIE BPC Project
Source: R/pull_data_synapse.R
Function to access specified versions of clinical and genomic GENIE BPC data from Synapse and read them into the R environment. See the pull_data_synapse vignette for further documentation and examples.
Usage
pull_data_synapse(
  cohort = NULL,
  version = NULL,
  download_location = NULL,
  username = NULL,
  password = NULL
)
Arguments
- cohort
Vector or list specifying the cohort(s) of interest. Each value must be one of "NSCLC" (Non-Small Cell Lung Cancer), "CRC" (Colorectal Cancer), "BrCa" (Breast Cancer), "PANC" (Pancreatic Cancer), "Prostate" (Prostate Cancer), or "BLADDER" (Bladder Cancer).
- version
Vector specifying the version of the data for each cohort. Each value must be one of the following: "v1.1-consortium", "v1.2-consortium", "v2.1-consortium", "v2.0-public". When entering multiple cohorts, versions are matched to cohorts by position, so the cohorts and version numbers must be given in the same order to pull the correct data. See examples below.
- download_location
If `NULL` (default), data are returned as a list of data frames, with the requested data as list items. Otherwise, specify a folder path to have the data downloaded there automatically. When a path is specified, data are not read into the R environment.
- username
'Synapse' username
- password
'Synapse' password
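The positional pairing of `cohort` and `version` can be checked before calling the function. The following is a minimal sketch in base R that does not contact 'Synapse'. The naming pattern for the returned list elements (cohort, underscore, version without its "-consortium" or "-public" suffix) is an assumption inferred from the example output on this page, not a documented guarantee.

```r
# Cohorts and versions are matched by position, so both vectors
# must have the same length and be given in the same order.
cohort <- c("NSCLC", "BrCa")
version <- c("v2.0-public", "v1.1-consortium")

stopifnot(length(cohort) == length(version))

# Lay the pairs out side by side to confirm that each cohort
# is matched with the intended version.
requested <- data.frame(cohort = cohort, version = version)
print(requested)

# Assumed naming of the returned list elements: drop everything from
# the first hyphen onward in the version string (inferred from the
# example output, not from the package documentation).
expected_names <- paste0(cohort, "_", sub("-.*$", "", version))
expected_names
```

Reviewing the `requested` table before a pull is a cheap way to catch a swapped cohort and version.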
Authentication
To access data, users must have a valid 'Synapse' account with permission to access the data set, and they must have accepted any necessary 'Terms of Use'. Users must authenticate in each R session (see README: Data Access and Authentication for details). To set your 'Synapse' credentials for a session, call:
`set_synapse_credentials(username = "your_username", password = "your_password")`
If your credentials are stored as environment variables, you do not need to call `set_synapse_credentials()` explicitly each session. To store authentication information in environment variables, add the following to your .Renviron file, then restart your R session (tip: you can use `usethis::edit_r_environ()` to open and edit this file):
`SYNAPSE_USERNAME = <your-username>`
`SYNAPSE_PASSWORD = <your-password>`
Alternatively, you can pass your username and password to each individual data pull function if preferred, although it is recommended that you manage your passwords outside of your scripts for security purposes.
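To confirm that credentials stored in .Renviron are visible to the current session, the environment variables can be inspected with base R. This is a minimal sketch: `has_synapse_env()` is a hypothetical helper, not part of the package, and it only checks that both variables are non-empty, not that the credentials are valid.

```r
# Hypothetical helper (not part of the package): returns TRUE when both
# SYNAPSE_USERNAME and SYNAPSE_PASSWORD are set to non-empty values.
has_synapse_env <- function() {
  vars <- Sys.getenv(c("SYNAPSE_USERNAME", "SYNAPSE_PASSWORD"), unset = "")
  all(nzchar(vars))
}

if (has_synapse_env()) {
  message("Synapse credentials found in environment variables.")
} else {
  message("No stored credentials found; call set_synapse_credentials() for this session.")
}
```

The check reports only whether the variables are set and never prints their values, so it is safe to leave in a script.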
Analytic Data Guides
Documentation corresponding to the clinical data files can be found on 'Synapse' in the Analytic Data Guides:
Examples
# Example 1 ----------------------------------
# Set up 'Synapse' credentials
set_synapse_credentials()
#> ✔ You are now connected to 'Synapse' as
#> bstgeniebpc@mskcc.org for this R session!
# Print available versions of the data
synapse_version(most_recent = TRUE)
#> # A tibble: 8 × 4
#>   cohort   version         release_date versions_returned
#>   <chr>    <chr>           <chr>        <chr>
#> 1 BLADDER  v1.1-consortium 2022-11      Most Recent Versions
#> 2 BrCa     v1.2-consortium 2022-10      Most Recent Versions
#> 3 CRC      v1.2-consortium 2021-08      Most Recent Versions
#> 4 CRC      v2.0-public     2022-10      Most Recent Versions
#> 5 NSCLC    v2.1-consortium 2021-08      Most Recent Versions
#> 6 NSCLC    v2.0-public     2022-05      Most Recent Versions
#> 7 PANC     v1.2-consortium 2023-01      Most Recent Versions
#> 8 Prostate v1.2-consortium 2023-01      Most Recent Versions
# Pull version 2.0-public for non-small cell lung cancer
# and version 1.1-consortium for breast cancer data
ex1 <- pull_data_synapse(
  cohort = c("NSCLC", "BrCa"),
  version = c("v2.0-public", "v1.1-consortium")
)
#> ✔ pt_char has been imported for "BrCa_v1.1"
#> ✔ ca_dx_index has been imported for "BrCa_v1.1"
#> ✔ ca_dx_non_index has been imported for "BrCa_v1.1"
#> ✔ ca_drugs has been imported for "BrCa_v1.1"
#> ✔ prissmm_imaging has been imported for "BrCa_v1.1"
#> ✔ prissmm_pathology has been imported for "BrCa_v1.1"
#> ✔ prissmm_md has been imported for "BrCa_v1.1"
#> ✔ tumor_marker has been imported for "BrCa_v1.1"
#> ✔ cpt has been imported for "BrCa_v1.1"
#> ✔ mutations_extended has been imported for "BrCa_v1.1"
#> ✔ fusions has been imported for "BrCa_v1.1"
#> ✔ cna has been imported for "BrCa_v1.1"
#> ✔ pt_char has been imported for "NSCLC_v2.0"
#> ✔ ca_dx_index has been imported for "NSCLC_v2.0"
#> ✔ ca_dx_non_index has been imported for "NSCLC_v2.0"
#> ✔ ca_drugs has been imported for "NSCLC_v2.0"
#> ✔ prissmm_imaging has been imported for "NSCLC_v2.0"
#> ✔ prissmm_pathology has been imported for "NSCLC_v2.0"
#> ✔ prissmm_md has been imported for "NSCLC_v2.0"
#> ✔ cpt has been imported for "NSCLC_v2.0"
#> ✔ mutations_extended has been imported for "NSCLC_v2.0"
#> ✔ fusions has been imported for "NSCLC_v2.0"
#> ✔ cna has been imported for "NSCLC_v2.0"
names(ex1)
#> [1] "BrCa_v1.1" "NSCLC_v2.0"
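# Inspect the result of Example 1 (a sketch, not package output) ----
# Assumption: each element of `ex1` is itself a named list holding the
# data frames listed in the import messages above (pt_char, ca_dx_index, ...).
# Requires `ex1` from the call above, so a valid 'Synapse' login is needed.
nsclc <- ex1[["NSCLC_v2.0"]]
# List the data sets available for this cohort
names(nsclc)
# Preview the patient characteristics data
head(nsclc[["pt_char"]])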