Integrating predictive and empirical data for rational marker prioritization
SurfaceGenie is a web app for analyzing omic datasets (e.g. proteomic, transcriptomic) to prioritize candidate cell-type specific markers for immunophenotyping, immunotherapy, drug targeting, and other applications. It works by calculating the likelihood that a molecule is informative for distinguishing among sample groups (e.g. cell types, experimental conditions). While a major benefit of SurfaceGenie is the ability to prioritize proteins that are localized to the cell surface, data can also be analyzed without this parameter to find proteins of interest that reside in other subcellular locations. The four permutations of the scoring algorithm are described below.
SurfaceGenie works well with quantitative proteomic and transcriptomic datasets, but other data types can also be analyzed (see below). All calculations performed within SurfaceGenie are context-dependent, meaning that each score is computed relative to all data within a single input file (which may contain multiple experiments and/or cell types). If, after performing a comparison, a user decides that additional data should be considered, a new file containing all data for the new comparison must be uploaded.
GenieScore: Use to prioritize surface proteins that have disparate levels of abundance/expression.
IsoGenieScore: Use to prioritize surface proteins that have similar, high levels of abundance/expression.
OmniGenieScore: Use to prioritize any molecules (genes/proteins) that have disparate levels of abundance/expression.
IsoOmniGenieScore: Use to prioritize any molecules (genes/proteins) that have similar, high levels of abundance/expression.
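The four permutations form a two-by-two choice: whether to restrict prioritization to cell surface proteins (i.e. include the SPC term) and whether to favor molecules with disparate or with similar, high abundance. The helper below is a hypothetical sketch that encodes that choice; `choose_score` is not part of SurfaceGenie.

```python
def choose_score(surface_only: bool, pattern: str) -> str:
    """Return the name of the SurfaceGenie score matching an analysis goal.

    surface_only: True to prioritize cell surface proteins (SPC term used),
                  False to prioritize any genes/proteins.
    pattern: "disparate" for abundance that differs between sample groups,
             "similar" for similar, high abundance across groups.
    """
    table = {
        (True, "disparate"): "GenieScore",
        (True, "similar"): "IsoGenieScore",
        (False, "disparate"): "OmniGenieScore",
        (False, "similar"): "IsoOmniGenieScore",
    }
    key = (surface_only, pattern)
    if key not in table:
        raise ValueError(f"pattern must be 'disparate' or 'similar', got {pattern!r}")
    return table[key]
```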
Input:
SurfaceGenie accepts a .csv file containing a list of proteins (UniProt Accession) and a surrogate value representative of abundance (e.g. number of peptide spectrum matches, peak area, FPKM, RPKM) identified within a set of samples. There is no limit to the number of samples that can be analyzed in a single file. SurfaceGenie includes SPC data for human, mouse, and rat proteins.
Data Processing:
SurfaceGenie calculates the product of three independent scores.
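Because the final score is a product, a molecule that scores zero on any one term receives an overall score of zero. The sketch below illustrates only this combination and ranking step. The component scores are not named in this section, so `term_a` and `term_b` are hypothetical placeholders; treating the Omni variants as the same product with the SPC term left out is an assumption based on the description of those scores. `combine_scores` and `rank_molecules` are illustrative helpers, not SurfaceGenie functions.

```python
import math


def combine_scores(components: dict[str, float], use_spc: bool = True) -> float:
    """Multiply one molecule's component scores into a single score.

    The key "spc" is assumed to hold the SPC score; any other keys are
    placeholders for the data-derived terms. use_spc=False drops the SPC
    term, mimicking the Omni variants (an assumption).
    """
    values = [v for name, v in components.items() if use_spc or name != "spc"]
    return math.prod(values)


def rank_molecules(scores: dict[str, dict[str, float]], use_spc: bool = True) -> list[tuple[str, float]]:
    """Return (identifier, score) pairs sorted from highest to lowest score."""
    ranked = [(mol_id, combine_scores(c, use_spc)) for mol_id, c in scores.items()]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)
```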
Output:
This feature enables users to obtain the Surface Protein Consensus (SPC) score for proteins of interest without analyzing data through SurfaceGenie. Users may perform a batch retrieval by uploading a .csv file containing UniProt accession numbers or may search for individual UniProt accession numbers.
Although the calculation of the SPC score depends on the use of UniProt Accession IDs (for human, mouse, or rat), the other terms used here are agnostic to the type and distribution of data. Therefore, the OmniGenieScore and IsoOmniGenieScore can be used for any type of quantitative data for which there is a desire to find measurements that are either unique to some samples or similar among all samples. This could include metabolomic data, glycomic data, or strain counts for microbiome studies.
A .csv file containing a list of proteins (UniProt Accession) and a surrogate value representative of abundance (e.g. number of peptide spectrum matches, peak area) identified within a set of samples.
The first column of your data file must be labeled 'Accession' with no extra characters (e.g. not 'Accession #'). This column should contain the UniProt accession numbers of the proteins in your samples. You may include isoforms. To convert from a different protein ID type to UniProt, bulk conversion is available here. Under 'Select options', select your ID type in the 'From' field and then 'UniProt KB' in the 'To' field.
Importantly, data files must be in .csv format. If you are working in Excel, click 'File --> Save As' and select CSV in the drop-down menu to convert from .xlsx to .csv.
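The header and format requirements can be checked locally before uploading. The sketch below is a hypothetical pre-flight check (`validate_input` is not a SurfaceGenie function). It tests the two stated rules, a first header of exactly 'Accession' and comma-separated content, plus an assumed rule that the remaining cells are numeric or blank.

```python
import csv
import io


def validate_input(text: str) -> list[str]:
    """Return a list of problems found in a SurfaceGenie-style input CSV."""
    problems: list[str] = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    first = header[0] if header else ""
    if first != "Accession":
        problems.append(f"first header must be exactly 'Accession', found {first!r}")
    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # csv.reader yields [] for blank lines
        if len(row) != len(header):
            problems.append(f"line {line_no}: expected {len(header)} fields, found {len(row)}")
        for cell in row[1:]:
            if cell.strip() == "":
                continue  # blank cells treated as missing values (assumption)
            try:
                float(cell)
            except ValueError:
                problems.append(f"line {line_no}: non-numeric value {cell!r}")
    return problems
```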
Ideally, values from similar samples, such as technical or biological replicates, are averaged or summed into a single column before upload. Alternatively, SurfaceGenie will carry out this step for you if you select 'Group samples'. If this box is checked, you will need to provide the grouping method as well as the column numbers for each group. For example, if columns 2, 3, and 5 of your dataset should be grouped together and columns 4 and 6 comprise another group, indicate the presence of 2 groups using the slider and then enter the corresponding column numbers below, separated by commas: Group 1: '2, 3, 5', Group 2: '4, 6'. Remember that column 1 contains the accession numbers and cannot be grouped with other columns.
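The grouping step in the example above can be sketched as follows. `group_columns` is a hypothetical helper; column numbers are 1-based to match the help text, and offering "mean" and "sum" as the grouping methods is an assumption drawn from the mention of averaging or summing replicates.

```python
import csv
import io
from statistics import mean


def group_columns(text: str, groups: dict[str, list[int]], method: str = "mean") -> list[list]:
    """Collapse replicate columns into one column per group.

    groups maps a group label to 1-based column numbers; column 1 holds the
    accession numbers and may not be grouped.
    """
    if method not in ("mean", "sum"):
        raise ValueError("method must be 'mean' or 'sum'")
    combine = mean if method == "mean" else sum
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    header, data = rows[0], rows[1:]
    for label, cols in groups.items():
        for c in cols:
            if c == 1:
                raise ValueError(f"{label}: column 1 holds accessions and cannot be grouped")
            if not 2 <= c <= len(header):
                raise ValueError(f"{label}: column {c} is out of range")
    out: list[list] = [[header[0], *groups]]  # new header: Accession + group labels
    for row in data:
        # Convert 1-based column numbers to 0-based list indices.
        values = [combine(float(row[c - 1]) for c in cols) for cols in groups.values()]
        out.append([row[0], *values])
    return out
```

For the example in the text, `group_columns(text, {"Group 1": [2, 3, 5], "Group 2": [4, 6]})` returns one row per accession with two grouped value columns.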
You may select data to export as columns appended to the right of your original data. The following variables are available for export:
GenieScore Components:
Annotations/Linkouts:
SurfaceGenie provides several visualizations.
Enter one or more UniProt accession numbers for your proteins of interest (e.g. Q01650). Isoform accessions (e.g. Q01650-1) can be included; however, the specific isoform is not considered, because SPC scores are indexed by parent protein accession number. Up to 100 comma-separated accession numbers can be searched using this method.
If your data use an identifier type other than UniProt (e.g. Ensembl gene, UniGene), a conversion tool is available here. Under 'Select options', select your ID type in the 'From' field and then 'UniProt KB' in the 'To' field.
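The search rules above (comma separation, a 100-entry limit, isoforms collapsed to the parent accession) can be sketched as a small parser. `parse_accession_query` is a hypothetical helper, not part of SurfaceGenie, and removing duplicate parent accessions is a design choice of this sketch.

```python
import re

MAX_QUERY = 100  # limit stated in the help text


def parse_accession_query(query: str) -> list[str]:
    """Split a comma-separated query into unique parent accession numbers."""
    items = [q.strip() for q in query.split(",") if q.strip()]
    if len(items) > MAX_QUERY:
        raise ValueError(f"at most {MAX_QUERY} accessions per search, got {len(items)}")
    # Strip an isoform suffix such as "-1", since SPC scores are indexed by parent accession.
    parents = [re.sub(r"-\d+$", "", acc) for acc in items]
    return list(dict.fromkeys(parents))  # de-duplicate while preserving input order
```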
Upload a .csv file containing a single column of UniProt accession numbers, with the header labeled 'Accession'. Do not include extra characters in the header (e.g. not 'Accession #').
Bulk conversion from a different protein ID type to UniProt is available here. Under 'Select options', select your ID type in the 'From' field and then 'UniProt KB' in the 'To' field.
With this method, the original upload file is returned as a downloadable .csv file with a column of SPC scores appended.
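A minimal sketch of the batch output described above, assuming the SPC scores are already available as a lookup table keyed by parent accession. `append_spc_column`, the `spc_lookup` dictionary, and the 'SPC Score' column name are hypothetical; SurfaceGenie's own SPC table is not reproduced here.

```python
import csv
import io


def append_spc_column(text: str, spc_lookup: dict[str, float]) -> str:
    """Return the uploaded CSV with an SPC score column appended.

    Accessions missing from spc_lookup receive an empty cell.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([*header, "SPC Score"])
    for row in reader:
        if not row:
            continue  # skip blank lines
        parent = row[0].split("-")[0]  # drop isoform suffix, e.g. Q01650-1 -> Q01650
        score = spc_lookup.get(parent)
        writer.writerow([*row, "" if score is None else score])
    return out.getvalue()
```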
If you have questions or suggestions for additional features, please contact us by email:
rebekah.gundry at unmc.edu
Additional cell surface-related information and tools can be found at our growing website:
www.cellsurfer.net
If you use any of the SurfaceGenie tools in your work, please cite the original manuscript:
Waas M, Snarrenberg ST, Littrell J, Jones Lipinski RA, Hansen PA, Corbett JA, Gundry RL. SurfaceGenie: A web-based application for prioritizing cell-type specific marker candidates. https://doi.org/10.1101/575969