Welcome to SurfaceGenie!

Integrating predictive and empirical data for rational marker prioritization

SurfaceGenie is a web app for analyzing omic datasets (e.g. proteomic, transcriptomic) to prioritize candidate cell-type-specific markers of interest for immunophenotyping, immunotherapy, drug targeting, and other applications. It works by calculating the likelihood that a molecule is informative for distinguishing among sample groups (e.g. cell types, experimental conditions). Although a major benefit of SurfaceGenie is its ability to prioritize proteins localized to the cell surface, data can also be analyzed without this parameter to find proteins of interest in other subcellular locations. Each of the four permutations of the scoring algorithm is described below.

SurfaceGenie works well with quantitative proteomic and transcriptomic datasets, but other data types can also be analyzed (see Other Applications). All calculations performed within SurfaceGenie are context-dependent: every score is computed relative to all of the data in a single input file (which may contain multiple experiments and/or cell types). If, after performing a comparison, you decide that additional data should be considered, you must upload a new file containing all of the data for the new comparison.


Scoring Permutations

GenieScore: Use to prioritize surface proteins that have disparate levels of abundance/expression.

IsoGenieScore: Use to prioritize surface proteins that have similar, high levels of abundance/expression.

OmniGenieScore: Use to prioritize any molecules (genes/proteins) that have disparate levels of abundance/expression.

IsoOmniGenieScore: Use to prioritize any molecules (genes/proteins) that have similar, high levels of abundance/expression.
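
The four permutations can be read as variations on one product of component scores (described under Data Processing below). The Python sketch that follows is one interpretation, not SurfaceGenie's source code. The "Omni" variants drop the SPC term, matching the option to analyze data without surface localization. Using 1 - D in place of the distribution score D for the "Iso" variants is purely an assumption made here to illustrate rewarding similar levels; the "high" part of "similar, high levels" comes from the signal strength term. The function name permutation_scores is hypothetical.

```python
def permutation_scores(spc: int, distribution: float, signal: float) -> dict[str, float]:
    """Combine precomputed component scores into the four permutations (sketch only)."""
    # ASSUMPTION: the "Iso" variants reward evenness via the complement of the
    # distribution score; SurfaceGenie's exact formula may differ.
    evenness = 1.0 - distribution
    return {
        "GenieScore": spc * distribution * signal,
        "IsoGenieScore": spc * evenness * signal,
        "OmniGenieScore": distribution * signal,  # SPC term dropped
        "IsoOmniGenieScore": evenness * signal,   # SPC term dropped
    }


# Illustrative inputs: SPC = 4, uneven distribution (0.75), signal strength 2.0
print(permutation_scores(4, 0.75, 2.0))
# {'GenieScore': 6.0, 'IsoGenieScore': 2.0, 'OmniGenieScore': 1.5, 'IsoOmniGenieScore': 0.5}
```

Because the Omni variants never look at SPC, they can rank any identifier type, not only UniProt accessions.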


Overview of Inputs and Outputs

Input:

SurfaceGenie accepts a .csv file containing a list of proteins (UniProt accession numbers) and a surrogate value representative of abundance (e.g. number of peptide spectrum matches, peak area, FPKM, RPKM) for each of a set of samples. There is no limit to the number of samples that can be analyzed in a single file. SurfaceGenie includes SPC datasets for human, mouse, and rat.

Data Processing:

SurfaceGenie calculates the product of three independent scores:

  1. Surface Protein Consensus (SPC) score
    A predictive measure of the likelihood that a particular protein can be present at the cell surface. This value is the number of predictive datasets in which a protein has been predicted to be localized to the cell surface, so scores range from 0 to 4. For more details on the predictive datasets used, see the References tab.
  2. Distribution Score
    A measure of how evenly or unevenly a protein is distributed among the samples within a comparison dataset. It is based on the Gini coefficient, a measure of statistical dispersion. Scores range from 0 to 1 - 1/N, where N is the number of samples.
  3. Signal Strength
    An approximate measure of protein abundance in the cell types in which a protein is observed. Proteins at the lower limit of detection are given lower priority than those with more observations, because more abundant proteins are expected to serve as more practical and accessible markers for downstream technologies. Scores typically range from 0 to about 4.
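
The three component scores and their product can be sketched in Python. Here spc_score counts positive calls from the predictive datasets (the predictor names are placeholders), gini uses one standard formulation of the Gini coefficient (sum of absolute differences over all sample pairs, divided by 2 * N^2 * mean), and signal_strength uses log10(1 + maximum abundance). That last formula is an assumption: this page describes signal strength only as a weighted value of the maximum. None of these function names come from SurfaceGenie.

```python
import math


def spc_score(predictions: dict[str, bool]) -> int:
    """Number of predictive datasets that call the protein surface-localized (0-4)."""
    return sum(1 for predicted_surface in predictions.values() if predicted_surface)


def gini(values: list[float]) -> float:
    """Gini coefficient from the mean absolute difference over all sample pairs."""
    n = len(values)
    mean = sum(values) / n
    if mean == 0:
        return 0.0  # no signal in any sample: nothing to be unevenly distributed
    pair_diffs = sum(abs(a - b) for a in values for b in values)
    return pair_diffs / (2 * n * n * mean)


def signal_strength(values: list[float]) -> float:
    """ASSUMPTION: log10(1 + max abundance); SurfaceGenie's weighting may differ."""
    return math.log10(1 + max(values))


def genie_score(predictions: dict[str, bool], values: list[float]) -> float:
    """Product of the three component scores."""
    return spc_score(predictions) * gini(values) * signal_strength(values)


# Placeholder predictor names; abundances could be e.g. PSM counts in 4 samples
calls = {"predictor_A": True, "predictor_B": True, "predictor_C": False, "predictor_D": True}
psms = [999.0, 0.0, 0.0, 0.0]
print(spc_score(calls), gini(psms), signal_strength(psms), genie_score(calls, psms))
```

With all signal in one of N samples, gini returns (N - 1)/N, which is the 1 - 1/N upper bound; identical values in every sample give 0.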

Output:

  • CSV Download
    Columns of selected data types (e.g. SPC score, CD molecule annotation) are appended to each entry in the original input file
  • Plots
    Scores from each of the 4 permutations are plotted in order of priority for all proteins within a dataset
  • SPC Histogram
    Displays the distribution of SPC scores
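
The SPC histogram is essentially a tally of how many proteins in a dataset fall into each SPC score from 0 to 4. A minimal text-only sketch of that tally follows; it is not SurfaceGenie's plotting code, and spc_histogram is a hypothetical name.

```python
from collections import Counter


def spc_histogram(spc_scores: list[int]) -> dict[int, int]:
    """Count proteins in each SPC bin (0-4), keeping empty bins."""
    counts = Counter(spc_scores)
    return {score: counts.get(score, 0) for score in range(5)}


def print_text_histogram(spc_scores: list[int]) -> None:
    """Print one row of '#' marks per SPC bin."""
    for score, count in spc_histogram(spc_scores).items():
        print(f"SPC {score}: {'#' * count} ({count})")


print_text_histogram([4, 4, 3, 0, 1, 4, 2, 0])
```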

SPC Score Lookup

This feature enables users to obtain Surface Protein Consensus (SPC) scores for proteins of interest without analyzing data through SurfaceGenie. Users may perform a batch retrieval by uploading a .csv file of UniProt accession numbers or search individual UniProt accession numbers.


Other Applications

Although the calculation of the SPC score depends on the use of UniProt accession IDs (for human, mouse, or rat), the other terms used here are agnostic to the type and distribution of data. Therefore, the OmniGenieScore and IsoOmniGenieScore can be used for any type of quantitative data in which the goal is to find measurements that are either unique to a sample or similar across all samples. This could include metabolomic data, glycomic data, or strain counts from microbiome studies.


For instructions on how to enter grouping information, please see the Sample Grouping section on the Home page.


Data Upload Instructions

Data Format

A .csv file containing a list of proteins (UniProt Accession) and a surrogate value representative of abundance (e.g. number of peptide spectrum matches, peak area) identified within a set of samples.

The first column of your data file must be labeled 'Accession' with no extra characters (e.g. not 'Accession #'). This column should contain the UniProt accession numbers of the proteins in your samples; isoforms may be included. To convert a different protein ID type to UniProt in bulk, use the UniProt ID mapping tool: under 'Select options', select your ID type in the 'From' field and then 'UniProt KB' in the 'To' field.

Importantly, data files must be in .csv format. If you are working in Excel, click 'File --> Save As' and select CSV in the file-type drop-down menu to convert from .xlsx to .csv.
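
A local pre-check of these format rules can catch a mislabeled header before upload. The Python sketch below is hypothetical (load_surfacegenie_csv is not part of SurfaceGenie); it also strips the UTF-8 byte-order mark that some spreadsheet exports prepend to the first cell, and it reads empty cells as 0, which is an assumption about how missing values should be treated.

```python
import csv
import io


def load_surfacegenie_csv(text: str) -> tuple[list[str], dict[str, list[float]]]:
    """Parse an upload and enforce the 'Accession' header rule (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    # Some spreadsheet exports prepend a UTF-8 byte-order mark to the first cell.
    first = header[0].lstrip("\ufeff") if header else ""
    if first != "Accession":
        raise ValueError(f"first column must be labeled 'Accession', got {first!r}")
    samples = header[1:]
    rows: dict[str, list[float]] = {}
    for record in reader:
        if not record:
            continue  # skip blank lines
        accession, *values = record
        # ASSUMPTION: empty cells mean "not observed" and are read as 0.
        rows[accession] = [float(v) if v.strip() else 0.0 for v in values]
    return samples, rows


samples, rows = load_surfacegenie_csv("Accession,Sample1,Sample2\nQ01650,12,0\nP08575,,7\n")
print(samples, rows)
```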



Sample Grouping

Ideally, similar samples such as technical or biological replicates should have their values averaged or summed into a single column before upload. However, SurfaceGenie will carry out this step for you if you select 'Group samples'. If this box is checked, you will need to provide the grouping method as well as the column numbers for each group. For example, if columns 2, 3, and 5 of your dataset should be grouped together and columns 4 and 6 make up another group, indicate the presence of 2 groups using the slider and then enter the corresponding column numbers, separated by commas: Group 1: '2, 3, 5', Group 2: '4, 6'. Remember that column 1 contains the accession numbers and cannot be grouped with other columns.
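
The grouping step can be sketched in Python to show how 1-based column numbers map onto a row and how the averaging or summing works. This is an illustration under stated assumptions, not SurfaceGenie's implementation: parse_group and group_columns are hypothetical names, and "mean"/"sum" stand in for the two grouping methods described above.

```python
def parse_group(text: str) -> list[int]:
    """Turn a group entry such as '2, 3, 5' into column numbers [2, 3, 5]."""
    return [int(token) for token in text.split(",") if token.strip()]


def group_columns(rows: list[list], groups: list[list[int]], method: str = "mean") -> list[list]:
    """Collapse replicate columns; column numbers are 1-based as in the upload file."""
    if method not in ("mean", "sum"):
        raise ValueError("method must be 'mean' or 'sum'")
    if any(1 in cols for cols in groups):
        raise ValueError("column 1 holds accession numbers and cannot be grouped")
    grouped = []
    for row in rows:
        merged = []
        for cols in groups:
            values = [float(row[c - 1]) for c in cols]  # 1-based -> 0-based index
            total = sum(values)
            merged.append(total / len(values) if method == "mean" else total)
        grouped.append([row[0]] + merged)  # keep the accession in front
    return grouped


# Example from the text: Group 1 = '2, 3, 5', Group 2 = '4, 6'
groups = [parse_group("2, 3, 5"), parse_group("4, 6")]
rows = [["Q01650", 1, 2, 3, 4, 6]]
print(group_columns(rows, groups, method="sum"))  # [['Q01650', 7.0, 9.0]]
```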


Data Export Options

CSV File

You may select data to export as columns appended to the right of your original data. The following variables are available for export:

GenieScore Components:

  • Surface Protein Consensus score (SPC) : A predictive measure of the likelihood that a particular protein can be present at the cell surface.
  • Distribution Score (Gini) : A measure of the distribution of the protein among samples. A higher value corresponds to a more localized distribution (see the Wikipedia article on the Gini coefficient).
  • Signal strength (SS) : A weighted value of the maximum value reported among samples for the protein
  • Genie Score (GS) : SurfaceGenie's measure for the value of a protein as a potential marker of interest.

Annotations/Linkouts:

  • CD molecules (CD) : Cluster of differentiation (CD) molecules are annotated with CD nomenclature. CD molecules have validated antibodies against them and are therefore attractive candidate markers for immunodetection-based applications.
  • HLA molecules (HLA) : Human leukocyte antigen (HLA) molecules are surface proteins with high sequence similarity to one another. As a result, it is often difficult to assign peptide-level evidence to a specific gene product, particularly in Cell Surface Capture experiments, so it may be useful to exclude these molecules when attempting to identify cell surface markers for a specific cell type.
  • Number of CSPA experiments (CSPA) : The number of cell types in which this protein was observed in the Cell Surface Protein Atlas. This information can provide context for how specific a protein might be among cell types.
  • UniProt Linkout : Link to the UniProt entry for input proteins providing effortless access to additional information about candidate markers.
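
With the CSV export, the advice on HLA molecules amounts to a filter applied before ranking candidates. The Python sketch below assumes the exported columns are headed 'GS' and 'HLA' (matching the abbreviations listed above; the actual header text in the download may differ) and that the HLA column is blank for non-HLA proteins. rank_candidates is a hypothetical helper.

```python
import csv
import io


def rank_candidates(csv_text: str, exclude_hla: bool = True) -> list[dict[str, str]]:
    """Sort exported rows by Genie Score (highest first), optionally dropping HLA molecules."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if exclude_hla:
        # ASSUMPTION: the 'HLA' column is empty unless the protein is an HLA molecule.
        # DictReader fills missing trailing fields with None, hence the "or ''".
        rows = [row for row in rows if not (row.get("HLA") or "").strip()]
    return sorted(rows, key=lambda row: float(row["GS"]), reverse=True)
```

Keeping the filter optional makes it easy to compare rankings with and without HLA molecules.
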
Plots

Several visualizations are made available by SurfaceGenie:

  • SPC Histogram : Shows the distribution of SPC scores.
  • Plots : Scores from each of the 4 permutations are plotted in order of priority for all proteins within a dataset.

Data Upload Instructions

Quick Lookup

Enter one or more UniProt accession numbers for your proteins of interest (e.g. Q01650), separated by commas. Isoform annotations (e.g. Q01650-1) can be included; however, the specific isoform will not be considered, because SPC scores are indexed by the parent protein accession number. Up to 100 proteins can be searched using this method.
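
The isoform rule and the 100-entry limit amount to simple input normalization, sketched here in Python. parse_quick_lookup is a hypothetical helper, and because this page does not say whether the limit is counted before or after isoforms are collapsed, the sketch counts entries as typed.

```python
def parse_quick_lookup(text: str, limit: int = 100) -> list[str]:
    """Split a comma-separated query into parent accessions (hypothetical helper)."""
    tokens = [token.strip() for token in text.split(",") if token.strip()]
    if len(tokens) > limit:
        raise ValueError(f"at most {limit} accessions per quick lookup")
    # SPC scores are indexed by parent accession, so 'Q01650-1' becomes 'Q01650'.
    parents = [token.split("-")[0] for token in tokens]
    return list(dict.fromkeys(parents))  # drop duplicates, keep input order


print(parse_quick_lookup("Q01650, Q01650-1,P08575"))  # ['Q01650', 'P08575']
```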

If your identifiers are of a type other than UniProt (e.g. Ensembl gene, UniGene), they can be converted with the UniProt ID mapping tool. Under 'Select options', select your ID type in the 'From' field and then 'UniProt KB' in the 'To' field.

Bulk Lookup

Upload a .csv file containing a single column of UniProt accession numbers, with the header labeled 'Accession'. Do not include extra characters in the header (e.g. not 'Accession #').

Bulk conversion from a different protein ID type to UniProt is available through the UniProt ID mapping tool. Under 'Select options', select your ID type in the 'From' field and then 'UniProt KB' in the 'To' field.

With this method, your uploaded file is returned as a downloadable .csv file with a column of SPC scores appended.
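
The output step of the bulk lookup can be sketched in Python: read the uploaded file, map each accession (isoform suffix removed) to an SPC score, and write the same rows back with one extra column. The spc_table dictionary stands in for SurfaceGenie's reference data, and the value used below is illustrative, not a real SPC score. The column name 'SPC' and the blank cell for unmatched accessions are also assumptions.

```python
import csv
import io


def append_spc_column(csv_text: str, spc_table: dict[str, int]) -> str:
    """Return the upload with an 'SPC' column appended (blank when no score is known)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])
    if not header or header[0] != "Accession":
        raise ValueError("header must be exactly 'Accession'")
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header + ["SPC"])
    for record in reader:
        if not record:
            continue  # skip blank lines
        parent = record[0].split("-")[0]  # isoforms share the parent's SPC score
        writer.writerow(record + [spc_table.get(parent, "")])
    return out.getvalue()


# Illustrative table value; P08575 is simply absent from this toy table.
print(append_spc_column("Accession\nQ01650-1\nP08575\n", {"Q01650": 4}))
```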



Contact Us!

If you have questions or suggestions for additional features, please contact us by email:

rebekah.gundry at unmc.edu

Additional cell surface-related information and tools can be found at our growing website:

www.cellsurfer.net


How to reference SurfaceGenie

If you use any of the SurfaceGenie tools in your work, please cite the original manuscript:

Waas M, Snarrenberg ST, Littrell J, Jones Lipinski RA, Hansen PA, Corbett JA, Gundry RL. SurfaceGenie: A web-based application for prioritizing cell-type specific marker candidates. https://doi.org/10.1101/575969


Publications that cite SurfaceGenie

Coming Soon!


Publications that support the SPC Score

  1. Bausch-Fluck D, et al. (2018) The in silico human surfaceome. Proc Natl Acad Sci U S A 115(46):E10988-E10997.
  2. da Cunha JP, et al. (2009) Bioinformatics construction of the human cell surfaceome. Proc Natl Acad Sci U S A 106(39):16752-16757.
  3. Town J, et al. (2016) Exploring the surfaceome of Ewing sarcoma identifies a new and unique therapeutic target. Proc Natl Acad Sci U S A 113(13):3603-3608.
  4. Diaz-Ramos MC, Engel P, & Bastos R (2011) Towards a comprehensive human cell-surface immunome database. Immunol Lett 134(2):183-187.
