Welcome to SurfaceGenie!

Integrating predictive and empirical data for rational marker prioritization

SurfaceGenie is a tool for analyzing proteomic datasets to identify proteins of interest for immunophenotyping, immunotherapy, drug targeting, and other applications. It prioritizes proteins by how likely each is to be informative for distinguishing among sample groups (e.g. cell types, experimental conditions). SurfaceGenie generates a score for each protein based on how likely it is to be found on the cell surface, the number of samples in which it is observed within a comparison set, and the magnitude of the measurement variable (e.g. relative abundance). While a major benefit of SurfaceGenie is the ability to prioritize molecules that are localized to the cell surface, data can also be analyzed without this parameter to find proteins of interest that reside in other subcellular locations. SurfaceGenie works well both with approaches that specifically identify cell surface proteins (e.g. Cell Surface Capture) and with more generic approaches (e.g. analyses of whole cell lysate). The SurfaceGenie score is context-dependent: the tool considers all data within a single input dataset (which may contain multiple experiments and/or cell types). If a user performs a comparison and subsequently determines that additional data should be considered, a new file containing all data for the new comparison is required.

If you use SurfaceGenie in your research, please cite the article:

{ADD Reference and PUBMED LINK HERE}


SurfaceGenie Web Tools

SurfaceGenie

Input:

SurfaceGenie accepts a .csv file containing a list of proteins (UniProt Accession) and a surrogate value representative of abundance (e.g. number of peptide spectrum matches, peak area) identified within a set of samples. There is no limit to the number of samples that can be analyzed in a single file.

Data Processing:

SurfaceGenie calculates the Genie Score as the product of three independent scores:

  1. Surface Protein Consensus (SPC) score
    A predictive measure of the likelihood that a particular protein can be present at the cell surface. This value is the number of predictive datasets in which a protein has been predicted to be localized to the cell surface. Scores range from 0 to 4. For more details on the predictive datasets used, click here.
  2. Distribution Score
    A measure of how evenly or unevenly distributed a protein is among multiple samples within a comparison dataset. It is based on the Gini coefficient, a measure of the statistical dispersion of values. Scores range from 0 to 1 - 1/N, where N is the number of samples.
  3. Signal Strength
    An approximate measure of protein abundance for cell types in which a protein is observed. Proteins at the lower limit of detection are of lower priority than those with more observations, because proteins of higher abundance are expected to serve as more accessible markers for downstream technologies. Scores typically range from about 0 to 4.
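
The Distribution Score in item 2 is described as being based on the Gini coefficient. The sketch below shows the textbook Gini coefficient in Python to illustrate why the maximum is 1 - 1/N. It is not SurfaceGenie's own code; the function name `gini` and the sample values are assumptions made for illustration, and the tool may apply additional scaling.

```python
from typing import Sequence


def gini(values: Sequence[float]) -> float:
    """Textbook Gini coefficient of non-negative values.

    0 means the protein is spread evenly across samples; the maximum,
    1 - 1/N, is reached when it is observed in exactly one of N samples.
    """
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Sum of absolute differences over all ordered pairs of samples.
    abs_diff_sum = sum(abs(a - b) for a in values for b in values)
    # Equivalent to abs_diff_sum / (2 * n**2 * mean), since n * mean == total.
    return abs_diff_sum / (2 * n * total)


if __name__ == "__main__":
    # Illustrative (made-up) abundance values across four samples.
    print(gini([5, 5, 5, 5]))   # evenly distributed
    print(gini([0, 0, 12, 0]))  # observed in a single sample
```

With four samples, a protein observed in only one of them reaches the maximum of 1 - 1/4 = 0.75, while an evenly distributed protein scores 0.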

SPC Score Lookup

This feature enables users to obtain Surface Protein Consensus (SPC) scores for proteins of interest without analyzing data through SurfaceGenie. Users may perform a batch retrieval by uploading a .csv file containing UniProt accession numbers or may search for individual UniProt accession numbers.

Data Input
Scoring Options

Species

Processing Option
Markers for Specific Sample
Sample Grouping

*Please see Sample Grouping section on the Home page for instructions on how to enter grouping information.


Export Options (for CSV Download Tab)

Data Upload Instructions

Data Format

A .csv file containing a list of proteins (UniProt Accession) and a surrogate value representative of abundance (e.g. number of peptide spectrum matches, peak area) identified within a set of samples.

The first column of your data file must be labeled 'Accession' with no extra characters (e.g. not 'Accession #'). This column should contain the UniProt accession numbers of the proteins in your samples. You may include isoforms. To convert from a different protein ID type to UniProt, bulk conversion is available here. Under 'Select options', select your ID type in the 'From' field and then 'UniProt KB' in the 'To' field.

Additionally, data files must be in .csv format. If you are working in Excel, click 'File --> Save As' and select CSV in the drop-down menu to convert from .xlsx to .csv.
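
Before uploading, it can help to check a file against the format rules above. The following Python sketch is one way to do that locally. It is not part of SurfaceGenie: the function name `check_input_csv` is hypothetical, the sample columns ('d00', 'd05') and values are made up, and skipping empty cells is an assumption about how missing values might be represented.

```python
import csv
import io


def check_input_csv(text: str) -> list:
    """Return a list of problems found in CSV text; an empty list means none."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows or not rows[0]:
        return ["file is empty or has no header row"]

    problems = []
    header = rows[0]
    # The first column must be labeled exactly 'Accession'.
    if header[0] != "Accession":
        problems.append(
            f"first column must be 'Accession', found {header[0]!r}"
        )
    if len(header) < 2:
        problems.append("no sample columns found after 'Accession'")

    # Every sample value should be numeric (e.g. PSM count or peak area).
    for line_no, row in enumerate(rows[1:], start=2):
        for cell in row[1:]:
            if not cell.strip():
                continue  # assumption: empty cells are tolerated
            try:
                float(cell)
            except ValueError:
                problems.append(f"line {line_no}: non-numeric value {cell!r}")
    return problems


if __name__ == "__main__":
    example = "Accession,d00,d05\nQ01650,12,0\nP08195,3,7\n"
    print(check_input_csv(example))
```

Note that a file exported from Excel with a byte order mark may fail the header check even though it looks correct in a spreadsheet; opening it with `encoding="utf-8-sig"` before passing the text in avoids that.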

Example Data

Data Processing Options

Surface Protein Consensus (SPC) Score Consideration

If you are interested in finding cell surface markers, you will want to consider the SPC score when calculating the Genie Score. This is the default setting. If you wish to ignore the SPC score when generating Genie Scores, uncheck this option; the SPC score will then be set to 1 for all proteins and will not influence the Genie Score. You can confirm this in the 'CSV' tab and then uncheck 'SPC' in the export options to remove this column from the download file. This feature is designed to enable identification of molecules that differ among cell types but may be localized inside the cell.
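
The effect of this option can be sketched in Python as follows. This assumes the Genie Score is a plain product of the three component scores, as the Data Processing description suggests; the actual tool may weight the terms differently. The function name `genie_score` and its parameters are hypothetical.

```python
def genie_score(
    spc: float,
    distribution: float,
    signal_strength: float,
    consider_spc: bool = True,
) -> float:
    """Combine the three component scores into a single Genie Score.

    Assumption: a simple product. When consider_spc is False, the SPC
    term is replaced by 1 so that it has no influence on the result.
    """
    effective_spc = spc if consider_spc else 1
    return effective_spc * distribution * signal_strength


if __name__ == "__main__":
    # Made-up component scores for a protein with no surface predictions.
    print(genie_score(0, 0.5, 2.0))                      # SPC considered
    print(genie_score(0, 0.5, 2.0, consider_spc=False))  # SPC ignored
```

Under this assumption, a protein with an SPC score of 0 always scores 0 when SPC is considered, which is why the option must be unchecked to surface intracellular candidates.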

HLA Molecule Exclusion

Human leukocyte antigen (HLA) molecules are found on the surface of most cell types. Because of the high sequence similarity among these proteins (e.g. HLA-A3 vs. HLA-A30), it is often challenging to be certain of the specific gene product based solely on peptide-level evidence. As a result, it may be useful to exclude them from consideration when attempting to identify cell surface markers for a specific cell type.

Find Markers for a Specific Sample

If you are interested in identifying markers that are present in a specific sample (e.g. a positive selection marker for a cell type or experimental condition), SurfaceGenie can exclude proteins that are not observed in that sample. To do this, select the option “Find markers for specific sample”. A text box will then appear. In the text box, enter the sample name of interest and make sure it exactly matches the name in the file header (e.g. 'd00' for the example dataset). If you have also chosen to have SurfaceGenie group your samples (see 'Sample Grouping' below), you may instead indicate a group (e.g. 'Group 1').
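
As a rough illustration of this filter, the Python sketch below keeps only proteins with a non-zero value in the chosen sample column. It assumes that "not observed" means a value of 0 or an empty cell, which the text does not state explicitly. The function name `markers_for_sample` and the sample values are hypothetical.

```python
def markers_for_sample(rows: list, sample: str) -> list:
    """Keep rows (dicts keyed by column header) observed in `sample`.

    Assumption: a protein counts as observed when its value in the
    sample column is greater than 0; empty cells count as not observed.
    """
    kept = []
    for row in rows:
        # The name must match the file header exactly, including case.
        if sample not in row:
            raise KeyError(f"sample {sample!r} not found in file header")
        value = row[sample].strip()
        if value and float(value) > 0:
            kept.append(row)
    return kept


if __name__ == "__main__":
    # Made-up rows in the shape produced by csv.DictReader.
    data = [
        {"Accession": "Q01650", "d00": "12", "d05": "0"},
        {"Accession": "P08195", "d00": "0", "d05": "7"},
    ]
    print([row["Accession"] for row in markers_for_sample(data, "d00")])
```

Raising an error on an unknown sample name mirrors the requirement that the entry match the header exactly, so that a typo such as 'D00' is caught rather than silently returning no proteins.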

Sample Grouping

Ideally, similar samples such as technical or biological replicates will have their values averaged or summed into a single column. However, SurfaceGenie will carry out this step for you if you select 'Group samples'. If this box is checked, you will need to provide the grouping method as well as the column numbers for each group. For example, if columns 2, 3, and 5 of your dataset should be grouped together and columns 4 and 6 comprise another group, indicate the presence of 2 groups using the slider and then enter the corresponding column numbers below, separated by commas: Group 1: '2, 3, 5', Group 2: '4, 6'. Remember that column 1 contains accession numbers and cannot be grouped with other columns.
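
The grouping step can be sketched in Python as shown below, using the same 1-based column numbers as the example above. This is an illustration under stated assumptions, not SurfaceGenie's implementation: the function names `parse_columns` and `group_row`, the method labels "average" and "sum", and the data values are all hypothetical.

```python
from statistics import mean


def parse_columns(spec: str) -> list:
    """Turn a string such as '2, 3, 5' into 1-based column numbers."""
    columns = [int(part) for part in spec.split(",") if part.strip()]
    if any(col < 2 for col in columns):
        raise ValueError("column 1 holds accessions and cannot be grouped")
    return columns


def group_row(row: list, groups: dict, method: str = "average") -> list:
    """Collapse one data row into [accession, group 1 value, group 2 value, ...]."""
    if method not in ("average", "sum"):
        raise ValueError("method must be 'average' or 'sum'")
    combine = mean if method == "average" else sum
    grouped = [row[0]]
    for spec in groups.values():
        # Convert 1-based column numbers to 0-based list indices.
        values = [float(row[col - 1]) for col in parse_columns(spec)]
        grouped.append(combine(values))
    return grouped


if __name__ == "__main__":
    # Made-up row: accession in column 1, samples in columns 2 to 6.
    row = ["Q01650", "2", "4", "10", "6", "20"]
    groups = {"Group 1": "2, 3, 5", "Group 2": "4, 6"}
    print(group_row(row, groups))
    print(group_row(row, groups, method="sum"))
```

Here Group 1 combines the values 2, 4, and 6 (columns 2, 3, and 5) and Group 2 combines 10 and 20 (columns 4 and 6).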


Data Export Options

Plots

Several visualizations are made available by SurfaceGenie:

  • SurfaceGenie Plot: SurfaceGenie scores plotted in order of priority for all proteins in a dataset.
  • SPC Histogram: Shows the distribution of SPC scores.
  • Clustered Heatmap: Visualize the relationship among samples within a dataset based on the relative abundance measurement contained in the .csv file.
  • Distribution Score: Shows the distribution of Genie Scores.

CSV File

You may select data to export as columns appended to the right of your original data. The following variables are available for export:

  • Surface Protein Consensus score (SPC): A predictive measure of the likelihood that a particular protein can be present at the cell surface.
  • Distribution Score (Gini): A measure of the distribution of the protein among samples. A higher value corresponds to a more localized distribution. See Wikipedia: Gini coefficient.
  • Signal strength (SS): A weighted value of the maximum value reported among samples for the protein.
  • Genie Score (GS): SurfaceGenie's measure for the value of a protein as a potential marker of interest.
  • CD molecules (CD): Cluster of differentiation (CD) molecules.
  • Number of CSPA experiments (CSPA-NE): ---
  • UniProt Linkout: Link to the UniProt entry for information on the protein.

Quick Lookup

Bulk Lookup

Data Upload Instructions

Quick Lookup

Enter one or more UniProt accession numbers for your proteins of interest (e.g. Q01650). Isoform annotations (e.g. Q01650-1) can be included; however, the specific isoform will not be considered, because SPC scores are indexed by parent protein accession number.

If your identifiers are in a form other than UniProt (e.g. ENSEMBL gene, UniGene), a conversion tool is available here. Under 'Select options', select your ID type in the 'From' field and then 'UniProt KB' in the 'To' field. Up to 100 proteins, separated by commas, can be searched using this method.
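
The handling of a Quick Lookup query described above (comma-separated entries, isoform suffixes ignored, at most 100 proteins) might look like the following Python sketch. The function names `parent_accession` and `parse_quick_lookup` are hypothetical, and rejecting an over-long query with an error is an assumption; the tool may respond differently.

```python
def parent_accession(accession: str) -> str:
    """Drop an isoform suffix, e.g. 'Q01650-1' becomes 'Q01650'."""
    return accession.strip().split("-", 1)[0]


def parse_quick_lookup(text: str, limit: int = 100) -> list:
    """Split a comma-separated query into parent accession numbers."""
    accessions = [
        parent_accession(part) for part in text.split(",") if part.strip()
    ]
    if len(accessions) > limit:
        raise ValueError(
            f"{len(accessions)} proteins entered; the limit is {limit}"
        )
    return accessions


if __name__ == "__main__":
    print(parse_quick_lookup("Q01650-1, P08195"))
```

Because scores are indexed by parent accession, 'Q01650' and 'Q01650-1' resolve to the same entry in this sketch.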

Bulk Lookup

Upload a .csv file containing a single column of UniProt accession numbers, with the header labeled “Accession”. Do not include extra characters in the header (e.g. not 'Accession #').

Bulk conversion from a different protein ID type to UniProt is available here. Under 'Select options', select your ID type in the 'From' field and then 'UniProt KB' in the 'To' field.

With this method, your uploaded file is returned as a downloadable file that includes an added column of SPC scores.
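
Conceptually, the bulk lookup appends one column to the uploaded file. The Python sketch below illustrates the idea with a stand-in lookup table. Everything here is an assumption made for illustration: the function name `append_spc`, the output column name 'SPC', the blank left for proteins missing from the table, and the score in the example, which is a placeholder and not a real SPC score.

```python
import csv
import io


def append_spc(csv_text: str, spc_table: dict) -> str:
    """Return CSV text with an 'SPC' column added next to 'Accession'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Accession", "SPC"])
    for row in reader:
        accession = row["Accession"].strip()
        # Scores are indexed by parent accession, so drop isoform suffixes.
        parent = accession.split("-", 1)[0]
        # Assumption: proteins absent from the table get an empty cell.
        writer.writerow([accession, spc_table.get(parent, "")])
    return out.getvalue()


if __name__ == "__main__":
    placeholder_table = {"Q01650": 4}  # placeholder value, not a real score
    print(append_spc("Accession\nQ01650-1\nP08195\n", placeholder_table))
```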



Contact Us!

If you have questions or suggestions for additional features, please contact us by email:

rgundry at mcw.edu

Additional cell surface-related information and tools can be found at our growing website:

www.cellsurfer.net


How to reference SurfaceGenie

Latest Version

www.cellsurfer.net

Previous Versions

www.cellsurfer.net


Papers referencing SurfaceGenie

Other Cites Go Here

www.cellsurfer.net