PSI-Causal_Interaction TAB 1.0 Format

Introduction

The PSI causal_Interaction format TAB is inspired by the PSI-MI 2.7 standard (1) and is suited to capturing causal interactions among biological entities. It describes only binary interactions, one pair of interactors per row. Columns are separated by tab characters.

(1) http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&pubmedid=17925023

Column definitions

The column contents should be as follows:

Unique identifier for interactor A, represented as databaseName:identifier, where databaseName is the name of the corresponding database as defined in the PSI-MI controlled vocabulary*, and identifier is the unique primary identifier of the molecule in the database. Even though identifiers from multiple databases can be separated by "|", it is recommended to give only one identifier in this column. It is recommended that proteins be identified by stable identifiers such as their UniProtKB or RefSeq accession number. Small molecules or drugs should have ChEBI or PubChem identifiers. Nucleic acids should have embl/ddbj/genbank identifiers. Genes should have entrez gene/locuslink, ensembl, or ensemblGenome identifiers. Protein complexes should have a Complex Portal ID (e.g. complex_id:EBI-1224506). Molecule sets must be defined and should have a reference ID (e.g. SIGNOR_ID: SIGNOR-PF1). Phenotypes should have an EFO ID, an EDAM ID, or a GO xref. Stimuli should have an EFO ID or an EDAM ID. Biological processes should have a GO xref. MANDATORY.

Unique identifier for interactor B. MANDATORY.

Alternative identifier(s) for interactor A, represented as databaseName:ac, where databaseName is the name of the corresponding database as defined in the PSI-MI controlled vocabulary*, and ac is the primary identifier of the molecule in the database. Multiple identifiers are separated by "|". It is recommended to give only database identifiers in this column.
Other cross references for interactor A, such as GO xrefs, should be moved to the column 'Interactor xrefs A', and interactor names such as gene names should be moved to the column 'Alias A'. Ex: refseq:NP_001013128|ensembl:ENSRNOP00000012946. NOT MANDATORY.

Alternative identifier(s) for interactor B. NOT MANDATORY.

Alias(es) for interactor A, separated by "|". Represented as databaseName:name(alias type), where databaseName is the name of the corresponding database as defined in the PSI-MI controlled vocabulary*, name is the alias name, and alias type is the name of the corresponding alias type as defined in the PSI-MI controlled vocabulary*. In the absence of a databaseName, 'unknown' can be used. In parentheses, 'display_short' and 'display_long' are used to describe which name can be used for network display. Ex: uniprotkb:Tf(gene name)|uniprotkb:Serotransferrin(recommended name)|uniprotkb:Tf(display_short). NOT MANDATORY.

Alias(es) for interactor B. NOT MANDATORY.

Biological Effect interactor A. NOT MANDATORY.

Biological Effect interactor B. NOT MANDATORY.

Interaction detection method(s), taken from the corresponding PSI-MI controlled vocabulary*, and represented as databaseName:identifier(methodName), separated by "|". As the detection methods are taken from the PSI-MI ontology, the database name is 'psi-mi'. NOT MANDATORY.

Publication 1st author(s): surname(s) of the first author(s) of the publication(s), followed by 'et al.' and the publication year in parentheses, e.g. Ciferri et al.(2005). Multiple entries are separated by "|". MANDATORY.

Publication identifier(s): identifier(s) of the publication in which this interaction has been shown. The database name is taken from the PSI-MI controlled vocabulary, and each identifier is represented as databaseName:identifier. Multiple identifiers are separated by "|". It is recommended to give one PubMed ID per MITAB line. MANDATORY.
Causal regulatory mechanism, taken from the corresponding PSI-MI controlled vocabulary*, and represented as databaseName:identifier(interactionType), separated by "|". NOT MANDATORY.

Source database(s) and identifiers, taken from the corresponding PSI-MI controlled vocabulary*, and represented as databaseName:identifier(sourceName). As the source databases are taken from the PSI-MI ontology, the database name is 'psi-mi'. Multiple source databases can be separated by "|". When the interaction has been imported and reported by different sources, it is recommended to give the original source plus the source that currently reports the interaction. Ex: psi-mi:"MI:0469"(intact)|psi-mi:"MI:0923"(irefindex). MANDATORY.

NCBI Taxonomy identifier for interactor A, represented as taxid:identifier(organismName), where identifier is the taxon ID of the organism and organismName can be either the common name or the scientific name. Even though multiple identifiers can be separated by "|", it is recommended to have one organism per interactor per MITAB line. If both the scientific name and the common name are given, they should be represented as: taxid:id1(common name1)|taxid:id1(scientific name1). Note: Currently no taxonomy identifiers other than NCBI taxid are anticipated, apart from the use of -2 to indicate "chemical synthesis" and -3 to indicate "unknown". MANDATORY.

NCBI Taxonomy identifier for interactor B. MANDATORY.

Interaction identifier(s) in the corresponding source database, represented as databaseName:identifier. It is recommended to always give a unique identifier per interaction (binary and n-ary). MANDATORY.

Confidence value(s), denoted as scoreType:value, where scoreType is taken from the corresponding PSI-MI controlled vocabulary*. Multiple scores are separated by "|". Ex: author-score:0.60|author-score:high|intact-miscore:0.36784992. NOT MANDATORY.

Complex expansion.
TO BE REMOVED

Biological role A, taken from the corresponding PSI-MI controlled vocabulary*, and represented as databaseName:identifier(BiologicalRoleName), separated by "|". As the biological roles are taken from the PSI-MI ontology, the database name is 'psi-mi'. It is recommended to assign Biological role A to the modulator entity. MANDATORY.

Biological role B, taken from the corresponding PSI-MI controlled vocabulary*, and represented as databaseName:identifier(BiologicalRoleName), separated by "|". As the biological roles are taken from the PSI-MI ontology, the database name is 'psi-mi'. It is recommended to assign Biological role B to the modulated entity. MANDATORY.

Experimental role A, taken from the corresponding PSI-MI controlled vocabulary*, and represented as databaseName:identifier(experimental role name), separated by "|". As the experimental roles are taken from the PSI-MI ontology, the database name is 'psi-mi'. NOT MANDATORY.

Experimental role B. NOT MANDATORY.

Type Interactor A, taken from the corresponding PSI-MI controlled vocabulary*, and represented as databaseName:identifier(InteractorTypeName), separated by "|". As the interactor types are taken from the PSI-MI ontology, the database name is 'psi-mi'. MANDATORY.

Type Interactor B. MANDATORY.

Xref for interactor A, represented as databaseName:ac(text), where databaseName is the name of the corresponding database as defined in the PSI-MI controlled vocabulary*, and ac is the primary accession of the molecule in the database. For example, associated Gene Ontology cross references. The text can be used to describe the qualifier type of the cross reference (see the corresponding PSI-MI controlled vocabulary*) or to give the name of the GO term in the case of cross references to ontology databases. Multiple cross references are separated by "|". This column is intended to add information describing interactor A; it cannot be used to identify interactor A.
If some sequence database accessions are ambiguous (e.g. UniProt secondary accessions that are shared between different UniProt entries and so cannot be used as identifiers of interactor A), they can be reported in this column. Ex: go:"GO:0003824"(catalytic activity). NOT MANDATORY.

Xref for interactor B. NOT MANDATORY.

Xref for the interaction, represented as databaseName:ac(text), where databaseName is the name of the corresponding database as defined in the PSI-MI controlled vocabulary*, and ac is the primary accession in the database. For example, associated Gene Ontology cross references (components, etc.) or OMIM cross references. Multiple cross references are separated by "|". The text can be used to describe the qualifier type of the cross reference (see the corresponding PSI-MI controlled vocabulary*) or to give the name of the GO term in the case of cross references to ontology databases. Ex: go:"GO:0005643"(nuclear pore). NOT MANDATORY.

Annotations for interactor A, represented as topic:"text", where topic is the name of the topic as defined in the PSI-MI controlled vocabulary* and text is free text associated with the topic (line breaks and other MITAB reserved characters should be properly escaped, replaced and/or removed). For example, a comment about this interactor: comment:"sequence not available in uniprotKb". The text is optional and a topic alone can be given, e.g. anti-bacterial. Multiple annotations are separated by "|". NOT MANDATORY.

Annotations for interactor B. NOT MANDATORY.

Annotations for the interaction, represented as topic:"text", where topic is the name of the topic as defined in the PSI-MI controlled vocabulary* and text is free text associated with the topic (line breaks and other MITAB reserved characters should be properly escaped, replaced and/or removed).
For example, a causal statement: sentence:"We hypothesize that phosphorylation of ser523 in jak2 by erks 1 and/or 2 or other as-yet-unidentified kinases acts in a negative feedback manner." The text is optional and a topic alone can be given. This column can also be used for tagging interactions; in that case, topics for tags are also defined in the PSI-MI controlled vocabulary*, except for the complex expansion tags, which have their own column. Ex: internally-curated. Multiple annotations are separated by "|". NOT MANDATORY.

NCBI Taxonomy identifier for the host organism, represented as taxid:identifier(organism name), where identifier is the taxon ID of the organism and organism name can be either the common name or the scientific name. Multiple identifiers can be separated by "|". Cells and tissues cannot be described in this column. If both the scientific name and the common name are given, they should be represented as: taxid:id1(common name1)|taxid:id1(scientific name1). Note: Currently no taxonomy identifiers other than NCBI taxid are anticipated, apart from the use of -1 to indicate "in vitro", -2 to indicate "chemical synthesis", -3 to indicate "unknown", -4 to indicate "in vivo" and -5 to indicate "in silico". NOT MANDATORY.

Parameters of the interaction, for example kinetics. Represented as type:value(text). The type can be taken from the corresponding PSI-MI controlled vocabulary*. Multiple parameters are separated by "|". Ex: kd:"2.34x1/100000". NOT MANDATORY.

Creation date: the date when the curation of the publication started. Represented as yyyy/mm/dd. NOT MANDATORY.

Update date: the date when the interaction was last updated. Represented as yyyy/mm/dd. Ex: 2011/12/13. NOT MANDATORY.

Checksum for interactor A, for instance the ROGID of the interactor, which takes into consideration both the sequence and the organism of the interactor. Represented as methodName:checksum, where methodName is the name of the method used to create the checksum.
It is recommended to give the ROGID and CROGID for proteins and the standard InChI key for small molecules. Ex: rogid:UcdngwpTSS6hG/pvQGgpp40u67I9606|crogid:UcdngwpTSS6hG/pvQGgpp40u67I9606. TO BE REMOVED

Checksum for interactor B. TO BE REMOVED

Checksum for interaction, for instance the RIGID of the interaction. Represented as methodName:checksum, where methodName is the name of the method used to create the checksum. Ex: rigid:"+++94o2VtVJcuk6jD3H2JZXaVYc". TO BE REMOVED

Negative: Boolean value to distinguish positive interactions (false) from negative interactions (true). By default, if the column is empty ('-'), the negative value is considered to be false (positive interaction). Ex: true. TO BE REMOVED

Feature(s) for interactor A: describes features for participant A such as binding sites, PTMs, tags, etc. Represented as feature_type:range(text), where feature_type is the feature type as described in the PSI-MI controlled vocabulary*. For PTMs, the MI ontology terms are obsolete and the PSI-MOD ontology* should be used instead. The text can be used for feature type names, feature names, InterPro cross references, etc. For instance: sufficient to bind:27-195,201-133 (IPR000785). The following characters are allowed when describing a range position: '?' (undetermined position), 'n' (n-terminal range), 'c' (c-terminal range), '>x' (greater than x), '<x' (less than x), 'x1..x2' (fuzzy range position, e.g. 5..5-9..10). The character '-' is used to separate start position(s) from end position(s). Multiple features are separated by '|'. Multiple ranges per feature are separated by ','. However, it is not possible to represent linked features/ranges. Ex: gst tag:n-n(n-terminal region)|sufficient to bind:23-45 or binding site:23..24-46,33-33. NOT MANDATORY.

Feature(s) for interactor B. NOT MANDATORY.

Stoichiometry for interactor A: a numerical value describing the number of instances of the molecule participating in the interaction.
If no stoichiometry is available for interactor A, the column should be empty ('-'); otherwise a positive integer value should be given. Several specific cases should be taken into consideration. In the case of auto-catalysis, only one interactor is given and the stoichiometry should be 1. In the case of homodimers, homotrimers, etc., the stoichiometry of one interactor should be 0 and the stoichiometry of the other should be a valid positive integer. In the case of a homo-oligomer, the stoichiometry of both interactors should be 0. Example: for self interactors, e.g. a kinase occluding its kinase domain by an internal phospho-tyrosine/SH2 domain interaction, only the Interactor A column will show the molecule accession number, with the stoichiometry 1. The columns for Interactor B will be empty. Ex: 4. NOT MANDATORY.

Stoichiometry for interactor B. NOT MANDATORY.

Participant identification method for interactor A: taken from the corresponding PSI-MI controlled vocabulary*, and represented as databaseName:identifier(methodName), separated by "|". As the identification methods are taken from the PSI-MI ontology, the database name is 'psi-mi'. The participant identification method is recommended by MIMIx, so it is recommended to always give this information. Ex: psi-mi:"MI:0102"(sequence tag identification). NOT MANDATORY.

Participant identification method for interactor B. NOT MANDATORY.

Causal statement, taken from the corresponding PSI-MI controlled vocabulary*, and represented as databaseName:identifier(interactionType), separated by "|". MANDATORY.

Direct effect in other interactions: taken from the PSI-MI controlled vocabulary "cooperative interaction" (MI:1149). NOT MANDATORY.

Empty columns should be represented with '-' to keep track of the columns.
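
As an illustration of the row layout described above (tab-separated columns, with '-' marking an empty column), here is a minimal Python sketch. The function name split_row and the identifiers in the sample row are hypothetical and not part of the format. The sketch assumes that no column contains a quoted tab character, so a plain split on tabs is sufficient; the shortened four-column row is for illustration only and is not a complete row.

```python
from typing import List, Optional


def split_row(line: str) -> List[Optional[str]]:
    """Split one row into its columns.

    Assumption: no value contains a quoted tab, so splitting on every
    tab character is safe. A column holding only '-' is treated as
    empty and returned as None, which keeps column positions intact.
    """
    columns = line.rstrip("\r\n").split("\t")
    return [None if column == "-" else column for column in columns]


# Hypothetical, shortened row: identifier A, identifier B,
# empty alternative identifiers A, alternative identifiers B.
row = "uniprotkb:P12345\tuniprotkb:Q67890\t-\trefseq:NP_001013128|ensembl:ENSRNOP00000012946\n"
columns = split_row(row)
print(len(columns))   # 4
print(columns[2])     # None
```

Returning None for '-' rather than dropping the column preserves the position of every later column, which is the stated purpose of the '-' placeholder.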
Syntax

Columns are normally formed by fields delimited by "|", with a structure like this one:

<XREF>:<VALUE>(<DESCRIPTION>)

Due to the unsafe use of reserved characters in the values, we have recently added the possibility to surround <XREF>, <VALUE> or <DESCRIPTION> with quotes if they contain a special symbol. In MI-TAB, the reserved characters are:

| ( ) : \t (tabulation)

Whenever one of these occurs in your data, surround the value with double quotes:

"<XREF>":"<VALUE>"("<DESCRIPTION>")

Note that the quotes go before and after each part. The escaped data should look like the following examples:

psi-mi:"MI:0000"(a cv term)
psi-mi:"MI:0000"("I can now use braces ()()() or pipes ||| here and ::colons::")

If you want to use a quote within a quote, escape it:

uniprotkb:P12345("a \"nice\" protein")
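
The quoting rules above can be sketched as a small hand-written parser. This is an illustrative Python sketch rather than a reference implementation: the names Field, parse_column and _read_part are hypothetical, and the sketch assumes that a backslash is only used to escape a double quote inside a quoted part. It splits a column on unquoted "|" and reads up to three parts per field.

```python
from typing import List, NamedTuple, Optional, Tuple


class Field(NamedTuple):
    xref: str
    value: Optional[str]
    description: Optional[str]


def _read_part(text: str, pos: int, stops: str) -> Tuple[str, int]:
    """Read one part starting at pos; return (part, position after it)."""
    if pos < len(text) and text[pos] == '"':
        out = []
        pos += 1
        while pos < len(text):
            ch = text[pos]
            if ch == "\\" and pos + 1 < len(text) and text[pos + 1] == '"':
                out.append('"')          # escaped quote inside quotes
                pos += 2
            elif ch == '"':
                return "".join(out), pos + 1
            else:
                out.append(ch)           # reserved characters are literal here
                pos += 1
        raise ValueError("unterminated quote")
    start = pos
    while pos < len(text) and text[pos] not in stops:
        pos += 1
    return text[start:pos], pos


def parse_column(column: str) -> List[Field]:
    """Parse one column into fields; '-' means the column is empty."""
    if column == "-":
        return []
    fields = []
    pos = 0
    while True:
        xref, pos = _read_part(column, pos, ":(|")
        value = description = None
        if pos < len(column) and column[pos] == ":":
            value, pos = _read_part(column, pos + 1, "(|")
        if pos < len(column) and column[pos] == "(":
            description, pos = _read_part(column, pos + 1, ")")
            if pos >= len(column) or column[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
        fields.append(Field(xref, value, description))
        if pos >= len(column):
            return fields
        if column[pos] != "|":
            raise ValueError(f"unexpected character at position {pos}")
        pos += 1


for field in parse_column('psi-mi:"MI:0469"(intact)|psi-mi:"MI:0923"(irefindex)'):
    print(field.xref, field.value, field.description)
# psi-mi MI:0469 intact
# psi-mi MI:0923 irefindex
```

A character-by-character scan is used instead of a plain split on "|" because a quoted part may legitimately contain pipes, colons and parentheses.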