1 The first step in getting an accession number
Before doing anything else, authors should get a copy of a sequence data submission form. This form solicits all of the information needed to make a database entry; that is, the primary sequence data together with descriptive information such as the source of the sequenced segment (e.g., organism, strain, tissue) and the location of interesting regions within the sequence (e.g., coding regions, regulatory signals). It also contains information about data formats. The data submission form exists in both a paper and a computer-readable version; the latter can be completed using a text editor. These versions are available from the following sources:
Paper form: printed at the end of this article; also available from the Development editorial office and, upon request, from EMBL, GenBank® and the DNA Databank of Japan (DDBJ) at the addresses given in Appendix II.
Computer-readable form: distributed with all releases of the EMBL and GenBank® databases since January 1987 and with DDBJ releases since January 1988.
From EMBL by electronic mail (computer network) via our file server. Anyone with access to BITNET (either directly or via a gateway) can send a request to the EMBL file server, which will automatically return a copy of the data submission form by electronic mail. Instructions for using the EMBL file server are given in Appendix I.
From EMBL, on Macintosh or IBM-compatible floppy diskettes. Complete information on how to contact the EMBL Data Library is given in Appendix II.
From GenBank® via electronic mail or on floppy diskette. For information on requesting the form from GenBank® via Telenet, contact David Benton (+1-415-962-7360). Researchers in Japan can obtain the form by dialing up the DDBJ computer system (0559-75-6026).
2 What to submit to the EMBL Data Library
A data submission should include the following (for further details, see the data submission form itself):
the sequence itself, in computer-readable form (computer network mail, magnetic tape or IBM-compatible or Macintosh floppy diskette). Printouts will be accepted only if the authors have no access to a computer.
a completed data submission form for each submitted sequence. The form is available from the sources listed in section 1.
a computer network address, a telex number or a telefax number (advisable, to help speed things up, but not required).
3 How to send data to the EMBL Data Library
Data can be sent to the Data Library in one of several ways:
Electronic file transfer: files can be sent via computer network to DATASUBS@EMBL.BITNET. This BITNET address can be reached directly (by people at BITNET sites) or via various gateways from Arpanet, Usenet, JANET, etc. Ask your local network expert for help or phone us (+49-6221-387-258).
Telefax to Data Submissions, EMBL Data Library. Our fax number is: +49-6221-387-306.
Normal post. See address given in Appendix II.
4 How long will it take to get an accession number?
We will process data submissions within 7 working days of receipt and send authors notification of either what accession number(s) their data have been assigned or what additional information is needed. There are several things authors can do to minimise the time it takes to get an accession number:
Be sure that submissions include all the necessary materials and that all relevant questions on the data submission form have been answered.
Check the data to be sure that they do not contain inconsistencies/errors (e.g., a stop codon in the middle of a region listed on the form as an exon).
Be sure to include either a computer network address or a telex or telefax number. If this information is not provided, notification of accession numbers will be sent by regular post. Telephoning is costly and time-consuming, and the Data Library will therefore not attempt to contact authors by phone.
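The consistency check mentioned above (a stop codon in the middle of a coding region) can be automated before the data are sent. The following Python sketch is an illustration only and is not part of any Data Library software: the function name is hypothetical, it assumes the standard genetic code, and it assumes that the region begins in frame at the coordinate given on the form.

```python
# Hypothetical helper for authors; not an EMBL, GenBank or DDBJ tool.
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard genetic code


def internal_stop_codons(sequence, start, end):
    """Return 1-based positions of in-frame stop codons inside a region.

    start and end are 1-based, inclusive coordinates, as on the form.
    The last complete codon is skipped, because a stop codon is expected
    at the end of a complete coding region.
    """
    region = sequence[start - 1:end].upper()
    complete_codons = len(region) // 3
    hits = []
    for index in range(complete_codons - 1):
        codon = region[3 * index:3 * index + 3]
        if codon in STOP_CODONS:
            # Report the position of the first base of the stop codon.
            hits.append(start + 3 * index)
    return hits


if __name__ == "__main__":
    # ATG AAA TAG CCC TAA: the TAG beginning at base 7 is reported.
    print(internal_stop_codons("ATGAAATAGCCCTAA", 1, 15))
```

An empty list means that no internal stop codon was found in the stated frame; any reported position is worth checking against the coordinates entered on the form.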
Although we will process data submissions as quickly as we can, we strongly encourage authors to submit their data at or before the time they begin writing the manuscript, rather than once it is finished. This way we can process the data while the manuscript is being written, and authors will not have to delay submission of their manuscript while they wait for notification of their accession number.
It should be emphasised that authors are responsible for communicating their accession number(s) to the journal at the time they submit their manuscript; the Data Library will not contact the journal.
5 Data security
The data submission form asks authors whether their submitted data can be made available to the public immediately or whether it should be withheld until publication.
6 Updating your data
Once a database entry has been created from a submission, a copy is sent to the submitter for his/her reference and for comments or corrections. However, it often happens that the entry is correct when it is created but, with the passage of time, becomes out of date: the authors may make corrections to the sequence itself, or may discover new features of the sequence. Since such findings are generally not published, the only way to keep entries correct and up to date is for the authors to communicate their new findings to the database. This can be done by normal post or electronic mail to the address given in Appendix II.
One type of update which merits separate mention is that relating to citations. Most submissions represent data that have not yet been accepted for publication, and therefore the journal citation is not available when the entry is created. Adding this information at a later date requires that the database staff identify which submissions correspond to which publications; while this is often straightforward, it can also be problematic, especially if the journal does not print an accession number in the article, or if the submitted and the published data are not identical. We therefore strongly encourage researchers to let us know when and where data they have submitted to us are published.
Appendix I. EMBL network file server
Computer users with access to BITNET (directly or via a gateway) can obtain copies of the data submission form, or of database entries, by sending commands to a file server running on the VAXcluster at EMBL. The file server facility is provided free of charge, though users may have to meet some or all of the communication costs, depending on the accounting system of their local computer service.
To use this facility, send file server commands (as electronic mail) to the address NETSERV@EMBL.BITNET. Each line of the mail message should consist of a single file server command, and nothing else. The mail can be sent over BITNET, or from any other network which has a gateway into BITNET (e.g., JANET in the UK or ARPANET in the USA).
The most important file server command, to get users started, is HELP. If the file server receives this command, it will return a help file to the sender, explaining in some detail how to use the facility.
In order to send electronic mail to a BITNET address, users must find out which command they have to use on their own local machine and how they should format the address NETSERV@EMBL.BITNET. Users who don’t already know how to do this should contact their local computer service, or if all else fails, contact the Data Library and we will do our best to help. Below are some examples which illustrate how to send commands to the file server using a VAX/VMS system that is a BITNET node running JNET software.
To send a HELP command to the file server, you could use the operating system command MAIL as follows:
$ MAIL <filename> "JNET%""NETSERV@EMBL"""
where <filename> is the name of a file containing file server commands.
To request help information the file should contain the following command:
HELP
To request a copy of the data submission form, it should contain the following GET command:
GET DATALIB:DATASUB.TXT
Users can also request specific sequences via the File Server. Information on how to do this is provided in the HELP file.
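Since each line of the mail message must consist of a single file server command and nothing else, it may help to build the command file with a small script. The Python sketch below is an illustration under that one rule only; the function name is hypothetical, and the script does not send any mail itself.

```python
def build_command_file(commands):
    """Return message text with exactly one file server command per line.

    Hypothetical helper: it enforces only the one-command-per-line rule
    described in this appendix and knows nothing else about the server.
    """
    lines = []
    for command in commands:
        text = command.strip()
        if not text or "\n" in text or "\r" in text:
            raise ValueError("each command must be a single non-empty line")
        lines.append(text)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write a file containing the HELP command, ready to be mailed.
    with open("commands.txt", "w") as handle:
        handle.write(build_command_file(["HELP"]))
```

A GET command would be passed in the same list, one string per command, and the resulting file is what <filename> stands for in the MAIL example.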
Appendix II. How to contact the nucleotide sequence databases
EMBL Data Library:
Computer network: DATASUBS@EMBL.BITNET (for data submissions); email@example.com (for questions requiring a personal response)
Postal address: Data Submissions, EMBL Data Library, Postfach 10.2209, 6900 Heidelberg, Federal Republic of Germany
Telex: 461613 (embl d)
GenBank®:
Computer network address: firstname.lastname@example.org
Postal address: GenBank® Submissions, Mail Stop K710, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
DNA Databank of Japan:
Computer network: email@example.com (for data submissions); firstname.lastname@example.org (for other enquiries)
Postal address: Laboratory of Genetic Information Analysis, Center for Genetic Information Research, National Institute of Genetics, Mishima, Shizuoka 411, Japan
Telephone: +81-559-75-0771 x647
Sequence Data Submission Form
This form solicits the information needed for a nucleotide or amino acid sequence database entry. By completing and returning it to us promptly you help us to enter your data in the database accurately and rapidly. These data will be shared among the following databases: EMBL Data Library (Heidelberg, Federal Republic of Germany); GenBank (Los Alamos, NM, U.S.A. and Mountain View, CA, U.S.A.); DNA Data Bank of Japan (DDBJ; Mishima, Japan); National Biomedical Research Foundation Protein Identification Resource (NBRF-PIR; Washington, D.C., U.S.A.); Martinsried Institute for Protein Sequence Data (MIPS; Martinsried, Federal Republic of Germany) and International Protein Information Database in Japan (JIPID; Noda, Japan).
Please answer all questions which apply to your data. If you submit 2 or more non-contiguous sequences, copy and fill out this form for each additional sequence. Please include in your submission any additional sequence data which is not reported in your manuscript but which has been reliably determined (for example, introns or flanking sequences). When submitting nucleic acid sequences containing protein coding regions, also include a translation (SEPARATELY from the nucleic acid sequence). Then send (1) this form, (2) a copy of your manuscript (if available) and (3) your sequence data (in machine readable form) to the address shown below. Information about the various ways you can send us your data and about formats for the sequence data is given in the following two sections.
SUBMITTING DATA TO THE EMBL DATA LIBRARY
We are happy to accept data submitted in any of the following ways:
(1) Electronic file transfer: files can be sent via computer network to: DATASUBS@EMBL.EARN. This BITNET/EARN address can be reached via various gateways from Arpanet, Usenet, JANET, etc. Ask your local network expert for help or phone us. Please ensure that each line in your file is not longer than 80 characters; longer lines often get truncated when they are sent.
(2) Floppy disks: we can read Macintosh and IBM-compatible diskettes. Please use the ‘save as text only’ feature of your editor to save your sequence file, as otherwise we might have difficulty processing it.
(3) Magnetic tapes: 9-track only (fixed-length records preferred); 800, 1600 or 6250 bpi (any blocksize); ASCII or EBCDIC character codes; any label type or unlabelled.
Our address is:
EMBL Data Library Submissions
Postfach 10.2209
D-6900 Heidelberg
Federal Republic of Germany
Computer network: DATASUBS@EMBL.BITNET
Telefax: (+49) 6221 387 306
Telephone: (+49) 6221 387 258
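Because lines longer than 80 characters are often truncated in transit, it is worth checking a file before sending it by computer network. The following Python sketch is an illustration only; the function name is hypothetical, and the limit of 80 is taken from the advice above.

```python
def overlong_lines(text, limit=80):
    """Return (line_number, length) for every line longer than limit.

    Line numbers are 1-based. Line terminators are not counted.
    """
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


if __name__ == "__main__":
    sample = "A" * 80 + "\n" + "C" * 81 + "\n"
    # Only the second line exceeds the limit.
    print(overlong_lines(sample))
```

An empty result means that every line fits within the limit.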
When we receive your data we will assign them an accession number, which serves as a reference that permanently identifies them in the database. We will inform you what accession number your data have been given and we recommend that you cite this number when referring to these data in publications.
If your manuscript has already been accepted for publication, the accession number can be included at the galley proof stage as a note added in proof. So that we can process your data and inform you of your accession number before you receive the galley proofs, please return this form to us as soon as possible. We suggest that the note added in proof should read approximately as follows: “The nucleotide sequence data reported will appear in the EMBL, GenBank and DDBJ Nucleotide Sequence Databases under the accession number.”
A computer-readable version of this form is available on the distribution tapes of the EMBL Data Library from Release 11 onwards and on GenBank Releases 48 onwards. The BIONET National Computer Resource for Molecular Biology (Mountain View, CA, U.S.A.) also has a copy. Feel free to use the computer-readable form rather than this printed one. In this case, the form should be filled out with a text editor and sent via computer network or normal post to the address indicated above.
FORMATS FOR SUBMITTED DATA
We would appreciate receiving the sequence data in a form which conforms as closely as possible to the following standards.
Each sequence should include the names of the authors.
Each distinct sequence should be listed separately using the same number of bases/residues per line. The length of each sequence in bases/residues should be clearly indicated.
Enumeration should begin with a “1” and continue in the direction 5’ to 3’ (or amino- to carboxy-terminus).
Amino acid sequences should be listed using the one-letter code.
Translations of protein coding regions in nucleotide sequences should be submitted in a separate computer file from the nucleotide sequences themselves.
The code for representing the sequence characters should conform to the IUPAC-IUB standards, which are described in: Nucl. Acids Res. 13: 3021-3030 (1985) (for nucleic acids) and J. Biol. Chem. 243: 3557-3559 (1968) and Eur. J. Biochem. 5: 151-153 (1968) (for amino acids).
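Several of these standards can be checked mechanically for a nucleotide sequence: the same number of bases on every line, a clearly stated length, and characters drawn from the IUPAC-IUB nucleotide code. The Python sketch below is an illustration only; the function name is hypothetical, and the default of 60 bases per line is an assumption of this example, not a requirement of the databases.

```python
# One-letter nucleotide codes of the IUPAC-IUB recommendations,
# including the ambiguity codes.
IUPAC_NUCLEOTIDES = set("ACGTURYKMSWBDHVN")


def format_sequence(sequence, per_line=60):
    """Return (length, lines) for a nucleotide sequence.

    Whitespace is removed and letters are upper-cased. Every line except
    possibly the last holds exactly per_line bases. A ValueError is raised
    if a character outside the nucleotide code is found.
    """
    cleaned = "".join(sequence.split()).upper()
    unknown = sorted(set(cleaned) - IUPAC_NUCLEOTIDES)
    if unknown:
        raise ValueError("characters outside the nucleotide code: "
                         + ", ".join(unknown))
    lines = [cleaned[i:i + per_line]
             for i in range(0, len(cleaned), per_line)]
    return len(cleaned), lines


if __name__ == "__main__":
    length, lines = format_sequence("acgt acgtnn", per_line=4)
    print(length)
    for line in lines:
        print(line)
```

The returned length is the figure to state alongside the sequence. An amino acid sequence would need its own character set, which is not shown here.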
I. GENERAL INFORMATION
II. CITATION INFORMATION
III. DESCRIPTION OF SEQUENCED SEGMENT
Wherever possible, please use standard nomenclature or conventions. If a question is not applicable to your sequence, answer by writing N.A.; if the information is relevant but not available, write a question mark (?).
IV. FEATURES OF THE SEQUENCE
Please list below the types and locations of all significant features experimentally identified within the sequence. Be sure that your sequence is numbered beginning with “1.”
In the column marked as shown on the left, fill in the information on the right:
feature: type of feature (see information below)
from: number of first base/amino acid in the feature
to: number of last base/amino acid in the feature
bp: x, if your numbers refer to positions of base pairs in a nucleotide sequence
aa: x, if your numbers refer to positions of amino acid residues in a peptide sequence
id: method by which the feature was identified. E = experimentally; S = by similarity with known sequence or to an established consensus sequence; P = by similarity to some other pattern, such as an open reading frame
comp: x, if feature is located on the nucleic acid strand complementary to that reported here
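Authors who keep their feature list in a computer file may find it useful to check each row before copying it onto the form. The Python sketch below is an illustration only: the class and field names are hypothetical, and it assumes that "from" is never larger than "to" and that "comp" applies only to nucleotide positions.

```python
from dataclasses import dataclass


@dataclass
class FeatureRow:
    """One row of the feature table (hypothetical representation)."""
    feature: str                  # type of feature, e.g. "promoter"
    start: int                    # column "from"
    end: int                      # column "to"
    unit: str                     # "bp" or "aa"
    identified_by: str            # column "id": "E", "S" or "P"
    complementary: bool = False   # column "comp"

    def problems(self, sequence_length):
        """Return a list of human-readable problems; empty if none."""
        found = []
        if self.start < 1:
            found.append("numbering must begin with 1")
        if self.end < self.start:
            found.append("'to' is smaller than 'from'")
        if self.end > sequence_length:
            found.append("'to' lies beyond the end of the sequence")
        if self.unit not in ("bp", "aa"):
            found.append("unit must be 'bp' or 'aa'")
        if self.identified_by not in ("E", "S", "P"):
            found.append("id must be 'E', 'S' or 'P'")
        if self.complementary and self.unit != "bp":
            found.append("'comp' applies only to nucleic acid sequences")
        return found


if __name__ == "__main__":
    row = FeatureRow("promoter", 1, 40, "bp", "E")
    print(row.problems(sequence_length=500))
```

A row with no problems yields an empty list; anything else should be resolved before the form is returned.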
Significant features include:
regulatory signals (e.g., promoters, attenuators, enhancers)
transcribed regions (e.g., mRNA, rRNA, tRNA)
regions subject to post-transcriptional modification (e.g., introns, modified bases)
translated regions (indicate reading frame if start and stop codons are not present)
extent of signal peptide, prepropeptide, propeptide, mature peptide
regions subject to post-translational modification (e.g., glycosylated or phosphorylated sites)
other domains/sites of interest (e.g., extracellular domain, DNA-binding domain, active site, inhibitory site)
sites involved in bonding (disulfide, thiolester, intrachain, interchain)
regions of protein secondary structure (e.g., alpha helix or beta sheet)
conflicts with sequence data reported by other authors
variations and polymorphisms
The first 2 lines of the table are filled in with examples.
If you think you will need more space than the table below provides, please photocopy this page before you fill it out.