---++ Germplasm identifiers


Reply to [[GUIDsForCollectionsAndSpecimens][Donald's questions]] on the use of identifiers in the germplasm domain, with examples from the [[http://www.nordgen.org/ngb][Nordic Gene Bank (NGB)]].

Germplasm is used here to describe conserved samples of economically valuable crops and their wild relatives. [[http://en.wikipedia.org/wiki/Germplasm][Wikipedia describes germplasm]] as "_the genetic resources, or more precisely the DNA of an organism and collections of that material_".

See also a further description of other [[GermplasmIdentificators][Germplasm Identifiers here]] and a discussion of multiple [[GermplasmDatProviders][Germplasm Data Providers]].

[GUIDsForGermplasm]

----


---++ Specimen identifiers

1 What identifiers (how many per specimen) get assigned to specimens in your organisation or domain (field numbers, catalogue numbers, etc.)?
2 What is the scope of uniqueness for each of these identifiers (notebook page, collector, database, institution, global, etc.)?
3 Can you explain the life cycle of each of these identifiers (who assigns them, how they are subsequently tracked)?
4 Can you give examples of how these identifiers are used to retrieve the specimen and/or information on the specimen?
5 Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?
6 In the case of subsamples from a specimen, can you identify issues around associating the sample and associated information with the source specimen and associated information?


---+++++ 1 What identifiers (how many per specimen) get assigned to specimens in your organisation or domain (field numbers, catalogue numbers, etc.)?

The central unit of a genebank germplasm collection is the conserved accession. Germplasm accessions are assigned *accession numbers* (catalogue numbers) that are unique within each collection. The original germplasm sample originates from a collecting/gathering event or a breeding/cultivation event, identified by a *collecting sample number*, a *breeder's line number* or similar. Each time the germplasm accession is planted and regenerated, a *field number* is assigned and recorded (NGB). These numbers, however, describe three different 'units': the original germplasm (source), the conserved accession (conservation mandate) and the physical seed sample (with harvest year). See also the further description of other [[GermplasmIdentificators][Germplasm Identifiers here]].


---+++++ 2 What is the scope of uniqueness for each of these identifiers (notebook page, collector, database, institution, global, etc.)?

The identifier of the *original sample* is unique within the collecting expedition. The Nordic Gene Bank uses the initials of the collector followed by the collecting date and a serial number per collector and day. There is no mechanism in place that aims to build a globally unique number. Breeding lines are assigned identifiers by various methods, and the identifier is sometimes unique only within the breeder's notebook page.
The germplasm *accession numbers* are unique within the collection and most often within the holding institute. The *field number* can be as simple as the serial number of the row in one direction (x) combined with the serial number of the row in the other direction (y) in the physical agricultural field; it is unique within each regeneration field plot.
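
The composition of these two identifiers can be sketched in a few lines of Python. This is only an illustration: the exact layout (separator, zero padding, date format) and the function names =collecting_number= and =field_number= are assumptions, not the format actually used at NGB.

```python
from datetime import date


def collecting_number(initials: str, collected_on: date, serial: int) -> str:
    """Collector initials + collecting date + serial per collector and day.

    The layout (YYYYMMDD, dash, two-digit serial) is a hypothetical choice.
    """
    return f"{initials.upper()}{collected_on:%Y%m%d}-{serial:02d}"


def field_number(row_x: int, row_y: int) -> str:
    """Row serial in one direction (x) combined with the row serial in the other (y)."""
    return f"{row_x}-{row_y}"


# Third sample collected by collector "DE" on 14 August 2003 (made-up values).
sample_id = collecting_number("de", date(2003, 8, 14), 3)

# Plot at row 12 (x) and row 4 (y) of a regeneration field.
plot_id = field_number(12, 4)

print(sample_id)
print(plot_id)
```

Neither value is globally unique on its own: the collecting number only makes sense together with the expedition, and the field number only together with the regeneration field plot it belongs to.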


---+++++ 3 Can you explain the life cycle of each of these identifiers (who assigns them, how they are subsequently tracked)?

The original germplasm is 'created' by a collecting/gathering event or a breeding/cultivation event. During the collecting event the collected sample is assigned a collecting number. During the breeding event the germplasm is often associated with a breeder's line number. The collecting site, with longitude, latitude and a site description, is important for describing the collecting event. The pedigree describing the crosses or source germplasm used to 'create' the new germplasm is likewise important for describing the original sample from a breeding event.

After the 'creation event' the germplasm is included in a germplasm collection and conserved by a genebank institute. The genebank assigns a germplasm accession number to the sample. If the original sample is included in more than one genebank collection, it is most often assigned a new accession number by each genebank institute.

Both the accession number and the collecting number, as well as other standardized information about the origin of the germplasm sample, follow the sample when it is distributed for research or breeding purposes. The *accession number* is the main identifier used for linking data, such as qualitative or characterization observations, to the germplasm sample.


---+++++ 4 Can you give examples of how these identifiers are used to retrieve the specimen and/or information on the specimen?

Most genebanks have online searchable catalogues of their germplasm collections. The most prominent key is the accession number. When scientists or breeders request living material, they are asked to indicate the accession number of interest.


---+++++ 5 Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?

The accession number, collecting number etc. cannot be replaced by a GUID. These numbers already exist in too many contexts and in various printed forms, and will remain the most important identifiers for referring to germplasm samples. Documentation of germplasm is, however, in constant development, and I believe most (local) germplasm information systems could be motivated to include resolution of a GUID if this adds significant value through global resolution and interoperability. One major concern is the search for duplicates of the same original source germplasm held under different accession numbers by different genebank institutes. A solution that improves the linking of conserved germplasm accessions to the source event could provide such an incentive to implement the GUID in the local institute systems/databases, but only as a supplement to the existing identifiers.
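
As a rough sketch of how a GUID could supplement (not replace) the existing numbers, the Python example below derives a name-based UUID for the source event and uses it to find accessions in different genebanks that go back to the same original sample. Everything specific here is hypothetical: the namespace URL, the institute names, the accession numbers and the record layout are invented for illustration. The sketch also assumes that both genebanks recorded the same expedition code and collecting number, which cannot be taken for granted in practice.

```python
import uuid
from collections import defaultdict

# Hypothetical namespace for source-event GUIDs; not an agreed community namespace.
SOURCE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "http://example.org/germplasm/source-event")


def source_guid(expedition_code: str, collecting_number: str) -> uuid.UUID:
    """Derive a deterministic (name-based, version 5) UUID for a collecting event sample."""
    return uuid.uuid5(SOURCE_NAMESPACE, f"{expedition_code}/{collecting_number}")


# Made-up holdings: existing identifiers are kept, the GUID is only added alongside.
holdings = [
    {"institute": "GenebankA", "accession_number": "A-1001",
     "expedition": "ISL2003", "collecting_number": "DE20030814-03"},
    {"institute": "GenebankB", "accession_number": "B-77",
     "expedition": "ISL2003", "collecting_number": "DE20030814-03"},
    {"institute": "GenebankA", "accession_number": "A-1002",
     "expedition": "ISL2003", "collecting_number": "DE20030814-04"},
]

# Group the local accession numbers by the GUID of their source event.
by_source = defaultdict(list)
for record in holdings:
    guid = source_guid(record["expedition"], record["collecting_number"])
    by_source[guid].append((record["institute"], record["accession_number"]))

# A source event conserved under more than one accession number is a candidate duplicate.
duplicates = {guid: accs for guid, accs in by_source.items() if len(accs) > 1}

for guid, accs in duplicates.items():
    print(guid, accs)
```

A name-based UUID is used here only because it lets two institutes arrive at the same GUID independently; a centrally assigned or resolvable identifier scheme would serve the same linking purpose.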


---+++++ 6 In the case of subsamples from a specimen, can you identify issues around associating the sample and associated information with the source specimen and associated information?

The germplasm as received is conserved as a living genebank accession. The accession loses viability over time, and the amount of biological material (most often seeds) is reduced by distribution. If the viability/germination or the amount of living material becomes too low, the accession is sent out for regeneration. Each regeneration cycle is stored as a sub-sample (unique harvest year) under the same accession number.

Some accessions have the spike (head of a cereal crop) stored for reference or display of phenotype. Other crops can be stored in formalin, dried etc. for the same purpose of phenotype reference or display. A herbarium sample of the accession is not entirely uncommon. These sub-samples are often labelled with the accession number, collecting number, breeding line or commercial cultivar name.

Some leaf samples (for DNA extraction) can be stored and labelled (in the database) by the accession number (and, if available, the harvest year sample) used.
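
The relation between an accession and its sub-samples could be modelled roughly as in the following Python sketch. The class, the method names, the accession number =NGB0001= and the =accession/year= label format are all assumptions made for illustration; they do not describe the actual NGB database.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Accession:
    accession_number: str
    # Harvest year -> number of seeds stored from that regeneration cycle.
    seed_lots: Dict[int, int] = field(default_factory=dict)
    # (kind, label) for spikes, herbarium sheets, leaf samples etc.
    reference_samples: List[Tuple[str, str]] = field(default_factory=list)

    def regenerate(self, harvest_year: int, seed_count: int) -> None:
        """Store a regeneration cycle as a sub-sample under the same accession number."""
        if harvest_year in self.seed_lots:
            raise ValueError(f"harvest year {harvest_year} already recorded")
        self.seed_lots[harvest_year] = seed_count

    def add_reference_sample(self, kind: str, harvest_year: Optional[int] = None) -> str:
        """Label a derived sample by accession number and, if known, harvest year."""
        if harvest_year is None:
            label = self.accession_number
        else:
            label = f"{self.accession_number}/{harvest_year}"
        self.reference_samples.append((kind, label))
        return label


acc = Accession("NGB0001")          # hypothetical accession number
acc.regenerate(1998, 5000)          # first regeneration cycle
acc.regenerate(2006, 8000)          # later regeneration, same accession number
spike_label = acc.add_reference_sample("spike")
leaf_label = acc.add_reference_sample("leaf", harvest_year=2006)

print(sorted(acc.seed_lots))
print(spike_label, leaf_label)
```

The point of the sketch is that the accession number alone does not distinguish the physical seed lots; the harvest year is needed as a second key whenever data refers to a specific sub-sample.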


----


---++ Collection identifiers

7 How are your specimens organised into larger identifiable sets (collections, named collections, databases, institutions, etc.)?
8 What identifiers get assigned to each of these sets in your organization or domain (institution codes, collection codes, Index Herbarium acronyms, etc.)?
9 Can you explain the life cycle of each of these identifiers (who assigns them, how they are subsequently tracked)?
10 Can you give examples of how these identifiers are used to locate the set and/or information on the set?
11 Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?


---+++++ 7 How are your specimens organised into larger identifiable sets (collections, named collections, databases, institutions, etc.)?

The genebank accessions are divided into separate datasets by collecting expedition. Important research reference collections can be donated to the genebank and given a conservation mandate; the donated collection is identifiable as a dataset. The major crops such as potato, cereals (wheat, maize, rice), vegetables, oil crops, fruit, forage crops etc. are divided into separate datasets (often for management purposes, as the experts consulted have knowledge by crop group). Conservation status divides the accessions into long-term storage (for eternity) and short-term storage (until the seed stock runs out). Collected wild material is kept in separate datasets from commercial or locally cultivated material.


---+++++ 8 What identifiers get assigned to each of these sets in your organization or domain (institution codes, collection codes, Index Herbarium acronyms, etc.)?

The collecting expeditions are assigned codes, often a code followed by the year, e.g. ISL2003, IN79 etc. The donated [[http://tor.ngb.se/sesto/index.php?scp=ngb&thm=sesto&lev=&rec=&lst=gbc][research reference]] collections are often assigned a code based on the topic or the name of the scientist.


---+++++ 9 Can you explain the life cycle of each of these identifiers (who assigns them, how they are subsequently tracked)?

The identifiers of the genebank sub-collections (sub-datasets) are assigned by the genebank staff. For the research reference collections, the identifier is often based on a name commonly used for the dataset before it was donated to and mandated by the genebank. It is less common to track the sub-collections and more common to track individual accessions (by accession number).


---+++++ 10 Can you give examples of how these identifiers are used to locate the set and/or information on the set?

The collecting expedition code is used to search for 'duplicated' germplasm from the same collecting event.


---+++++ 11 Would there be any social or technical roadblocks to replacing these identifiers with a single identifier that was guaranteed to be unique?

The identifiers for the sub collections are less defined and less referenced and could perhaps be replaced (or rather complemented) by a GUID.


----

Category [[GUIDsForCollectionsAndSpecimens][Donald's questions]], [[GUIDUseCases][GUID Use Cases]], [[GermplasmIdentificators][Germplasm Identifiers]]

CategoryUseCases