IGSN Syntax Guidelines

An International GeoSample Number (IGSN) is a unique string created to identify a sample object in an online environment.

Using ABNF notation, the proposed syntax for an IGSN is:

 <IGSN> = <Namespace><Code>
 <Namespace> = mUPPER (an m character code denoting the namespace, where m = 3. Exceptions may be defined by IGSN e.V.)
 <Code> = nCHAR (an n-character code)
 UPPER                        = %x41-5A                       (A-Z)
 DIGIT                        = %x30-39                       (0-9)
 CHAR                         = UPPER / DIGIT / "-" / "." 
 reserved                     = ":" / "/" / "?" / "#" / "[" / "]" / "@" / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" / "_" / "~"

The Allocating Agent ensures that the n-character code is unique within its namespace (see also IGSN namespace governance).
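
The grammar above can be approximated with a regular expression. The following is a minimal sketch, not an official validator: the names `IGSN_PATTERN` and `is_valid_igsn` are hypothetical, the namespace is assumed to be exactly three upper-case letters (m = 3, ignoring any exceptions defined by IGSN e.V.), and since the grammar does not fix n, the code is assumed to be one or more characters long.

```python
import re

# Assumption: <Namespace> is 3 UPPER characters and <Code> is one or more CHAR,
# where CHAR = UPPER / DIGIT / "-" / "." as in the grammar above.
IGSN_PATTERN = re.compile(r"[A-Z]{3}[A-Z0-9.\-]+")


def is_valid_igsn(candidate: str) -> bool:
    """Return True if the whole string matches <Namespace><Code>."""
    return IGSN_PATTERN.fullmatch(candidate) is not None
```

The check is deliberately strict about case, so it is meant to be applied to the upper-case form of an IGSN. Reserved characters such as "_" or "/" cause the check to fail.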

Characters a-z and A-Z in the IGSN string are case insensitive (e.g. ABC is identical to AbC). It is recommended to use upper case characters in all cases.

Characters that may be confused with digits should be avoided (I = %x49, O = %x4F, i = %x69, o = %x6F).
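
A registration tool could warn about such characters before a name is assigned. The sketch below is only an illustration; `CONFUSABLE` and `confusable_characters` are hypothetical names, and the check only reports the four characters listed above.

```python
# The four characters named in the guideline: I, O, i, o.
CONFUSABLE = set("IOio")


def confusable_characters(igsn: str) -> list[str]:
    """Return the characters in the IGSN that may be mistaken for digits."""
    return [ch for ch in igsn if ch in CONFUSABLE]
```

An empty result means that none of the listed characters occurs in the string.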

The resolvable handle URI of an IGSN is made up of two components: the handle prefix 10273 and the IGSN as suffix, separated by a forward slash.
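
As a sketch of this construction, the handle can be assembled by simple string concatenation. The function names are hypothetical, and the use of the public Handle System proxy at hdl.handle.net as the resolver is an assumption that is not stated in this document.

```python
HANDLE_PREFIX = "10273"  # handle prefix given in the text above
RESOLVER = "http://hdl.handle.net/"  # assumption: generic Handle System proxy


def igsn_handle(igsn: str) -> str:
    """Join the handle prefix and the IGSN with a forward slash."""
    return f"{HANDLE_PREFIX}/{igsn}"


def igsn_url(igsn: str) -> str:
    """Build a resolvable URL for the handle (resolver is an assumption)."""
    return RESOLVER + igsn_handle(igsn)
```

For the example IGSN HRV0035F0, `igsn_handle` yields the handle 10273/HRV0035F0.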

Using IGSNs in Manuscripts

IGSN e.V., Allocating Agents and academic publishers ask authors to tag IGSNs in their manuscripts. This enables publishers to link each IGSN to the respective sample when the paper is published online. To tag an IGSN, please use the syntax “IGSN: <IGSN>” (e.g., IGSN: HRV0035F0). (see also http://www.geosamples.org/news/tag)
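
A publisher-side tool might pick up such tags with a regular expression. The following is a rough sketch under stated assumptions: `TAG_PATTERN` and `find_tagged_igsns` are hypothetical names, a three-letter namespace is assumed, and the match is required to end in a letter or digit so that sentence punctuation directly after a tag is not swallowed (this is a simplification, since "." and "-" are otherwise allowed in the code).

```python
import re

# Assumption: 3-letter namespace, then CHARs, ending in a letter or digit
# so that a trailing "." or ")" from the sentence is not captured.
TAG_PATTERN = re.compile(r"IGSN:\s*([A-Za-z]{3}[A-Za-z0-9.\-]*[A-Za-z0-9])")


def find_tagged_igsns(text: str) -> list[str]:
    """Return all tagged IGSNs in the text, converted to upper case."""
    return [match.upper() for match in TAG_PATTERN.findall(text)]
```

Results are returned in upper case because IGSNs are case insensitive and upper case is the recommended form.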

Explanation

Using ABNF notation, the recommended syntax for an IGSN is:

 <IGSN> = <Namespace><Code>
 <Namespace> = UPPER (a character code denoting the allocating agent, usually 3 characters)
 <Code> = CHAR (usually a 6 character code) 
 UPPER                        = %x41-5A                       (A-Z)
 DIGIT                        = %x30-39                       (0-9)
 CHAR                         = UPPER / DIGIT                 (A-Z and 0-9)

Since IGSNs are intended to be combined into a URI, it is suggested, for maximum compatibility with URI production rules, to limit the characters that can be used in the code to the so-called 'unreserved' and 'reserved' sets. Other characters and percent-encoded characters are not allowed, even if they exist on the keyboard or in other character sets (e.g. no accented characters, non-Latin alphabets, spaces, CR or LF characters). This leads to:

 <IGSN>               = <Namespace><Code>
 <Namespace>          = UPPER                        ; a character code denoting the namespace of a collection of samples (usually 3 characters)
 <Code> = CHAR (usually a 6 character code) 
 UPPER                = %x41-5A                       (A-Z)
 DIGIT                = %x30-39                       (0-9)
 CHAR                 = UPPER / DIGIT
 unreserved           = UPPER / DIGIT / "-" / "." 
 reserved             = ":" / "/" / "?" / "#" / "[" / "]" / "@" / "!" / "$" / "&" / "'" / "(" / ")" / "*" / "+" / "," / ";" / "=" / "_" / "~"

The Allocating Agent ensures that element <Code> is unique within their namespace.

Please note:

  • Characters 'a' - 'z' and 'A' - 'Z' in the IGSN string are case insensitive (e.g. ABC is identical to AbC). These characters in the IGSN string are converted to upper case upon registration and resolution. If an IGSN were registered as ABC, then abc would resolve it and a later attempt to register AbC would be rejected with an error message stating that the IGSN was already in existence. Comparison of two IGSNs (to decide if they match or not) should be done by first converting all characters 'a' - 'z' in IGSN strings to upper case, followed by octet-by-octet comparison of the entire IGSN string.
  • See the URI syntax specification (http://www.ietf.org/rfc/rfc2396.txt). UTF-8, which preserves ASCII characters, is the required encoding.
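
The comparison rule described in the notes above can be sketched as follows. The function names are hypothetical. Only the ASCII letters 'a' to 'z' are folded to upper case, as the rule states, and the octet-by-octet comparison is done on the UTF-8 encoding.

```python
def normalize_igsn(igsn: str) -> str:
    """Convert only the characters 'a'-'z' to upper case; leave the rest as is."""
    return "".join(
        chr(ord(ch) - 32) if "a" <= ch <= "z" else ch
        for ch in igsn
    )


def igsn_equal(first: str, second: str) -> bool:
    """Compare two IGSNs octet by octet after case normalization."""
    return normalize_igsn(first).encode("utf-8") == normalize_igsn(second).encode("utf-8")
```

With this sketch, `igsn_equal("ABC", "AbC")` holds, which matches the example in the first note.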

IGSN members have requested to allow deviations from this recommended practice to fit the requirements of existing large core repositories.

The following is a summary of the guidelines for the IGSN:

  1. The suffix must be unique within the prefix and is case insensitive.
  2. The IGSN should be as concise as possible, in consideration of human readability. IGSNs will be displayed online and in print and will be re-typed by end users. The recommended format is a three character namespace followed by a six character sample name.
  3. In general, an IGSN suffix should not be considered “derivable”. Although some IGSNs may be generated according to a formula or algorithm, it is preferable to look them up in IGSN, as there is no guarantee that a generated IGSN has been registered with IGSN or that it will resolve.
  4. Organisations assigning IGSNs may choose to adopt a consistent, logical system that can be easily documented and readily understood by their employees. This helps to ensure the uniqueness of assigned IGSNs and makes it easier to pass the task of assigning IGSNs from one employee to the next. Such a system might therefore include existing internal identifiers in use within the organisation.
  5. Suffix nodes may be used to reflect hierarchical information or levels of granularity. For instance, the first node might be a multiple-letter code for a drill core, while successive nodes encode sub-samples taken from the drill core. IGSN suffixes may be extensible, and the suffix nodes may be used for this purpose. For instance, in the future, further sub-samples taken from already subsampled materials might be assigned IGSNs. In trying to keep IGSNs as short as possible, careful consideration should be given before adopting a naming scheme that relies on extending existing IGSNs.
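
To illustrate guideline 5, one possible way to derive sub-sample names is to append a node to the parent IGSN. This is purely a hypothetical scheme: the guidelines do not prescribe a node separator, so the use of "-" (one of the allowed characters) and the function names `child_igsn` and `parent_igsn` are assumptions made for this sketch.

```python
from typing import Optional


def child_igsn(parent: str, node: str, separator: str = "-") -> str:
    """Extend a parent IGSN with one more suffix node (hypothetical scheme)."""
    return f"{parent}{separator}{node}"


def parent_igsn(igsn: str, separator: str = "-") -> Optional[str]:
    """Return the IGSN with its last node removed, or None if it has no nodes."""
    head, found, _ = igsn.rpartition(separator)
    return head if found else None
```

Each level of sub-sampling lengthens the name, which is the trade-off against conciseness that the guideline warns about. A derived name still has to be registered before it can be expected to resolve.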

igsn/syntax.1368081479.txt.gz · Last modified: 2013/05/09 06:37 by jklump