Methods

The methods and procedures used to generate data objects should be described in the EML <methods> tree and its child elements. As a ‘rule of thumb,’ the content of the <methods> tree is intended to be human-readable rather than machine-readable, and should fully describe all steps used when collecting and processing the data. When these steps follow an established protocol, references to external sources, such as a published article (with DOI) containing the methods, may be given. In most cases, however, it is preferable to include detailed methods descriptions that can be readily browsed, can be indexed for searching, and will remain available with the data even if links to external documents fail. In addition to content in EML <methods>, images, documentation files (e.g. txt or pdf), or code illustrating methods may be included as “other entities” in your dataset. Although the <methods> tree can appear at the <dataset>, entity, and <attribute> levels, it is recommended to describe all methods at the dataset level so that a data reuser can find all important information in one place.

A <methods> tree must contain at least one <methodStep> element, and may optionally have a <sampling> element and any number of <qualityControl> elements. The <methodStep> element may have the optional child elements <dataSource>, <instrumentation>, and <software>. Figure 9.1 outlines the likely decision points needed to choose what child elements to include in <methods>.
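The structural rule above can be checked mechanically. The following is a minimal Python sketch, not a substitute for validating against the EML schema. The function name `check_methods` and the inline sample fragments are assumptions made for illustration, and the set of allowed children is taken only from the description in this chapter.

```python
import xml.etree.ElementTree as ET

# Children of <methods> named in this chapter (not the full EML schema).
ALLOWED_CHILDREN = {"methodStep", "sampling", "qualityControl"}


def check_methods(methods):
    """Return a list of human-readable problems found in a <methods> element."""
    problems = []
    tags = [child.tag for child in methods]
    if "methodStep" not in tags:
        problems.append("<methods> needs at least one <methodStep>")
    for tag in sorted(set(tags) - ALLOWED_CHILDREN):
        problems.append(f"unexpected child <{tag}>")
    return problems


# Hypothetical, heavily abridged fragments used only to exercise the check.
valid = ET.fromstring(
    "<methods><methodStep><description><para>Field sampling.</para>"
    "</description></methodStep><qualityControl><description>"
    "<para>Range checks.</para></description></qualityControl></methods>"
)
invalid = ET.fromstring("<methods><sampling/></methods>")

print(check_methods(valid))
print(check_methods(invalid))
```

An empty list means the fragment satisfies the rule as stated here. A schema-aware validator is still needed to confirm element order and content.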

Method steps (<methodStep>)

At least one <methodStep> element is required under <methods>, and each step should contain a logical portion of the methods used to create the published data; for example, sampling design, field data collection, laboratory procedures, quality control/assurance, and statistical analysis. A dataset <methods> element may contain as many <methodStep> elements as are necessary. All text describing each step must be placed within a <description> element, which is a TextType (see Appendix B). For <description> elements that require complex formatting or layouts, the standard TextType elements (<section>, <para>, etc.), Markdown formatting (<markdown>), and LaTeX typesetting may be used. Below are additional, optional child elements to <methodStep>.

Data provenance (<dataSource>)

The optional <dataSource> element is for creating references to published data that was used as source data for the dataset being described (i.e. a dataset’s provenance). For instance, if the data being described by EML is mean annual temperature values calculated from hourly temperature measurements published in a separate dataset, the <dataSource> element could be used to identify and provide a reference to these source data. This element may contain a fairly complete set of child elements describing the source data, including a <title>, <creator>, and <contact>, but the most important child element is a <distribution> element containing the source dataset’s DOI or other URL (distribution/online/url). It is strongly recommended to include <dataSource> elements when publishing any dataset derived from other sources.

Context note: The <dataSource> element is used by the EDI repository’s provenance tracking system for linking between derived and source data packages. For more information, see additional data repository resources from EDI.

Example 9.1: An example of a <methods> element with one <methodStep> element that includes a <dataSource> child element to indicate data provenance. The <description> element uses TextType <section>, <title>, and <para> formatting elements, but note that it is not sufficiently detailed for a derived dataset.

<methods>
  <methodStep>
    <description>
      <section>
        <title>Data collection</title>
        <para>
          We utilize NPP data collected from 1906 to 2006 from the FRS
          site. The FRS NPP data unit definition is kg/m^2/yr.
        </para>
      </section>
      <section>
        <title>Quality control and analysis</title>
        <para>These data were cleaned before analysis.</para>
      </section>
    </description>
    <dataSource>
      <title>NPP data from FRS 1906 to 2006</title>
      <creator>
        <organizationName>Fictitious Research Site</organizationName>
      </creator>
      <distribution>
        <online>
          <onlineDescription>This DOI link references a source dataset that was used to create this derivative dataset.</onlineDescription>
          <url function="information">
            https://doi.org/10.6073/pasta/91789c93a8930c3091bc5849060ff672
          </url>
        </online>
      </distribution>
      <contact>
        <organizationName>Fictitious Research Site</organizationName>
        <positionName>Information Manager</positionName>
        <electronicMailAddress>
          data.manager@ficstate.edu
        </electronicMailAddress>
      </contact>
    </dataSource>
  </methodStep>
</methods>
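Because the most important part of a <dataSource> is the URL under distribution/online/url, a script can list the source datasets a package declares. The sketch below uses Python's standard library on a fragment abridged from Example 9.1. The function name `source_urls` is an assumption for illustration, and this is not how the EDI repository itself processes provenance.

```python
import xml.etree.ElementTree as ET

# Abridged from Example 9.1; only the elements needed for the lookup are kept.
SAMPLE = """
<methods>
  <methodStep>
    <description><para>We utilize NPP data from the FRS site.</para></description>
    <dataSource>
      <title>NPP data from FRS 1906 to 2006</title>
      <distribution>
        <online>
          <url function="information">
            https://doi.org/10.6073/pasta/91789c93a8930c3091bc5849060ff672
          </url>
        </online>
      </distribution>
    </dataSource>
  </methodStep>
</methods>
"""


def source_urls(methods):
    """Collect the distribution/online/url value of every <dataSource>."""
    urls = []
    for url in methods.findall("methodStep/dataSource/distribution/online/url"):
        text = (url.text or "").strip()  # the URL may sit on its own indented line
        if text:
            urls.append(text)
    return urls


print(source_urls(ET.fromstring(SAMPLE)))
```

Stripping whitespace matters here because the example places the URL on its own indented line inside the <url> element.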

Instrumentation and software

The <instrumentation> and <software> elements are both optional and only occasionally used. Both are most common in disciplines and projects that rely on extensive instrumentation and data post-processing, for example marine science, micrometeorology, sensor networks, or bioinformatics datasets. There are ongoing efforts to create standardized metadata and controlled vocabularies for these categories of metadata, which may improve their utility and ease of use in the future.

The <instrumentation> element may contain a full description of the instruments used, including manufacturer, model, calibration dates, and accuracy. Changes in instrumentation, and the dates of those changes, should be noted earlier, in the <description> element. The <software> element describes software used to process the data. Include relevant details such as software title, author, implementation (hardware, operating system), and version.

Example 9.2: An example of a <methods> element with one <methodStep>. The <description> element uses a <markdown> element for complex layouts, and there are two <instrumentation> child elements that indicate the equipment used to collect the data. Note that descriptive markdown text has been abbreviated to save space where indicated by ellipses.

<methods>
  <methodStep>
    <description>
      <markdown>
## Sampling design and supplies for arthropod biodiversity monitoring

At each monitoring site, between 10 and 21 pitfall traps were installed in randomized locations...

**Supplies used:**

* pitfall traps - P-16 plastic Solo cups with lids
* metal spades and large bulb planters (to dig holes in which to put traps)
* 70% ethanol (to preserve specimens)
* Qorpak glass jars with lids from the VWR Corporation, 120ml (4oz), cap size 58-400 (comes included), Qorpak no. 7743C, VWR catalog no. 16195-703.

## Data collection procedures

... On each collection date, all trapped taxa were retrieved from the traps and returned to the lab. In the lab arthropods were counted, measured (body length in mm), and taxa were identified to Family. Specimens were preserved...

## Tissue isotope analysis

... subsamples of arthropods were sorted by taxon, oven dried, ground in liquid nitrogen, and then analyzed in a combustion CHN elemental analyser coupled to an isotope ratio mass spectrometer (IRMS)...
      </markdown>
    </description>
    <instrumentation>Fisons 1110 CHN elemental analyzer; CE Elantech, Inc., Lakewood, NJ, USA</instrumentation>
    <instrumentation>Finnigan DELTAplus Advantage mass spectrometer; Thermo Scientific Inc., West Palm Beach, FL, USA</instrumentation>
  </methodStep>
</methods>
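A structure like Example 9.2 can also be assembled programmatically. The sketch below is one possible approach using Python's standard library. The helper name `build_method_step` and the shortened Markdown text are assumptions for illustration, and the output is an unvalidated fragment rather than a complete EML document.

```python
import xml.etree.ElementTree as ET


def build_method_step(markdown_text, instruments):
    """Build a <methodStep> with a Markdown description and instruments."""
    step = ET.Element("methodStep")
    description = ET.SubElement(step, "description")
    ET.SubElement(description, "markdown").text = markdown_text
    # One <instrumentation> element per instrument, placed after
    # <description>, mirroring the element order used in Example 9.2.
    for instrument in instruments:
        ET.SubElement(step, "instrumentation").text = instrument
    return step


step = build_method_step(
    "## Tissue isotope analysis\n\n(placeholder text for the full description)",
    [
        "Fisons 1110 CHN elemental analyzer; CE Elantech, Inc., Lakewood, NJ, USA",
        "Finnigan DELTAplus Advantage mass spectrometer; "
        "Thermo Scientific Inc., West Palm Beach, FL, USA",
    ],
)
print(ET.tostring(step, encoding="unicode"))
```

ElementTree escapes characters such as `<` and `&` in the Markdown text when serializing, so the description can be passed in as ordinary text.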

Citations

Any <methodStep> element may contain any number of <citation> child elements for documenting references to external sources. The <citation> element is a CitationType designed for bibliographic information, which can be useful when referencing, for example, a methods paper or published protocol used to collect the data. However, with the introduction of new citation and referencing features in EML 2.2, placing <citation> elements within a <methodStep> is no longer recommended; use the <literatureCited> element instead, preferably at the <dataset> level. See Chapter 8 and Appendix B for more information.
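When updating older metadata, it can help to find method steps that still carry <citation> elements. The following Python sketch reports how many such citations a <dataset> contains and whether a dataset-level <literatureCited> element is present. The function name `citation_report` and the abridged sample are assumptions for illustration, and moving the citations is left to the metadata author.

```python
import xml.etree.ElementTree as ET

# Hypothetical, abridged <dataset> fragment with a citation inside a method step.
SAMPLE = """
<dataset>
  <methods>
    <methodStep>
      <description><para>Sampling followed a published protocol.</para></description>
      <citation><title>Hypothetical protocol title</title></citation>
    </methodStep>
  </methods>
</dataset>
"""


def citation_report(dataset):
    """Summarize where citations sit within a <dataset> element."""
    in_steps = dataset.findall("methods/methodStep/citation")
    return {
        "citations_in_method_steps": len(in_steps),
        "has_literature_cited": dataset.find("literatureCited") is not None,
    }


print(citation_report(ET.fromstring(SAMPLE)))
```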

Optional <methods> child elements

There are two other optional children of the <methods> element: <sampling> and <qualityControl>. These elements are designed to contain very specific information about sampling procedures and post-collection data handling. In most cases this information is easy to include in one or more <methodStep> elements as described above, but if more detailed and structured descriptions of sampling and QA/QC procedures are needed, these elements may be used. See Appendix A for further information.

Decision flowchart

Figure 9.1: This flowchart outlines decision points for what children and content to include in an EML <methods> element.

XPaths referenced in this chapter

Dataset methods: /eml:eml/dataset/methods

Dataset method steps: /eml:eml/dataset/methods/methodStep

TextType method step: /eml:eml/dataset/methods/methodStep/description/section/…

Markdown method step: /eml:eml/dataset/methods/methodStep/description/markdown

Provenance metadata: /eml:eml/dataset/methods/methodStep/dataSource

Instrumentation: /eml:eml/dataset/methods/methodStep/instrumentation

Software: /eml:eml/dataset/methods/methodStep/software
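The XPaths above can be evaluated with many XML tools. The sketch below uses Python's standard library, which supports only a subset of XPath and no absolute paths on elements, so it checks the root element and then queries the rest of the path relative to it. The namespace URI is assumed to be that of EML 2.2.0, and the function name `find_all` and the sample document are hypothetical and abridged (a real EML document needs further required content).

```python
import xml.etree.ElementTree as ET

# Assumption: the document uses the EML 2.2.0 namespace for the eml: prefix.
EML_NS = "https://eml.ecoinformatics.org/eml-2.2.0"

# Abridged, hypothetical document used only to exercise the lookups.
SAMPLE = f"""<eml:eml xmlns:eml="{EML_NS}">
  <dataset>
    <methods>
      <methodStep>
        <description><markdown>## Sampling design</markdown></description>
        <instrumentation>Hypothetical analyzer, model X</instrumentation>
      </methodStep>
    </methods>
  </dataset>
</eml:eml>"""


def find_all(root, xpath):
    """Evaluate one of the chapter's /eml:eml/... paths against a parsed root."""
    prefix = "/eml:eml/"
    if not xpath.startswith(prefix):
        raise ValueError(f"expected a path starting with {prefix}")
    if root.tag != f"{{{EML_NS}}}eml":
        raise ValueError("root element is not eml:eml in the assumed namespace")
    # Elements below the root are unqualified, so a relative path is enough.
    return root.findall(xpath[len(prefix):])


root = ET.fromstring(SAMPLE)
for path in (
    "/eml:eml/dataset/methods/methodStep",
    "/eml:eml/dataset/methods/methodStep/instrumentation",
    "/eml:eml/dataset/methods/methodStep/software",
):
    print(path, len(find_all(root, path)))
```

A library with full XPath support could evaluate the absolute paths directly; the prefix-stripping here is only a workaround for the standard library's limited path syntax.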