This dataset was derived from the NEON data portal, data product ID 'DP1.20120.001'. Details about this data product can be found at https://data.neonscience.org/data-products/DP1.20120.001.

data_macroinvertebrate

Format

A data frame (also a tibble) with the following columns:

  • location_id: Location id.

  • siteID: NEON site code.

  • unique_sample_id: Identity of unique samples (equals sampleID).

  • observation_datetime: Observation date and time.

  • taxon_id: Accepted species code, based on one or more sources.

  • taxon_name: Scientific name, associated with the taxonID. This is the name of the lowest level taxonomic rank that can be determined.

  • taxon_rank: The lowest level taxonomic rank that can be determined for the individual or specimen.

  • variable_name: The variable name(s) represented by the value column.

  • value: Density (count per square meter).

  • unit: Unit of the values in the value column.

  • estimatedTotalCount: Estimated total count (summed across size classes).

  • individualCount: Raw individual count (summed across size classes).

  • subsamplePercent: Percent of the total sample contained in the subsample.

  • release: Version of data release by NEON.

  • benthicArea: Area sampled (square meters).

  • habitatType: Habitat type sampled.

  • samplerType: Type of sampler used to collect the sample.

  • substratumSizeClass: Size class of the substratum sampled.

  • remarks: Remarks on the record.

  • ponarDepth: Depth (meters) of petite ponar sample.

  • snagLength: Length (meters) of snag sampled.

  • snagDiameter: Diameter (meters) of snag sampled.

  • latitude: The geographic latitude (in decimal degrees, WGS84) of the geographic center of the reference area.

  • longitude: The geographic longitude (in decimal degrees, WGS84) of the geographic center of the reference area.

  • elevation: Elevation (in meters) above sea level.

Details

To clean the data, we:

  1. Deduplicated inv_fieldData by sampleID using slice(1) to guard against NEON's known aquatic duplicate metadata issue.

  2. Filtered inv_taxonomyProcessed to targetTaxaPresent == "Y"; summed estimatedTotalCount and individualCount across size-class records sharing the same sampleID and acceptedTaxonID.

  3. Inner-joined taxonomy to field data (inner join required because benthicArea is needed to compute density; records without field metadata are dropped).

  4. Computed density as estimatedTotalCount / benthicArea (count per square meter); this is the value column.
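
The four steps above can be sketched with dplyr as follows. This is a minimal illustration on made-up toy tables, not the package's actual cleaning code: the taxon codes, counts, and areas are hypothetical, and the intermediate object names (field_dedup, taxa_summed, density_tbl) are introduced here only for the example.

```r
library(dplyr)

# Toy stand-ins for the two NEON tables (all values are hypothetical).
inv_fieldData <- tibble(
  sampleID    = c("S1", "S1", "S2"),  # S1 is duplicated on purpose
  benthicArea = c(0.5, 0.5, 0.25)     # square meters
)

inv_taxonomyProcessed <- tibble(
  sampleID            = c("S1", "S1", "S1", "S2", "S3"),
  acceptedTaxonID     = c("TAXA", "TAXA", "TAXB", "TAXA", "TAXB"),
  targetTaxaPresent   = c("Y", "Y", "N", "Y", "Y"),
  estimatedTotalCount = c(10, 5, 3, 4, 7),  # one row per size class
  individualCount     = c(5, 5, 3, 2, 7)
)

# Step 1: keep the first field record per sampleID.
field_dedup <- inv_fieldData %>%
  group_by(sampleID) %>%
  slice(1) %>%
  ungroup()

# Step 2: keep target taxa, then sum counts across size classes.
taxa_summed <- inv_taxonomyProcessed %>%
  filter(targetTaxaPresent == "Y") %>%
  group_by(sampleID, acceptedTaxonID) %>%
  summarise(
    estimatedTotalCount = sum(estimatedTotalCount),
    individualCount     = sum(individualCount),
    .groups = "drop"
  )

# Steps 3 and 4: inner join (S3 has no field record, so it is dropped),
# then compute density as count per square meter.
density_tbl <- taxa_summed %>%
  inner_join(field_dedup, by = "sampleID") %>%
  mutate(value = estimatedTotalCount / benthicArea)

print(density_tbl)
```

In this toy input, sample S1 keeps a single field record, the two size-class rows for TAXA in S1 are summed before dividing by benthicArea, and the S3 record disappears because the inner join finds no matching field metadata.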

Note

Details of locations (e.g. latitude/longitude coordinates) can be found in neon_location.

Author

Stephanie Parker, Eric Sokol