Process a data frame with taxonomic names and add standardized matches. This is a batch wrapper around `match_taxonomic_names()` that returns the original data with added columns for the best match.

Fuzzy matching is performed on the SQL side, which keeps performance acceptable over slow connections.

Usage

standardize_taxonomic_batch(
  data,
  name_column,
  method = c("auto", "exact", "genus_constrained", "fuzzy"),
  min_similarity = 0.3,
  include_synonyms = TRUE,
  include_authors = FALSE,
  con = NULL,
  verbose = TRUE,
  keep_all_matches = FALSE
)

Arguments

data

A data frame or tibble containing taxonomic names

name_column

Name of column containing taxonomic names (quoted or unquoted)

method

Matching method: "auto" (default), "exact", "genus_constrained", "fuzzy"

min_similarity

Minimum similarity score (0-1, default: 0.3)

include_synonyms

Include synonym information (default: TRUE)

include_authors

Try matching with author names (default: FALSE)

con

Database connection (if NULL, one is obtained by calling call.mydb.taxa())

verbose

Show progress messages (default: TRUE)

keep_all_matches

Keep all matches (default: FALSE, only keeps best match)
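
The vector default shown for `method` in the usage above follows a common R idiom in which the first element acts as the default and the value is resolved with `match.arg()`. The sketch below illustrates that idiom with a hypothetical helper, `pick_method()`; whether `standardize_taxonomic_batch()` resolves `method` this way internally is an assumption, not something this page confirms.

```r
# Hypothetical helper illustrating the match.arg() idiom; not part of the package.
pick_method <- function(method = c("auto", "exact", "genus_constrained", "fuzzy")) {
  # With no argument supplied, match.arg() returns the first choice.
  # A supplied value must match one of the choices, otherwise an error is raised.
  match.arg(method)
}

default_method <- pick_method()          # first choice is used
fuzzy_method   <- pick_method("fuzzy")   # explicit choice
invalid_method <- tryCatch(pick_method("levenshtein"), error = function(e) NULL)
```

Under this idiom, passing a single string such as `method = "fuzzy"` is the intended way to override the default.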

Value

The input data frame with added columns:

- matched_name: Best matching name from backbone (or NA if no match)
- idtax_n: Taxa ID for matched name
- idtax_good_n: Accepted taxa ID (for synonyms)
- match_method: How the match was found
- match_score: Similarity score
- match_genus: Matched genus
- match_species: Matched species epithet
- match_family: Matched family
- is_synonym: Whether match is a synonym
- accepted_name: Accepted name (if synonym)

If keep_all_matches = TRUE, returns one row per match with match_rank column.
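
The columns listed above lend themselves to a short review step after matching. The following is a minimal sketch in base R that works on a hand-built stand-in for the returned data frame, so it runs without a database connection. The scores, the placeholder names "Genus oldname" and "Genus newname", the 0.9 review threshold, and the `final_name` column are all made up for illustration; only the column names come from this page.

```r
# Stand-in for the output of standardize_taxonomic_batch(); values are illustrative.
matched <- data.frame(
  species       = c("Pericopsis elata", "Garcinea kola", "Genus oldname", "Unknown sp."),
  matched_name  = c("Pericopsis elata", "Garcinia kola", "Genus oldname", NA),
  match_method  = c("exact", "fuzzy", "exact", NA),
  match_score   = c(1.0, 0.85, 1.0, NA),
  is_synonym    = c(FALSE, FALSE, TRUE, NA),
  accepted_name = c(NA, NA, "Genus newname", NA),
  stringsAsFactors = FALSE
)

# Rows for which no match was found (matched_name is NA)
unmatched <- matched[is.na(matched$matched_name), ]

# Matches scoring below an arbitrary review threshold
needs_review <- matched[!is.na(matched$match_score) & matched$match_score < 0.9, ]

# Resolve synonyms: use accepted_name where is_synonym is TRUE, matched_name otherwise
matched$final_name <- ifelse(
  !is.na(matched$is_synonym) & matched$is_synonym,
  matched$accepted_name,
  matched$matched_name
)
```

Rows collected in `needs_review` are natural candidates for a second pass with `keep_all_matches = TRUE`, where the alternatives can be compared by `match_rank`.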

Author

Claude Code Assistant

Examples

if (FALSE) { # \dontrun{
# Standardize names in a data frame
data <- tibble::tibble(
  plot_id = c(1, 1, 2),
  tree_id = c("A01", "A02", "B01"),
  species = c("Pericopsis elata", "Garcinea kola", "Brachystegia laurentii")
)

# Add best match for each name
data_matched <- standardize_taxonomic_batch(data, name_column = "species")

# Keep all matches (for manual review)
data_all_matches <- standardize_taxonomic_batch(
  data,
  name_column = "species",
  keep_all_matches = TRUE
)
} # }