Using the Taxonomic Name Standardization App

Introduction

The launch_taxonomic_match_app() function provides an interactive Shiny application for standardizing taxonomic names against the Central African plant taxonomic backbone database. This visual interface is ideal for:

Exploring and cleaning taxonomic data interactively
Understanding match quality through visual feedback
Manually reviewing uncertain matches
Enriching data with species-level traits

Prerequisites

Before launching the app, ensure you have:

Database credentials configured (see setup_db_credentials())
Data to standardize in one of these formats:
- Excel file (.xlsx, .xls)
- CSV file (.csv)
A column containing taxonomic names (e.g., genus + species) or separate columns for genus, species, and family

Quick Start

Launch the app with a single command:

library(CafriplotsR)
launch_taxonomic_match_app()

Alternatively, pre-load your data:

# With an R data.frame
my_data <- read.csv("tree_inventory.csv")
launch_taxonomic_match_app(data = my_data, name_column = "species_name")

# Adjust fuzzy matching sensitivity (default is 0.7)
launch_taxonomic_match_app(min_similarity = 0.5)  # More permissive matching

Step-by-Step Walkthrough

Phase 1: Initial View

When you first launch the app, you’ll see the main interface with a sidebar for configuration and tabs for different workflow phases:

[Screenshot: Application initial view]

The app uses a tabbed workflow that guides you through each phase sequentially:

Auto Match - Automatic matching
Review - Manual review of unmatched names
Export - Download results
Traits Enrichment - Add species traits

Phase 2: Upload Your Data

The first step is to provide your data. The app offers two input methods:

File Upload (Default)

Upload an Excel file using the file browser (supports .xlsx, .xls)
Upload a CSV file
Use pre-loaded R data (if you passed the data argument)

[Screenshot: Data upload interface]

For Excel files with multiple sheets, you can select which sheet to use. The app will display a preview of your uploaded data so you can verify it was read correctly.

Text Input (Copy-Paste) (NEW)

For quick standardization of a few names, or when you have a list copied from another source, use the Text input method:

[Screenshot: Text input interface]

Select “Text input (paste/type)” from the input method radio buttons
Paste or type your taxonomic names in the text area
Click “Load names” to process the input

Accepted separators:
- One name per line (recommended)
- Comma-separated: Lophira alata, Terminalia superba, Aucoumea klaineana
- Semicolon-separated: Lophira alata; Terminalia superba; Aucoumea klaineana
- Tab-separated (useful when pasting from Excel)

The app automatically:
- Removes empty lines and whitespace
- Removes duplicate names (preserving order)
- Creates a single column named taxon_name for matching

This method is ideal for:
- Quick checks of a few species names
- Pasting lists from emails or documents
- Testing the app without preparing a file
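The parsing rules described above can be sketched in base R. This is only an illustration of the documented behaviour, not the app's actual code: parse_pasted_names() is a hypothetical helper, and the app may handle separators differently in edge cases.

```r
# Hypothetical helper (not part of CafriplotsR): mimics the documented parsing rules.
parse_pasted_names <- function(text) {
  # Split on newlines, commas, semicolons, or tabs
  parts <- unlist(strsplit(text, "[\r\n,;\t]+"))
  # Trim surrounding whitespace and drop empty entries
  parts <- trimws(parts)
  parts <- parts[nzchar(parts)]
  # unique() keeps the first occurrence of each name, so input order is preserved
  data.frame(taxon_name = unique(parts), stringsAsFactors = FALSE)
}

pasted <- "Lophira alata, Terminalia superba\nLophira alata;  Aucoumea klaineana\n\n"
parsed <- parse_pasted_names(pasted)
print(parsed)
```

The duplicate "Lophira alata" is dropped and the result is a one-column data frame named taxon_name, matching the description above.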

Phase 3: Select Name Column(s)

Once data is loaded, you have two options for selecting taxonomic names:

Single Column Mode (Default)

Select one column containing the full taxonomic name:

[Screenshot: Column selection - single mode]

The dropdown menu shows all available columns from your dataset. Choose the one containing species names (typically formatted as “Genus species” or “Genus species Author”).

Multiple Column Mode (NEW)

If your data has separate columns for genus, species, and family, enable “Use multiple columns”:

[Screenshot: Column selection - multiple columns]

The app will automatically combine these columns into a single taxonomic name for matching, using a hierarchical approach:
- If genus and species are available: “Genus species”
- If only genus: “Genus”
- If only family: “Family”

You can also optionally include an author column.
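If you prefer to build the combined name yourself before launching the app, the hierarchy above can be approximated as follows. combine_taxon_columns() is a hypothetical helper written for this example, and treating blank strings as missing is an assumption; the app's own logic may differ.

```r
# Hypothetical helper illustrating the genus/species/family hierarchy.
combine_taxon_columns <- function(genus, species, family) {
  # Treat NA and blank strings as missing (an assumption)
  has <- function(x) !is.na(x) & nzchar(trimws(x))
  ifelse(
    has(genus) & has(species), paste(trimws(genus), trimws(species)),
    ifelse(has(genus), trimws(genus),
           ifelse(has(family), trimws(family), NA_character_))
  )
}

inventory <- data.frame(
  family  = c("Ochnaceae", "Combretaceae", "Burseraceae"),
  genus   = c("Lophira", "Terminalia", NA),
  species = c("alata", NA, NA),
  stringsAsFactors = FALSE
)
inventory$taxon_name <- combine_taxon_columns(
  inventory$genus, inventory$species, inventory$family
)
print(inventory$taxon_name)
```

The resulting taxon_name column can then be passed through name_column in single column mode.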

Phase 4: Automatic Matching

Click the “Start Matching” button to begin the automatic matching process. The app uses a five-tier matching strategy:

Exact match on species: Direct lookup of full name (genus + species)
Exact match on genus: Match at genus level
Exact match on family: Match at family level
Exact match on class: Match at higher taxonomic level
Fuzzy matching: Approximate string matching for remaining names
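To make the "exact first, fuzzy last" idea concrete, here is a minimal sketch against a toy backbone. Everything in it is an assumption made for illustration: the backbone table, the match_one() helper, case-insensitive comparison, and the similarity formula (one minus the edit distance divided by the longer string length). The exact tiers are collapsed into a single lookup on a level column, and the app's real algorithm may differ.

```r
# Toy backbone; the real backbone is downloaded from the database.
backbone <- data.frame(
  name  = c("Lophira alata", "Lophira", "Ochnaceae", "Terminalia superba"),
  level = c("species", "genus", "family", "species"),
  stringsAsFactors = FALSE
)

# Hypothetical matcher: exact lookup first, fuzzy fallback second.
match_one <- function(query, backbone, min_similarity = 0.7) {
  hit <- match(tolower(query), tolower(backbone$name))
  if (!is.na(hit)) {
    return(data.frame(matched_name = backbone$name[hit],
                      match_method = paste0("exact_", backbone$level[hit]),
                      match_score = 1, stringsAsFactors = FALSE))
  }
  # Assumed similarity: 1 - edit distance / length of the longer string
  d <- utils::adist(tolower(query), tolower(backbone$name))[1, ]
  sim <- 1 - d / pmax(nchar(query), nchar(backbone$name))
  best <- which.max(sim)
  if (sim[best] >= min_similarity) {
    return(data.frame(matched_name = backbone$name[best],
                      match_method = "fuzzy",
                      match_score = round(sim[best], 2),
                      stringsAsFactors = FALSE))
  }
  # Nothing close enough: leave the name for manual review
  data.frame(matched_name = NA_character_, match_method = NA_character_,
             match_score = NA_real_, stringsAsFactors = FALSE)
}

queries <- c("Lophira alata", "Lophira", "Terminalia suberba", "Quercus robur")
results <- do.call(rbind, lapply(queries, match_one, backbone = backbone))
results <- cbind(original_name = queries, results)
print(results)
```

In this toy run the misspelled "Terminalia suberba" is recovered by the fuzzy step, while "Quercus robur" stays unmatched because no backbone name reaches the threshold.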

[Screenshot: Matching in progress]

The progress bar shows real-time status. For large datasets, this may take a few minutes. The sidebar displays live statistics:

Number of exact matches
Number of genus-level matches
Number of fuzzy matches
Number of unmatched names

Phase 5: Review Match Results

After matching completes, the Auto Match tab shows a summary table with all names and their match status:

[Screenshot: Matching results summary]

The results table includes:

Original name: Your input name
matched_name: Name found in backbone
match_method: How it was matched (exact_species, exact_genus, exact_family, fuzzy, manual, unresolved)
match_score: Similarity score (0-1, higher is better)
idtax_n: Taxon ID in database
is_synonym: Whether matched name is a synonym
accepted_name: Current accepted name (if synonym)

Match quality indicators:

Exact match (1.0): Perfect match, no review needed
High similarity (>0.8): Very likely correct, quick review recommended
Medium similarity (0.5-0.8): Possible match, review suggested
Low similarity (<0.5): Uncertain, manual review required
No match: Requires manual selection
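The bands above can be applied to an exported match_score column with a few lines of base R. classify_match() is a hypothetical helper, and placing a score of exactly 0.8 in the medium band is an assumption drawn from the ranges as written.

```r
# Hypothetical helper that labels scores using the bands listed above.
classify_match <- function(score) {
  label <- rep("No match", length(score))
  ok <- !is.na(score)
  label[ok & score < 0.5] <- "Low similarity"
  label[ok & score >= 0.5 & score <= 0.8] <- "Medium similarity"
  label[ok & score > 0.8 & score < 1] <- "High similarity"
  label[ok & score == 1] <- "Exact match"
  label
}

scores <- c(1, 0.93, 0.8, 0.62, 0.41, NA)
print(data.frame(match_score = scores, quality = classify_match(scores)))
```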

Phase 6: Manual Review

For unmatched or uncertain names, switch to the “Review” tab to manually review and select matches:

[Screenshot: Manual review interface]

The review interface provides two ways to find matches:

Fuzzy Suggestions Panel

Shows automatic suggestions ranked by similarity with advanced filtering options:

[Screenshot: Fuzzy suggestions with filters]

Filtering options:

Number of suggestions: Slider to show 5-30 suggestions
Minimum similarity: Adjust threshold (0.3-1.0)
Taxonomic level filter: Filter by All, Species, Genus, Family, Order, Class, or Infraspecific
Sort by: Similarity score or alphabetical order

Each suggestion card displays:

Name with color-coded similarity badge (green = high, blue = medium, yellow = low)
Taxonomic level and family
Synonym information if applicable
Select button for one-click acceptance

Manual Search Panel

For names without good suggestions, use the manual search:

[Screenshot: Manual search interface]

Type any search term to query the taxonomic backbone
Filter results by taxonomic level
View detailed information for each match
Select the correct match or mark as “unresolved”

Navigation:

Use Previous/Skip/Next buttons to browse unmatched names
Progress counter shows reviewed vs. remaining names
The app remembers your selections and automatically updates the results

Phase 7: Enrich Data with Traits

Switch to the “Traits Enrichment” tab to add species-level traits to your matched data:

[Screenshot: Trait enrichment interface]

Options:

Categorical aggregation mode:
- “mode” - Use most frequent value per taxon
- “concat” - Concatenate all unique values
Select columns to include:
- Original input names
- Corrected names
- Taxonomic IDs
- Match metadata

Available traits include:

Growth form
Wood density
Leaf traits
Ecological characteristics

The enriched data combines your matched taxa with selected traits:

[Screenshot: Enriched data results]

Note: The enriched export creates one row per unique taxon, not per input row. When several input names map to the same taxon, they are concatenated with pipe separators.
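The two aggregation modes and the one-row-per-taxon layout can be illustrated with toy data. The trait values below are invented for the example and are not real trait records; agg_mode() and agg_concat() are hypothetical helpers, and the tie-breaking rule and the exact separators are assumptions.

```r
# Toy trait records: several observations per taxon (illustrative values only).
traits <- data.frame(
  idtax_n     = c(1, 1, 1, 2, 2),
  growth_form = c("tree", "tree", "shrub", "liana", "liana"),
  stringsAsFactors = FALSE
)

# "mode": most frequent value per taxon (ties go to the first value seen here)
agg_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# "concat": all unique values joined into one string
agg_concat <- function(x) paste(unique(x[!is.na(x)]), collapse = ", ")

by_mode   <- aggregate(growth_form ~ idtax_n, data = traits, FUN = agg_mode)
by_concat <- aggregate(growth_form ~ idtax_n, data = traits, FUN = agg_concat)

# Input names that map to the same taxon collapse into one pipe-separated cell
inputs <- data.frame(
  idtax_n    = c(1, 1, 2),
  input_name = c("Lophira alata", "Lophira alatta", "Terminalia superba"),
  stringsAsFactors = FALSE
)
names_per_taxon <- aggregate(
  input_name ~ idtax_n, data = inputs,
  FUN = function(x) paste(unique(x), collapse = " | ")
)
print(merge(names_per_taxon, by_mode, by = "idtax_n"))
```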

Phase 8: Export Results

Switch to the “Export” tab to download your standardized dataset:

[Screenshot: Export options]

Available formats:

Excel (.xlsx): Best for sharing with collaborators
CSV (.csv): Universal tabular format
RDS (.rds): R-native format preserving data types

Selectable columns:

Original data (all your input columns)
Matched IDs (idtax_n, idtax_good_n)
Corrected names (corrected_name, matched_name)
Match metadata (match_method, match_score, is_synonym, accepted_name)

A preview table shows the data before export with pagination controls.

Understanding Output Columns

The app adds these columns to your data:

Column	Description
`idtax_n`	Matched taxon ID in backbone database
`idtax_good_n`	Accepted taxon ID (for synonyms)
`matched_name`	Name found in backbone
`corrected_name`	Final standardized name
`match_method`	Matching strategy used (exact_species, exact_genus, exact_family, fuzzy, manual, unresolved)
`match_score`	Similarity score (0-1)
`is_synonym`	TRUE if matched name is a synonym
`accepted_name`	Current accepted name (if synonym)
`family`	Taxonomic family
`genus`	Taxonomic genus
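Once exported, these columns make it easy to summarise a run and list the rows that deserve a second look. The data frame below is a toy stand-in that uses a subset of the documented column names with invented values; in particular, it assumes unresolved rows carry a missing match_score.

```r
# Toy export with a subset of the documented columns (values are illustrative).
exported <- data.frame(
  matched_name = c("Lophira alata", "Terminalia superba", "Lophira", NA),
  match_method = c("exact_species", "fuzzy", "exact_genus", "unresolved"),
  match_score  = c(1, 0.74, 1, NA),
  is_synonym   = c(FALSE, FALSE, FALSE, NA),
  stringsAsFactors = FALSE
)

# Count rows per matching strategy
print(table(exported$match_method, useNA = "ifany"))

# Rows worth a second look: no score at all, or fuzzy matches at or below 0.8
needs_review <- is.na(exported$match_score) |
  (exported$match_method == "fuzzy" & exported$match_score <= 0.8)
print(exported[needs_review, ])
```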

Advanced Options

Language Selection

The app now supports bilingual operation with French and English interfaces. French is the default language.

In the App Interface:

A language toggle is located in the top-right corner of the app:
- Click “FR” for French interface
- Click “EN” for English interface

The language switch is instant and affects all UI elements including:
- Tab labels
- Button text
- Instructions and help text
- Column headers
- Error messages and notifications

Setting Initial Language Programmatically:

# Launch app in English
launch_taxonomic_match_app(language = "en")

# Launch app in French (default)
launch_taxonomic_match_app(language = "fr")
# or simply:
launch_taxonomic_match_app()

The language setting is interactive: users can switch languages at any time during their session without losing work progress or data.

Adjusting Fuzzy Matching

Control matching sensitivity with the min_similarity parameter:

# Very strict - only high-quality matches
launch_taxonomic_match_app(min_similarity = 0.8)

# Default setting
launch_taxonomic_match_app(min_similarity = 0.7)

# More permissive - allows lower-quality matches
launch_taxonomic_match_app(min_similarity = 0.5)

Lower values cast a wider net but may include false positives. Higher values are more conservative but may miss valid matches.
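The trade-off can be seen on a small example. The similarity used here (one minus the edit distance divided by the longer string length) is an assumption chosen for illustration and is not necessarily the metric the package uses; the candidate list is also a toy subset.

```r
# Toy candidate names and a misspelled query
candidates <- c("Lophira alata", "Lophira lanceolata", "Lovoa trichilioides")
query <- "Lophira alatta"

# Assumed similarity: 1 - edit distance / length of the longer string
d <- utils::adist(query, candidates)[1, ]
sim <- 1 - d / pmax(nchar(query), nchar(candidates))

# Candidates that survive a given threshold
keep_at <- function(threshold) candidates[sim >= threshold]

for (threshold in c(0.8, 0.7, 0.5)) {
  cat(sprintf("min_similarity = %.1f keeps: %s\n",
              threshold, paste(keep_at(threshold), collapse = "; ")))
}
```

With this metric, thresholds of 0.8 and 0.7 keep only "Lophira alata", while 0.5 also lets "Lophira lanceolata" through, which is the kind of false positive a permissive setting can introduce.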

Increasing Suggestions

Show more fuzzy match suggestions per name:

# Show top 20 suggestions instead of default 10
launch_taxonomic_match_app(max_suggestions = 20)

This is useful when the initial suggestions don’t include the correct match. You can also adjust this interactively in the Review tab using the slider.

Function Parameters

launch_taxonomic_match_app(
  data = NULL,           # Optional: pre-load data.frame
  name_column = NULL,    # Optional: pre-select column
  min_similarity = 0.7,  # Fuzzy match threshold (0-1)
  max_suggestions = 10,  # Max suggestions per unmatched name
  language = "fr"        # Interface language: "fr" (default) or "en"
)

Troubleshooting

Connection Issues

Problem: “Failed to connect to database”

Solution:

# Check connection
db_diagnostic()

# Reset credentials if needed
remove_db_credentials()
setup_db_credentials()

No Fuzzy Matches Found

Problem: No suggestions appear for unmatched names

Possible causes:
- min_similarity threshold too high
- Taxonomic names contain typos or non-standard formatting
- Names not present in the taxonomic backbone (e.g., non-African taxa)

Solutions:
- Lower min_similarity: launch_taxonomic_match_app(min_similarity = 0.5)
- Use the taxonomic level filter to search at genus or family level
- Clean input names (remove extra spaces, fix obvious typos)
- Verify names are African taxa

Slow Matching Performance

Problem: Matching takes very long for large datasets

Solutions:
- Use match_taxonomic_names() for batch processing in a programmatic workflow
- Process data in chunks (split large datasets)
- Note that the app downloads the entire backbone once for efficiency, so the initial load may be slow
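Chunked processing can be sketched in base R as shown here. match_chunk() is a hypothetical stand-in for whatever matching call you use (it only counts characters so that the example runs without a database), and process_in_chunks() is likewise a helper invented for this illustration.

```r
# Hypothetical stand-in for a matching call; replace with the real function.
match_chunk <- function(names) {
  data.frame(original_name = names, n_char = nchar(names),
             stringsAsFactors = FALSE)
}

process_in_chunks <- function(names, chunk_size, fun) {
  # Assign each element to a chunk of at most chunk_size elements
  chunk_id <- ceiling(seq_along(names) / chunk_size)
  pieces <- lapply(split(names, chunk_id), fun)
  # split() returns chunks in increasing chunk_id order, so input order is kept
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

all_names <- sprintf("Genus%02d species", 1:7)
combined <- process_in_chunks(all_names, chunk_size = 3, fun = match_chunk)
print(combined)
```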

When to Use the App vs. Programmatic Approach

Use the Shiny App when:

Exploring data interactively
You prefer visual interfaces
Dataset is small to medium size (<5,000 rows)
Need to manually review uncertain matches
Learning the matching process

Use `match_taxonomic_names()` when:

Processing large datasets (>5,000 rows)
Automating workflows in scripts
Integrating with data pipelines
Reproducibility is critical (always keep the column that contains the original names)
Batch processing multiple files

Example programmatic approach:

# Load data
my_data <- read.csv("tree_inventory.csv")

# Match names
matched <- match_taxonomic_names(
  names = my_data$species_name,
  min_similarity = 0.7
)

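# Merge back with original data
# (assumes the result has one row per input name, in input order)
stopifnot(nrow(matched) == nrow(my_data))
result <- cbind(my_data, matched)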

# Export
write.csv(result, "standardized_inventory.csv", row.names = FALSE)
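If the matching output is not guaranteed to have one row per input row in the same order, joining on the name is safer than binding columns by position. The sketch below uses toy data frames; the original_name column and the ID values are assumptions made for the example and do not describe the actual return value of match_taxonomic_names().

```r
# Toy inventory with a repeated species name
my_data <- data.frame(
  plot_id      = c(1, 1, 2),
  species_name = c("Lophira alata", "Terminalia superba", "Lophira alata"),
  stringsAsFactors = FALSE
)

# Toy stand-in for matching output: one row per unique name, arbitrary order
matched <- data.frame(
  original_name = c("Terminalia superba", "Lophira alata"),
  idtax_n       = c(202, 101),
  stringsAsFactors = FALSE
)

# Look up each inventory row in the matching output by name
idx <- match(my_data$species_name, matched$original_name)
result <- cbind(my_data, matched[idx, "idtax_n", drop = FALSE])
rownames(result) <- NULL
print(result)
```

The original species_name column is kept untouched, so every standardized ID can be traced back to the name that produced it.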

Tips for Best Results

Clean your data first: Remove obvious typos, extra whitespace, and special characters
Understand your data: Know which taxonomic groups are in your dataset
Use multi-column mode: If you have separate genus/species/family columns, enable “Use multiple columns” so the app combines them for better matching
Filter by taxonomic level: Use the level filter in Review tab to find genus or family matches
Review match scores: Don’t blindly accept low-similarity matches (<0.6)
Save incrementally: Export intermediate results to avoid losing manual review work
Document parameters: Note which min_similarity value you used for reproducibility
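As an example of the first tip, whitespace problems can be fixed in base R before loading the data. clean_names() is a hypothetical helper; it only normalises whitespace and leaves spelling untouched, and the original column is kept alongside the cleaned one.

```r
# Hypothetical helper: normalise whitespace in taxonomic names.
clean_names <- function(x) {
  x <- gsub("[[:space:]]+", " ", x)  # collapse runs of spaces, tabs, newlines
  x <- trimws(x)                     # drop leading and trailing whitespace
  x[!is.na(x) & !nzchar(x)] <- NA_character_  # empty strings become missing
  x
}

raw <- data.frame(
  species_name = c("  Lophira   alata ", "Terminalia\tsuperba", ""),
  stringsAsFactors = FALSE
)
# Keep the original column and add the cleaned version next to it
raw$species_name_clean <- clean_names(raw$species_name)
print(raw)
```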

Example Workflow

Here’s a complete workflow from start to finish:

# 1. Load your data
trees <- read.csv("forest_inventory.csv")
# Columns: plot_id, tree_number, species_name, dbh, height

# 2. Launch app with data
launch_taxonomic_match_app(
  data = trees,
  name_column = "species_name",
  min_similarity = 0.7
)

# 3. In the app:
#    - Review automatic matches in Auto Match tab
#    - Use Review tab to resolve unmatched names
#    - Apply taxonomic level filters if needed
#    - Optionally enrich with traits in Traits Enrichment tab
#    - Export as "forest_inventory_standardized.xlsx"

# 4. Continue analysis with standardized data
standardized <- readxl::read_excel("forest_inventory_standardized.xlsx")

# Now you have clean taxonomic IDs for further analysis!

This workflow ensures your taxonomic data is standardized and ready for downstream analyses like diversity metrics, trait-based analyses, or database integration.

Suggested Screenshots

To complete this documentation, the following screenshots should be captured:

app-initial-view.png - Full app interface after launch with all tabs visible
app-upload-data.png - Data upload panel with file browser and sheet selection
app-text-input.gif - Text input interface with text area and “Load names” button (NEW)
app-column-select.png - Single column selection dropdown
app-column-select-multi.png - Multiple column mode with genus/species/family selectors
app-matching-progress.png - Matching in progress with progress bar and live statistics
app-matching-results.png - Results table in Auto Match tab showing matched names
app-review-interface.png - Review tab overview with unmatched name display
app-review-suggestions.png - Fuzzy suggestions panel with filtering options (level filter, sort, slider)
app-review-manual-search.png - Manual search interface with search box and results
app-enrich-data-interface.png - Traits enrichment tab with aggregation mode and column selection
app-enrich-data-results.png - Preview of enriched data with traits
app-export-options.png - Export tab with format selection and column checkboxes