
Introduction

The launch_taxonomic_match_app() function provides an interactive Shiny application for standardizing taxonomic names against the Central African plant taxonomic backbone database. This visual interface is ideal for:

  • Exploring and cleaning taxonomic data interactively
  • Understanding match quality through visual feedback
  • Manually reviewing uncertain matches
  • Enriching data with species-level traits

Prerequisites

Before launching the app, ensure you have:

  1. Database credentials configured (see setup_db_credentials())
  2. Data to standardize in one of these formats:
    • Excel file (.xlsx, .xls)
    • CSV file (.csv)
  3. A column containing taxonomic names (e.g., genus + species) or separate columns for genus, species, and family

Quick Start

Launch the app with a single command:

launch_taxonomic_match_app()

Alternatively, pre-load your data:

# With an R data.frame
my_data <- read.csv("tree_inventory.csv")
launch_taxonomic_match_app(data = my_data, name_column = "species_name")

# Adjust fuzzy matching sensitivity (default is 0.7)
launch_taxonomic_match_app(min_similarity = 0.5)  # More permissive matching

Step-by-Step Walkthrough

Phase 1: Initial View

When you first launch the app, you’ll see the main interface with a sidebar for configuration and tabs for different workflow phases:

Application initial view

The app uses a tabbed workflow that guides you through each phase sequentially:

  1. Auto Match - Automatic matching
  2. Review - Manual review of unmatched names
  3. Export - Download results
  4. Traits Enrichment - Add species traits

Phase 2: Upload Your Data

The first step is to provide your data. The app offers two input methods:

File Upload (Default)

  • Upload an Excel file using the file browser (supports .xlsx, .xls)
  • Upload a CSV file
  • Use pre-loaded R data (if you passed the data argument)
Data upload interface

For Excel files with multiple sheets, you can select which sheet to use. The app will display a preview of your uploaded data so you can verify it was read correctly.

Text Input (Copy-Paste) (NEW)

For quick standardization of a few names, or when you have a list copied from another source, use the Text input method:

Text input interface
  1. Select “Text input (paste/type)” from the input method radio buttons
  2. Paste or type your taxonomic names in the text area
  3. Click “Load names” to process the input

Accepted separators:

  • One name per line (recommended)
  • Comma-separated: Lophira alata, Terminalia superba, Aucoumea klaineana
  • Semicolon-separated: Lophira alata; Terminalia superba; Aucoumea klaineana
  • Tab-separated (useful when pasting from Excel)

The app automatically:

  • Removes empty lines and extra whitespace
  • Removes duplicate names (preserving the original order)
  • Creates a single column named taxon_name for matching

This method is ideal for:

  • Quick checks of a few species names
  • Pasting lists from emails or documents
  • Testing the app without preparing a file
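The clean-up steps listed above can be reproduced in base R, for example to pre-check a list before pasting it. This is a minimal sketch, not the app's own code: parse_taxon_text() is a hypothetical helper, and it assumes only the separators named above (newline, comma, semicolon, tab).

```r
# Hypothetical helper: turn pasted text into a one-column data.frame,
# mimicking the clean-up steps described above (not the app's internal code).
parse_taxon_text <- function(text) {
  # Split on newlines (including Windows \r\n), commas, semicolons and tabs
  parts <- unlist(strsplit(text, "[\r\n,;\t]+"))
  parts <- trimws(parts)          # strip surrounding whitespace
  parts <- parts[nzchar(parts)]   # drop empty entries (blank lines, trailing separators)
  # unique() keeps the first occurrence of each name, so input order is preserved
  data.frame(taxon_name = unique(parts), stringsAsFactors = FALSE)
}

pasted <- "Lophira alata\nTerminalia superba; Aucoumea klaineana\n\n  Lophira alata  \tMilicia excelsa"
parse_taxon_text(pasted)
```

The duplicate "Lophira alata" on the last line is dropped, and the blank line and stray spaces disappear, leaving four names in their original order.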

Phase 3: Select Name Column(s)

Once data is loaded, you have two options for selecting taxonomic names:

Single Column Mode (Default)

Select one column containing the full taxonomic name:

Column selection - single mode

The dropdown menu shows all available columns from your dataset. Choose the one containing species names (typically formatted as “Genus species” or “Genus species Author”).

Multiple Column Mode (NEW)

If your data has separate columns for genus, species, and family, enable “Use multiple columns”:

Column selection - multiple columns

The app will automatically combine these columns into a single taxonomic name for matching, using a hierarchical approach:

  • If genus and species are available: “Genus species”
  • If only genus is available: “Genus”
  • If only family is available: “Family”

You can also optionally include an author column.
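The genus > species > family fallback can be sketched in base R as follows. combine_taxon_columns() is a hypothetical helper written for illustration. It treats NA and blank cells as missing, which is an assumption about how the app handles empty cells. The optional author column is left out of this sketch.

```r
# Hypothetical helper illustrating the hierarchical combination described above.
combine_taxon_columns <- function(genus, species, family) {
  # NA and blank strings both count as "missing" (assumption)
  present <- function(x) !is.na(x) & nzchar(trimws(x))
  ifelse(present(genus) & present(species),
         paste(trimws(genus), trimws(species)),          # "Genus species"
         ifelse(present(genus), trimws(genus),           # "Genus"
                ifelse(present(family), trimws(family),  # "Family"
                       NA_character_)))                  # nothing usable
}

inv <- data.frame(
  genus   = c("Lophira", "Terminalia", NA, ""),
  species = c("alata", NA, NA, NA),
  family  = c("Ochnaceae", "Combretaceae", "Fabaceae", NA),
  stringsAsFactors = FALSE
)
inv$taxon_name <- combine_taxon_columns(inv$genus, inv$species, inv$family)
inv$taxon_name
```

Rows with neither genus nor family end up as NA, which is a useful signal to check before launching a match.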

Phase 4: Automatic Matching

Click the “Start Matching” button to begin the automatic matching process. The app uses a five-tier matching strategy:

  1. Exact match on species: Direct lookup of full name (genus + species)
  2. Exact match on genus: Match at genus level
  3. Exact match on family: Match at family level
  4. Exact match on class: Match at higher taxonomic level
  5. Fuzzy matching: Approximate string matching for remaining names
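To make the tier order concrete, here is a simplified cascade against a made-up miniature backbone. It is an illustrative sketch, not the package's implementation. It assumes three things: each exact tier compares the whole trimmed name case-insensitively against names at that level, the fuzzy score is 1 minus the Levenshtein distance divided by the longer string's length (via base R's adist()), and the best fuzzy candidate is kept only if it reaches min_similarity. The label exact_class is hypothetical; it does not appear in the documented match_method values.

```r
# Made-up miniature backbone (the real backbone is queried from the database)
backbone <- data.frame(
  name  = c("Lophira alata", "Lophira", "Ochnaceae",
            "Terminalia superba", "Terminalia", "Combretaceae", "Magnoliopsida"),
  level = c("species", "genus", "family",
            "species", "genus", "family", "class"),
  stringsAsFactors = FALSE
)

# Assumed fuzzy metric: 1 - Levenshtein distance / length of the longer string
name_similarity <- function(a, b) {
  d <- adist(tolower(a), tolower(b))[1, ]
  1 - d / pmax(nchar(a), nchar(b))
}

match_one <- function(x, backbone, min_similarity = 0.7) {
  x <- trimws(x)
  make_row <- function(name, method, score) {
    data.frame(input = x, matched_name = name, match_method = method,
               match_score = score, stringsAsFactors = FALSE)
  }
  # Tiers 1-4: exact (case-insensitive) match of the whole name, one level at a time
  tiers <- c(exact_species = "species", exact_genus = "genus",
             exact_family = "family", exact_class = "class")
  for (method in names(tiers)) {
    hit <- backbone$name[backbone$level == tiers[[method]] &
                           tolower(backbone$name) == tolower(x)]
    if (length(hit) > 0) return(make_row(hit[1], method, 1))
  }
  # Tier 5: best fuzzy candidate at any level, kept only if it clears the threshold
  scores <- name_similarity(x, backbone$name)
  best <- which.max(scores)
  if (scores[best] >= min_similarity) {
    return(make_row(backbone$name[best], "fuzzy", round(scores[best], 3)))
  }
  make_row(NA_character_, NA_character_, NA_real_)  # left for manual review
}

inputs <- c("Lophira alata", "Terminalia", "Ochnaceae", "Lophira alatta", "Quercus robur")
results <- do.call(rbind, lapply(inputs, match_one, backbone = backbone))
results
```

The misspelt "Lophira alatta" only reaches the fuzzy tier, while "Quercus robur", absent from this toy backbone, stays unmatched and would go to the Review tab.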
Matching in progress

The progress bar shows real-time status. For large datasets, this may take a few minutes. The sidebar displays live statistics:

  • Number of exact matches
  • Number of genus-level matches
  • Number of fuzzy matches
  • Number of unmatched names

Phase 5: Review Match Results

After matching completes, the Auto Match tab shows a summary table with all names and their match status:

Matching results summary

The results table includes:

  • Original name: Your input name
  • matched_name: Name found in backbone
  • match_method: How it was matched (exact_species, exact_genus, exact_family, fuzzy, manual)
  • match_score: Similarity score (0-1, higher is better)
  • idtax_n: Taxon ID in database
  • is_synonym: Whether matched name is a synonym
  • accepted_name: Current accepted name (if synonym)

Match quality indicators:

  • Exact match (1.0): Perfect match, no review needed
  • High similarity (>0.8): Very likely correct, quick review recommended
  • Medium similarity (0.5-0.8): Possible match, review suggested
  • Low similarity (<0.5): Uncertain, manual review required
  • No match: Requires manual selection
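The quality bands above can be applied to an exported match_score column with a small lookup. This is a sketch: match_quality() is a hypothetical helper. Treating a score of exactly 0.8 or exactly 0.5 as "medium" is our reading of the stated ranges.

```r
# Hypothetical helper: map match_score values to the quality bands listed above.
match_quality <- function(score) {
  ifelse(is.na(score), "no match",
  ifelse(score >= 1,   "exact",     # 1.0: no review needed
  ifelse(score > 0.8,  "high",      # quick review recommended
  ifelse(score >= 0.5, "medium",    # review suggested
                       "low"))))    # manual review required
}

scores <- c(1, 0.93, 0.8, 0.62, 0.41, NA)
match_quality(scores)
```

Applied to exported results, e.g. table(match_quality(result$match_score)), this gives a quick count of how many names fall in each band.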

Phase 6: Manual Review

For unmatched or uncertain names, switch to the “Review” tab to manually review and select matches:

Manual review interface

The review interface provides two ways to find matches:

Fuzzy Suggestions Panel

Shows automatic suggestions, ranked by similarity, with filtering options:

Fuzzy suggestions with filters

Filtering options:

  • Number of suggestions: Slider to show 5-30 suggestions
  • Minimum similarity: Adjust threshold (0.3-1.0)
  • Taxonomic level filter: Filter by All, Species, Genus, Family, Order, Class, or Infraspecific
  • Sort by: Similarity score or alphabetical order
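The four controls above combine into a simple filter-then-sort pipeline. The sketch below is only an illustration of that logic on a made-up candidate table. suggest() is a hypothetical helper, and the similarity measure (1 minus the normalized Levenshtein distance from adist()) is an assumption, not necessarily the app's metric.

```r
# Hypothetical helper mirroring the Review-tab controls:
# number of suggestions, minimum similarity, level filter, sort order.
suggest <- function(x, candidates, n = 10, min_similarity = 0.3,
                    level = "All", sort_by = c("similarity", "alphabetical")) {
  sort_by <- match.arg(sort_by)
  # Assumed score: 1 - Levenshtein distance / length of the longer string
  d <- adist(tolower(x), tolower(candidates$name))[1, ]
  candidates$score <- round(1 - d / pmax(nchar(x), nchar(candidates$name)), 3)
  out <- candidates[candidates$score >= min_similarity, , drop = FALSE]
  if (level != "All") out <- out[out$level == level, , drop = FALSE]
  ord <- if (sort_by == "similarity") order(-out$score, out$name) else order(out$name)
  head(out[ord, , drop = FALSE], n)
}

cands <- data.frame(
  name  = c("Terminalia superba", "Terminalia ivorensis", "Terminalia",
            "Combretaceae", "Lophira alata"),
  level = c("Species", "Species", "Genus", "Family", "Species"),
  stringsAsFactors = FALSE
)
suggest("Terminalia superb", cands, n = 3)
suggest("Terminalia superb", cands, level = "Genus")
```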

Each suggestion card displays:

  • Name with color-coded similarity badge (green = high, blue = medium, yellow = low)
  • Taxonomic level and family
  • Synonym information if applicable
  • Select button for one-click acceptance

Manual Search Panel

For names without good suggestions, use the manual search:

Manual search interface
  • Type any search term to query the taxonomic backbone
  • Filter results by taxonomic level
  • View detailed information for each match
  • Select the correct match or mark as “unresolved”

Navigation:

  • Use Previous/Skip/Next buttons to browse unmatched names
  • Progress counter shows reviewed vs. remaining names
  • The app remembers your selections and automatically updates the results

Phase 7: Enrich Data with Traits

Switch to the “Traits Enrichment” tab to add species-level traits to your matched data:

Trait enrichment interface

Options:

  • Categorical aggregation mode:
    • “mode” - Use most frequent value per taxon
    • “concat” - Concatenate all unique values
  • Select columns to include:
    • Original input names
    • Corrected names
    • Taxonomic IDs
    • Match metadata
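The difference between the two categorical aggregation modes can be seen on a few made-up trait records. This is a sketch with assumptions stated in the comments: aggregate_trait() is hypothetical, ties in "mode" go to the alphabetically first value, and the ", " separator used by "concat" is a guess, not the app's documented separator.

```r
# Hypothetical helper contrasting the "mode" and "concat" aggregation modes.
aggregate_trait <- function(values, mode = c("mode", "concat")) {
  mode <- match.arg(mode)
  values <- values[!is.na(values)]
  if (length(values) == 0) return(NA_character_)
  if (mode == "mode") {
    counts <- table(values)                        # table() sorts values alphabetically
    names(counts)[which.max(counts)]               # ties go to the first value (assumption)
  } else {
    paste(sort(unique(values)), collapse = ", ")   # separator is an assumption
  }
}

# Made-up records: several trait values per taxon
records <- data.frame(
  taxon       = c("Taxon A", "Taxon A", "Taxon A", "Taxon B"),
  growth_form = c("tree", "tree", "shrub", "liana"),
  stringsAsFactors = FALSE
)
by_taxon <- split(records$growth_form, records$taxon)
sapply(by_taxon, aggregate_trait, mode = "mode")    # Taxon A: "tree"
sapply(by_taxon, aggregate_trait, mode = "concat")  # Taxon A: "shrub, tree"
```

"mode" yields one clean value per taxon, convenient for analysis. "concat" keeps the variability visible, which is better when conflicting records matter.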

Available traits include:

  • Growth form
  • Wood density
  • Leaf traits
  • Ecological characteristics

The enriched data combines your matched taxa with selected traits:

Enriched data results

Note: The enriched export creates one row per unique taxon, not per input row. Input names are concatenated with pipe separators.
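The one-row-per-taxon layout can be reproduced from a per-row result, for instance to check how many input rows collapsed into each taxon. This is a base R sketch with made-up IDs. collapse_by_taxon() and its n_input_rows column are hypothetical, and the exact spacing around the pipe separator is assumed.

```r
# Made-up per-row matching output (three spellings resolve to the same taxon)
matched <- data.frame(
  input_name     = c("Lophira alata", "Lophira alatta", "lophira alata", "Terminalia superba"),
  idtax_n        = c(101, 101, 101, 202),   # made-up IDs
  corrected_name = c("Lophira alata", "Lophira alata", "Lophira alata", "Terminalia superba"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one row per taxon, input names joined with pipes
collapse_by_taxon <- function(df) {
  groups <- split(df, df$idtax_n)                      # one group per matched taxon
  rows <- lapply(groups, function(g) data.frame(
    idtax_n        = g$idtax_n[1],
    corrected_name = g$corrected_name[1],
    input_names    = paste(unique(g$input_name), collapse = " | "),  # spacing assumed
    n_input_rows   = nrow(g),                          # hypothetical bookkeeping column
    stringsAsFactors = FALSE
  ))
  do.call(rbind, rows)
}
collapse_by_taxon(matched)
```

Because the enriched export no longer has one row per input row, join it back to your original data by taxon ID rather than by position.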

Phase 8: Export Results

Switch to the “Export” tab to download your standardized dataset:

Export options

Available formats:

  • Excel (.xlsx): Best for sharing with collaborators
  • CSV (.csv): Universal tabular format
  • RDS (.rds): R-native format preserving data types

Selectable columns:

  • Original data (all your input columns)
  • Matched IDs (idtax_n, idtax_good_n)
  • Corrected names (corrected_name, matched_name)
  • Match metadata (match_method, match_score, is_synonym, accepted_name)

A paginated preview table shows the data before export.

Understanding Output Columns

The app adds these columns to your data:

  • idtax_n: Matched taxon ID in the backbone database
  • idtax_good_n: Accepted taxon ID (for synonyms)
  • matched_name: Name found in the backbone
  • corrected_name: Final standardized name
  • match_method: Matching strategy used (exact_species, exact_genus, exact_family, fuzzy, manual, unresolved)
  • match_score: Similarity score (0-1)
  • is_synonym: TRUE if the matched name is a synonym
  • accepted_name: Current accepted name (if synonym)
  • family: Taxonomic family
  • genus: Taxonomic genus
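After exporting, it can be worth auditing which names were redirected through synonymy. The sketch below uses made-up rows and placeholder names. It assumes that corrected_name is the accepted name for synonyms and the matched name otherwise, and that idtax_good_n differs from idtax_n only for synonyms. This is one reading of the column descriptions above, not a documented rule.

```r
# Made-up exported rows (placeholder names and IDs)
out <- data.frame(
  idtax_n       = c(11, 12),
  idtax_good_n  = c(11, 13),
  matched_name  = c("Alpha beta", "Alpha gamma"),
  is_synonym    = c(FALSE, TRUE),
  accepted_name = c(NA, "Alpha delta"),
  stringsAsFactors = FALSE
)

# Assumed rule: a synonym resolves to its accepted name; otherwise keep the matched name
out$corrected_name <- ifelse(out$is_synonym, out$accepted_name, out$matched_name)

# Rows whose ID was redirected to an accepted taxon
redirected <- out[out$idtax_n != out$idtax_good_n, c("matched_name", "corrected_name")]
redirected
```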

Advanced Options

Language Selection

The app supports both French and English interfaces. French is the default language.

In the App Interface:

A language toggle is located in the top-right corner of the app:

  • Click “FR” for the French interface
  • Click “EN” for the English interface

The language switch is instant and affects all UI elements, including:

  • Tab labels
  • Button text
  • Instructions and help text
  • Column headers
  • Error messages and notifications

Setting Initial Language Programmatically:

# Launch app in English
launch_taxonomic_match_app(language = "en")

# Launch app in French (default)
launch_taxonomic_match_app(language = "fr")
# or simply:
launch_taxonomic_match_app()

The language setting is interactive: users can switch languages at any time during a session without losing progress or data.

Adjusting Fuzzy Matching

Control matching sensitivity with the min_similarity parameter:

# Very strict - only high-quality matches
launch_taxonomic_match_app(min_similarity = 0.8)

# Default setting
launch_taxonomic_match_app(min_similarity = 0.7)

# More permissive - allows lower-quality matches
launch_taxonomic_match_app(min_similarity = 0.5)

Lower values cast a wider net but may include false positives. Higher values are more conservative but may miss valid matches.

Increasing Suggestions

Show more fuzzy match suggestions per name:

# Show top 20 suggestions instead of default 10
launch_taxonomic_match_app(max_suggestions = 20)

Useful when initial suggestions don’t include the correct match. You can also adjust this interactively in the Review tab using the slider.

Function Parameters

launch_taxonomic_match_app(
  data = NULL,            # Optional: pre-load a data.frame
  name_column = NULL,     # Optional: pre-select the name column
  min_similarity = 0.7,   # Fuzzy match threshold (0-1)
  max_suggestions = 10,   # Max suggestions per unmatched name
  language = "fr"         # Interface language: "fr" (default) or "en"
)

Troubleshooting

Connection Issues

Problem: “Failed to connect to database”

Solution:

# Check connection
db_diagnostic()

# Reset credentials if needed
remove_db_credentials()
setup_db_credentials()

No Fuzzy Matches Found

Problem: No suggestions appear for unmatched names

Possible causes:

  • The min_similarity threshold is too high
  • Taxonomic names contain typos or non-standard formatting
  • Names are not present in the taxonomic backbone (e.g., non-African taxa)

Solutions:

  • Lower min_similarity: launch_taxonomic_match_app(min_similarity = 0.5)
  • Use the taxonomic level filter to search at genus or family level
  • Clean input names (remove extra spaces, fix obvious typos)
  • Verify that the names belong to taxa covered by the Central African backbone

Slow Matching Performance

Problem: Matching takes very long for large datasets

Solutions:

  • Use match_taxonomic_names() for a programmatic batch workflow instead
  • Process data in chunks (split large datasets)
  • Expect a slow initial load: the app downloads the entire backbone once for efficiency
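Chunked processing can be sketched in base R by splitting the name vector into fixed-size pieces and binding the per-chunk results. match_in_chunks() is a hypothetical helper. The matcher is passed in as a function, so the demo below uses a stand-in that runs without a database connection.

```r
# Hypothetical helper: apply a matcher to fixed-size chunks of names and bind the results.
match_in_chunks <- function(names, match_fun, chunk_size = 1000) {
  chunk_id <- ceiling(seq_along(names) / chunk_size)   # 1,1,...,2,2,...
  pieces <- split(names, chunk_id)                     # keeps the original order
  results <- vector("list", length(pieces))
  for (i in seq_along(pieces)) {
    message(sprintf("Chunk %d of %d (%d names)", i, length(pieces), length(pieces[[i]])))
    results[[i]] <- match_fun(pieces[[i]])
  }
  do.call(rbind, results)
}

# Stand-in matcher so the sketch runs without a database connection
fake_matcher <- function(x) data.frame(input = x, n_char = nchar(x), stringsAsFactors = FALSE)
demo <- match_in_chunks(rep(c("Lophira alata", "Terminalia superba"), 1500),
                        fake_matcher, chunk_size = 1000)
nrow(demo)  # 3000
```

With the real package, the stand-in would be replaced by something like function(x) match_taxonomic_names(names = x, min_similarity = 0.7). This assumes that function returns one data.frame row per input name, as the cbind() example below implies.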

When to Use the App vs. Programmatic Approach

Use the Shiny App when:

  • You are exploring data interactively
  • You prefer visual interfaces
  • Your dataset is small to medium size (<5,000 rows)
  • You need to manually review uncertain matches
  • You are learning the matching process

Use match_taxonomic_names() when:

  • Processing large datasets (>5,000 rows)
  • Automating workflows in scripts
  • Integrating with data pipelines
  • Reproducibility is critical (never remove the column that contains the original names)
  • Batch processing multiple files

Example programmatic approach:

# Load data
my_data <- read.csv("tree_inventory.csv")

# Match names
matched <- match_taxonomic_names(
  names = my_data$species_name,
  min_similarity = 0.7
)

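# Merge back with original data
# (assumes `matched` has one row per input name, in the same order)
stopifnot(nrow(matched) == nrow(my_data))
result <- cbind(my_data, matched)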

# Export
write.csv(result, "standardized_inventory.csv", row.names = FALSE)

See Also

Tips for Best Results

  1. Clean your data first: Remove obvious typos, extra whitespace, and special characters
  2. Understand your data: Know which taxonomic groups are in your dataset
  3. Use multi-column mode: If you have separate genus/species/family columns, combine them for better matching
  4. Filter by taxonomic level: Use the level filter in Review tab to find genus or family matches
  5. Review match scores: Don’t blindly accept low-similarity matches (<0.6)
  6. Save incrementally: Export intermediate results to avoid losing manual review work
  7. Document parameters: Note which min_similarity value you used for reproducibility

Example Workflow

Here’s a complete workflow from start to finish:

# 1. Load your data
trees <- read.csv("forest_inventory.csv")
# Columns: plot_id, tree_number, species_name, dbh, height

# 2. Launch app with data
launch_taxonomic_match_app(
  data = trees,
  name_column = "species_name",
  min_similarity = 0.7
)

# 3. In the app:
#    - Review automatic matches in Auto Match tab
#    - Use Review tab to resolve unmatched names
#    - Apply taxonomic level filters if needed
#    - Optionally enrich with traits in Traits Enrichment tab
#    - Export as "forest_inventory_standardized.xlsx"

# 4. Continue analysis with standardized data
standardized <- readxl::read_excel("forest_inventory_standardized.xlsx")

# Now you have clean taxonomic IDs for further analysis!

This workflow ensures your taxonomic data is standardized and ready for downstream analyses like diversity metrics, trait-based analyses, or database integration.

Suggested Screenshots

To complete this documentation, the following screenshots should be captured:

  1. app-initial-view.png - Full app interface after launch with all tabs visible
  2. app-upload-data.png - Data upload panel with file browser and sheet selection
  3. app-text-input.gif - Text input interface with text area and “Load names” button (NEW)
  4. app-column-select.png - Single column selection dropdown
  5. app-column-select-multi.png - Multiple column mode with genus/species/family selectors
  6. app-matching-progress.png - Matching in progress with progress bar and live statistics
  7. app-matching-results.png - Results table in Auto Match tab showing matched names
  8. app-review-interface.png - Review tab overview with unmatched name display
  9. app-review-suggestions.png - Fuzzy suggestions panel with filtering options (level filter, sort, slider)
  10. app-review-manual-search.png - Manual search interface with search box and results
  11. app-enrich-data-interface.png - Traits enrichment tab with aggregation mode and column selection
  12. app-enrich-data-results.png - Preview of enriched data with traits
  13. app-export-options.png - Export tab with format selection and column checkboxes