Updating Existing Data in the Database

Overview

This vignette explains how to update existing records in the database using the update_records() function.

The main challenge: When you query data, you see a flat table that merges information from multiple database tables. The column you want to update may come from the main table, a lookup table, a features table, or be an aggregation of multiple records. Understanding this structure is essential to correctly updating data.

Prerequisites

library(CafriplotsR)
library(dplyr)

# Connect to database (requires write permissions)
con <- call.mydb()

Understanding the Data Structure

The Problem: Flat Tables vs. Relational Database

When you query plots or individuals, you see columns like:

# Query a plot
plot_data <- query_plots(plot_name = "my_plot", extract_individuals = TRUE)

# You see columns like:
# - plot_name, ddlat, ddlon (direct from data_liste_plots)
# - country (from table_countries via id_country)
# - method (from methodslist via id_method)
# - team_leader (aggregated from data_liste_sub_plots)
# - dbh, height (aggregated from data_traits_measures)

But these columns come from different sources:

Column Type	Example	Source	How to Update
Direct columns	`plot_name`, `ddlat`, `ddlon`	Main table	`table_type = "plots"`
Lookup columns	`country`, `method`	Lookup tables	Auto-resolved by `update_records()`
Subplot features	`team_leader`, `census_date`	`data_liste_sub_plots`	`table_type = "subplot_features"`
Individual features	`dbh`, `height`	`data_traits_measures`	`table_type = "individual_features"`
Aggregated values	Multiple DBH per tree	Aggregation	Cannot update directly

The `update_records()` Function

Basic Usage

update_records(
  data = your_data,           # Data frame with updates
  table_type = "individuals", # Which table type to update
  execute = FALSE,            # FALSE = dry run, TRUE = apply changes
  con = con
)

Available Table Types

`table_type`	Database Table	ID Column	Description
`"individuals"`	`data_individuals`	`id_n`	Individual tree records
`"plots"`	`data_liste_plots`	`id_liste_plots`	Plot metadata
`"individual_features"`	`data_traits_measures`	`id_trait_measures`	Tree measurements (DBH, height, etc.)
`"subplot_features"`	`data_liste_sub_plots`	`id_sub_plots`	Plot features (census dates, team, etc.)
`"individual_features_metadata"`	`data_ind_measures_feat`	`id_ind_meas_feat`	Metadata on measurements (POM height, etc.)
`"methodslist"`	`methodslist`	`id_method`	Sampling methods
`"table_colnam"`	`table_colnam`	`id_table_colnam`	Collector names
`"traitlist"`	`traitlist`	`id_trait`	Trait definitions
`"subplotype_list"`	`subplotype_list`	`id_subplotype`	Subplot feature types

Workflow: Deciding Which Table Type to Use

Step 1: Identify What Column You Want to Update

Ask yourself: Where does this column come from?

Use get_column_routing() to see the structure:

# See configuration for individuals
config <- get_column_routing("individuals", con)
print(config$direct_columns)   # Columns directly in data_individuals
print(config$feature_columns)  # Columns that are features (traits)

Step 2: Choose the Correct Table Type

Case A: Direct Columns (plot_name, coordinates, tag, etc.)

# Updating individual coordinates
update_data <- data.frame(
  id_n = c(1001, 1002),  # Individual IDs
  x = c(11.0, 26.0),     # New X coordinates
  y = c(6.0, 13.0)       # New Y coordinates
)

update_records(
  data = update_data,
  table_type = "individuals",
  execute = FALSE,
  con = con
)

Case B: Feature Columns (DBH, height, etc.)

Feature columns require you to work at the measurement level, not the individual level:

# WRONG: Trying to update DBH via individuals table
# This won't work because DBH is stored in data_traits_measures

# CORRECT: Get the measurement IDs first
measures <- query_individual_features(
  individual_ids = c(1001, 1002),
  trait_ids = 1,  # ID for DBH trait
  format = "long"
)

# Now update the measurements
update_data <- data.frame(
  id_trait_measures = measures$traits_num[[1]]$id_trait_measures,
  traitvalue = c(45.2, 32.1)  # New DBH values
)

update_records(
  data = update_data,
  table_type = "individual_features",
  execute = FALSE,
  con = con
)

Case C: Subplot Features (team_leader, census dates, etc.)

# First, get the subplot feature IDs
subplots <- query_subplot_features(plot_ids = 1, format = "long")

# Update specific subplot features
update_data <- data.frame(
  id_sub_plots = subplots$id_sub_plots[1:2],
  year = c(2024, 2024)  # Update census years
)

update_records(
  data = update_data,
  table_type = "subplot_features",
  execute = FALSE,
  con = con
)

Handling Special Cases

Case 1: Aggregated Values Cannot Be Updated Directly

If a column represents an aggregation (e.g., average DBH across multiple censuses), you cannot update it directly:

# When you query with aggregation:
individuals <- query_individual_features(
  plot_ids = 1,
  trait_ids = c(1, 2),
  format = "wide"  # Aggregates multiple measurements
)

# If individual has 3 DBH measurements across censuses,
# the displayed DBH is an average - you can't update "the average"

# SOLUTION: Work with the raw measurements
individuals_long <- query_individual_features(
  individual_ids = 1001,
  trait_ids = 1,
  format = "long"  # Shows all individual measurements
)

# Now update the specific measurement you want to change
update_records(
  data = data.frame(
    id_trait_measures = individuals_long$traits_num[[1]]$id_trait_measures[1],
    traitvalue = 45.2
  ),
  table_type = "individual_features",
  execute = FALSE,
  con = con
)

Case 2: Census-Specific Features

If your data has columns like dbh_census_1, dbh_census_2, these are census-specific and require special handling:

# Query with census detail
data_census <- query_individual_features(
  plot_ids = 1,
  include_multi_census = TRUE  # Keeps census-specific columns
)

# To update a specific census measurement:
# 1. Filter to the census you want
# 2. Get the measurement ID
# 3. Update using table_type = "individual_features"

Case 3: Lookup Table Values (Country, Method)

The update_records() function automatically resolves lookup values:

# You can provide the friendly name (country = "Gabon")
# instead of the ID (id_country = 5)

update_data <- data.frame(
  id_liste_plots = 1,
  country = "Cameroon"  # Will be resolved to id_country automatically
)

update_records(
  data = update_data,
  table_type = "plots",
  execute = FALSE,
  con = con
)

Practical Examples

Example 1: Update Plot Coordinates

# Step 1: Query the plot to get its ID
plot <- query_plots(plot_name = "my_plot")
plot_id <- plot$id_liste_plots

# Step 2: Prepare update data
update_data <- data.frame(
  id_liste_plots = plot_id,
  ddlat = -0.52,
  ddlon = 11.48
)

# Step 3: Dry run
update_records(
  data = update_data,
  table_type = "plots",
  execute = FALSE,
  con = con
)

# Step 4: Execute if satisfied
update_records(
  data = update_data,
  table_type = "plots",
  execute = TRUE,
  con = con
)

Example 2: Update Individual Taxonomy

# Update the taxon ID for an individual
update_data <- data.frame(
  id_n = 1001,
  idtax_n = 67890  # New taxon ID
)

update_records(
  data = update_data,
  table_type = "individuals",
  execute = FALSE,
  con = con
)

Example 3: Update DBH Measurements

# Step 1: Get the measurement IDs for the DBH values you want to update
measures <- query_individual_features(
  individual_ids = c(1001, 1002, 1003),
  trait_ids = 1,  # DBH
  format = "long"
)

# Step 2: Review existing data
print(measures$traits_num[[1]] %>% select(id_trait_measures, id_data_individuals, traitvalue))

# Step 3: Prepare updates (only change what you need)
update_data <- data.frame(
  id_trait_measures = c(5001, 5002, 5003),  # Measurement IDs
  traitvalue = c(45.2, 32.1, 28.5)          # New DBH values
)

# Step 4: Update
update_records(
  data = update_data,
  table_type = "individual_features",
  execute = FALSE,
  con = con
)

Example 4: Update Measurement Metadata (POM Height)

# Get measurement feature IDs
feat <- query_traits_measures_features(id_trait_measures = c(5001, 5002))

# Update the POM height metadata
update_data <- data.frame(
  id_ind_meas_feat = feat$id_ind_meas_feat,
  typevalue = c(1.3, 1.5)  # New POM heights
)

update_records(
  data = update_data,
  table_type = "individual_features_metadata",
  execute = FALSE,
  con = con
)

Dry Run vs Execute

Always use execute = FALSE first to preview changes:

# Dry run - shows what would change without modifying the database
result <- update_records(
  data = update_data,
  table_type = "individuals",
  execute = FALSE,  # IMPORTANT: Preview first!
  con = con
)

# Review the detected changes
print(result$changes)

# If satisfied, execute the update
update_records(
  data = update_data,
  table_type = "individuals",
  execute = TRUE,
  con = con
)

Decision Flowchart

I want to update a column...
│
├─► Is it plot_name, ddlat, ddlon, elevation, locality_name?
│   └─► YES → table_type = "plots"
│
├─► Is it tag, x, y, idtax_n, stem_code (individual core data)?
│   └─► YES → table_type = "individuals"
│
├─► Is it a measurement (dbh, height, etc.)?
│   │
│   ├─► Do you have the measurement ID (id_trait_measures)?
│   │   └─► YES → table_type = "individual_features"
│   │
│   └─► NO → First query with format = "long" to get measurement IDs
│
├─► Is it team_leader, census_date, principal_investigator (plot feature)?
│   │
│   ├─► Do you have the subplot ID (id_sub_plots)?
│   │   └─► YES → table_type = "subplot_features"
│   │
│   └─► NO → First query with query_subplot_features() to get IDs
│
├─► Is it metadata on a measurement (pom_height, method, etc.)?
│   └─► YES → table_type = "individual_features_metadata"
│
└─► Is it a lookup table entry (method definition, trait definition)?
    └─► YES → table_type = "methodslist", "traitlist", "table_colnam", etc.

Summary of Key Points

The flat table you see is a merge of multiple database tables
Use execute = FALSE first to preview all changes
Work at the correct level:
- Individual core data → table_type = "individuals"
- Plot core data → table_type = "plots"
- Measurements (DBH, height) → table_type = "individual_features"
- Plot features (census, team) → table_type = "subplot_features"
Aggregated values cannot be updated directly - get the raw measurement IDs first
Lookup values are auto-resolved - you can use friendly names like “Gabon” instead of IDs
Changes are backed up automatically in followup_updates_* tables when available