TidyTuesday 2026 - Week 1

Quick exploration and visualization of reservoirs in the New York City water system
TidyTuesday
Data visualization
R
ggplot2
Author

Seth Kasowitz

Published

January 7, 2026

The first TidyTuesday of 2026 starts off with a “Bring Your Own Data” week. I grabbed reservoir level data for the City of New York.

1 Load data

library(tidyverse)

nyc_reservoirs <- read_csv(
  'Current_Reservoir_Levels.csv'
)
glimpse(nyc_reservoirs)
Rows: 2,866
Columns: 25
$ Point_time       <chr> "02/01/2019", "02/02/2019", "02/03/2019", "02/04/2019…
$ AUGEVolume       <dbl> 75.04, 74.48, 73.88, 73.31, 72.76, 72.26, 71.79, 71.3…
$ AUGEASTLEVANALOG <dbl> 585.17, 584.98, 584.62, 584.26, 583.90, 583.55, 583.2…
$ AUGWVOLUME       <dbl> 39.67, 39.50, 39.34, 39.17, 39.19, 39.52, 40.15, 41.3…
$ AUGWESTLEVANALOG <dbl> 584.65, 584.38, 584.21, 584.04, 583.85, 583.88, 584.2…
$ ASHREL           <dbl> 619, 619, 598, 586, 584, 583, 443, 412, 412, 412, 409…
$ SICRESVOLUME     <dbl> 16.81, 16.82, 16.84, 16.93, 17.16, 17.25, 17.41, 17.4…
$ SICRESELEVANALOG <dbl> 1125.75, 1125.75, 1125.75, 1125.81, 1126.03, 1126.70,…
$ STPALBFLW        <dbl> 48.3, 64.6, 68.1, 62.5, 57.5, 49.4, 20.8, 5.3, 1.5, 1…
$ RECRESVOLUME     <dbl> 47.33, 47.19, 47.09, 46.99, 46.96, 47.09, 47.37, 47.7…
$ RECRESELEVANALOG <dbl> 836.77, 836.57, 836.42, 836.27, 836.20, 836.42, 836.8…
$ RECREL           <dbl> 10.21, 10.21, 10.19, 10.20, 10.19, 10.19, 10.17, 10.1…
$ NICRESVOLUME     <dbl> 35.21, 35.21, 35.22, 35.23, 35.27, 35.28, 35.42, 35.4…
$ NICRESELEVANALOG <dbl> 1440.08, 1440.09, 1440.10, 1440.12, 1440.19, 1440.21,…
$ NICNTHFLW        <dbl> 58.0, 58.1, 58.1, 58.1, 58.0, 58.1, 58.0, 58.0, 57.9,…
$ NICSTHFLW        <dbl> 64.8, 64.8, 64.8, 64.9, 64.9, 64.9, 64.9, 65.0, 65.0,…
$ NICCONFLW        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ EDIRESVOLUME     <dbl> 142.01, 141.62, 141.25, 140.94, 140.92, 141.22, 142.2…
$ EDIRESELEVANALOG <dbl> 1279.56, 1279.35, 1279.15, 1278.98, 1278.97, 1279.13,…
$ EDRNTHFLW        <dbl> 225.8, 225.9, 225.7, 225.7, 225.8, 225.7, 225.7, 225.…
$ EDRSTHFLW        <dbl> 225.6, 226.1, 226.0, 225.6, 225.5, 225.8, 226.3, 226.…
$ EDRCONFLW        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
$ WDIRESVOLUME     <dbl> 88.91, 88.49, 88.16, 87.91, 88.24, 88.64, 89.94, 92.3…
$ WDIRESELEVANALOG <dbl> 1146.33, 1146.05, 1145.83, 1145.66, 1145.88, 1146.15,…
$ WDRFLW           <dbl> 959.2, 967.4, 965.7, 966.5, 960.3, 964.2, 965.6, 966.…

The data set contains 2,866 daily records, which amounts to nearly eight years of measurements.
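One way to check the daily coverage is to parse the dates and look for gaps or repeats. The sketch below uses a small made-up vector in place of `nyc_reservoirs$Point_time`; the names `point_time`, `missing_days`, and `duplicated_days` are illustrative and not part of the original analysis.

```r
library(lubridate)

# Toy stand-in for nyc_reservoirs$Point_time (made-up values with one gap)
point_time <- c("02/01/2019", "02/02/2019", "02/04/2019", "02/05/2019")
dates <- mdy(point_time)

# First and last date, and the span between them in years
date_range <- range(dates)
span_years <- as.numeric(diff(date_range), units = "days") / 365.25

# Calendar days inside the range that have no record
all_days <- seq(date_range[1], date_range[2], by = "day")
missing_days <- all_days[!all_days %in% dates]

# Dates that appear more than once
duplicated_days <- dates[duplicated(dates)]

print(date_range)
print(missing_days)
```

On the real data, an empty `missing_days` and an empty `duplicated_days` would suggest exactly one record per day.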

There are three types of values:

  • Storage Volume in billions of gallons (BG)

  • Elevation in feet

  • Release in millions of gallons per day (MGD)
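As a rough sketch, the three types can be read off the column-name suffixes shown in the `glimpse()` output. The suffix-to-type mapping here is an assumption inferred from the column names and sample values, not something documented with the data set.

```r
# Measurement columns copied from the glimpse() output (Point_time omitted)
col_names <- c(
  "AUGEVolume", "AUGEASTLEVANALOG", "AUGWVOLUME", "AUGWESTLEVANALOG", "ASHREL",
  "SICRESVOLUME", "SICRESELEVANALOG", "STPALBFLW",
  "RECRESVOLUME", "RECRESELEVANALOG", "RECREL",
  "NICRESVOLUME", "NICRESELEVANALOG", "NICNTHFLW", "NICSTHFLW", "NICCONFLW",
  "EDIRESVOLUME", "EDIRESELEVANALOG", "EDRNTHFLW", "EDRSTHFLW", "EDRCONFLW",
  "WDIRESVOLUME", "WDIRESELEVANALOG", "WDRFLW"
)

# Assumed mapping: *VOLUME -> storage, *LEVANALOG -> elevation,
# anything else (*REL, *FLW) -> release
value_type <- ifelse(
  grepl("VOLUME$", toupper(col_names)), "Storage",
  ifelse(grepl("LEVANALOG$", col_names), "Elevation", "Release")
)

print(table(value_type))
```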

2 Wrangle for plotting

df <- nyc_reservoirs |>
  mutate(date = mdy(Point_time)) |>
  rename(
    "Ashokan East Storage" = AUGEVolume,
    "Ashokan East Elevation" = AUGEASTLEVANALOG,
    "Ashokan West Storage" = AUGWVOLUME,
    "Ashokan West Elevation" = AUGWESTLEVANALOG,
    "Ashokan Release" = ASHREL,
    "Schoharie Storage" = SICRESVOLUME,
    "Schoharie Elevation" = SICRESELEVANALOG,
    "Schoharie Release" = STPALBFLW,
    "Rondout Storage" = RECRESVOLUME,
    "Rondout Elevation" = RECRESELEVANALOG,
    "Rondout Release" = RECREL,
    "Neversink Storage" = NICRESVOLUME,
    "Neversink Elevation" = NICRESELEVANALOG,
    "Neversink North Flow Release" = NICNTHFLW,
    "Neversink South Flow Release" = NICSTHFLW,
    "Neversink Conservation Flow Release" = NICCONFLW,
    "Pepacton Storage" = EDIRESVOLUME,
    "Pepacton Elevation" = EDIRESELEVANALOG,
    "Pepacton North Flow Release" = EDRNTHFLW,
    "Pepacton South Flow Release" = EDRSTHFLW,
    "Pepacton Conservation Flow Release" = EDRCONFLW,
    "Cannonsville Storage" = WDIRESVOLUME,
    "Cannonsville Elevation" = WDIRESELEVANALOG,
    "Cannonsville Release" = WDRFLW
  ) |>
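  # Reshape to long format: one row per date and measurement column
  pivot_longer(
    -c(Point_time, date),
    names_to = 'reservoir',
    values_to = 'values'
  ) |>
  # Split each label at its last space: the final word is the measurement
  # type (Storage, Elevation, or Release); everything before it is the name
  extract(
    reservoir,
    into = c('reservoir', 'type'),
    regex = '^(.+)\\s+(.+?)$'
  )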
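To see what the regular expression in `extract()` does, here is a minimal sketch on a few of the renamed labels. The greedy first group takes everything up to the last space, so multi-word names stay together. The `toy` and `split_labels` objects are illustrative only.

```r
library(tibble)
library(tidyr)

# Toy stand-in for the 'reservoir' column produced by pivot_longer()
toy <- tibble(
  reservoir = c(
    "Ashokan East Storage",
    "Neversink North Flow Release",
    "Rondout Elevation"
  )
)

# Same regex as in the wrangling step: group 1 is the name,
# group 2 is the final word (the measurement type)
split_labels <- toy |>
  tidyr::extract(
    reservoir,
    into = c("reservoir", "type"),
    regex = "^(.+)\\s+(.+?)$"
  )

print(split_labels)
```

Note that a label such as "Neversink North Flow Release" ends up with "Neversink North Flow" in the `reservoir` column, so release rows for Neversink and Pepacton carry the flow name as well as the reservoir name.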

2.1 Create a data frame with storage data for a seasonal plot

storage_data <- df |>
  filter(type == "Storage") |>
  mutate(year = year(date),
         month = month(date, label = TRUE))

3 Plotting

3.1 Data quality check

ggplot(storage_data, aes(x = reorder(reservoir, values, median), y = values, fill = reservoir)) +
  geom_boxplot(outlier.color = "red", outlier.size = 1.5) +
  coord_flip() +
  labs(title = "Distribution of Water Storage by Reservoir",
       subtitle = "Red points indicate statistical outliers",
       x = "Reservoir",
       y = "Storage (Billions of Gallons)") +
  theme_minimal() +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold", size = 14)) +
  scale_fill_brewer(palette = "Set2")

Some of the 2024 data looks suspicious. Further investigation would be needed to determine whether something was off with the measurements or whether this reflects genuinely dramatic changes in storage.
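A possible first step in that investigation is to list the flagged points with their dates. The sketch below applies the conventional 1.5 × IQR rule (the default that `geom_boxplot()` uses to mark outliers) per reservoir. It runs on made-up values; `toy` and `flagged` are hypothetical names, and on the real data `storage_data` would take the place of `toy`.

```r
library(dplyr)

# Made-up storage values: reservoir "A" has one obvious spike
toy <- tibble(
  reservoir = rep(c("A", "B"), each = 6),
  date = rep(as.Date("2024-01-01") + 0:5, times = 2),
  values = c(10, 11, 12, 11, 10, 40, 5, 5.5, 6, 5.2, 5.8, 5.4)
)

flagged <- toy |>
  group_by(reservoir) |>
  mutate(
    q1 = quantile(values, 0.25, na.rm = TRUE),
    q3 = quantile(values, 0.75, na.rm = TRUE),
    iqr = q3 - q1,
    # Outside the 1.5 * IQR fences on either side
    is_outlier = values < q1 - 1.5 * iqr | values > q3 + 1.5 * iqr
  ) |>
  ungroup() |>
  filter(is_outlier) |>
  select(reservoir, date, values)

print(flagged)
```

Sorting the flagged rows by date would show whether the outliers cluster in a single period, which is what one would expect from a sensor or reporting problem.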

ggplot(storage_data |> filter(reservoir != "Rondout" | values < 60), 
       aes(x = month, y = values, group = factor(year), color = factor(year))) +
  geom_line(alpha = 0.5, linewidth = 1.5) +
  facet_wrap(~reservoir, scales = "free_y", ncol = 2) +
  labs(title = "Seasonal Storage Patterns by Year",
       subtitle = "Each line represents one year (Rondout outliers removed)",
       group = "Year", color = "Year",
       x = "Month",
       y = "Storage (Billions of Gallons)") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", size = 14),
        strip.text = element_text(face = "bold"),
        axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_color_viridis_d()
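Because the measurements are daily, each month and year combination holds many values. One optional variation, not part of the original post, is to average to one value per reservoir, year, and month before plotting. This is a minimal sketch on made-up data; `toy`, `monthly`, and `mean_storage` are illustrative names.

```r
library(dplyr)
library(lubridate)

# Made-up daily storage values standing in for storage_data
toy <- tibble(
  reservoir = "Rondout",
  date = as.Date(c("2019-02-01", "2019-02-02", "2019-03-01", "2019-03-02")),
  values = c(47, 49, 46, 48)
)

# Collapse daily values to one mean per reservoir, year, and month
monthly <- toy |>
  mutate(
    year = year(date),
    month = month(date, label = TRUE)
  ) |>
  group_by(reservoir, year, month) |>
  summarise(mean_storage = mean(values, na.rm = TRUE), .groups = "drop")

print(monthly)
```

The resulting data frame could be passed to the same `ggplot()` call with `y = mean_storage`, giving one point per month on each yearly line.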

Citation

BibTeX citation:
@online{kasowitz2026,
  author = {Kasowitz, Seth},
  title = {TidyTuesday 2026 - {Week} 1},
  date = {2026-01-07},
  url = {https://sethkasowitz.com/posts/2026-01-06_tidytuesday-wk1/},
  langid = {en}
}
For attribution, please cite this work as:
Kasowitz, Seth. 2026. “TidyTuesday 2026 - Week 1.” January 7, 2026. https://sethkasowitz.com/posts/2026-01-06_tidytuesday-wk1/.