2026-03-08 / Rtask / database, development, tips
DuckDB + dbplyr: When Your Pipeline Gives Different Results Every Time It Runs
Short on time? Here’s the gist: DuckDB parallelizes query execution and never guarantees row order unless you explicitly ask for it. If any step in your pipeline is order-sensitive (row_number(), cumsum(), lag(), distinct(.keep_all = TRUE), inequality joins), you are silently producing non-deterministic results. This post shows the four patterns that bite you and how to fix each one.

The Setup: ...
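
To make the gist concrete, here is a minimal sketch of stating the order explicitly in a dbplyr pipeline. It is not the post's own setup: the `events` table, its columns, and the values are hypothetical, and it assumes the DBI, duckdb, dplyr and dbplyr packages are installed. `window_order()` puts an ORDER BY inside each window function, and the final `arrange()` fixes the order of the collected rows.

```r
library(DBI)
library(duckdb)
library(dplyr)
library(dbplyr)

# In-memory DuckDB database
con <- dbConnect(duckdb::duckdb())

# Hypothetical toy data, deliberately stored out of order
events <- data.frame(
  user_id = c(1L, 1L, 1L, 2L, 2L),
  ts      = c(3L, 1L, 2L, 2L, 1L),
  amount  = c(30, 10, 20, 5, 15)
)
dbWriteTable(con, "events", events)

result <- tbl(con, "events") |>
  group_by(user_id) |>
  # Explicit ordering for every window function below;
  # without it the database is free to pick any row order
  window_order(ts) |>
  mutate(
    rn      = row_number(),
    running = cumsum(amount),
    prev    = lag(amount)
  ) |>
  ungroup() |>
  # Explicit ordering of the final result set
  arrange(user_id, ts) |>
  collect()

dbDisconnect(con, shutdown = TRUE)
```

Because `ts` is unique within each `user_id` in this toy data, the ordering has no ties. With ties, the window results can still vary between runs, so a tie-breaking column would need to be added to `window_order()`.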
