2026-03-08 / Rtask / database, development, tips

DuckDB + dbplyr: When Your Pipeline Gives Different Results Every Time It Runs

Short on time? Here’s the gist: DuckDB parallelizes query execution and never guarantees row order unless you explicitly ask for it. If any step in your pipeline is order-sensitive (row_number(), cumsum(), lag(), distinct(.keep_all = TRUE), inequality joins), you are silently producing non-deterministic results. This post shows the four patterns that bite you and how to fix each one. The Setup: ...

Combine harvester in the middle of the harvest, moving horizontally, row by row

2021-10-21 / Vincent Guyader / data, tidyverse, tips

Row-wise operations with the {tidyverse}

We are often asked how to perform row-wise operations in a data.frame (or a tibble). The answer is, as usual, “it depends” 🙂 Let’s look at some cases that should fit your needs. library(tidyverse) Let’s make an example dataset: base <- tibble::tibble( a = 1:10, b = 1:10, c = 21:30 ) %>% head() base ## # A tibble: 6 ...