pg_durable: Microsoft open sources in-database durable execution

(github.com)

166 points | by coffeemug 3 hours ago

11 comments

levkk 2 hours ago
2026 is the year of the Postgres queue! (DBOS[0], pgQue[1]) It's awesome that the community is contributing this and giving us the option to use it.
As an ex-app engineer though, I kind of prefer my queue logic to be in code, in Git, but maybe with the right tooling, you can change my mind. :)
[0]: https://www.dbos.dev/
[1]: https://github.com/NikolayS/pgque
[-]
- dietr1ch 1 hour ago
  Yeah, it's harder to work on, or maybe just different, but I guess the docs, info(searchable docs, posts, experience), and tooling are lacking.
  What's the story for version control, debugging, testing, releasing? It'd be cool to have everything together for data locality and simplifying the stack, but it feels you'd lose a lot of useful knowledge about how to do stuff "properly".
  [-]
  - gdecandia 1 hour ago
    Contributor here. Good points, we do need to develop some best practices around managing function versioning and lifecycle for pg_durable.
    https://github.com/microsoft/duroxide - also OSS, the durable execution framework pg_durable is built on itself supports function versions. We can leverage that to get similar support in pg_durable.
- babhishek21 1 hour ago
  +1 on "prefer my queue logic to be in code". The <shape> of my data doesn't change nearly as much as the actions I need to take on it; it doesn't make sense to me why I'd want to do a migration (which is an all or nothing op btw) every time I want to change how I behave with my data. This is also why I absolutely abhorred having to make postgres functions to do anything remotely non-trivial on Supabase.
  That said, we did hand-build a simple job queue (just lock, poll, reserve on a column, poll and update reservation to mark job done) on top of postgres at my previous startup. Something like pgque would have made that much more polished.
- giancarlostoro 37 minutes ago
  > As an ex-app engineer though, I kind of prefer my queue logic to be in code, in Git, but maybe with the right tooling, you can change my mind. :)
  I mean, we used to keep our SQL code in git too for projects where we had DB triggers. I think some were even shoved in there via Django migrations just to let someone setup locally and have the triggers available in their local database.
- moomoo11 1 hour ago
  same but this could be useful for db level things that are not business logic related.
  i have always had maintenance packages for this type of stuff. if i could deploy them alongside the database itself that could be kind of cool.
  but yeah i agree with you that i do prefer having this in the code layer.
jraedisch 2 hours ago
If understanding correctly, Absurd (by the Pi LLM harness devs) minimizes the pure db approach as much as possible. I only just started getting into the topic myself, though.
https://github.com/earendil-works/absurd
kilobaud 2 hours ago
> When not to use it > … > The workflow mostly lives outside Postgres and spans many heterogeneous systems.
How is this project at all comparable to something like Temporal? Am I misunderstanding the limitation implied by this particular recommendation?
[-]
- faxmeyourcode 2 hours ago
  I aggree - I'm not understanding the value of the project either if you look at the example here https://github.com/microsoft/pg_durable/blob/main/examples/i...
  It's an interesting technical achievement I guess, but it's very bizarre to try and read this
```
    SELECT df.start(
        @> (
            ($$SELECT ... FROM demo.invoices WHERE status = 'pending'$$ |=> 'inv')
            ~> df.if_rows('inv',
                $$UPDATE ... SET status = 'processing'$$
                ~> (df.http(...) |=> 'resp')
                ~> df.if($$SELECT $r.ok$$,
                    -- classify, branch, wait for signal ...
                ),
                df.sleep(5)
            )
        ),
        'invoice-approval-pipeline'
    );
```
  [-]
  - rswail 2 hours ago
    Without reading any of the doco, it appears to be a job definition called invoice-approval-pipeline that runs every 5 seconds.
    The steps are:
    1. Get all the pending invoices
    2. Set their state to "processing"
    3. Call out to an external service/process to do the actual processing, wait for a response.
    4. If the response is OK, do something
    5. Wait 5 seconds and then start again.
    Not sure I love the syntax and the way SQL is embedded between the $$
    But it is in the database, can be updated and modified in the same way as all the other stored procedures/functions, allows job control, I assume other control structures for parallel steps etc.
    Gonna go read the doco now.
    [-]
    - franckpachot 1 hour ago
      "dollar quoting" is the PostgreSQL way to quote strings with quotes, avoiding double quoting or escape characters. I like to use the tagged version of it, like $sql$ SELECT ... $sql$ to describe what is inside.
  - gdecandia 2 hours ago
    Contributor here - at Microsoft we've built AI workflows on pg_durable and seen it substantially reduce code and increase reliability. Agree that the DSL ergonomics can be improved. Our pipelines use a higher level language and therefore simplified, but pg_durable is meant to solve a wider array of problems. We're happy to take suggestions for improvements.
    [-]
    - faxmeyourcode 2 hours ago
      Somebody else in the thread brought up the benefit of snapshotting a database at a point in time stores not only the state of execution but also the code, etc. That is a unique benefit I'd be interested in exploring over storing your orchestration outside of the database.
      Not trying to dismiss the project - it looks like a lot of hard work has gone in and somebody has a use for it. I just come from an airflow style external orchestrator frame of mind that manages durability state in postgres but keeps the control flow out. Sorry if I came off as a bit snarky
    - Onavo 1 hour ago
      Do you plan to open source the high level wrapper too?
faxmeyourcode 2 hours ago
This feels like the wrong solution to an age old problem solved by the DAG schedulers like Apache Airflow for a while now.
Why would I want to store my control flow in the database and not in code? It feels strange.
Not trying to dismiss the project, I'm just not getting it yet I think.
[-]
- daxfohl 1 hour ago
  Microsoft has their own Durable Task framewor[1] for that kind of stuff, and it supports both running as a self-hosted standalone service like temporal, and running serverless on Azure Functions. It actually predated airflow, temporal, etc., IIRC.
  This one seems to be more database-specific use case. The advantage is probably that you can track the exact state of the job in the database itself, rather than having to cross-reference the workflow log with the codebase and trace through it line by line to figure out what the state is. Plus I assume it's less overhead and latency, and operationally one less thing to spin up.
  [1] https://learn.microsoft.com/en-us/azure/durable-task/common/...
  [-]
  - affandar 51 minutes ago
    (Author of both durable task framework and pg_durable/duroxide here)
    Indeed Durable tasks is an exceptional project and was a unique innovation at the time.
    pg_durable brings the same reliability and durablity semantics to long running operations within the database.
    We have tons of interesting scenarios on the roadmap. Stay tuned! :)
    [-]
    - alex_hirner 17 minutes ago
      Does ai.backfill() fill incomplete/dirty rows or does pg_durable have some notion of partial completion?
TuringNYC 1 hour ago
I'm trapped on Azure at work and we're constantly waiting for Azure pg to catch up with modernity.
For example, you cant use this: https://www.paradedb.com/blog/hybrid-search-in-postgresql-th...
Also for example, you dont get ultra-wide high dimensionality vectors.
It is nice they are open sourcing pg_durable, but how about adopting table stakes I'd get with AWS?
[-]
- FuriouslyAdrift 6 minutes ago
  Wouldn't Azure Cosmos DB be better suited for vector searches?
- moron4hire 24 minutes ago
  I'm sorry, I'm sure you've considered this, but why couldn't you create a bare VM with Postgres vCurrent installed?
- abeomor 59 minutes ago
  [flagged]
oa335 2 hours ago
Can anyone explain why I would want to use this over an orchestration tool that lives outside the DB? Read through the Readme and some of the examples, I still don't get it.
[-]
- rswail 2 hours ago
  Snapshot PITR of your database means everything restores including the durable jobs at the PIT.
  Don't need to synchronize the backups with anything else that is part of the same data store, good for ETL pipelines and other state machine type jobs.
  If your ETL is mostly SQL anyway, then having the actual job being run on the same server helps as well.
  [-]
  - regularfry 1 hour ago
    Yes, but that doesn't have to imply that the compute part of the durable jobs framework also needs to be part of the database snapshot. You almost certainly want that defined in code anyway, if only to have a sane versioning story. So then by having it also be part of the snapshot, you've now got the problem that there are apparently two sources of truth for that bit of the code.
- gdecandia 2 hours ago
  Contributor here. At Microsoft, our Postgres customers seem to split pretty evenly into 2 camps, those that want to do as much as they can in the database, and those that agree with your take - want to keep apps and compute outside the DB.
  [-]
  - guhidalg 12 minutes ago
    I bet this is correlated with how much they like/know Postgres already. When people don’t understand their database’s features, they want it to behave like something else they do understand (code). They’re leaving a lot of performance on the table by not leveraging everything their database can do.
- jpalomaki 2 hours ago
  It’s sometimes convenient if database is the only ”stateful” component in architecture.
  Also if all the "state" is in one database, then you have better chance of getting consistent backups.
- thibaut_barrere 2 hours ago
  You can have well-integrated applicative workflows (eg: progress report on a permalink in your front end app), app-restart-proof resumable workflows, and it avoids adding an extra piece of infrastructure.
  We use Postgres for that on https://transport.data.gouv.fr (Elixir app which does a fair bit of processing), and it helps.
  Not familiar yet with pg_durable though, but I have used or implemented similar solutions and can relate.
- keynha 34 minutes ago
  [dead]
joelthelion 1 hour ago
Isn't the database already one of the hardest piece of infras to scale? Why would you want to load it with additional long-running jobs?
[-]
- gdecandia 1 hour ago
  Long-running jobs on Postgres are not new at all. See pg_cron for one example. At the end of the day, these workloads would be running anyway against the database, whether triggered by an external component. HTTP queries from the database have also become more popular to avoid round-trips and failure points from additional components in data or AI pipelines. But yes, whether to bring the compute to the data or vice-versa is a design choice that has a lot of contention.
- greenavocado 1 hour ago
  Gotta set up the fall to rake in the dough later with consulting fees
rastignack 1 hour ago
I hope it could be used in the future to export pg_dump formated exports to s3.
One would be able to trigger maintenance jobs via simple lambda functions whose duration is capped.
[-]
- gdecandia 25 minutes ago
  Committer here. I would love to hear more about this scenario.
  Is the proposal to be able to export pg_dump formatted data on some schedule or trigger, entirely hosted in PostgreSQL and with timeouts? There are already extension that can export to blob/file storage and can be combined with pg_durable or pg_cron, so I assume the challenge is pg_dump compatible data export from SQL running in the database?
mikey_p 1 hour ago
Is this an open sourcing of something they use internally? My first thought on durable jobs was GHA aka Azure Devops.
[-]
- gdecandia 1 hour ago
  Please see https://learn.microsoft.com/en-us/azure/horizondb/ai/ai-pipe...
  [-]
  - mikey_p 19 minutes ago
    Thanks for answering, this makes tons of sense
redmonduser 1 hour ago
Seems like an interesting idea to add durability and resumability to lengthy cron jobs.
cpursley 2 hours ago
Looks pretty good but I wonder why they didn’t build it on pgmq? If you’re on elixir I maintain a DAG package around this (based on and compatible with pgflow.dev which is TS/Deno).
https://github.com/agoodway/pgflow
[-]
- affandar 2 hours ago
  (pg_durable committer here)
  The provider is an extensibility point. We just shipped the simplest version of it. Happy to take contribs if someone sends a pgmq based provider!
  [-]
  - cpursley 2 hours ago
    Cool! I maintain https://postgresisenough.dev, I'd love to get a PR for pg_durable up to include it: https://github.com/agoodway/postgresisenough
  - evntdrvn 1 hour ago
    When do we get mssql_durable :)