Self-hosting is more a question of responsibility I'd say. I am running a couple of SaaS products and self-host at much better performance at a fraction of the cost of running this on AWS. It's amazing and it works perfectly fine.
For client projects, however, I always try and sell them on paying the AWS fees, simply because it shifts the responsibility of the hardware being "up" to someone else. It does not inherently solve the downtime problem, but it allows me to say, "we'll have to wait until they've sorted this out, Ikea and Disney are down, too."
Doesn't always work like that and isn't always a tried-and-true excuse, but generally lets me sleep much better at night.
With limited budgets, however, it's hard to accept the cost of RDS (and we're talking with at least one staging environment) when comparing it to a very tight 3-node Galera cluster running on Hetzner at barely a couple of bucks a month.
Or Cloudflare, titan at the front, being down again today and the past two days (intermittently) after also being down a few weeks ago and earlier this year as well. Also had SQS queues time out several times this week, they picked up again shortly, but it's not like those things ...never happen on managed environments. They happen quite a bit.
Over 20 years I've had lots of clients on self-hosted setups, even self-hosting SQL on the same VM as the webserver, as you used to in the distant past for low-usage web apps.
I have never, ever, ever had a SQL box go down. I've had a web server go down once. I had someone who probably shouldn't have had access to a server accidentally turn one off once.
The only major outage I've had (2-3 hours) was when the box was also self-hosting an email server and I accidentally caused it to flood itself with failed delivery notices with a deploy.
I may have cried a little in frustration and panic but it got fixed in the end.
I actually find using cloud hosted SQL in some ways harder and more complicated because it's such a confusing mess of cost and what you're actually getting. The only big complication is setting up backups, and that's a one-off task.
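For what it's worth, that one-off task really can be small. Here's a minimal sketch of a nightly dump with simple retention; the paths, DSN and retention window are placeholders, and for a large database you'd stream pg_dump to disk instead of buffering it in memory:

    #!/usr/bin/env python3
    # Nightly logical backup sketch: pg_dump -> compressed file + simple retention.
    # Paths, DSN and retention window are placeholders; adjust for your setup.
    import gzip, subprocess, time
    from pathlib import Path

    BACKUP_DIR = Path("/var/backups/postgres")                # hypothetical location
    DSN = "postgresql://backup_user@localhost/appdb"          # hypothetical DSN
    KEEP_DAYS = 14

    def run_backup() -> Path:
        BACKUP_DIR.mkdir(parents=True, exist_ok=True)
        target = BACKUP_DIR / time.strftime("appdb-%Y%m%d-%H%M%S.sql.gz")
        # Buffers the dump in memory; fine for small databases, stream it for big ones.
        dump = subprocess.run(["pg_dump", "--no-owner", DSN],
                              check=True, capture_output=True)
        with gzip.open(target, "wb") as f:
            f.write(dump.stdout)
        return target

    def prune_old() -> None:
        cutoff = time.time() - KEEP_DAYS * 86400
        for old in BACKUP_DIR.glob("appdb-*.sql.gz"):
            if old.stat().st_mtime < cutoff:
                old.unlink()

    if __name__ == "__main__":
        run_backup()
        prune_old()

Run it from cron or a systemd timer; copying the resulting files to another machine or to object storage is the part that actually makes it a backup.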
> Self-hosting is more a question of responsibility I'd say. I am running a couple of SaaS products and self-host at much better performance at a fraction of the cost of running this on AWS
It is. You need to answer the question: what are the chances of your service being down for, let's say, 4 hours, or of some security patch not being properly applied, or of not having followed best practices for security? Many people lack the skills, time, or resources to confidently address that question, hence paying someone else to do it.
Your time is money though. You are saving money but giving up time.
Like everything (cooking at home, cleaning your home, fixing your own car, etc.), it is always cheaper to do it yourself, as long as you don't count the cost of your own time doing the work you would normally pay someone else for.
That’s pretty reductive. By that logic the opposite extreme is just as true: if using managed services is just as bad as outsourcing everything else, then a business shouldn’t rent real estate either—every business should build and own their own facility. They should also never contract out janitorial work, nor should they retain outside law firms—they should hire and staff those departments internally, every time, no nuance allowed.
You see the issue?
Like, I’m all for not procuring things that it makes more sense to own/build (and I know most businesses have piss-poor instincts on which is which—hell, I work for the government! I can see firsthand the consequences of outsourcing decision making to contractors, rather than just outsourcing implementation).
But it’s very case-by-case. There’s no general rule like “always prefer self hosting” or “always rent real estate, never buy” that applies broadly enough to be useful.
> I'd argue self-hosting is the right choice for basically everyone, with the few exceptions at both ends of the extreme:
> If you're just starting out in software & want to get something working quickly with vibe coding, it's easier to treat Postgres as just another remote API that you can call from your single deployed app
> If you're a really big company and are reaching the scale where you need trained database engineers to just work on your stack, you might get economies of scale by just outsourcing that work to a cloud company that has guaranteed talent in that area. The second full freight salaries come into play, outsourcing looks a bit cheaper.
This is funny. I'd argue the exact opposite. I would self host only:
* if I were on a tight budget and trading an hour or two of my time for a cost saving of a hundred dollars or so is a good deal; or
* at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost effective than outsourcing.
I have nothing against self-hosting PostgreSQL. Do whatever you prefer. But to me outsourcing this to cloud providers seems entirely reasonable for small and medium-sized businesses. According to the author's article, self hosting costs you between 30 and 120 minutes per month (after setup, and if you already know what to do). It's easy to do the math...
> employing engineers to manage self-hosted databases is more cost effective than outsourcing
Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs is and was always a lie.
PaaS platforms (Heroku, Render, Railway) can legitimately be operated by your average dev and not have to hire a dedicated person; those cost even more though.
Another limitation of both the cloud and PaaS is that they are only responsible for the infrastructure/services you use; they will not touch your application at all. Can your application automatically recover from a slow/intermittent network, a DB failover (that you can't even test because your cloud providers' failover and failure modes are a black box), and so on? Otherwise you're waking up at 3am no matter what.
In-house vs Cloud Provider is largely a wash in terms of cost. Regardless of the approach, you are going to need people to maintain stuff, and people cost money. Similarly, compute and storage cost money, so what you lose on the swings, you gain on the roundabouts.
In my experience you typically need fewer people if using a Cloud Provider than in-house (or the same number of people can handle more instances) due to increased leverage. Whether you can maximize what you get via leverage depends on how good your team is.
US companies typically like to minimize headcount (either through accounting tricks or outsourcing) so usually using a Cloud Provider wins out for this reason alone. It's not how much money you spend, it's how it looks on the balance sheet ;)
> Every company out there is using the cloud and yet still employs infrastructure engineers
Every company beyond a particular size surely? For many small and medium sized companies hiring an infrastructure team makes just as little sense as hiring kitchen staff to make lunch.
You're paying people to do the role either way; if it's not dedicated staff, then it's taking time away from your application developers so they can play the role of underqualified architects, sysadmins, and security engineers.
From experience (because I used to do this), it’s a lot less time than a self-hosted solution, once you’re factoring in the multiple services that need to be maintained.
It depends entirely on your use case. If all you need is a DB and Python/PHP/Node server behind Nginx then you can get away with that for a long time. Once you throw in a task runner, emails, queue systems, blob storage, user-uploaded content, etc. you can start running beyond your own ability or time to fix the inevitable problems.
As I pointed out above, you may be better served mixing and matching so you spend your time on the critical aspects but offload those other tasks to someone else.
Of course, I’m not sitting at your computer so I can’t tell you what’s right for you.
For small companies things like vercel, supabase, firebase, ... wipe the floor with Amazon RDS.
For medium sized companies you need "devops engineers". And in all honesty, more than you'd need sysadmins for the same deployment.
For large companies, they split up AWS responsibilities into entire departments of teams (for example, all clouds have made auth so damn difficult that most large companies have not one but multiple departments just dealing with authorization, before you so much as start your first app)
Working in a university lab, self-hosting is the default for almost anything. While I would agree that costs are quite low, I sometimes would be really happy to throw money at problems to make them go away. Without having had the chance, and thus being no expert, I really see the appeal of scaling (up and down) quickly in the cloud. We ran a Postgres database of a few hundred GB with multiple read replicas and managed somehow, but we really hit the limits of our expertise at some point. Eventually we stopped migrating to newer database schemas because it was just such a hassle keeping availability. If I had the money as a company, I guess I would have paid for a hosted solution.
> Every company out there is using the cloud and yet still employs infrastructure engineers to deal with its complexity. The "cloud" reducing staff costs is and was always a lie.
This doesn’t make sense as an argument. The reason the cloud is more complex is because that complexity is available. Under a certain size, a large number of cloud products simply can’t be managed in-house (and certainly not altogether).
Also your argument is incorrect in my experience.
At a smaller business I worked at, I was able to use these services to achieve uptime and performance that I couldn’t achieve self-hosted, because I had to spend time on the product itself. So yeah, we’d saved on infrastructure engineers.
At larger scales, what your false dichotomy suggests also doesn’t actually happen. Where I work now, our data stores are all self-managed on top of EC2/Azure, where performance and reliability are critical. But we don’t self-host everything. For example, we use SES to send our emails and we use RDS for our app DB, because their performance profiles and uptime guarantees are more than acceptable for the price we pay. That frees up our platform engineers to spend their energy on keeping our uptime on our critical services.
>At a smaller business I worked at, I was able to use these services to achieve uptime and performance that I couldn’t achieve self-hosted, because I had to spend time on the product itself. So yeah, we’d saved on infrastructure engineers.
How sure are you about that one? All of my Hetzner VMs reach an uptime of 99.9-something percent.
I could see more than one small business stack fitting onto a single one of those VMs.
Just because your VM is running doesn't mean the service is accessible. Whenever there's a large AWS outage it's usually not because the servers turned off. It also doesn't guarantee that your backups are working properly.
100% certain because I started by self hosting before moving to AWS services for specific components and improved the uptime and reduced the time I spent keeping those services alive.
Yes, mix-and-match is the way to go, depending on what kind of skills are available in your team. I wouldn't touch a mail server with a 10-foot pole, but I'll happily self-manage certain daemons that I'm comfortable with.
Just be careful not to accept more complexity just because it is available, which is what the AWS evangelists often try to sell. After all, we should always make an informed decision when adding a new dependency, whether in code or in infrastructure.
Of course AWS are trying to sell you everything. It’s still on you and your team to understand your product and infrastructure and decide what makes sense for you.
> Do you account for frequency and variety of wakeups here?
Yes. In my career I've dealt with way more failures due to unnecessary distributed systems (that could have been one big bare-metal box) than due to hardware failures.
You can never eliminate wake-ups, but bare-metal systems have far fewer moving parts, which eliminates a whole bunch of failure scenarios, so you're only left with actual hardware failure (and hardware is pretty reliable nowadays).
If this isn't the truth. I just spent several weeks, on and off, debugging a remote hosted build system tool thingy because it was in turn made of at least 50 different microservice type systems and it was breaking in the middle of two of them.
There was, I have to admit, a log message that explained the problem... once I could find the specific log message and understand the 45 steps in the chain that got to that spot.
I don’t think it’s a lie, it’s just perhaps overstated. The number of staff needed to manage a cloud infrastructure is definitely lower than that required to manage the equivalent self-hosted infrastructure.
Whether or not you need that equivalence is an orthogonal question.
> The number of staff needed to manage a cloud infrastructure is definitely lower than that required to manage the equivalent self-hosted infrastructure.
There's probably a sweet spot where that is true, but because cloud providers offer more complexity (self-inflicted problems) and use PR to encourage you to adopt it ("best practices" and so on), in all the cloud-hosted shops I've worked at over a decade I've always seen multiple full-time infra people busy with... something?
There was always something to do, whether to keep up with cloud provider changes/deprecations, implementing the latest "best practice", debugging distributed systems failures or self-inflicted problems and so on. I'm sure career/resume polishing incentives are at play here too - the employee wants the system to require their input otherwise their job is no longer needed.
Maybe in a perfect world you can indeed use cloud-hosted services to reduce/eliminate dedicated staff, but in practice I've never seen anything but solo founders actually achieve that.
> but because cloud providers offer more complexity (self-inflicted problems)
It's complexity but it's also providing features. If you didn't use those cloud features, you'd be writing or gluing together and maintaining your own software to accomplish the same tasks, which takes even more staff
> Maybe in a perfect world you can indeed use cloud-hosted services to reduce/eliminate dedicated staff
So let's put it another way: either you're massively reducing/eliminating staff to achieve the same level of functionality, or you're keeping the equivalent staff but massively increasing functionality.
The point is, clouds let you deliver a lot more with a lot less people, no matter which way you cut it. The people spending money on them aren't mostly dumb.
Exactly. Companies with cloud infra often still have to hire infra people or even an infra team, but that team will be smaller than if they were self-hosting everything, in some cases radically smaller.
I love self-hosting stuff and even have a bias towards it, but the cost/time tradeoff is more complex than most people think.
You are missing that most services don't have high availability needs and don't need to scale.
Most projects I have worked on in my career have never seen more than a hundred concurrent users. If something goes down on Saturday, I am going to fix it on Monday.
I have worked on internal tools where I just added a Postgres DB to the docker setup and that was it. Five minutes of work and no issues at all. Sure, if you have something customer-facing you need to do a bit more and set up a good backup strategy, but that really isn't magic.
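A sketch of roughly that, driven from a small Python script rather than a compose file; the container name, password, and volume below are made up, not anyone's real config:

    # Minimal "just add Postgres to the dev/docker setup" sketch using the docker CLI.
    import subprocess, time

    def start_postgres(name="app-db", password="devpassword", version="16"):
        subprocess.run([
            "docker", "run", "-d", "--name", name,
            "-e", f"POSTGRES_PASSWORD={password}",
            "-p", "5432:5432",
            "-v", f"{name}-data:/var/lib/postgresql/data",  # named volume so data survives restarts
            f"postgres:{version}",
        ], check=True)
        # Wait until the server accepts connections before the app starts.
        for _ in range(30):
            ready = subprocess.run(["docker", "exec", name, "pg_isready", "-U", "postgres"],
                                   capture_output=True)
            if ready.returncode == 0:
                return
            time.sleep(1)
        raise RuntimeError("Postgres did not become ready in time")

    if __name__ == "__main__":
        start_postgres()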
The discussion isn't "what is more effective". The discussion is "who wants to be blamed in case things go south". If you push the decision to move to self-hosted and then one of the engineers fucks up the database, you have a serious problem. If same engineer fucks up cloud database, it's easier to save your own ass.
It's not. I've been in a few shops that use RDS because they think their time is better spent doing other things.
Except now they are stuck trying to maintain and debug Postgres without the same visibility and agency they would have if they hosted it themselves. The situation isn't at all clear-cut.
One thing unaccounted for if you've only ever used cloud-hosted DBs is just how slow they are compared to a modern server with NVME storage.
This leads the developers to do all kinds of workarounds and reach for more cloud services (and then integrating them and - often poorly - ensuring consistency across them) because the cloud hosted DB is not able to handle the load.
On bare-metal, you can go a very long way with just throwing everything at Postgres and calling it a day.
I use Google Cloud SQL for PostgreSQL and it's been rock solid. No issues; troubleshooting works fine; all extensions we need already installed; can adjust settings where needed.
It's more of a general condition - it's not that RDS is somehow really faulty, it's just that when things do go wrong, it's not really anybody's job to introspect the system because RDS is taking care of it for us.
In the limit I don't think we should need DBAs, but as long as we need to manage indices by hand, think more than 10 seconds about the hot queries, manage replication, tune the vacuumer, track updates, and all the other rot - then actually installing PG on a node of your choice is really the smallest of the problems you face.
So, yeah, I guess there's much confusion about what a 'managed database' actually is? Because for me, the table stakes are:
- Backups: the provider will push a full generic disaster-recovery backup of my database to an off-provider location at least daily, without the need for a maintenance window
- Optimization: index maintenance and storage optimization are performed automatically and transparently
- Multi-datacenter failover: my database will remain available even if part(s) of my provider are down, with a minimal data loss window (like, 30 seconds, 5 minutes, 15 minutes, depending on SLA and thus plan expenditure)
- Point-in-time backups are performed at an SLA-defined granularity and with a similar retention window, allowing me to access snapshots via a custom DSN, not affecting production access or performance in any way
- Slow-query analysis: notifying me of relevant performance bottlenecks before they bring down production
- Storage analysis: my plan allows for #GB of fast storage, #TB of slow storage: let me know when I'm forecast to run out of either in the next 3 billing cycles or so
Because, well, if anyone provides all of that for a monthly fee, the whole "self-hosting" argument goes out of the window quickly, right? And I say that as someone who absolutely adores self-hosting...
It's even worse when you start finding you're staffing specialized skills. You have the Postgres person, and they're not quite busy enough, but nobody else wants to do what they do. But then you have an issue while they're on vacation, and that's a problem. Now I have a critical service but with a bus factor problem. So now I staff two people who are now not very busy at all. One is a bit ambitious and is tired of being bored. So he's decided we need to implement something new in our Postgres to solve a problem we don't really have. Uh oh, it doesn't work so well, the two spend the next six months trying to work out the kinks with mixed success.
This would be a strange scenario because why would you keep these people employed? If someone doesn't want to do the job required, including servicing Postgres, then they wouldn't be with me any longer, I'll find someone who does.
IMO, the reason to self-host your database is latency.
Yes, I'd say backups and analysis are table stakes for paying for it, and multi-datacenter failover is a relevant nice-to-have. But the reason to do it yourself is that it's literally impossible to get something as good on somebody else's computer as what you can build yourself.
Self-host things the boss won't call at 3 AM about: logs, traces, exceptions, internal apps, analytics. Don't self-host the database or major services.
If you set it up right, you can automate all this as well by self-hosting. There is really nothing special about automating backups or multi-region failover.
Trusted in what sense, that they'll always work perfectly 100% of the time? No, therefore one must still check them from time to time, and it's really no different when self hosting, again, if you do it correctly.
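Checking them can itself be scripted. Here's a sketch that restores the newest compressed pg_dump into a scratch database and runs a sanity query; it assumes local peer auth for the CLI tools, and the "users" table is hypothetical:

    # Backup verification sketch: restore the newest dump into a scratch DB and smoke-test it.
    import gzip, subprocess
    from pathlib import Path

    BACKUP_DIR = Path("/var/backups/postgres")   # wherever the nightly dumps land
    SCRATCH_DB = "restore_check"

    def verify_latest() -> None:
        latest = max(BACKUP_DIR.glob("appdb-*.sql.gz"), key=lambda p: p.stat().st_mtime)
        subprocess.run(["dropdb", "--if-exists", SCRATCH_DB], check=True)
        subprocess.run(["createdb", SCRATCH_DB], check=True)
        # Reads the whole dump into memory; fine for small dumps, stream it for big ones.
        with gzip.open(latest, "rt") as f:
            sql = f.read()
        subprocess.run(["psql", "--quiet", "-d", SCRATCH_DB],
                       input=sql, text=True, check=True)
        # Trivial smoke test: the restored DB should contain rows in a known table.
        out = subprocess.run(["psql", "-d", SCRATCH_DB, "-tAc", "SELECT count(*) FROM users"],
                             capture_output=True, text=True, check=True)
        assert int(out.stdout.strip()) > 0, f"restore of {latest.name} looks empty"

    if __name__ == "__main__":
        verify_latest()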
Once they convince you that you can’t do it yourself, you end up relying on them, but didn’t develop the skills you would need to migrate to another provider when they start raising prices. And they keep raising prices because by then you have no choice.
There is plenty of provider markup, to be sure. But it is also very much not a given that the hosted version of a database is running software/configs that are equivalent to what you could do yourself. Many hosted databases are extremely different behind the scenes when it comes to durability, monitoring, failover, storage provisioning, compute provisioning, and more. Just because it acts like a connection hanging off a postmaster service running on a server doesn’t mean that’s what your “psql” is connected to on RDS Aurora (or many of the other cloud-Postgres offerings).
I have not tested this in real life yet, but it seems like all the arguments about vendor lock-in can be addressed if you bite the bullet and learn basic Kubernetes administration. Kubernetes is FOSS and there are countless Kubernetes-as-a-service providers.
I know there are other issues with Kubernetes, but at least it's transferable knowledge.
Since this is on the front page (again?) I guess I'll chime in: learn Kubernetes - it's worth it. It did take me 3 attempts to finally wrap my head around it, so I really suggest trying out many different things and seeing what works for you.
And I really recommend starting with *default* k3s; do not look at any alternatives for CNI, CSI, or networked storage. Treat your first cluster as something that can spontaneously fail, don't bother keeping it clean, and learn as much as you can.
Once you have that, you can use great open-source k8s native controllers which take care of vast majority of requirements when it comes to self-hosting and save more time in the long run than it took to set up and learn these things.
Honorable mentions: k9s, Lens (I do not suggest using it long-term, but the UI is really good as a starting point), Rancher's web UI.
I do not recommend Ceph unless you are okay with not using shared filesystems (they have a bunch of gotchas), or unless you want S3 without having to install a dedicated deployment for it.
As someone who has operated Postgres clusters for over a decade before k8s was even a thing, I fully recommend just using a Postgres operator like this one and moving on. The out of box config is sane, it’s easy to override things, and failover/etc has been working flawlessly for years. It’s just the right line between total DIY and the simplicity of having a hosted solution. Postgres is solved, next problem.
And on a similar naming note yet totally unrelated, check out k9s, which is a TUI for Kubernetes cluster admin. All kinds of nifty features built-in, and highly customizable.
If we're talking about CLIs, check out Kamal, the deployment tool that 37signals / Basecamp / DHH developed specifically to move off the cloud. It doesn't use Kubernetes; it deploys containers over plain Docker and SSH.
I'm probably just an idiot, but I ran unmanaged postgres on Fly.io, which is basically self hosting on a vm, and it wasn't fun.
I did this for just under two years, and I've lost count of how many times one or more of the nodes went down and I had to manually deregister it from the cluster with repmgr, clone a new vm and promote a healthy node to primary. I ended up writing an internal wiki page with the steps. I never got it: if one of the purposes of clusters is having higher availability, why did repmgr not handle zombie primaries?
Again, I'm probably just an idiot out of my depth with this. And I probably didn't need a cluster anyway, although with the nodes failing like they did, I didn't feel comfortable moving to a single node setup as well.
I eventually switched to managed postgres, and it's amazing being able to file a sev1 for someone else to handle when things go down, instead of the responsibility being on me.
I still don't get how folks can hype Postgres with every second post on HN, yet there is no simple batteries-included way to run a HA Postgres cluster with automatic failover like you can do with MongoDB. I'm genuinely curious how people deal with this in production when they're self-hosting.
Beyond the hype, the PostgreSQL community is aware of the lack of "batteries-included" HA. This discussion on the idea of a Built-in Raft replication mentions MongoDB as:
>> "God Send". Everything just worked. Replication was as reliable as one could imagine. It outlives several hardware incidents without manual intervention. It allowed cluster maintenance (software and hardware upgrades) without application downtime. I really dream PostgreSQL will be as reliable as MongoDB without need of external services.
CloudNativePG is automation around PostgreSQL, not "batteries included", and not the Kubernetes idea where pods can die or spawn without impacting availability. Unfortunately, naming it Cloud Native doesn't transform a monolithic database into an elastic cluster.
It's largely cultural. In the SQL world, people are used to accepting the absence of real HA (resilience to failure, where transactions continue without interruption) and instead rely on fast DR (stop the service, recover, check for data loss, start the service). In practice, this means that all connections are rolled back, clients must reconnect to a replica known to be in synchronous commit, and everything restarts with a cold cache.
Yet they still call it HA because there's nothing else.
Even a planned shutdown of the primary to patch the OS results in downtime, as all connections are terminated. The situation is even worse for major database upgrades: stop the application, upgrade the database, deploy a new release of the app because some features are not compatible between versions, test, re-analyze the tables, reopen the database, and only then can users resume work.
Everything in SQL/RDBMS was designed for a single-node instance, not for replicas. It's not HA because there can be only one read-write instance at a time. They even claim to be more ACID than MongoDB, but the ACID properties are guaranteed only on a single node.
One exception is Oracle RAC, but PostgreSQL has nothing like that. Some forks, like YugabyteDB, provide real HA with most PostgreSQL features.
About the hype: many applications that run on PostgreSQL accept hours of downtime, planned or unplanned. Those who run larger, more critical applications on PostgreSQL are big companies with many expert DBAs who can handle the complexity of database automation, and they use logical replication for upgrades. But no solution offers both low operational complexity and high availability comparable to MongoDB's.
It's easy to throw names out like this (pgbackrest is also useful...) but getting them set up properly in a production environment is not at all straightforward, which I think is the point.
…in which case, you should probably use a hosted offering that takes care of those things for you. RDS Aurora (Serverless or not), Neon, and many other services offer those properties without any additional setup. They charge a premium for them, however.
It’s not like Mongo gives you those properties for free either. Replication/clustering related data loss is still incredibly common precisely because mongo makes it seem like all that stuff is handled automatically at setup when in reality it requires plenty of manual tuning or extra software in order to provide the guarantees everyone thinks it does.
> IMO Maria has fallen behind MySQL. I wouldn't chose it for anything my income depends on.
Can you give any details on that?
I switched to MariaDB back in the day for my personal projects because (so far as I could tell) it was being updated more regularly, and it was more fully open source. (I don't recall offhand at this point whether MySQL switched to a fully paid model, or just less-open.)
I use Patroni for that in a k8s environment (although it works anywhere). I get an off-the-shelf declarative deployment of an HA postgres cluster with automatic failover with a little boiler-plate YAML.
Patroni has been around for a while. The database-as-a-service team where I work uses it under the hood. I used it to build database-as-a-service functionality on the infra platform team I was at prior to that.
It's basically push-button production PG.
There's at least one decent operator framework leveraging it, if that's your jam. I've been living and dying by self-hosting everything with k8s operators for about 6-7 years now.
And if you want a supabase-like functionality, I'm a huge fan of PostgREST (which is actually how supabase works/worked under the hood). Make a view for your application and boom, you have a GET only REST API. Add a plpgsql function, and now you can POST. It uses JWT for auth, but usually I have application on the same VLAN as DB so it's not as rife for abuse.
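To illustrate what that buys you on the client side, here's a sketch of calling such a PostgREST instance; the base URL, view name (active_orders), function name (create_order), and JWT are all hypothetical:

    # PostgREST exposes views at /<view> and functions at /rpc/<function>.
    import requests

    BASE = "http://localhost:3000"                              # PostgREST's default port
    HEADERS = {"Authorization": "Bearer <jwt-from-your-auth-layer>"}

    # Read-only view: GET /active_orders?status=eq.open (PostgREST filter syntax)
    rows = requests.get(f"{BASE}/active_orders",
                        params={"status": "eq.open", "limit": 20},
                        headers=HEADERS).json()
    print(len(rows), "open orders")

    # plpgsql function exposed for writes: POST /rpc/create_order
    resp = requests.post(f"{BASE}/rpc/create_order",
                         json={"customer_id": 42, "total": 99.5},
                         headers=HEADERS)
    resp.raise_for_status()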
I hosted PostgreSQL professionally for over a decade.
Overall, a good experience. Very stable service and when performance issues did periodically arise, I like that we had full access to all details to understand the root cause and tune details.
Nobody was employed as a full-time DBA. We had plenty of other things going on in addition to running PostgreSQL.
I have run (read: helped with infrastructure for) a small production service using PSQL for 6 years, with up to hundreds of users per day. PSQL has been the problem exactly once, and it was because we ran out of disk space. Proper monitoring (duh) and a little VACUUM would have solved it.
Later I ran a v2 of that service on k8s. The architecture also changed a lot, hosting many smaller servers sharing the same PSQL server (not really microservice-related; think more "collective of smaller services run by different people"). I have hit some issues relating to maxing out max_connections, but that's about it.
This is something I do in my free time, so SLAs aren't an issue, meaning I've had the ability to learn the ropes of running PSQL without many bad consequences. I'm really happy I have had this opportunity.
My conclusion is that running PSQL is totally fine if you just set up proper monitoring. If you are an engineer that works with infrastructure, even just because nobody else can/wants to, hosting PSQL is probably fine for you. Just RTFM.
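"Proper monitoring" can start very small, along these lines; the DSN, mount point, and thresholds are placeholders, and you'd wire the output to cron plus whatever pager/Slack/email hook you already have:

    # Minimal monitoring sketch: disk headroom and connection headroom.
    import shutil
    import psycopg2

    DSN = "postgresql://monitor@localhost/appdb"   # hypothetical read-only role
    DATA_MOUNT = "/var/lib/postgresql"

    def check() -> list[str]:
        problems = []
        usage = shutil.disk_usage(DATA_MOUNT)
        if usage.free / usage.total < 0.15:
            problems.append(f"low disk: {usage.free / 2**30:.1f} GiB free")
        with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM pg_stat_activity")
            active = cur.fetchone()[0]
            cur.execute("SHOW max_connections")
            limit = int(cur.fetchone()[0])
            if active > 0.8 * limit:
                problems.append(f"connections at {active}/{limit}")
        return problems

    if __name__ == "__main__":
        for p in check():
            print("ALERT:", p)   # replace with your alerting hook of choice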
- Standard Postgres compiled with some AWS-specific monitoring hooks
- A custom backup system using EBS snapshots
- Automated configuration management via Chef/Puppet/Ansible
- Load balancers and connection pooling (PgBouncer)
- Monitoring integration with CloudWatch
- Automated failover scripting
I didn't know RDS had PgBouncer under the hood, is this really accurate?
The problem I find with RDS (and most other managed Postgres) is that they limit your options for how you design your database architecture. For instance, if write consistency is important and you want to support synchronous replication, there is no way to do this in RDS without either Aurora or having the readers in another AZ. The other issue is that you only have access to logical replication, because you don't have access to your WAL archive, which makes moving off RDS much more difficult.
What do you postgres self hosters use for performance analysis? Both GCP-SQL and RDS have their performance analysis pieces of the hosted DB and it's incredible. Probably my favorite reason for using them.
Just don't try to build it from source haha. Compiling Postgres 18 with the PostGIS extension has been such a PITA because the topology component won't configure to not use the system /usr/bin/postgres and has given me a lot of grief. Finally got it fixed I think though.
Self-hosting is one of those things that makes sense when you can control all of the variables. For example, can you stop the developers from using obscure features of the db, that suddenly become deprecated, causing you to need to do a manual rolling back while they fix the code? A one-button UI to do that might be very handy. Can you stop your IT department from breaking the VPN, preventing you from logging into the db box at exactly the wrong time? Having it all in a UI that routes around IT's fat fingers might be helpful.
I wish this article had gone more in-depth on how they're setting up backups. The great thing about SQLite is that Litestream makes backup and restore something you don't really have to think about.
> If your database goes down at 3 AM, you need to fix it.
Of all the places I've worked that had the attitude "If this goes down at 3AM, we need to fix it immediately", there was only one where that was actually justifiable from a business perspective. I've worked at plenty of places that had this attitude despite the fact that overnight traffic was minimal and nothing bad actually happened if a few clients had to wait until business hours for a fix.
I wonder if some of the preference for big-name cloud infrastructure comes from the fact that during an outage, employees can just say "AWS (or whatever) is having an outage, there's nothing we can do" vs. being expected to actually fix it
From this perspective, the ability to fix problems more quickly when self hosting could be considered an antifeature from the perspective of the employee getting woken up at 3am
No. You sit on the call and wait to restore your service to your users. There’s bullshit toil in disabling scale-in as the outage gets longer.
Eventually, AWS has a VP of something dial in to your call to apologize. They’re unprepared and offer no new information. They get handed to a side call for executive bullshit.
AWS comes back. Your support rep only vaguely knows what’s going on. Your system serves some errors but digs out.
Really? That might be an anecdote sampled from unusually small businesses, then. Between myself and most peers I’ve ever talked to about availability, I heard an overwhelming majority of folks describe systems that really did need to be up 24/7 with high availability, and thus needed fast 24/7 incident response.
That includes big and small businesses, SaaS and non-SaaS, high scale (5M+rps) to tiny scale (100s-10krps), and all sorts of different markets and user bases. Even at the companies that were not staffed or providing a user service over night, overnight outages were immediately noticed because on average, more than one external integration/backfill/migration job was running at any time. Sure, “overnight on call” at small places like that was more “reports are hardcoded to email Bob if they hit an exception, and integration customers either know Bob’s phone number or how to ask their operations contact to call Bob”, but those are still environments where off-hours uptime and fast resolution of incidents was expected.
Between me, my colleagues, and friends/peers whose stories I know, that’s an N of high dozens to low hundreds.
IME the need for 24x7 for B2B apps is largely driven by global customer scope. If you have customers in North America and Asia, now you need 24x7 (and x365 because of little holiday overlap).
That being said, there are a number of B2B apps/industries where global scope is not a thing. For example, many providers who operate in the $4.9 trillion US healthcare market do not have any international users. Similarly the $1.5 trillion (revenue) US real estate market. There are states where one could operate where healthcare spending is over $100B annually. Banks. Securities markets. Lots of things do not have 24x7 business requirements.
I’ve worked for banks, multiple large and small US healthcare-related companies, and businesses that didn’t use their software when they were closed for the night.
All of those places needed their backend systems to be up 24/7. The banks ran reports and cleared funds with nightly batches—hundreds of jobs a night for even small banking networks. The healthcare companies needed to receive claims and process patient updates (e.g. your provider’s EMR is updated if you die or have an emergency visit with another provider you authorized for records sharing—and no, this is not handled by SaaS EMRs in many cases) over night so that their systems were up to date when they next opened for business. The “regular” businesses closed for the night generated reports and frequently had IT staff doing migrations, or senior staff working on something at midnight due the next day (when the head of marketing is burning the midnight oil on that presentation, you don’t want to be the person explaining that she can’t do it because the file server hosting the assets is down all the time after hours).
And again, that’s the norm I’ve heard described from nearly everyone in software/IT that I know: most businesses expect (and are willing to pay for or at least insist on) 24/7 uptime for their computer systems. That seems true across the board: for big/small/open/closed-off-hours/international/single-timezone businesses alike.
Sometimes it is nice to simplify the conversation with non-tech management. Oh, you want HA / DR / etc? We click a button and you get it (multi-AZ). Clicking the button doubles your DB costs from x to y. Please choose.
Then you have one less repeating conversation and someone to blame.
I recently was also doing some research into what projects exist that come close to a “managed Postgres on Digital Ocean” experience, sadly there’s some building blocks but nothing that really makes it a complete no-brainer.
I didn't even know there were companies that would host Postgres for you. I self-host it for my personal projects with 0 users and it works just fine, so I don't know why anyone would do it any differently.
I can't tell if this is satire or not with the first sentence and the "0 users" parts of your comment, but I know several solo devs with millions of users who self host their database and apps as well.
For a fascinating counterpoint (gist: cloud hosted Postgres on RDS aurora is not anything like the system you would host yourself, and other cloud deployments of databases should also not be done like our field is used to doing it when self-hosting) see this other front page article and discussion: https://news.ycombinator.com/item?id=46334990
Aurora is a closed-source fork of PostgreSQL. So it is indeed not possible to self-host it.
However, a self-hosted PostgreSQL on a bare-metal server with NVMe SSDs will be much faster than what RDS is capable of, especially at the same price points.
Yep! I was mostly replying to TFA’s claim that AWS RDS is
> Standard Postgres compiled with some AWS-specific monitoring hooks
… and other operational tools deployed alongside it. That’s not always true: RDS classic may be those things, but RDS Aurora/Serverless is anything but.
As to whether
> self-hosted PostgreSQL on a bare metal server with NVMe SSDs will be much faster than what RDS is capable of
That’s often but not always true. Plenty of workloads will perform better on RDS (read auto scaling is huge in Serverless: you can have new read replica nodes auto-launch in response to e.g. a wave of concurrent, massive reporting queries; many queries can benefit from RDS’s additions to/modifications of the pg buffer cache system that work with the underlying storage)—and that’s even with the VM tax and the networked-storage tax! Of course, it’ll cost more in real money whether or not it performs better, further complicating the cost/benefit analysis here.
Also, pedantically, you can run RDS on bare metal with local NVMEs.
> Also, pedantically, you can run RDS on bare metal with local NVMEs.
Only if you like your data to evaporate when the server stops.
I'm relatively sure that the processing power and memory you can buy on OVH / Hetzner / co. is larger and cheaper even if you take into account peaks in your usage patterns.
> Only if you like your data to evaporate when the server stops.
(Edited to remove glib and vague rejoinder, sorry) Then hibernate/reboot it instead of stopping it? Alternatively, that’s what backup-to S3, periodic snapshot-to-EBS, clustering, or running an EBS-persisted zero-query-volume tiny replica are for.
> the processing power and memory you can buy on OVH / Hetzner / co. is larger and cheaper
Cheaper? Yeah, generally. But larger/more performant? Not always—it’s not about peaks/autoscaling, it’s about the (large) minority of workloads that will work better on RDS/Aurora/Serverless: auto-scale-out makes the reports run on time regardless of cost; bulk data loads are available on replicas a lot sooner on Aurora because the storage is the replication system, not the WAL; and so on—if you add up all the situations where the hosted RDBMS systems trump self hosted, you get an amount that’s not “hosted is always better/worth it”, but it’s not “hosted is just ops time savings and is otherwise just slower/more expensive” either. And that’s before you add reliability into the conversation.
Better yet, self host Postgres on your own open source PaaS with Coolify, Dokploy, or Canine, and then you can also self host all your apps on your VPS too. I use Dokploy but I'm looking into Canine, and I know many have used Coolify with great success.
> When self-hosting makes sense: 1. If you're just starting out in software & want to get something working quickly [...]
This is when you use SQLite, not Postgres. Easy enough to turn into Postgres later, nothing to set up. It already works. And backups are literally just "it's a file, incremental backup by your daily backups already covers this".
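One small refinement to the "it's a file" point: copying the file while writers are active can catch it mid-transaction, and the stdlib's online-backup API avoids that. A sketch, with placeholder paths:

    # Consistent SQLite backup using sqlite3.Connection.backup (Python 3.7+),
    # which snapshots safely even with concurrent writers.
    import os, sqlite3, time

    def backup(db_path="app.db", backup_dir="backups"):
        os.makedirs(backup_dir, exist_ok=True)
        dest_path = f"{backup_dir}/app-{time.strftime('%Y%m%d')}.db"
        src = sqlite3.connect(db_path)
        dst = sqlite3.connect(dest_path)
        with dst:
            src.backup(dst)        # consistent snapshot of the whole database
        dst.close()
        src.close()
        return dest_path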
Huh? Maybe I missed something, but...why should self-hosting a database server be hard or scary? Sure, you are then responsible for security backups, etc...but that's not really different in the cloud - if anything, the cloud makes it more complicated.
Well for the clickops folks who've built careers on the idea that 'systems administration is dead'... I imagine having to open a shell and install some stuff or modify a configuration file is quite scary.
I'd say a managed DB, at minimum, should be handling upgrades and backups for you. If it doesn't, that's not a managed DB, that's a self-service DB. You're paying a premium to do the work yourself.
I wish this post went into the actual how! He glossed over the details. There is a link to his repo, which is a start I suppose: https://github.com/piercefreeman/autopg
A blog post that went into the details would be awesome. I know Postgres has some docs for this (https://www.postgresql.org/docs/current/backup.html), but it's too theoretical. I want to see a one-stop-shop with everything you'd reasonably need to know to self host: like monitoring uptime, backups, stuff like that.
And then there is the urge to Postgres everything.
I was disappointed alloy doesn't support timescaledb as a metrics endpoint. Considering switching to telegraf just because I can store the metrics on Postgres.
It's pretty easy these days to spin up a local Postgres container. Might as well use it for prototyping too, and save yourself the hassle of switching later.
It might seem minor, but the little things add up. Making your dev environment mirror prod from the start will save you a bunch of headaches. Then, when you're ready to deploy, there is nothing to change.
Even better, stage to a production-like environment early, and then deploy day can be as simple as a DNS record change.
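Part of what makes "nothing to change" real is driving the connection entirely from the environment, so dev, staging, and prod differ only by a DSN. A tiny sketch; the DATABASE_URL name is just a common convention, not something Postgres requires:

    # Dev and prod differ only by environment, not code.
    import os
    import psycopg2

    def connect():
        dsn = os.environ.get(
            "DATABASE_URL",
            "postgresql://postgres:devpassword@localhost:5432/appdb",  # local container default
        )
        return psycopg2.connect(dsn)

Staging and prod then just set DATABASE_URL to their own servers; the application code never changes.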
Have you given thought to why you prototype with SQLite?
I have switched to using Postgres even for prototyping once I prepared some shell scripts for the various setup tasks. With Hibernate (Java) or Knex (JavaScript/NodeJS) and with unit tests (a test-driven development approach) for code, I feel I have reduced the friction of using Postgres from the beginning.
Because when I get tired of reconstructing the contents of the database between my various dev machines (at home, at work, on a remote server, on my laptop) I can just scp the sqlite db across.
Because it's "low effort" to just fire it into sqlite and if I have to do ridiculous things to the schema as I footer around working out exactly what I want the database to do.
I don't want to use nodejs if I can possibly avoid it and you literally could not pay me to even look at Java, there isn't enough money in the world.
I've operated both self-hosted and managed database clusters with complex topologies and mission-critical data at well-known tech companies.
Managed database services mostly automate a subset of routine operational work, things like backups, some configuration management, and software upgrades. But they don't remove the need for real database operations.
You still have to validate restores, build and rehearse a disaster recovery plan, design and review schemas, review and optimize queries, tune indexes, and fine-tune configuration, among other essentials.
In one incident, AWS support couldn't determine what was wrong with an RDS cluster and advised us to "try restarting it".
Bottom line: even with managed databases, you still need people on the team who are strong in DBOps. You need standard operating procedures and automation, built by your team. Without that expertise, you're taking on serious risk, including potentially catastrophic failure modes.
I've had an RDS instance run out of disk space and then get stuck in "modifying" for 24 hours (until an AWS operator manually SSH'd in I guess). We had to restore from the latest snapshot and manually rebuild the missing data from logs/other artifacts in the meantime to restore service.
I would've very much preferred being able to SSH in myself and fix it on the spot. Ironically the only reason it ran out of space in the first place is that the AWS markup on that is so huge we were operating with little margin for error; none of that would happen with a bare-metal host where I can rent 1TB of NVME for a mere 20 bucks a month.
As far as I know we never got any kind of compensation for this, so RDS ended up being a net negative for this company, tens of thousands spent over a few years for laptop-grade performance and it couldn't even do its promised job the only time it was needed.
Recommends hosting postgres yourself. Doesn't recommend a distribution stack. If you try this at a startup to save $50 a month, you will never recoup the time you wasted setting it up. We pay dedicated managed services for these things so we can make products on top of them.
The one problem with using your distro's Postgres is that your upgrade routine will be dictated by a 3rd party.
And Postgres upgrades are not transparent. So you'll have a 1- to 2-hour task every 6 to 18 months, with only a small amount of control over when it happens. This is OK for a lot of people, and completely unthinkable for some others.
"just use postgres from your distro" is *wildly* underselling the amount of work that it takes to go from apt install postgres to having a production ready setup (backups, replica, pooling, etc). Granted, if it's a tiny database just pg-dumping might be enough, but for many that isn't going to be enough.
I don't think any of these would take more than a week to set up. Assuming you create a nice runbook with every step, it would not be horrible to maintain either. Barman for backups, and unless you need multi-master you can use the built-in publication and subscription. Things can get complicated really fast with scale, but most of the time you won't have that much traffic to need something complicated.
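For reference, the built-in publication/subscription setup is a couple of statements. A sketch driven from Python; the host names, roles, and password are placeholders, the primary needs wal_level=logical, and the replica must already have the schema (logical replication does not copy DDL):

    import psycopg2

    # On the primary: publish all tables.
    primary = psycopg2.connect("host=db-primary dbname=appdb user=postgres")
    primary.autocommit = True
    with primary.cursor() as cur:
        cur.execute("CREATE PUBLICATION app_pub FOR ALL TABLES")

    # On the replica: subscribe to the primary's publication.
    replica = psycopg2.connect("host=db-replica dbname=appdb user=postgres")
    replica.autocommit = True   # CREATE SUBSCRIPTION cannot run inside a transaction block
    with replica.cursor() as cur:
        cur.execute("""
            CREATE SUBSCRIPTION app_sub
            CONNECTION 'host=db-primary dbname=appdb user=replicator password=secret'
            PUBLICATION app_pub
        """)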
Obviously it depends on the operational overhead of the specific technology.
Local reproducibility is easier, and performance is often much better
> The "cloud" reducing staff costs
Both can be true at the same time.
Also:
> Otherwise you're waking up at 3am no matter what.
Do you account for frequency and variety of wakeups here?
Yes. In my career I've dealt with way more failures due to unnecessary distributed systems (that could have been one big bare-metal box) rather than hardware failures.
You can never eliminate wake-ups, but I find bare-metal systems to have much less moving parts means you eliminate a whole bunch of failure scenarios so you're only left with actual hardware failure (and HW is pretty reliable nowadays).
There was, I have to admit, a log message that explained the problem... once I could find the specific log message and understand the 45 steps in the chain that got to that spot.
Whether or not you need that equivalence is an orthogonal question.
There's probably a sweet spot where that is true, but cloud providers offer ever more complexity (self-inflicted problems) and use PR to encourage you to adopt it ("best practices" and so on). In a decade of experience, in every cloud-hosted shop I've been in I've seen multiple full-time infra people busy with... something?
There was always something to do, whether keeping up with cloud provider changes/deprecations, implementing the latest "best practice", or debugging distributed-systems failures and other self-inflicted problems. I'm sure career/resume-polishing incentives are at play here too - the employee wants the system to require their input, otherwise their job is no longer needed.
Maybe in a perfect world you can indeed use cloud-hosted services to reduce/eliminate dedicated staff, but in practice I've never seen anything but solo founders actually achieve that.
It's complexity, but it's also providing features. If you didn't use those cloud features, you'd be writing or gluing together and maintaining your own software to accomplish the same tasks, which takes even more staff.
> Maybe in a perfect world you can indeed use cloud-hosted services to reduce/eliminate dedicated staff
So let's put it another way: either you're massively reducing/eliminating staff to achieve the same level of functionality, or you're keeping the equivalent staff but massively increasing functionality.
The point is, clouds let you deliver a lot more with a lot less people, no matter which way you cut it. The people spending money on them aren't mostly dumb.
I love self-hosting stuff and even have a bias towards it, but the cost/time tradeoff is more complex than most people think.
Can we honestly say that cloud services taking a half hour to two hours a month of someone's time on average is completely unheard of?
Most projects I have worked on in my career have never seen more than a hundred concurrent users. If something goes down on Saturday, I am going to fix it on Monday.
I have worked on internal tools where I just added a Postgres DB to the Docker setup and that was it. Five minutes of work and no issues at all. Sure, if you have something customer-facing, you need to do a bit more and set up a good backup strategy, but that really isn't magic.
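For context, the "backup strategy" I mean really is just a small script on a timer. A rough sketch in Python (database name, paths and retention are all made up, and it assumes pg_dump is on the PATH):

```python
#!/usr/bin/env python3
"""Rough nightly backup sketch: pg_dump to a dated file, keep the last 14."""
import datetime
import pathlib
import subprocess

BACKUP_DIR = pathlib.Path("/var/backups/postgres")  # made-up location
DB_NAME = "appdb"                                    # made-up database name
KEEP = 14                                            # number of dumps to keep

def main() -> None:
    BACKUP_DIR.mkdir(parents=True, exist_ok=True)
    stamp = datetime.date.today().isoformat()
    target = BACKUP_DIR / f"{DB_NAME}-{stamp}.dump"

    # Custom-format dump so pg_restore can do selective/parallel restores later.
    subprocess.run(
        ["pg_dump", "--format=custom", f"--file={target}", DB_NAME],
        check=True,
    )

    # Drop anything beyond the retention window.
    dumps = sorted(BACKUP_DIR.glob(f"{DB_NAME}-*.dump"))
    for old in dumps[:-KEEP]:
        old.unlink()

if __name__ == "__main__":
    main()
```

Ship the resulting files somewhere off the box (object storage, another server) and you've covered the common case; anything fancier than that is when the "magic" starts.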
Except now they are stuck trying to maintain and debug Postgres without the same visibility and agency they would have if they hosted it themselves. The situation isn't at all clear.
This leads the developers to do all kinds of workarounds and reach for more cloud services (and then integrating them and - often poorly - ensuring consistency across them) because the cloud hosted DB is not able to handle the load.
On bare-metal, you can go a very long way with just throwing everything at Postgres and calling it a day.
I use Google Cloud SQL for PostgreSQL and it's been rock solid. No issues; troubleshooting works fine; all extensions we need already installed; can adjust settings where needed.
In the limit I don't think we should need DBAs, but as long as we need to manage indices by hand, think more than 10 seconds about the hot queries, manage replication, tune the vacuumer, track updates, and all the other rot - then actually installing PG on a node of your choice is really the smallest of the problems you face.
- Backups: the provider will push a full generic disaster-recovery backup of my database to an off-provider location at least daily, without the need for a maintenance window
- Optimization: index maintenance and storage optimization are performed automatically and transparently
- Multi-datacenter failover: my database will remain available even if part(s) of my provider are down, with a minimal data-loss window (like, 30 seconds, 5 minutes, 15 minutes, depending on SLA and thus plan expenditure)
- Point-in-time backups: performed at an SLA-defined granularity and with a similar retention window, allowing me to access snapshots via a custom DSN, not affecting production access or performance in any way
- Slow-query analysis: notifying me of relevant performance bottlenecks before they bring down production
- Storage analysis: my plan allows for #GB of fast storage, #TB of slow storage; let me know when I'm forecast to run out of either in the next 3 billing cycles or so
Because, well, if anyone provides all of that for a monthly fee, the whole "self-hosting" argument goes out of the window quickly, right? And I say that as someone who absolutely adores self-hosting...
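For what it's worth, the slow-query item on that list is the one I do approximate myself today. A hedged sketch, assuming the pg_stat_statements extension is installed (PostgreSQL 13+ column names) and using psycopg2; the DSN and threshold are made up:

```python
"""Rough slow-query report from pg_stat_statements."""
import psycopg2

DSN = "dbname=appdb user=monitor"   # made-up connection string
THRESHOLD_MS = 250                  # arbitrary "slow" cutoff

QUERY = """
    SELECT query, calls, mean_exec_time
    FROM pg_stat_statements
    WHERE mean_exec_time > %s
    ORDER BY mean_exec_time DESC
    LIMIT 10
"""

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute(QUERY, (THRESHOLD_MS,))
    for query, calls, mean_ms in cur.fetchall():
        print(f"{mean_ms:8.1f} ms avg over {calls:6d} calls: {query[:80]}")
```

That covers "tell me about bottlenecks"; the failover and off-provider backup items are where a provider genuinely earns its fee.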
Corollary: rental/SaaS models provide that property in large part because their providers have lots of slack.
Yes, I'd say backups and analysis are table stakes for hiring it, and multi-datacenter failover is a relevant nice-to-have. But the reason to do it yourself is that it's literally impossible to get anything as good as what you can build yourself on somebody else's computer.
I would expect to pay a little bit more as the cost of the convenience, but in my experience it's generally multiple times the expense. It's wild.
This has kept me away from managed databases in all but my largest projects.
If anything that’s a feature for ease of use and compatibility.
I know there are other issues with Kubernetes but at least its transferable knowledge.
And I really recommend starting with *default* k3s; do not look at any alternatives for CNI, CSI, or networked storage. Treat your first cluster as something that can spontaneously fail, don't bother keeping it clean, and learn as much as you can.
Once you have that, you can use great open-source k8s-native controllers which take care of the vast majority of requirements when it comes to self-hosting, and which save more time in the long run than it took to set up and learn these things.
Honorable mentions: k9s, Lens (I do not suggest using it in the long term, but the UI is really good as a starting point), Rancher web UI.
PostgreSQL specifically: https://github.com/cloudnative-pg/cloudnative-pg If you really want networked storage: https://github.com/longhorn/longhorn
I do not recommend Ceph unless you are okay with not using shared filesystems (they have a bunch of gotchas), or unless you want S3 without having to install a dedicated deployment for it.
As someone who operated Postgres clusters for over a decade before k8s was even a thing, I fully recommend just using a Postgres operator like this one and moving on. The out-of-the-box config is sane, it’s easy to override things, and failover/etc has been working flawlessly for years. It’s just the right line between total DIY and the simplicity of having a hosted solution. Postgres is solved, next problem.
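To give a flavor of what "just using the operator" looks like: the Cluster resource is tiny. A hedged sketch below, applied through the official Kubernetes Python client; the name, namespace and storage size are made up, and the CRD fields should be double-checked against the cloudnative-pg docs for the version you install:

```python
"""Minimal CloudNativePG Cluster sketch, applied via the Kubernetes Python client."""
from kubernetes import client, config

# Loosely based on the minimal example in the cloudnative-pg docs.
cluster = {
    "apiVersion": "postgresql.cnpg.io/v1",
    "kind": "Cluster",
    "metadata": {"name": "pg-main"},           # made-up name
    "spec": {
        "instances": 3,                        # one primary, two replicas
        "storage": {"size": "20Gi"},           # made-up size
    },
}

config.load_kube_config()
client.CustomObjectsApi().create_namespaced_custom_object(
    group="postgresql.cnpg.io",
    version="v1",
    namespace="databases",                     # made-up namespace
    plural="clusters",
    body=cluster,
)
```

The operator then handles bootstrap, replication and failover for you; everything beyond this (scheduled backups, pooling) is extra fields on the same resource.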
In case you want to self-host but also want something that takes care of all that extra work for you.
I did this for just under two years, and I've lost count of how many times one or more of the nodes went down and I had to manually deregister it from the cluster with repmgr, clone a new vm and promote a healthy node to primary. I ended up writing an internal wiki page with the steps. I never got it: if one of the purposes of clusters is having higher availability, why did repmgr not handle zombie primaries?
Again, I'm probably just an idiot out of my depth with this. And I probably didn't need a cluster anyway, although with the nodes failing like they did, I didn't feel comfortable moving to a single-node setup either.
I eventually switched to managed postgres, and it's amazing being able to file a sev1 for someone else to handle when things go down, instead of the responsibility being on me.
>> "God Send". Everything just worked. Replication was as reliable as one could imagine. It outlives several hardware incidents without manual intervention. It allowed cluster maintenance (software and hardware upgrades) without application downtime. I really dream PostgreSQL will be as reliable as MongoDB without need of external services.
https://www.postgresql.org/message-id/0e01fb4d-f8ea-4ca9-8c9...
CloudNativePG (https://cloudnative-pg.io) is a great option if you’re using Kubernetes.
There’s also pg_auto_failover which is a Postgres extension and a bit less complex than the alternatives, but it has its drawbacks.
Yet they still call it HA because there's nothing else. Even a planned shutdown of the primary to patch the OS results in downtime, as all connections are terminated. The situation is even worse for major database upgrades: stop the application, upgrade the database, deploy a new release of the app because some features are not compatible between versions, test, re-analyze the tables, reopen the database, and only then can users resume work.
Everything in SQL/RDBMS was designed for a single-node instance, not including replicas. It's not HA because there can be only one read-write instance at a time. They even claim to be more ACID than MongoDB, but the ACID properties are guaranteed only on a single node.
One exception is Oracle RAC, but PostgreSQL has nothing like that. Some forks, like YugabyteDB, provide real HA with most PostgreSQL features.
About the hype: many applications that run on PostgreSQL accept hours of downtime, planned or unplanned. Those who run larger, more critical applications on PostgreSQL are big companies with many expert DBAs who can handle the complexity of database automation, and who use logical replication for upgrades. But no solution offers both low operational complexity and high availability comparable to MongoDB's.
Until then it is nice to have options, even if they do require extra steps.
It’s not like Mongo gives you those properties for free either. Replication/clustering related data loss is still incredibly common precisely because mongo makes it seem like all that stuff is handled automatically at setup when in reality it requires plenty of manual tuning or extra software in order to provide the guarantees everyone thinks it does.
(I do use Maria at home for legacy reasons, and have used MySQL and Pg professionally for years.)
Can you give any details on that?
I switched to MariaDB back in the day for my personal projects because (so far as I could tell) it was being updated more regularly, and it was more fully open source. (I don't recall offhand at this point whether MySQL switched to a fully paid model, or just less-open.)
Patroni has been around for a while. The database-as-a-service team where I work uses it under the hood. I used it to build database-as-a-service functionality on the infra platform team I was at prior to that.
It's basically push-button production PG.
There's at least one decent operator framework leveraging it, if that's your jam. I've been living and dying by self-hosting everything with k8s operators for about 6-7 years now.
Overall, a good experience. Very stable service and when performance issues did periodically arise, I like that we had full access to all details to understand the root cause and tune details.
Nobody was employed as a full-time DBA. We had plenty of other things going on in addition to running PostgreSQL.
Later I ran a v2 of that service on k8s. The architecture also changed a lot, hosting many smaller services sharing the same psql server (not really microservice-related; think more "collective of smaller services run by different people"). I have hit some issues relating to maxing out max_connections, but that's about it.
This is something I do in my free time, so SLA isn't an issue, meaning I've had the ability to learn the ropes of running PSQL without many bad consequences. I'm really happy I have had this opportunity.
My conclusion is that running PSQL is totally fine if you just set up proper monitoring. If you are an engineer who works with infrastructure, even just because nobody else can/wants to, hosting PSQL is probably fine for you. Just RTFM.
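"Proper monitoring" doesn't have to mean a whole observability stack, either; even a dumb script on a timer catches the big stuff. A rough sketch of the two checks that have bitten me (connection headroom and replication lag), using psycopg2; the DSN and thresholds are made up:

```python
"""Rough health check: connection headroom and replication lag (run against the primary)."""
import psycopg2

DSN = "dbname=appdb user=monitor"          # made-up connection string
CONN_HEADROOM = 0.8                        # warn above 80% of max_connections
MAX_LAG_BYTES = 64 * 1024 * 1024           # warn above 64 MiB of replay lag

with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM pg_stat_activity")
    used = cur.fetchone()[0]
    cur.execute("SHOW max_connections")
    limit = int(cur.fetchone()[0])
    if used > CONN_HEADROOM * limit:
        print(f"WARN: {used}/{limit} connections in use")

    # Only meaningful on a primary with streaming replicas attached.
    cur.execute("""
        SELECT application_name,
               pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
        FROM pg_stat_replication
    """)
    for name, lag in cur.fetchall():
        if lag and lag > MAX_LAG_BYTES:
            print(f"WARN: replica {name} is {lag} bytes behind")
```

Wire the output into whatever already pages you (email, Slack webhook, cron MAILTO) and you've covered the failure modes that actually show up.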
The problem I find with RDS (and most other managed Postgres) is that they limit your options for how you want to design your database architecture. For instance, if write consistency is important to you and you want synchronous replication, there is no way to do this in RDS without either Aurora or having the readers in another AZ. The other issue is that you only have access to logical replication, because you don't have access to your WAL archive, which makes moving off RDS much more difficult.
Is this actually the "common" view (in this context)?
I've got decades with databases so I cannot even begin to fathom where such an attitude would develop, but, is it?
Boggling.
Of all the places I've worked that had the attitude "If this goes down at 3AM, we need to fix it immediately", there was only one where that was actually justifiable from a business perspective. I've worked at plenty of places that had this attitude despite the fact that overnight traffic was minimal and nothing bad actually happened if a few clients had to wait until business hours for a fix.
I wonder if some of the preference for big-name cloud infrastructure comes from the fact that during an outage, employees can just say "AWS (or whatever) is having an outage, there's nothing we can do" vs. being expected to actually fix it
From this perspective, the ability to fix problems more quickly when self-hosting could even be considered an antifeature by the employee getting woken up at 3am
You wake up. It's not your fault. You're helpless to solve it.
Eventually, AWS has a VP of something dial in to your call to apologize. They’re unprepared and offer no new information. They get handed off to a side call for executive bullshit.
AWS comes back. Your support rep only vaguely knows what’s going on. Your system serves some errors but digs out.
Then you go to sleep.
That includes big and small businesses, SaaS and non-SaaS, high scale (5M+ rps) to tiny scale (100s-10k rps), and all sorts of different markets and user bases. Even at the companies that were not staffed or providing a user service overnight, overnight outages were immediately noticed because on average, more than one external integration/backfill/migration job was running at any time. Sure, “overnight on call” at small places like that was more “reports are hardcoded to email Bob if they hit an exception, and integration customers either know Bob’s phone number or how to ask their operations contact to call Bob”, but those are still environments where off-hours uptime and fast resolution of incidents were expected.
Between me, my colleagues, and friends/peers whose stories I know, that’s an N of high dozens to low hundreds.
What am I missing?
IME the need for 24x7 for B2B apps is largely driven by global customer scope. If you have customers in North America and Asia, now you need 24x7 (and x365 because of little holiday overlap).
That being said, there are a number of B2B apps/industries where global scope is not a thing. For example, many providers who operate in the $4.9 trillion US healthcare market do not have any international users. Similarly the $1.5 trillion (revenue) US real estate market. There are states where one could operate where healthcare spending is over $100B annually. Banks. Securities markets. Lots of things do not have 24x7 business requirements.
All of those places needed their backend systems to be up 24/7. The banks ran reports and cleared funds with nightly batches—hundreds of jobs a night for even small banking networks. The healthcare companies needed to receive claims and process patient updates (e.g. your provider’s EMR is updated if you die or have an emergency visit with another provider you authorized for records sharing—and no, this is not handled by SaaS EMRs in many cases) over night so that their systems were up to date when they next opened for business. The “regular” businesses closed for the night generated reports and frequently had IT staff doing migrations, or senior staff working on something at midnight due the next day (when the head of marketing is burning the midnight oil on that presentation, you don’t want to be the person explaining that she can’t do it because the file server hosting the assets is down all the time after hours).
And again, that’s the norm I’ve heard described from nearly everyone in software/IT that I know: most businesses expect (and are willing to pay for or at least insist on) 24/7 uptime for their computer systems. That seems true across the board: for big/small/open/closed-off-hours/international/single-timezone businesses alike.
Sometimes it is nice to simplify the conversation with non-tech management. Oh, you want HA / DR / etc? We click a button and you get it (multi-AZ). Clicking the button doubles your DB costs from x to y. Please choose.
Then you have one less repeating conversation and someone to blame.
https://blog.notmyhostna.me/posts/what-i-wish-existed-for-se...
However, a self-hosted PostgreSQL on a bare-metal server with NVMe SSDs will be much faster than what RDS is capable of. Especially at the same price points.
> Standard Postgres compiled with some AWS-specific monitoring hooks
… and other operational tools deployed alongside it. That’s not always true: RDS classic may be those things, but RDS Aurora/Serverless is anything but.
As to whether
> self-hosted PostgreSQL on a bare metal server with NVMe SSDs will much faster than what RDS is capable of
That’s often but not always true. Plenty of workloads will perform better on RDS (read auto scaling is huge in Serverless: you can have new read replica nodes auto-launch in response to e.g. a wave of concurrent, massive reporting queries; many queries can benefit from RDS’s additions to/modifications of the pg buffer cache system that work with the underlying storage)—and that’s even with the VM tax and the networked-storage tax! Of course, it’ll cost more in real money whether or not it performs better, further complicating the cost/benefit analysis here.
Also, pedantically, you can run RDS on bare metal with local NVMEs.
Only if you like your data to evaporate when the server stops.
I'm relatively sure that the processing power and memory you can buy on OVH / Hetzner / co. is larger and cheaper even if you take into account peaks in your usage patterns.
(Edited to remove glib and vague rejoinder, sorry) Then hibernate/reboot it instead of stopping it? Alternatively, that’s what backup-to S3, periodic snapshot-to-EBS, clustering, or running an EBS-persisted zero-query-volume tiny replica are for.
> the processing power and memory you can buy on OVH / Hetzner / co. is larger and cheaper
Cheaper? Yeah, generally. But larger/more performant? Not always—it’s not about peaks/autoscaling, it’s about the (large) minority of workloads that will work better on RDS/Aurora/Serverless: auto-scale-out makes the reports run on time regardless of cost; bulk data loads are available on replicas a lot sooner on Aurora because the storage is the replication system, not the WAL; and so on—if you add up all the situations where the hosted RDBMS systems trump self hosted, you get an amount that’s not “hosted is always better/worth it”, but it’s not “hosted is just ops time savings and is otherwise just slower/more expensive” either. And that’s before you add reliability into the conversation.
This is when you use SQLite, not Postgres. Easy enough to turn into Postgres later, nothing to set up. It already works. And backups are literally just "it's a file, incremental backup by your daily backups already covers this".
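And even the "it's a file" backup gets safer if you let SQLite copy itself rather than copying a file that might be mid-write; the stdlib backup API handles that. A small sketch (paths are made up):

```python
"""Consistent SQLite snapshot using the stdlib backup API (safe while the DB is in use)."""
import sqlite3

SRC = "app.db"          # made-up live database path
DST = "app-backup.db"   # made-up backup target

src = sqlite3.connect(SRC)
dst = sqlite3.connect(DST)
with dst:
    src.backup(dst)     # copies pages consistently, even with writers active
src.close()
dst.close()
```

Your existing daily file backup then just picks up the snapshot file instead of the live database.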
Hiring and replacing engineers who can and want to manage database servers can be hard or scary for employers.
A blog post that went into the details would be awesome. I know Postgres has some docs for this (https://www.postgresql.org/docs/current/backup.html), but it's too theoretical. I want to see a one-stop-shop with everything you'd reasonably need to know to self host: like monitoring uptime, backups, stuff like that.
I was disappointed Alloy doesn't support TimescaleDB as a metrics endpoint. Considering switching to Telegraf just because I can store the metrics in Postgres.
SQLite when prototyping, Postgres for production.
If you need to power a lawnmower and all you have is a 500bhp Scania V8, you may as well just do it.
Even better, stage to a production-like environment early, and then deploy day can be as simple as a DNS record change.
I have switched to using Postgres even for prototyping, once I prepared some shell scripts for the various setup steps. With Hibernate (Java) or Knex (JavaScript/Node.js), and with unit tests (a test-driven development approach) for the code, I feel I have reduced the friction of using Postgres from the beginning.
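The main friction-killer on the test side, for me, was throwaway containers. A hedged sketch of that idea, assuming the testcontainers package (the same pattern exists for Java); shown in Python just to keep it short, and the image tag is arbitrary:

```python
"""Spin up a throwaway Postgres for a test run, then let it disappear."""
from testcontainers.postgres import PostgresContainer

# Starts a disposable container, waits for it to accept connections,
# and tears it down when the block exits.
with PostgresContainer("postgres:16") as pg:
    dsn = pg.get_connection_url()   # hand this to your ORM / driver of choice
    print("temporary database at:", dsn)
    # ... run migrations and tests against `dsn` here ...
```

With that, prototyping against real Postgres costs about as much setup as SQLite does.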
Because it's "low effort" to just fire it into sqlite and if I have to do ridiculous things to the schema as I footer around working out exactly what I want the database to do.
I don't want to use nodejs if I can possibly avoid it and you literally could not pay me to even look at Java, there isn't enough money in the world.
Managed database services mostly automate a subset of routine operational work, things like backups, some configuration management, and software upgrades. But they don't remove the need for real database operations. You still have to validate restores, build and rehearse a disaster recovery plan, design and review schemas, review and optimize queries, tune indexes, and fine-tune configuration, among other essentials.
In one incident, AWS support couldn't determine what was wrong with an RDS cluster and advised us to "try restarting it".
Bottom line: even with managed databases, you still need people on the team who are strong in DBOps. You need standard operating procedures and automation, built by your team. Without that expertise, you're taking on serious risk, including potentially catastrophic failure modes.
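On "validate restores" specifically: that's the piece most teams skip, and it's a few dozen lines of scheduled code. A rough sketch under made-up assumptions (scratch DB name, dump path, and the sanity query against a hypothetical orders table):

```python
"""Rough restore drill: load last night's dump into a scratch DB and sanity-check it."""
import subprocess
import psycopg2

DUMP_PATH = "/var/backups/postgres/appdb-latest.dump"   # made-up path
SCRATCH_DB = "restore_check"                             # throwaway database

# Recreate the scratch database each run.
subprocess.run(["dropdb", "--if-exists", SCRATCH_DB], check=True)
subprocess.run(["createdb", SCRATCH_DB], check=True)

# Restore the custom-format dump into it (ignore original ownership).
subprocess.run(
    ["pg_restore", "--no-owner", f"--dbname={SCRATCH_DB}", DUMP_PATH],
    check=True,
)

# Made-up sanity check: the restored data should not be empty.
with psycopg2.connect(dbname=SCRATCH_DB) as conn, conn.cursor() as cur:
    cur.execute("SELECT count(*), max(created_at) FROM orders")   # hypothetical table
    rows, newest = cur.fetchone()
    assert rows > 0, "restore produced an empty orders table"
    print(f"restore OK: {rows} rows, newest {newest}")
```

If this has never been run against your backups, you don't have backups; you have hopes. That's true whether the database is managed or self-hosted.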
I would've very much preferred being able to SSH in myself and fix it on the spot. Ironically the only reason it ran out of space in the first place is that the AWS markup on that is so huge we were operating with little margin for error; none of that would happen with a bare-metal host where I can rent 1TB of NVME for a mere 20 bucks a month.
As far as I know we never got any kind of compensation for this, so RDS ended up being a net negative for this company, tens of thousands spent over a few years for laptop-grade performance and it couldn't even do its promised job the only time it was needed.
And Postgres upgrades are not transparent. So you'll have a one- to two-hour task every 6 to 18 months, over whose timing you have only a small amount of control. This is OK for a lot of people, and completely unthinkable for some other people.