Looks interesting for something like local development. I don't intend to run production object storage myself, but some of the stuff in the guide to the production setup (https://garagehq.deuxfleurs.fr/documentation/cookbook/real-w...) would scare me a bit:
> For the metadata storage, Garage does not do checksumming and integrity verification on its own, so it is better to use a robust filesystem such as BTRFS or ZFS. Users have reported that when using the LMDB database engine (the default), database files have a tendency of becoming corrupted after an unclean shutdown (e.g. a power outage), so you should take regular snapshots to be able to recover from such a situation.
It seems like you can also use SQLite, but a default database that isn't robust against power failure or crashes seems suprising to me.
If you know of an embedded key-value store that supports transactions, is fast, has good Rust bindings, and does checksumming/integrity verification by default such that it almost never corrupts upon power loss (or at least, is always able to recover to a valid state), please tell me, and we will integrate it into Garage immediately.
We were not able to get good enough performance compared to LMDB. We will work on this more though, there are probably many ways performance can be increased by reducing load on the KV store.
I've been using minio for local dev but that version is unmaintained now. However, I was put off by the minimum requirements for garage listed on the page -- does it really need a gig of RAM?
It does not, at least not for a small local dev server. I believe RAM usage should be around 50-100MB, increasing if you have many requests with large objects.
The current latest Minio release that is working for us for local development is now almost a year old and soon enough we will have to upgrade. Curious what others have replaced it with that is as easy to set up and has a management UI.
I think that's part of the pitch here... swapping out Minio for Garage. Both scale a lot more than for just local development, but local dev certainly seems like a good use-case here.
That's not something you can do reliably in software, datacenter grade NVMe drives come with power loss protection and additional capacitors to handle that gracefully. If power is cut at the wrong moment the partition may not be mountable afterwards otherwise.
If you really live somewhere with frequent outages, buy an industrial drive that has a PLP rating. Or get a UPS, they tend to be cheaper.
Isn't that the entire point of write-ahead logs, journaling file systems, and fsync in general? A roll-back or roll-forward due to a power loss causing a partial write is completely expected, but surely consumer SSDs wouldn't just completely ignore fsync and blatantly lie that the data has been persisted?
As I understood it, the capacitors on datacenter-grade drives are to give it more flexibility, as it allows the drive to issue a successful write response for cached data: the capacitor guarantees that even with a power loss the write will still finish, so for all intents and purposes it has been persisted, so an fsync can return without having to wait on the actual flash itself, which greatly increases performance. Have I just completely misunderstood this?
you actually don't need capacitors for rotating media, Western Digital has a feature called "ArmorCache" that uses the rotational energy in the platters to power the drive long enough to sync the volatile cache to a non volatile storage.
If the drives continue to have power, but the OS has crashed, will the drives persist the data once a certain amount of time has passed? Are datacenters set up to take advantage of this?
> will the drives persist the data once a certain amount of time has passed
Yes, otherwise those drives wouldn't work at all and would have a 100% warranty return rate. The reason they get away with it is that the misbehavior is only a problem in a specific edge-case (forgetting data written shortly before a power loss).
I tried it recently. Uploaded around 300 documents (1GB) and then went to delete them. Maybe my client was buggy, because the S3 service inside the container crashed and couldn't recover - I had to restart it. It's a really cool project, but I wouldn't really call it "reliable" from my experience.
It's beautiful from an artistic point of view but also rather hard to read and probably not very accessible (haven't checked it, though, since I'm on my phone).
Works perfectly on an iphone. I can't attest to the accessibility features, but the aesthetic is absolutely wonderful. Something I love, and went for on my own portfolio/company website... this is executed 100x better tho, clearly a labor of love and not 30 minutes of shitting around in vi.
I'm not sure if it even has any sort of cluster consensus algorithm? I can't imagine it not eating committed writes in a multi-node deployment.
Garage and Ceph (well, radosgw) are the only open source S3-compatible object storage which have undergone serious durability/correctness testing. Anything else will most likely eat your data.
Garage looks really nice: I've evaluated it with test code and benchmarks and it looks like a winner. Also, very straightforward deployment (self contained executable) and good docs.
But no tags on objects is a pretty big gap, and I had to shelve it. If Garage folk see this: please think on this. You obviously have the talent to make a killer application, but tags are table stakes in the "cloud" API world.
I love garage. I think it has applications beyond the standard self host s3 alternative.
It's a really cool system for hyper converged architecture where storage requests can pull data from the local machine and only hit the network when needed.
I was looking at using this on an LTO tape library, it seems the only resiliency is through replication, but this was my main concern with this project, what happens with HW goes bad
If you have replication, you can lose one of the replica, that's the point. This is what Garage was designed for, and it works.
Erasure coding is another debate, for now we have chosen not to implement it, but I would personally be open to have it supported by Garage if someone codes it up.
I use Syncthing a lot. Is Garage only really useful if you specifically want to expose an S3 drop in compatible API, or does it also provide other benefits over syncthing?
Syncthing will synchronize a full folder between an arbitrary number of machines, but you still have to access this folder one way or another.
Garage provides an HTTP API for your data, and handles internally the placement of this data among a set of possible replica nodes. But the data is not in the form of files on disk like the ones you upload to the API.
Syncthing is good for, e.g., synchronizing your documents or music collection between computers. Garage is good as a storage service for back-ups with e.g. Restic, for media files stored by a web application, for serving personal (static) web sites to the Internet. Of course, you can always run something like Nextcloud in front of Garage and get folder synchronization between computers somewhat like what you would get with Syncthing.
But to answer your question, yes, Garage only provides a S3-compatible API specifically.
Losing a node is a regular occurrence, and a scenario for which Garage has been designed.
The assumption Garage makes, which is well-documented, is that of 3 replica nodes, only 1 will be in a crash-like situation at any time. With 1 crashed node, the cluster is still fully functional. With 2 crashed nodes, the cluster is unavailable until at least one additional node is recovered, but no data is lost.
In other words, Garage makes a very precise promise to its users, which is fully respected. Database corruption upon power loss enters in the definition of a "crash state", similarly to a node just being offline due to an internet connection loss. We recommend making metadata snapshots so that recovery of a crashed node is faster and simpler, but it's not required per se: Garage can always start over from an empty database and recover data from the remaining copies in the cluster.
To talk more about concrete scenarios: if you have 3 replicas in 3 different physical locations, the assumption of at-most one crashed node is pretty reasonable, it's quite unlikely that 2 of the 3 locations will be offline at the same time. Concerning data corruption on a power loss, the probability to lose power at 3 distant sites at the exact same time with the same data in the write buffers is extremely low, so I'd say in practice it's not a problem.
Of course, this all implies a Garage cluster running with 3-way replication, which everyone should do.
One really useful usecase for Garage for me has been data engineering scripts. I can just use the S3 integration that every tool has to dump to garage and then I can more easily scale up to cloud later.
Read-after-write consistency : yes (after PutObject has finished, the object will be immediately visible in all subsequent requests, including GetObject and ListObjects)
Conditionnal writes : no, we can't do it with CRDTs, which are the core of Garage's design.
> For the metadata storage, Garage does not do checksumming and integrity verification on its own, so it is better to use a robust filesystem such as BTRFS or ZFS. Users have reported that when using the LMDB database engine (the default), database files have a tendency of becoming corrupted after an unclean shutdown (e.g. a power outage), so you should take regular snapshots to be able to recover from such a situation.
It seems like you can also use SQLite, but a default database that isn't robust against power failure or crashes seems suprising to me.
If you really live somewhere with frequent outages, buy an industrial drive that has a PLP rating. Or get a UPS, they tend to be cheaper.
As I understood it, the capacitors on datacenter-grade drives are to give it more flexibility, as it allows the drive to issue a successful write response for cached data: the capacitor guarantees that even with a power loss the write will still finish, so for all intents and purposes it has been persisted, so an fsync can return without having to wait on the actual flash itself, which greatly increases performance. Have I just completely misunderstood this?
https://documents.westerndigital.com/content/dam/doc-library...
Unfortunately they do: https://news.ycombinator.com/item?id=38371307
Yes, otherwise those drives wouldn't work at all and would have a 100% warranty return rate. The reason they get away with it is that the misbehavior is only a problem in a specific edge-case (forgetting data written shortly before a power loss).
https://www.repoflow.io/blog/benchmarking-self-hosted-s3-com... was useful.
RustFS also looks interesting but for entirely non-technical reasons we had to exclude it.
Anyone have any advice for swapping this in for Minio?
https://github.com/versity/versitygw
I am also curious how Ceph S3 gateway compares to all of these.
Able/willing to expand on this at all? Just curious.
I'm not sure if it even has any sort of cluster consensus algorithm? I can't imagine it not eating committed writes in a multi-node deployment.
Garage and Ceph (well, radosgw) are the only open source S3-compatible object storage which have undergone serious durability/correctness testing. Anything else will most likely eat your data.
> Beijing Address: Area C, North Territory, Zhongguancun Dongsheng Science Park, No. 66 Xixiaokou Road, Haidian District, Beijing
> Beijing ICP Registration No. 2024061305-1
Garage looks really nice: I've evaluated it with test code and benchmarks and it looks like a winner. Also, very straightforward deployment (self contained executable) and good docs.
But no tags on objects is a pretty big gap, and I had to shelve it. If Garage folk see this: please think on this. You obviously have the talent to make a killer application, but tags are table stakes in the "cloud" API world.
It's a really cool system for hyper converged architecture where storage requests can pull data from the local machine and only hit the network when needed.
Erasure coding is another debate, for now we have chosen not to implement it, but I would personally be open to have it supported by Garage if someone codes it up.
Syncthing will synchronize a full folder between an arbitrary number of machines, but you still have to access this folder one way or another.
Garage provides an HTTP API for your data, and handles internally the placement of this data among a set of possible replica nodes. But the data is not in the form of files on disk like the ones you upload to the API.
Syncthing is good for, e.g., synchronizing your documents or music collection between computers. Garage is good as a storage service for back-ups with e.g. Restic, for media files stored by a web application, for serving personal (static) web sites to the Internet. Of course, you can always run something like Nextcloud in front of Garage and get folder synchronization between computers somewhat like what you would get with Syncthing.
But to answer your question, yes, Garage only provides a S3-compatible API specifically.
The assumption Garage makes, which is well-documented, is that of 3 replica nodes, only 1 will be in a crash-like situation at any time. With 1 crashed node, the cluster is still fully functional. With 2 crashed nodes, the cluster is unavailable until at least one additional node is recovered, but no data is lost.
In other words, Garage makes a very precise promise to its users, which is fully respected. Database corruption upon power loss enters in the definition of a "crash state", similarly to a node just being offline due to an internet connection loss. We recommend making metadata snapshots so that recovery of a crashed node is faster and simpler, but it's not required per se: Garage can always start over from an empty database and recover data from the remaining copies in the cluster.
To talk more about concrete scenarios: if you have 3 replicas in 3 different physical locations, the assumption of at-most one crashed node is pretty reasonable, it's quite unlikely that 2 of the 3 locations will be offline at the same time. Concerning data corruption on a power loss, the probability to lose power at 3 distant sites at the exact same time with the same data in the write buffers is extremely low, so I'd say in practice it's not a problem.
Of course, this all implies a Garage cluster running with 3-way replication, which everyone should do.
Conditionnal writes : no, we can't do it with CRDTs, which are the core of Garage's design.
[0] https://git.deuxfleurs.fr/Deuxfleurs/garage/issues/1052
[1] https://github.com/Barre/ZeroFS
this is the reliability question no?
https://archive.fosdem.org/2024/schedule/event/fosdem-2024-3...
Slides are available here:
https://git.deuxfleurs.fr/Deuxfleurs/garage/src/commit/4efc8...