JuiceFS is a distributed POSIX file system built on top of Redis and S3

(github.com)

58 points | by tosh 2 hours ago

6 comments

wgjordan 1 hour ago
Related, "The Design & Implementation of Sprites" [1] (also currently on the front page) mentioned JuiceFS in its stack:
> The Sprite storage stack is organized around the JuiceFS model (in fact, we currently use a very hacked-up JuiceFS, with a rewritten SQLite metadata backend). It works by splitting storage into data (“chunks”) and metadata (a map of where the “chunks” are). Data chunks live on object stores; metadata lives in fast local storage. In our case, that metadata store is kept durable with Litestream. Nothing depends on local storage.
[1] https://news.ycombinator.com/item?id=46634450
jeffbee 2 minutes ago
It is not clear that pjdfstest establishes full POSIX semantic compliance. After a short search of the repo I did not see anything that exercises multiple unrelated processes atomically writing with O_APPEND, for example. And the fact that their graphic shows applications interfacing with JuiceFS over NFS and SMB casts further doubt, since both of those lack many POSIX semantic properties.
Over the decades I have written test harnesses for many distributed filesystems and the only one that seemed to actually offer POSIX semantics was LustreFS, which, for related reasons, is also an operability nightmare.
willbeddow 1 hour ago
Juice is cool, but tradeoffs around which metadata store you choose end up being very important. It also writes files in it's own uninterpretable format to object storage, so if you lose the metadata store, you lose your data.
When we tried it at Krea we ended up moving on because we couldn't get sufficient performance to train on, and having to choose which datacenter to deploy our metadata store on essentially forced us to only use it one location at a time.
[-]
- AdamJacobMuller 20 minutes ago
  > It also writes files in it's own uninterpretable format to object storage, so if you lose the metadata store, you lose your data.
  That's so confusing to me I had to read it five times. Are you saying you lose the metadata, or that the underlying data is actually mangled or gone, or merely that you lose the metadata?
  One of the greatest features of something like this to me would be the ability to durable even beyond JuiceFS access to my data in a bad situation. Even if JuiceFS totally messes up, my data is still in S3 (and with versioning etc even if juicefs mangles or deletes my data, still). So odd to design this kind of software and lose this property.
  [-]
  - mrkurt 9 minutes ago
    It backs its metadata up to S3. You do need metadata to map inodes / slices / chunks to s3 objects, though.
    Tigris has a one-to-one FUSE that does what you want: https://github.com/tigrisdata/tigrisfs
- tptacek 59 minutes ago
  I'm betting this is on the front page today (as opposed to any other day; Juice is very neat and doesn't need us to hype it) because of our Sprites post, which goes into some detail about how we use Juice (for the time being; I'm not sure if we'll keep it this way).
  The TL;DR relevant to your comment is: we tore out a lot of the metadata stuff, and our metadata storage is SQLite + Litestream.io, which gives us fast local read/write, enough systemwide atomicity (all atomicity in our setting runs asymptotically against "someone could just cut the power at any moment"), and preserves "durably stored to object storage".
Plasmoid 2 hours ago
I was actually looking at using this to replace our mongo disks so we could easily cold store our data
IshKebab 1 hour ago
Interesting. Would this be suitable as a replacement for NFS? In my experience literally everyone in the silicon design industry uses NFS on their compute grid and it sucks in numerous ways:
* poor locking support (this sounds like it works better)
* it's slow
* no manual fence support; a bad but common way of distributing workloads is e.g. to compile a test on one machine (on an NFS mount), and then use SLURM or SGE to run the test on other machines. You use NFS to let the other machines access the data... and this works... except that you either have to disable write caches or have horrible hacks to make the output of the first machine visible to the others. What you really want is a manual fence: "make all changes to this directory visible on the server"
* The bloody .nfs000000 files. I think this might be fixed by NFSv4 but it seems like nobody actually uses that. (Not helped by the fact that CentOS 7 is considered "modern" to EDA people.)
[-]
- jabl 18 minutes ago
  > poor locking support (this sounds like it works better)
  File locking on Unix is in general a clusterf*ck. (There was a thread a few days ago at https://news.ycombinator.com/item?id=46542247 )
  > no manual fence support; a bad but common way of distributing workloads is e.g. to compile a test on one machine (on an NFS mount), and then use SLURM or SGE to run the test on other machines. You use NFS to let the other machines access the data... and this works... except that you either have to disable write caches or have horrible hacks to make the output of the first machine visible to the others. What you really want is a manual fence: "make all changes to this directory visible on the server"
  In general, file systems make for poor IPC implementations. But if you need to do it with NFS, the key is to understand the close-to-open consistency model NFS uses, see section 10.3.1 in https://www.rfc-editor.org/rfc/rfc7530#section-10.3 . Of course, you'll also want some mechanism for the writer to notify the reader that it's finished, be it with file locks, or some other entirely different protocol to send signals over the network.
- mrkurt 55 minutes ago
  FUSE is full of gotchas. I wouldn't replace NFS with JuiceFS for arbitrary workloads. Getting the full FUSE set implemented is not easy -- you can't use sqlite on JuiceFS, for example.
  The meta store is a bottleneck too. For a shared mount, you've got a bunch of clients sharing a metadata store that lives in the cloud somewhere. They do a lot of aggressive metadata caching. It's still surprisingly slow at times.
  [-]
  - huntaub 48 minutes ago
    > FUSE is full of gotchas
    I want to go ahead and nominate this for the understatement of the year. I expect that 2026 is going to be filled with people finding this out the hard way as they pivot towards FUSE for agents.
    [-]
    - dpe82 44 minutes ago
      Mind helping us all out ahead of time by expanding on what kind of gotchas FUSE is full of?
      [-]
      - huntaub 34 minutes ago
        It depends on what level of FUSE you're working with.
        If you're running a FUSE adapter provided by a third party (Mountpoint, GCS FUSE), odds are that you aren't going to get great performance because it's going to have to run across a network super far away to work with your data. To improve performance, these adapters need to be sure to set fiddly settings (like using Kernel-side writeback caching) to avoid the penalty of hitting the disk for operations like write.
        If you're trying to write a FUSE adapter, it's up to you to implement as much of the POSIX spec that you need for the programs that you want to run. The requirements per-program are often surprising. Want to run "git clone", then you need to support the ability to unlink a file from the file system and keep its data around. Want to run "vim", you need the ability to do renames and hard links. All of this work needs to happen in-memory in order to get the performance that applications expect from their file system, which often isn't how these things are built.
        Regarding agents in particular, I'm hopeful that someone (which is quite possibly us), builds a FUSE-as-a-service primitive that's simple enough to use that the vast majority of developers don't have to worry about these things.
        [-]
        IshKebab 17 minutes ago
        > you need to support the ability to unlink a file from the file system and keep its data around. Want to run "vim", you need the ability to do renames and hard links
        Those seem like pretty basic POSIX filesystem features to be fair. Awkward, sure... there's also awkwardness like symlinks, file locking, sticky bits and so on. But these are just things you have to implement. Are there gotchas that are inherent to FUSE itself rather than FUSE implementations?
- huntaub 1 hour ago
  > * The bloody .nfs000000 files. I think this might be fixed by NFSv4 but it seems like nobody actually uses that. (Not helped by the fact that CentOS 7 is considered "modern" to EDA people.)
  Unfortunately, NFSv4 also has the silly rename semantics...
  [-]
  - jabl 28 minutes ago
    AFAIU the NFSv4 protocol in principle allows implementing unlinking an open file without silly rename, but the Linux client still does the silly rename dance.
Eikon 1 hour ago
ZeroFS [0] outperforms JuiceFS on common small file workloads [1] while only requiring S3 and no 3rd party database.
[0] https://github.com/Barre/ZeroFS
[1] https://www.zerofs.net/zerofs-vs-juicefs
[-]
- huntaub 1 hour ago
  Respect to your work on ZeroFS, but I find it kind of off-putting for you to come in and immediately put down JuiceFS, especially with benchmark results that don't make a ton of sense, and are likely making apples-to-oranges comparisons with how JuiceFS works or mount options.
  For example, it doesn't really make sense that "92% of data modification operations" would fail on JuiceFS, which makes me question a lot of the methodology in these tests.
  [-]
  - Eikon 1 hour ago
    > but I find it kind of off-putting for you to come in and immediately put down JuiceFS, especially with benchmark results that don't make a ton of sense, and are likely making apples-to-oranges comparisons with how JuiceFS works or mount options.
    The benchmark suite is trivial and opensource [1].
    Is performing benchmarks “putting down” these days?
    If you believe that the benchmarks are unfair to juicefs for a reason or for another, please put up a PR with a better methodology or corrected numbers. I’d happily merge it.
    EDIT: From your profile, it seems like you are running a VC backed competitor, would be fair to mention that…
    [1] https://github.com/Barre/ZeroFS/tree/main/bench
    [-]
    - wgjordan 1 hour ago
      > The benchmark suite is trivial and opensource.
      The actual code being benchmarked is trivial and open-source, but I don't see the actual JuiceFS setup anywhere in the ZeroFS repository. This means the self-published results don't seem to be reproducible by anyone looking to externally validate the stated claims in more detail. Given the very large performance differences, I have a hard time believing it's an actual apples-to-apples production-quality setup. It seems much more likely that some simple tuning is needed to make them more comparable, in which case the takeaway may be that JuiceFS may have more fiddly configuration without well-rounded defaults, not that it's actually hundreds of times slower when properly tuned for the workload.
      (That said, I'd love to be wrong and confidently discover that ZeroFS is indeed that much faster!)
    - huntaub 1 hour ago
      Yes, I'm working in the space too. I think it's fine to do benchmarks, I don't think it's necessary to immediately post them any time a competitor comes up on HN.
      I don't want to see the cloud storage sector turn as bitter as the cloud database sector.
      I've previously looked through the benchmarking code, and I still have some serious concerns about the way that you're presenting things on your page.
      [-]
      - zaphirplane 29 minutes ago
        > presenting things
        I don’t have a dog in this race, have to say thou the vagueness of the hand waving in multiple comments is losing you credibility
- dpacmittal 37 minutes ago
  The magnitude of performance difference alone immediately makes me skeptical of your benchmarking methodology.
- maxmcd 24 minutes ago
  does having to maintain the slatedb as a consistent singleton (even with write fencing) make this as operationally tricky as a third party db?
  [-]
  - Eikon 22 minutes ago
    It’s not great UX on that angle. I am currently working on coordination (through s3, not node to node communication), so that you can just spawn instances without thinking about it.
- wgjordan 1 hour ago
  For a proper comparison, also significant to note that JuiceFS is Apache-2.0 licensed while ZeroFS is dual AGPL-3.0/commercial licensed, significantly limiting the latter's ability to be easily adopted outside of open source projects.
  [-]
  - anonymousDan 1 hour ago
    Why would this matter if you're just using the database?
    [-]
    - Eikon 1 hour ago
      It doesn’t, you are free to use ZeroFS for commercial and closed source products.
      [-]
      - wgjordan 53 minutes ago
        This clarification is helpful, thanks! The README currently implies a slightly different take, perhaps it could be made more clear that it's suitable for use unmodified in closed source products:
        > The AGPL license is suitable for open source projects, while commercial licenses are available for organizations requiring different terms.
        I was a bit unclear on where the AGPL's network-interaction clause draws its boundaries- so the commercial license would only be needed for closed-source modifications/forks, or if statically linking ZeroFS crate into a larger proprietary Rust program, is that roughly it?
        [-]
        wgjordan 9 minutes ago
        Also worth noting (as a sibling comment pointed out) that despite these assurances the untested legal risks of AGPL-licensed code may still cause difficulties for larger, risk-averse companies. Google notably has a blanket policy [1] banning all AGPL code entirely as "the risks outweigh the benefits", so large organizations are probably another area where the commercial license comes into play.
        [1] https://opensource.google/documentation/reference/using/agpl...
        Eikon 47 minutes ago
        > so the commercial license would only be needed for closed-source modifications/forks
        Indeed.
        andydang 31 minutes ago
        [dead]
- ChocolateGod 1 hour ago
  Let's remember that JuiceFS can be setup very easily to not have a single point of failure (by replicating the metadata engine), meanwhile ZeroFS seems to have exactly that.
  If I was a company I know which one I'd prefer.
- corv 1 hour ago
  Looks like the underdog beats it handily and easier deployment to boot. What's the catch?
  [-]
  - aeblyve 1 hour ago
    ZeroFS is a single-writer architecture and therefore has overall bandwidth limited by the box it's running on.
    JuiceFS scales out horizontally as each individual client writes/reads directly to/from S3, as long as the metadata engine keeps up it has essentially unlimited bandwidth across many compute nodes.
    But as the benchmark shows, it is fiddly especially for workloads with many small files and is pretty wasteful in terms of S3 operations, which for the largest workloads has meaningful cost.
    I think both have their place at the moment. But the space of "advanced S3-backed filesystems" is... advancing these days.