Interview with Kent Overstreet (Bcachefs) [audio]

(linuxunplugged.com)

60 points | by teekert 4 days ago

10 comments

  • LeFantome 17 hours ago
    Whenever Phoronix runs benchmarks, bcachefs does very poorly. This often gets attributed to its small 512-byte block size, instead of the 4096 bytes other file systems use.

    Why is 512 the default and, if 4096 is better, why is this not the default instead?

    • koverstreet 16 hours ago
      The default should be the device's native blocksize, and some devices misreport. You also lose performance if you use a larger blocksize than necessary.

      If we can, I'd like to get a quirks list in place, but there have been higher priorities.

      • ajb 1 hour ago
        Do each of the other filesystems have their own quirks list? That seems suboptimal. Oh, I guess it's because it's in the user space mkfs tool of each, not the kernel.
        • koverstreet 55 minutes ago
          ZFS is the only filesystem I know of with one, and theirs is pretty incomplete. It does need to be a shared project.
  • tombert 19 hours ago
    I didn't realize that Linux Unplugged was still going; I haven't followed anything in the Jupiter Broadcasting sphere in almost a decade.

    I'll queue this up to listen though I'm always afraid because I think Chris has gone pretty deep into conservative politics and that has the tendency of really pissing me off (though he's not as bad as Lunduke).

  • koverstreet 23 hours ago
    Oh cool.

    Happy to answer questions about all things bcachefs or what-have-you.

    just please no more questions about whether or not bcachefs will be in the kernel, I've been asked that enough :)

    • throwawaypath 23 hours ago
      Since bcachefs is no longer mainlined (DKMS) and therefore sits in the same class as ZFS, why would/should I migrate from ZFS to bcachefs?
      • ThatPlayer 17 hours ago
        The tiered drive setup is more featured if you're interested in combining SSD+HDDs. Unlike other cache drive setups like ZFS's L2ARC/SLOG and even bcache, the SSD drives in this setup are still usable space. Otherwise I wouldn't want to use a 1TB SSD as a cache in front of a 4TB HDD for example.

        Another use case would be on a handheld like Steam Deck: internal SSD tiered with a microSD for unused games I keep installed 'just in case'.

        I would even want it on my home NAS. Instead of a separate root SSD and all my files on an HDD ZFS array (maybe with smaller cache SSDs), combine it all into a single bcachefs filesystem. Maybe with a subvolume root that stays pinned to the SSD(s). And also being able to use the full capabilities of different size drives, unlike ZFS, because it's a home NAS for cheap. No erasure coding yet though, so I'm in no rush to migrate my home NAS.

      • infinet 21 hours ago
        One reason is to use the Linux page cache instead of dedicating RAM to ZFS, given how expensive memory is now. I am very happy with MGLRU and won't miss ZFS's ARC.
    • BlackLotus89 23 hours ago
      Actually listened to the podcast before. Happy that everything with the kernel situation kinda seemed to work out for you.

      You kinda talked about ec already, but is there an ETA for resilvering?

      You were talking about Valve helping in a big way. Was this monetary or development work? If development, I would be interested: I know you mainly do correctness and features right now, but a while ago on the Phoronix forum you were talking about low-hanging fruit for performance work. Was that something of interest to Valve, or is it something being done right now to make bcachefs a good fs for gaming (whatever that means...)?

      • koverstreet 22 hours ago
        I'm hoping to have erasure coding done sometime in the first half of next year (knock on wood).

        While reconcile was getting done we got a detailed outline of where EC resilvering is going to plug into that, so it's not looking like a huge amount of work anymore - and there's been people testing EC and reporting the occasional bug, it's been looking pretty solid.

        We did some performance testing not too long ago, and it looked like we were in better shape than I thought. I'm still more interested in tracking down performance bugs than shaving cycles and going for raw IOPS.

        And the userbase isn't complaining about performance at all, aside from the odd thing like accounting read being slow (just fixed a couple issues there) or lack of defrag.

        After debugging and stabilization, it's going to be more about usability, fleshing out missing features, more integration work (there's some systemd integration that needs to happen in the mount path), telemetry/introspection improvements (I want all the data I can get for stabilization, and json reporting would be good for lots of things).

        So, if you're asking if you can help, that's a decent list to start from :)

        • BlackLotus89 22 hours ago
          Oh I'm already active on irc and still have to send you a few things, but sure always eager to oblige.
        • LeFantome 21 hours ago
          Hoping systemd will remain optional
          • koverstreet 21 hours ago
            yes, it will. But we do want to communicate properly with systemd and let the user know what's going on if mount has to take a while because of some sort of recovery (instead of timing out), and various other things.

            related, plymouth integration to let users know when their machine is booting up if a drive or the filesystem is unhealthy

    • commandersaki 23 hours ago
      This isn't really a question.

      I love the idea of bcachefs, it gives a lot of the features of btrfs but includes encryption which means no luks song and dance. But having played around with it on my laptop and raspberry pi(s), as root filesystem, it just can't be trusted at the moment. I can't remember the exact problem but I ran into bugs jumping to a new version of the kernel where bcachefs stopped working, and having to downgrade but then the format had changed (I think I caused this), and I was just in a completely broken state. I really wanted to figure it out, but after contemplating after the fact, I just don't want to deal with those kinds of headaches for now.

      I want to be able to use it in a way that I can rely on it for say the next 10 or 20 years, but it just isn't in that state. I can only feel comfortable using it on data or systems that I am not vested in.

      • koverstreet 22 hours ago
        How long ago was this?

        We've been cranking through bugs fast, and there are still bug reports coming in but the severity and frequency has declined drastically, while the userbase has gone up; polling the userbase it's been stabilizing fast.

        But we won't really know we're there until we're there, so the main thing I can say is: if you report a bug like that, it'll get looked at fast; the debugging tools are top notch.

        • commandersaki 14 hours ago
          I had discussed it on the OFTC IRC channel at the time, which looks like it was around 2025-06-09; the last issue seemed to have been nasty. I think while you said it was fixed, I couldn't un-eff my specific situation, and I think I gave up.

          I am grateful that you make accessing support quite easy by being available on the #bcache IRC channel with a lot of community support as well, but it is sometimes hard to fix these issues -- in my case I was usually in a VGA console without network access, so I couldn't simply export information/logs/diagnostics to show you without pulling out my phone camera etc. and that becomes a bit tedious in itself. It is partly my fault for using bcachefs for the root filesystem, with encryption etc. but I also knew what I was in for and I wanted to help provide the feedback and experience needed to help out.

          It is just that, after a while, I felt like I kept running into issue after issue, and I kind of just gave up. I do run bcachefs for a secondary drive that is used for storage purposes and it has been great. But yeah, I think running as root fs is just a scary proposition, especially if you don't want to put in the hard yards to diagnose and fix issues as they come and be on top of them. I used Arch, so I was at the time getting the latest version of bcachefs and upgrading constantly.

          • koverstreet 6 hours ago
            That does happen sometimes. But look on the bright side; a lot of people are getting crash courses in low level systems debugging, and those are skills that are not as common as they used to be - but they're still important.

            If you look at the field of filesystem developers and kernel developers, we don't have nearly as many young people getting involved and learning this stuff as we used to, and that's a problem. We need a pipeline of people building deep expertise, and if even a tiny fraction of the people getting involved with the bcachefs community start developing an interest and learning this stuff, that's a success.

            Six months ago was also still a very hectic time for debugging and stabilization, it's definitely gotten better.

      • rolandog 22 hours ago
        What I've heard is that Kent is very proactive in listening to any and all bug reports to chase down root causes of issues like yours. I'm sure that any information you send his way to try to reproduce the issue would be helpful.
    • lastpasstwitter 22 hours ago
      Thanks for answering! Can you share some recent benchmarks comparing bcachefs vs zfs vs extra...etc?
    • sho_hn 23 hours ago
      Well, do you think it will be in mainstream distributions?
      • koverstreet 22 hours ago
        Do Arch and NixOS count? We're in the core package repositories for them, and have packages available for a list of others.

        We're not aiming to be in GUI installers yet, that'll be sometime after taking the experimental label off. We're still going slow and steady; I don't think about doing things that will bring in more users until incoming bug reports are dead quiet (or as close to it as they ever get), and the userbase has been going up plenty fast all on its own by the activity I see.

        So, sometime next year we'll be working on distro stuff again. Dunno when, I expect another spike in new users and bug reports when I take the experimental label off.

        • evil-olive 20 hours ago
          now that 6.18 is the new LTS kernel, will I have a good experience with bcachefs if I stay on that LTS kernel instead of tracking newer stable kernel versions?

          I currently run NixOS with ZFS-on-root, and because ZFS is also out-of-tree, the "stable" ZFS version in nixpkgs isn't always compatible with the most recent stable kernel. to keep things simple I tend to just stick with the LTS kernels.

          previously when I've tried to experiment with bcachefs on NixOS I ran into a catch-22 where I needed to upgrade to a newer kernel to get bcachefs support but doing so wouldn't be compatible with ZFS.

          • koverstreet 17 hours ago
            yeah, we'll be supporting 6.18 ongoing.
    • tombert 19 hours ago
      Forgive a bit of ignorance on this as it might be a dumb question, but now that bcachefs is a kernel module and not part of the kernel directly, is it still realistic for people to run bcachefs as their root filesystem? Do you know anyone doing this?
      • baobun 38 minutes ago
        Just looking at that factor should be about as realistic as running ZFS (very).
      • koverstreet 19 hours ago
        Distros generally build everything they can as modules these days, including filesystems. No reason not to, we've had initramfs since forever; you can't build everything in that anyone might need to boot their machine.

        As long as the testing pipelines are in place to make sure the dkms module builds on every distro configuration (a good chunk of that is still manual, but there's a project to improve the test infrastructure) - in practice, no one will notice.

        I wouldn't have noticed the DKMS switch on my NixOS laptop if I didn't know it was happening.

      • LeFantome 18 hours ago
        bcachefs was always a module. You don’t want it in your kernel if you are not using it. The difference is that it used to ship in the mainline source code and be built as a module that was already built and on your drive.

        If you build bcachefs as a module yourself (via DKMS or directly), it works the same as if you got it with your distro.

        If you use bcachefs as root, the danger is booting with a kernel that lacks the module.

        I hate that bcachefs is not in the kernel, and my primary distro does not use DKMS. But, if you can get a module built, there is no loss of functionality or performance.

    • razighter777 21 hours ago
      Hi,

      I listened to the podcast; it was interesting.

      Gonna throw some questions you may or may not have gotten.

      Are special devices like metadata or write-ahead log devices on the roadmap? Or distributed raid / other exotic raid types?

      It would be interesting to hear your thoughts on these.

      What do you think zfs got right with this and what did they get wrong?

      • koverstreet 6 hours ago
        You can do that now via the data_allowed parameter

        ZFS did a bunch of stuff right, it's just a much older design; pre-extents, and based on the original Unix filesystem design - filesystem as a database was still unproven at the time.

        They were just working incrementally, which for the amount of new features ZFS already had was the smart decision at the time.

    • nona 21 hours ago
      I was hoping to use bcachefs to have one pool with subvolumes for root (encrypted by tpm), and for the home folders (also encrypted but with different keys, for example for systemd-homed use).

      Any chance for different encryption keys per subvolume?

      • koverstreet 21 hours ago
        It'll happen eventually, it's been a frequently asked for feature.
    • brendoncarroll 23 hours ago
      I'm a happy bcachefs user. Haven't had any issues on a simple mirrored array, which I've been running since before it was in (and out) of the kernel. It's the best filesystem in 2025. Thank you for all your work.

      What is the status of scrub? Are there any technical barriers to implementing it, or is it just prioritization at this point? FWIW I think there are probably a lot of sysadmin types who would move over to bcachefs if scrub was implemented. I know there are other cooler features like RS and send/receive, but those probably aren't blocking many from switching over.

      • koverstreet 22 hours ago
        Scrub went in in 6.15. I think 6.17 might've been when it was fully solid; it took a bit for some bugs to shake out in the self healing paths.
    • octoberfranklin 22 hours ago
      The Linux kernel has well-defined internal interfaces for character streams, block devices, block-erase devices (mtd), and extent devices (LVM).

      Has it been considered to have an official (but not exposed to userspace) "btree device" interface?

      The idea being that you could write composable wrappers for btree devices the way you can write composable wrappers for block devices (dmsetup, etc). And have a common interface for these kinds of devices -- the kernel has at least two large and well-developed btree-on-a-block-device implementations (bcache/bcachefs and btrfs). Both of these implementations have been criticized as being quite monolithic and not as unixy ("many small sharp tools") as LVM/dmsetup are.

    • cyberax 21 hours ago
      I'd love to have configurable tiered storage with delayed migration. To let the spinning rust drives stay off in deep sleep for days, unless the frontend caches don't have the data.

      Sorry. Not a question, just a feature request.

  • LeFantome 18 hours ago
    Is the on disk format likely to change again? Or can we expect that to remain stable?
    • koverstreet 17 hours ago
      Still changing the on disk format as required, but we're at the point now where the end user impact should be negligible - and we aren't doing big changes.

      Just after reconcile, I landed a patch series to automatically run recovery passes in the background if they (and all dependents) can be run online; this allows the 1.33 upgrade to run in the background.

      And with DKMS, users aren't having to run old versions (forcing a downgrade) if they have to boot into an old kernel. That was a big support issue in the past, users would have to run old unsupported versions because of other kernel bugs (amdgpu being the most common offender).

  • typpilol 20 hours ago
    What's the end goal for bcachefs?
  • senectus1 20 hours ago
    this was a really cool interview.

    kinda makes me want to rebuild part of my homelab with bcacheFS

  • throw7 21 hours ago
    Unfortunately there don't seem to be any questions and answers about why bcachefs isn't in the kernel? It was but now it isn't. There was some hemming and hawing about "testing"?
    • bitwize 21 hours ago
      Kent wasn't willing to play ball with how kernel dev gets done because of uh, personality differences, so he was booted from the mainline kernel. Linus's house, Linus's rules.
      • koverstreet 20 hours ago
        I'm here to talk filesystems and technical topics, not to take part in or stir up drama. There's been more than enough of that.

        This is hacker news, not drama queen news :)

        • wkat4242 20 hours ago
          Also the matter has been discussed here in detail when the news broke a couple months ago, so yeah, focusing more on the technical merit is much more interesting IMO
        • RGBCube 18 hours ago
          > This is hacker news, not drama queen news

          Same thing.

        • dralley 18 hours ago
          Nonetheless, I do hope that when development slows down and time has passed, you get it upstreamed again.
        • bitwize 16 hours ago
          I was just trying to fill in a chap with a couple-sentence summary of what happened, since they asked, without trying to poke a hornet's nest.
  • wkat4242 20 hours ago
    Too bad this is audio. I have no patience for that (ADHD). There's no writeup anywhere is there? I don't see it.