A faster path to container images in Bazel

(tweag.io)

52 points | by malt3 6 days ago

6 comments

  • mgaunard 42 minutes ago
    My experience is that anything involving Bazel is slow, bloated, and complicated, hammers your disk, copies your files ten times over, and balloons your disk usage without ever collecting the garbage. A lot of essential features are missing so you realistically have to build a lot of custom rules if not outright additional tooling on top.

    I'm not too surprised that out of the box docker images exhibit more of this. While it's good they're fixing it, it feels like maybe some of the core concepts cause pretty systematic issues anytime you try to do anything beyond the basic feature set...

    • paulddraper 7 minutes ago
      To be clear, when you say “they’re fixing this”…the Bazel maintainers have nothing to do with this.

      Bazel is a general purpose tool like Make. But with caching and sandboxing.

      Make is no less focused on Docker than Bazel is.

      Unlike Make however, Bazel does make it easy to share rule sets.

      But you don’t need to use other people’s Bazel rule sets any more than you need to use other people’s Make recipes.

      This author has a clever way to minimize needing to touch layers at all.

  • paulddraper 13 minutes ago
    This is smart.

    Container layers are so large that moving them around is heavy.

    So defer that part to the non-hermetic push/load parts of the process, while retaining hermeticity/reproducibility.

    You can sort of think of it like the IO monad in Haskell…defer it all until the impure end.

  • cyberax 3 hours ago
    I'm struggling with the caching right now. I'm trying to switch from the Github actions to just running stuff in containers, and it works. Except for caching.

    Buildkit from Docker is just a pure bullshit design. Instead of the elegant layer-based system, there are now two daemons that fling around TAR files. And for no real reason that I can discern. But the worst thing is that the caching is just plain broken.

    • klysm 2 hours ago
      The layers are tar files. I’m confused about what behavior you actually want that isn’t supported.
      • cyberax 2 hours ago
        The original Docker (and the current Podman) created each layer as an overlay filesystem. So each layer was essentially an ephemeral container. If a build failed, you could actually just run the last successful layer with a shell and see what's wrong.

        More importantly, the layers were represented as directories on the host system. So when you wanted to run something in the final container, Docker just needed to reassemble it.

        Buildkit has broken all of it. Now building is done, essentially, in a separate system: the "docker buildx" command talks with it over a socket. It transmits the context, and gets the result back as an OCI image that it then needs to unpack.

        This is an entirely useless step. It also breaks caching all the time. If you build two images that differ only slightly, the host still gets two full OCI artifacts, even if two containers share most of the layers.

        It looks like their Bazel infrastructure optimized it by moving caching down to the file level.

        • cpuguy83 3 minutes ago
          Buildkit didn't break anything here except that each individual build step is no longer exposed as a runnable image in Docker. That was unfortunate, but you can actually have buildkit run a command in that filesystem these days, and buildx now even exposes a DAP interface.

          Buildkit is far more efficient than the old model.

    • paulddraper 17 minutes ago
      Huh?

      Each layer is a tarball.

      So build your tarballs (concurrently!), and then add some metadata to make an image.

      From your comment elsewhere it seems maybe you are expecting the docker build paradigm of running a container and snapshotting it at various stages.

      That is messy and has a number of limitations, not the least of which is cross-compilation. Reproducibility being another. But in any case, that is definitely not what these rules are trying to do.
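
      The "build your tarballs concurrently, then add some metadata" idea can be sketched roughly as below. This is only an illustration, not how the rules in the article are implemented: the function names (build_layer, describe, build_manifest) and the in-memory {path: bytes} layer inputs are hypothetical, the layers are left uncompressed, and the result omits the config blob that a complete OCI manifest also references.

```python
import hashlib
import io
import tarfile
from concurrent.futures import ThreadPoolExecutor


def build_layer(files):
    """Pack a {path: bytes} mapping into an uncompressed tar and return its bytes."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for path in sorted(files):  # stable order keeps the digest reproducible
            data = files[path]
            info = tarfile.TarInfo(name=path)
            info.size = len(data)
            info.mtime = 0  # fixed timestamp, again for reproducibility
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def describe(layer_bytes):
    """Return a descriptor (media type, digest, size) for one layer."""
    return {
        "mediaType": "application/vnd.oci.image.layer.v1.tar",
        "digest": "sha256:" + hashlib.sha256(layer_bytes).hexdigest(),
        "size": len(layer_bytes),
    }


def build_manifest(layer_inputs):
    """Build every layer independently, then assemble the metadata."""
    with ThreadPoolExecutor() as pool:
        # Layers do not depend on each other, so they can be built in parallel.
        layers = list(pool.map(build_layer, layer_inputs))
    return {"schemaVersion": 2, "layers": [describe(b) for b in layers]}
```

      Because the metadata only refers to layers by digest and size, nothing in this step has to run a container or snapshot a filesystem.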

  • jeffbee 3 hours ago
    Funny that the article only obliquely references the compression issues. The OCI users that I have seen are using gzip due to inertia, while zstd layers have been supported for a while and are radically faster.
  • forrestthewoods 3 hours ago
    Uhhh what? Isn’t the whole point of Bazel that it’s a monorepo with all dependencies so you don’t need effing docker just to build or run a bloody computer program?

    It drives me absolute batshit insane that modern systems are incapable of either building or running computer programs without docker. Everyone should be profoundly embarrassed and ashamed by this.

    I’m a charlatan VR and gamedev that primarily uses Windows. But my deeply unpopular opinion is that Windows is a significantly better dev environment and runtime environment because it doesn’t require all this Docker garbage. I swear that building and running programs does not actually have to be that complicated!! Linux userspace got pretty much everything related to dependencies and packages very very very wrong.

    I am greatly pleased and amused that the most reliable API for gaming in Linux is Win32 via Proton. That should be a clear signal that Linux userspace has gone off the rails.

    • jakewins 3 hours ago
      You’re covering a lot of ground here! The article is about producing container images for deployment, and has no relation to Bazel building stuff for you - if you’re not deploying as containers, you don’t need this?

      On Linux vs Win32 flame warring: can you be more specific? What specifically is very very wrong with Linux packaging and dependency resolution?

      • forrestthewoods 3 hours ago
        > The article is about producing container images for deployment

        Fair. Docker does trigger my predator drive.

        I’m pretty shocked that the Bazel workflow involves downloading Docker base images from external URLs. That seems very unbazel like! That belongs in the monorepo for sure.

        > What specifically is very very wrong with Linux packaging and dependency resolution?

        Linux userspace for the most part is built on a pool of global shared libraries and package managers. The theory is that this is good because you can upgrade libfoo.so just once for all programs on the system.

        In practice this turns into pure dependency hell. The total workaround is to use Docker, which completely nullifies the entire theoretical benefit.

        Linux toolchains and build systems are particularly egregious at just assuming a bunch of crap is magically available in the global search path.

        Docker is roughly correct in that computer programs should include their gosh darn dependencies. But it introduces so many layers of complexity that are solved by adding yet another layer. Why do I need estargz??

        If you’re going to deploy with Docker then you might as well just statically link everything. You can’t always get down to a single exe. But you can typically get pretty close!

        • dilyevsky 2 hours ago
          > I’m pretty shocked that the Bazel workflow involves downloading Docker base images from external URLs. That seems very unbazel like! That belongs in the monorepo for sure.

          Not every dependency in Bazel requires you to "first invent the universe" locally. There are lots of examples of this, like toolchains, git_repository, http_archive rules and on and on. As long as they are checksum'ed (as they are in this case) so that you can still output a reproducible artifact, I don't see the problem.

          • carolosf 35 minutes ago
            It is also possible to air-gap Bazel and provide the files offline, as long as they have the same checksum.
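
            A minimal sketch of why a pinned checksum makes the source of the bytes irrelevant, assuming a SHA-256 digest recorded next to the dependency. This is not Bazel's implementation; load_pinned and its parameters are hypothetical names used only for illustration.

```python
import hashlib
from pathlib import Path


def load_pinned(path, expected_sha256):
    """Read a dependency from any location and accept it only if the digest matches."""
    data = Path(path).read_bytes()
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        # A mismatch means the bytes differ from what was pinned, wherever they came from.
        raise ValueError(f"checksum mismatch for {path}: got {actual}")
    return data
```

            The same check passes whether the file was downloaded from a URL or copied from a local mirror, which is what makes an offline build with pre-provided files possible.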
          • forrestthewoods 2 hours ago
            Everything belongs in version control imho. You should be able to clone the repo, yank the network cable, and build.

            I suppose a URL with checksum is kinda sorta equivalent. But the article adds a bunch of new layers and complexity to avoid “downloading Cuda for the 4th time this week”. A whole lot of problems don’t exist if the binary blobs exist directly in the monorepo and local blob store.

            It’s hard to describe the magic of a version control system that actually controls the version of all your dependencies.

            Webdev is notorious for old projects being hard to compile. It should be trivial to build and run a 10+ year old project.

            • dilyevsky 2 hours ago
              Making heavy use of mostly remote caches and execution was one of the original design goals of Blaze (Google's internal version) iirc in an effort to reduce build time first and foremost. So kind of the opposite of what you're suggesting. That said, fully air-gapped builds can still be achieved if you just host all those cache blobs locally.
              • forrestthewoods 1 hour ago
                > So kind of the opposite of what you're suggesting.

                I don’t think they’re opposites. It seems orthogonal to me.

                If you have a bunch of remote execution workers then ideally they sit idle on a full (shallow) clone of the repo. There should be no reason to reset between jobs. And definitely no reason to constantly refetch content.

  • odie5533 3 hours ago
    Awful AI images everywhere. Can we not help ourselves?
    • CBLT 3 hours ago
      Is my adblocker blocking them? I only saw the stack of tars in a coat. Didn't break the article's flow for me.
      • comex 2 hours ago
        I also only saw that, but the text feels a bit fluffed out by AI as well, if I’m not mistaken.
        • Xophmeister 2 hours ago
          It’s not. It’s been through several editing rounds. (I was one of the editors.) In theory, we don’t have a problem with AI generated content if it meets our high editorial requirements, but all Tweag technical blogs go through a rigorous, manual review and editing process to keep standards high.
          • slekker 1 hour ago
            As I read through the post, phrases like "Why this matters for performance", the use of em-dashes, and the lists/bullet points scream AI-written to me. I appreciate you saying it wasn't, but such is the fate of anyone who writes the way LLMs do nowadays. I also used to like em-dashes and bullet lists but am consciously avoiding them now.