Here’s a demo video: https://www.youtube.com/watch?v=m6T5p0qXcFE&t=4s
Infrastructure as Code (IaC) workflows were designed to help developers and work fine for small teams, but as organizations scale, they create bottlenecks, complexity, and endless firefighting.
After decades in ops and platform engineering, we kept running into the same problems: brittle pipelines (Terraform variables and input checks don’t catch errors until it’s too late), poor compliance integration (issues are caught during CI/CD, but by then, delays and rework are already inevitable), and a patchwork of tools that developers are forced to learn (Terraform, Kubernetes manifests, cloud APIs), adding more work and turning them into junior devops engineers when they should be shipping value.
We were working on a side project and ended up doing the Spider-Man pointing-meme: two experienced ops guys, neither wanting to touch the infrastructure. We started asking why. The “boring parts” weren’t just boring—they were time-consuming and error-prone, especially at scale. What if we didn’t just automate provisioning but handled all the messy stuff (permissions, compliance, networking, security groups) upfront? That’s when we realized we could encode ops knowledge directly into modules and let developers work off those.
The hard part isn’t just technical, it’s socio-technical. Conway’s Law is inescapable, and nowhere is its impact more painful than at the intersection of development, operations, and cloud APIs. Everyone talks about cloud complexity, but the real challenge is navigating the messy intersection of tools, teams, and processes. Ops teams want control, devs want speed, and compliance creates friction between them.
The traditional answer has been patchwork solutions like GitOps or retroactive guardrails, but these tend to shift complexity around instead of eliminating it. This often results in an ever-expanding CI/CD toolchain, where teams must maintain complex workflows just to enforce policies and validate infrastructure, adding friction rather than reducing it. We think the real challenge is designing abstractions that are simple enough for developers but powerful enough for ops.
Massdriver lets ops teams define reusable infrastructure modules that handle everything: provisioning, permissions, compliance, and cost constraints. Developers don’t write IaC or navigate cloud APIs—they draw a diagram to describe their architecture, and Massdriver provisions resources using those modules. Compliance and security rules are enforced proactively, and ephemeral CI/CD pipelines are spun up automatically. The result: no more brittle pipelines or last-minute guardrails.
How it works:
(1) We turn IaC into functional modules: use tools like Terraform/OpenTofu, Helm, or CloudFormation to create reusable modules with built-in validation, policies, and metadata for visualization and self-service (there's a rough sketch after this list).
(2) Stop pushing IaC code through pipelines: Instead of managing configuration changes in Git repos, create modules as releases—packaged and ready to deploy. Each release bundles both the IaC and policy tooling (e.g., Checkov, OPA), so developers don’t have to copy and maintain separate workflows. These checks are enforced automatically as part of Massdriver’s ephemeral CI/CD process, making it impossible to bypass them.
(3) Self-service with APIs and visual tools: provision infrastructure by interacting with pre-approved modules, without dealing directly with low-level IaC code or brittle YAML pipelines.
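To make (1) and (2) a bit more concrete, here's a rough sketch of what a module release bundles. This is simplified and the field names are illustrative, not our exact config format, but the shape is the point: the IaC step, the input contract (JSON Schema), and the policy tooling ship together as one versioned release.

    # Simplified, illustrative module release (not the exact schema)
    name: aws-rds-postgres
    steps:
      - path: src          # Terraform/OpenTofu module that provisions the database
        provisioner: opentofu
      - path: policy       # policy checks (e.g. Checkov/OPA) run on every deploy
        provisioner: checkov
    params:                # JSON Schema: inputs are validated before anything runs
      required: [instance_class, allocated_storage]
      properties:
        instance_class:
          type: string
          enum: [db.t3.medium, db.r6g.large]   # only ops-approved sizes
        allocated_storage:
          type: integer
          minimum: 20
          maximum: 500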
Massdriver is live, and we’d love to hear your thoughts on our approach to the IaC problem. If you’re interested in learning more about how we simplify configuration management, check out our demo video—here's the link again: https://www.youtube.com/watch?v=m6T5p0qXcFE&t=4s
Thanks for reading. We’re excited to hear what you think!
And $500-1000/mo is way too much; that's more than a 10-person company spends every month on their entire CRM.
By pricing yourselves so high, what you're telling people is "hey, we're a startup and we don't expect to scale big," and what I'm thinking is "maybe this is risky, having our critical infra tied up with this startup."
* You want to simplify infrastructure, but there's a new learning curve here. Why did you decide to go with diagramming as a solution? What other methods did you evaluate and discard?
* How does an organization with existing infrastructure implement Massdriver?
* How do you handle edge cases, custom configurations, complex logic, etc.? For example, workflows that use custom scripts or some other form of band-aid.
* The visual approach could make it too easy to piece together infrastructure without understanding the implications. How do you prevent developers from creating poorly architected systems just because you make it simple to connect components?
* When things go wrong, how do developers debug issues at the infrastructure level? Do they reach out to ops?
I had a similar idea. I have enough experience with visual programming environments to be wary. Here are my thoughts on why it might be a good approach here:
* It would be possible to take a whiteboard scribble and turn it into a real system. Combining this with the services available in the cloud, you end up with something really powerful. It all comes down to the level of abstraction supported. You have to be able to draw boxes at a level that adds value, but also zoom in to parameters at the service/API level as necessary.
* I've worked on a team that was responsible for designing and maintaining its own AWS infrastructure. Along with that comes the responsibility for controlling cost. The idea of having a living architectural diagram that also reported cost in near real-time is really helpful, especially if you could start to do things like project cost given a level of traffic or some other measure.
Once you have a decent library of TF modules, and an understanding of the networking and compute fundamentals, and an understanding of the services offered by your cloud provider, you have something really powerful. If a service can help accelerate that, it's worth it IMHO.
We imagined a world where you could go into architecture review and come out of that meeting with staging stood up and ready to run your application.
This makes sense for infra because it's mostly config management and API calls. Visual programming is rough because control structures are soo hard to visualize.
> * You want to simplify infrastructure, but there's a new learning curve here. Why did you decide to go with diagramming as a solution? What other methods did you evaluate and discard?
We try to make it so both teams have to learn as little as possible. For the ops team, we are built on the tools those teams are already familiar with: Terraform, Helm, Ansible, etc. Our extension model is also ops-oriented: you add additional provisioners by writing Dockerfiles, and you enforce pre-validations with JSON Schema (this is the best we could come up with, but we figured it was a safe bet ops-wise since it's part of OpenAPI). Devs don't have to learn the ops team's tools to provision infrastructure; they just diagram. Massdriver was originally a wall of YAML to connect all the pieces, but it felt fumbly (and like everything else).
I wanted to make a VR version like something you'd see in a bad hacker movie, but Dave told me not to get ahead of myself. :D
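More seriously, if you haven't used JSON Schema as a validation layer, here's roughly what a pre-validation looks like (simplified, with illustrative property names):

    # Illustrative JSON Schema (YAML form) guarding a module's inputs.
    # Invalid values are rejected before any plan or apply ever runs.
    type: object
    required: [environment, cidr_block]
    properties:
      environment:
        type: string
        enum: [staging, production]      # no ad-hoc environment names
      cidr_block:
        type: string
        pattern: "^10\\.\\d+\\.\\d+\\.\\d+/(16|20|24)$"   # keep VPCs inside 10.x space
      enable_deletion_protection:
        type: boolean
        default: true                    # safe default that devs can't forget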
> * How does an organization with existing infrastructure implement Massdriver?
It depends on whether they have IaC or not. If they have IaC, they publish the modules. If their IaC has a state backend, it's usually just good to go; if they're using local files for state, we offer a state server they can push state into.
If teams don't have IaC, we run workshops on "reverse terraforming" or "tofuing" and also offer professional services to codify that for them.
> * How do you handle edge cases, custom configurations, complex logic, etc.? For example, workflows that use custom scripts or some other form of band-aid.
As noted above, it's all based on common ops tooling. Let's say you wanted to use a new security scanning tool for IaC and we don't have it in our base provisioners: you can write a Dockerfile, build the image, and then include that scanning tool in any of your Massdriver configs. We also have folks doing day-2 operations with the platform, things like database migrations and whatnot. The lines in the graph actually carry information and can push that info across different tools, so you can do things like have Helm charts get information from a Terraform run. You can build a provisioner with, say, the psql tool or a Helm chart running Bucardo and use it to set up replication between an old and new Postgres instance.
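As a tiny illustration of that extension point (the image name and keys here are made up, just to show the shape): you bake your tooling into a container image and reference it from a module step.

    # Hypothetical: point a module step at a provisioner image you built yourself,
    # e.g. OpenTofu plus a security scanner layered into one container.
    steps:
      - path: src
        provisioner: ghcr.io/yourorg/opentofu-with-scanner:1.0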
> * The visual approach could make it too easy to piece together infrastructure without understanding the implications. How do you prevent developers from creating poorly architected systems just because you make it simple to connect components?
The lines and connections are actually a type system that you can extend (also based on JSON Schema). That way ops teams can encode common things into the platform once, e.g. this is how we authenticate to Postgres: it's an AWS secret, these security groups, and these IAM policies. All of that information flows across the line into the other module. The modules reject invalid types, so common misconfigurations _can't_ happen. It also lets you "autocomplete" infrastructure. Let's say I'm a dev and I want to deploy a database: I can drop it on the canvas, and since Massdriver understands the types, it'll automatically connect it to a subnet that the dev has access to.
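If it helps to picture it, a connection type is roughly a JSON Schema that both ends of the line agree on. A simplified sketch (field names invented for illustration, not our exact artifact definitions):

    # Hypothetical "postgres-authentication" connection type.
    # The database module produces data matching this schema; an app module that
    # requires this type can only be wired to something that produces it.
    name: postgres-authentication
    schema:
      type: object
      required: [hostname, port, secret_arn, security_group_id]
      properties:
        hostname:          { type: string }
        port:              { type: integer, default: 5432 }
        secret_arn:        { type: string }    # AWS secret holding the credentials
        security_group_id: { type: string }    # SG the consumer must attach to
        iam_policy_arns:
          type: array
          items: { type: string }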
> * When things go wrong, how do developers debug issues at the infrastructure level? Do they reach out to ops?
They may, but we have a lot built in to make the system as truly self-service (through day 2) as possible. There are runbooks per module, so ops teams that have built out a module around a use case can put in common troubleshooting steps, and it's all accessible from the same graph. Alarms and metrics also show up there. Ops teams can also publish day-2 modules to the catalog, so developers can drag and drop common one-off maintenance tasks onto their canvas and run them.
Unrelated but could be confused with what was seen in Jurassic Park as "Unix".
[1] https://archive.org/details/vw_ca-unicenter-tng-demo
[2] https://en.wikipedia.org/wiki/CA_Technologies
That's really neat! Thank you for answering my questions and all the best with your launch!
I'm not a seasoned DevOps professional but I'm usually the one who ends up provisioning or setting up VMs, serverless stuff and DBs. I just don't understand the product.
You make reusable TF modules that have security and policies baked in. Engineers use a UI to hook up those modules, and Massdriver does the deployment work for you.
Sounds like a godsend for big teams, but I don't see pre-funded startups being able to afford a $500/mo fee. For funded ones that's highly approachable, but their problems with their IaC wouldn't be as visible.
Honestly, in smaller teams you can get pretty far with just setting things up through your cloud provider's web console and focusing on what you're building.
Since the fee is kind of steep, what's the justification for it? Is it that the workflow improvements would significantly improve productivity, which would justify the cost, or is the service itself expensive to run and maintain?
Massdriver isn’t aimed at pre-funded startups. Early-stage teams are often better off with a PaaS or setting things up manually until ops challenges become a bottleneck.
Our pricing (5-seat minimum) is intentional to dissuade smaller teams. The real value kicks in when teams need self-service. Ops teams build the modules (not us), and Massdriver acts as the interface. Developers diagram what they need, and Massdriver provisions using the ops team’s standards. This keeps developers focused on building while giving ops visibility and control over what’s deployed.
I have a friend who's a manager at a large e-commerce company, whose team's entire responsibility is to oversee all matters regarding their private and public cloud usage. They also manage and maintain services for internal use.
I would love to recommend you guys to them because managing deployments from over a dozen teams located around the world is hell for them. However, they have an extensive private cloud setup; would your solution be as applicable to them as it is to companies running on public clouds?
Private cloud isn't the best experience right now. It's possible, but it requires our platform to be able to 'get inside', so we either need a control plane exposed to us or a VPN connection in.
Self-hosted is our #1 requested feature, so we are cranking away at it. It's in alpha, and we're looking for testers/feedback. Would love an intro!
It's a long game, but might be worth it.
Not trying to criticize, I just don't understand how this works. I got my company to pay for Pulumi after several years of usage, but I needed to be able to use it to get that far.
>The hard part isn’t just technical, it’s socio-technical. Ops teams want control, devs want speed, and compliance creates friction between them.
You got that right and I'm not sure another tool is going to fix it.
I wouldn't say Ops wants control; it's that we want to stop being paged after hours because Devs yolo stuff into production without a care in the world. Not sure tooling will fix that.
I love your (https://www.massdriver.cloud/blogs/devops-is-bullshit) blog article though. It prompted a great discussion.
You have hit the nail on the head here. Our base hypothesis is that the only way to solve this problem is to start with a self-service approach. If I deploy an RDS instance and nobody ever connects to it, it will never have an issue. The moment a Dev starts firing N+1 queries at it, I have to get up at 1am. Developers need to have ownership of and accountability for their infrastructure without having to become absolute cloud experts.
Our goal is to enable Ops teams to build catalogs of solid building blocks that developers can assemble into novel architectures and safely own and operate. The collaboration between Ops and Dev is delegated to software, which eases this friction.
> It looks like a functional platform and another "Cloud defaults are too scary? Here is a sane default option."
I would push back on this notion. An Ops team builds reusable modules that match their reliability and compliance requirements. You _can_ use modules we have created but we expect that you own your IaC modules. They will conform and evolve with your organization's best practices.
The "DevOps Is Bullshit" article is the inspiration for building a platform that manages the relationship between Dev and Ops, which I think separates us from our competitors in the space.
This will never happen. You can’t own something you don’t understand.
Ops and Dev are different roles for a reason, and the only reason we’ve shifted away from that is to accelerate profits; yes, you can spend your way to growth, and yes, you can run massively complex systems on hardware you have never seen, nor understand. That doesn’t make it a good idea.
The hyperscalers have convinced people that you don’t need to know how to run a database, you can just use RDS et al. You don’t need to know how to manage K8s, you can just use EKS. This is dangerously untrue, because those tools are good enough that most people can get them going and they’ll work reasonably well, right up until they don’t (especially RDBMS). Then you hit edge cases that require you to have a solid understanding of their architecture, as well as Linux administration – for example, understanding how Postgres’ bgwriter operates, and how it is affected by disk IOPS and latency, not to mention various kernel-level tunings. None of this matters in the slightest with small DBs, say, < 100 GiB. It may not even matter at larger scales depending on your query patterns.
The various DB offerings (I’m going heavily on the DB example because that’s my current job) like Neon and Planetscale mostly have the right idea, IMO – stop assuming devs know or want to do ops work. They want an endpoint, and they want things to be performant, so have automatic schema and index reviews. Critically, in this model, devs are not responsible for nor accountable for the infra’s performance (more or less; discussions on schema design and its impact on DB performance aside). In other words, this has separated ops and dev.
I say they’ve mostly got it right because they do still allow you to make bad decisions, like chucking everything into a JSONB column, using UUIDv4 as a PK, etc. Realistically a service would fail if they tried refusing to allow that flexibility, so I get it.
For an in-house solution, though, this can be the case. The old school, rigid mentality of having to ask cranky graybeards to add a column had an extremely powerful benefit: bad decisions were much less likely to occur. Yes, it’s slower, and that’s a good thing. Everywhere I’ve been, bad decisions made in the name of velocity have come calling, and it’s a nightmare to fix them.
In summary, I like the idea of Massdriver quite a bit, actually; I just don’t think it’s a good idea nor accurate to say that it allows devs to be responsible for their own infra, because they largely don’t want that, nor are they capable. Not for lack of intelligence, but lack of experience. Let specialists be specialists. You want a DB? Ask the DB team, and don’t get mad when they tell you your proposed schema is hot garbage. You want more compute? Ask the infra team, and don’t be surprised when they ask you if you’ve profiled your code to determine if you actually need more.
DevOps Is Bullshit (2022) - https://news.ycombinator.com/item?id=36354049 - June 2023 (278 comments)
DevOps is broken - https://news.ycombinator.com/item?id=33274988 - Oct 2022 (348 comments)
What does a seat entail? You talk about self-serve (I love it!), but would the users that self-serve take up a seat? Or are seats just for the folks creating the modules?
For larger teams we do buckets of seats with price reductions per-seat as the team size goes up.
We don't rate limit deploys, as that would make you less agile, and we don't charge based on resources, because that penalizes teams with stable infrastructure.
We know ops budgets can be tight, we're ops engineers, so we try our best to make the pricing work well for those budgets.
* It turns the connections between components owned by different IaC tools/systems into _typed_ JSON. Think patch panels, where the connections are fully typed.
* Kelsey mentioned that you can introspect on the metadata in the live system using absolutely anything you want, down to just bash scripts. So it's _very_ hackable.
Glad you liked it! Yeah, the typed connections are a big part of what makes Massdriver powerful. It makes sure infrastructure components integrate right without devs having to worry about all the low-level details.
We're expanding the graph metadata with a querying system coming out of alpha soon that lets you ask stuff like "Where are all my t3 instances in production?" or "Which services are using a kubernetes version less than 1.25" Makes it way easier to understand what’s running where.
And since it’s all API-first, it’s easy to write quick scripts for reporting or automating changes across environments.
I’ll admit a graph query makes it easier, but the information is there.
This is nice because it's still all using Git, and you can continuously merge in updates as the 'best practice' definitions in the intent get updated. It allows teams to maintain their own customizations and eventually promote them up to the intent level to capture them as a best practice, so other teams can benefit from them.
Cloud APIs mix operational and developer concerns, which is part of the problem. A single API call for something like a database forces you to think about instance types, failover strategies, and security settings when all a developer really cares about is, "I need a database that can handle X traffic and supports Y extensions."
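To illustrate the split we're aiming for (hypothetical field names, not a real API): the developer-facing contract stays at the level of intent, and the ops-owned module maps that intent onto the low-level knobs.

    # What the developer asks for (intent):
    database:
      engine: postgres
      extensions: [postgis]
      expected_load: medium     # ops decide what "medium" maps to
    # What the ops-owned module derives from it: instance class, storage and IOPS,
    # multi-AZ failover, backup retention, encryption, security groups, subnets.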
I’m actually working on a write-up about abstractions for a CNCF infra white paper and would love to get your thoughts. A lot of teams struggle with the balance between standardization and flexibility, and it sounds like you’ve thought a lot about this too. Let me know if you’d be up for a chat.
Also, here’s a post I wrote about it recently:
https://www.massdriver.cloud/blogs/the-case-for-abstractions...
I actually have a boilerplating tool I use to set up projects and it picks a random name from a text file at ~/cool-names.txt ...