<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Keep Shipping]]></title><description><![CDATA[Ideas, mental models and strategies for AIOps, platform engineering, and making cloud infrastructure self-driven & invisible.]]></description><link>https://blog.localops.co</link><image><url>https://substackcdn.com/image/fetch/$s_!athx!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59ff1079-82e4-445b-bea1-9d58ed4ad9f5_240x240.png</url><title>Keep Shipping</title><link>https://blog.localops.co</link></image><generator>Substack</generator><lastBuildDate>Sun, 03 May 2026 00:55:57 GMT</lastBuildDate><atom:link href="https://blog.localops.co/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[LocalOps Inc.]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[localops@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[localops@substack.com]]></itunes:email><itunes:name><![CDATA[LocalOps Inc]]></itunes:name></itunes:owner><itunes:author><![CDATA[LocalOps Inc]]></itunes:author><googleplay:owner><![CDATA[localops@substack.com]]></googleplay:owner><googleplay:email><![CDATA[localops@substack.com]]></googleplay:email><googleplay:author><![CDATA[LocalOps Inc]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Introducing Projects]]></title><description><![CDATA[Isolate members & environments in your organization]]></description><link>https://blog.localops.co/p/introducing-projects-597</link><guid isPermaLink="false">https://blog.localops.co/p/introducing-projects-597</guid><dc:creator><![CDATA[Anand]]></dc:creator><pubDate>Tue, 28 Apr 2026 12:04:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!EA1l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Our customers wanted to isolate people &amp; environments across different <em>projects</em>, so that team members don&#8217;t deploy to, or even see, environments they don&#8217;t belong to.</p><p>We re-imagined and re-built <strong>Projects</strong> to solve this.</p><p>Along the way, we made a ton of UI enhancements to make it easier to create multiple projects, switch between them, and isolate cloud environments with clarity.</p><p>We consciously didn&#8217;t invest time in building complicated RBAC features. That keeps governance simple for admins and makes it easier for teams to isolate and secure resources without getting into a spaghetti of permission and role configurations.</p><h2>Projects</h2><p>A project groups together the environments and members working on one product, client, or agenda. Go to the &#8220;<strong>Projects</strong>&#8221; section under the Organization menu to create and manage projects. 
</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EA1l!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EA1l!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png 424w, https://substackcdn.com/image/fetch/$s_!EA1l!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png 848w, https://substackcdn.com/image/fetch/$s_!EA1l!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png 1272w, https://substackcdn.com/image/fetch/$s_!EA1l!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EA1l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png" width="1456" height="922" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:922,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1985133,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/195534448?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!EA1l!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png 424w, https://substackcdn.com/image/fetch/$s_!EA1l!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png 848w, https://substackcdn.com/image/fetch/$s_!EA1l!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png 1272w, https://substackcdn.com/image/fetch/$s_!EA1l!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F96b63c9a-5495-400a-b6e7-e667a9992769_3528x2235.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset 
pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Say you&#8217;re building an Agentic CRM product. You can create a project and organize environments as follows.</p><p><strong>Project Name: Agentic CRM</strong></p><p>Environments you can create and group under the project:</p><ol><li><p>CRM QA</p></li><li><p>CRM UAT</p></li><li><p>CRM Production</p></li></ol><p><strong>Members:</strong> Can include people who work on the CRM project. Unless they belong to the project, they won&#8217;t be able to access or deploy in above environments.</p><p>Unless someone belongs to Agentic CRM project, they can&#8217;t see or deploy in above environments.</p><p>Another example - say you are product studio/agency and your dev team is building applications for a client - Acme Co. You can then create a project as follows.</p><p><strong>Project Name: Acme Co</strong></p><p>Environments:</p><ol><li><p>ACME QA</p></li><li><p>ACME UAT</p></li><li><p>ACME Prod</p></li></ol><p>Or say you have <a href="https://localops.co/case-study/suprsend-unlocks-enterprise-revenue-byoc">BYOC distribution for your service</a> for your enterprise customers and you&#8217;re deploying &amp; managing such environments on the customer&#8217;s cloud accounts as In-VPC distribution, you can have a project as follows. <em>And assign specific DevOps engineers to manage such customer environments.</em></p><p><strong>Project Name: BYOC distribution</strong></p><p>Environments:</p><ul><li><p>Acme Co</p></li><li><p>Heist enterprise</p></li><li><p>Trust bank</p></li><li><p>and so on..</p></li></ul><p><strong>Default project:</strong> For every new signup/organization, we create a &#8220;Default" project for convenience. Any number of additional projects can be created as appropriate to your setup. </p><h2>Brand new navigation:</h2><p>Until now, the sidebar navigation had a flat menu structure that let users navigate between environments, connections and code repositories. </p><p>With Projects coming in as a native group method, we are introducing a project switcher at the top so users can switch between their projects in 1-click.</p><p>And we have cleaned up all the menu structure to clearly indicate which menus belong to a <strong>Project scope</strong> and which ones belong to their <strong>Organization&#8217;s scope</strong>. 
This gives clarity over what is isolated and grouped under projects and what functionality works at the organization scope. </p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tEGh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tEGh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png 424w, https://substackcdn.com/image/fetch/$s_!tEGh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png 848w, https://substackcdn.com/image/fetch/$s_!tEGh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png 1272w, https://substackcdn.com/image/fetch/$s_!tEGh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tEGh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png" width="1456" height="1175" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1175,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2658026,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/195534448?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tEGh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png 424w, https://substackcdn.com/image/fetch/$s_!tEGh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png 848w, https://substackcdn.com/image/fetch/$s_!tEGh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png 1272w, https://substackcdn.com/image/fetch/$s_!tEGh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F12b9874d-b4b3-4512-9824-280031653dfa_3147x2540.png 1456w" sizes="100vw" loading="lazy"></picture><div 
class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2>New clean URLs:</h2><p>We put a lot of effort in making the platform super simple to navigate and are a big fan of clean URLs. </p><p>We have now cleaned up all the page URLs of the console (&amp; our product documentation &#128517;) so that they now have a URL clearly indicating the hierarchical context of capabilities (Orgs &gt; Projects &gt; Environments).</p><p>Some examples for you:</p><ul><li><p>https://console.localops.co/localops-inc/org-projects</p></li><li><p>https://console.localops.co/localops-inc/projects/65a299d7-a1ee-4547-8f3e-14e6d0b61dd6/environments</p></li><li><p>https://console.localops.co/localops-inc/projects/65a299d7-a1ee-4547-8f3e-14e6d0b61dd6/members</p></li><li><p>https://console.localops.co/localops-inc/projects/65a299d7-a1ee-4547-8f3e-14e6d0b61dd6/deployments</p></li><li><p>https://console.localops.co/localops-inc/connections</p></li></ul><p>Our roadmap has more capabilities coming in to let teams organize their setup and manage their AWS environments with much more ease. </p><p><strong>Want to solve similar problems in your cloud operations?</strong></p><p>Get started with a quick demo at <a href="https://go.localops.co/tour">https://go.localops.co/tour</a>. Our engineers will guide you with a personalized tour.</p><p>Cheers.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Open-Source Heroku Alternatives: What Works in Production and What Doesn't]]></title><description><![CDATA[Thinking of using open-source Heroku alternatives for production? 
Here&#8217;s what works, what doesn&#8217;t, and the real cost most teams miss.]]></description><link>https://blog.localops.co/p/open-source-heroku-alternatives</link><guid isPermaLink="false">https://blog.localops.co/p/open-source-heroku-alternatives</guid><dc:creator><![CDATA[Nidhi Pandey]]></dc:creator><pubDate>Fri, 24 Apr 2026 14:06:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!E_qH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E_qH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E_qH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png 424w, https://substackcdn.com/image/fetch/$s_!E_qH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png 848w, https://substackcdn.com/image/fetch/$s_!E_qH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png 1272w, https://substackcdn.com/image/fetch/$s_!E_qH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E_qH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png" width="1345" height="1049" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1049,&quot;width&quot;:1345,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2278212,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/195335385?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fca77cfbd-4897-439f-a0c3-6017fe5527ed_1345x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E_qH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png 424w, https://substackcdn.com/image/fetch/$s_!E_qH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png 848w, 
https://substackcdn.com/image/fetch/$s_!E_qH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png 1272w, https://substackcdn.com/image/fetch/$s_!E_qH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F40151fa9-6f3a-44b9-adb3-cb120106cf13_1345x1049.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>TL;DR</strong></h2><p><strong>What this covers:</strong> What open-source Heroku alternatives actually deliver in production versus in demos, the real build-versus-buy cost of running a self-hosted alternative when engineering hours and on-call burden are included, and the long-term architectural risks that engineering leaders accept when choosing between a managed PaaS, a self-hosted alternative, and a cloud-native IDP.</p><p><strong>Who it is for:</strong> CTOs and VPs of Engineering evaluating Heroku alternatives in 2026, specifically teams that have looked at Coolify, Dokku, or CapRover and are trying to determine whether the open-source route is the right call for a production workload.</p><p><strong>The conclusion:</strong> Open-source Heroku alternatives are genuinely good for specific use cases: small teams, internal tooling, hobby workloads, and teams with a dedicated platform engineer who wants full control over every configuration surface. For scaling SaaS teams with production reliability requirements, enterprise compliance obligations, and engineering organisations where platform maintenance competes with product development for senior engineering hours, the total cost of a self-hosted alternative consistently exceeds that of a managed cloud-native IDP once all hours are included. 
This guide shows exactly where costs accumulate and what the architectural risks look like over a three-year horizon.</p><blockquote><p>Evaluating whether to build or buy your platform?<br>&#8594; <a href="https://localops.co/?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how a production-ready AWS environment is set up in 30 minutes</a></p></blockquote><h2><strong>The Open-Source Heroku Alternative Landscape in 2026</strong></h2><p>The field of open-source Heroku alternatives is active and well-populated. Teams evaluating the category in 2026 have meaningful options, each with a different point of view on how to replicate Heroku&#8217;s developer experience on self-hosted infrastructure.</p><p><strong>Dokku</strong> is the oldest and most established option. It describes itself as a &#8220;Docker-powered mini-Heroku&#8221; and implements the Heroku buildpack interface on a single server. Push to a Git remote and Dokku builds and deploys. The operational model is close to Heroku for simple applications. The scaling model is the primary limitation: Dokku runs on a single server.</p><p><strong>Coolify</strong> is the most actively developed option in 2026. It provides a web UI for managing applications, databases, and services across multiple servers. It supports Docker Compose deployments and has grown its feature set significantly. It has a strong following among indie developers and small teams.</p><p><strong>CapRover</strong> sits between Dokku and Coolify in complexity. It runs on Docker Swarm, supports multiple nodes, and provides a dashboard for application management. It is more capable than Dokku for multi-service architectures but less actively developed than Coolify.</p><p><strong>Kamal</strong> (formerly MRSK, from the Rails team at 37signals) takes a different approach: it deploys Docker containers to bare servers using SSH, with zero platform software running on the target servers. It is a deployment tool more than a platform, and it requires more operational involvement than the others.</p><p>Each of these tools has genuine strengths. None of them is a complete substitute for Heroku at production scale for a B2B SaaS team with reliability requirements and enterprise customers. 
The gap between what they demonstrate in a tutorial and what they require in production is where the real evaluation happens.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vsdM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vsdM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png 424w, https://substackcdn.com/image/fetch/$s_!vsdM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png 848w, https://substackcdn.com/image/fetch/$s_!vsdM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png 1272w, https://substackcdn.com/image/fetch/$s_!vsdM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vsdM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png" width="1240" height="652" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:652,&quot;width&quot;:1240,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:102062,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/195335385?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!vsdM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png 424w, https://substackcdn.com/image/fetch/$s_!vsdM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png 848w, https://substackcdn.com/image/fetch/$s_!vsdM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png 1272w, https://substackcdn.com/image/fetch/$s_!vsdM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5ae2f03e-d73c-4e78-8f73-8dc97ead4623_1240x652.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div 
class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Want to skip managing platform layers entirely?<br>&#8594; <a href="https://docs.localops.co/?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Explore how LocalOps handles infrastructure, deployments, and observability on AWS</a></p><h2><strong>What Open-Source Heroku Alternatives Actually Deliver in Production</strong></h2><p>The honest evaluation of open-source Heroku alternatives requires separating two distinct questions: what do they deliver at initial setup, and what do they require to operate reliably at production scale over time?</p><p>The initial setup story for most open-source Heroku alternatives is genuinely good. Dokku on a DigitalOcean droplet or a small EC2 instance can be running and deploying applications within an hour. Coolify&#8217;s setup process is well-documented and fast. For a team deploying a single application with a Postgres database and basic monitoring, these tools deliver the Heroku-like experience they promise.</p><p>The production-scale story is where the honest picture diverges from the tutorial experience.</p><p><strong>What works well in production:</strong></p><p><em><strong>Single-server deployments for low-traffic applications</strong>.</em> Dokku and Coolify are reliable for applications that fit on a single server with headroom. Internal tools, admin dashboards, low-traffic marketing sites, and development environments run well on these platforms. The operational model is simple because the infrastructure is simple.</p><p><em><strong>Teams with a dedicated platform engineer</strong>.</em> Open-source Heroku alternatives work in production when there is a person on the team whose job it is to maintain the platform, respond to platform failures, and keep the underlying infrastructure patched and healthy. The platform does not manage itself; it requires ownership. 
When that ownership is clearly assigned, and the person has the skills to carry it, these tools can support meaningful production workloads.</p><p><em><strong>Non-critical workloads where downtime is acceptable</strong>.</em> Tools, demos, staging environments, and internal services where an hour of downtime does not translate directly to customer impact or revenue loss tolerate the failure modes of self-hosted platforms better than customer-facing production services do.</p><p><strong>What breaks or degrades at the production scale:</strong></p><p><em><strong>Multi-node reliability</strong>.</em> Dokku is fundamentally a single-server platform. Dokku Scheduler plugins exist to add Kubernetes or Nomad scheduling, but they add significant operational complexity and are not production-proven at the same level as the core Dokku runtime. Coolify&#8217;s multi-server support is functional but has more surface area for failure than a managed Kubernetes cluster operated by a team with dedicated expertise.</p><p><em><strong>Automatic failover</strong>.</em> When the server running a Dokku deployment fails, the application goes down. Bringing it back up requires either manual intervention or a separately built and maintained automated recovery system. Heroku handles dyno failure transparently. Managed Kubernetes on EKS handles pod failure transparently. Self-hosted platforms without a managed control plane require explicit failover engineering.</p><p><em><strong>Observability at scale</strong>.</em> The default observability story for most open-source Heroku alternatives is thin. Coolify provides basic container metrics. Dokku provides application logs through dokku logs. Neither provides the integrated metrics collection, log aggregation, and dashboard correlation that production incident response requires. Teams running these platforms in production typically assemble a separate observability stack, Prometheus, Loki, and Grafana, on top of the platform, which is the same assembling problem that Heroku&#8217;s add-on model creates, shifted from a managed service to a self-hosted infrastructure project.</p><p><em><strong>Security patching cadence</strong>.</em> Open-source platforms run on servers that require OS-level patching, Docker runtime updates, and platform software updates. These updates need to happen on a cadence that matches the vulnerability disclosure cycle, which is roughly continuous. On a managed platform, Heroku or a cloud-native IDP, the platform team handles this. On a self-hosted alternative, someone on the engineering team handles it. That someone is typically the same person who handles production incidents, infrastructure changes, and developer support requests.</p><h2><strong>The Real Operational Limitations of Open-Source Heroku Alternatives</strong></h2><p>This is the gap between what open-source Heroku alternatives promise and what they deliver for scaling startup teams, and it is worth examining each limitation specifically rather than in aggregate.</p><p><strong>Limitation 1: Scaling models that don&#8217;t match production workload patterns.</strong></p><p>Dokku scales vertically by default: increase the server size to handle more load. Horizontal scaling in Dokku requires additional configuration and is not automatic. 
Coolify and CapRover support Docker Swarm for multi-node deployments, but Docker Swarm&#8217;s autoscaling model is limited compared to Kubernetes&#8217; Horizontal Pod Autoscaler.</p><p>For workloads with variable traffic, B2B SaaS applications that peak during business hours, consumer applications with campaign-driven spikes, and any application with non-linear load patterns, the inability to autoscale horizontally based on real-time resource metrics is a meaningful operational gap. Teams either overprovision the server (paying for idle capacity continuously) or accept that traffic spikes will cause degraded performance until someone manually adjusts the infrastructure.</p><p><strong>Limitation 2: Database management is the team&#8217;s responsibility.</strong></p><p>Heroku Postgres is a managed database: provisioning, backups, point-in-time recovery, connection pooling, version upgrades, and failover are handled by the platform. Open-source Heroku alternatives do not provide managed databases. Coolify can deploy a Postgres container; it does not manage it the way Heroku Postgres or Amazon RDS does.</p><p>Running a production Postgres database correctly, with automated backups, point-in-time recovery, failover, connection pooling, and a maintenance window strategy for version upgrades, is a non-trivial operational project. Teams that migrate to an open-source Heroku alternative and bring their Postgres database with them are taking on database operations as an engineering responsibility. For teams without database operations expertise, this is where production incidents happen.</p><p><strong>Limitation 3: SSL, networking, and security configuration are manual and ongoing.</strong></p><p>Heroku handles SSL termination, HTTP-to-HTTPS redirection, and TLS certificate renewal automatically. Open-source alternatives handle this through Let&#8217;s Encrypt integration (Coolify, Dokku, and CapRover all support this), but certificate renewal failures, DNS configuration errors, and networking changes are the team&#8217;s operational responsibility.</p><p>At low service count, this is manageable. As the service count grows, ten, fifteen, twenty services, the surface area for SSL and networking configuration errors grows proportionally. Each service is an additional certificate to renew, an additional DNS record to maintain, and an additional networking configuration that can drift from the intended state.</p><p><strong>Limitation 4: No compliance posture for enterprise customers.</strong></p><p>Open-source Heroku alternatives running on a VPS or small EC2 instance do not provide the compliance infrastructure that enterprise procurement requires: dedicated VPC, IAM-based access control with audit logs, network isolation between services, data residency in a specified region, and evidence of SOC 2 or equivalent security posture.</p><p>For B2B SaaS teams without enterprise ambitions, this may not matter today. For teams with any enterprise go-to-market motion, the infrastructure needs to satisfy the security questionnaire that accompanies every significant deal. An application running on a Coolify-managed Docker host cannot answer that questionnaire honestly. 
The compliance gap that exists on Heroku also exists on most open-source Heroku alternatives, and in some cases is larger, because self-hosted infrastructure adds an operational attack surface that managed platforms control more tightly.</p><blockquote><p>These are the exact gaps teams hit at scale<br>&#8594; <a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Talk to an engineer about how teams move off self-hosted setups without downtime</a></p></blockquote><p><strong>When should a scaling startup choose a managed cloud-native IDP instead?</strong></p><p>The decision point is not binary, but there are clear signals that the open-source route is the wrong call for a specific team:</p><ul><li><p>The team has more than five engineers, and product development competes directly with platform maintenance for senior engineering time.</p></li><li><p>The product has enterprise customers or is actively pursuing enterprise deals that will require security questionnaire responses.</p></li><li><p>The application has stateful workloads (background job queues, WebSocket connections, large file processing) that require reliable persistent compute.</p></li><li><p>The team does not have a designated platform engineer with Kubernetes or Docker Swarm expertise.</p></li><li><p>Production downtime directly impacts customer experience, SLAs, or revenue.</p></li></ul><p>When any of these signals applies, the TCO analysis for a self-hosted alternative consistently comes out worse than a managed cloud-native IDP when all costs are included. The following section works through why.</p><h2><strong>The True Build-Versus-Buy Cost of a Self-Hosted Heroku Alternative</strong></h2><p>This is the analysis that most engineering teams do not complete before choosing the open-source route, and the one that produces the most surprises when they do it retrospectively after six months of operation.</p><p>The surface-level cost comparison is straightforward: open-source Heroku alternatives are free to use, so the cost is just the infrastructure. A $40/month DigitalOcean droplet or a small EC2 instance is dramatically cheaper than Heroku&#8217;s production dyno tiers. The analysis looks obvious.</p><p>The cost that does not appear in that comparison is engineering time: initial setup, ongoing maintenance, incident response, and the opportunity cost of senior engineers spending hours on platform work instead of product development.</p><p><strong>The initial setup cost:</strong></p><p>Setting up a production-grade self-hosted Heroku alternative takes longer than tutorials suggest. The tutorial path (Dokku on a droplet, one application deployed) is genuinely fast. The production path (multi-server setup, managed database configuration, observability stack, SSL and networking configuration, backup verification, monitoring and alerting, runbook documentation) takes a senior engineer two to four weeks of focused effort.</p><p>For a company with fifteen engineers at an average fully-loaded cost of $200,000/year, a senior engineer costs approximately $100/hour. Two weeks of setup at 40 hours per week is $8,000. Four weeks is $16,000. This is the upfront engineering investment before the platform serves a single production request.</p><p>Most teams do not account for this cost because it feels like infrastructure work rather than a direct expenditure. It appears in sprint velocity metrics as product features not shipped. 
It does not appear in the infrastructure budget.</p><p><strong>The ongoing maintenance cost:</strong></p><p>After initial setup, a self-hosted Heroku alternative requires continuous maintenance. Security patches for the host OS, Docker runtime, and platform software need to be applied on a regular cadence. The observability stack needs to be maintained. SSL certificates need to be monitored. Backup integrity needs to be verified periodically. Capacity needs to be reviewed as the application grows.</p><p>Conservative estimate for a production self-hosted setup: two to four hours of platform maintenance per week. At $100/hour for a senior engineer, that is $800&#8211;$1,600/month in ongoing maintenance cost, consistently, before any incidents occur.</p><p>Over twelve months, the maintenance cost alone is $9,600&#8211;$19,200. This is in addition to the infrastructure cost, and it compounds with the service count as the platform&#8217;s surface area grows.</p><p><strong>The on-call burden:</strong></p><p>Self-hosted platforms fail in ways that managed platforms handle transparently. A pod crash on EKS restarts automatically. A disk filling on a self-hosted Coolify instance takes the application down until someone investigates and resolves it. A networking configuration change that breaks SSL certificate renewal on a Dokku server causes downtime until someone diagnoses and fixes it.</p><p>Production incidents on self-hosted infrastructure happen at off-hours with the same frequency as on managed platforms. The difference is who responds. On Heroku or a managed IDP, Heroku&#8217;s reliability team or the IDP provider&#8217;s on-call rotation handles infrastructure incidents. On a self-hosted alternative, someone from the engineering team handles them.</p><p>At a company with one person effectively on-call for the platform, this represents a meaningful quality-of-life cost that compounds into retention risk over time. Senior engineers who regularly wake up at 2 AM for platform incidents that managed infrastructure would have handled automatically do not stay in that role indefinitely.</p><p><strong>The opportunity cost: what the platform engineer is not building.</strong></p><p>This is the largest and least visible cost component. 
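</p><p>Before turning to that opportunity cost, the measurable components above can be put in one place. The sketch below is a back-of-the-envelope illustration only, using the assumed figures stated in this section (the $100/hour rate, the 2&#8211;4 week setup range, the 2&#8211;4 hours/week maintenance range, and a roughly $40/month server), not measurements from any particular team:</p><pre><code># Year-one cost sketch for a self-hosted Heroku alternative.
# All inputs are the assumptions stated in this article, not measured data:
#   - senior engineer at roughly $100/hour, fully loaded
#   - initial setup: 2-4 weeks of focused effort at 40 hours/week
#   - ongoing maintenance: 2-4 hours/week (4 weeks/month approximation)
#   - infrastructure: about $40/month for a small droplet or EC2 instance
HOURLY_RATE = 100              # USD per senior-engineer hour
SETUP_WEEKS = (2, 4)           # low / high estimate
MAINT_HOURS_PER_WEEK = (2, 4)  # low / high estimate
INFRA_PER_MONTH = 40           # USD

setup = [w * 40 * HOURLY_RATE for w in SETUP_WEEKS]                  # [8000, 16000]
maint_monthly = [h * 4 * HOURLY_RATE for h in MAINT_HOURS_PER_WEEK]  # [800, 1600]
maint_yearly = [m * 12 for m in maint_monthly]                       # [9600, 19200]
infra_yearly = INFRA_PER_MONTH * 12                                  # 480

year_one = [s + m + infra_yearly for s, m in zip(setup, maint_yearly)]
print(f"Setup:                   ${setup[0]:,} to ${setup[1]:,}")
print(f"Maintenance, 12 months:  ${maint_yearly[0]:,} to ${maint_yearly[1]:,}")
print(f"Infrastructure, 12 mo:   ${infra_yearly:,}")
print(f"Year-one total:          ${year_one[0]:,} to ${year_one[1]:,}")
</code></pre><p>Even at the low end of that range, engineering hours dominate the infrastructure bill, and the figure excludes incident response and the opportunity cost discussed next.</p><p>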
Every hour a senior engineer spends on platform maintenance, incident response, and infrastructure configuration is an hour not spent on the product features, performance improvements, and technical debt reduction that drive business value.</p><p>For a team at Series A with a product roadmap full of competitive priorities, redirecting a senior engineer&#8217;s capacity toward platform maintenance is a strategic cost that appears nowhere in the infrastructure budget and shows up instead in delayed product releases and compressed competitive advantage.</p><p><strong>The full cost comparison:</strong></p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/UvuiL/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a4072a1b-362f-45b8-8d20-605fe516ceb7_1220x932.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/549e808d-57a1-4960-95f5-8ac709a3b496_1220x932.png&quot;,&quot;height&quot;:463,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/UvuiL/1/" width="730" height="463" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><h2><strong>The Long-Term Architectural Risks of Each Path</strong></h2><p>This is the analysis that matters most for engineering leaders making a three-year infrastructure bet. The day-one costs are relatively predictable. The long-term architectural implications of the hosting choice shape the company&#8217;s technical trajectory in ways that are harder to reverse than they appear at the decision point.</p><p><strong>Risk Profile 1: Managed PaaS Heroku Alternatives (Render, Railway, Fly.io)</strong></p><p>Managed PaaS alternatives to Heroku provide a genuine improvement on Heroku&#8217;s developer experience in some areas and similar constraints in others. They are worth examining honestly as a category before discussing the IDP path.</p><p><em><strong>What improves over Heroku:</strong></em> Better pricing models in most cases, better geographic distribution options, more modern infrastructure primitives, and more active product development. Render and Railway, in particular, have improved the developer experience meaningfully relative to Heroku.</p><p><em><strong>What does not improve:</strong></em> The fundamental architecture remains the same. The application runs on the provider&#8217;s infrastructure, not in the team&#8217;s cloud account. The compliance posture is governed by the provider&#8217;s security posture, not the team&#8217;s. Infrastructure control is limited by what the provider exposes. 
The vendor dependency is as deep as Heroku&#8217;s, and in some cases deeper, because modern PaaS platforms have fewer migration paths than Heroku did.</p><p><em><strong>The long-term architectural risk:</strong></em> Three years into using a managed PaaS alternative, the team has made application architecture decisions shaped by the platform&#8217;s constraints, built deployment workflows around the platform&#8217;s model, and accumulated operational familiarity with a platform that may change pricing, deprecate features, or change ownership. The compliance and infrastructure control limitations that eventually drove the team off Heroku are structurally present in every managed PaaS alternative. The team is on a path that leads back to the same decision in a different form.</p><p><strong>Risk Profile 2: Self-Hosted Open-Source Alternatives (Coolify, Dokku, CapRover)</strong></p><p>Self-hosted alternatives transfer the operational risk from the vendor to the engineering team. This is a genuine benefit for teams with the expertise and capacity to absorb it. For most scaling SaaS teams, it is a risk transfer that creates more exposure than it eliminates.</p><p><em><strong>The platform drift risk:</strong></em> Open-source platforms are maintained by communities with priorities that may not align with the production requirements of a scaling SaaS company. Coolify&#8217;s roadmap is driven by its maintainers&#8217; priorities, not by what the teams running it in production need. A critical security vulnerability in an open-source platform&#8217;s dependency may be patched in weeks or months, not in the hours a commercial managed platform provider would take. Teams running production workloads on open-source platforms accept that the platform&#8217;s maintenance cadence is outside their control.</p><p><em><strong>The expertise dependency risk:</strong></em> Self-hosted platforms run reliably when someone on the team understands them deeply. That expertise lives in a person, not in the platform. When the person who set up and maintains the Dokku or Coolify installation leaves the company, the institutional knowledge leaves with them. The team inherits a production system they did not build, with configuration choices they do not fully understand, running on infrastructure they now own completely. This is a common and underappreciated operational risk for engineering-led companies.</p><p><em><strong>The scale ceiling risk:</strong></em> Open-source Heroku alternatives have natural scale ceilings that are lower than those of Kubernetes-native platforms. Dokku hits its ceiling at single-server scale. Coolify and CapRover on Docker Swarm hit their ceiling below what a mid-sized production workload on EKS handles routinely. Teams that choose a self-hosted alternative may find themselves re-platforming again within two to three years as the workload outgrows the platform&#8217;s scaling model. 
Re-platforming has a cost that compounds with each cycle, and it diverts engineering capacity from product development at exactly the growth stage where product velocity matters most.</p><p><strong>Risk Profile 3: Cloud-Native IDP on Your Own AWS Account (LocalOps)</strong></p><p>The long-term architectural risk profile of a cloud-native IDP differs from both managed PaaS and self-hosted alternatives in one structural dimension: the infrastructure lives in the team&#8217;s AWS account.</p><p><em><strong>What this means for long-term risk:</strong></em> The vendor dependency is on the IDP&#8217;s operational interface, the deployment experience, the observability dashboards, and  the environment management UI, not on the infrastructure itself. If the IDP provider changes pricing, changes the product, or ceases operation, the EKS cluster, RDS databases, ElastiCache instances, and S3 buckets continue running in the team&#8217;s AWS account. The migration path is to adopt a different operational interface on top of the same infrastructure, not to re-platform the infrastructure from scratch.</p><p><em><strong>The compliance posture compounds positively:</strong></em> An application running on AWS-native infrastructure inside the team&#8217;s own VPC, with IAM-based access control and AWS-native audit logging, starts from a compliance posture that can support enterprise customers from day one. As the company grows and enterprise procurement becomes more rigorous, the infrastructure posture improves by configuration, enabling additional AWS compliance features, adding audit log retention, enabling GuardDuty, rather than by platform migration.</p><p><em><strong>The scale ceiling is the AWS scale ceiling:</strong></em> EKS scales to the capacity of the AWS region. There is no platform-imposed scale ceiling below the hyperscaler ceiling. Teams that grow from five services to fifty services, from one AWS region to three, and from $1M ARR to $100M ARR do not re-platform their infrastructure along the way; they scale the same Kubernetes and AWS-native services that were there from the beginning.</p><p><em><strong>The architectural risk to manage:</strong></em><strong> </strong>The risk of a cloud-native IDP path is primarily in the initial adoption decision: choosing a provider whose infrastructure model, pricing, and roadmap alignment fit the team&#8217;s three-year trajectory. This risk is manageable through due diligence at the selection point, examining the IDP&#8217;s architecture (does infrastructure live in your account?), pricing model (does cost scale reasonably with your workload growth?), and migration path (what happens if you need to change providers?).</p><h2><strong>How LocalOps Addresses the Build-Versus-Buy Decision</strong></h2><p>LocalOps is an AWS-native Internal Developer Platform that resolves the build-versus-buy tension for scaling SaaS teams, replacing Heroku.</p><p>The build case for a self-hosted alternative rests on cost and control. LocalOps preserves both while eliminating the engineering overhead of operating the platform. Infrastructure runs in the team&#8217;s AWS account, with full VPC control, full IAM visibility, and full compliance posture. The platform fee replaces the engineering hours that self-hosted alternatives require.</p><p>The buy case for a managed PaaS rests on simplicity and developer experience. 
LocalOps matches the simplicity of managed PaaS, push to branch, service deploys, logs and metrics are immediately available, without the compliance constraints and infrastructure opacity that come with a provider-owned cloud environment.</p><p>Connect your AWS account. Connect your GitHub repository. LocalOps provisions a dedicated VPC, EKS cluster, load balancers, IAM roles, and a complete observability stack automatically. No Terraform. No Helm charts. First environment ready in under 30 minutes. No platform maintenance, no on-call burden for infrastructure incidents, no observability setup project.</p><p>The infrastructure stays in the team&#8217;s AWS account permanently. The engineering hours that a self-hosted alternative would consume go back to the product roadmap.</p><blockquote><p><em>&#8220;Their thoughtfully designed product and tooling entirely eliminated the typical implementation headaches. Partnering with LocalOps has been one of our best technical decisions.&#8221;</em> &#8212; Prashanth YV, Ex-Razorpay, CTO and Co-founder, Zivy</p><p><em>&#8220;Even if we had diverted all our engineering resources to doing this in-house, it would have easily taken 10&#8211;12 man months of effort, all of which LocalOps has saved for us.&#8221;</em> &#8212; Gaurav Verma, CTO and Co-founder, SuprSend</p></blockquote><div><hr></div><h2><strong>Frequently Asked Questions</strong></h2><ol><li><p><strong>When is Dokku or Coolify the right answer for a production workload?</strong></p></li></ol><p>Dokku and Coolify are the right answer when the workload is genuinely simple, the team has a dedicated platform engineer with operational enthusiasm, and downtime does not directly impact customer experience or revenue. Internal tools, admin dashboards, low-traffic APIs, and development environments are well-served by open-source alternatives. The wrong answer is applying these tools to customer-facing production services at Series A scale without accounting for the operational cost of running and maintaining the platform under real production conditions.</p><ol start="2"><li><p><strong>What is the most common failure mode when teams self-host a Heroku alternative in production?</strong></p></li></ol><p>The most consistent failure mode is not a technical failure; it is an expertise departure. The engineer who set up and understood the self-hosted platform leaves the company. The team inherits a production system with undocumented configuration choices, dependency versions that have drifted from the original setup, and no institutional knowledge of why specific decisions were made. Recovering from this situation typically requires a platform audit, a re-provisioning project, and several weeks of senior engineering time, at a moment when that time is usually least available. Managed platforms eliminate this risk category because the platform knowledge lives in the provider&#8217;s team, not in a single employee.</p><ol start="3"><li><p><strong>How does the compliance posture of self-hosted alternatives compare to LocalOps on AWS?</strong></p></li></ol><p>Self-hosted Heroku alternatives running on a VPS or small EC2 instance have a weaker compliance posture than applications running in a properly configured AWS VPC on LocalOps. VPC isolation, IAM-based access control, audit logging, and data residency are native features of the AWS environment that LocalOps provisions automatically. On a self-hosted alternative, these capabilities require explicit engineering work to implement, document, and maintain. 
For B2B SaaS teams pursuing enterprise customers, the compliance gap between a self-hosted alternative and an AWS-native IDP is the compliance gap between infrastructure that fails a security questionnaire and infrastructure that passes one.</p><ol start="4"><li><p><strong>What does the migration path look like if a team wants to move from Coolify or Dokku to LocalOps?</strong></p></li></ol><p>The migration involves three steps: containerising any applications that are not already containerised (most Dokku-deployed applications are Heroku-buildpack-based and require Dockerfiles), migrating managed services (Postgres to RDS, Redis to ElastiCache), and connecting the GitHub repository to LocalOps for automated deployments. LocalOps provides the environment infrastructure (VPC, EKS cluster, load balancers, IAM roles, observability stack) automatically, so teams are not building the AWS environment from scratch. The migration from a self-hosted alternative to LocalOps is typically faster than the original setup of the self-hosted alternative, because LocalOps automates the infrastructure provisioning that the self-hosted setup required manual work to achieve.</p><ol start="5"><li><p><strong>Is the open-source alternative path ever cheaper than a managed IDP over a three-year horizon?</strong></p></li></ol><p>For teams with a dedicated platform engineer whose primary role is infrastructure operations and whose time is not competing with product development, the total cost of ownership for a well-run self-hosted alternative can match or slightly undercut a managed IDP over three years. This is the scenario where the build path makes financial sense: the engineering cost is already allocated to infrastructure work, and the open-source alternative gives that engineer maximum control. For teams where the platform engineer is also a product engineer, where platform maintenance competes directly with feature development, the open-source path is more expensive over a three-year horizon in every realistic model. The engineering opportunity cost of platform maintenance consistently exceeds the managed IDP&#8217;s platform fee.</p><ol start="6"><li><p><strong>What architectural signals indicate that a team has outgrown its self-hosted Heroku alternative?</strong></p></li></ol><p>The clearest signals are: traffic spikes that cause degraded performance because autoscaling is not available or not automatic, incident response time that is measured in hours rather than minutes because platform observability is insufficient, security questionnaire line items that cannot be answered honestly because the infrastructure lacks VPC isolation and IAM controls, and deployment pipelines that require manual steps because the CI/CD integration on the self-hosted platform does not support the team&#8217;s Git workflow. Any one of these signals is worth evaluating against the TCO comparison above. Multiple signals simultaneously indicate that the re-platforming project will be more costly the longer it is deferred.</p><h2><strong>Key Takeaways</strong></h2><p>Open-source Heroku alternatives are real products with genuine use cases. 
They are not the right answer for every team, and the evaluation should be made with the full cost picture visible, not just the infrastructure bill.</p><p>The TCO analysis for a self-hosted Heroku alternative has four components: infrastructure (cheap), initial setup engineering (significant upfront), ongoing maintenance engineering (continuous and compounding), and on-call burden (invisible in budgets, visible in retention). When all four are included, the self-hosted path is cheaper than a managed IDP only when a dedicated platform engineer&#8217;s time is already fully allocated to infrastructure work and not competing with product development.</p><p>The long-term architectural risk of the self-hosted path is the scale ceiling, the expertise dependency, and the platform drift: three risk categories that compound silently over two to three years and surface as a re-platforming project at the moment of least available engineering capacity.</p><p>The long-term architectural risk of the managed PaaS path is the infrastructure opacity, the compliance ceiling, and the vendor dependency on a provider whose infrastructure the team does not own: risks that surface specifically when enterprise customers arrive and the security questionnaire requires answers the platform cannot support.</p><p>The cloud-native IDP path on the team&#8217;s own AWS account resolves both risk profiles: the platform is managed (no maintenance burden), but the infrastructure is owned (compliance posture, vendor independence, no scale ceiling). The engineering hours that a self-hosted alternative would consume return to the product roadmap. The compliance posture that a managed PaaS cannot provide is native to the AWS environment from day one.</p><p><strong><a href="https://cal.com/anand-localops/migrate-from-heroku-to-aws?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026&amp;_gl=1*1rbbcig*_gcl_au*MTIxMDY4ODQyMy4xNzczODMxMDQz*_ga*MTQzNTE0NTI4LjE3NzM4MzEwNDI.*_ga_YJTCX7ZE2G*czE3NzcwMzc4NjUkbzI1JGcxJHQxNzc3MDM4OTQwJGo0NCRsMCRoMA..&amp;duration=30">Get Started with LocalOps</a> &#8594;</strong> First production environment on AWS in under 30 minutes. 
No credit card required.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026&amp;_gl=1*1rbbcig*_gcl_au*MTIxMDY4ODQyMy4xNzczODMxMDQz*_ga*MTQzNTE0NTI4LjE3NzM4MzEwNDI.*_ga_YJTCX7ZE2G*czE3NzcwMzc4NjUkbzI1JGcxJHQxNzc3MDM4OTQwJGo0NCRsMCRoMA..">Schedule a Migration Call </a>&#8594;</strong> Our engineers walk through your specific stack, whether you&#8217;re currently on Coolify, Dokku, or Heroku, and map the migration path to AWS.</p><p><strong><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026&amp;_gl=1*nzmp8o*_gcl_au*MTIxMDY4ODQyMy4xNzczODMxMDQz*_ga*MTQzNTE0NTI4LjE3NzM4MzEwNDI.*_ga_YJTCX7ZE2G*czE3NzcwMzc4NjUkbzI1JGcxJHQxNzc3MDM4OTQwJGo0NCRsMCRoMA..">Read the Heroku Migration Guide</a> &#8594;</strong> Full technical walkthrough: database migration, environment setup, CI/CD pipeline configuration, DNS cutover.</p>]]></content:encoded></item><item><title><![CDATA[How Internal Developer Platforms Give Engineering Teams Full Observability Without Manually Configuring Grafana]]></title><description><![CDATA[Why per-environment observability breaks down with manual setup, and how an Internal developer platform fixes it.]]></description><link>https://blog.localops.co/p/internal-developer-platform-observability-grafana-prometheus</link><guid isPermaLink="false">https://blog.localops.co/p/internal-developer-platform-observability-grafana-prometheus</guid><dc:creator><![CDATA[Madhushree Sivakumar]]></dc:creator><pubDate>Fri, 24 Apr 2026 04:30:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!tftk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tftk!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tftk!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png 424w, https://substackcdn.com/image/fetch/$s_!tftk!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png 848w, https://substackcdn.com/image/fetch/$s_!tftk!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!tftk!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tftk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png" width="1456" height="1761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6476748,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/195201968?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tftk!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png 424w, https://substackcdn.com/image/fetch/$s_!tftk!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png 848w, https://substackcdn.com/image/fetch/$s_!tftk!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!tftk!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F599c555f-2bd4-4ae7-9c57-76cc4ba99a25_1984x2400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p style="text-align: justify;"></p><p style="text-align: justify;">You have three environments: dev, staging, production. Each one has a Grafana instance someone set up months ago. Staging has different dashboard versions than production because whoever set it up used a different Helm chart version. 
The dev Prometheus stopped scraping correctly after a node replacement, but nobody noticed until a bug in dev could not be reproduced in staging. The production Grafana still has the default admin password because rotating it means logging into a cluster nobody touches unless something breaks.</p><p style="text-align: justify;">This is not unusual. It is the default state for teams that treat observability as something you configure after provisioning an environment, rather than part of provisioning itself.</p><h3>TL;DR</h3><ul><li><p style="text-align: justify;">The Prometheus + Loki + Grafana stack covers infrastructure metrics and application logs automatically. Custom application metrics still require code-level instrumentation.</p></li><li><p style="text-align: justify;">Multi-environment observability breaks down not at setup, but over time &#8212; through version drift, storage misconfiguration, and scrape failures that produce silence instead of errors.</p></li><li><p style="text-align: justify;">An internal developer platform that provisions observability at environment creation removes per-environment setup work and the drift that follows.</p></li><li><p style="text-align: justify;">Infrastructure metrics (CPU, memory, pod restarts) are not the same as application metrics. A pod with a normal CPU can be returning 500s on 30% of requests. Infrastructure metrics will not surface that.</p></li><li><p style="text-align: justify;">For BYOC distribution and single-tenant enterprise deployments, centralized observability creates data residency problems. Per-environment, in-cluster observability fits better.</p></li><li><p style="text-align: justify;">The ops.json instrumentation pattern reduces the friction of surfacing custom application metrics in Grafana without writing ServiceMonitor CRDs.</p></li></ul><h3>What Does &#8220;Built-In Observability&#8221; Actually Mean in an IDP</h3><p style="text-align: justify;">The phrase gets used loosely. It can mean anything from a link to your existing Datadog account to a fully provisioned monitoring stack running inside the same Kubernetes cluster as your application. Those are not equivalent.</p><p style="text-align: justify;">The standard open-source stack has three components.</p><p style="text-align: justify;"><strong>Prometheus</strong> is a time-series metrics scraper. It pulls data from instrumented endpoints and from Kubernetes system components at a configured interval. Two categories of data worth separating: infrastructure metrics (CPU utilization, memory pressure, pod restart counts, deployment replicas) collected automatically via kube-state-metrics and node-exporter, and application metrics (request rate, error rate, queue depth, cache hit ratio) which require explicit instrumentation in the application code. Most observability content blurs this line. The distinction matters when something goes wrong.</p><p style="text-align: justify;"><strong>Loki</strong> is a log aggregator. It runs a DaemonSet-based agent called Promtail on each Kubernetes node, collecting container stdout and stderr. Unlike Elasticsearch,<a href="https://grafana.com/oss/loki/"> Loki indexes only labels &#8212; not log content</a>. Log lines are grouped into streams and tagged with labels like pod name, namespace, and container. 
That keeps storage costs lower and makes logs available for querying within milliseconds.</p><p style="text-align: justify;"><strong>Grafana</strong> connects to both Prometheus and Loki as data sources and is where the actual debugging happens. You see a CPU spike in Prometheus, jump to the same time window in Loki, read the log lines from the pod that caused it. Without that correlation, you are looking at two separate interfaces with no direct link.</p><p style="text-align: justify;">One thing this stack does not cover: distributed tracing. Following a request through five microservices requires Tempo or Jaeger plus OpenTelemetry instrumentation &#8212; a separate layer entirely. Teams that set up Prometheus and Loki and consider themselves done will eventually hit a slow request they cannot explain, because they have no way to trace it through the service graph.</p><h3>Why Multi-Environment Observability Breaks Down Without a Platform</h3><p style="text-align: justify;">The initial setup takes a day or two. The maintenance does not stop there.</p><h4 style="text-align: justify;">Environment drift </h4><p style="text-align: justify;">Prometheus 0.62 on production, 0.58 on staging, because someone updated one and not the other. Different scrape configurations. Different alert rule formats. Dashboards exported from production as JSON fail on import to staging because label names shifted between versions. You find this out during an incident when you need staging to tell you something and it cannot.</p><p style="text-align: justify;">The less visible failure mode is this: a developer ships a new feature and moves on. No metrics added, no log events instrumented, no alerts defined. The feature runs in production with no visibility into how it actually behaves under load. The gap does not surface until something breaks, at which point the investigation starts from scratch with no baseline to work from.</p><h4 style="text-align: justify;">The node replacement problem</h4><p style="text-align: justify;">Prometheus stores time-series data on the local disk of whichever node it lands on. In an EKS managed node group, nodes get replaced during version updates or when the ASG replaces an unhealthy instance. If there is no PersistentVolumeClaim backed by an EBS volume that survives pod restarts, the metrics history is gone.<a href="https://dasroot.net/posts/2026/04/observability-stack-prometheus-grafana-loki/"> Production setups require specifying a StorageClass like gp3 with a persistent volume claim</a> to keep data across restarts. Most Helm-based tutorials skip this step. Teams learn it the first time a cluster update wipes two weeks of metrics.</p><h4 style="text-align: justify;">The scrape configuration failure mode </h4><p style="text-align: justify;">When Prometheus fails to scrape a target, it does not throw an error. It just has no data. An engineer adds a ServiceMonitor, deploys a service, sees nothing in Grafana, and spends 45 minutes checking RBAC permissions, port declarations, pod annotations, and label selector mismatches &#8212; all of which produce the same symptom. Silence. It is one of the more tedious debugging loops in Kubernetes.</p>
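<p style="text-align: justify;">One way to shorten that loop is to ask Prometheus directly which targets it is scraping and why any of them are failing, instead of inferring it from empty Grafana panels. A minimal sketch against the Prometheus HTTP API (the URL and the port-forward here are assumptions for illustration, not something this stack configures for you):</p><pre><code>
# check_targets.py - list scrape targets and their health from the Prometheus API.
# Assumes Prometheus is reachable locally, e.g. via
#   kubectl port-forward svc/prometheus-server 9090:9090
import requests

PROM_URL = "http://localhost:9090"

resp = requests.get(f"{PROM_URL}/api/v1/targets", timeout=10)
resp.raise_for_status()

for target in resp.json()["data"]["activeTargets"]:
    job = target["labels"].get("job", "unknown")
    if target["health"] == "up":
        print(f"up    {job:30} {target['scrapeUrl']}")
    else:
        # lastError surfaces connection refusals, RBAC 403s, and wrong ports -
        # the same failures that show up in Grafana only as missing data.
        print(f"DOWN  {job:30} {target['scrapeUrl']}  {target['lastError']}")
</code></pre><p style="text-align: justify;">A target that never appears in this list at all usually points at the ServiceMonitor selector or port declaration rather than at the pod itself.</p><h4 style="text-align: justify;">Per-environment provisioning cost</h4><p style="text-align: justify;">Every new environment is another Prometheus, Loki, and Grafana to install, configure, and connect. 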
Dev, staging, production, a canary, and a few customer-dedicated deployments add up to a non-trivial maintenance surface.</p><p style="text-align: justify;"><a href="https://docs.localops.co/environment/inside?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp">LocalOps handles this by provisioning </a>Prometheus, Loki, and Grafana as Kubernetes companion deployments inside every environment, running in the same cluster as the application. It does not eliminate the underlying tools or their operational characteristics, but it removes the per-environment setup and the drift that accumulates when configuration is manual.</p><h3>How IDPs Handle Grafana and Prometheus Monitoring Out of the Box</h3><p style="text-align: justify;">&#8220;Out of the box&#8221; can mean a lot of things. The architecture is worth being specific about.</p><p style="text-align: justify;">In a provisioned environment using EKS on AWS, an internal developer platform gives you Prometheus, Loki, and Grafana running as Kubernetes workloads inside the same VPC and EKS cluster as your application services. Prometheus scrapes internal endpoints. Loki receives logs from Promtail on each node. Grafana comes up with both already registered as data sources.</p><p style="text-align: justify;">The co-location has practical consequences beyond tidiness. Prometheus scrape calls stay internal to the cluster. Loki log ingestion happens over the cluster network. For BYOC and single-tenant deployments, this matters: log and metric data does not need to cross the VPC boundary. That is often a hard requirement in enterprise procurement, not a preference.</p><p style="text-align: justify;"><a href="https://docs.localops.co/environment/monitoring/logs?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp">LocalOps follows this model</a> &#8212; Prometheus, Loki, and Grafana are provisioned as companion Kubernetes deployments inside every environment, with data sources pre-registered, running in the same cluster as the application.</p><p style="text-align: justify;">&#8220;Pre-configured&#8221; means Grafana has Prometheus and Loki registered as data sources before anyone logs in. No separate step where someone types the Prometheus service name into the UI, tests the connection, and troubleshoots an unhelpful error because the URL had a typo. That step is where a significant portion of manual setups fail.</p><p style="text-align: justify;">Worth being clear about one thing: infrastructure metrics and logs are available immediately. Custom application metrics still require instrumentation at the code level. The platform handles the plumbing; it does not write your /metrics endpoint for you.</p><h3>Infrastructure Metrics vs. Application Metrics</h3><p style="text-align: justify;">A Grafana dashboard showing CPU, memory, and pod restart counts does not mean your application is working correctly. A pod running at 40% CPU with zero restarts can still return database timeout errors on every third request. Infrastructure metrics will not show that. Prometheus will not show it unless someone instrumented the application.</p><p style="text-align: justify;"><strong>Infrastructure metrics</strong> are collected automatically once Prometheus is running: node CPU, memory, pod phase, deployment replicas, persistent volume capacity. They tell you whether the cluster is healthy. 
They do not tell you whether the application is doing what users expect.</p><p style="text-align: justify;"><strong>Application metrics</strong> require the application to expose a /metrics endpoint using a Prometheus client library. Go, Java, Python, and Rust have official libraries. Node.js, Ruby, and others have community-supported ones. The application defines counters, gauges, and histograms and exposes them at the endpoint. Prometheus scrapes it on a configured interval.</p><p style="text-align: justify;">The metrics that reflect actual user experience sit in this second category. Request rate per endpoint. p95 and p99 latency. Background job processing time. External API failure rate. Queue depth. These are the signals that tell you whether the system is working from a user&#8217;s perspective &#8212; not just from the cluster&#8217;s.</p><p style="text-align: justify;">Getting application metrics into Grafana manually involves: instrumenting the application with a client library, exposing /metrics, creating a ServiceMonitor CRD, matching label selectors to the correct Prometheus instance, verifying RBAC for cross-namespace access, and confirming scrape status in the Prometheus targets UI. When any of these steps is wrong, the metrics do not appear. There is no error that tells you which step failed.</p><p>Here&#8217;s how <a href="https://docs.localops.co/environment/services/instrument-service?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp">LocalOps handles this with a declaration in ops.json</a>:</p><pre><code>{
  "metrics": {
    "endpoint": "/metrics",
    "interval": 15
  }
}
</code></pre><p style="text-align: justify;">The platform registers the endpoint with Prometheus. The custom metrics appear in Grafana. The ServiceMonitor, RBAC binding, and Prometheus configuration are handled without a separate debugging loop.</p>
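<p style="text-align: justify;">The declaration covers the scrape side; the application still has to define and expose the metrics themselves. A minimal sketch of what that looks like with the official Python client (prometheus_client), using illustrative metric names and a placeholder port rather than anything LocalOps prescribes:</p><pre><code>
# instrumented_worker.py - expose custom application metrics at /metrics.
# Metric names, labels, and the port are examples, not platform requirements.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

ORDERS = Counter("orders_processed_total", "Orders processed", ["status"])
LATENCY = Histogram("order_processing_seconds", "Time spent processing one order")

def process_order():
    with LATENCY.time():                 # records the duration into the histogram
        time.sleep(random.uniform(0.01, 0.2))
    ORDERS.labels(status="ok").inc()     # increments the labelled counter

if __name__ == "__main__":
    start_http_server(8000)              # serves /metrics on port 8000
    while True:
        process_order()
</code></pre><p style="text-align: justify;">With the endpoint live, the ops.json declaration above is what tells the platform to scrape it; in a manual setup, this is the point where the ServiceMonitor and RBAC work would begin.</p><p style="text-align: justify;">If you want to walk through this on your own codebase,<a href="https://go.localops.co/tour"> our engineers can show you how it works</a> in a live environment.</p><h3>How Do You Give Developers Log and Metric Access Without Exposing Cloud Infrastructure?</h3><p style="text-align: justify;">The obvious answers create problems. Giving developers kubectl access to production is a large blast radius. Giving broad IAM read access to the AWS account raises audit and compliance concerns. Neither is a clean answer.</p><p style="text-align: justify;">Grafana works as the access layer because it separates observability from infrastructure control. Developers query logs in LogQL and metrics in PromQL through the Grafana UI. They get visibility into system behavior without needing direct cluster or cloud console access.</p><p style="text-align: justify;">Through Grafana, a developer can see application logs from their services, infrastructure metrics for the cluster, deployment timestamps, and resource utilization. With proper access controls configured, they cannot touch the Kubernetes control plane, access cloud account credentials, or see data from unrelated services or environments.</p><p style="text-align: justify;">The separation matters for a specific reason: observability tells you what happened. It does not give you the ability to change anything. Developers can investigate. Operational control stays restricted.</p><p style="text-align: justify;">This carries over to customer-dedicated environments. 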
A vendor&#8217;s engineering team can access Grafana for a specific customer environment, review logs and metrics, and debug a support issue &#8212; without needing IAM access to the customer&#8217;s AWS account. The observability layer has the data. The underlying infrastructure stays isolated.</p><h3>Per-Customer Observability: Why Centralized Monitoring Breaks Down for BYOC Deployments</h3><p style="text-align: justify;">When a B2B SaaS company starts supporting enterprise customers who need their own cloud infrastructure &#8212; for data residency, compliance, or isolation &#8212; the observability model gets complicated fast.</p><p style="text-align: justify;">The common reaction is Prometheus federation: a central Prometheus instance in the vendor&#8217;s account scrapes from per-customer Prometheus instances.<a href="https://developers.mattermost.com/blog/cloud-monitoring/"> Mattermost documented this pattern</a> across multiple Kubernetes clusters and multiple AWS VPCs. The implementation involved a central monitoring cluster, cross-VPC networking, private load balancers, Route 53 private hosted zones, and a Lambda function to handle dynamic cluster registration. It works. It is also a substantial ongoing infrastructure commitment.</p><p style="text-align: justify;">And it still does not fully solve the log problem. In regulated industries like financial services or healthcare, application logs frequently cannot leave the customer&#8217;s cloud account. Metrics are sometimes acceptable to aggregate centrally. Logs often are not.</p><p style="text-align: justify;">A per-environment, in-cluster model fits this constraint better. Each customer environment runs its own Prometheus, Loki, and Grafana within its VPC. Logs and metrics stay within the account boundary. Vendor engineers access that environment&#8217;s Grafana when debugging.</p><p style="text-align: justify;">The tradeoff is real: there is no centralized view across all customer environments. Aggregating insight across customers requires additional work. For teams where data residency is a hard requirement, that tradeoff is usually unavoidable.</p><p style="text-align: justify;"><a href="https://docs.localops.co/use-cases/byoc?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp">LocalOps provisions the observability stack inside every environment it creates</a>, including BYOC deployments. Each environment runs its own Prometheus, Loki, and Grafana within the customer&#8217;s VPC. The provisioning is consistent across environments because it comes from the same template &#8212; which reduces setup variation without centralizing data.</p><h3>What Internal Developer Platform Architecture Should Include for Observability</h3><p style="text-align: justify;">In platform engineering, an internal developer platform is only as useful as the capabilities it provisions consistently. Observability is one of the first gaps that shows up when that consistency is missing. Not all IDPs handle it the same way, and the differences show up in how much manual work is required and how the system holds up across many environments.</p><p style="text-align: justify;">A per-environment Prometheus instance prevents metric label conflicts between environments and supports the isolation that BYOC deployments require. 
Loki running inside the same cluster keeps log ingestion internal to the cluster network and makes logs available quickly without routing data outside the VPC.</p><p style="text-align: justify;">Grafana should come pre-configured with Prometheus and Loki as data sources &#8212; not because this is difficult to do manually, but because it is reliably skipped or done incorrectly in manual setups. Persistent storage for Prometheus needs a PersistentVolumeClaim backed by something durable like EBS, with appropriate retention and capacity planning. Without it, node replacement erases your metrics history.</p><p style="text-align: justify;">There should be a way to declare application metrics endpoints without writing ServiceMonitor resources. This lowers the bar for teams that want custom instrumentation but do not want to debug Kubernetes resource configurations to get there. Access to observability should flow through Grafana with role-based controls, not through direct cluster access.</p><p style="text-align: justify;">Consistency across environments is worth treating as a requirement, not a nice-to-have. Dashboards and queries built on staging should work on production without modification. That only holds if labeling, naming, and configuration are consistent across environments from the start.</p><p style="text-align: justify;">One constraint worth stating plainly: in BYOC and single-tenant architectures, the observability system should not route customer logs or metrics through the vendor&#8217;s cloud account unless the customer has explicitly agreed to that. In regulated industries, it is a procurement blocker.</p><p style="text-align: justify;"><a href="https://docs.localops.co/use-cases/dedicated-infrastructure">LocalOps provisions the observability stack inside the target cloud account for each environment</a>. Each environment runs its own Prometheus, Loki, and Grafana within the customer&#8217;s VPC.</p><h3>DIY Observability vs. a Cloud-Native IDP That Provisions It for You</h3><p style="text-align: justify;">One question that comes up often when teams think about how to build an internal developer platform is where observability fits, whether it gets provisioned upfront or bolted on later. Building the Prometheus + Loki + Grafana stack yourself is not technically hard. Helm charts exist, documentation is reasonable, and most engineers can get a working setup in a day.</p><p style="text-align: justify;">The problem is not the first installation. It is the second, third, and eighth.</p><p style="text-align: justify;">Every new environment &#8212; staging, canary, customer-dedicated deployment &#8212; repeats the same process. Different engineers make slightly different choices. Chart versions differ. Storage configurations vary. Six months later, Prometheus versions differ across environments, scrape intervals are inconsistent, and dashboards built on staging do not work on production because label names drifted.</p><p style="text-align: justify;">DIY also means owning the operational surface. Prometheus runs out of memory on a high-cardinality workload: your alert, your fix. Loki fills its disk because someone added verbose logging to a worker: your PVC resize, your pod restart, your lost logs. These are ongoing responsibilities, not one-time setup tasks.</p><p style="text-align: justify;">A cloud-native IDP that provisions observability as part of environment creation reduces that maintenance overhead. The stack comes from a consistent template. 
Configuration drift is reduced because the manual step that causes drift is removed &#8212; not because the tools behave differently.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/GCV7F/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/407cab11-2571-4cbb-adf9-0807c949d9af_1220x1548.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/76cf33f3-1009-40b3-9085-5a43c8cdb028_1220x1548.png&quot;,&quot;height&quot;:783,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/GCV7F/2/" width="730" height="783" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p style="text-align: justify;">The tradeoff is real. You give up some configuration flexibility in exchange for not owning the full operational burden. For teams whose job is building product rather than maintaining monitoring infrastructure, that is often a reasonable trade &#8212; though it is worth understanding what you are giving up before making it.</p><h3>FAQs</h3><p style="text-align: justify;"><strong>1. What observability tools should be built into an internal developer platform?</strong></p><p style="text-align: justify;">At minimum: Prometheus for metrics, Loki for log aggregation, and Grafana as the visualization and correlation layer. These three cover infrastructure metrics automatically and application metrics when services expose a /metrics endpoint. The best internal developer platforms include all three as part of environment creation, not as a post-setup task. Distributed tracing requires a separate tool like Tempo or Jaeger plus OpenTelemetry, and is worth planning for before you need it.</p><p style="text-align: justify;"><strong>2. How do internal developer platforms provide Grafana and Prometheus monitoring out of the box?</strong></p><p style="text-align: justify;">By provisioning Prometheus, Loki, and Grafana as Kubernetes workloads during environment creation, running inside the same cluster as the application, with Grafana already configured to connect to both. No manual data source setup, no scrape configuration to write. The stack is functional before any application code is deployed.</p><p style="text-align: justify;"><strong>3. How do you get logs from multiple Kubernetes environments without configuring Grafana manually each time?</strong></p><p style="text-align: justify;">Use an IDP that provisions a per-environment Loki and Grafana stack as part of environment creation. Each environment gets an identical stack with Loki already registered as a Grafana data source. Developers access logs through that environment&#8217;s Grafana. No separate login, no data source configuration, no inconsistencies between environments.</p><p style="text-align: justify;"><strong>4. 
What is the difference between an internal developer portal and an internal developer platform?</strong></p><p style="text-align: justify;">A portal like Backstage is a UI layer: software catalog, documentation links, embedded dashboards. Teams evaluating a Backstage internal developer platform setup often find that Backstage handles the portal layer well but still requires a separate solution for provisioning, environment management, and observability. A platform handles all of that. A portal surfaces information about your stack. A platform creates and maintains it.</p><p style="text-align: justify;"><strong>5. Does an open source internal developer platform work for production observability?</strong></p><p style="text-align: justify;">Yes. Prometheus, Loki, and Grafana are open source and run on standard Kubernetes. The tools are production-ready. The question is whether your team has capacity to provision, configure, and maintain them consistently across every environment you run. The tooling cost is zero. Ensuring they are provisioned, configured, and maintained consistently across environments is where the effort accumulates.</p><h3>Conclusion</h3><p style="text-align: justify;">Most teams spend a day or two getting Prometheus and Grafana running on a new cluster. That feels like a one-time investment. It is not.</p><p style="text-align: justify;">Every environment added is another Grafana instance to update, another Prometheus scrape configuration to maintain, another persistent volume to size correctly, and another set of dashboards to keep in sync with production. That cost scales with the number of environments, not the size of the team. A four-person team running eight environments carries a monitoring surface that grows with every new deployment.</p><p style="text-align: justify;">According to the 2024 DORA report, elite engineering teams recover from failed deployments significantly faster than low performers. Observability is one contributing factor in that gap. Teams that can quickly see what broke, where, and when spend less time navigating fragmented monitoring setups and more time fixing the issue.</p><p style="text-align: justify;">An internal developer platform does not replace Prometheus, Loki, or Grafana. It reduces the repeated provisioning work and limits the configuration drift that makes these tools harder to operate across multiple environments.</p><p style="text-align: justify;">If you are running a single environment with dedicated infrastructure support, a manual setup can be sufficient. 
As the number of environments grows across stages, regions, or customer accounts, the provisioning overhead compounds, and the maintenance cost tends to surface during incidents when time matters most.</p><p style="text-align: justify;">If you&#8217;re thinking through how to standardize observability across multiple environments or reduce the maintenance overhead that comes with scaling clusters, then LocalOps team can help you work through it:</p><p style="text-align: justify;"><a href="https://go.localops.co/tour">Book a Demo</a> - Walk through how environments, deployments, and AWS infrastructure are handled in practice for your setup.</p><p style="text-align: justify;"><a href="https://console.localops.co/signup">Get started for free</a> - Connect an AWS account and stand up an environment to see how it fits into your existing workflow.</p><p style="text-align: justify;"><a href="https://docs.localops.co/">Explore the Docs</a> - A detailed breakdown of how LocalOps works end-to-end, including architecture, environment setup, security defaults, and where engineering decisions still sit.</p><h4>Suggested Articles</h4><ol><li><p><a href="https://blog.localops.co/p/what-is-an-internal-developer-platform-idp">What Is an Internal Developer Platform (IDP)?</a></p></li><li><p><a href="https://blog.localops.co/p/internal-developer-platform-build-vs-buy-cost-comparison">How Much Does It Cost to Build an Internal Developer Platform In-House vs Buying One?</a></p></li><li><p><a href="https://blog.localops.co/p/standardize-dev-staging-prod-internal-developer-platform">How to Standardize Dev, Staging, and Production Environments with an Internal Developer Platform</a></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Rails Hosting After Heroku: The Best Alternatives for Production Ruby Applications]]></title><description><![CDATA[What actually breaks on Heroku for Rails at scale, and how production teams rebuild their stack for reliability, performance, and control.]]></description><link>https://blog.localops.co/p/rails-hosting-after-heroku</link><guid isPermaLink="false">https://blog.localops.co/p/rails-hosting-after-heroku</guid><dc:creator><![CDATA[Nidhi Pandey]]></dc:creator><pubDate>Thu, 23 Apr 2026 08:02:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!toy3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!toy3!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!toy3!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png 424w, https://substackcdn.com/image/fetch/$s_!toy3!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png 848w, 
https://substackcdn.com/image/fetch/$s_!toy3!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png 1272w, https://substackcdn.com/image/fetch/$s_!toy3!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!toy3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png" width="1984" height="1648" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1648,&quot;width&quot;:1984,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5806768,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/195206970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f228470-745b-4713-96d7-1c54e2ba6bca_1984x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!toy3!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png 424w, https://substackcdn.com/image/fetch/$s_!toy3!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png 848w, https://substackcdn.com/image/fetch/$s_!toy3!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png 1272w, https://substackcdn.com/image/fetch/$s_!toy3!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9b8d65b4-b361-42e1-bec5-165a3e975df4_1984x1648.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>TL;DR</strong></h2><p><strong>What this covers:</strong> Why Heroku&#8217;s architecture creates specific failure modes for Rails applications at production scale, what capabilities a genuine Rails hosting alternative must provide, how modern platforms handle the workloads Heroku manages with fragile add-ons, and what a production-grade CI/CD pipeline looks like for a Rails team moving off Heroku.</p><p><strong>Who it is for:</strong> CTOs and VPs of Engineering running Ruby on Rails applications on Heroku who are evaluating production alternatives, specifically teams with Postgres databases, Sidekiq background workers, and growth-stage traffic that is making Heroku&#8217;s limitations visible.</p><p><strong>The conclusion:</strong> Heroku was the right Rails hosting choice for a long time. It understood the Rails application model, handled Procfile-based process management naturally, and abstracted infrastructure decisions that most Rails teams did not need to make. The reason teams move off it is not that Heroku stopped understanding Rails, it is that Rails applications at production scale need infrastructure capabilities that Heroku&#8217;s architecture cannot provide: persistent stateful workloads that survive dyno cycling, background job queues that do not depend on fragile add-on integrations, Active Storage and Action Cable deployments that work without platform workarounds, and CI/CD workflows that match modern Git-based development practices. This guide covers what those capabilities look like on a modern alternative.</p><blockquote><p>See what your Rails stack looks like on AWS (EKS + RDS + Redis + S3), fully set up in your own account<br>&#8594; <a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get a live environment in under 30 minutes</a></p></blockquote><h2><strong>The Best Heroku Alternative for Rails in Production: What the Stack Needs to Cover</strong></h2><p>For a Rails application running in production with Postgres, Sidekiq, and real traffic, the hosting alternative needs to satisfy a specific set of requirements. 
The list is not long, but each item is non-negotiable for a production-grade deployment.</p><p><strong>A genuine Rails hosting alternative in 2026 needs to handle the full application model: </strong></p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/AH398/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/48368e40-3fc8-4a18-a81c-8f97d5b52649_1220x954.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/aaee38b9-30f4-4c3b-8975-3fde2e98e633_1220x954.png&quot;,&quot;height&quot;:475,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/AH398/1/" width="730" height="475" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>Want to see how this full Rails stack is provisioned (EKS, RDS, Redis, S3) without writing Terraform or Kubernetes YAML?<br>&#8594; <a href="https://localops.co/?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Explore how LocalOps sets up production-ready infrastructure</a></p><p><strong>The platform that comes closest to this for Rails teams is an AWS-native Internal Developer Platform like LocalOps.</strong> Not because AWS is the only option, but because the combination of EKS for persistent workloads, RDS for Postgres, ElastiCache for Redis, and S3 for Active Storage maps the Rails application model to managed AWS services that are production-proven, priced on actual consumption rather than arbitrary tiers, and controllable by the platform team in ways Heroku does not allow.</p><p><strong>Why EKS specifically for Rails workloads:</strong></p><p>Kubernetes on EKS allows the Rails application model to be expressed correctly. Web processes, Sidekiq workers, scheduled jobs (Whenever or Sidekiq-Scheduler), and cable servers can all run as separate Kubernetes Deployments with independent scaling policies, independent resource allocation, and independent restart behaviour. A Sidekiq worker that needs to process a memory-intensive job can be allocated 2GB RAM without affecting the web dyno&#8217;s resource allocation. A web process under traffic pressure can scale to fifteen replicas without triggering a Redis tier jump. None of this requires application code changes; it requires a hosting model that cleanly expresses the Rails multi-process architecture.</p><p></p><p>See how Rails workloads map to Kubernetes in practice (web, Sidekiq, schedulers, Action Cable)<br>&#8594; <a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Book a walkthrough with an engineer</a></p><h2><strong>Unified Platform for Rails, Node.js, Python, Django, and Go: Why It Matters for Platform Teams</strong></h2><p>Most engineering organisations running Rails are not running only Rails. 
The Rails application is the core product, but the surrounding architecture includes Node.js services for specific workloads, Python services for data processing or ML inference, Go services for high-performance API layers, and Django applications for internal tooling.</p><p>The failure mode of Heroku at this scale is not that it cannot run these workloads (it can); it is that each language runtime requires a separate buildpack, each buildpack has its own behaviour and failure modes, and the operational model across language runtimes is inconsistent enough to create cognitive overhead for the platform team.</p><p>More significantly, Heroku&#8217;s observability story is fragmented across services regardless of language. A Rails service and a Python service running on the same Heroku team both send logs to the same Papertrail drain, and both have their own New Relic agents, but correlating a request that touches both services during an incident requires stitching together data from multiple places with no unified service map.</p><p>Running multiple services beyond Rails?<br>&#8594; <a href="https://localops.co/?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how LocalOps handles multi-service deployments on one platform</a></p><p><strong>What a unified platform looks like across language runtimes:</strong></p><p>A container-native deployment platform is inherently language-agnostic. Docker containers encapsulate the runtime; the platform does not know or care whether the application inside is Rails, Django, Node.js, or Go. The deployment model is identical: push to branch, platform builds the container image, deploys to Kubernetes, serves traffic.</p><p>LocalOps handles multi-language stacks with a consistent deployment model across all services. A Rails web process, a Python background worker, a Node.js API service, and a Go microservice all deploy through the same pipeline, log to the same Loki instance, and surface metrics in the same Grafana dashboard. An incident that crosses service boundaries (a Rails request that calls a Python inference service that produces a slow response) is traceable in a single observability interface.</p><p>For platform teams managing heterogeneous stacks, this consistency is the difference between having a platform model and having a collection of separately managed services with a common billing account.</p><p><strong>CI/CD consistency across language runtimes:</strong></p><p>The CI/CD story on Heroku is language-runtime-specific. Rails applications use the Ruby buildpack. Node.js applications use the Node.js buildpack. When buildpack versions change, when build-time dependencies differ across services, and when environment variable requirements differ between language runtimes, the CI/CD behaviour is inconsistent in ways that are difficult to reason about at the platform level.</p><p>A container-native platform uses Dockerfiles (or auto-generated container builds) that fully specify the build environment per service. The build environment for a Rails application (Ruby version, bundler version, Node.js version for asset compilation, and system library dependencies) is explicitly declared and reproducible. 
The same Dockerfile builds the same image in every environment, eliminating the class of &#8220;works on staging, fails on production because the buildpack version differs&#8221; failures that Heroku&#8217;s buildpack model produces.</p><h2><strong>How Modern Alternatives Handle Rails-Specific Workloads Heroku Manages Badly</strong></h2><h3><strong>Active Storage: From Workaround to Native</strong></h3><p>On a container-native platform using AWS EKS, Active Storage&#8217;s persistent storage requirement is satisfied without application-level workarounds. The Rails application mounts S3 as its Active Storage backend, not as a compromise, but as the correct production architecture. File uploads go directly to S3. Variant generation stores outputs to S3. Temporary files in the container filesystem are genuinely temporary and do not affect application state.</p><p>The operational difference from Heroku is that the ephemeral filesystem constraint no longer shapes application behaviour. Variant generation works predictably. Direct upload flows do not depend on temp file staging that survives in some dyno configurations and fails in others. The application behaves consistently across all replicas because no replica depends on local filesystem state.</p><p>LocalOps environments include S3 bucket provisioning as part of the standard environment setup. Teams configure config/storage.yml to point at the provisioned S3 bucket. Active Storage works in production from day one without add-on configuration, drain setup, or architectural accommodations.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/wCbI4/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0eb48386-191f-4b5f-a18d-4bcfc7bbfff1_1220x636.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fbd34267-0f7c-474c-9769-cc3dc8453209_1220x636.png&quot;,&quot;height&quot;:313,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/wCbI4/1/" width="730" height="313" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><h3><strong>Action Cable: Persistent Connections Without Platform Constraints</strong></h3><p>Action Cable on Kubernetes eliminates the connection timeout constraints that shape Action Cable architecture on Heroku. Kubernetes pods support long-lived WebSocket connections natively. The platform does not impose connection timeouts that require client-side reconnection logic as a reliability mechanism.</p><p>The Redis connection count problem from Heroku also resolves structurally. On EKS with ElastiCache, Redis connection limits are governed by the ElastiCache node type&#8217;s actual capacity, not by Heroku&#8217;s tier pricing model, which forces Redis upgrades based on connection count thresholds rather than actual resource consumption. 
A Rails application with 500 concurrent Action Cable subscribers connecting through 10 web pods uses 10 Redis connections from the server side. ElastiCache at an appropriate node type handles this without tier-jump pricing pressure.</p><h3><strong>Sidekiq: Persistent Workers Without Daily Interruption</strong></h3><p>Sidekiq workers on Kubernetes run as persistent Deployments. They are not subject to daily restarts as part of normal platform operation. They restart only when a new deployment is pushed or when a pod fails health checks.</p><p>When a new deployment is pushed, Kubernetes performs a rolling update: new Sidekiq pods start and become healthy before old ones terminate. Sidekiq&#8217;s shutdown signal handling (SIGTERM) gives in-flight jobs a configurable timeout to complete before the process exits. Jobs that cannot complete within the timeout are requeued for processing by the new pod.</p><p>This is meaningfully more reliable than Heroku&#8217;s daily dyno restart model for Sidekiq workloads. Teams running batch processing, data pipelines, or long-running background jobs on Sidekiq find that the daily restart window on Heroku requires explicit engineering investment to handle safely, retry logic, idempotency guarantees, and job state persistence. On Kubernetes, the restart behaviour is controlled, predictable, and aligned with how long-running background workloads should behave.</p><h3><strong>Asset Pipeline: Reproducible Builds Without Buildpack Fragility</strong></h3><p>Container-based builds for Rails applications handle asset precompilation in a Dockerfile layer that is fully specified and reproducible. The build environment, Ruby version, Node.js version, and system library versions are declared explicitly in the Dockerfile. The assets: precompile step runs in the same environment on every build, in every environment. There is no buildpack version drift, no build-time environment variable injection that affects compilation behaviour, and no memory pressure from shared build infrastructure.</p><p>Teams running Webpacker, Shakapacker, or Vite Ruby alongside Sprockets benefit directly from this model. The JavaScript build toolchain is specified in the Dockerfile, Node.js version pinned, npm dependencies installed from package-lock.json, Webpack or Vite build executed in the same layer that has access to the full build environment. The compiled assets are baked into the container image and deployed consistently to every replica. There is no per-dyno asset compilation, no CDN configuration required to make assets available across dynos, and no compile-time failures caused by environment configuration differences between the build environment and the runtime.</p><h2><strong>Why Heroku Is Architecturally Unsuitable for Persistent, Stateful Workloads</strong></h2><p>This is the structural incompatibility that underlies most of the Rails-specific failure modes described above, and it is worth stating clearly rather than leaving it implicit.</p><p>Heroku&#8217;s architecture is designed for stateless, ephemeral processes. The dyno model treats compute as fungible: dynos start, serve requests or process jobs, and stop. The ephemeral filesystem means no state persists between dyno restarts. The daily restart cycle means no process runs indefinitely. This model is intentional; it makes Heroku simple to reason about and resilient to individual process failures.</p><p>The problem is that production Rails applications are not fully stateless and ephemeral. 
They have components that are legitimately persistent and stateful:</p><p><strong>Background job queues (Sidekiq)</strong> maintain in-progress job state in memory during execution. A job interrupted mid-execution is in an indeterminate state. Heroku&#8217;s restart model treats this as acceptable because the ephemeral design expects processes to be interruptible. Sidekiq&#8217;s operational model expects processes to complete in-flight work before restarting.</p><p><strong>WebSocket connections (Action Cable)</strong> are inherently persistent. A WebSocket connection to a Heroku dyno is subject to the dyno&#8217;s connection timeout policies and restart behaviour. The connection abstraction that Heroku provides is not designed for long-lived stateful connections.</p><p><strong>File system operations</strong> (temp file staging, variant generation, direct upload flows) assume that the local filesystem persists at least for the duration of a request-response cycle and in many cases across requests. Heroku&#8217;s ephemeral filesystem satisfies the first assumption but not reliably the second, particularly across dyno restarts.</p><p><strong>What modern infrastructure design does instead:</strong></p><p>Kubernetes pods on EKS are persistent by default. A pod runs until explicitly replaced by a deployment update or until it fails a health check. Pods are not subject to scheduled daily restarts. Persistent volumes can be mounted for workloads that genuinely need local filesystem persistence. Long-lived connections are supported without platform-imposed timeouts.</p><p>The design principle is different: instead of treating all compute as ephemeral and requiring applications to accommodate that, Kubernetes allows workloads to declare their persistence requirements explicitly. Stateless web processes use rolling deployments and horizontal autoscaling. Stateful background workers use persistent pod lifecycles with graceful shutdown handling. Workloads with persistent storage requirements mount persistent volumes. Each workload type gets the persistence model it needs.</p><p>For Rails applications, this means the application can be designed around what the feature needs rather than around what the platform will reliably support. That is the structural difference between Heroku and modern alternatives for production Rails workloads.</p><h2><strong>Why CI/CD Workflows Built on Heroku Fail at Scale, and What Production-Grade Looks Like</strong></h2><p>Heroku&#8217;s deployment model was built around a specific Git-based workflow: push to a branch, Heroku deploys. For individual developers or small teams, this is simple and effective. For engineering organisations with multiple teams, feature branch workflows, review environments, and deployment promotion across staging and production, the model breaks in specific ways.</p><p><strong>The specific CI/CD failure modes on Heroku:</strong></p><p><em>No native review app reliability at scale.</em> Heroku&#8217;s review apps feature spins up ephemeral environments per pull request. 
In practice, review apps on Heroku have reliability problems at scale: slow provisioning, environment variables that do not correctly inherit from the parent app configuration, and ephemeral environments that do not accurately replicate production because Heroku&#8217;s add-on provisioning in review apps does not match production configuration.</p><p><em>The deployment pipeline lacks integration with modern Git workflows.</em> Heroku Pipelines, the mechanism for promoting builds from staging to production, works for simple linear workflows. Teams using trunk-based development with feature flags, teams with multiple staging environments for different workstreams, or teams that need to deploy specific commits rather than the latest main branch find that Heroku&#8217;s pipeline model does not accommodate their workflow without significant workarounds.</p><p><em>Build failures are opaque.</em> When a Heroku build fails, during slug compilation, during buildpack execution, or during asset precompilation, the failure message is often insufficient to diagnose the root cause quickly. Buildpack builds are black boxes with limited introspection. Teams spend engineering time decoding build failures that a container build with explicit Dockerfile layers would surface clearly.</p><p><em>No native canary or blue-green deployment.</em> Heroku Pipelines support build promotion but not sophisticated deployment strategies. Blue-green deployments require manual Heroku preboot configuration. Canary deployments, routing a percentage of traffic to a new version to validate before full rollout, are not natively supported. For teams deploying to production multiple times per day with reliability requirements, the absence of native traffic-splitting deployment strategies is a meaningful operational gap.</p><p><strong>What a production-grade Rails CI/CD pipeline looks like:</strong></p><p>A production-grade CI/CD pipeline for a Rails application in 2026 has a few defining characteristics:</p><p><em>Container-based builds that are environment-consistent.</em> The same Docker image that is tested in CI is deployed to staging and then to production. There is no slug re-compilation, no buildpack re-execution, no environment-specific build behaviour. The image is built once, tested, and promoted. What passes CI is exactly what runs in production.</p><p><em>Branch-triggered environment provisioning.</em> Feature branches trigger the provisioning of isolated review environments automatically. The review environment is not a stripped-down approximation of production; it is the same Kubernetes deployment with the same configuration, the same managed Postgres instance, and the same observability stack. Developers can test against an environment that accurately represents production behaviour.</p><p><em>Deployment strategy configuration per service.</em> Rolling deployments by default. Blue-green for services where zero-downtime cutover is critical. Canary traffic splitting for high-risk releases. These are configuration choices per service, not platform limitations that require workarounds.</p><p><em>Integrated observability in the deployment event stream.</em> Deployments appear as events in the metrics and log timeline. When a deployment at 2:35 PM correlates with an error rate spike at 2:36 PM, that correlation is visible in Grafana without cross-referencing deployment logs in a separate interface.</p><p>LocalOps delivers this pipeline model for Rails teams. 
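</p><p>The rolling-deployment and graceful-termination behaviour described above is plain Kubernetes configuration underneath. A minimal, illustrative Deployment for a Sidekiq-style worker is sketched below; the names, image, and values are placeholders.</p><pre><code>apiVersion: apps/v1
kind: Deployment
metadata:
  name: sidekiq-worker                     # placeholder name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: sidekiq-worker
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0                    # new pods become ready before old ones stop
      maxSurge: 1
  template:
    metadata:
      labels:
        app: sidekiq-worker
    spec:
      terminationGracePeriodSeconds: 120   # time for in-flight jobs to finish after SIGTERM
      containers:
        - name: worker
          image: registry.example.com/myapp:latest   # placeholder image
          command: ["bundle", "exec", "sidekiq"]
</code></pre><p>On LocalOps that manifest layer is generated and managed for you. 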
Push to branch triggers a Docker build, image push to ECR, and deployment to the target EKS environment automatically. Review environments are provisioned on pull request open and torn down on merge. Deployments are visible as events in the Grafana dashboard. Rolling deployments run by default. The pipeline configuration is per-service and per-environment, not global and opaque.</p><h2><strong>How LocalOps Addresses the Full Rails Production Stack</strong></h2><p>LocalOps is an AWS-native Internal Developer Platform built for teams replacing Heroku. For Rails teams specifically, it handles the full production stack:</p><p><strong>Web processes</strong> run on EKS with horizontal autoscaling driven by CPU and request rate metrics. Web pods scale out under traffic pressure and back in during off-peak periods. No manual replica count management.</p><p><strong>Sidekiq workers</strong> run as persistent Kubernetes Deployments with independent resource allocation from web processes. Daily restarts do not occur. Rolling deployments give in-flight jobs time to complete before old pods terminate.</p><p><strong>Postgres</strong> runs on Amazon RDS with automated backups, read replica support, Multi-AZ availability, and storage autoscaling. Pricing scales with actual resource consumption, not with row count tiers.</p><p><strong>Redis</strong> runs on Amazon ElastiCache with connection count driven by actual workload, not by pricing tier thresholds. Sidekiq, Action Cable, and Rails cache all share the ElastiCache instance with appropriate namespace separation.</p><p><strong>Active Storage</strong> connects to S3 buckets provisioned as part of the environment. No architectural workarounds. No ephemeral filesystem fragility.</p><p><strong>Action Cable</strong> runs on web pods without platform-imposed connection timeouts. Redis pub/sub backend connects to ElastiCache.</p><p><strong>Asset pipeline</strong> builds inside Docker during CI. Compiled assets are baked into the container image. No per-dyno compilation, no buildpack fragility.</p><p><strong>Observability</strong> (Prometheus, Loki, and Grafana) is included in every environment. Rails application metrics, Sidekiq job throughput and error rates, database performance, and Redis connection metrics are all available from day one without add-on configuration.</p><p>Sign up for free at LocalOps, connect your AWS account, and connect your GitHub repository. LocalOps provisions the full environment automatically in under 30 minutes. Your first Rails application deploys to AWS without writing Terraform, Helm charts, or Kubernetes YAML.</p><blockquote><p><em>&#8220;Their thoughtfully designed product and tooling entirely eliminated the typical implementation headaches. 
Partnering with LocalOps has been one of our best technical decisions.&#8221; </em>&#8211; Prashanth YV, Ex-Razorpay, CTO and Co-founder, Zivy</p><p><em>&#8220;Even if we had diverted all our engineering resources to doing this in-house, it would have easily taken 10&#8211;12 man months of effort, all of which LocalOps has saved for us.&#8221; </em>&#8211; Gaurav Verma, CTO and Co-founder, SuprSend</p><p> <a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get a live environment in under 30 minutes</a></p></blockquote><h2><strong>Frequently Asked Questions</strong></h2><ol><li><p><strong>What is the best Heroku alternative for Rails in production with Postgres, Sidekiq, and autoscaling?</strong></p></li></ol><p>For teams that need production-grade Rails hosting with all three, Postgres, Sidekiq, and horizontal autoscaling, an AWS-native Internal Developer Platform like LocalOps is the strongest option. It maps the full Rails application model to managed AWS services: EKS for web and worker processes, RDS for Postgres, ElastiCache for Redis and Sidekiq, and S3 for Active Storage. Autoscaling runs horizontally on EKS rather than vertically through dyno tiers. Sidekiq workers run as persistent Kubernetes Deployments without daily interruption. The developer experience stays close to Heroku, push to branch, service deploys, without requiring Kubernetes expertise from the engineering team.</p><ol start="2"><li><p><strong>How does the Sidekiq migration from Heroku to Kubernetes actually work?</strong></p></li></ol><p>Sidekiq on Kubernetes runs as a separate Kubernetes Deployment from the Rails web process. The Procfile convention from Heroku maps directly: worker: bundle exec sidekiq becomes a separate Deployment in Kubernetes with its own replica count, resource allocation, and scaling policy. LocalOps reads the Procfile and generates the corresponding Kubernetes resources automatically during environment setup. The Sidekiq configuration, concurrency, queue weights, and Redis connection URL are passed through environment variables or Kubernetes secrets exactly as on Heroku. The operational difference is that Sidekiq workers are no longer subject to daily dyno restarts, and rolling deployments give in-flight jobs time to finish before old pods terminate.</p><ol start="3"><li><p><strong>Can teams run Rails, Node.js, and Python services from the same deployment platform on LocalOps?</strong></p></li></ol><p>Yes, LocalOps deploys any containerised workload to the same EKS cluster. A Rails API, a Node.js frontend server, and a Python ML inference service all deploy through the same pipeline, log to the same Loki instance, and surface metrics in the same Grafana dashboard. Language runtime differences are encapsulated in each service&#8217;s Dockerfile. The platform layer is language-agnostic. 
For platform teams managing heterogeneous stacks, this means a consistent deployment model, consistent observability, and consistent incident response across all services regardless of language runtime.</p><ol start="4"><li><p><strong>What does the Rails asset pipeline look like in a Docker-based deployment?</strong></p></li></ol><p>The standard pattern for Rails asset precompilation in Docker uses a multi-stage build: a build stage installs all dependencies, runs bundle exec rails assets: precompile, and produces compiled assets; a production stage copies only the compiled assets and production dependencies from the build stage, leaving build tooling behind. This produces a smaller container image than single-stage builds and ensures that asset compilation always runs in a controlled, reproducible environment. Node.js is available during the build stage for Webpacker, Shakapacker, or Vite Ruby compilation. The compiled assets are baked into the image and consistent across all replicas, no per-dyno compilation, no CDN required to serve assets consistently across a multi-replica deployment.</p><ol start="5"><li><p><strong>How do review environments on LocalOps compare to Heroku Review Apps for Rails applications?</strong></p></li></ol><p>LocalOps provisions review environments as full Kubernetes deployments in isolated namespaces upon opening a pull request. The review environment includes the Rails application, a Postgres database (RDS instance or shared cluster with namespace isolation), Redis, and the full observability stack. It is not a stripped-down approximation of production; it is the same deployment configuration. Environment variables are inherited from the parent environment&#8217;s configuration. The review environment tears down automatically on PR merge or close. For Rails teams doing feature branch development, the practical difference from Heroku Review Apps is reliability: review environments on Kubernetes are provisioned consistently and behave like production rather than like an approximation of it.</p><ol start="6"><li><p><strong>Why is Heroku architecturally unsuitable for long-running Rails background jobs?</strong></p></li></ol><p>Heroku&#8217;s dyno model is built for ephemeral processes that start, do work, and stop. Daily dyno restarts are a feature of this model, not a bug; the platform is designed to treat compute as interruptible. Sidekiq workers are not interruptible without consequences: a job interrupted mid-execution may leave application state in an inconsistent condition, and the engineering investment required to make every job safely interruptible is significant. On Kubernetes, pods are persistent and only restart when explicitly replaced by a deployment update or when health checks fail. Deployment updates use rolling restart with a configurable termination grace period that allows Sidekiq to finish in-flight jobs before the old pod exits. The platform&#8217;s restart model is aligned with how long-running background workloads actually behave.</p><h2><strong>Key Takeaways</strong></h2><p>Rails teams leave Heroku for a specific set of reasons that are tied to the Rails application model, not to generic infrastructure concerns. Sidekiq workers that cannot run persistently. Active Storage that requires application-level workarounds for ephemeral filesystem constraints. Action Cable deployments shaped by connection timeout policies and Redis tier pricing. Asset pipeline builds that fail in opaque ways due to buildpack environment drift. 
CI/CD workflows that do not accommodate modern Git-based development practices at the team scale.</p><p>These failure modes share a common root: Heroku&#8217;s architecture optimises for stateless, ephemeral processes. Production Rails applications need a hosting model that correctly handles persistent, stateful workloads alongside the stateless web tier.</p><p>Modern alternatives built on AWS address this structurally. EKS runs persistent workloads with graceful shutdown handling. RDS provides Postgres without tier-jump pricing. ElastiCache provides Redis without connection-count-driven pricing pressure. S3 makes Active Storage work correctly without filesystem workarounds. Container-based builds make asset precompilation reproducible. Rolling deployments with graceful termination make Sidekiq migrations safe.</p><p>What changes when Rails teams move to LocalOps: the infrastructure is in the team&#8217;s AWS account, workloads run persistently without daily interruption, observability is integrated rather than assembled from add-ons, and the CI/CD pipeline accommodates modern Git workflows rather than constraining them.</p><p>What stays the same: the deployment model. Push to branch, service deploys. The Rails application does not know it moved.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get Started with LocalOps</a> &#8594;</strong> First Rails production environment on AWS in under 30 minutes. No credit card required.</p><p><strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Schedule a Migration Call</a> &#8594;</strong> Our engineers walk through your specific Rails stack, Sidekiq configuration, Active Storage setup, Action Cable deployment, and map the migration path.</p><p><strong><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read the Heroku Migration Guide</a> &#8594;</strong> Full technical walkthrough: database migration, Sidekiq configuration, DNS cutover, asset pipeline setup.</p>]]></content:encoded></item><item><title><![CDATA[How to Provision a Production-Grade AWS Environment Without Writing Terraform]]></title><description><![CDATA[What a production-grade AWS environment actually requires, and why an internal developer platform is a better alternative to owning Terraform at scale.]]></description><link>https://blog.localops.co/p/provision-aws-environment-without-terraform</link><guid isPermaLink="false">https://blog.localops.co/p/provision-aws-environment-without-terraform</guid><dc:creator><![CDATA[Madhushree Sivakumar]]></dc:creator><pubDate>Thu, 23 Apr 2026 04:16:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!WIsJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WIsJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!WIsJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png 424w, https://substackcdn.com/image/fetch/$s_!WIsJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png 848w, https://substackcdn.com/image/fetch/$s_!WIsJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!WIsJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WIsJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png" width="1456" height="1504" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/af30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1504,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6486573,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/195199472?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WIsJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png 424w, https://substackcdn.com/image/fetch/$s_!WIsJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png 848w, https://substackcdn.com/image/fetch/$s_!WIsJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!WIsJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Faf30ef8d-b3ec-4805-a3ae-d7a282294f6a_2324x2400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 
4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>If you are running a 15 to 50-person engineering team, there is a good chance someone on your team is maintaining Terraform instead of building product. Not because they want to. Because someone has to.</p><p>Spinning up a production-grade environment on AWS is not a weekend project. The networking alone &#8212; VPC, subnets, NAT gateway, security groups &#8212; takes time to get right. Add EKS, IAM roles, secrets management, observability, and per-environment configuration, and you are looking at several days to a week of infrastructure work before the first feature deploys. Each new environment after that adds incremental overhead and increases the risk of configuration drift between staging and production.</p><p>At some point, the question stops being &#8220;how do we configure this correctly&#8221; and starts being &#8220;why is this our problem to own.&#8221;</p><p>Internal developer platforms exist to answer that question. This post covers what a production-grade AWS environment actually requires, what it costs to maintain that setup in-house across environments, and how an IDP changes the tradeoffs for engineering teams that would rather be building.</p><h3>TL;DR</h3><ul><li><p>A production-grade AWS environment needs more than most teams plan for: a full networking stack (VPC, private and public subnets, NAT gateway), a managed Kubernetes cluster (EKS), compute nodes, a load balancer, IAM, secrets management, encryption, observability, and backup configuration.</p></li><li><p>Setting up this stack in Terraform typically takes several days to a week depending on complexity and team experience. The second and third environments are where state management overhead and configuration drift start compounding.</p></li><li><p>Every environment your team cannot spin up quickly is a bottleneck: slower staging, slower customer onboarding, slower incident isolation.</p></li><li><p>Internal developer platforms handle the provisioning layer so application developers do not have to. Developers get self-serve environments, GitHub-push deployments, and AWS resource access without managing infrastructure code directly.</p></li></ul><h3>What a Production-Grade AWS Environment Actually Needs</h3><p>Most teams start with a rough mental model: a cluster, a database, some networking. The actual list of what constitutes a production-grade AWS environment is longer, and the gaps between what teams think they need and what they actually need tend to surface at the worst possible time.</p><p>Here is what a complete setup requires.</p><h4>Networking</h4><p>A dedicated VPC with a defined CIDR block. Private subnets for application workloads, with no public IP on application nodes. 
Public subnets for ingress components like load balancers. A NAT gateway with an Elastic IP in the public subnet so private subnet nodes can make outbound calls for pulling container images and reaching external APIs. An Internet gateway attached to the VPC. For private clusters, VPC endpoints for services like STS, ECR, and S3 are often required for integrations to work reliably.</p><h4>Compute and orchestration</h4><p>A managed Kubernetes cluster (EKS), where AWS runs the control plane (API server, etcd, scheduler) across multiple availability zones. Worker nodes run as EC2 instances in private subnets, attached to the cluster as a node group, with EBS volumes. Inbound traffic reaches workloads through an ALB or NLB provisioned via Kubernetes ingress or a Service of type LoadBalancer, not at the cluster level directly.</p><h4>IAM</h4><p>Node instance profiles for baseline EC2 permissions. IRSA (IAM Roles for Service Accounts) for pod-level AWS access, using OIDC-based access control so individual pods get scoped permissions rather than sharing the node role. Least-privilege policies per resource. Getting this wrong is how production incidents happen.</p><h4>Secrets management</h4><p>AWS Secrets Manager or SSM Parameter Store for credentials and environment-specific config. AWS&#8217;s own guidance for EKS uses IRSA and OIDC for Secrets Manager access from pods, so this ties directly to the IAM setup above.</p><h4>Security</h4><p>Encryption at rest on EBS volumes and managed data stores. Security groups in AWS are stateful allow rules, so the practical goal is tightly scoped ingress, minimal egress, and private-only access for databases and caches. Cluster endpoint access control to limit who can reach the Kubernetes API.</p><h4>Observability</h4><p>Prometheus for cluster and node metrics, Loki or CloudWatch for log aggregation, Grafana as the unified interface. Without this, debugging a live incident means guessing. AWS supports Prometheus-based metrics collection through Container Insights, but it still requires setup.</p><h4>Backup and recovery</h4><p>Automated backups for managed data stores with defined retention. AWS Backup can cover EKS cluster state and persistent storage, but a backup without a tested restore procedure is not a recovery plan.</p><h4>Autoscaling and node lifecycle</h4><p>Cluster autoscaler or Karpenter for node scaling. A defined strategy for node upgrades, because EKS minor version support windows are finite and upgrades require planning.</p><p>The sum of this is not exotic. It is the baseline. But it is also a significant number of infrastructure concerns that need to be correctly configured and kept consistent across every environment you run: production, staging, preview, and any customer-dedicated environments.</p><p>That gap between what teams plan for and what production actually requires is where most of the infrastructure overhead lives.</p><p>See what&#8217;s <a href="https://docs.localops.co/environment/inside?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp">inside a LocalOps environment</a></p><h3>The Real Cost of Owning Terraform at a Scaling Startup</h3><p>The Terraform problem at a scaling startup is not that Terraform is hard. It is that owning it has a compounding cost that most teams only feel after they are already committed to it.</p><h4>The First Environment Is Manageable</h4><p>Community modules like terraform-aws-modules/vpc and terraform-aws-modules/eks handle a lot of the scaffolding. 
A senior engineer who knows what they are doing can get a well-configured first environment running in several days to a week. The work is real but it is bounded.</p><h4>The Second Environment Is Where It Gets Complicated</h4><p>Each environment needs its own state file. Teams typically end up with either Terraform workspaces, which have state isolation problems, or a directory structure like environments/prod and environments/staging where configuration is duplicated and diverges over time. Staging drifts from production. Nobody fully documents what changed or why. The engineer who built the original setup has usually moved on to other work by the time environment three is needed.</p><h4>State Management Fails Quietly</h4><p>Two engineers running terraform apply concurrently without remote state locking (S3 + DynamoDB) can corrupt state. Recovering from corrupted state means manual terraform state mv and terraform import operations. It is recoverable, but it takes time and focus that should be going elsewhere.</p><h4>EKS Add-On Management Is a Separate Problem</h4><p>Managed add-ons like CoreDNS, kube-proxy, VPC CNI, and the EBS CSI driver have independent version lifecycles from the EKS module. They fall behind cluster version requirements in ways that cause node registration failures or pod networking issues that are hard to diagnose if you have not seen them before.</p><h4>Observability Is Almost Always Deferred</h4><p>Terraform provisions the cluster. It does not install or configure Prometheus, Loki, or Grafana. That requires a separate Helm chart workflow. Most teams put it on the backlog and add it later under pressure, usually after a live incident makes it unavoidable.</p><h4>The Actual Cost Is Not the Setup Time</h4><p>It is the ongoing maintenance tax: keeping environments consistent, managing module version upgrades, handling state issues, and making sure the one person who understands the full setup is not a single point of failure. At a 20 to 40-person company, that tax comes directly out of product velocity.</p><p>A team with a dedicated platform engineer can run this well. Most scaling startups do not have that luxury.</p><p>For teams that go further, building an internal developer platform in-house adds another layer of cost on top of infrastructure: hiring for it, maintaining it, and keeping it current.</p><p>That investment is hard to justify when the underlying goal is simply to ship product reliably on AWS.</p><h3>What Is an Internal Developer Platform and How Does It Change the Provisioning Model?</h3><p>An internal developer platform is a layer that sits between application developers and cloud infrastructure. It owns the provisioning model, enforces production defaults, and gives developers a self-service interface that does not require infrastructure knowledge to use.</p><p>The term gets used loosely, so it is worth being precise about what an IDP actually does versus what it does not do.</p><h4>What an IDP Is Not</h4><p>The internal developer portal vs platform distinction is a common source of confusion. A portal gives teams a UI to browse services, view documentation, and track ownership. Backstage is a good example: it is a developer portal framework, not a platform. It does not provision infrastructure. Building a full IDP on top of Backstage means writing and maintaining plugins for environment provisioning, CI/CD, cloud resource management, and observability. 
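</p><p>What Backstage understands natively is catalog metadata, a YAML entry per service along the lines of the illustrative example below; everything operational sits behind plugins.</p><pre><code># catalog-info.yaml -- a Backstage catalog entry; names are illustrative
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: billing-api
  description: Example service entry for the catalog
spec:
  type: service
  lifecycle: production
  owner: team-payments
</code></pre><p>The entry gives you ownership and discoverability in the portal; the provisioning, deployment, and observability machinery behind it is still plugin work the team has to build and maintain.</p><p>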
That is a significant internal build investment, not a solved problem.</p><p>An IDP is also not a PaaS wrapper that puts a UI on top of managed services and abstracts away the cloud account. The distinction matters for engineering leaders: a real IDP provisions into your own cloud account, so you retain ownership of the infrastructure, the data, and the billing relationship with AWS.</p><h4>What an IDP Actually Does</h4><p>It owns a set of production-hardened provisioning templates and creates environments idempotently from those templates. Any engineer on the team can spin up a new environment without understanding VPC design, EKS configuration, or IAM. The platform handles what those things map to in AWS.</p><p>This is where platform engineering and internal developer platforms intersect. The goal is not to eliminate infrastructure complexity but to centralise ownership of it.</p><p>The critical distinction is between shallow and deep abstraction. A shallow abstraction wraps Terraform in a UI and still requires someone who understands the underlying resource model to maintain it. A deep abstraction owns the resource model entirely, enforces production defaults, and exposes only what developers need to configure.</p><h4>What a Production-Grade IDP Provisions on AWS</h4><p>When connected to AWS, an internal developer platform provisions a complete, exclusive infrastructure stack: a dedicated VPC, private and public subnets, a NAT gateway, an Internet gateway, a managed Kubernetes cluster, compute node groups with attached storage, a load balancer for inbound traffic, and a pre-configured observability suite. Each environment gets its own isolated stack. There is no shared multi-tenant infrastructure between environments.</p><p>This gives you network, data, and compute isolation by default, which matters for teams running staging environments alongside production, or operating customer-dedicated deployments.</p><p>The provisioning happens without Terraform files to write, without state to manage, and without IAM roles to configure manually. Your team connects an AWS account, creates an environment, and the platform handles the provisioning from there.</p><h3>How IDPs Give Developers Self-Serve Access to AWS Without Exposing Infrastructure Complexity</h3><p>Once an environment exists, application teams still need to declare what cloud resources their services depend on: databases, caches, queues, object storage. This is where most platforms fall short. They handle the networking layer but leave AWS service dependencies to the developer to figure out.</p><p>A well-built IDP solves this through a declarative service configuration file committed to the application repository. Developers declare what their service needs. The platform provisions it within predefined guardrails, wires the permissions, and injects the connection details at runtime. Without requiring routine AWS console access. Without needing to manually write IAM policies in most cases. Without credentials to rotate manually.</p><p>Here is what that looks like in practice.</p><h4>Cloud Resource Dependencies</h4><p>A developer declares an S3 bucket, an RDS instance, an ElastiCache cluster, an SNS topic, or an SQS queue directly in the service config file. The platform provisions these resources inside the environment&#8217;s VPC, configures encryption at rest, sets private-only network access, enables monitoring, and turns on automated backups. 
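</p><p>As a rough illustration of what that declaration looks like, the sketch below is hypothetical: the field names are invented for this post and are not the actual LocalOps config schema, but the shape, a short list of resource types with a few parameters each, is the point.</p><pre><code># Hypothetical service config -- field names are illustrative, not the real schema
service: billing-api
resources:
  - type: s3_bucket
    name: invoices
  - type: rds_postgres
    name: primary-db
    instance_class: db.t4g.medium
  - type: elasticache_redis
    name: cache
</code></pre><p>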
The IAM policies required to access each resource are generated automatically and attached to the pre-configured role the service containers run with.</p><p>This replaces a non-trivial amount of manual work. Wiring IAM for S3 access from EKS pods manually involves setting up an OIDC identity provider, creating an IAM role with a trust policy, annotating the Kubernetes service account, and mounting it correctly in the pod spec. The declarative config approach reduces this to a few lines that any developer can write without knowing what is happening underneath.</p><p>See how<a href="https://docs.localops.co/environment/services/ops-json?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp#automatic-provisioning-of-cloud-resources"> LocalOps provisions cloud resources</a></p><h4>Environment Variable Injection</h4><p>Resource connection details, ARNs, endpoints, hostnames, and ports are injected automatically as environment variables into the service containers. Application code reads from environment variables at runtime. There are no credentials stored in code, no secrets to manually sync across environments, and no risk of a developer hardcoding a production database URL in a staging config file.</p><h4>Database Migrations and Init Jobs</h4><p>Schema migrations run as init containers before the main service starts, in the correct order, with the option to run once across all pods rather than once per pod. This behavior is not straightforward to implement correctly in Kubernetes without additional logic. The developer specifies the command. The platform handles the execution model.</p><h4>Health Checks</h4><p>HTTP, TCP, gRPC, or shell-based health checks with configurable failure thresholds and automatic container restarts. This maps to Kubernetes liveness and readiness probes under the hood, but the developer does not need to know that or write a Helm chart to configure it.</p><h4>Preview Environment Dependencies</h4><p>For pull request preview environments, the service config supports a separate block for ephemeral dependencies: Postgres, MySQL, Redis, Memcache, or RabbitMQ instances that spin up per PR and are torn down when the PR closes. These run as lightweight in-cluster containers rather than managed AWS services, suited for faster, lower-cost testing rather than production parity. Production resources are explicitly excluded from preview environments via config flags, so there is no risk of a preview service accidentally pointing at production data.</p><p>The net result for engineering leaders is that developers can configure, deploy, and connect AWS-backed services without infrastructure knowledge, without routine AWS console access, and without waiting on a platform team to provision resources for them. The self-service model works because the platform owns the complexity that would otherwise require infrastructure expertise.</p><h3>How Do You Let Developers Deploy to AWS With Just a GitHub Push?</h3><p>The deployment model is where an IDP either earns its place or exposes its limits. Self-serve infrastructure means nothing if shipping code still requires a platform engineer to be in the loop.</p><p>A well-built IDP connects directly to the application repository. A developer pushes to a configured branch. 
The platform pulls the latest commit, builds the container image, and deploys it to the Kubernetes cluster automatically, without requiring developers to write or maintain Dockerfiles, Helm charts, or CI/CD pipelines directly.</p><p>For most developers, the primary interaction with the deployment system becomes a git push.</p><h4>How the Deployment Pipeline Actually Works</h4><p>The platform builds the container image from the application source, pushes it to a registry, and schedules the new version on the cluster. Health checks run against the new containers before traffic is gradually shifted. If they fail, the deployment does not proceed and the previous version continues running.</p><p>This is not a novel concept. It is how deployment should work for application teams. The reason it is worth stating explicitly is that most teams building on raw AWS end up owning a significant amount of CI/CD configuration to get here: GitHub Actions workflows, ECR push credentials, EKS deploy steps, rollback logic. Each of those is a maintenance surface.</p><h4>Branch-Based Environment Mapping</h4><p>Each service in an environment typically maps to a specific GitHub branch. Push to the staging branch and the staging environment updates. Push to the production branch and production updates. Different team members can work on separate branches mapped to separate environments without stepping on each other.</p><p>For pull requests, a full-stack preview environment can be provisioned automatically per PR, with its own ephemeral dependencies, and torn down on merge. Developers get an isolated environment for every PR without any manual setup or infrastructure request.</p><h4>For Teams Migrating From PaaS</h4><p>Teams moving from Heroku, Render, Vercel, or Fly.io are often already used to push-to-deploy workflows. The gap they hit when moving to raw AWS is that the deployment simplicity disappears and gets replaced with infrastructure work. An IDP preserves that deployment experience while moving the actual infrastructure to a dedicated AWS account that the team owns and controls.</p><p>Moving to an IDP significantly reduces the amount of re-platforming work by preserving the push-to-deploy experience while shifting infrastructure complexity away from developers and onto the platform.</p><h3>What Happens When You Need Custom Infrastructure Beyond the Platform?</h3><p>No platform covers every infrastructure requirement a growing engineering team will have. At some point, a service needs a resource the platform does not natively support: a DynamoDB table, an MSK cluster, a custom VPC endpoint, a third-party data pipeline. This is the lock-in question every engineering leader should ask before committing to any IDP.</p><p>The answer depends entirely on whether the platform gives you the primitives to extend it.</p><h4>Extending the Environment With Custom IaC</h4><p>A well-built IDP exposes the underlying infrastructure identifiers for every environment it provisions: the VPC ID, subnet IDs, and resource tags. With those, a team can write Terraform or Pulumi scripts that provision additional resources within the same VPC and private subnets, privately accessible from application containers, without disturbing anything the platform manages.</p><p>The custom resources sit cleanly alongside the platform-managed infrastructure. They share the same network boundary. 
Application services can reach them without any additional networking configuration.</p><p>The lifecycle of those custom resources is the team&#8217;s responsibility. The platform does not import or manage them. If the team deletes the environment, they need to tear down their custom Terraform stack first, before the platform removes the VPC and underlying networking. That is a reasonable tradeoff, not a gotcha.</p><h4>Taking Full Ownership of the Infrastructure</h4><p>Beyond extending environments, a trustworthy IDP documents a full eject path: the ability to take complete ownership of the provisioned AWS infrastructure and move it outside the platform entirely.</p><p>This matters for two reasons. First, it tells you something about how the platform is designed. A platform confident enough to document a clean exit is one that expects to earn continued use rather than trap it. Second, it de-risks the adoption decision. Committing to an IDP is easier when the downside scenario is controlled rather than open-ended.</p><h4>How Often This Actually Comes Up</h4><p>For most engineering teams, the custom infrastructure question comes up less often than anticipated. The declarative service config model handles the majority of AWS resource dependencies that application services need. The cases that genuinely require custom IaC tend to be specific integrations or compliance requirements that are well-defined enough to manage separately.</p><p>The right question to ask is not whether the platform handles everything. It is whether the escape hatch is clean enough to use when you need it</p><h3>Internal Developer Platform Architecture</h3><p>An IDP that handles AWS environment provisioning operates across three layers. Knowing where each layer sits tells you what you are delegating to the platform and what you are keeping.</p><h4>The Infrastructure Layer</h4><p>VPC, subnets, NAT gateway, EKS cluster, EC2 nodes, EBS volumes, load balancer, observability stack. The platform provisions these into your AWS account using internal templates. Your account gets billed. You own the resources. The platform manages their configuration and lifecycle.</p><p>This is not hosting. The infrastructure runs in your account, not the vendor&#8217;s.</p><h4>The Platform Layer</h4><p>Kubernetes orchestration, container scheduling, service deployment, health check enforcement, auto-scaling, auto-healing. This layer runs on top of the infrastructure. Developers do not touch it. The platform operates it.</p><p>This is also the layer that costs the most to build and maintain in-house. Teams that roll their own IDP on top of Backstage or raw Kubernetes end up owning this in full, which means hiring for it, maintaining it, and being on call for it.</p><h4>The Developer Layer</h4><p>A service configuration file in the application repo, a GitHub integration, and a dashboard. This is the only surface developers interact with. They declare what their service needs, push code, and check deployment status. Everything below is handled by the platform.</p><h4>For the Build vs Buy Decision</h4><p>Building in-house means owning all three layers. Using a platform means owning the developer layer, retaining the infrastructure layer through your AWS account, and handing the platform layer to the vendor.</p><p>The platform layer is the one most teams do not want to own. It requires Kubernetes expertise, on-call coverage, and ongoing maintenance that has nothing to do with the product being built. 
That is the core of the IDP value proposition, stated plainly.</p><p>See how <a href="https://docs.localops.co/environment/shared-responsibilities?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp">LocalOps handles shared responsibilities</a></p><h3>FAQ</h3><p><strong>1. What happens to our existing Terraform setup if we move to an IDP?</strong></p><p>You do not have to throw it away. A well-built IDP provisions and manages the core infrastructure layer: VPC, subnets, EKS cluster, observability. Any custom resources your team has already built in Terraform can continue to run alongside the platform, as long as they sit within the same VPC. The IDP exposes VPC IDs, subnet IDs, and resource tags so your existing Terraform scripts can reference them directly. The practical split is: the platform owns the environment baseline, your team owns anything custom on top of it.</p><p><strong>2. Can we trust the platform to enforce security defaults we would otherwise configure ourselves?</strong></p><p>This depends on how the platform is built. A production-grade AWS internal developer platform should enforce encryption at rest on all storage, private-only network access for databases and caches, least-privilege IAM policies per resource, and security group rules with tightly scoped ingress by default. These should not be optional configurations. They should be on by default for every environment the platform provisions, without requiring your team to audit or maintain them manually.</p><p><strong>3. What is the difference between an open source internal developer platform and a commercial one?</strong></p><p>Open source options like Backstage are portal frameworks, not full platforms. They provide a service catalog and developer UI but the provisioning and operational layers still need to be built on top. That is a significant internal build investment. A commercial internal developer platform includes the full stack out of the box: infrastructure provisioning, Kubernetes orchestration, observability, and CI/CD. The tradeoff is customisability versus the ongoing cost of building and maintaining the platform layer yourself.</p><p><strong>4. How do you choose the best internal developer platform for your AWS environment?</strong></p><p>A few things worth evaluating. Does it provision into your own AWS account or the vendor&#8217;s? You want to retain ownership of the infrastructure and the billing relationship. How deep is the abstraction? A platform that wraps Terraform in a UI still requires someone to understand the underlying resource model. What is the escape hatch when you need custom infrastructure? And does observability come pre-configured or is it something your team sets up separately? The best internal developer platforms answer all four of these cleanly.</p><p><strong>5. Can Terraform and an IDP coexist in the same AWS account?</strong></p><p>Yes, and for most teams they should. An IDP handles the environment baseline that every team needs: networking, compute, Kubernetes, observability. Terraform handles the exceptions: custom integrations, compliance-specific resources, or anything outside the platform&#8217;s native support. The key is that the IDP exposes the infrastructure primitives (VPC ID, subnet IDs) so custom Terraform resources sit inside the same network boundary and are privately accessible from application services without additional configuration.</p><h3>Conclusion</h3><p>Most engineering leaders do not set out to own a complex infrastructure setup. 
It tends to happen incrementally: one environment becomes two, staging drifts from production, the engineer who built the original setup moves on, and maintaining it becomes a background cost that nobody explicitly budgeted for.</p><p>The IDP model does not make infrastructure disappear. The VPC still exists. The EKS cluster still runs. IAM policies still govern access. What changes is who is responsible for configuring and maintaining those things, and whether that responsibility sits with the people building the product or with a platform built specifically to handle it.</p><p>The tradeoff is real. You give up some direct control over the infrastructure layer in exchange for not having to maintain it yourself. Whether that is the right call depends on your team size, your infrastructure requirements, and how much engineering capacity you have available for platform work.</p><p>For teams with a dedicated platform engineer who wants to own this, building and maintaining the stack in-house is a reasonable path. For teams without that, an IDP is worth evaluating not because it solves every infrastructure problem, but because it solves the provisioning and environment management problem well enough that the rest of the team can stay focused on the product.</p><p>If you&#8217;re figuring out how this would fit into your setup, then LocalOps team can help you work through it:</p><p><a href="https://go.localops.co/tour">Book a Demo</a> -- Walk through how environments, deployments, and AWS infrastructure are handled in practice for your setup.</p><p><a href="https://console.localops.co/signup">Get started for free</a> -- Connect an AWS account and stand up an environment to see how it fits into your existing workflow.</p><p><a href="https://docs.localops.co/">Explore the Docs</a> -- A detailed breakdown of how LocalOps works end-to-end, including architecture, environment setup, security defaults, and where engineering decisions still sit.</p><h3>Suggested Articles</h3><ol><li><p><a href="https://blog.localops.co/p/golden-path-deployments-internal-developer-platform">Golden Path Deployments: Ship Faster Without Managing Kubernetes or Terraform</a></p></li><li><p><a href="https://blog.localops.co/p/kubernetes-vs-internal-developer-platform-aws">Kubernetes vs Internal Developer Platform: Do You Need Both for AWS Deployments?</a></p></li><li><p><a href="https://blog.localops.co/p/how-to-scale-saas-engineering-team-without-hiring-more-devops">How to Scale a SaaS Engineering Team Without Hiring More DevOps</a></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Heroku Observability Is Broken at Scale - Here's What Production Teams Use Instead]]></title><description><![CDATA[What production-grade observability actually looks like beyond Heroku]]></description><link>https://blog.localops.co/p/heroku-observability-is-broken-at-scale</link><guid isPermaLink="false">https://blog.localops.co/p/heroku-observability-is-broken-at-scale</guid><dc:creator><![CDATA[Nidhi Pandey]]></dc:creator><pubDate>Wed, 22 Apr 2026 12:24:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!kPtq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!kPtq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!kPtq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png 424w, https://substackcdn.com/image/fetch/$s_!kPtq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png 848w, https://substackcdn.com/image/fetch/$s_!kPtq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png 1272w, https://substackcdn.com/image/fetch/$s_!kPtq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!kPtq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png" width="1303" height="807" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:807,&quot;width&quot;:1303,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2575280,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/194996625?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d48ef1-8b07-44c7-a4aa-2e70357fa841_1303x2399.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!kPtq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png 424w, https://substackcdn.com/image/fetch/$s_!kPtq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png 848w, https://substackcdn.com/image/fetch/$s_!kPtq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png 1272w, https://substackcdn.com/image/fetch/$s_!kPtq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffe5674ad-3dac-447e-bb64-e2ecc9f15a1d_1303x807.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2><strong>TL;DR</strong></h2><p><strong>What this covers:</strong> Why Heroku&#8217;s observability model fails at production scale, what the real financial and operational cost of the standard Heroku monitoring stack is, what natively integrated observability looks like on a modern alternative, and how AWS maps to the full Heroku stack without requiring Kubernetes expertise.</p><p><strong>Who it is for:</strong> CTOs and VPs of Engineering running production SaaS workloads on Heroku who are evaluating alternatives, specifically teams that want better observability without a post-migration setup project before they can operate in production.</p><p><strong>The conclusion:</strong> Heroku&#8217;s monitoring story is not weak because Heroku chose bad add-ons. It is structurally weak because Heroku&#8217;s architecture requires you to assemble observability from disconnected third-party tools, each with its own data model, pricing tier, and alert configuration. At production scale, that model creates financial overhead, incident response friction, and operational blind spots that integrated platforms eliminate by design.</p><h2><strong>Why Heroku&#8217;s Monitoring Story Collapses at Production Scale</strong></h2><p>Heroku&#8217;s observability model made sense when most Heroku applications were single services with modest traffic. A Papertrail log drain, a basic New Relic agent, and a status page were enough to know whether the application was healthy.</p><p>The problem is not that these tools are bad. The problem is the architectural model: Heroku provides no native observability layer. Every observability capability, logs, metrics, APM, error tracking, and uptime monitoring must be sourced from independent third-party add-ons, each billing separately, each storing data in its own system, each requiring separate configuration.</p><p>At a low scale with a single service, this is manageable. At production scale with five, ten, or fifteen services, the model breaks in three specific ways.</p><p><strong>Fragmentation makes incidents slower to resolve.</strong></p><p>When an incident occurs, the first task is correlation: which service is affected, what changed recently, and what does the error pattern look like relative to traffic and resource usage?</p><p>On Heroku, answering these questions requires switching between tools. Logs are in Papertrail. 
Application metrics and traces are in New Relic or Scout. Infrastructure metrics, if they exist at all, are in a separate monitoring add-on. Each tool has its own data model, its own time axis, and its own filtering interface.</p><p>The act of correlating a spike in error rates with a recent deployment, a specific service, and an underlying resource constraint requires context-switching between three or four separate interfaces. Every switch adds friction. Under incident pressure at 2 AM, that friction adds minutes to the mean time to resolution. For SaaS applications with customer-facing SLAs, those minutes are the difference between an incident that resolves cleanly and one that triggers a customer escalation.</p><p><strong>Add-on cost compounds with service count, not with revenue.</strong></p><p>This is the financial dynamic that surprises engineering leaders when they examine the Heroku invoice carefully. Observability costs on Heroku do not scale with your revenue. They scale with your service count and your log volume, both of which grow with product complexity, not with business growth.</p><p>Every new service adds a New Relic agent at per-agent APM pricing. Every new service adds log volume that pushes Papertrail closer to the next pricing tier. Every additional engineer writing more verbose log output accelerates the compounding. The observability bill grows faster than the application bill, and faster than revenue, at precisely the growth stage where unit economics start mattering to the board.</p><p><strong>Coverage gaps appear exactly when you need visibility most.</strong></p><p>Heroku&#8217;s add-on model creates a specific failure mode: teams instrument the things they thought to instrument and are blind to everything else. The standard Heroku observability stack provides application-level logs and request-level APM traces. It does not provide container-level resource utilisation, memory pressure per service, pod restart counts, or infrastructure-level metrics that distinguish &#8220;application bug&#8221; from &#8220;resource constraint&#8221; during an incident.</p><p>Teams discover these coverage gaps at the worst possible time: during an incident that turns out to be a memory leak or a resource exhaustion pattern that no application-level log or APM trace would ever surface. The gap between what the monitoring shows and what is actually happening is the gap that turns a 15-minute incident into a two-hour war room.</p><blockquote><p><strong>Want to see what integrated observability looks like on LocalOps?</strong> <a href="https://go.localops.co/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Schedule a walkthrough &#8594;</a></p></blockquote><h2><strong>The Real Cost of the Standard Heroku Observability Stack</strong></h2><p>The standard production observability stack on Heroku is assembled from three to five tools. Each carries its own cost. 
The combined cost compounds with service count in ways that most teams do not fully account for until they audit the invoice line by line.</p><p><strong>The standard stack and what it actually costs:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!YXc6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!YXc6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png 424w, https://substackcdn.com/image/fetch/$s_!YXc6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png 848w, https://substackcdn.com/image/fetch/$s_!YXc6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png 1272w, https://substackcdn.com/image/fetch/$s_!YXc6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!YXc6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png" width="1336" height="772" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:772,&quot;width&quot;:1336,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121944,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/194996625?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!YXc6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png 424w, https://substackcdn.com/image/fetch/$s_!YXc6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png 848w, https://substackcdn.com/image/fetch/$s_!YXc6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png 1272w, https://substackcdn.com/image/fetch/$s_!YXc6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7253e781-59b3-4994-bc4c-271c2be3767d_1336x772.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>Papertrail (Log Management)</strong></h3><p>Papertrail pricing is structured around log volume: lines per month, retention period, and search capability. At low log volume with a single service, the cost is manageable. The problem is that log volume does not scale linearly with traffic. It scales with service count; more services mean more internal log output independent of external request volume, and with engineering team size, since more engineers writing more instrumentation produces higher baseline log output.</p><p>A team running five production services with reasonable log verbosity typically sits in a Papertrail tier that costs meaningfully more per month than the entry-level plan. Each additional service adds log volume that may or may not trigger a tier jump, but always moves the team closer to one.</p><h3><strong>New Relic or Scout APM (Application Performance Monitoring)</strong></h3><p>APM pricing on Heroku add-ons scales with host count or agent count. Adding a new service adds a new APM agent. Adding a new dyno for horizontal scaling adds another agent at the same per-agent rate.</p><p>For teams that run multiple services with multiple dynos each, the APM bill can easily exceed the compute bill. This is the observability cost that most surprises engineering leaders during a cost audit: the monitoring costs more than the compute it monitors.</p><h3><strong>Additional Tools: Error Tracking, Uptime Monitoring, Alerting</strong></h3><p>A production-ready observability setup also typically includes Sentry or Rollbar for error tracking, an uptime monitoring tool, and a separate alerting layer since Heroku&#8217;s native alerting capabilities are limited.</p><p>Each tool adds a billing line item. Each tool adds a configuration surface that someone on the team needs to maintain. Each tool is a separate place to look during an incident.</p><p><strong>The combined financial picture:</strong></p><p>For a typical five-service production stack on Heroku, the assembled observability stack costs between $400 and $900 per month, depending on traffic volume, service count, and the specific tools chosen. 
This number is not fixed; it grows with every new service, every traffic spike that pushes log volume past a tier boundary, and every new engineer who joins and starts writing logs.</p><p>The observability line item on a Heroku invoice is frequently larger than teams expect when they first examine it carefully, and it is one of the fastest-growing line items as the product scales.</p><p><strong>The operational cost beyond the financial one:</strong></p><p>The financial cost is real and calculable. The operational cost is often larger and harder to quantify.</p><p>The operational cost of a fragmented observability stack shows up in three places:</p><p><em>Initial configuration time.</em> Setting up the Papertrail drain correctly, instrumenting New Relic across services, configuring alert thresholds, and wiring everything together is a non-trivial engineering project. Teams typically underestimate this when they first assemble the stack and then absorb the cost invisibly over time as new services require the same configuration cycle.</p><p><em>Maintenance overhead.</em> Each tool has its own configuration format, its own agent version requirements, and its own deprecation cycles. When New Relic releases a breaking change to its agent API, someone has to update every service. When Papertrail changes its drain configuration format, someone has to update the drain setup for every application. Multiply this by the number of tools in the stack.</p><p><em>Incident response friction.</em> As described above, the cost of context-switching between three or four separate monitoring interfaces during an incident is paid in minutes of additional incident duration on every incident. For a team running ten incidents per month at an average of fifteen minutes of additional context-switching overhead per incident, that is 2.5 hours of senior engineering time per month spent on avoidable tool friction during the highest-stress moments of the operational cycle.</p><h2><strong>What Natively Integrated Observability Looks Like on a Modern Heroku Alternative</strong></h2><p>The architectural difference between Heroku&#8217;s assembled monitoring model and natively integrated observability is not a difference of tools. Prometheus, Loki, and Grafana are open-source projects that exist outside of any platform. The difference is where the assembly and configuration happen: in the platform layer versus in every customer&#8217;s environment.</p><p>On a modern Heroku alternative with integrated observability, the monitoring stack is part of the platform. It is not a list of recommended tools teams should configure after migration. It is infrastructure that exists and runs from day one, covering every service deployed to the platform without additional setup.</p><p><a href="https://docs.localops.co/environment/services/deploy?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how LocalOps handles observability out of the box &#8594;</a></p><p><strong>What this looks like in practice with LocalOps:</strong></p><p>LocalOps provisions Prometheus, Loki, and Grafana automatically as part of every environment: development, staging, and production. There is no observability setup step after migration. There is no drain configuration, no agent installation, and no dashboard provisioning. 
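</p><p>For custom application-level metrics, the service&#8217;s side of the contract is simply to expose a metrics endpoint that Prometheus can scrape. A minimal sketch using the open-source prometheus_client library, assuming a Python service (the metric names, labels, and port are illustrative, not prescribed by the platform):</p><pre><code>from prometheus_client import Counter, Histogram, start_http_server
import random
import time

# Illustrative metrics; names and labels are placeholders.
REQUESTS = Counter("app_requests_total", "Requests handled", ["route"])
LATENCY = Histogram("app_request_seconds", "Request latency in seconds")

def handle_request(route):
    REQUESTS.labels(route=route).inc()
    with LATENCY.time():
        time.sleep(random.random() / 10)  # stand-in for real work

if __name__ == "__main__":
    # Serves plain-text metrics on /metrics for Prometheus to scrape.
    start_http_server(8000)
    while True:
        handle_request("/health")
        time.sleep(1)
</code></pre><p>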
The stack is ready when the environment is ready.</p><p><strong>Prometheus: Metrics Without Agent Installation</strong></p><p>Prometheus collects metrics automatically from every service deployed to the platform: CPU utilisation, memory consumption, request rate, error rate, response latency, and any custom application metrics exposed on a standard metrics endpoint.</p><p>The difference from New Relic or Scout on Heroku is structural. New Relic requires agent installation and configuration per service. Prometheus on Kubernetes scrapes metrics from services automatically using service discovery: no agent, no per-service configuration, no per-service billing.</p><p>Teams get infrastructure-level and application-level metrics in the same system with the same data model. A memory pressure problem that triggers a pod restart and causes elevated error rates is visible as a single correlated event in Prometheus, not as two separate signals in two separate monitoring tools.</p><p><strong>Loki: Log Aggregation Without Drain Configuration</strong></p><p>Loki aggregates logs from all services through standard output. Services write logs to stdout. Loki collects them automatically. There is no Papertrail drain to configure, no log format requirement, and no volume-based pricing tier to worry about.</p><p>The operational difference from Papertrail is meaningful. Papertrail requires configuring a log drain URL per application, managing drain credentials, and monitoring log volume to avoid unexpected tier jumps. Loki on Kubernetes collects logs from every container automatically through standard Kubernetes log aggregation. New services are covered the moment they deploy.</p><p><strong>Grafana: Unified Dashboards for Logs and Metrics</strong></p><p>Grafana provides dashboards that query both Prometheus and Loki simultaneously. When an incident occurs, the engineer opens a single interface, selects the relevant service, and sees metrics and logs side by side with the same timestamps.</p><p>The incident response workflow changes fundamentally. Instead of opening Papertrail, filtering to the relevant service, opening New Relic, correlating the timeline, and switching back and forth, the workflow is: open Grafana, select the service and timeframe, and see the full picture.</p><p>LocalOps includes pre-built dashboards for infrastructure health, service-level metrics, and deployment events. Teams get operational visibility from day one without building dashboards from scratch as a separate project.</p><h2><strong>What AWS Looks Like as a Heroku Alternative, Without Requiring Kubernetes Expertise</strong></h2><p>The most common objection to AWS-based Heroku alternatives is not cost or capability; it is operational complexity. AWS is powerful but not self-serve for developers. A developer who can deploy a Heroku app in five minutes cannot deploy the same app to EKS in five minutes without a platform layer that abstracts the Kubernetes and AWS operations.</p><p>This objection is valid when applied to raw AWS. It does not apply to AWS accessed through an Internal Developer Platform like LocalOps, where the platform layer handles the Kubernetes and AWS operations automatically.</p><p><strong>The developer experience comparison:</strong></p><p><em>On Heroku today:</em> Developer pushes to the main branch. Heroku detects the push, runs the buildpack, creates a slug, and deploys the dyno. The developer can view logs from the Heroku dashboard. 
Service is running.</p><p><em>On LocalOps (AWS + IDP):</em> Developer pushes to main branch. LocalOps detects the push, builds a Docker image, pushes to ECR, and deploys to EKS. The developer can view logs and metrics from the LocalOps dashboard. Service is running.</p><p>The developer experience is structurally identical. The infrastructure underneath is in the team&#8217;s AWS account, with all the compliance and control implications that it carries. But the developer never writes a Dockerfile, never touches kubectl, and never configures a Kubernetes deployment manifest unless they choose to.</p><p>Teams where developers need to understand Kubernetes to deploy services have not implemented a Heroku alternative with a platform layer. They have implemented Kubernetes and asked their product engineers to become platform engineers. That is a different and significantly more expensive outcome.</p><h2><strong>How AWS Services Map to the Full Heroku Stack</strong></h2><p>For engineering leaders evaluating the migration, the question is not whether AWS can replace Heroku technically; it clearly can, but how each component of the Heroku stack maps to its AWS equivalent, and what the operational complexity difference looks like in practice.</p><p><strong>The full stack mapping:</strong></p><h3><strong>Heroku Dynos &#8594; Amazon EKS (Kubernetes on AWS)</strong></h3><p>Heroku dynos are the compute layer: containerised processes that run application code, receive traffic, and scale based on configuration.</p><p>Amazon EKS is the AWS-native equivalent: managed Kubernetes that runs containerised workloads, handles load balancing, and autoscales based on real resource metrics rather than manual dyno count adjustments.</p><p>The operational complexity difference on raw AWS is significant; Kubernetes requires configuration that Heroku handles automatically. Through LocalOps, the EKS cluster is provisioned, configured, and managed as part of the platform. Developers deploy to it without writing Kubernetes YAML.</p><p>The capability difference favours EKS meaningfully at scale:</p><ul><li><p>Horizontal autoscaling responds dynamically to CPU and memory pressure rather than requiring manual dyno count changes</p></li><li><p>Workloads scale to zero during off-peak periods rather than running at minimum dyno count continuously</p></li><li><p>Pod resource limits can be set per service rather than per dyno tier</p></li><li><p>Deployment strategies (rolling, blue-green) are configurable rather than fixed</p></li></ul><h3><strong>Heroku Postgres &#8594; Amazon RDS</strong></h3><p>Heroku Postgres is a managed PostgreSQL service with pricing structured around row limits, connection limits, and storage tiers that force upgrades as databases grow.</p><p>Amazon RDS is the AWS-native managed PostgreSQL equivalent. The structural pricing difference is significant: RDS charges based on instance type and actual storage consumed, not on arbitrary row count and connection limit tiers. A database at 7 million rows costs the same instance rate as a database at 5 million rows, as long as the instance type handles both workloads. 
There are no forced tier jumps driven by row counts.</p><p>Operational capabilities on RDS exceed those of Heroku Postgres at production scale: automated backups with configurable retention, read replica support for read-heavy workloads, Multi-AZ deployment for high availability, and storage autoscaling that expands capacity automatically without manual intervention.</p><h3><strong>Heroku Redis &#8594; Amazon ElastiCache</strong></h3><p>Heroku Redis pricing scales with connection count. As more services connect to Redis, the connection count drives tier upgrades that are significant price jumps relative to the actual resource consumption increase.</p><p>Amazon ElastiCache pricing is based on node type and replication configuration. Connection count does not drive pricing. A five-service application with 200 total Redis connections costs the same ElastiCache rate as a two-service application with 50 connections, if the node type handles both workloads.</p><p>For multi-service architectures, this pricing difference is substantial. Every service in a Heroku stack that uses Redis adds to the connection count. ElastiCache pricing is indifferent to connection count within node capacity limits.</p><h3><strong>Heroku Monitoring Stack &#8594; Prometheus + Loki + Grafana (included in LocalOps)</strong></h3><p>As covered in detail above, the assembled Heroku monitoring stack of Papertrail, New Relic or Scout, and additional tools is replaced on LocalOps by an integrated observability stack that is part of the platform at no additional cost.</p><p>The financial comparison: $400&#8211;$900/month for the assembled Heroku observability stack versus zero additional cost on LocalOps, where Prometheus, Loki, and Grafana are included in the platform.</p><p>The operational comparison: multiple tools with separate configuration, separate data models, and separate alert configurations versus a unified stack with integrated dashboards, automatic service discovery, and correlated metrics and logs in a single interface.</p><h3><strong>Heroku Scheduler &#8594; Amazon ECS Scheduled Tasks or Kubernetes CronJobs</strong></h3><p>Heroku Scheduler handles time-based job execution with limited configurability: jobs run at defined intervals, there is minimal execution history, and there is no native retry logic or failure alerting beyond basic log output.</p><p>On AWS via LocalOps, scheduled jobs run as Kubernetes CronJobs: execution history is retained and queryable, failure alerting integrates with the platform&#8217;s observability stack, retry logic is configurable per job, and job resource allocation is separate from the web service configuration, so scheduled jobs do not compete with request-serving workloads for resources.</p><h3><strong>Heroku Add-Ons (Miscellaneous) &#8594; AWS-Native Services</strong></h3><p>The broader Heroku add-on ecosystem (search, queuing, email, feature flags) maps to AWS-native services or best-in-class managed services that run independently of the deployment platform. The key operational difference is that these integrations are not intermediated by Heroku&#8217;s add-on marketplace pricing layer. Teams connect directly to AWS SQS, OpenSearch, SES, or their preferred SaaS tools at the provider&#8217;s pricing without a platform markup.</p><h2><strong>How LocalOps Delivers This in Practice</strong></h2><p>LocalOps is an AWS-native Internal Developer Platform built specifically for teams replacing Heroku. 
It handles the EKS cluster, VPC, load balancers, IAM roles, and observability stack automatically, and delivers the developer experience that makes Heroku compelling, while running on infrastructure the business controls.</p><p><strong>The setup path:</strong></p><p>Connect your AWS account. Connect your GitHub repository. LocalOps provisions a dedicated VPC, EKS cluster, load balancers, IAM roles, and the full observability stack (Prometheus, Loki, and Grafana) automatically. No Terraform. No Helm charts. No manual configuration. First environment ready in under 30 minutes.</p><p>From that point, the developer workflow is identical to Heroku. Push to your configured branch. LocalOps builds, containerises, and deploys to AWS automatically. Logs and metrics are available in Grafana from day one. Autoscaling and auto-healing run by default. Secrets management runs through AWS Secrets Manager with audit logging.</p><p>The observability stack that costs $400&#8211;$900/month in Heroku add-ons is included in LocalOps as infrastructure. There is no add-on to configure. There is no additional cost. There is no vendor to manage.</p><p>The infrastructure runs in your AWS account. If you stop using LocalOps, it keeps running.</p><blockquote><p><em>&#8220;Their thoughtfully designed product and tooling entirely eliminated the typical implementation headaches. Partnering with LocalOps has been one of our best technical decisions.&#8221; </em>&#8211; Prashanth YV, Ex-Razorpay, CTO and Co-founder, Zivy</p><p><em>&#8220;Even if we had diverted all our engineering resources to doing this in-house, it would have easily taken 10&#8211;12 man months of effort, all of which LocalOps has saved for us.&#8221; </em>&#8211; Gaurav Verma, CTO and Co-founder, SuprSend</p></blockquote><p><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get started for free &#8212; first environment live in under 30 minutes &#8594;</a></p><h2><strong>Frequently Asked Questions</strong></h2><ol><li><p><strong>Why does Heroku&#8217;s monitoring break at production scale specifically, rather than at early stages?</strong></p></li></ol><p>At early stages, with one or two services, modest traffic, and a small engineering team, the assembled Heroku monitoring stack is manageable. The financial cost is limited, the configuration overhead is a one-time investment, and incident correlation across two tools is not prohibitively slow. The model breaks at production scale because each of its weaknesses compounds with service count. Financial cost multiplies per service. Configuration overhead recurs with every new service. Incident correlation becomes slower as the number of services generating signals increases and the number of tools required to correlate them grows. The same architectural model that works at low scale becomes operationally and financially indefensible when the number of services in production grows past five to ten.</p><ol start="2"><li><p><strong>Can teams get Prometheus, Grafana, and Loki without using an IDP like LocalOps?</strong></p></li></ol><p>Yes; all three are open-source projects and can be self-hosted. The operational cost of doing so is the relevant variable. Setting up Prometheus correctly on Kubernetes with appropriate scrape configurations, retention policies, and alert rules is a non-trivial engineering project. Integrating Loki with a log aggregation pipeline that covers all services requires additional configuration. 
Building Grafana dashboards from scratch that surface the metrics and logs relevant to production operations requires additional time. The open-source tools are available; the pre-configured, production-ready integration is what LocalOps provides as part of the platform. Teams that want to do this work themselves can. Teams that want to be in production on day one without an observability setup project use LocalOps.</p><ol start="3"><li><p><strong>What does the migration path from Heroku Postgres to Amazon RDS look like?</strong></p></li></ol><p>The core migration has three steps: provisioning an RDS instance with the same engine version, migrating data using pg_dump and pg_restore or AWS Database Migration Service, and updating application connection strings. The most important operational consideration is connection pooling: Heroku Postgres bundles PgBouncer connection pooling in its managed tiers. On RDS, connection pooling is configured separately using RDS Proxy or a self-managed PgBouncer instance. LocalOps handles the RDS provisioning and configuration automatically. The data migration guide in LocalOps&#8217;s documentation covers the full process with step-by-step instructions for typical Heroku Postgres configurations.</p><p><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read the full migration guide &#8594;</a></p><ol start="4"><li><p><strong>How does the observability model change for teams running Rails applications on Heroku?</strong></p></li></ol><p>Rails applications on Heroku are typically instrumented with the New Relic Ruby agent for request tracing and rely on Heroku&#8217;s log output with Papertrail for log management. On LocalOps, Rails applications are instrumented with the OpenTelemetry Ruby SDK or use Prometheus&#8217;s Ruby client for custom metrics. Logs are written to stdout and are collected automatically by Loki. The instrumentation model is more explicit than New Relic&#8217;s automatic instrumentation, but it is also more flexible and not tied to a proprietary agent. Teams migrating Rails applications from Heroku typically find that the instrumentation migration takes one to two days. Post-migration visibility is meaningfully better than with Heroku + New Relic + Papertrail because logs and metrics are correlated in a single interface.</p><ol start="5"><li><p><strong>What is the realistic total cost of ownership comparison between Heroku observability and LocalOps for a five-service production stack?</strong></p></li></ol><p>For a five-service production stack on Heroku, the observability component alone typically runs $400&#8211;$900/month: Papertrail at a tier appropriate for multi-service log volume, New Relic or Scout at per-agent pricing across five services and their dynos, plus error tracking and uptime monitoring tools. On LocalOps, the observability stack (Prometheus, Loki, and Grafana) is included at zero additional cost. The total infrastructure cost for the same five-service stack runs at AWS list pricing (typically $200&#8211;$400/month for compute and managed services) plus the LocalOps platform fee, replacing both the Heroku compute bill and the observability add-on bill. The observability saving alone typically exceeds the LocalOps platform fee.</p><ol start="6"><li><p><strong>Does using LocalOps create vendor lock-in that is worse than Heroku?</strong></p></li></ol><p>The opposite. 
Heroku&#8217;s lock-in is architectural: your infrastructure runs in Heroku&#8217;s cloud, your databases are Heroku Postgres instances, and your monitoring is Heroku-intermediated add-ons. If you stop using Heroku, everything must be rebuilt on a different platform. LocalOps provisions infrastructure in your AWS account. If you stop using LocalOps, the EKS cluster, VPC, RDS instances, ElastiCache clusters, and Prometheus stack all continue running in your AWS account. The dependency is on the platform&#8217;s operational interface, not on the platform&#8217;s infrastructure. Your Heroku dependency includes the infrastructure. Your LocalOps dependency does not.</p><h2><strong>Key Takeaways</strong></h2><p>Heroku&#8217;s observability model is broken at production scale, not because of any single tool&#8217;s weakness, but because the architectural model of assembling monitoring from disconnected third-party add-ons creates compounding financial cost, operational fragmentation, and coverage gaps that grow with every service added to the production stack.</p><p>The $400&#8211;$900/month observability bill for a five-service Heroku production stack is the visible portion. The invisible portion is the engineering time spent on configuration, maintenance, and incident response friction across disconnected tools: costs that compound silently and rarely surface in infrastructure reviews.</p><p>Natively integrated observability on a modern Heroku alternative resolves this at the architectural level. When Prometheus, Loki, and Grafana are part of the platform rather than add-ons assembled after the fact, the financial cost disappears, the configuration overhead disappears, and the incident response workflow improves from three tools to one.</p><p>The AWS-to-Heroku stack mapping is complete: EKS replaces dynos with autoscaling that matches real workload patterns, RDS replaces Heroku Postgres without tier-jump pricing, ElastiCache replaces Heroku Redis without connection-count pricing penalties, and the LocalOps observability stack replaces the assembled Heroku monitoring stack at zero additional cost.</p><p>What platform teams gain: infrastructure in their AWS account, compliance posture for enterprise customers, and observability that is better than anything the Heroku add-on stack provides.</p><p>What developers keep: a deployment experience identical to Heroku, where a push to a branch deploys the service and logs and metrics are immediately available.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get Started with LocalOps</a> &#8594;</strong> First production environment on AWS in under 30 minutes. 
No credit card required.</p><p><strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Schedule a Migration Call</a> &#8594;</strong> Our engineers model your current Heroku observability costs against LocalOps + AWS and walk through the migration for your specific stack.</p><p><strong><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read the Heroku Migration Guide</a> &#8594;</strong> Full technical walkthrough: database migration, environment setup, DNS cutover.</p><p><em><strong>Related reading:</strong></em></p><ol><li><p><strong><a href="https://blog.localops.co/p/herokus-hidden-infrastructure-limitations">Heroku&#8217;s Hidden Infrastructure Limitations: What CTOs Only Discover at Scale</a></strong></p></li><li><p><strong><a href="https://blog.localops.co/p/heroku-alternatives">Heroku Alternatives the Engineering Community Actually Recommends in 2026</a></strong></p></li><li><p><strong><a href="https://blog.localops.co/p/the-real-cost-of-heroku-at-scale">The Real Cost of Heroku at Scale: A Teardown for CTOs Evaluating Alternatives</a></strong></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Internal Developer Platform vs PaaS]]></title><description><![CDATA[What's the Difference and Which One Does Your Engineering Team Actually Need?]]></description><link>https://blog.localops.co/p/internal-developer-platform-vs-paas</link><guid isPermaLink="false">https://blog.localops.co/p/internal-developer-platform-vs-paas</guid><dc:creator><![CDATA[Madhushree Sivakumar]]></dc:creator><pubDate>Wed, 22 Apr 2026 12:20:38 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!i_CT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!i_CT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!i_CT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png 424w, https://substackcdn.com/image/fetch/$s_!i_CT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png 848w, https://substackcdn.com/image/fetch/$s_!i_CT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png 1272w, https://substackcdn.com/image/fetch/$s_!i_CT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!i_CT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png" width="1200" height="673" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:673,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1091174,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/195022114?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!i_CT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png 424w, https://substackcdn.com/image/fetch/$s_!i_CT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png 848w, https://substackcdn.com/image/fetch/$s_!i_CT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png 1272w, https://substackcdn.com/image/fetch/$s_!i_CT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7aabd49-1a9b-4303-a836-98b3ec1cfbac_1200x673.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>PaaS and internal developer platforms are not the same thing, but most teams treat them as if they are. 
That confusion leads to either staying on a platform too long or jumping to something the team is not ready for.</p><p>A PaaS runs your application on the vendor&#8217;s infrastructure. An internal developer platform automates how your team provisions and operates infrastructure inside your own cloud account. The distinction is not just technical. It affects cost, compliance, and how much control your engineering team actually has over what runs in production.</p><p>This post covers what each one does, where PaaS typically stops working, and how to figure out which one fits your team right now.</p><h3>TL;DR</h3><ul><li><p>A PaaS deploys your code on the vendor&#8217;s infrastructure. An IDP automates how your team provisions and operates infrastructure inside your own cloud account.</p></li><li><p>PaaS works well for small teams with simple stacks. It starts breaking down around 10 to 20 engineers, multiple services, or any compliance requirement.</p></li><li><p>PaaS compute typically costs 2 to 5x more than equivalent cloud resources.</p></li><li><p>Building an IDP from scratch with Backstage takes 6 to 12 months and requires 3 to 15 engineers to maintain.</p></li><li><p>Managed IDPs exist specifically for teams that have outgrown PaaS but don&#8217;t have the capacity to build a platform from scratch.</p></li></ul><h3>What Is the Difference Between an Internal Developer Platform and a PaaS?</h3><p>The core difference is where your infrastructure lives and who controls it.</p><p>With a PaaS, the vendor owns the infrastructure. You deploy code onto their platform, and they handle the runtime, networking, scaling, and maintenance. You work within whatever constraints they have set. Heroku, Render, Railway, and Fly.io all follow this model.</p><p>With an internal developer platform, the infrastructure runs inside your own cloud account. The IDP is the automation and workflow layer on top of it. Developers get a self-service path to deploy services and provision environments without touching the underlying infrastructure directly. The platform team sets the guardrails. Developers work within them.</p><p>The CNCF defines an internal developer platform as an integrated collection of capabilities defined around the needs of the platform&#8217;s users, not the vendor&#8217;s convenience. That distinction matters when evaluating options.</p><h3>What PaaS Gives Your Engineering Team</h3><p>PaaS removes the infrastructure setup problem entirely. You connect a GitHub repository, configure a few environment variables, and you have a running application. No VPC setup, no container orchestration, no server provisioning. For a small team focused on shipping product, that is genuinely useful.</p><p>Most PaaS platforms give you:</p><ul><li><p>Automatic deployments triggered by code pushes</p></li><li><p>Managed runtimes for common languages and frameworks</p></li><li><p>Built-in SSL, load balancing, and basic autoscaling</p></li><li><p>Add-on marketplaces for databases, caching, and monitoring</p></li><li><p>Preview environments for pull requests</p></li></ul><p>The developer experience is fast to set up and easy to understand. A new engineer can push their first deployment on day one without knowing anything about the underlying infrastructure.</p><p>That simplicity is the point. PaaS makes sense when your team is small, your stack is straightforward, and infrastructure complexity is not yet a bottleneck. The trade-off is that you are working within the vendor&#8217;s model. 
Their networking, their compute, their scaling behavior, their pricing. As long as your requirements fit inside those constraints, it works.</p><p>Where PaaS genuinely earns its place is in the early stages of a product. You are not hiring a DevOps engineer to set up EKS. You are not writing Terraform to provision a VPC. You are shipping features. The platform handles deployments, restarts failed containers, and scales up when traffic increases. For a team of three to five engineers, that is a reasonable trade.</p><p>The problems appear later. PaaS platforms abstract the infrastructure, but that abstraction has hard edges. You cannot configure custom networking. You cannot control where your data lives at the subnet level. You cannot integrate with internal tooling that the add-on marketplace does not support. Heroku, for example, does not support persistent storage on standard plans, limits applications to a single port, and requires containers to boot in under 60 seconds. These are not edge cases. They are constraints that growing teams hit regularly.</p><p>When your requirements start pushing against those edges, the platform stops being an accelerator and starts being a ceiling.</p><h3>What an Internal Developer Platform Gives Your Engineering Team</h3><p>An internal developer platform gives developers a self-service path to deploy and operate services without depending on a DevOps engineer for every infrastructure request. The infrastructure still exists and runs on your cloud account. The IDP is what makes it accessible to the rest of the team.</p><p>In practice, a production-grade IDP on AWS provisions the following inside your account when a new environment is created:</p><ul><li><p>Dedicated VPC with private and public subnets</p></li><li><p>Managed EKS cluster with EC2 compute nodes</p></li><li><p>Load balancer for inbound traffic</p></li><li><p>Prometheus, Loki, and Grafana for metrics, logs, and dashboards</p></li><li><p>Managed AWS services on demand: RDS, S3, SQS, Elasticache</p></li><li><p>CI/CD pipeline wired to branch pushes</p></li><li><p>SSL certificates, encrypted secrets, and role-based access control</p></li></ul><p>Developers push code. The platform handles everything underneath. No Dockerfile, no Terraform, no Helm required from the developer&#8217;s side.</p><p>The operational difference shows up most clearly during onboarding and <a href="https://docs.localops.co/environment/inside?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp">environment provisioning</a>. On a PaaS, a new engineer is productive quickly but is constrained by what the vendor supports. On an IDP, a new engineer gets the same self-service experience but against infrastructure your team controls. They can spin up a staging environment, deploy a service, and check logs without filing a ticket or waiting on anyone.</p><p>The platform team sets the golden paths once. Every developer on the team benefits from them without needing to understand the infrastructure underneath. When something breaks at the Kubernetes level, the platform handles recovery. Engineers who need deeper access can get it. Engineers who don&#8217;t, never have to think about it.</p><p>That split is what makes IDPs useful at scale. The abstraction is not hiding complexity permanently. It is putting it in the right hands.</p><h3>PaaS vs Internal Developer Platform:</h3><p>Infrastructure Control, Customization, and Team Fit Compared</p><p>Comparing PaaS and an IDP on features alone misses the point. 
The real difference is in what your team can and cannot do when requirements change. Here is how they stack up across the dimensions that actually matter.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/G43Xy/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4379f825-ef22-4f9f-9186-d8aa2278f9c4_1220x1430.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42771f27-b190-41bd-bf0f-11b0dfad165a_1220x1430.png&quot;,&quot;height&quot;:720,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/G43Xy/1/" width="730" height="720" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The networking and compliance rows are where most teams feel the gap first. VPC isolation, custom subnets, and IAM boundaries are not optional for teams handling sensitive data or working toward SOC 2. PaaS platforms either don&#8217;t support these controls or lock them behind enterprise tiers.</p><h2>Why Engineering Teams Outgrow PaaS</h2><p>The shift rarely happens all at once. It starts with one requirement the platform cannot meet, and then another, until the workarounds cost more time than the platform saves.</p><p>The most common breaking points:</p><h4>Networking constraints</h4><p>PaaS platforms abstract networking entirely. That works until you need VPC peering, private subnets, static IPs, or custom security group rules. Heroku&#8217;s standard plans have no private networking at all. Getting it requires upgrading to Private Spaces, which is an enterprise-only feature with a significant price jump. Teams building B2B products that need to connect to customer VPCs hit this wall early.</p><h4>Compliance requirements</h4><p>SOC 2, HIPAA, and similar frameworks require infrastructure isolation, audit trails, and control over where data is processed and stored. Standard PaaS tiers run on shared infrastructure. You cannot tell an auditor exactly where your data lives or demonstrate VPC-level isolation on a shared runtime. Heroku&#8217;s HIPAA-compliant offering requires Shield Private Spaces, available only on enterprise plans.</p><h4>Multi-service complexity</h4><p>PaaS works well for a single application. At 10 or more services, environment management becomes a problem. Keeping dev, staging, and production environments consistent across services, managing shared dependencies, and coordinating deployments across a growing codebase is difficult on a platform that treats each application as an independent unit.</p><h4>Observability gaps</h4><p>Most PaaS platforms give you application logs and basic CPU and memory metrics. That is enough for simple workloads. It is not enough when you need distributed tracing, custom metrics, or correlated logs across services. 
TRM Labs, running on Render, had to build a custom log ingestion pipeline with OTEL Collectors just to get logs into their observability stack because Render&#8217;s log drains support only a single destination.</p><h3>The Real Cost Difference Between PaaS and an IDP</h3><p>A typical small production app on Heroku runs two Standard dynos at $50 per month, a<a href="https://www.heroku.com/pricing"> Postgres Standard-0 plan at $50 per month</a>, Redis Mini at $15 per month, and basic monitoring add-ons at $15 to $30 per month. That puts you at $130 to $160 per month for a minimal setup. Postgres plans escalate quickly in practice &#8212; Standard-2 runs $200 per month, and most production workloads land somewhere in between. Add-ons stack faster than teams expect.</p><p>The same setup on AWS sits closer to $130 to $220 per month depending on usage. Two t3.medium instances run around $60 per month on-demand. RDS for a production database with storage and automated backups is realistically $50 to $150 per month, not $30. An Application Load Balancer starts at $20 per month but increases with traffic.<a href="https://northflank.com/blog/heroku-vs-aws"> AWS can be 30 to 70% cheaper than Heroku depending on scale and usage patterns</a> &#8212; the gap is smaller at low scale and wider as you add services.</p><p>Here is how the numbers compare across a typical mid-size production setup:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/loXs9/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e5e27ee-8f5e-413b-92ca-d7660371a44d_1220x1002.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/255ac3fa-83d9-42e5-9318-f7084e76e540_1220x1002.png&quot;,&quot;height&quot;:500,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/loXs9/1/" width="730" height="500" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The savings are real, but the more honest case for moving to an IDP is not just compute cost. It is what you cannot get on PaaS at any price point: VPC isolation, custom networking, compliance controls, and infrastructure you actually own.</p><p>The other cost CTOs consistently underestimate is what it takes to build an IDP from scratch. Self-hosting Backstage requires 3 to 5 dedicated engineers to manage infrastructure, handle upgrades, and maintain the codebase, according to Roadie&#8217;s survey of the Backstage community. Some organisations staff teams of 12 people just for Backstage.<a href="https://roadie.io/blog/7-best-developer-portals-for-enterprise-engineering-teams"> Total cost of ownership for a self-hosted Backstage installation can exceed $1 million per year</a> in operational overhead at scale.</p><p>That is the real trade-off. PaaS removes upfront infrastructure cost but adds a compute premium that compounds with scale. 
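</p><p>As a rough sanity check on that premium, here is the back-of-envelope arithmetic using the approximate list prices cited above; actual numbers vary with region, usage, and discounts.</p><pre><code># Approximate monthly list prices (USD) from the comparison above.
heroku = {
    "2x Standard dynos": 50,
    "Postgres Standard-0": 50,
    "Redis Mini": 15,
    "monitoring add-ons": 25,
}
aws = {
    "2x t3.medium, on-demand": 60,
    "RDS incl. storage and backups": 100,
    "Application Load Balancer": 25,
}

heroku_total = sum(heroku.values())   # ~140
aws_total = sum(aws.values())         # ~185

print(f"Heroku: ~${heroku_total}/month, AWS: ~${aws_total}/month")
# At this size the two are comparable. The gap opens up as services and
# add-ons multiply on PaaS while raw AWS compute scales roughly linearly.
</code></pre><p>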
Building your own IDP removes the compute premium but adds significant engineering overhead. Managed IDPs sit in between: raw cloud compute pricing, without the internal build cost.</p><p>For teams under 10 engineers with a straightforward stack, PaaS is still the cheaper option when you factor in engineering time. The math shifts once you are scaling services, handling sensitive data, or selling to enterprise customers.</p><h3>When Teams Move from PaaS to an Internal Developer Platform</h3><p>The decision to move off PaaS rarely comes from a single problem. It usually builds up over several months, one workaround here, one missing feature there, until the cost of staying outweighs the cost of switching.</p><p>Two cases illustrate how this plays out in practice.</p><p><strong><a href="https://www.algolia.com/blog/engineering/challenging-migration-heroku-google-kubernetes-engine">Algolia</a></strong> was running on Heroku when an enterprise customer required crawling from a fixed IP address for firewall whitelisting. Standard Heroku does not support static outbound IPs. The alternative was Heroku Private Spaces, an enterprise-only tier that came with a significant price increase for a single networking requirement. The team moved to Kubernetes instead. One infrastructure gap triggered a full platform migration.</p><p><strong><a href="https://www.checklyhq.com/blog/heroku-to-aws-migration/">Checkly</a></strong> had been on Heroku since 2016. By 2022 the platform had become a recurring operational burden. Upgrading PostgreSQL versions required Heroku support involvement. Disk size plans were fixed, requiring a full plan upgrade just for storage. Essential database tasks needed senior engineer involvement every time. Moving to AWS resolved all three issues and reduced the dependency on senior engineers for routine maintenance.</p><p>Both cases point to the same pattern. The trigger is rarely cost alone. It is cost combined with a capability the platform cannot provide: networking control, compliance requirements, enterprise customer demands, or operational flexibility.</p><p>If any of this sounds familiar, you are likely already in the transition phase, not approaching it.</p><p>The common signals that indicate a team is ready to move:</p><ul><li><p>Running multiple services, often 10 or more, with inconsistent environment configurations across dev, staging, and production</p></li><li><p>Hitting networking constraints that require workarounds or enterprise tier upgrades</p></li><li><p>Facing a compliance audit that requires VPC isolation or audit trails</p></li><li><p>Selling to enterprise customers who require dedicated infrastructure or BYOC deployments</p></li><li><p>Spending a significant portion of engineering time, often 20% or more, on deployment tooling and infrastructure maintenance</p></li><li><p>Onboarding new engineers taking longer than expected due to infrastructure complexity</p></li></ul><p>When several of these apply at the same time, teams are usually closer to needing a shift than they realize.</p><p>If your team is currently on Heroku and evaluating the move to AWS,<a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp"> this migration guide</a> covers the exact steps involved.</p><h4>The BYOC Problem</h4><p>Spinning Up Dedicated Infrastructure for Every Enterprise Customer</p><p>For teams selling to enterprise customers, PaaS creates a specific problem. 
Enterprise deals often require dedicated infrastructure inside the customer&#8217;s own cloud account. On a standard PaaS, that is not possible. On a shared runtime, there is no path to VPC isolation per customer, no way to provision inside a customer&#8217;s AWS account, and no mechanism for the customer to maintain control over their own data residency.</p><p>SuprSend, a notification infrastructure company, ran into this directly. Every new enterprise deal required spinning up dedicated infrastructure manually. The engineering team was spending significant time on per-customer setup that had nothing to do with the product. Using an IDP, they provisioned per-customer AWS environments in under 30 minutes using the same git-push workflow their engineers already used. The operational overhead that previously blocked enterprise deals was removed without changing how the engineering team worked.<strong> <a href="https://localops.co/case-study/suprsend-unlocks-enterprise-revenue-byoc?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp">Read the case study here</a></strong></p><h3>Should You Build Your Own IDP or Buy One?</h3><p>Once a team decides to move toward an IDP, the next question is whether to build it internally or use an existing platform. Both are valid paths. The right answer depends on your team size, engineering capacity, and how much time you can spend on platform work versus product work.</p><h4>Building in-house</h4><p>Gives you full control. You can design golden paths around your exact stack, integrate with internal tooling, and own every layer of the platform.</p><p>The most common starting point for teams figuring out how to build an internal developer platform is Backstage, the open source internal developer platform originally built by Spotify and now a CNCF project. Self-hosting the Backstage internal developer platform requires 3 to 5 dedicated engineers to manage infrastructure, handle upgrades, and maintain the codebase, according to Roadie&#8217;s survey of the Backstage community. Getting a usable instance into production typically takes 6 to 12 months. Total cost of ownership can exceed $1 million per year at scale.</p><p>Building from scratch makes sense in specific cases:</p><ul><li><p>Compliance requirements so unusual that no managed platform can meet them</p></li><li><p>Multi-cloud or hybrid infrastructure with complex dependency graphs</p></li><li><p>Regulatory constraints that prevent sending any metadata to a third-party vendor</p></li></ul><p>Outside these cases, a full DIY build adds significant overhead without adding proportional value, even for teams with dedicated platform engineering capacity. The time spent maintaining platform infrastructure is time not spent on the actual developer experience improvements that make an IDP worth having.</p><h4>Buying a managed IDP</h4><p>The best internal developer platforms handle infrastructure provisioning, observability, CI/CD wiring, and environment management out of the box, running entirely inside your own AWS account. Engineers get a self-service path to deploy services without touching Terraform or Helm. The platform team sets guardrails once. 
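</p><p>What a guardrail looks like varies by platform. Purely as an illustration, with hypothetical names and limits rather than any vendor&#8217;s actual API, the policy a platform team defines once might resemble:</p><pre><code>from dataclasses import dataclass

@dataclass
class EnvironmentGuardrails:
    # Hypothetical policy object: the bounds developers can self-serve within.
    allowed_instance_types: tuple = ("t3.medium", "t3.large")
    allowed_regions: tuple = ("us-east-1", "eu-west-1")
    max_replicas_per_service: int = 10

    def violations(self, request):
        """Return reasons a developer environment request should be rejected."""
        problems = []
        if request.get("instance_type") not in self.allowed_instance_types:
            problems.append("instance type is not on the approved list")
        if request.get("region") not in self.allowed_regions:
            problems.append("region is not approved for this project")
        replicas = request.get("replicas", 1)
        if replicas not in range(1, self.max_replicas_per_service + 1):
            problems.append("replica count exceeds the team limit")
        return problems

# The platform evaluates every self-serve request against the policy:
print(EnvironmentGuardrails().violations(
    {"instance_type": "m5.24xlarge", "region": "us-east-1", "replicas": 3}
))
</code></pre><p>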
Every developer benefits from them without needing to understand what is running underneath.</p><p>This approach makes sense when:</p><ul><li><p>Your engineering team&#8217;s time is better spent on product work</p></li><li><p>You do not have a dedicated platform team</p></li><li><p>You need production-ready AWS environments quickly without accumulating infrastructure debt</p></li><li><p>You want infrastructure your team controls, without the cost of building the tooling around it</p></li></ul><h4>The hybrid approach</h4><p>This is what most mature platform teams end up adopting, including those with dedicated platform engineering staff. Use a managed IDP as the foundation for environment provisioning, CI/CD, and observability. Build custom tooling only for the parts that are genuinely specific to your organisation, such as internal approval workflows, custom cost allocation logic, or proprietary security controls.</p><p>This keeps the platform team focused on high-value work rather than maintaining infrastructure that already exists elsewhere. The best internal developer platform for most teams is not the one with the most features. It is the one that removes the most friction without creating new kinds of it.</p><p>Not sure which approach fits your team?<a href="https://cal.com/anand-localops/tour"> Book a quick demo</a> and see how LocalOps handles this in practice.</p><h3>FAQs</h3><p><strong>1. Internal Developer Portal vs Platform: What&#8217;s the Difference?</strong></p><p>An internal developer portal is a UI layer that surfaces information about existing infrastructure, service catalogs, documentation, ownership. Backstage is a portal. An internal developer platform provisions and manages cloud resources. It gives developers a self-service path to deploy services, spin up environments, and access observability without filing a ticket. A portal shows you what exists. A platform automates what happens. Most teams searching for the best internal developer platforms end up landing on Backstage first and discovering this distinction only after months of setup.</p><p><strong>2. Do you need a platform engineering team to run an internal developer platform?</strong></p><p>For a DIY IDP built on an open source internal developer platform like Backstage, yes. You need engineers who can maintain the codebase, manage upgrades, and build integrations. Platform engineering and internal developer platform adoption are closely linked in enterprise organisations for this reason. But managed IDPs change the equation. A small team without a dedicated platform function can run a production-grade IDP on AWS if the platform handles provisioning, observability, and CI/CD out of the box. The platform engineering investment shifts from building infrastructure to configuring a platform that already exists.</p><p><strong>3. Can PaaS handle compliance requirements like SOC 2 or HIPAA?</strong></p><p>Standard PaaS tiers cannot. Most run on shared infrastructure without VPC isolation, which is a baseline requirement for SOC 2 and HIPAA audits. Heroku&#8217;s HIPAA-compliant offering requires Shield Private Spaces, an enterprise-only tier with a significant price jump. An AWS internal developer platform provisions dedicated VPCs, private subnets, encrypted volumes, and role-based access controls inside your own account by default. For teams heading toward a compliance audit, this is usually one of the clearest signals that PaaS is no longer the right fit.</p><p><strong>4. 
At what point does PaaS become more expensive than running on your own cloud?</strong></p><p>It depends on scale, but the gap typically becomes significant around 10 or more services. At that point PaaS compute markups, per-seat charges, and add-on costs add up faster than the raw AWS equivalent. A typical mid-size production setup on Heroku runs $400 to $800 per month. The equivalent setup on AWS via an IDP runs $150 to $350 per month. The internal developer platform architecture on AWS also includes observability, private networking, and compliance controls that would require enterprise PaaS tiers to replicate.</p><p><strong>5. Is an internal developer platform only for large engineering teams?</strong></p><p>No. The common assumption is that IDPs are enterprise tools requiring large platform engineering teams and months of setup. That was true when building one meant assembling Backstage, Argo CD, Terraform, and Prometheus from scratch. Managed IDPs have changed this. A team of 10 engineers on AWS can run a production-grade internal development platform without a dedicated DevOps hire. The shift is less about team size and more about infrastructure complexity. Once you are running multiple services, selling to enterprise customers, or facing compliance requirements, an IDP starts making sense regardless of headcount.</p><h3>Conclusion</h3><p>PaaS is not a bad choice. It is a starting point. For small teams moving fast with a simple stack, it does exactly what it promises. The problems start when your requirements outgrow what the vendor supports, and by the time most teams realize that, they have already spent months working around the limitations.</p><p>An internal developer platform does not replace PaaS simplicity. The best ones preserve it while giving your team infrastructure they actually control. Developers still push code. Environments still spin up automatically. The difference is that the infrastructure runs in your AWS account, compliance controls are built in from the start, and enterprise customers can bring their own cloud without requiring manual setup for every deal.</p><p>The question is not really IDP vs PaaS. It is whether your current platform is still working for you or quietly becoming the ceiling.</p><p>If you are at the point where PaaS is starting to feel like a constraint rather than a convenience, LocalOps is worth looking at. 
It provisions production-grade AWS environments without Terraform, Helm, or Dockerfiles, and runs entirely inside your own AWS account.</p><p><strong><a href="https://cal.com/anand-localops/tour">Book a Demo</a></strong> &#8212; Walk through how environments, deployments, and AWS infrastructure are handled in practice for your setup.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp">Get Started for Free</a></strong> &#8212; Connect an AWS account and stand up an environment to see how it fits into your existing workflow.</p><p><strong><a href="https://docs.localops.co/?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp">Explore the Docs</a></strong> &#8212; A detailed breakdown of how LocalOps works end-to-end, including architecture, environment setup, security defaults, and where engineering decisions still sit.</p><h3>Suggested Articles</h3><ol><li><p><a href="https://blog.localops.co/p/what-is-an-internal-developer-platform-idp">What Is an Internal Developer Platform (IDP)?</a></p></li><li><p><a href="https://blog.localops.co/p/self-hosted-heroku-alternatives-build-vs-buy">Self-Hosted Heroku Alternatives in 2026: Build vs. Buy</a></p></li><li><p><a href="https://blog.localops.co/p/heroku-alternatives">Heroku Alternatives the Engineering Community Actually Recommends in 2026</a></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Internal Developer Platforms vs. Heroku: What Changes, What Improves, and What to Expect]]></title><description><![CDATA[What actually changes when you move from Heroku&#8217;s abstraction to an IDP running in your own AWS account]]></description><link>https://blog.localops.co/p/internal-developer-platforms-vs-heroku</link><guid isPermaLink="false">https://blog.localops.co/p/internal-developer-platforms-vs-heroku</guid><dc:creator><![CDATA[Nidhi Pandey]]></dc:creator><pubDate>Fri, 17 Apr 2026 10:54:33 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!p0sc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!p0sc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!p0sc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png 424w, https://substackcdn.com/image/fetch/$s_!p0sc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png 848w, https://substackcdn.com/image/fetch/$s_!p0sc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png 1272w, 
https://substackcdn.com/image/fetch/$s_!p0sc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!p0sc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png" width="2400" height="1292" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0384d574-e755-4567-b714-b5aeae024486_2400x1292.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1292,&quot;width&quot;:2400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6572774,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/194500476?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcc8ddc67-07be-4e68-86a9-fa8ee10dd21a_2400x1808.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!p0sc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png 424w, https://substackcdn.com/image/fetch/$s_!p0sc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png 848w, https://substackcdn.com/image/fetch/$s_!p0sc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png 1272w, https://substackcdn.com/image/fetch/$s_!p0sc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0384d574-e755-4567-b714-b5aeae024486_2400x1292.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>TL;DR</strong></h2><p><strong>What this covers:</strong> What Internal Developer Platforms (IDPs) are, how they structurally replace Heroku, what changes for developers and platform teams, what improves, and how to evaluate whether an IDP is genuinely production-ready versus only suited to hobbyist workloads.</p><p><strong>Who it is for:</strong> CTOs and VPs of Engineering evaluating Heroku alternatives in 2026, specifically teams that need self-serve infrastructure running on their own AWS account without rebuilding their platform from scratch.</p><p><strong>The conclusion:</strong> Heroku&#8217;s model of abstracting infrastructure entirely worked well at low scale. IDPs replace that model with one that delivers equal developer self-service while giving platform teams the control, compliance, and cost efficiency that enterprise growth demands. The transition is structural, not just a tool swap.</p><blockquote><p><strong>Ready to see what this looks like for your stack?</strong><a href="https://go.localops.co/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Schedule a free migration call &#8594;</a> Our engineers walk through your current Heroku setup and model the AWS equivalent in under 30 minutes.</p></blockquote><h2><strong>What Is an Internal Developer Platform, and How Does It Replace Heroku?</strong></h2><p>Heroku&#8217;s core promise was infrastructure abstraction: developers deploy applications without thinking about servers, networking, or scaling. That promise was genuinely useful. For teams moving past Heroku, the goal is not to abandon that promise; it is to fulfil it on infrastructure the business actually owns and controls.</p><p>An Internal Developer Platform (IDP) is the architectural layer that makes this possible.</p><p>Structurally, an IDP sits between the developer and the underlying cloud infrastructure. Developers interact with a self-serve interface, environment provisioning, deployments, log access, and scaling controls, without needing to understand what is running underneath. Platform teams interact with the same infrastructure through an operational layer that gives them visibility, compliance controls, audit logs, IAM policies, and network topology they can actually reason about.</p><p><strong>What this looks like in practice for teams replacing Heroku:</strong></p><p>On Heroku, a developer pushes code to a branch. Heroku builds it, deploys it, and manages the underlying compute, database, and networking. The developer never touches AWS. The infrastructure team never controls AWS. Nobody controls AWS.</p><p>On an IDP built on your own AWS account, like LocalOps, a developer pushes code to a branch. The IDP builds it, containerises it, and deploys it to your EKS cluster inside your VPC. The developer experience is identical. The difference is that the infrastructure lives in your AWS account, under IAM policies your team controls, inside a network topology your security team can audit.</p><p>The structural replacement is not about rebuilding Heroku internally. 
It is about inserting a platform layer that translates developer intent, &#8220;deploy this service&#8221;, into infrastructure operations on AWS that the business controls.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/ga6pg/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6265adce-855c-4666-a86b-1352fc30adf2_1220x868.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d108ac7e-8b7e-465a-bc72-7b2bd8aef741_1220x988.png&quot;,&quot;height&quot;:492,&quot;title&quot;:&quot;Key structural differences between Heroku and an IDP on AWS:&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/ga6pg/1/" width="730" height="492" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The IDP is what makes this transition work without creating a new bottleneck. Without it, moving off Heroku typically means developers lose self-service entirely, every environment provision, every deployment change, every scaling decision routes through the platform team. That is the bottleneck IDPs are specifically designed to prevent.</p><blockquote><p><strong>Want to see this infrastructure model in action?</strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Get your first AWS environment running in under 30 minutes &#8594;</a> No Terraform. No Helm charts. No credit card required.</p></blockquote><h2><strong>How IDPs Solve the Platform Engineering Bottleneck That Heroku Migrations Create</strong></h2><p>This is the problem that most Heroku migration plans underestimate, and the one that causes migration projects to stall or fail.</p><p>When teams leave Heroku without an IDP layer, the self-serve model does not transfer to AWS by default. AWS is not self-serve for developers. It is a powerful infrastructure platform that requires IAM knowledge, networking concepts, and operational context to use safely. Developers who could deploy independently on Heroku cannot deploy independently on AWS without a platform layer that abstracts those requirements.</p><p>The result is predictable: deployments stop being developer-driven and start routing through whoever owns the Terraform or Helm charts. For product teams with release velocity targets, this is an immediate regression.</p><p><strong>What the bottleneck looks like in practice:</strong></p><p>A developer needs a new staging environment. On Heroku, they fork the app in the UI. Takes three minutes. Post-migration without an IDP, they open a ticket for the platform team, who queue it behind other infrastructure work. Three days later, the environment exists.</p><p>A backend team needs to change the memory allocation for a production service. On Heroku, they adjust the dyno slider. 
Post-migration without an IDP, they update a Terraform variable, submit a PR, wait for review, and hope nothing else depends on that configuration.</p><p>These are not hypothetical regressions. They are the documented patterns of teams that migrate to AWS without a developer platform layer. The platform engineering team becomes a deployment bottleneck, release velocity drops, and the product organisation quickly starts asking whether the migration was a good idea.</p><p><strong>How an IDP prevents the bottleneck:</strong></p><p>An IDP on AWS maintains the contract that made Heroku useful: developers self-serve for the things they own, platform teams control the things the business needs to govern.</p><p>Concretely, this means:</p><ul><li><p><strong>Environment provisioning remains developer-driven.</strong> Developers create environments through the IDP interface without filing tickets. The IDP translates that into VPC, EKS namespace, and IAM resource creation in your AWS account.</p></li><li><p><strong>Deployments remain branch-triggered.</strong> Push to main. The IDP builds, containerises, and deploys. Developers do not write Kubernetes YAML.</p></li><li><p><strong>Scaling adjustments remain self-serve.</strong> Developers adjust replicas or resource limits through the platform interface. Platform teams set the guardrails.</p></li><li><p><strong>Platform teams retain governance without becoming a bottleneck.</strong> They define policies, which instance types are available, what resource limits apply, which secret scopes exist, and the IDP enforces them automatically.</p></li></ul><p>The IDP is the mechanism that makes &#8220;we moved to AWS&#8221; and &#8220;developers are still unblocked&#8221; simultaneously true. Without it, one of those statements is usually false.</p><blockquote><p><strong>See how LocalOps keeps developer self-service intact on AWS.</strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Talk to an engineer about your migration &#8594;</a> We&#8217;ve helped teams move off Heroku without losing a single day of deployment velocity.</p></blockquote><h2><strong>What Native Capabilities Does a Platform Need Before It&#8217;s a Viable Heroku Replacement?</strong></h2><p>This is the evaluation question that separates production-ready IDPs from tools that work well in demos and break under real workloads.</p><p>Heroku at its best provides a complete operational surface: deployment, scaling, database management, secrets, logs, and basic metrics out of the box. A platform that replaces Heroku needs to match that surface on all dimensions before it can be trusted with production workloads. Teams that migrate to platforms with capability gaps discover those gaps at the worst possible times: during incidents, during audits, and during the security reviews of enterprise deals.</p><h3><strong>CI/CD That Developers Don&#8217;t Have to Configure</strong></h3><p>Heroku&#8217;s git-push deploy model is one of its most underrated capabilities. Developers who have never written a CI pipeline can still ship code. A Heroku replacement needs to match this for the teams that currently rely on it.</p><p>What this means in practice: the platform should build Docker images from source, push to a registry, and deploy to the target environment automatically on branch push. 
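</p><p>Underneath that promise, the sequence a platform runs on each push is roughly the following. This is a simplified sketch with placeholder image, registry, and deployment names, not any specific vendor&#8217;s pipeline; the point is that the platform executes these steps so the developer never has to.</p><pre><code>import subprocess

def deploy_on_push(repo_dir, image, deployment):
    """Build, push, and roll out one service image. All names are placeholders."""
    # 1. Build the container image from the repository source.
    subprocess.run(["docker", "build", "-t", image, repo_dir], check=True)
    # 2. Push it to the registry the cluster pulls from.
    subprocess.run(["docker", "push", image], check=True)
    # 3. Point the running Kubernetes deployment at the new image.
    subprocess.run(
        ["kubectl", "set", "image", f"deployment/{deployment}", f"app={image}"],
        check=True,
    )
    # 4. Wait for the rollout so a failed deploy surfaces immediately.
    subprocess.run(
        ["kubectl", "rollout", "status", f"deployment/{deployment}"],
        check=True,
    )

# What a CI job might call on a push to main (placeholder registry and tag):
# deploy_on_push(".", "registry.example.com/web:abc123", "web")
</code></pre><p>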
Developers should not be required to write Dockerfiles, configure GitHub Actions workflows, or manage pipeline YAML as a prerequisite to deploying. Those capabilities should exist for teams that want them, but they should not be required for basic deployments.</p><p>Platforms that require significant CI/CD configuration before the first deployment are not Heroku replacements. They are infrastructure platforms that require a platform engineering layer to be usable, which recreates the bottleneck IDPs are supposed to solve.</p><h3><strong>Observability That Is Included, Not Assembled</strong></h3><p>A critical capability gap in most Heroku alternatives is observability. Heroku&#8217;s native observability is weak: log drains exist, basic metrics are available, but production observability requires assembling Papertrail, New Relic, and additional monitoring tools at significant additional cost.</p><p>The failure mode in most alternatives is that they replace Heroku&#8217;s weak observability with no observability at all. Teams migrate, get to production, and realise they have no log aggregation, no application metrics, and no dashboards. Setting up Prometheus, Loki, and Grafana on Kubernetes correctly is meaningful infrastructure work, not something a product team should absorb mid-migration.</p><p>A production-ready IDP includes integrated observability as part of the platform, not as an optional add-on configuration step. Metrics collection from all services, log aggregation from standard output, and dashboards showing infrastructure and application health should be available from day one without additional configuration.</p><p>LocalOps includes Prometheus, Loki, and Grafana pre-configured in every environment at no additional cost. The observability stack that costs hundreds of dollars per month in Heroku add-ons is part of the infrastructure.</p><blockquote><p><strong>Stop paying $400&#8211;$900/month for Heroku monitoring add-ons.</strong><a href="https://docs.localops.co/?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> See how built-in observability works on LocalOps &#8594;</a> Prometheus, Loki, and Grafana. Included. No configuration required.</p></blockquote><h3><strong>Autoscaling That Matches Real Workload Patterns</strong></h3><p>Heroku&#8217;s scaling model, manual dyno count adjustments with limited automatic response to traffic, is one of the concrete limitations that drives teams to evaluate alternatives.</p><p>A production-ready IDP should provide:</p><ul><li><p><strong>Horizontal Pod Autoscaler (HPA) by default.</strong> Services should scale out when CPU or memory pressure rises and scale back when it drops. No manual intervention. No scheduled scaling windows.</p></li><li><p><strong>Scale to zero for non-production environments.</strong> Staging and development environments should not run at full capacity continuously. Scale-to-zero reduces cost and eliminates the &#8220;why is my staging environment billed like production&#8221; problem.</p></li><li><p><strong>No tier-jump cost model.</strong> Autoscaling on Kubernetes scales proportionally to actual load. Teams are not forced to jump to the next compute tier when they cross an arbitrary threshold.</p></li></ul><h3><strong>Secrets Management That Doesn&#8217;t Create Compliance Exposure</strong></h3><p>Heroku&#8217;s config vars model works at low scale. 
At production scale with enterprise customers, it creates compliance problems: secrets are visible in the Heroku dashboard to anyone with platform access, there is no audit log of who accessed or modified which secrets, and there is no integration with enterprise identity providers for access control.</p><p>A production-ready Heroku alternative needs a secrets management story that satisfies enterprise security questionnaires:</p><ul><li><p>Secrets stored in a managed secrets service (AWS Secrets Manager or equivalent), not in environment variable stores</p></li><li><p>Audit logs of secret access and modification</p></li><li><p>IAM-based access control that scopes secret access per service, not per team</p></li><li><p>No plaintext secrets in deployment configurations or CI/CD logs</p></li></ul><p>Platforms that handle secrets by surfacing them in a dashboard with no audit log are not production-ready for B2B SaaS teams with enterprise customers. This capability gap shows up in security reviews and cost deals.</p><h3><strong>Networking Primitives That Enterprise Customers Require</strong></h3><p>Heroku applications share Heroku&#8217;s network. There is no VPC isolation. There is no private networking between services. There is no data residency control.</p><p>For most B2B SaaS teams, this becomes a deal-blocking problem when enterprise procurement asks about network isolation. A production-ready IDP should provide:</p><ul><li><p>Dedicated VPC per environment</p></li><li><p>Private service-to-service networking (services communicate on private IPs, not through public endpoints)</p></li><li><p>Security group controls that restrict inbound access to application ports</p></li><li><p>Region control for data residency requirements</p></li></ul><blockquote><p><strong>Does your current Heroku setup pass the enterprise security questionnaire?</strong><a href="https://go.localops.co/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Talk to our team about compliance-ready AWS infrastructure &#8594;</a> VPC isolation, IAM controls, and audit logging &#8212; included from day one, not as an enterprise add-on.</p></blockquote><h2><strong>How to Evaluate Whether a Heroku Alternative Is Genuinely Production-Ready</strong></h2><p>The gap between &#8220;works for early-stage projects&#8221; and &#8220;production-ready for a $10M ARR SaaS company&#8221; is significant and not obvious from documentation or demos.</p><p>Most Heroku alternatives look capable in the setup experience. The differentiation surfaces under three conditions: high-concurrency production traffic, compliance review, and late-night incidents.</p><h3><strong>Test 1: Zero-to-Production Without Platform Engineering Involvement</strong></h3><p>Run the self-serve path end to end. Can a developer who has never used your AWS account go from a git repository to running a production service without filing a ticket, writing Kubernetes YAML, or getting help from a platform engineer?</p><p>If the answer is no, the platform is not a Heroku replacement. It is an infrastructure platform that requires a platform layer to be usable. The missing platform layer is the product you will end up building internally, which is the outcome IDPs are supposed to prevent.</p><h3><strong>Test 2: The Incident Simulation at 2 AM</strong></h3><p>Simulate a production incident. A service is returning 500 errors. 
How long does it take to identify the root cause using only the tools the platform provides?</p><p>On a platform with integrated observability, this means: open the metrics dashboard, identify the service with the elevated error rate, switch to the logs view for that service filtered to the incident timeframe, and identify the error pattern. Three to five minutes.</p><p>On a platform that requires assembling Papertrail and New Relic, this means: open Papertrail, filter to the relevant service, open New Relic, correlate the timeline, and cross-reference the timestamps. Context switching between tools adds minutes to every incident. At 2 AM, under pressure, those minutes matter.</p><h3><strong>Test 3: The Enterprise Security Questionnaire</strong></h3><p>Get a copy of a standard enterprise security questionnaire; any vendor security questionnaire from a Fortune 500 procurement team will do. Go through it against the platform you are evaluating.</p><p>Can you honestly answer yes to: dedicated VPC, private service networking, IAM-based access control, audit logs for secrets access, data residency in a specified region, and SOC 2 compliance evidence?</p><p>If the answers are no, the platform will block enterprise deals. The question is not whether this matters but when. Some teams discover this on the first significant enterprise opportunity; others discover it when a large existing customer expands their security review program.</p><h3><strong>Test 4: Cost at Scale, Not Cost at Launch</strong></h3><p>Many platforms price attractively for small workloads and scale badly. The cost model to evaluate is not the starting price; it is the cost for a five-service production stack with staging environments, observability, and managed databases.</p><p>On Heroku, this stack costs $800&#8211;$1,500/month at conservative estimates, excluding APM add-ons. On AWS via LocalOps, the same stack runs at AWS list pricing with no platform margin, typically $200&#8211;$500/month for the infrastructure, plus the LocalOps platform fee.</p><p>Evaluate the model, not the number: does cost scale proportionally with actual usage, or does it jump at tier boundaries that bear no relationship to your actual workload? Platforms that inherit Heroku&#8217;s tier-based cost model are not solving the cost problem; they are replicating it.</p><blockquote><p><strong>Want to model what your Heroku stack costs on LocalOps + AWS?</strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Get a cost comparison for your specific setup &#8594;</a> Our engineers calculate the infrastructure cost and add-on savings side by side &#8212; no guesswork.</p></blockquote><h3><strong>Test 5: What Happens When You Leave</strong></h3><p>Ask every platform vendor the same question: if we stop using your platform, what does our infrastructure look like?</p><p>For platforms that own your infrastructure, where the resources are in the vendor&#8217;s cloud account, the answer is that you need to migrate again. Your team has built operational familiarity with a platform that disappears along with its vendor relationship.</p><p>For platforms like LocalOps that run in your AWS account, the answer is that your infrastructure keeps running. The EKS cluster, the VPC, the databases, the observability stack, all of it continues operating in your account. 
You are not locked to the platform layer; you are locked to AWS, which is a much safer dependency.</p><h2><strong>What Changes When You Move From Heroku to an IDP</strong></h2><p>For developers, the daily experience changes less than teams expect. Push to branch, service deploys, logs are available, metrics are visible. The most common developer response to the transition is that things work the same, and then they notice the observability is better.</p><p>For platform teams, the change is significant and positive. Instead of managing Heroku add-ons and filing support tickets for infrastructure questions, they configure policies and controls in a platform that runs on infrastructure they own. They gain the visibility and governance that enterprise customers require without losing the developer self-service model that keeps product teams unblocked.</p><p>For the business, the change is structural. Infrastructure that once sat in Heroku&#8217;s cloud now runs in the company&#8217;s AWS account. Security questionnaires that were previously impossible to answer honestly become straightforward. Enterprise deals that previously stalled on infrastructure compliance can close. Cost structures that grew faster than revenue on Heroku scale proportionally to actual usage on AWS.</p><blockquote><p><strong>See what the first 90 days after a Heroku migration look like.</strong><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Read the LocalOps migration guide &#8594;</a> Full walkthrough: database migration, environment setup, DNS cutover, and what to expect at each stage.</p></blockquote><h2><strong>How LocalOps Delivers This in Practice</strong></h2><p>LocalOps is an AWS-native Internal Developer Platform built for teams replacing Heroku. It is designed specifically to maintain the developer self-service model while moving infrastructure to a customer-owned AWS account.</p><p>Connect your AWS account. Connect your GitHub repository. LocalOps provisions a dedicated VPC, EKS cluster, load balancers, IAM roles, and a complete observability stack (Prometheus, Loki, and Grafana) automatically. No Terraform. No Helm charts. No manual configuration. First environment ready in under 30 minutes.</p><p>From that point, the developer experience is identical to Heroku. Push to your configured branch. LocalOps builds, containerises, and deploys to AWS automatically. Logs and metrics are available from day one. Autoscaling and auto-healing run by default. Secrets management runs through AWS Secrets Manager with audit logging.</p><p>The infrastructure runs in your AWS account. If you stop using LocalOps, it keeps running.</p><blockquote><p><em>&#8220;Their thoughtfully designed product and tooling entirely eliminated the typical implementation headaches. 
Partnering with LocalOps has been one of our best technical decisions.&#8221;</em> <strong>&#8212; Prashanth YV, Ex-Razorpay, CTO and Co-founder, Zivy</strong></p><p><em>&#8220;Even if we had diverted all our engineering resources to doing this in-house, it would have easily taken 10&#8211;12 man months of effort, all of which LocalOps has saved for us.&#8221;</em> <strong>&#8212; Gaurav Verma, CTO and Co-founder, SuprSend</strong></p><p><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Start for free &#8212; first environment on AWS in under 30 minutes &#8594;</a> No credit card required.</p></blockquote><h2><strong>Frequently Asked Questions</strong></h2><ol><li><p><strong>What is an Internal Developer Platform, and is it the same as a PaaS like Heroku?</strong></p></li></ol><p>An Internal Developer Platform is a layer that sits between developers and cloud infrastructure, providing self-serve capabilities for environment management, deployments, scaling, and observability. Like Heroku, it abstracts infrastructure complexity from developers. Unlike Heroku, it runs on infrastructure the business owns, typically in the company&#8217;s AWS account, and gives platform and security teams the visibility and controls that hosted PaaS platforms like Heroku cannot provide.</p><ol start="2"><li><p><strong>Will developers need to learn Kubernetes or AWS to use an IDP?</strong></p></li></ol><p>No, that is the point. A well-designed IDP abstracts Kubernetes and AWS in the same way Heroku abstracts them. Developers interact with a self-serve interface: deploy, scale, view logs, and manage environment variables. The Kubernetes and AWS operations happen underneath without developer involvement. Teams that require developers to understand Kubernetes to deploy services have not implemented an IDP; they have implemented Kubernetes with a dashboard.</p><ol start="3"><li><p><strong>How does an IDP differ from self-hosted Heroku alternatives like Coolify or Dokku?</strong></p></li></ol><p>Self-hosted alternatives eliminate the platform margin on infrastructure, but they shift the operational burden to the engineering team. The team must provision the host infrastructure, maintain the platform software, manage security patching, and handle on-call response for the platform itself. For teams without dedicated platform engineering capacity, the engineering cost of running a self-hosted alternative consistently exceeds the platform fee of a managed IDP once all hours are included. Managed IDPs like LocalOps provide the same cost efficiency, direct AWS pricing, and no platform margin on infrastructure, without the operational burden of running the platform layer.</p><ol start="4"><li><p><strong>What does the migration from Heroku to an IDP actually involve?</strong></p></li></ol><p>The core migration has three phases. First, environment setup: connecting your AWS account, configuring the IDP, and provisioning the first environment. With LocalOps, this takes under 30 minutes. Second, application migration: containerising applications if they are not already containerised, updating deployment configurations, and migrating managed services (Postgres, Redis). Third, DNS cutover: routing production traffic to the new environment and validating that everything runs as expected. The database migration is typically the most time-intensive step. 
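</p><p>For the Postgres step specifically, the common pattern is a Heroku backup restored into the new RDS instance. A minimal sketch, with a placeholder app name and RDS endpoint; large or busy databases usually need a maintenance window or a replication-based cutover instead.</p><pre><code>import subprocess

APP = "your-heroku-app"   # placeholder Heroku app name
RDS_URL = "postgresql://appuser:change-me@your-rds-endpoint:5432/appdb"  # placeholder

# 1. Capture a fresh backup on Heroku and download it (saved as latest.dump).
subprocess.run(["heroku", "pg:backups:capture", "--app", APP], check=True)
subprocess.run(["heroku", "pg:backups:download", "--app", APP], check=True)

# 2. Restore the dump into the RDS database.
subprocess.run(
    ["pg_restore", "--no-owner", "--no-acl", "--clean",
     "--dbname", RDS_URL, "latest.dump"],
    check=True,
)
</code></pre><p>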
LocalOps provides a full migration guide covering each phase.</p><ol start="5"><li><p><strong>How do Internal Developer Platforms handle the compliance requirements that Heroku fails to satisfy?</strong></p></li></ol><p>IDPs built on AWS inherit the compliance posture of the AWS infrastructure they run on. VPC isolation is structural; every environment runs in a dedicated VPC in the customer&#8217;s AWS account. IAM-based access control governs who can do what at both the platform and infrastructure levels. Secrets management runs through AWS Secrets Manager with full audit logging. Network topology, access logs, and resource configurations are all auditable through standard AWS tooling. For teams pursuing SOC 2, ISO 27001, or responding to enterprise security questionnaires, the compliance posture of AWS-native infrastructure is fundamentally more defensible than Heroku&#8217;s shared-infrastructure model.</p><ol start="6"><li><p><strong>What should CTOs expect in the first 90 days after migrating from Heroku to an IDP?</strong></p></li></ol><p>The first two weeks are operational: environments are provisioned, applications are containerised and deployed, and the team validates that production behaviour matches expectations. The next four to six weeks surface the improvements: observability is better than it was on Heroku, incident response is faster, and developers start noticing that environment provisioning no longer requires tickets. By 90 days, the cost difference from the AWS infrastructure and eliminated add-ons is visible in the invoices, and the platform team is spending meaningfully less time on deployment and environment management. For B2B SaaS teams, the first enterprise security questionnaire that arrives post-migration is the validation moment, the answers are honest, and the deals can move forward.</p><h2><strong>Key Takeaways</strong></h2><p>Heroku&#8217;s model of complete infrastructure abstraction was a genuine innovation. It lowered the barrier to shipping software significantly for a generation of product teams.</p><p>The reason teams move past it is not that the model was wrong; it is that the model hits structural limits as businesses scale. No control over infrastructure. No compliance posture for enterprise customers. Cost that scales with product complexity rather than with revenue. Observability is assembled from disconnected paid add-ons.</p><p>Internal Developer Platforms are the architecture that replaces this model at scale. They maintain the self-serve developer experience that made Heroku valuable while running on infrastructure that the business controls. Platform teams get the governance and visibility that enterprise growth requires. Developers keep the deployment simplicity that keeps product velocity high. 
The business gets infrastructure that can answer an enterprise security questionnaire.</p><p><strong>What changes:</strong> where the infrastructure lives (your AWS account, not Heroku&#8217;s), who controls it (your platform team, with guardrails), and what it costs (AWS list pricing, no platform margin, observability included).</p><p><strong>What stays the same:</strong> the developer experience, the self-serve model, and the ability to ship without becoming an infrastructure expert.</p><p><strong>What improves:</strong> observability, compliance posture, cost predictability, autoscaling behaviour, secrets management, and the ability to close enterprise deals that currently stall on infrastructure questions.</p><h2><strong>Get Started</strong></h2><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get Started for Free &#8594;</a></strong> First production environment on AWS in under 30 minutes. No credit card required.</p><p><strong><a href="https://go.localops.co/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Schedule a Migration Call &#8594;</a></strong> Our engineers model your current Heroku costs against LocalOps + AWS and walk through the migration for your specific stack.</p><p><strong><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read the Heroku Migration Guide &#8594;</a></strong> Full technical walkthrough: database migration, environment setup, DNS cutover.</p><p><em><strong>Related reading:</strong></em></p><ol><li><p><strong><a href="https://blog.localops.co/p/herokus-hidden-infrastructure-limitations">Heroku&#8217;s Hidden Infrastructure Limitations: What CTOs Only Discover at Scale</a> </strong></p></li><li><p><strong><a href="https://blog.localops.co/p/heroku-alternatives">Heroku Alternatives the Engineering Community Actually Recommends in 2026</a></strong></p></li><li><p><strong><a href="https://blog.localops.co/p/the-real-cost-of-heroku-at-scale">The Real Cost of Heroku at Scale: A Teardown for CTOs Evaluating Alternatives</a></strong></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Golden Path Deployments: Ship Faster Without Managing Kubernetes or Terraform]]></title><description><![CDATA[How an internal developer platform removes infrastructure from the developer's critical path]]></description><link>https://blog.localops.co/p/golden-path-deployments-internal-developer-platform</link><guid isPermaLink="false">https://blog.localops.co/p/golden-path-deployments-internal-developer-platform</guid><dc:creator><![CDATA[Madhushree Sivakumar]]></dc:creator><pubDate>Thu, 16 Apr 2026 12:19:51 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!nE0X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!nE0X!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!nE0X!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png 424w, https://substackcdn.com/image/fetch/$s_!nE0X!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png 848w, https://substackcdn.com/image/fetch/$s_!nE0X!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png 1272w, https://substackcdn.com/image/fetch/$s_!nE0X!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!nE0X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png" width="1456" height="1097" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1097,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:&quot;Surreal Desert Landscape.png&quot;,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="Surreal Desert Landscape.png" srcset="https://substackcdn.com/image/fetch/$s_!nE0X!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png 424w, https://substackcdn.com/image/fetch/$s_!nE0X!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png 848w, https://substackcdn.com/image/fetch/$s_!nE0X!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png 1272w, https://substackcdn.com/image/fetch/$s_!nE0X!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe34ce636-b2df-493a-8b42-aae432b702d6_2048x1543.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 
2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Setting up infrastructure shouldn&#8217;t take longer than building the feature.</p><p>For engineering teams without a dedicated DevOps function, it often does. Spinning up a new environment means configuring VPCs, provisioning EKS clusters, wiring up monitoring, and writing CI/CD pipelines from scratch. That&#8217;s before a single line of application code gets deployed.</p><p>The concept of a golden path exists to fix this. A golden path is a standardized, pre-paved route from code to production that removes the guesswork from deployment. An internal developer platform is the mechanism that makes it real.</p><p>This post covers what a golden path deployment workflow looks like, how internal developer platforms implement one, and what the right level of abstraction looks like for engineering teams that want to ship faster without taking on unnecessary infrastructure complexity.</p><h3>TL;DR</h3><ul><li><p>A golden path is a standardized, pre-paved route from code to production that removes infrastructure decisions from the developer&#8217;s critical path</p></li><li><p>An internal developer platform is what actually implements it. It provisions environments, automates CI/CD, and handles cloud complexity by default</p></li><li><p>Building a golden path the traditional way (Backstage + Terraform + ArgoCD) requires a dedicated platform team and 6-18 months of build time</p></li><li><p>For teams without a dedicated DevOps function, an IDP gives you the same outcome without assembling it yourself</p></li><li><p>The right IDP doesn&#8217;t just abstract Kubernetes and Terraform. It makes every environment identical, every deployment consistent, and observability available from day one</p></li><li><p>Golden paths are not mandates. The best ones are defaults developers choose because they work</p></li></ul><h3>What Is a Golden Path Deployment Workflow in Platform Engineering?</h3><p>A golden path is a predefined deployment workflow with sensible defaults already configured. Infrastructure provisioned. CI/CD wired. Observability running. The developer pushes code. The platform handles the rest.</p><p>Spotify built the concept because their autonomous team model created a different problem. Teams moved fast but independently. There was no standard way to ship a service. If you wanted to know how deployments worked, you asked whoever had done it most recently and hoped their answer still applied.</p><p>The fix was a supported, opinionated path from code to production. Not a mandate. A default. The path covered scaffolding, environment setup, CI/CD configuration, and monitoring. Teams could go off it when they needed to. Most didn&#8217;t, because the path was faster.</p><p>Netflix built the same idea and called it the <strong>Paved Road</strong>. 
Different name, same principle: make the right way the easy way.</p><p>What both companies had that most SaaS teams don&#8217;t is a dedicated platform team maintaining those workflows. Golden paths aren&#8217;t a one-time setup. They get versioned, updated, and extended as infrastructure changes. That maintenance work is real and ongoing.</p><div class="pullquote"><p>An internal developer platform is how you operationalize a golden path without doing all of that yourself.</p></div><h3>From Golden Path to IDP: How the Two Connect</h3><p>A golden path describes the preferred developer workflow. In platform engineering, an internal developer platform is the automation, self-service tooling, and integrations that make that workflow the default rather than a document people are expected to remember.</p><h4>How do internally built workflows break?</h4><p>You can define a golden path without an IDP. Document your Terraform module structure. Standardize your CI templates. Put conventions in Confluence and tell every team to follow them. It holds until someone is under deadline pressure and skips a step. Someone else copies that pattern. Six months later you have three different deployment configurations across five services and no clear owner.</p><p>An IDP encodes those decisions into repeatable automation. When a developer connects a GitHub repo and spins up a new environment, the platform provisions a VPC, creates an EKS cluster, configures subnets, deploys Prometheus, Loki, and Grafana, and applies security defaults. The developer doesn&#8217;t make those decisions. The platform already made them.</p><p>That said, IDPs don&#8217;t force uniformity across every service. Mature platforms standardize the baseline and allow explicit variation for different workload types, compliance requirements, or regions. The goal is reducing unmanaged drift, not eliminating all configuration differences.</p><p>Building an internal developer platform this way means maintaining Terraform modules, running ArgoCD, operating the Backstage internal developer platform, and owning every integration point connecting all three.<a href="https://roadie.io/blog/the-true-cost-of-self-hosting-backstage/"> Roadie&#8217;s analysis of self-hosting Backstage</a> puts the minimum team size at 3 dedicated engineers, with salary and overhead alone running around $450,000 annually, and time to something teams would actually use at 6-12 months.</p><p>For a 15-person SaaS team, that&#8217;s a significant operational commitment before a single application gets deployed.</p><h3>How IDPs Abstract Cloud Complexity from Application Developers</h3><p>Abstraction in an IDP isn&#8217;t about hiding how infrastructure works. It&#8217;s about removing decisions that don&#8217;t belong in the application developer&#8217;s critical path.</p><p>On a team without an IDP, a developer who needs a new service has to answer questions that have nothing to do with the service itself. Which VPC does this go in? What node group size? How does the CI pipeline get wired? Where do logs ship? Who sets up the IAM role? These are solved problems at the infrastructure level. They shouldn&#8217;t require a decision every time a new service gets created.</p><p>An IDP moves those decisions upstream. The platform team, or in the case of a bought IDP, the vendor, makes those calls once. EKS cluster configuration, VPC topology, subnet layout, observability stack, security group defaults.
They get encoded into the environment provisioning layer and applied consistently across every environment.</p><p>For example, when you are deploying on AWS, an internal developer platform handles EKS cluster provisioning, VPC configuration, IAM role setup, and observability wiring. Those are decisions the internal developer platform architecture makes once and applies consistently across every environment.</p><p>What the developer interacts with is a service abstraction. Define the service type: web service, background worker, cron job. Point it at a GitHub repo and branch. The platform builds the container image, deploys it to the cluster, wires up the load balancer, and starts routing traffic. By default, no Dockerfile is required. Teams that need custom build behavior can bring their own container image instead.</p><p>This is the right abstraction level for application developers. They retain full visibility into what&#8217;s running. Logs, metrics, and deployment history are accessible. But the infrastructure layer that doesn&#8217;t change between services is handled by the platform, not re-solved by each team independently.</p><p>The escape hatch matters too. When a team needs something outside the standard baseline, a custom RDS configuration or a specific SQS queue setup, the platform exposes VPC and subnet IDs so teams can extend using their own Terraform or Pulumi scripts without disrupting what the platform already manages.</p><h3>What Is the Right Level of Abstraction in an IDP?</h3><p>Too little abstraction and the IDP is just a thin wrapper around Kubernetes that still requires developers to understand node groups, ingress controllers, and Helm chart structure. Too much and developers lose visibility into what&#8217;s actually running, which makes debugging production incidents harder than it needs to be.</p><p>The right level is where infrastructure decisions that are identical across services get made once by the platform, and decisions that are legitimately different per service stay with the developer.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/eYxCQ/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/217c25c2-d486-498f-a2df-7d830a285b70_1220x1270.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0748619a-8009-46e2-853f-7048e551a4c9_1220x1340.png&quot;,&quot;height&quot;:667,&quot;title&quot;:&quot;What the platform owns vs. what the developer owns&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/eYxCQ/1/" width="730" height="667" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The line between these two isn&#8217;t fixed. It shifts based on team size, compliance requirements, and how much infrastructure variation a team legitimately needs. 
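</p><p>To make that boundary concrete, here is a minimal sketch of what a developer-facing service definition could look like. This is an illustration of the pattern, not LocalOps&#8217;s actual configuration format; the field names and the platform_client call at the end are invented for the example.</p><pre><code># Hypothetical sketch: the developer-owned half of a service definition.
# Everything below is a per-service decision; VPC layout, cluster config,
# and the observability stack are assumed to be supplied by the platform.
service_spec = {
    "name": "billing-api",
    "type": "web_service",          # or "background_worker", "cron_job"
    "repo": "github.com/acme/billing-api",
    "branch": "main",
    "resources": {"cpu": "500m", "memory": "512Mi"},
    "replicas": {"min": 2, "max": 10},
    "env": {"LOG_LEVEL": "info"},   # secrets would come from a managed store
}

# A hypothetical client call; the real mechanism might be a UI, CLI, or API.
# platform_client.create_service(service_spec)
</code></pre>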
<p>A fintech team running in a regulated environment will need more explicit control over networking and access policies than a team running a standard web API.</p><p>What matters is that the boundary is explicit. When developers don&#8217;t know where their responsibility ends and the platform begins, you get both: developers making infrastructure decisions they shouldn&#8217;t have to make, and platform teams fielding support tickets for things that should have been self-service.</p><p>A clear shared responsibility model fixes this. Here&#8217;s<a href="https://docs.localops.co/environment/shared-responsibilities"> how LocalOps defines that boundary</a>.</p><h3>How IDPs Balance Developer Autonomy with Infrastructure Standardization</h3><p>This is where golden paths either work or fall apart. Standardize too hard and developers work around the platform. Give too much freedom and you&#8217;re back to every team running their own infrastructure setup with no consistency across environments.</p><p>The way most IDPs resolve this is through opinionated defaults with explicit escape hatches. The platform makes the common case easy and the uncommon case possible, without making the uncommon case the default.</p><p>In practice this means three things.</p><ol><li><p>The standard path covers the majority of workloads. Web services, background workers, cron jobs, and scheduled tasks all deploy through the same pipeline without any custom configuration. A developer shouldn&#8217;t need to touch infrastructure to ship any of these.</p></li><li><p>When a team has a legitimate reason to go off the golden path, the platform provides extension points rather than hard limits. Exposing VPC and subnet IDs so teams can provision their own RDS instances or SQS queues using Terraform is an example of this. The platform manages what it provisions. The team manages what they add. Both sets of resources sit inside the same network boundary.</p></li><li><p>Role-based access keeps the right people in control of the right things. Not every developer needs production deploy permissions. Not every engineer needs access to infrastructure configuration. The IDP enforces those boundaries without requiring a manual access review every time someone joins the team.</p></li></ol><p>The outcome isn&#8217;t uniformity. It&#8217;s consistency at the infrastructure layer and autonomy at the application layer. Teams make decisions about how their service behaves. The platform makes decisions about how infrastructure gets provisioned and secured. Those two concerns stay separate.</p><h3>What Actually Changes for a Developer When an IDP Implements the Golden Path</h3><p>The easiest way to understand what an IDP delivers is to walk through what a developer actually does differently.</p><p>Without a golden path, spinning up a new service looks like this. The developer checks what the last team did, finds a repo with a vaguely similar setup, copies the Terraform, adjusts the variables, hopes the VPC IDs are still valid, writes a Dockerfile, figures out the CI pipeline configuration, sets up IAM roles, and then manually wires up monitoring after the first deployment. That process takes days. Sometimes weeks. And it produces another slightly different configuration that the next person will copy.</p><p>With an IDP implementing the golden path, the same developer creates a service, selects the service type, and points it at a GitHub repo and branch. The platform builds the container image without a Dockerfile.
The first git push deploys the service. Prometheus is already scraping metrics. Loki is collecting logs. Grafana has a dashboard.</p><blockquote><p> The whole thing takes under 30 seconds.</p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Nni4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Nni4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png 424w, https://substackcdn.com/image/fetch/$s_!Nni4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png 848w, https://substackcdn.com/image/fetch/$s_!Nni4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!Nni4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Nni4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png" width="1436" height="1098" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1098,&quot;width&quot;:1436,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Nni4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png 424w, https://substackcdn.com/image/fetch/$s_!Nni4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png 848w, https://substackcdn.com/image/fetch/$s_!Nni4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png 1272w, https://substackcdn.com/image/fetch/$s_!Nni4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3f83e9bd-268a-4372-bb40-d85ede065702_1436x1098.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft 
pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>No Kubernetes manifest written. No Terraform module copied and modified. No DevOps engineer pulled into a provisioning task. No staging environment that diverges from production because someone tweaked it manually.</p><p>The developer&#8217;s job is to write application code and push it. The platform&#8217;s job is everything else on the golden path. That separation is what the title of this post is actually about.</p><p>Every team&#8217;s setup is different. If you want to see how this maps to your specific stack, <a href="https://cal.com/anand-localops/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">LocalOps engineers can walk through it with you</a>.</p><h3>How to Standardize Deployments Across Engineering Teams Without Slowing Them Down</h3><p>The standardization problem at scale isn&#8217;t philosophical. It&#8217;s operational. When five teams deploy five different ways, debugging a production incident means understanding five different pipeline configurations before you can even start looking at the application.</p><p>Environment drift is where this compounds. Team A&#8217;s staging environment has a different EKS node group configuration than production because someone changed it manually six months ago and nobody updated the baseline. Team B&#8217;s worker service has a different IAM role structure because it was set up by an engineer who has since left. None of this is visible until something breaks.</p><p>A golden path enforced by the platform eliminates this class of problem. Every environment is provisioned from the same template: dedicated VPC, private and public subnets with consistent CIDR ranges, EKS cluster with the same node configuration, Prometheus and Loki wired to the same Grafana instance. Differences between environments are explicit parameters, instance size, replica count, backup policy, not untracked manual changes.</p><p>Deployment pipelines follow the same principle. One artifact gets built per commit. That artifact gets promoted through environments. The pipeline configuration doesn&#8217;t live in a per-team CI script that diverges over time. It lives in the platform.</p><p>The compounding effect matters too. 
Every service that follows the golden path is one less snowflake configuration for the next engineer to reverse-engineer. At 5 services that&#8217;s manageable. At 50 it&#8217;s the difference between a team that can debug incidents quickly and one that can&#8217;t.</p><p><a href="https://dora.dev/research/2024/dora-report/">The 2024 DORA State of DevOps report</a> found that the best internal developer platforms reduce cognitive load and create standardized paths to production. Teams using mature platforms report faster onboarding and more consistent deployment practices across services.</p><p>SuprSend estimated that replicating standardized deployment infrastructure for BYOC in-house would have taken 10 to 12 person-months of engineering effort. This figure is from CTO Gaurav Verma, but the cost structure is real regardless of the specific number. <a href="https://localops.co/case-study/suprsend-unlocks-enterprise-revenue-byoc?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">Read the full case study.</a></p><h3>How the Golden Path Handles Day Two Operations</h3><p>Most golden path content stops at first deployment. The service is running. The pipeline works. That&#8217;s where the documentation ends.</p><p>Day two is where things get real. A service starts returning HTTP 500s at 2am. A background worker is consuming more memory than the node can handle. A deployment goes out and latency spikes. Someone needs to roll back but isn&#8217;t sure which commit to target.</p><p>Without a golden path, each of these scenarios requires a different investigation path depending on how that service was deployed and by whom. With a golden path enforced by the platform, the answer is always in the same place.</p><p>Logs are in Loki. Metrics are in Prometheus. Both are accessible from the same Grafana dashboard that every environment gets by default. The on-call engineer doesn&#8217;t need to know which team built the service or how they set up monitoring. The observability stack is identical across every environment because the platform provisioned it the same way every time.</p><p>Rollbacks follow the same principle. Every deployment is triggered by a git commit. Rolling back means redeploying a previous commit from the same pipeline. No custom rollback scripts. No manual kubectl commands. The same path that deployed the service is the path that rolls it back.</p><p>Scaling is handled at the platform layer too. Auto-scaling is configured by default. The developer sets resource limits and replica counts as service parameters. The platform manages the rest.</p><p>Day two operations are where the golden path pays back the most. The initial deployment is a one-time event. Debugging, scaling, and rolling back happen repeatedly.
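</p><p>Because the observability stack is provisioned identically everywhere, the same query works for every service and every environment. As a rough illustration (the Prometheus endpoint URL and the metric and label names below are placeholders, not a specific platform&#8217;s schema), checking a service&#8217;s request rate looks the same no matter which team owns it:</p><pre><code># Illustrative only: query Prometheus's HTTP API for a per-service metric.
# The endpoint URL and the metric/label names are hypothetical placeholders.
import requests

PROM_URL = "http://prometheus.internal:9090/api/v1/query"

def request_rate(service: str, environment: str) -> float:
    query = f'sum(rate(http_requests_total{{service="{service}",env="{environment}"}}[5m]))'
    resp = requests.get(PROM_URL, params={"query": query}, timeout=10)
    resp.raise_for_status()
    results = resp.json()["data"]["result"]
    return float(results[0]["value"][1]) if results else 0.0

# Same call, any service, any environment, because the stack is identical.
# print(request_rate("billing-api", "staging"))
</code></pre><p>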
A platform that makes those operations consistent across every service is worth more than one that just makes the first deploy fast.</p><h3>Golden Path: Build It Yourself vs Let the IDP Handle It</h3><p>Here&#8217;s what implementing a golden path looks like across both approaches:</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/qUOJf/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/43db42e0-1463-414c-875d-b0f822c0292d_1220x1024.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a92baba5-783b-4375-8e2e-8ee402e00f24_1220x1094.png&quot;,&quot;height&quot;:513,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/qUOJf/2/" width="730" height="513" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>Even at 150+ engineers, the answer isn&#8217;t necessarily to build everything. The practical approach is to buy the infrastructure layer, environment provisioning, CI/CD, observability, security defaults, and build on top of it for the parts that are genuinely specific to your organization. Custom compliance workflows, internal service catalogs, bespoke access policies. The IDP handles the golden path baseline. Your platform team owns what sits above it. That split lets a growing engineering org add platform capability incrementally without starting from zero on infrastructure every time the team scales.</p><h3>FAQs</h3><p><strong>1. What is the difference between an internal developer portal vs platform in a golden path setup?</strong></p><p>A portal is a UI. It surfaces a service catalog, documentation, and self-service actions. Backstage is the most common example. A platform is the infrastructure layer underneath: environment provisioning, CI/CD, access control, observability. In a golden path context, the portal is how developers interact with the path. The platform is what runs it. A portal without a platform gives developers a nice interface to request things that still take two weeks to get. The platform is what makes the golden path execute automatically.</p><p><strong>2. How does an open source internal developer platform like Backstage compare to an IDP for golden path delivery?</strong></p><p>Backstage doesn&#8217;t provision infrastructure. It doesn&#8217;t manage EKS clusters, configure VPCs, or wire up Prometheus. It gives you a service catalog and a scaffolding framework. To get a working golden path out of Backstage, you still need Terraform for infrastructure, ArgoCD for deployments, and a monitoring stack. Then you need engineers to maintain all three plus the integration layer connecting them. A purpose-built IDP ships all of that as one product. The golden path is the default behavior, not something you assemble. Build time with Backstage: 6-12 months minimum. With an IDP: under 30 minutes for a production-grade environment.</p><p><strong>3. 
What is the difference between a golden path and a CI/CD pipeline?</strong></p><p>A CI/CD pipeline handles build and deployment automation for a single service. A golden path covers the full journey: environment provisioning, infrastructure configuration, CI/CD setup, observability, and security defaults. The pipeline gets code deployed. The golden path ensures everything around that deployment is consistent and repeatable across every service and team.</p><p><strong>4. Does using an IDP mean you lose control over your infrastructure?</strong></p><p>No. A well-designed IDP makes the boundary explicit. The platform owns what it provisions: the VPC, the cluster, the observability stack, the security defaults. The team owns everything built on top of it. Most IDPs expose extension points, VPC IDs, subnet IDs, environment tags, so teams can provision additional resources using their own Terraform or Pulumi scripts inside the same network boundary. The platform manages its layer. The team manages theirs. What you give up is the need to make the same infrastructure decisions repeatedly. What you keep is full visibility and control over your application layer and any custom infrastructure your team adds.</p><p><strong>5. When does an engineering team need a golden path and an internal developer platform to support it?</strong></p><p>Three signals. Deployments are inconsistent across services and debugging a production incident means reverse-engineering someone else&#8217;s pipeline before you can even look at the application. New engineers take two weeks to ship their first change because there&#8217;s no documented, working path from code to production. Infrastructure setup is gated on one or two people and every new service requires a coordination round before it can deploy. Any one of these means the team is operating without a golden path. An IDP is how you implement one without a 6-12 month platform build first.</p><h3>Conclusion</h3><p>The golden path concept is simple. Make the right way the easy way. The hard part is implementation.</p><p>For large engineering organizations with dedicated platform teams, building a golden path in-house is a reasonable investment. They have the headcount, the runway, and the complexity that justifies it. For everyone else, assembling Backstage, Terraform, ArgoCD, and a monitoring stack and maintaining the integration layer between them is a significant engineering commitment that compounds over time.</p><p>The more useful question for most engineering leaders isn&#8217;t whether to have a golden path. It&#8217;s whether to build the platform that delivers it or use one that already does.</p><p>An IDP doesn&#8217;t replace engineering judgment. It removes the infrastructure decisions that don&#8217;t require it. VPC configuration, EKS provisioning, CI/CD pipeline setup, and observability wiring are solved problems. They shouldn&#8217;t consume engineering cycles on every new service or environment.</p><p>The teams that ship fastest aren&#8217;t the ones with the most DevOps expertise distributed across every developer. They&#8217;re the ones that encoded that expertise into a platform and got their developers back to writing business code.</p><p>If your team is spending more time on infrastructure setup than on the product itself, that&#8217;s the signal. A golden path won&#8217;t write your code. 
But it will stop infrastructure from being the reason it doesn&#8217;t ship.</p><p>If you&#8217;re figuring out how this would fit into your setup, then LocalOps team can help you work through it:</p><p><a href="https://go.localops.co/tour">Book a Demo</a> - Walk through how environments, deployments, and AWS infrastructure are handled in practice for your setup.</p><p><a href="https://console.localops.co/signup">Get started for free</a> - Connect an AWS account and stand up an environment to see how it fits into your existing workflow.</p><p><a href="https://docs.localops.co/">Explore the Docs</a> - A detailed breakdown of how LocalOps works end-to-end, including architecture, environment setup, security defaults, and where engineering decisions still sit.</p><h3>Related Articles</h3><ol><li><p><a href="https://blog.localops.co/p/how-to-scale-saas-engineering-team-without-hiring-more-devops">How Internal Developer Platforms Help a Growing SaaS Engineering Team Scale Without Hiring More</a></p></li><li><p><a href="https://blog.localops.co/p/internal-developer-platform-build-vs-buy-cost-comparison">How Much Does It Cost to Build an Internal Developer Platform In-House vs Buying One?</a></p></li><li><p><a href="https://blog.localops.co/p/how-to-deploy-to-aws-without-a-devops-engineer">How to Deploy to AWS Without a Dedicated DevOps Engineer</a></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Heroku's Hidden Infrastructure Limitations: What CTOs Only Discover at Scale]]></title><description><![CDATA[No VPC. No real compliance. Manual scaling. Here&#8217;s what breaks on Heroku when your company starts growing.]]></description><link>https://blog.localops.co/p/herokus-hidden-infrastructure-limitations</link><guid isPermaLink="false">https://blog.localops.co/p/herokus-hidden-infrastructure-limitations</guid><dc:creator><![CDATA[Nidhi Pandey]]></dc:creator><pubDate>Thu, 16 Apr 2026 05:20:44 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!o7hp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o7hp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o7hp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png 424w, https://substackcdn.com/image/fetch/$s_!o7hp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png 848w, https://substackcdn.com/image/fetch/$s_!o7hp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png 1272w, 
https://substackcdn.com/image/fetch/$s_!o7hp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!o7hp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png" width="2324" height="1869" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1869,&quot;width&quot;:2324,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:8064601,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/194372835?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff003b2a5-aa46-45d6-b1ee-9a8cd184b269_2324x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o7hp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png 424w, https://substackcdn.com/image/fetch/$s_!o7hp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png 848w, https://substackcdn.com/image/fetch/$s_!o7hp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png 1272w, https://substackcdn.com/image/fetch/$s_!o7hp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F39232f0d-0d2e-4ad9-b31c-c295ef0f1ad7_2324x1869.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Heroku&#8217;s infrastructure limitations are not hidden in the sense that Heroku conceals them. They are hidden in the sense that they are invisible until a specific trigger surfaces them, a failed enterprise deal, a compliance audit, a scaling incident, or an architecture decision that gets made differently because of what Heroku cannot support.</p><p>By the time these limitations become visible, they are no longer theoretical. They are active constraints shaping product decisions, blocking revenue, and accumulating technical debt. The CTO who discovers Heroku&#8217;s compliance ceiling during a live enterprise deal is not making a calm architectural decision; they are managing a crisis that adequate lead time would have prevented.</p><p>This guide covers the five infrastructure limitations that Heroku does not surface until scale, what each one costs when it surfaces, and what the AWS-native architecture that replaces Heroku actually looks like.</p><h2><strong>TL;DR</strong></h2><p><strong>What this covers:</strong> Heroku&#8217;s five hidden infrastructure limitations, VPC isolation, SOC 2 and HIPAA compliance risks, access control and audit logging gaps, stateful workload limitations, and dyno scaling failures and the AWS-native alternatives that solve each structurally</p><p><strong>Who it is for:</strong> CTOs and engineering leaders who are on Heroku, approaching Series A or beyond, and want to understand the infrastructure constraints before they surface as crises</p><p><strong>The pattern:</strong> Every limitation on this list is invisible at the small scale and becomes a strategic constraint at the growth stage. The teams that navigate this well discover the constraints before they become urgent.</p><p><strong>Want to see what your infrastructure looks like on AWS without these limitations?</strong><a href="https://go.localops.co/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Speak with the LocalOps team &#8594;</a></p><h2><strong>Limitation 1: No VPC Isolation, No Private Networking, No Infrastructure You Control</strong></h2><p>Heroku&#8217;s networking model is simple by design: your application runs on Heroku&#8217;s shared infrastructure and communicates with the outside world over the public internet. There is no VPC. There is no private subnet configuration. There is no IAM-based access control at the network layer. Your application, your database, and your cache all communicate over public endpoints, secured at the application layer with credentials, but not isolated at the network layer.</p><p>For early-stage applications, this model is acceptable. The operational simplicity it provides is genuine and valuable. The limitations become visible when two things happen: the team starts selling to enterprise customers and its architecture begins to evolve toward inter-service communication.</p><p>Enterprise procurement processes require infrastructure controls that Heroku&#8217;s networking model cannot provide. VPC configuration, private subnets between services, network isolation between environments, and the ability to describe your infrastructure&#8217;s security posture in a vendor security review- none of these are possible on Heroku because the infrastructure is not yours to configure.</p><p>The architecture problem surfaces separately. 
As applications decompose into microservices, the services need to communicate with each other. On Heroku, all inter-service communication traverses the public internet. There is no private DNS. There is no service mesh. Services communicate through public endpoints with application-layer security. For architectures where internal services should never be publicly accessible, Heroku&#8217;s networking model is a fundamental mismatch.</p><p><strong>What AWS-native alternatives provide:</strong></p><p>Moving to AWS via an Internal Developer Platform like LocalOps automatically provisions a dedicated VPC for every environment. Private subnets separate application tiers. Security groups control traffic flow between services at the network layer. Services communicate over private IP addresses, never over the public internet. IAM roles govern access to every AWS resource with least-privilege policies applied automatically.</p><p>From the first deployment, the network architecture around which enterprise security questionnaires are built is in place. VPC configuration, private networking, and network isolation between environments are defaults, not configuration projects.</p><p>Every environment LocalOps provisions follows<a href="https://aws.amazon.com/architecture/well-architected"> AWS Well-Architected standards</a> by default, with private subnets, security group rules, and network ACLs applied automatically without manual configuration.</p><p><a href="https://localops.co/features/secure-by-default?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how LocalOps handles network security by default &#8594;</a></p><h2><strong>Limitation 2: Heroku cannot support HIPAA or SOC 2 Workloads, and Most Teams Discover This During an Enterprise Deal</strong></h2><p>This is the limitation with the highest business cost, and the one that surfaces at the worst possible moment.</p><p>Heroku offers a Heroku Shield product for teams with compliance requirements, but even with Shield, the fundamental constraint remains: your infrastructure runs on Heroku&#8217;s systems. Your compliance posture is bound by what Heroku chooses to certify and support. When an enterprise security questionnaire asks about infrastructure ownership, data residency, VPC configuration, or audit logging, the honest answer on Heroku is that the team does not control those things.</p><p>For B2B SaaS teams selling to healthcare organizations, financial institutions, or any enterprise with a structured security review process, this constraint is not a technical inconvenience. It is a revenue blocker.</p><p><strong>HIPAA compliance on Heroku:</strong> HIPAA requires administrative, physical, and technical safeguards around protected health information. The technical safeguards include access controls, audit logging, data integrity mechanisms, and transmission security. On Heroku, the infrastructure implementing these safeguards is Heroku&#8217;s, not the team&#8217;s. A Business Associate Agreement with Heroku provides some coverage, but the team cannot independently audit, configure, or demonstrate control over the infrastructure handling PHI. Enterprise healthcare customers consistently require infrastructure that the vendor controls, not infrastructure that a third party controls on the vendor&#8217;s behalf.</p><p><strong>SOC 2 compliance on Heroku:</strong> SOC 2 Type II requires demonstrating consistent control over infrastructure over time. 
The controls around logical access, change management, risk assessment, and monitoring all require the ability to configure and audit the underlying infrastructure. Teams on Heroku cannot configure VPC access controls, cannot implement custom IAM policies, and cannot generate infrastructure-level audit logs, because those capabilities belong to Heroku, not the team.</p><p><strong>What AWS-native alternatives eliminate:</strong></p><p>When infrastructure runs in the team&#8217;s own AWS account, the compliance surface is AWS, which holds SOC 2 Type II, HIPAA BAA, GDPR adequacy, PCI DSS, FedRAMP, and dozens of additional certifications. Every environment LocalOps provisions includes private subnets, least-privilege IAM policies, encrypted secrets via AWS Secrets Manager, and audit logging through AWS CloudTrail.</p><p>The compliance architecture is not assembled after migration. It is in place from the first deployment, as a default, not as a configuration project initiated by a compliance audit or an enterprise deal.</p><p>For teams evaluating the best Heroku alternatives for compliance-sensitive workloads, infrastructure ownership in their own AWS account is the only path that satisfies enterprise security requirements without a compliance ceiling defined by a vendor.</p><h2><strong>Limitation 3: Heroku Has No Least-Privilege Access, No Role-Based Permissions, and No Audit Logging</strong></h2><p>Access control and audit logging are the two capabilities that SOC 2, HIPAA, and virtually every enterprise security framework require as baseline infrastructure controls. Heroku provides neither at the infrastructure level.</p><p>On Heroku, access control is application-level. Developers are granted access to Heroku applications, not to the infrastructure running underneath. There is no concept of least-privilege access to specific infrastructure resources. A developer with access to a Heroku application has the same access surface as every other developer with access to that application. Granular, role-based access to specific infrastructure components, the EKS cluster, the RDS database, and specific S3 buckets does not exist.</p><p>Audit logging at the infrastructure layer does not exist on Heroku. There is no equivalent to AWS CloudTrail, no log of who accessed what infrastructure resource, when, from where, and what action was taken. For teams undergoing SOC 2 Type II audits or responding to enterprise security questionnaires asking about infrastructure audit trails, this is a gap that cannot be closed while running on Heroku.</p><p><strong>Implementing least-privilege access on an AWS-native platform:</strong></p><p>AWS Identity and Access Management provides role-based access control at every layer of the infrastructure stack. IAM roles can be scoped to specific resources, specific actions, and specific conditions. A developer role can be granted read access to application logs without granting access to the production database. A CI/CD pipeline role can be granted permission to update a specific EKS deployment without any other AWS access.</p><p>LocalOps provisions IAM roles following least-privilege principles automatically. Every service gets a role scoped to exactly the AWS resources it needs. 
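</p><p>For a sense of what &#8220;scoped to exactly the AWS resources it needs&#8221; means in practice, here is a minimal sketch of a least-privilege policy for a worker that only consumes one SQS queue. The account ID, queue name, and role name are hypothetical, and this is an illustration of the IAM pattern rather than a policy LocalOps generates verbatim.</p><pre><code># Illustrative IAM policy document: the worker can receive and delete
# messages from a single named queue, and nothing else.
worker_queue_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:ReceiveMessage",
                "sqs:DeleteMessage",
                "sqs:GetQueueAttributes",
            ],
            "Resource": "arn:aws:sqs:us-east-1:123456789012:billing-jobs",
        }
    ],
}

# With boto3, the policy could be attached to a service-specific role, e.g.:
# import json, boto3
# iam = boto3.client("iam")
# iam.put_role_policy(
#     RoleName="billing-worker-role",
#     PolicyName="billing-worker-sqs",
#     PolicyDocument=json.dumps(worker_queue_policy),
# )
</code></pre>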
<p>Developers access infrastructure through the LocalOps interface; the AWS account is always accessible, but direct infrastructure access is governed by IAM policies that the team controls.</p><p>AWS CloudTrail logs every API call to every AWS service: who made the call, when, from which IP, with which credentials, and what the response was. For SOC 2 audits, this audit trail is comprehensive, searchable, and exportable. It exists by default in every AWS account, not as a configuration project.</p><p>For teams evaluating Heroku alternatives in 2026 with compliance requirements, the access control and audit logging gap is the most technically specific reason managed PaaS platforms fail enterprise security reviews. AWS-native infrastructure closes this gap structurally.</p><p><a href="https://localops.co/features/secure-by-default?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how LocalOps implements least-privilege access automatically &#8594;</a></p><h2><strong>Limitation 4: Heroku Is Architecturally Unsuitable for Persistent, Stateful Workloads</strong></h2><p>Heroku&#8217;s application model is built around the twelve-factor app methodology, stateless processes, ephemeral filesystems, and external services for all persistence. This model works well for web applications following these patterns. It creates real constraints for workloads that do not.</p><p>Heroku dynos have an ephemeral filesystem. Any data written to the local filesystem within a dyno is lost when the dyno restarts, which can happen for any number of reasons, including Heroku platform events that are outside the team&#8217;s control. For applications that need to write temporary files, process large datasets, or maintain any local state, this is a constraint that requires architectural workarounds.</p><p>The dyno model also creates problems for workloads that need to maintain connections across restarts. Database connection pools, WebSocket connections, and long-running background jobs all behave differently when the underlying process can be restarted at any time by a platform the team does not control. Teams running these workloads on Heroku accumulate workarounds (connection retry logic, session state externalization, job queue durability mechanisms) that add complexity specifically to compensate for Heroku&#8217;s architectural model.</p><p>Persistent, stateful workloads (databases that need to run close to the application, stateful processing pipelines, machine learning inference services with large model files, legacy applications with filesystem dependencies) are all architecturally difficult on Heroku&#8217;s ephemeral, shared infrastructure model.</p><p><strong>How AWS-native alternatives solve stateful workloads:</strong></p><p>Kubernetes-based platforms running on AWS handle stateful workloads through persistent volumes, storage that survives pod restarts and is available to specific workloads. StatefulSets provide stable network identities and persistent storage for workloads that require them. Amazon EFS provides shared filesystem access across multiple pods when applications need it.</p><p>More significantly, the AWS service ecosystem provides purpose-built managed services for every category of stateful workload. Amazon RDS runs inside the team&#8217;s VPC, private, configurable, and not shared with any other tenant. ElastiCache provides Redis with VPC isolation and persistence configuration.
Amazon SQS provides reliable message delivery for background job queues with dead-letter queue handling and retry logic.</p><p>LocalOps supports web services, background workers, cron jobs, internal services, and stateful workloads as first-class service types. Each is configured and scaled independently based on its own workload signals, not forced into Heroku&#8217;s dyno model that treats all workloads the same.</p><p>For teams evaluating Heroku open source alternatives or AWS Heroku alternative platforms for stateful workload support, the Kubernetes persistent volume model, combined with AWS managed services, is the correct architectural foundation. It is what modern production SaaS applications require and what Heroku&#8217;s design explicitly does not support.</p><h2><strong>Limitation 5: Heroku&#8217;s Manual Dyno Scaling Fails Under Real Traffic Patterns</strong></h2><p>Heroku&#8217;s scaling model is vertical and manual. When an application needs more capacity, the options are: upgrade to a larger dyno tier or add more dynos. Both decisions require human intervention. Both result in paying for the selected capacity level continuously, whether or not the traffic justifies it.</p><p>This model has three failure modes at scale that surface consistently across engineering teams.</p><p><strong>The over-provisioning trap.</strong> Teams running workloads with variable traffic, B2B applications that peak during business hours, consumer applications that spike around campaigns, and event-driven systems with bursty processing requirements must provision for peak capacity and pay for it at all times. There is no mechanism to automatically scale down when traffic drops. Teams pay for peak capacity during off-peak periods continuously. The cost compounds with the service count.</p><p><strong>The tier-jump problem.</strong> Heroku&#8217;s pricing scales in discrete tiers, not proportionally with usage. When resource requirements cross a tier boundary, the cost jumps to the next tier regardless of whether actual usage justifies the full tier ceiling. For finance teams preparing infrastructure forecasts, this makes cost modeling unreliable. Infrastructure spend jumps at irregular intervals unrelated to business metrics.</p><p><strong>The response latency problem.</strong> When a traffic spike arrives, and the team has not pre-provisioned adequate capacity, Heroku&#8217;s response time for manual scaling is measured in minutes of human decision-making plus minutes of dyno startup time. For high-concurrency APIs serving real-time workloads, this latency is visible to customers as performance degradation during exactly the moments when reliable performance matters most.</p><p><strong>How event-driven horizontal autoscaling solves this:</strong></p><p>Kubernetes horizontal pod autoscaling responds to real workload signals, CPU utilization, memory pressure, request queue depth, and custom application metrics, automatically and in seconds. When traffic increases, the platform scales out. When traffic drops, it scales back in. Teams pay for actual compute consumption proportional to real usage, not for the tier ceiling required to handle the peak.</p><p>The scaling configuration on LocalOps is set once based on the application&#8217;s resource requirements and traffic patterns. 
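</p><p>For reference, the kind of policy being described maps to a Kubernetes HorizontalPodAutoscaler. The sketch below expresses one as a plain Python dict so the shape is visible; the deployment name and thresholds are made up for the example, and this is not LocalOps&#8217;s internal configuration.</p><pre><code># Illustrative autoscaling/v2 HorizontalPodAutoscaler, expressed as a dict.
# It scales a web deployment between 2 and 10 replicas based on average CPU.
hpa_manifest = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "billing-api"},
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "billing-api",
        },
        "minReplicas": 2,
        "maxReplicas": 10,
        "metrics": [
            {
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": 70},
                },
            }
        ],
    },
}
</code></pre>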
<p>From that point, scaling decisions are made by the platform in response to real signals, without human intervention, without manual dyno configuration, and without the over-provisioning that Heroku&#8217;s model structurally requires.</p><p>For teams evaluating alternatives to Heroku, specifically because of scaling problems, the difference between manual vertical scaling and event-driven horizontal autoscaling is not marginal. It is the difference between an infrastructure model designed for predictable linear workloads and one designed for the variable, bursty traffic patterns that real SaaS applications experience.</p><p><a href="https://localops.co/features/auto-scaling?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how autoscaling works on LocalOps &#8594;</a></p><h2><strong>The Pattern Across All Five Limitations</strong></h2><p>These five limitations share a common structure. Each one is invisible at a small scale, where Heroku&#8217;s simplicity provides genuine value, and the constraints are either absent or manageable. Each one becomes a strategic constraint at the growth stage, Series A and beyond, when enterprise deals arrive, when architecture needs to evolve, when compliance frameworks become sales requirements, and when infrastructure cost compounds past the point of easy justification.</p><p>The teams that navigate this transition well are the ones that discover these limitations before they surface as crises. The CTO who identifies Heroku&#8217;s compliance ceiling six months before the first enterprise deal closes has time to plan a migration under calm conditions. The CTO who discovers it during a live security review does not.</p><h2><strong>How LocalOps + AWS Addresses All Five Limitations</strong></h2><p>LocalOps is an AWS-native Internal Developer Platform built specifically for teams replacing Heroku.</p><p>Connect your AWS account. Connect your GitHub repository. LocalOps provisions a dedicated VPC, EKS cluster, load balancers, IAM roles with least-privilege policies, encrypted secrets via AWS Secrets Manager, CloudTrail audit logging, and a complete Prometheus + Loki + Grafana observability stack, automatically. No Terraform. No Helm charts. No manual configuration. Production-ready in under 30 minutes.</p><p>From this point onwards, the developer experience is identical to Heroku. Push to your configured branch. LocalOps builds, containerizes, deploys, runs health checks, and handles rollback automatically. Preview environments spin up on every pull request. Horizontal autoscaling runs by default based on real traffic signals. Stateful workloads run as first-class service types with persistent storage.</p><p>The infrastructure runs in your AWS account. VPC isolation, private networking, IAM-based access control, audit logging, compliance-ready defaults, all present from the first deployment. If you stop using LocalOps, the infrastructure keeps running. Nothing needs to be rebuilt.</p><blockquote><p><em>&#8220;Their thoughtfully designed product and tooling entirely eliminated the typical implementation headaches.
Partnering with LocalOps has been one of our best technical decisions.&#8221;</em> <strong>&#8211; Prashanth YV, Ex-Razorpay, CTO and Co-founder, Zivy</strong></p><p><em>&#8220;Even if we had diverted all our engineering resources to doing this in-house, it would have easily taken 10&#8211;12 man months of effort, all of which LocalOps has saved for us.&#8221;</em> <strong>&#8211; Gaurav Verma, CTO and Co-founder, SuprSend</strong></p></blockquote><p><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get started for free, first environment on AWS in under 30 minutes &#8594;</a></p><h2><strong>Frequently Asked Questions</strong></h2><ol><li><p><strong>How do teams get VPC isolation and private networking when moving off Heroku?</strong></p></li></ol><p>Moving to an AWS-native Internal Developer Platform provisions a dedicated VPC for every environment automatically. Private subnets separate application, database, and cache tiers. Security groups control traffic between services at the network layer. Services communicate over private IP addresses, never over the public internet. LocalOps provisions this VPC architecture automatically from the first deployment, following AWS Well-Architected standards. There is no manual VPC configuration required and no separate security project to complete before going to production.</p><ol start="2"><li><p><strong>What are the specific compliance risks of running HIPAA or SOC 2 workloads on Heroku?</strong></p></li></ol><p>HIPAA requires demonstrable control over infrastructure handling protected health information, access controls, audit logging, and transmission security, all implemented and auditable by the team. SOC 2 Type II requires consistent control over infrastructure over time. On Heroku, the infrastructure implementing these controls belongs to Heroku; the team cannot independently configure, audit, or demonstrate control over it. Moving to AWS via LocalOps puts the compliance surface in the team&#8217;s own AWS account, which holds HIPAA BAA, SOC 2 Type II, GDPR adequacy, and PCI DSS certifications. The compliance architecture is in place from the first deployment as a default.</p><ol start="3"><li><p><strong>How do teams implement least-privilege access and audit logging after leaving Heroku?</strong></p></li></ol><p>AWS IAM provides role-based access control at every layer of the infrastructure stack. LocalOps provisions IAM roles following least-privilege principles automatically; every service gets a role scoped to exactly the AWS resources it needs. AWS CloudTrail logs every API call to every AWS service automatically, providing a comprehensive audit trail for SOC 2 audits and enterprise security reviews. Both are present by default in every environment LocalOps provisions, not as separate configuration projects.</p><ol start="4"><li><p><strong>Why is Heroku unsuitable for stateful workloads?</strong></p></li></ol><p>Heroku&#8217;s dyno filesystem is ephemeral; any data written locally is lost when the dyno restarts. There is no persistent storage that survives dyno restarts. Database connection pools, WebSocket connections, and long-running stateful processes all behave unpredictably when the underlying process can restart at any time without the team&#8217;s control. Kubernetes running on AWS solves this through persistent volumes that survive pod restarts, StatefulSets that provide stable network identities, and Amazon EFS for shared filesystem access. 
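</p><p>As a rough illustration of the persistent volume model (hypothetical names, storage class, and size, shown with the official Kubernetes Python client), a claim like the following gives a workload disk space that survives pod restarts:</p><pre><code>from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

# A persistent volume claim backed by an EBS storage class, so data written to
# the volume survives pod restarts and rescheduling.
pvc = client.V1PersistentVolumeClaim(
    metadata=client.V1ObjectMeta(name="app-data"),
    spec=client.V1PersistentVolumeClaimSpec(
        access_modes=["ReadWriteOnce"],
        storage_class_name="gp3",
        resources=client.V1ResourceRequirements(requests={"storage": "20Gi"}),
    ),
)
core.create_namespaced_persistent_volume_claim(namespace="production", body=pvc)
</code></pre><p>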
LocalOps supports stateful workloads as a first-class service type.</p><ol start="5"><li><p><strong>How does event-driven autoscaling differ from Heroku&#8217;s dyno scaling model?</strong></p></li></ol><p>Heroku scaling is manual; humans decide when to add dynos or upgrade tiers, and teams pay for whatever capacity level is configured continuously. Kubernetes horizontal pod autoscaling responds to real workload signals (CPU, memory, and request queue depth) automatically and in seconds. When traffic increases, the platform scales out. When it drops, it scales back in. Teams pay for actual consumption rather than the tier ceiling required to handle peak load. For applications with variable traffic, the cost and reliability difference is significant.</p><ol start="6"><li><p><strong>Is LocalOps the best Heroku alternative for teams with compliance requirements?</strong></p></li></ol><p>For teams with SOC 2, HIPAA, GDPR, or enterprise compliance requirements, the defining characteristic of any Heroku alternative is whether infrastructure runs in the team&#8217;s own cloud account. Managed PaaS alternatives like Render and Railway run on the vendor&#8217;s shared cloud; the compliance ceiling is vendor-defined, the same structural problem as Heroku. LocalOps provisions infrastructure into the team&#8217;s own AWS account with compliance-ready defaults: private subnets, least-privilege IAM, encrypted secrets, and CloudTrail audit logging. The compliance surface is AWS, which holds the relevant certifications, not a vendor&#8217;s representations about what they support.</p><ol start="7"><li><p><strong>What does Rails hosting on a Heroku alternative look like for compliance-sensitive Rails applications?</strong></p></li></ol><p>Rails applications have specific infrastructure requirements: Sidekiq workers, Postgres with connection pooling, Action Cable with Redis, Active Storage with object storage, and scheduled tasks. LocalOps handles all of these as first-class service types running inside a dedicated VPC. Web processes and Sidekiq workers scale independently. RDS provides Postgres with VPC isolation. ElastiCache provides Redis for Action Cable and job queuing. All services communicate over private networking. For Rails teams with compliance requirements, this is the Rails hosting Heroku alternative that satisfies enterprise security reviews and infrastructure ownership, with the developer experience intact.</p><h2><strong>Key Takeaways</strong></h2><p>The infrastructure limitations Heroku hides from small teams are the same limitations that become strategic constraints at the growth stage. VPC isolation and private networking become enterprise deal requirements. HIPAA and SOC 2 compliance become sales blockers. Least-privilege access and audit logging become audit requirements. Stateful workload support becomes an architectural necessity. Event-driven autoscaling becomes a cost and reliability requirement.</p><p>None of these surfaces as a crisis at a small scale.
All of them surface before Series B for any B2B SaaS team with enterprise ambitions.</p><p>The best Heroku alternatives in 2026 are those that solve all five limitations simultaneously, not by adding compliance features on top of a managed PaaS, but by running infrastructure in the team&#8217;s own AWS account with compliance-ready defaults from the first deployment.</p><p>For engineering leaders evaluating alternatives to Heroku, the frame that produces the best decisions is the three-year question: what infrastructure limitations will constrain the business at the next stage? The answer to that question consistently points toward infrastructure ownership on AWS, with a platform layer that preserves the developer experience Heroku provided without the constraints it imposed.</p><p><strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Schedule a Migration Call &#8594;</a></strong> Our engineers review your current Heroku setup and walk through the AWS migration for your specific stack.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get Started for Free &#8594;</a></strong> First production environment on AWS in under 30 minutes. No credit card required.</p><p><strong><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read the Heroku Migration Guide &#8594;</a></strong> Full technical walkthrough, database migration, environment setup, DNS cutover.</p>]]></content:encoded></item><item><title><![CDATA[Developer Self-Serve on AWS: How to Replace Heroku Without Creating an Ops Bottleneck]]></title><description><![CDATA[The missing layer in Heroku &#8594; AWS migrations: how to keep developers shipping without creating an ops dependency.]]></description><link>https://blog.localops.co/p/replace-heroku-without-creating-ops-bottleneck</link><guid isPermaLink="false">https://blog.localops.co/p/replace-heroku-without-creating-ops-bottleneck</guid><dc:creator><![CDATA[Nidhi Pandey]]></dc:creator><pubDate>Wed, 15 Apr 2026 04:59:02 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!hTUY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hTUY!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hTUY!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png 424w, https://substackcdn.com/image/fetch/$s_!hTUY!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png 848w, 
https://substackcdn.com/image/fetch/$s_!hTUY!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png 1272w, https://substackcdn.com/image/fetch/$s_!hTUY!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hTUY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png" width="2400" height="1531" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1531,&quot;width&quot;:2400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5086064,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/194261518?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F52d97d58-85f9-4618-b6d4-1b36c237dfe1_2400x1808.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hTUY!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png 424w, https://substackcdn.com/image/fetch/$s_!hTUY!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png 848w, https://substackcdn.com/image/fetch/$s_!hTUY!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png 1272w, https://substackcdn.com/image/fetch/$s_!hTUY!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8186ed8e-c8e0-40cc-8154-27a6855f8956_2400x1531.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The most common way a Heroku to AWS migration fails is not a database problem or a DNS problem. It is an organizational one.</p><p>The infrastructure moves to AWS successfully. The technical configuration is correct. The compliance architecture is sound. And then developers who used to deploy themselves every 20 minutes on Heroku are filing tickets with the platform team and waiting 48 hours. Shipping velocity drops. Engineers are frustrated. The migration gets blamed, even though the infrastructure is fine.</p><p>This failure has a name in the engineering community: trading a PaaS dependency for a platform team dependency. The infrastructure problem is solved. The developer autonomy problem is recreated in a different form, with a different bottleneck, and the same cost.</p><p>Every team evaluating AWS as a Heroku alternative needs to answer one question before committing to an approach: <em>Will any developer on the team be able to deploy their service, access their logs, and check their application health on day one, without asking anyone for help?</em></p><p>If the answer is no, the migration has not succeeded, regardless of what the infrastructure looks like underneath.</p><h2><strong>TL;DR</strong></h2><p><strong>What this covers:</strong> How to preserve git-push deployments on AWS, what production-grade CI/CD looks like on a Heroku alternative, how to replicate Heroku Review Apps on Kubernetes, what genuine developer self-service requires, and whether small teams can run production SaaS on AWS without a dedicated platform function</p><p><strong>The core principle:</strong> Developer autonomy is not a feature to add after the migration. It is a requirement that the migration must preserve from day one.</p><p><strong>The answer:</strong> An AWS-native Internal Developer Platform that handles infrastructure complexity invisibly, so developers keep the workflows they already have, and the business gets the infrastructure it owns</p><p><strong>Want to see what developer self-serve looks like on LocalOps?</strong><a href="https://go.localops.co/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Schedule a walkthrough &#8594;</a></p><h2><strong>Why Teams Lose Developer Experience When They Move to AWS</strong></h2><p>Heroku&#8217;s developer experience was not an accident. It was a deliberate product decision: make deployment so simple that any developer on the team can do it without infrastructure knowledge. The result was a platform that product engineers loved precisely because it got out of the way.</p><p>When teams move to raw AWS, they get everything Heroku could not provide: VPC isolation, horizontal autoscaling, compliance-ready infrastructure, and direct pricing. What they do not get automatically is the abstraction layer that made Heroku&#8217;s developer experience possible.</p><p>Deploying to EKS requires configuring the cluster, the VPC, the load balancers, the IAM roles, the security groups, and the CI/CD pipeline. Writing Kubernetes manifests. Managing Helm charts. Configuring health checks and rollback logic. For a platform engineer, this is reasonable work. 
For a product engineer building features, it is an unreasonable prerequisite to deploy code.</p><p>The gap between &#8220;AWS infrastructure is provisioned&#8221; and &#8220;any developer can deploy independently&#8221; is typically a three to six-month platform engineering project, before accounting for preview environments, self-serve environment management, or integrated observability. Most teams do not plan for this. Most migrations stall here.</p><p>The solution is not to simplify AWS. AWS is appropriately complex for what it does. The solution is an Internal Developer Platform, a layer that sits on top of AWS infrastructure and handles every infrastructure operation invisibly, so the developer-facing workflow stays identical to what the team had on Heroku.</p><h2><strong>Preserving Git-Push Deployments on AWS</strong></h2><p>The git-push deployment workflow is the single most important thing to preserve in a Heroku migration. It is not just a convenience; it is the mechanism that enables developer autonomy. When any developer can push code and see it deployed without infrastructure knowledge, the platform team stops being a bottleneck.</p><p>Preserving this on AWS requires an abstraction layer that translates a git push event into the Kubernetes operations required to deploy the new version, automatically, without the developer ever touching Kubernetes directly.</p><p>With LocalOps, the workflow is identical to Heroku. A developer pushes code to a configured branch. LocalOps detects the push, builds a container image automatically, pushes it to Amazon ECR, updates the Kubernetes deployment on EKS, runs health checks against the new version, and handles rollback automatically if the health checks fail. Within minutes, the new version is live. The developer sees deployment status in the LocalOps interface. No kubectl. No Helm. No Terraform. No platform team notification required.</p><p>Heroku buildpack replacement happens transparently. If the team has a Dockerfile, LocalOps uses it directly. If not, LocalOps detects the language and framework automatically and generates a container configuration. Rails, Node.js, Python, Go, and .NET are all supported out of the box. The build trigger is a git push, identical to what the team did on Heroku.</p><p>What the platform must provide to make this genuinely equivalent to Heroku: pre-configured CI/CD that triggers on every push without external pipeline configuration, deployment status visibility without AWS console or kubectl access, and rollback capability that any developer can trigger without platform team involvement. Without all three, the git-push experience is incomplete even if the underlying deployment mechanism works correctly.</p><p><a href="https://docs.localops.co/environment/services/deploy?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how LocalOps handles continuous deployments &#8594;</a></p><h2><strong>What Production-Grade CI/CD Looks Like on a Heroku Alternative</strong></h2><p>Heroku&#8217;s CI/CD model is simple by design: push to a branch, Heroku builds the application using buildpacks and deploys it. There is no pipeline to configure. No YAML to write. No external service to connect. 
The entire build-deploy-verify cycle is handled by the platform automatically.</p><p>This simplicity does not scale with modern Git-based development practices in two specific ways.</p><p>First, Heroku&#8217;s pipeline model does not support preview environments natively without the Heroku CI add-on, and even with it, the implementation is limited compared to what Kubernetes-native platforms can provide. As teams grow and code review becomes more rigorous, the inability to spin up a full environment per pull request slows down QA and reduces deployment confidence.</p><p>Second, Heroku&#8217;s build system is opinionated about buildpacks and has limited support for multi-stage Docker builds, custom build tooling, and complex dependency graphs. Teams that outgrow Heroku&#8217;s buildpack ecosystem find themselves working around the platform rather than with it.</p><p>A production-grade CI/CD pipeline on a Heroku alternative has four characteristics that Heroku&#8217;s model lacks.</p><p>It builds from containers, not buildpacks. Container images are portable, reproducible, and not tied to any platform&#8217;s runtime assumptions. The same image that passes CI is the exact image that runs in production: no translation, no divergence.</p><p>It triggers on every push and every pull request automatically. No manual pipeline configuration. No YAML files to maintain. The platform detects the push, builds the image, and either deploys to a configured environment or spins up a preview environment for the pull request.</p><p>It includes health checks and automatic rollback as defaults. A deployment that fails health checks rolls back to the previous version automatically without human intervention. This is the behavior developers relied on with Heroku and expect to retain.</p><p>It provides deployment visibility without infrastructure access. Developers see build status, deployment progress, health check results, and recent deployment history in one interface, without navigating the AWS console or running kubectl commands.</p><p>LocalOps provides all four as the default configuration. CI/CD is wired in from the first deployment. There is no external pipeline to configure and no YAML to maintain.</p><h2><strong>Replicating Heroku Review Apps on AWS</strong></h2><p>Heroku Review Apps (ephemeral, per-pull-request environments with a live URL) are one of the most operationally valuable features teams lose when they move away from Heroku. Their absence slows QA, makes code review less confident, and reduces shipping velocity in ways that are hard to attribute directly but consistently felt by engineering teams.</p><p>Replicating this on AWS requires spinning up a complete, isolated environment automatically when a pull request is opened, with its own URL, its own database, and its own configuration, and tearing it down automatically when the PR is closed so that its resources are released. This is technically possible on Kubernetes, but configuring it from scratch is a meaningful platform engineering project that most teams underestimate.</p><p>LocalOps handles this automatically. Every pull request triggers a complete, isolated preview environment with its own URL running the full application stack. No additional configuration. No platform team involvement. No approval workflow.</p><p>Each preview environment gets its own isolated namespace in the EKS cluster. Environment variables and secrets are inherited from the base configuration. The environment URL is posted automatically to the pull request as a comment.
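</p><p>Under the hood, the pattern is a namespace per pull request. A rough sketch of that lifecycle with the official Kubernetes Python client (the handler shape and names are hypothetical, and this is the general pattern rather than LocalOps internals):</p><pre><code>from kubernetes import client, config

config.load_kube_config()
core = client.CoreV1Api()

def on_pull_request(action: str, pr_number: int) -> None:
    """Create an isolated namespace when a PR opens; remove it when the PR closes."""
    name = f"preview-pr-{pr_number}"
    if action == "opened":
        ns = client.V1Namespace(metadata=client.V1ObjectMeta(name=name))
        core.create_namespace(body=ns)
        # ...deploy the application stack into this namespace, then post its URL
        # back to the pull request as a comment...
    elif action == "closed":
        # Deleting the namespace releases every resource that was created inside it.
        core.delete_namespace(name=name)
</code></pre><p>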
When the PR is closed, the environment tears down, and all AWS resources are released. Preview environments on LocalOps do not share a database with production or staging; each is fully isolated, with a dedicated test database or a seeded copy of production data. A broken preview environment has zero blast radius on other environments.</p><p>For CTOs evaluating Heroku alternatives, preview environments are one of the clearest signals of a platform&#8217;s production maturity. A platform that requires manual configuration or third-party tooling to provide per-PR environments has not matched what Heroku provided. A platform that provides them automatically as a default is meaningfully ahead.</p><p><a href="https://docs.localops.co/use-cases/ephemeral?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how preview environments work on LocalOps &#8594;</a></p><h2><strong>What Genuine Developer Self-Service Actually Requires</strong></h2><p>Developer self-service is not just about deployment. It is the full scope of infrastructure interactions a developer needs throughout the development cycle, without filing a ticket, without waiting for approval, without infrastructure knowledge.</p><p>On Heroku, this was implicit in the platform design. Every capability a developer needed, deployment, environment creation, log access, metrics viewing, and secret management, was available through the Heroku CLI or dashboard with no infrastructure knowledge required. The platform team did not need to be involved for routine developer operations.</p><p>On a raw AWS migration without a platform layer, all of this requires explicit design. Without it, the platform team becomes a bottleneck for every infrastructure interaction, not just deployments. Environment creation requires Terraform or manual AWS console work. Log access requires CloudWatch navigation or Kibana queries. Metrics require Prometheus query knowledge. Secret updates require AWS Secrets Manager access that may not be appropriate to grant broadly.</p><p>Genuine self-service on a Heroku alternative requires three things to be true simultaneously. First, deployment without tickets: any developer pushes code and sees it deployed, no approval workflow, no waiting. Second, environment management without ops involvement: developers create environments, configure variables, and manage secrets through a self-service interface without understanding VPCs, IAM roles, or Kubernetes namespaces. Third, log and metric access without AWS console knowledge: developers access their application&#8217;s logs and metrics through a unified interface without navigating CloudWatch or writing Prometheus queries.</p><p>The mechanism that makes self-service safe for compliance-sensitive teams is encoding security controls into the platform rather than into an approval process. With LocalOps, every environment is provisioned from hardened infrastructure templates following AWS Well-Architected standards. Private subnets, least-privilege IAM policies, encrypted secrets via AWS Secrets Manager, and security group configurations are applied automatically. Developers cannot provision insecure infrastructure because the insecure options are not available in the self-service interface.</p><p>Platform teams set the guardrails once. Developers work within them without knowing they exist. Security is enforced at the infrastructure level, not through a ticket queue. 
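</p><p>A toy sketch of that idea in Python (the function and field names here are invented for illustration and are not the LocalOps API): the self-service entry point simply never exposes an insecure choice.</p><pre><code>def create_environment(name: str, instance_type: str = "m6i.large") -> dict:
    """Self-service entry point: developers choose a name and a size, nothing else.

    The security-relevant settings are not parameters at all, so an insecure
    environment cannot be requested through this interface.
    """
    return {
        "name": name,
        "instance_type": instance_type,
        "subnet_tier": "private",                  # applications never land in public subnets
        "secrets_backend": "aws-secrets-manager",  # secrets are always stored encrypted
        "iam_policy": "least-privilege",           # scoped per service by the platform
        "audit_logging": True,                     # CloudTrail stays on in every environment
    }
</code></pre><p>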
This is the model that eliminates the ops bottleneck without eliminating security controls.</p><p><a href="https://localops.co/features/secure-by-default?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how LocalOps handles security by default &#8594;</a></p><h2><strong>Can a Team of Five to Ten Engineers Run Production SaaS on AWS Without a Dedicated Platform Function?</strong></h2><p>This is the question most directly relevant to early-stage and growth-stage teams, and the honest answer depends entirely on how they access AWS.</p><p>Raw AWS without a platform layer requires someone to own infrastructure configuration, security hardening, CI/CD pipeline setup, observability configuration, Kubernetes cluster management, and ongoing maintenance. For a team of five to ten engineers, this typically means one engineer spending 30&#8211;50% of their time on infrastructure rather than product. At the growth stage, there is a high cost to pay in engineering capacity.</p><p>An AWS-native Internal Developer Platform changes the calculation entirely.</p><p>LocalOps handles VPC provisioning, EKS cluster management, IAM configuration, security hardening, observability setup, CI/CD wiring, and autoscaling configuration automatically. A team of five to ten engineers can run production-grade AWS infrastructure, with full compliance architecture, built-in observability, and developer self-service, without any engineer owning those responsibilities full-time.</p><p>The threshold where dedicated platform engineering expertise becomes necessary is when requirements exceed what the platform handles automatically. For most teams with five to fifteen engineers, that threshold is well above where they currently operate. The platform handles the infrastructure. The team handles the product.</p><h2><strong>How LocalOps Fits In</strong></h2><p>LocalOps is an AWS-native Internal Developer Platform built specifically for teams replacing Heroku.</p><p>Connect your AWS account. Connect your GitHub repository. LocalOps provisions a dedicated VPC, EKS cluster, load balancers, IAM roles, and a complete Prometheus + Loki + Grafana observability stack automatically. No Terraform. No Helm charts. No manual configuration. First environment ready in under 30 minutes.</p><p>From there, the developer experience is identical to Heroku. Push to your configured branch. LocalOps builds, containerizes, deploys, runs health checks, and handles rollback automatically. Preview environments spin up on every pull request. Logs and metrics are available from day one in pre-built Grafana dashboards. Autoscaling runs by default.</p><p>The infrastructure runs in your AWS account. If you stop using LocalOps, it keeps running. Nothing needs to be rebuilt. Developer autonomy is preserved from day one. The ops bottleneck does not get created.</p><blockquote><p><em>&#8220;Their thoughtfully designed product and tooling entirely eliminated the typical implementation headaches. 
Partnering with LocalOps has been one of our best technical decisions.&#8221;</em><strong> &#8211; Prashanth YV, Ex-Razorpay, CTO and Co-founder, Zivy</strong></p><p><em>&#8220;Even if we had diverted all our engineering resources to doing this in-house, it would have easily taken 10&#8211;12 man months of effort, all of which LocalOps has saved for us.&#8221;</em> <strong>&#8211; Gaurav Verma, CTO and Co-founder, SuprSend</strong></p></blockquote><p><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get started for free, first environment live in under 30 minutes &#8594;</a></p><h2><strong>Frequently Asked Questions</strong></h2><ol><li><p><strong>How do teams preserve Git push deployments after migrating to AWS without learning Kubernetes?</strong></p></li></ol><p>The answer is an Internal Developer Platform that sits between developers and AWS infrastructure, translating a git push into all the Kubernetes operations required to deploy the new version, invisibly. LocalOps detects the push, builds the container image automatically, pushes to Amazon ECR, updates the EKS deployment, runs health checks, and handles rollback if anything fails. Developers see the deployment in progress, and the new version is live within minutes. No Kubernetes knowledge required. No Helm charts. No Terraform. No platform team notification. The workflow is identical to Heroku. The infrastructure underneath is AWS running in the team&#8217;s own account.</p><ol start="2"><li><p><strong>What does a production-grade CI/CD pipeline look like on a Heroku alternative?</strong></p></li></ol><p>A production-grade pipeline on a Heroku alternative builds from container images rather than buildpacks, triggers automatically on every push and every pull request without manual pipeline configuration, includes health checks and automatic rollback as defaults, and provides deployment visibility without AWS console or kubectl access. LocalOps provides all four as the default configuration. There is no YAML to write and no external CI/CD service to connect. The entire build-deploy-verify cycle is handled by the platform automatically, the same behavior Heroku provided, running on infrastructure the team owns.</p><ol start="3"><li><p><strong>How do teams replicate Heroku Review Apps on Kubernetes-based platforms?</strong></p></li></ol><p>Replicating Heroku Review Apps on Kubernetes requires spinning up a completely isolated environment automatically when a pull request is opened, with its own URL, own database, and own configuration, and tearing it down when the PR closes. LocalOps handles this automatically on every pull request with no additional configuration required. Each preview environment gets its own isolated EKS namespace, inherits environment variables from the base configuration, and posts its URL automatically to the pull request. When the PR closes, the environment tears down, and AWS resources are released.
No platform team involvement at any step.</p><ol start="4"><li><p><strong>What does genuine developer self-service require on a Heroku alternative?</strong></p></li></ol><p>Three things must be true simultaneously: deployment without tickets (any developer pushes code and sees it deployed with no approval workflow); environment management without ops involvement (developers create environments and manage secrets through a self-service interface without AWS or Kubernetes knowledge); and log and metric access without AWS console navigation (logs and metrics available in a unified interface from the first deployment). The mechanism that makes this safe for compliance-sensitive teams is encoding security controls into the platform rather than into an approval process, so guardrails are enforced at the infrastructure level without creating a ticket queue.</p><ol start="5"><li><p><strong>Can a five to ten-person team run production SaaS on AWS without a dedicated SRE or platform function?</strong></p></li></ol><p>Yes, with the right platform layer. Raw AWS without a platform layer requires someone to own infrastructure configuration, Kubernetes management, security hardening, observability setup, and ongoing maintenance. On a five to ten-person team, that typically means one engineer spending 30&#8211;50% of their time on infrastructure rather than product. LocalOps handles all of this automatically. The team runs production-grade AWS infrastructure with full compliance architecture, built-in observability, and developer self-service without any engineer owning infrastructure full-time. The threshold where dedicated platform expertise becomes necessary is well above where most five to fifteen-person teams currently operate.</p><ol start="6"><li><p><strong>Is AWS a good Heroku alternative for teams without DevOps expertise?</strong></p></li></ol><p>AWS is the right infrastructure foundation; the challenge is accessing AWS without requiring product engineers to become infrastructure engineers. An AWS-native IDP makes this practical. LocalOps handles VPC provisioning, EKS cluster management, IAM configuration, security hardening, CI/CD wiring, observability configuration, and autoscaling automatically, from the first deployment. Teams of five to ten engineers run production-grade AWS infrastructure without a dedicated DevOps hire. Developers interact with git and a deployment interface. The AWS complexity is abstracted entirely, but the team&#8217;s AWS account is always fully accessible.</p><ol start="7"><li><p><strong>What makes LocalOps different from other AWS Heroku alternative platforms?</strong></p></li></ol><p>The infrastructure runs in the team&#8217;s own AWS account, not LocalOps&#8217;s. This means the compliance surface is the team&#8217;s AWS account, there is no vendor lock-in to unwind, and the infrastructure continues running independently if the team ever stops using LocalOps. Most AWS Heroku alternative platforms that provide developer-friendly workflows do so by running infrastructure in their own shared cloud, the same structural model as Heroku.
LocalOps provides a Heroku-equivalent developer experience on infrastructure the team owns and controls entirely.</p><h2><strong>Key Takeaways</strong></h2><p>Replacing Heroku without creating an ops bottleneck requires treating developer autonomy as a first-class requirement, not as a feature to add after the migration is complete.</p><p>Git-push deployments, preview environments on every pull request, self-serve environment management, and unified log and metric access are all achievable on AWS. None of them requires developers to learn Kubernetes, Helm, or Terraform. They require a platform designed to absorb that infrastructure complexity invisibly, so developers keep the workflows they already have, and the business gets the infrastructure it owns.</p><p>For CTOs evaluating the best Heroku alternatives in 2026, the AWS Heroku alternative that preserves developer autonomy from day one is not the one with the most infrastructure features. It is the one where any developer on the team can deploy, access logs, and check application health without asking anyone for help, running on infrastructure the team owns, at direct AWS pricing, with no new vendor lock-in to unwind.</p><p><strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Schedule a Migration Call &#8594;</a></strong> Our engineers review your Heroku setup and walk through what developer self-serve looks like for your specific stack.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get Started for Free &#8594;</a></strong> First environment on AWS in under 30 minutes. No credit card required.</p><p><strong><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read the Migration Guide &#8594;</a></strong> Full technical walkthrough, database migration, environment setup, DNS cutover.</p>]]></content:encoded></item><item><title><![CDATA[Self-Hosted Heroku Alternatives in 2026: Build vs. 
Buy for Platform Engineering Teams]]></title><description><![CDATA[Why infrastructure ownership isn&#8217;t the hard part, operating it at scale is (and what that really costs your team)]]></description><link>https://blog.localops.co/p/self-hosted-heroku-alternatives-build-vs-buy</link><guid isPermaLink="false">https://blog.localops.co/p/self-hosted-heroku-alternatives-build-vs-buy</guid><dc:creator><![CDATA[Nidhi Pandey]]></dc:creator><pubDate>Mon, 13 Apr 2026 08:52:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!fyiq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fyiq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fyiq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png 424w, https://substackcdn.com/image/fetch/$s_!fyiq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png 848w, https://substackcdn.com/image/fetch/$s_!fyiq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png 1272w, https://substackcdn.com/image/fetch/$s_!fyiq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fyiq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png" width="1806" height="1384" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1384,&quot;width&quot;:1806,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5131656,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/194047963?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c867fd1-a698-4e16-81c3-e4c8e9786d5a_1808x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fyiq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png 424w, 
https://substackcdn.com/image/fetch/$s_!fyiq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png 848w, https://substackcdn.com/image/fetch/$s_!fyiq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png 1272w, https://substackcdn.com/image/fetch/$s_!fyiq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a40cf94-8638-4625-a277-16ac36230f03_1806x1384.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>A self-hosted Heroku alternative is any deployment platform that runs on infrastructure the team owns and controls, typically in their own AWS account, rather than on a shared third-party cloud.</p><p>This model solves the three most important structural problems with Heroku simultaneously: cost compounding from platform margin, compliance ceiling from shared infrastructure, and vendor lock-in from infrastructure that disappears when you leave. This is why the self-hosted category consistently dominates engineering community discussions when CTOs evaluate what comes after Heroku.</p><p>What it does not solve automatically is the operational burden of running the platform itself. That burden, and the true cost of building versus buying a self-hosted deployment platform, is what this guide covers directly.</p><h2><strong>TL;DR</strong></h2><p><strong>What this covers:</strong> The most production-ready self-hosted Heroku alternatives in 2026, real operational limitations, compliance architecture, true build vs. buy cost, and how to avoid replicating vendor lock-in</p><p><strong>Who it is for:</strong> CTOs and founders evaluating whether to build a self-hosted platform or buy a managed one</p><p><strong>The core tension:</strong> Self-hosting gives you infrastructure ownership, compliance capability, and no platform margin. It transfers the full operational burden of platform maintenance to your team. 
For most Series A&#8211;C product-focused teams, that burden is higher than it appears before migration.</p><p><strong>Want infrastructure ownership without building the platform yourself?</strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Speak with the LocalOps team &#8594;</a></p><h2><strong>The Most Production-Ready Self-Hosted Options in 2026</strong></h2><p>The self-hosted landscape has three meaningful options for teams wanting to run on their own AWS account. Each has a distinct maturity profile and production ceiling.</p><p><strong>Coolify</strong> is the most actively developed and provides the most Heroku-like interface, a web-based deployment dashboard, Docker-based hosting, database provisioning, SSL management, and environment variable handling. It is the most accessible entry point in this category. Its core limitation for production SaaS is autoscaling. Coolify does not natively support horizontal autoscaling based on real traffic signals; scaling is primarily manual or scheduled. Observability is not included. Proper multi-environment isolation requires manual configuration beyond what the default setup provides.</p><p><strong>Dokku </strong>is the original self-hosted Heroku alternative. It delivers the most genuine git-push experience of any open-source option, push to a branch, application deploys, no Kubernetes required. The limitation is architectural: Dokku is a single-server platform. Horizontal scaling across multiple hosts requires significant additional work, and the single-server model creates a reliability risk for applications with SLA commitments. For teams running a small number of services with modest and predictable traffic, Dokku is a reasonable path. For production SaaS at the growth stage, the architecture is too constrained.</p><p><strong>CapRover </strong>uses Docker Swarm to provide multi-node horizontal scaling, a meaningful step beyond Dokku. It supports a web dashboard, one-click app templates, and custom domain management. The limitation worth understanding before committing: Docker Swarm has been largely superseded by Kubernetes in the production engineering community. Teams choosing CapRover are building on a stack with declining ecosystem investment, and production patterns like canary deployments, preview environments, and application-metrics-driven autoscaling all require significant additional work.</p><p>Across all three, achieving proper multi-environment isolation, separate VPCs, environment-specific IAM policies, isolated databases, and network segmentation between dev, staging, and production requires manual configuration that none of these platforms provides automatically. This gap is the most common source of post-migration compliance and reliability incidents.</p><p><a href="https://docs.localops.co/environment/inside?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how LocalOps handles multi-environment isolation automatically &#8594;</a></p><h2><strong>The Real Operational Limitations</strong></h2><p>The decision to self-host a deployment platform transfers a specific set of operational responsibilities from the platform vendor to the team. Understanding what those responsibilities actually cost is the core of the build vs. buy calculation.</p><p>Security patching and platform maintenance are ongoing and non-negotiable. 
CVEs in Docker, Kubernetes, the underlying OS, and the platform software require evaluation, testing, and deployment on a regular cadence. Observability setup is a multi-day project per environment that none of the platforms above includes out of the box. Prometheus for metrics, Loki for logs, Grafana for dashboards, and alerting rules all require separate configuration and ongoing maintenance as services are added. Platform on-call means that when the deployment platform has an incident, the engineering team owns the response. There is no vendor support. Scaling configuration on Kubernetes requires ongoing tuning as traffic patterns evolve; it is not a one-time setup.</p><p>The signal that a scaling startup should choose a managed AWS-native IDP over a self-hosted alternative is consistent: when engineering hours required to maintain the platform layer exceed the cost of a platform fee, and when those hours would otherwise be spent on product. For product-focused teams at Series A and beyond without a dedicated platform engineer, this threshold is crossed almost immediately. Platform maintenance consistently represents 4&#8211;8 engineering hours per week. At $100&#8211;150 per fully-loaded engineering hour for a senior engineer, that is $400&#8211;$1,200 per week in hidden maintenance costs, before any incident response.</p><h2><strong>Compliance: Self-Hosted vs. Managed PaaS</strong></h2><p>This is one of the most significant and most misunderstood dimensions of the self-hosted decision.</p><p>When your deployment platform runs on your own AWS account, your compliance surface is your own infrastructure. SOC 2 Type II, HIPAA, and GDPR assessments are conducted against your VPC configuration, your IAM policies, and your data handling practices, all of which you control. This is a structural difference from any managed PaaS alternative. On Heroku, Render, or Railway, the infrastructure is the vendor&#8217;s. Your compliance posture is bound by what the vendor certifies. When an enterprise security questionnaire asks about VPC configuration, private networking, and IAM audit logging, the honest answer on a managed PaaS is that the team does not control those things.</p><p>The compliance advantage of self-hosting is real. Realizing it requires correct implementation. A self-hosted platform running on EC2 instances without proper VPC isolation, without least-privilege IAM policies, without encrypted secrets management, and without infrastructure audit logging does not satisfy SOC 2 or HIPAA requirements, regardless of the fact that it runs in the team&#8217;s own account. Infrastructure ownership is necessary for compliance. It is not sufficient without the correct security configuration on top.</p><p>LocalOps applies all of the required compliance controls automatically in every environment, private subnets, least-privilege IAM policies, encrypted secrets via AWS Secrets Manager, security group configurations, and CloudTrail logging, following<a href="https://aws.amazon.com/architecture/well-architected"> AWS Well-Architected standards</a> as defaults, not as options. The compliance architecture is in place from the first deployment without additional configuration.</p><p><a href="https://localops.co/features/secure-by-default?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how LocalOps handles compliance by default &#8594;</a></p><h2><strong>The True Build vs. 
Buy Cost</strong></h2><p>Most infrastructure reviews get this calculation wrong because they include only the infrastructure cost and exclude the engineering cost.</p><p>The initial build cost of a production-grade Internal Developer Platform on Kubernetes, one where any product engineer can deploy independently, with git-push workflows, preview environments, integrated observability, autoscaling, and secrets management, is consistently reported at three to six months of senior platform engineering time. At a fully-loaded cost of $200,000 per year, three months of a senior platform engineer&#8217;s time represents approximately $50,000 before the platform has shipped a single product feature. For a ten-person team, this is also three to six months during which one senior engineer is building platform infrastructure rather than product.</p><p>Ongoing maintenance adds $20,000&#8211;$40,000 per year in engineering time, permanently. Platform on-call creates incident response burden and context-switching overhead that is difficult to measure but genuinely costly. And if the developer experience migration is incomplete, if developers who used to deploy in 20 minutes on Heroku are now waiting hours for platform team involvement, the productivity cost compounds across the entire engineering team.</p><p>A managed AWS-native IDP like LocalOps charges a platform fee. The underlying infrastructure runs at AWS list pricing with no markup. Observability is included. The build cost, maintenance cost, on-call burden, and developer experience regression cost are all absorbed by the platform. For most Series A&#8211;C teams, the fully-loaded cost of building and maintaining a self-hosted Kubernetes platform significantly exceeds the cost of a managed IDP, before accounting for the opportunity cost of engineering hours redirected from product to platform.</p><p>The self-hosted build path makes economic sense for teams with two or more platform engineers whose full-time job is internal infrastructure. For product-focused teams without this capacity, the math consistently favors managed.</p><p><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Walk through the cost comparison with a LocalOps engineer &#8594;</a></p><h2><strong>How to Avoid Replicating Heroku&#8217;s Vendor Lock-in</strong></h2><p>This is the strategic question most infrastructure evaluations underweight, and the one that determines whether the migration is made once or twice.</p><p>Heroku&#8217;s lock-in has a specific mechanism: infrastructure lives in Heroku&#8217;s systems, disappears when you leave, and accumulates dependencies with every year you stay. Managed PaaS alternatives replicate this mechanism with a different vendor name. The risk when choosing any alternative is recreating this structure in a new form.</p><p>Four infrastructure design decisions future-proof the platform choice. Infrastructure must run in your own cloud account, not the vendor&#8217;s. This is the binary decision that determines compliance ceiling, data residency, and exit optionality. The platform must use standard, portable technology, Kubernetes, not proprietary runtimes. This means infrastructure is manageable directly if you ever need to change the platform layer. The exit path must be verified explicitly before committing. Ask every vendor what happens if you stop using their platform tomorrow, and require a specific answer. 
And compliance requirements should be evaluated against 18-month projections, not just today&#8217;s requirements, because enterprise deals surface new requirements faster than most teams anticipate.</p><p>LocalOps is built around all four principles. Every resource is provisioned into the team&#8217;s own AWS account on standard Kubernetes. Infrastructure runs independently if the team stops using LocalOps. The compliance architecture supports SOC 2, HIPAA, and GDPR from day one as a default. The exit path is always open.</p><h2><strong>How LocalOps Fits In</strong></h2><p>LocalOps is an AWS-native Internal Developer Platform built for teams replacing Heroku, and for teams evaluating whether to build a self-hosted platform or buy a managed one.</p><p>Connect your AWS account. Connect your GitHub repository. LocalOps provisions a dedicated VPC, EKS cluster, load balancers, IAM roles, and a complete Prometheus + Loki + Grafana observability stack automatically. No Terraform. No Helm charts. No manual configuration. Production-ready in under 30 minutes.</p><p>Developers push to a configured branch. LocalOps builds, containerizes, deploys, health checks, and handles rollbacks automatically. Preview environments spin up on every pull request. Autoscaling runs by default. The infrastructure runs in your AWS account. If you stop using LocalOps, it keeps running. Everything that makes self-hosting strategically valuable, infrastructure ownership, compliance capability, no platform margin, and no vendor lock-in, is present. The operational burden of building and maintaining the platform is handled by LocalOps rather than your team.</p><blockquote><p><em>&#8220;Their thoughtfully designed product and tooling entirely eliminated the typical implementation headaches. Partnering with LocalOps has been one of our best technical decisions.&#8221;</em> <strong>&#8211; Prashanth YV, Ex-Razorpay, CTO and Co-founder, Zivy</strong></p><p><em>&#8220;Even if we had diverted all our engineering resources to doing this in-house, it would have easily taken 10&#8211;12 man months of effort, all of which LocalOps has saved for us.&#8221;</em> <strong>&#8211; Gaurav Verma, CTO and Co-founder, SuprSend</strong></p></blockquote><p><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get started for free, first environment on AWS in under 30 minutes &#8594;</a></p><h2><strong>Frequently Asked Questions</strong></h2><ol><li><p><strong>What are the most production-ready self-hosted Heroku alternatives in 2026?</strong></p></li></ol><p>Coolify, Dokku, and CapRover are the three most actively maintained options. Coolify provides the most Heroku-like interface but lacks production-grade autoscaling and integrated observability. Dokku is the most git-push-native but is architecturally limited to single-server deployments. CapRover provides multi-node scaling through Docker Swarm but builds on a stack largely superseded by Kubernetes. All three require significant additional configuration to achieve proper multi-environment isolation, production-grade observability, and compliance-ready infrastructure in an AWS account.</p><ol start="2"><li><p><strong>When should a scaling startup choose a managed IDP over a self-hosted alternative?</strong></p></li></ol><p>When the engineering hours required to maintain the platform layer exceed the cost of a platform fee, and when those hours would otherwise be spent on product. 
For product-focused teams at Series A and beyond without a dedicated platform engineer, this threshold is crossed almost immediately. Ongoing platform maintenance consistently represents 4&#8211;8 engineering hours per week. The managed IDP path makes sense for the majority of product-focused engineering teams. The self-hosted build path makes sense for teams with two or more platform engineers whose full-time job is internal infrastructure.</p><ol start="3"><li><p><strong>How do self-hosted alternatives compare to managed PaaS on SOC 2 and HIPAA compliance?</strong></p></li></ol><p>Self-hosted alternatives running on the team&#8217;s own AWS account have a structurally superior compliance architecture compared to managed PaaS platforms. The compliance surface is the team&#8217;s own AWS account, which holds SOC 2, HIPAA, GDPR, and additional certifications. However, the compliance advantage is only realized with correct implementation: proper VPC configuration, least-privilege IAM policies, encrypted secrets management, and audit logging. Infrastructure ownership is necessary for compliance. It is not sufficient without a correct security configuration on top.</p><ol start="4"><li><p><strong>What is the true build vs. buy cost of a self-hosted Kubernetes platform?</strong></p></li></ol><p>The full cost includes: initial build cost of three to six months of senior platform engineering time (approximately $50,000&#8211;$100,000), ongoing maintenance of 4&#8211;8 hours per week permanently ($20,000&#8211;$40,000 per year), on-call burden for platform incidents, and developer experience regression cost if git-push workflows are not fully replicated. For most Series A&#8211;C product-focused teams, the fully-loaded cost of building and maintaining a self-hosted Kubernetes deployment platform significantly exceeds the cost of a managed AWS-native IDP.</p><ol start="5"><li><p><strong>How do engineering leaders choose a Heroku alternative that avoids vendor lock-in?</strong></p></li></ol><p>Four decisions future-proof the choice: infrastructure must run in your own cloud account, the platform must use standard Kubernetes, not proprietary runtimes, the exit path must be verified explicitly before committing, and compliance requirements should be evaluated against 18-month projections. LocalOps satisfies all four: infrastructure in your own AWS account, standard Kubernetes, infrastructure that runs independently if you stop using the platform, and AWS compliance surface with no vendor-defined ceiling.</p><ol start="6"><li><p><strong>What is the difference between a Heroku self-hosted alternative and LocalOps?</strong></p></li></ol><p>A Heroku self-hosted alternative like Coolify or Dokku gives full infrastructure control with no licensing cost. Your team owns the complete operational burden, provisioning, patching, observability, scaling, and platform on-call. LocalOps provides the same infrastructure ownership; everything runs in your own AWS account, with the platform layer managed rather than built. The infrastructure is self-hosted. The platform is managed. For teams without dedicated platform engineering capacity, this distinction determines whether infrastructure ownership is operationally viable or not.</p><h2><strong>Key Takeaways</strong></h2><p>The self-hosted category in 2026 offers genuine strategic value, infrastructure ownership, AWS-based compliance architecture, no platform margin, and no vendor lock-in. 
These advantages are real and are why the category deserves serious evaluation.</p><p>The build vs. buy decision is not about whether to own your infrastructure. Infrastructure ownership in your own AWS account is the right architectural model for B2B SaaS teams with enterprise ambitions. The decision is about whether to build and maintain the platform layer on top of that infrastructure yourself or to use a managed platform that handles that layer while keeping the infrastructure in your account.</p><p>For most product-focused engineering teams at Series A and beyond, the answer is the same one the engineering community has been converging on throughout 2026: buy the platform layer, own the infrastructure.</p><p><strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Schedule a Migration Call &#8594;</a></strong> Our engineers review your current setup and walk through what infrastructure ownership looks like for your specific stack.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get Started for Free &#8594;</a></strong> First environment on AWS in under 30 minutes. No credit card required.</p><p><strong><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read the Heroku Migration Guide &#8594;</a></strong> Full technical walkthrough, database migration, environment setup, DNS cutover.</p>]]></content:encoded></item><item><title><![CDATA[🎉 New release: SAML 2.0 support - Sign in using your org's Identity provider]]></title><description><![CDATA[LocalOps becomes enterprise ready with SAML 2.0 support]]></description><link>https://blog.localops.co/p/new-release-saml20-support-sign-in</link><guid isPermaLink="false">https://blog.localops.co/p/new-release-saml20-support-sign-in</guid><dc:creator><![CDATA[LocalOps Inc]]></dc:creator><pubDate>Mon, 13 Apr 2026 08:21:08 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!DiXG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DiXG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DiXG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png 424w, https://substackcdn.com/image/fetch/$s_!DiXG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png 848w, https://substackcdn.com/image/fetch/$s_!DiXG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DiXG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DiXG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png" width="1456" height="1137" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1137,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2397500,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/194043725?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DiXG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png 424w, https://substackcdn.com/image/fetch/$s_!DiXG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png 848w, https://substackcdn.com/image/fetch/$s_!DiXG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png 1272w, https://substackcdn.com/image/fetch/$s_!DiXG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe5d91ddd-9a96-43db-b6bf-dc78515b6275_2742x2141.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>To make it easier for teams to on-board &amp; off board their users, we just released SAML2.0 support. </p><p>If you are the org owner or admin (<a href="https://blog.localops.co/p/new-release-switch-organizationsteams">you can belong to multiple orgs, btw</a>), you can now visit Account security settings to add a new SSO provider. With just a few clicks, you can enforce all users to sign in via your current Identity provider.</p><p>Any SAML 2.0 compliant identity provider is supported.</p><ol><li><p>Okta</p></li><li><p>Microsoft Entra ID</p></li><li><p>Google Workspace SSO</p></li><li><p>OneLogin</p></li><li><p>Auth0</p></li></ol><p><a href="mailto:anand@localops.co">Talk to us</a>, if you are using other identity providers. </p><p>Get started with LocalOps for free at <a href="https://console.localops.co/signup">https://console.localops.co/signup</a>. </p><p>Or <a href="https://go.localops.co/tour">get a quick demo</a> to see how LocalOps can solve DevOps bottlenecks and streamline AWS deployments in your org.</p>]]></content:encoded></item><item><title><![CDATA[Heroku Alternatives the Engineering Community Actually Recommends in 2026]]></title><description><![CDATA[Heroku Alternatives Backed by Real Migrations: Lessons from Engineering Teams in 2026]]></description><link>https://blog.localops.co/p/heroku-alternatives</link><guid isPermaLink="false">https://blog.localops.co/p/heroku-alternatives</guid><dc:creator><![CDATA[Nidhi Pandey]]></dc:creator><pubDate>Fri, 10 Apr 2026 06:40:19 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!QJY-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!QJY-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!QJY-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png 424w, https://substackcdn.com/image/fetch/$s_!QJY-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png 848w, https://substackcdn.com/image/fetch/$s_!QJY-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png 1272w, https://substackcdn.com/image/fetch/$s_!QJY-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!QJY-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png" width="1456" height="816" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:816,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4282653,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/193586542?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!QJY-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png 424w, https://substackcdn.com/image/fetch/$s_!QJY-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png 848w, https://substackcdn.com/image/fetch/$s_!QJY-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png 1272w, https://substackcdn.com/image/fetch/$s_!QJY-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0bdfb6a0-af5a-4377-b5c8-210a1aea29f3_2400x1345.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The most useful signal when evaluating Heroku alternatives is not vendor marketing or feature comparison pages. 
It is the pattern of decisions made by engineering leaders who have already undergone the migration and are willing to share what worked, what did not, and what they wish they had known before starting.</p><p>Across Reddit threads on r/devops, r/rails, r/node, and r/selfhosted, and Hacker News discussions on platform engineering and infrastructure decisions, consistent patterns have emerged in 2026 that did not exist with the same clarity two years ago. The community has been through enough Heroku migrations now, successful ones, failed ones, and ones that had to be done twice, to have developed a genuine consensus on what works at production scale.</p><p>This guide synthesizes those patterns, maps them to the structural differences between platform categories, and gives engineering leaders a framework for choosing a Heroku alternative that does not replicate the vendor lock-in problem they are trying to solve.</p><h2><strong>TL;DR</strong></h2><p><strong>What this covers:</strong> What the engineering community actually recommends as Heroku alternatives in 2026, how the landscape has shifted, how top alternatives compare on TCO, and how to choose a platform that avoids recreating Heroku&#8217;s lock-in</p><p><strong>Who it is for:</strong> CTOs and engineering leaders evaluating Heroku alternatives who want a community-validated signal alongside structural analysis</p><p><strong>The community consensus:</strong> Managed PaaS alternatives are a transitional step, not a destination. Infrastructure ownership on AWS, with a platform layer that preserves developer experience, is what the community consistently validates for production SaaS at scale.</p><p><strong>Want to see what LocalOps looks like for your specific stack?</strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Schedule a walkthrough</a></p><h2><strong>How the Heroku Alternatives Landscape Has Shifted in 2026</strong></h2><p>The Heroku alternatives landscape in 2026 looks meaningfully different from what it was two years ago, and the shift is not primarily about new platforms entering the market.</p><p>The shift is in how engineering leaders are framing the decision.</p><p>In 2023 and 2024, the dominant question in the community was: <em>What is the easiest migration from Heroku?</em> Teams were evaluating alternatives primarily on migration friction, how quickly they could get off Heroku, and how similar the new experience would feel.</p><p>In 2026, the dominant question has changed: <em>What infrastructure foundation does our business need for the next three years?</em> Teams are evaluating alternatives on strategic fit, compliance capability, cost structure at scale, exit optionality, and whether the platform they choose will be the last migration they make or the first of two.</p><p>That shift in framing produces significantly different answers. The easiest migration is often not the right strategic choice. A team that moves from Heroku to Render solves the immediate cost and developer experience problem. If they are selling to enterprise customers, they will face the same compliance conversation 18 months later, with more accumulated dependencies and less room to address it on their own terms.</p><p><strong>Where compliance-sensitive teams are landing:</strong></p><p>Enterprise-grade, compliance-sensitive SaaS teams have converged on AWS-native infrastructure with a platform layer on top. 
The reasons are consistent across community discussions:</p><p>SOC 2 and HIPAA requirements demand infrastructure running in the team&#8217;s own cloud account. Enterprise security questionnaires require VPC configuration, private networking, and IAM audit trails that managed PaaS platforms cannot provide. Cost efficiency at scale requires direct AWS pricing, not a PaaS margin that compounds with every service added. Architectural flexibility requires Kubernetes-grade infrastructure, not dyno-based compute.</p><p>The best Heroku alternatives for this profile in 2026 are platforms that run on AWS infrastructure the team owns, with enough abstraction that developers do not need to interact with that infrastructure directly.</p><h2><strong>What the Engineering Community Is Actually Recommending</strong></h2><p>The community signal on Heroku alternatives is worth examining directly, because it captures the failure modes that vendor comparisons do not surface.</p><h3><strong>Pattern 1: The Managed PaaS Stepping Stone</strong></h3><p>The most frequently discussed migration pattern in the community is one that involves two migrations, not one.</p><p>Team moves from Heroku to Render or Railway to reduce friction and cost. The migration goes smoothly. Developer experience is preserved. Costs reduce modestly. The platform feels like a clear upgrade from Heroku for the first 12&#8211;18 months.</p><p>Then something changes. An enterprise prospect sends a security questionnaire. Or a compliance audit surfaces infrastructure requirements that the platform cannot meet. Or the cost structure at 15+ services starts looking familiar, platform margins compounding across services the same way Heroku&#8217;s did.</p><p>The team migrates again. This time to infrastructure they own.</p><p>The community commentary on this pattern is consistent and pointed: <em>the second migration is more expensive than going directly to infrastructure ownership would have been.</em> More accumulated dependencies to untangle. More technical debt from the intermediate platform. More urgency because the enterprise deals are now in an active pipeline rather than theoretical future scenarios.</p><p>The observation that surfaces in nearly every thread where this pattern is discussed: <em>the teams that went to infrastructure ownership directly made the migration once.</em></p><h3><strong>Pattern 2: The Raw AWS Complexity Trap</strong></h3><p>The second common pattern discussed is the team that moves directly to raw AWS and discovers that getting from &#8220;AWS is provisioned&#8221; to &#8220;any developer can deploy independently&#8221; is a multi-month platform engineering project.</p><p>The infrastructure is technically sound. The compliance posture is correct. The cost structure is right. But product engineers who used to deploy every 20 minutes on Heroku now file tickets with a platform team and wait 48 hours. Shipping velocity drops. The migration is blamed, even though the infrastructure itself is fine.</p><p>The community diagnosis is consistent: the technical migration succeeded. The developer experience migration failed. The team built the infrastructure layer without building the platform layer, which makes the infrastructure accessible to product engineers.</p><p>The resolution discussed in these threads is almost always the same: adopt a platform layer on top of the AWS infrastructure. 
In many cases, this is specifically what brings teams to AWS-native Internal Developer Platforms; they have the AWS foundation already, and they need the developer experience layer on top.</p><h3><strong>Pattern 3: The Rails Hosting Question</strong></h3><p>Rails teams are among the most active in these discussions, and their requirements surface a specific evaluation dimension that generic platform comparisons miss.</p><p>The community consensus on Rails hosting alternatives to Heroku is clear and specific: any platform being evaluated for Rails production hosting needs to handle Sidekiq workers, Postgres with connection pooling, Active Storage with object storage, Action Cable with Redis, and scheduled tasks as first-class service types, not as workarounds or add-on integrations.</p><p>Platforms that handle these as edge cases are consistently recommended against by the community for production Rails applications. Platforms that treat them as native deployment patterns, web processes and workers scaling independently, Redis inside the VPC, cron jobs as a first-class service type, consistently receive positive recommendations.</p><h3><strong>Pattern 4: The Infrastructure Ownership Conclusion</strong></h3><p>The most significant shift in community sentiment between 2024 and 2026 is the emergence of a clear conclusion on infrastructure ownership.</p><p>In 2024, the community was still debating whether infrastructure ownership was worth the operational complexity. In 2026, that debate has largely been settled for B2B SaaS teams with an enterprise go-to-market motion.</p><p>The observation that captures the current community position most precisely:</p><p><em>&#8220;The teams that moved to infrastructure they own early are the ones having the smoothest conversations with enterprise prospects. The teams still on managed platforms are the ones explaining to their board why a $200K deal is stuck in security review.&#8221;</em></p><p>This is not a technical observation. It is a business one. And it reflects how the community conversation has matured from infrastructure optimization to strategic positioning.</p><p><a href="https://localops.co/migrate-heroku-to-aws?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read how engineering teams navigate this transition</a></p><h2><strong>How Top Heroku Alternatives Compare on Total Cost of Ownership</strong></h2><p>For teams running 20+ production services, the cost structure differences between Heroku alternative categories are significant and compounding. Surface-level pricing comparisons miss the structural dynamics that determine actual TCO at scale.</p><p><strong>Managed PaaS alternatives: Render, Railway, Fly.io</strong></p><p>These platforms reduce cost versus Heroku, but they do not eliminate the platform margin. Every compute resource, managed database, cache instance, and monitoring capability still carries a vendor margin layered on top of the underlying infrastructure cost.</p><p>At 20+ services, this margin compounds. Each new service adds compute margin, database margin, Redis margin, and monitoring cost simultaneously. The efficiency ceiling is lower than direct AWS because the platform margin persists regardless of scale. 
And observability typically requires additional add-ons with separate billing, recreating one of the most frustrating cost dynamics of Heroku at a slightly lower price point.</p><p><strong>Open-source self-hosted alternatives: Coolify, Dokku, CapRover.</strong></p><p>These platforms eliminate the platform margin. Infrastructure runs at direct cloud pricing with no vendor markup. For teams with dedicated platform engineering capacity, the compute and managed service costs are as low as they can be.</p><p>The TCO calculation changes when engineering time is included. Provisioning, security patching, observability setup, autoscaling configuration, and on-call response for platform issues all fall to the team. For most product-focused engineering teams, the engineering hours required to operate a self-hosted platform in production represent a higher total cost than a managed platform fee, even accounting for the elimination of the platform margin.</p><p><strong>AWS-native Internal Developer Platforms: LocalOps</strong></p><p>The cost structure is fundamentally different from both categories above. LocalOps charges a flat platform fee. The underlying infrastructure runs at AWS list pricing with no markup. Observability, Prometheus, Loki, and Grafana are included at no additional cost regardless of service count.</p><p>At 20+ services, the difference compounds in the IDP&#8217;s favour. Every additional service adds only AWS infrastructure cost. No observability cost increment. No platform margin on database or cache. The gap between managed PaaS total cost and AWS-native IDP total cost widens with every service added.</p><p>For an accurate TCO comparison based on your current Heroku invoice and service count,<a href="https://go.localops.co/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> the LocalOps team will model it directly</a>.</p><h2><strong>Structural Differences Between First-Generation Alternatives and AWS-Native IDPs</strong></h2><p>This is the distinction that most Heroku alternative evaluations underweight, because the surface-level experience of managed PaaS platforms and AWS-native IDPs can look similar to developers, while the underlying architecture is fundamentally different.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/itDJS/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/61f5269d-d6d6-4a6c-887f-21181905d9d9_1220x1080.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6f3b447c-abf2-4fd5-8a27-0144f2766b07_1220x1080.png&quot;,&quot;height&quot;:540,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/itDJS/1/" width="730" height="540" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The structural difference is not about developer experience on day one. Both categories can provide a git-push deployment workflow. 
The structural difference is about what happens at month 18 when enterprise deals arrive, compliance requirements sharpen, and the cost structure at scale comes under scrutiny.</p><p>First-generation alternatives, Render, Railway, and Fly.io, improve on Heroku&#8217;s developer experience and pricing. The fundamental model is the same: your infrastructure runs on someone else&#8217;s cloud. Your compliance posture is bound by what the vendor chooses to support. Your exit path requires rebuilding infrastructure from scratch. You are trading one managed dependency for another.</p><p>AWS-native Internal Developer Platforms change the model entirely. Infrastructure runs in your cloud account. Developer experience is preserved, and git push deploys without Kubernetes knowledge. Observability, CI/CD, autoscaling, and secrets management are built in. And if you stop using the platform, your infrastructure keeps running. Nothing needs to be rebuilt.</p><p>This structural difference is what the community discovered through the stepped migration pattern described above. The teams that made it once recognized this distinction before choosing. The teams that made it twice discovered it afterwards.</p><h2><strong>How to Choose a Heroku Alternative That Avoids Replicating Vendor Lock-in</strong></h2><p>This is the question that separates a good infrastructure decision from a decision that creates the same problem in a different form.</p><p>Heroku&#8217;s vendor lock-in has a specific mechanism: your infrastructure lives in Heroku&#8217;s systems. When you leave, it disappears. You start from scratch. Every year you stay on Heroku, the dependencies accumulate, and the eventual migration becomes more expensive.</p><p>The risk when choosing a Heroku alternative is choosing a platform that replicates this mechanism with a different vendor name. The managed PaaS category does this structurally; Render, Railway, and Fly.io all use the same model. Your infrastructure lives in their systems. When you leave, you start from scratch.</p><p><strong>The infrastructure design decisions that future-proof the platform choice:</strong></p><p><strong>Decision 1: Infrastructure must run in your cloud account.</strong></p><p>This is the binary decision that determines everything downstream. If infrastructure runs in your account, your VPC, your EKS cluster, your RDS database, then the platform vendor you use to manage that infrastructure is replaceable. The infrastructure is yours. The management layer is a service you pay for, not a dependency you are locked into.</p><p>If infrastructure runs in the vendor&#8217;s systems, you are locked in structurally, regardless of how the platform markets itself.</p><p><strong>Decision 2: The platform must use standard, portable technology.</strong></p><p>Kubernetes is the standard. Helm charts are standard. Terraform is standard. 
Any platform that runs your workloads on standard Kubernetes in your own account gives you the option to manage that infrastructure directly if you ever need to change the platform layer.</p><p>Platforms that run on proprietary runtimes, proprietary deployment formats, or proprietary infrastructure abstractions create lock-in even if the infrastructure nominally runs in your account.</p><p><strong>Decision 3: Verify the exit path before committing.</strong></p><p>Ask every platform vendor directly: <em>if we stop using your platform tomorrow, what does our infrastructure look like, and can we continue operating it independently?</em></p><p>LocalOps answers this question specifically and directly. Every resource LocalOps provisions lives inside the team&#8217;s own AWS account. EKS clusters, RDS databases, VPCs, load balancers, all owned by the team, all running in their account, all manageable directly through the AWS console or CLI if the team ever stops using LocalOps. There is no data to export. There is no infrastructure to rebuild. The exit path is always open.</p><p>Platforms that cannot answer this question clearly, or that answer it with migration timelines, data export processes, or infrastructure rebuild requirements, are creating vendor lock-in regardless of how they describe their model.</p><p><strong>Decision 4: Evaluate compliance ceiling, not just current compliance.</strong></p><p>Managed PaaS platforms have a compliance ceiling defined by what the vendor chooses to support. That ceiling may be adequate today. It may not be adequate in 18 months when enterprise procurement processes become part of the sales cycle.</p><p>AWS-native IDPs running in the team&#8217;s own account have no compliance ceiling. The compliance surface is AWS, which holds SOC 2, HIPAA, GDPR, PCI DSS, and dozens of additional certifications. The compliance architecture grows with the business rather than constraining it.</p><p><a href="https://localops.co/features/secure-by-default?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how LocalOps handles compliance by default</a></p><p><strong>How LocalOps Fits the Community&#8217;s Validated Pattern</strong></p><p>LocalOps is an AWS-native Internal Developer Platform built specifically for teams replacing Heroku.</p><p>Connect your AWS account. Connect your GitHub repository. LocalOps provisions a dedicated VPC, EKS cluster, load balancers, IAM roles, and a complete observability stack, Prometheus, Loki, and Grafana, automatically. No Terraform. No Helm charts. No manual configuration. First environment ready in under 30 minutes.</p><p>From that point onwards, the developer experience is identical to Heroku. Push to your configured branch. LocalOps builds, containerizes, and deploys to AWS automatically. Preview environments spin up on every pull request. Logs and metrics available from day one. Autoscaling and auto-healing run by default.</p><p>The infrastructure runs in your AWS account. If you stop using LocalOps, it keeps running. Nothing needs to be rebuilt. This is the architectural model the community has converged on: infrastructure ownership with developer simplicity, no new vendor dependency, no compliance ceiling.</p><blockquote><p><em>&#8220;Their thoughtfully designed product and tooling entirely eliminated the typical implementation headaches. 
Partnering with LocalOps has been one of our best technical decisions.&#8221;</em><strong> &#8211; Prashanth YV, Ex-Razorpay, CTO and Co-founder, Zivy</strong></p><p><em>&#8220;We saved months of DevOps effort by using LocalOps,&#8221;</em> <strong>&#8211; Shobit Gupta, Ex-Uber, CTO and Co-founder, Segwise.</strong></p></blockquote><p><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get started for free, first environment live in under 30 minutes</a></p><h2><strong>Frequently Asked Questions</strong></h2><ol><li><p><strong>How has the Heroku alternatives landscape shifted in 2026, and which platforms have emerged as most viable for enterprise-grade SaaS teams?</strong></p></li></ol><p>The most significant shift is not in the platforms available; it is in how engineering leaders are framing the decision. In 2023 and 2024, teams optimised for migration ease. In 2026, teams are optimizing for the three-year infrastructure foundation their business needs. This shift produces different answers. Managed PaaS alternatives remain viable for early-stage teams without current enterprise compliance requirements. For enterprise-grade, compliance-sensitive SaaS teams, AWS-native Internal Developer Platforms running in the team&#8217;s own account have emerged as the clear choice, because they are the only category that satisfies compliance requirements without a ceiling, eliminates platform margin at scale, and provides a genuine exit path.</p><ol start="2"><li><p><strong>What are engineering teams on Reddit and Hacker News actually recommending in 2026?</strong></p></li></ol><p>The community consensus has coalesced around three consistent positions. First: managed PaaS alternatives like Render and Railway are a transitional step, not a destination; teams that go there first often end up migrating again when compliance requirements arrive. Second: going directly to raw AWS without a platform layer creates developer experience problems that erode the infrastructure benefits. Third: AWS-native Internal Developer Platforms, infrastructure in your own account with a developer experience layer on top, is the pattern the community validates for production SaaS teams with enterprise ambitions. Rails teams specifically require platforms that handle Sidekiq, Postgres, Active Storage, and Action Cable as first-class concerns.</p><ol start="3"><li><p><strong>How do Render, Railway, Fly.io, and AWS-native IDPs compare on TCO at 20+ services?</strong></p></li></ol><p>At 20+ production services, the structural cost differences become significant. Managed PaaS alternatives maintain a platform margin on every component, compute, database, cache, and monitoring, which compounds with each new service. At scale, the observability cost alone adds meaningfully as per-service monitoring costs multiply. AWS-native IDPs like LocalOps charge a flat platform fee with AWS list pricing on all infrastructure and observability included at no additional cost. The cost gap widens with every service added because every service adds another component where the margin difference applies. For an accurate comparison based on your current stack, the LocalOps team will model it from your Heroku invoice.</p><ol start="4"><li><p><strong>What are the structural differences between first-generation Heroku alternatives and AWS-native IDPs?</strong></p></li></ol><p>The fundamental difference is infrastructure ownership. 
First-generation alternatives, Render, Railway, and Fly.io, run infrastructure on the vendor&#8217;s shared cloud. No VPC ownership. Compliance ceiling defined by the vendor. Exit path requires a full infrastructure rebuild. Platform margin persists regardless of scale. AWS-native IDPs run infrastructure in the team&#8217;s own AWS account. Full VPC isolation. No compliance ceiling, the surface is AWS itself. Exit path is always open, infrastructure continues running independently. Direct AWS pricing with no platform margin. The developer experience on day one can look similar. The strategic implications diverge significantly at month 18.</p><ol start="5"><li><p><strong>How do engineering leaders choose a Heroku alternative that avoids replicating vendor lock-in?</strong></p></li></ol><p>Four infrastructure design decisions future-proof the platform choice. First: infrastructure must run in your cloud account, not the vendor&#8217;s. Second: the platform must use standard, portable technology, Kubernetes, not proprietary runtimes. Third: verify the exit path explicitly before committing, ask what happens if you stop using the platform tomorrow and evaluate the answer carefully. Fourth: evaluate compliance ceiling against 18-month requirements, not just current requirements. LocalOps specifically addresses all four: infrastructure in your AWS account, standard Kubernetes, explicit exit path with infrastructure running independently, and AWS compliance surface with no vendor-defined ceiling.</p><ol start="6"><li><p><strong>Is LocalOps a viable Heroku alternative for Rails applications specifically?</strong></p></li></ol><p>Yes. Rails applications require specific infrastructure handling: Sidekiq background workers, Postgres with connection pooling, Action Cable with Redis, Active Storage with object storage, and scheduled tasks. LocalOps handles all of these as first-class service types. Web processes and Sidekiq workers are configured and scale independently. Amazon RDS provides Postgres inside your VPC with connection pooling configuration. ElastiCache provides Redis for Action Cable and job queuing. Native cron jobs replace Heroku Scheduler. The rails hosting heroku alternative path through LocalOps preserves the git-push deployment workflow while running on infrastructure the team owns.</p><ol start="7"><li><p><strong>What is the difference between a Heroku self-hosted alternative and LocalOps?</strong></p></li></ol><p>A Heroku self-hosted alternative like Coolify or Dokku gives full infrastructure ownership with no platform vendor dependency. The team owns the complete operational burden, provisioning, security patching, observability setup, scaling configuration, and on-call response for platform issues. For teams without dedicated platform engineering capacity, the operational cost of running a self-hosted platform in production consistently exceeds initial estimates. LocalOps provides the same infrastructure ownership; everything runs in your own AWS account, with the platform layer managed. The infrastructure ownership is equivalent. The operational overhead is not. 
LocalOps is designed for teams that want infrastructure ownership without building and maintaining the platform themselves.</p><h2><strong>Key Takeaways</strong></h2><p>The engineering community&#8217;s consensus on Heroku alternatives in 2026 is clearer than it has ever been, because enough teams have now been through the full cycle of migration, operation, and in some cases re-migration to know what works at production scale.</p><p>Managed PaaS alternatives are a transitional step, not a destination. They solve the immediate Heroku problem and recreate the structural lock-in problem. Teams with enterprise ambitions discover this ceiling within 12&#8211;18 months.</p><p>Raw AWS without a platform layer solves the infrastructure ownership problem and creates a developer experience regression that erodes the infrastructure benefits. The two problems require two solutions, infrastructure ownership and developer experience preservation, not one.</p><p>AWS-native Internal Developer Platforms are the pattern the community validates for production SaaS teams at scale. Infrastructure in your own account. Developer experience preserved. No new vendor dependency. No compliance ceiling. Cost structure that scales proportionally with usage rather than in tier jumps.</p><p>The best Heroku alternatives in 2026 are the ones that solve the immediate migration problem and the long-term infrastructure ownership problem simultaneously, so the migration is made once, under conditions the team controls, and does not need to be repeated.</p><p><strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Schedule a Migration Call &#8594;</a></strong> Our engineers review your current Heroku setup and walk through what the migration looks like for your specific stack.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get Started for Free &#8594;</a></strong> First production environment on AWS in under 30 minutes. 
No credit card required.</p><p><strong><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read the Heroku Migration Guide &#8594;</a></strong> Full technical walkthrough, database migration, environment setup, DNS cutover.</p>]]></content:encoded></item><item><title><![CDATA[The Real Cost of Heroku at Scale: A Teardown for CTOs Evaluating Alternatives]]></title><description><![CDATA[Beyond the Invoice: Understanding Heroku&#8217;s True Cost for Scaling SaaS Teams]]></description><link>https://blog.localops.co/p/the-real-cost-of-heroku-at-scale</link><guid isPermaLink="false">https://blog.localops.co/p/the-real-cost-of-heroku-at-scale</guid><dc:creator><![CDATA[Nidhi Pandey]]></dc:creator><pubDate>Tue, 07 Apr 2026 05:32:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!_xn7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_xn7!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_xn7!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png 424w, https://substackcdn.com/image/fetch/$s_!_xn7!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png 848w, https://substackcdn.com/image/fetch/$s_!_xn7!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png 1272w, https://substackcdn.com/image/fetch/$s_!_xn7!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_xn7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png" width="2400" height="1408" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1408,&quot;width&quot;:2400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6718208,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/193316405?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F50a2f06c-3709-4140-8fba-406b71956f3b_2400x1600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!_xn7!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png 424w, https://substackcdn.com/image/fetch/$s_!_xn7!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png 848w, https://substackcdn.com/image/fetch/$s_!_xn7!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png 1272w, https://substackcdn.com/image/fetch/$s_!_xn7!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8ca9cd71-566e-4ef8-9138-08e85acfdf46_2400x1408.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The real cost of Heroku is not what appears on the invoice.</p><p>The invoice is the visible portion: dyno tiers, database add-ons, Redis instances, and monitoring tools. It is real, it compounds, and it grows faster than  engineering leaders expect. But for  Series A and beyond SaaS teams, the invoice cost is the smallest component of what Heroku actually costs the business.</p><p>The higher costs are the ones that do not appear on any statement: the engineering hours spent working around platform limitations instead of building products, the architectural decisions shaped by what Heroku supports rather than what the system needs, and the enterprise deals that stall or never close because the infrastructure cannot satisfy the security questionnaire.</p><p>This guide is a complete cost teardown. 
It is written for CTOs who are evaluating whether the infrastructure decision in front of them is an operational one or a strategic one.</p><h2><strong>TL;DR</strong></h2><p><strong>What this covers:</strong> The complete cost of Heroku at scale, invoice cost, add-on compounding, engineering opportunity cost, observability stack costs, and the true total cost calculation versus migrating to AWS</p><p><strong>Who it is for:</strong> CTOs and engineering leaders evaluating whether the financial case for migrating from Heroku justifies the migration investment</p><p><strong>The conclusion:</strong> The invoice cost is only one of three cost components. For  B2B SaaS teams at Series A and beyond, the compliance cost and engineering opportunity cost together exceed the infrastructure invoice, and neither appears in standard infrastructure reviews.</p><p><strong>Want to model what your Heroku setup costs on LocalOps + AWS?</strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> Speak with the LocalOps team</a></p><h2><strong>When Heroku&#8217;s Pricing Becomes Financially Indefensible</strong></h2><p>Heroku&#8217;s pricing model does not become a problem at a specific team size or traffic level. It becomes a problem at a specific combination of service count, add-on depth, and business ambition, and that combination arrives faster than  teams expect.</p><p>The inflection point for  B2B SaaS teams arrives somewhere between five and fifteen engineers. Not because the team is large. Because product complexity at that stage drives service count past the point where add-on costs become a significant and compounding line item.</p><p><strong>What the inflection looks like in practice:</strong></p><p>A team running a single production application on Heroku has a manageable bill. One dyno tier. One Heroku Postgres instance. One Redis instance. Maybe Papertrail for logs. The total is meaningful but explainable.</p><p>A team running five production services on Heroku has a fundamentally different cost structure. Each service has its own dyno configuration. Each service typically requires its own database tier. Heroku Postgres pricing is per-instance, not shared across services. Each service adds to the Redis connection count, pushing toward higher Redis tiers. Log volume across five services pushes Papertrail into higher pricing tiers. APM costs multiply across services.</p><p>The relationship between service count and cost is not linear on Heroku. It is multiplicative. Every new service does not add one cost layer. It adds five: compute, database, cache, logging, and monitoring, each carrying a platform margin.</p><p><strong>What the comparison looks like when migrating to AWS:</strong></p><p>The cost difference between Heroku and AWS via an Internal Developer Platform comes from two structural sources. First: the platform margin disappears <em>entirely</em>. Compute, database, cache, and job queue resources run at AWS list pricing with no markup. Second: observability is included. LocalOps includes Prometheus, Loki, and Grafana pre-configured in every environment at no additional cost, eliminating the Papertrail, New Relic, and APM add-on line items entirely.</p><p>The direction of this difference is structural. It does not change with scale; AWS pricing without a platform margin is 3-4x lower than PaaS pricing with one. The size of the difference depends on stack composition. 
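</p><p>A small sketch makes the compounding visible. Every dollar figure and the assumed platform markup below are placeholders, not actual Heroku, Render, or AWS prices; the point is the shape of the two curves, so substitute numbers from your own invoice.</p><pre><code># Illustrative per-service cost model. On a PaaS, each service adds compute,
# database, cache, logging, and monitoring line items, each carrying a margin.
# On an AWS-native IDP, each service adds infrastructure at list pricing,
# observability is bundled, and there is one flat platform fee.
# All figures below are placeholder assumptions.

BASE_MONTHLY_COST = {   # assumed underlying infrastructure cost per service
    "compute": 100,
    "database": 80,
    "cache": 40,
    "logging": 25,
    "monitoring": 30,
}
PAAS_MARKUP = 3.0       # assumed PaaS price as a multiple of underlying cost
PLATFORM_FEE = 500      # assumed flat monthly fee for a managed IDP


def paas_monthly(services: int) -> float:
    return services * sum(BASE_MONTHLY_COST.values()) * PAAS_MARKUP


def aws_idp_monthly(services: int) -> float:
    # Observability is bundled, so logging/monitoring add no per-service cost.
    per_service = sum(BASE_MONTHLY_COST[k] for k in ("compute", "database", "cache"))
    return services * per_service + PLATFORM_FEE


for n in (1, 5, 20):
    print(f"{n:>2} services: PaaS ~${paas_monthly(n):,.0f}/mo, "
          f"AWS-native IDP ~${aws_idp_monthly(n):,.0f}/mo")
</code></pre><p>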
For a model based on your current Heroku invoice,<a href="https://go.localops.co/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> the LocalOps team will calculate it directly</a>.</p><h2><strong>Why Heroku Add-On Costs Grow Faster Than Revenue</strong></h2><p>This is the cost dynamic that surprises engineering leaders when they first examine it systematically. Heroku&#8217;s add-on costs do not scale with revenue. They scale with product complexity, and product complexity grows faster than revenue at  SaaS companies in the growth stage.</p><p><strong>The Heroku Postgres compounding problem.</strong></p><p>Heroku Postgres pricing is structured around tiers defined by row limits, connection limits, and storage. As applications grow, databases move through these tiers, but not in proportion to actual usage growth. A database that grows from 5 million to 7 million rows may jump a full pricing tier even though the actual resource consumption increase is modest. More significantly, in a multi-service architecture, each service typically requires its own Heroku Postgres instance. The database cost compounds per service, not per application.</p><p><strong>The Heroku Redis compounding problem.</strong></p><p>Heroku Redis pricing is structured around connection limits and memory. As more services connect to Redis, for session management, job queuing, caching, and pub/sub, the connection count drives tier upgrades. Redis tier upgrades on Heroku are significant price jumps. And like Postgres, a multi-service architecture typically requires multiple Redis instances, each on its own billing tier.</p><p><strong>The monitoring add-on compounding problem.</strong></p><p>Papertrail pricing scales with log volume. As service count grows, log volume grows, typically faster than traffic growth, because more services mean more internal log output independent of external request volume. New Relic and Scout APM pricing scales with host count or service count. Adding a new service does not just add compute cost. It adds monitoring cost across every observability add-on in the stack.</p><p><strong>The AWS-native alternative cost structure:</strong></p><p>On AWS via LocalOps, the cost structure is fundamentally different. Amazon RDS pricing is based on instance type and storage, not on row counts or arbitrary tier boundaries. A database with 7 million rows costs the same as a database with 5 million rows if the instance type handles both. Amazon ElastiCache pricing is based on node type and replication configuration, not on connection count tiers that force upgrades as services scale. And observability, logs, metrics, and dashboards are included in LocalOps at no additional cost, regardless of service count or log volume.</p><p>The structural difference: Heroku add-on costs are designed around tier boundaries that create forced upgrades as applications grow. AWS-native services are priced on actual resource consumption with no artificial tier boundaries driving cost jumps.</p><p><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how the full Heroku to AWS stack mapping works</a></p><h2><strong>The True Total Cost of Heroku: What CTOs Miss in the Analysis</strong></h2><p>The Heroku cost analysis that surfaces in infrastructure reviews covers only the invoice. 
For a CTO preparing a board-level infrastructure recommendation, the invoice is the wrong starting point.</p><p>The true total cost of Heroku has three components.</p><h3><strong>Component 1: The Invoice Cost</strong></h3><p>The visible portion. Dyno tiers, database add-ons, Redis instances, monitoring tools, scheduler add-ons, and log management. This is the number that appears on the credit card statement and in the finance team&#8217;s questions.</p><p>It is real, and it compounds, but for  Series A and beyond teams, it is not the largest cost component.</p><h3><strong>Component 2: The Engineering Opportunity Cost</strong></h3><p>The hours engineering teams spend working around Heroku&#8217;s limitations rather than building a product. This cost does not appear on any invoice. It accumulates in recognizable patterns that every engineering leader at a scaling SaaS company has observed.</p><p>A senior architect scopes a feature differently because the technically correct implementation requires a storage pattern that Heroku handles poorly. A backend engineer spends three days building a workaround for a networking limitation that VPC-native infrastructure would handle natively. A team defers a microservices decomposition they know is right for the product because the operational complexity on Heroku is prohibitive without underlying networking primitives.</p><p>None of these decisions appears as infrastructure costs. All of them are real costs, paid in engineering time, in technical debt, and in product decisions made to serve the platform rather than the customer.</p><p>For a Series B SaaS company with fifteen engineers at an average fully-loaded cost of $200,000 per year, every engineering hour is worth approximately $100. If Heroku&#8217;s limitations consume two hours per engineer per week in workarounds, delayed decisions, and architectural compromises, the annual opportunity cost exceeds $150,000. This cost does not appear anywhere in infrastructure reviews. It is often the largest cost component.</p><h3><strong>Component 3: The Compliance Revenue Cost</strong></h3><p>For B2B SaaS teams with an enterprise go-to-market motion, this is frequently the significant cost component and the least visible until an enterprise deal surfaces.</p><p>Enterprise procurement processes require infrastructure controls that Heroku cannot provide: VPC isolation, private networking between services, IAM-based access control with audit logging, dedicated infrastructure and data residency in a specified region. When the security questionnaire arrives, and the honest answer to every infrastructure question is &#8220;we don&#8217;t control that,&#8221; the deal quickly starts going south.</p><p>The revenue impact of this infrastructure gap is real and calculable. A single $150,000 ARR enterprise deal delayed by one quarter because of infrastructure compliance questions costs $37,500 in revenue timing. A single deal that never closes because the infrastructure cannot satisfy the security review costs the full contract value. For a company with multiple enterprise deals in the pipeline, the compliance cost of staying on Heroku can dwarf every other cost component combined.</p><h3><strong>The Total Cost Calculation</strong></h3><p>When CTOs present the infrastructure transition to their board or CEO, the analysis that generates alignment is the one that includes all three components.</p><p><strong>Invoice savings:</strong> structural, predictable, and begin immediately on migration. 
The platform margin disappears. Observability add-on costs disappear.</p><p><strong>Engineering opportunity cost recovery:</strong> directionally clear, grows with team size. Senior engineering hours redirected from platform workarounds to product development.</p><p><strong>Compliance revenue unlock:</strong> the component that makes the migration financially obvious for any B2B SaaS team with enterprise ambitions. Infrastructure that answers the security questionnaire cleanly is infrastructure that does not block deals.</p><p>Together, these three components reframe the infrastructure migration from an operational cost to a strategic investment with a compounding return.</p><h2><strong>Why Heroku&#8217;s Tier-Based Pricing Fails SaaS Companies</strong></h2><p>Heroku&#8217;s pricing model was designed for simplicity at a small scale. It is structurally misaligned with how SaaS businesses actually grow and operate at scale.</p><p><strong>The tier-jump problem.</strong></p><p>Heroku pricing scales in tiers, not proportionally with usage. When resource requirements grow past a tier boundary, the cost jumps to the next tier regardless of whether actual usage justifies the full tier ceiling. Teams pay for the tier ceiling, not for actual consumption.</p><p>For finance teams preparing infrastructure forecasts, this makes cost modeling unreliable. Infrastructure spend jumps at irregular intervals unrelated to business growth metrics. A 20% increase in traffic does not produce a 20% increase in infrastructure cost; it might produce a 0% increase or a 40% jump, depending on where the team sits relative to tier boundaries.</p><p><strong>The seasonal traffic problem.</strong></p><p>Many SaaS applications have non-linear traffic patterns. B2B applications peak during business hours and drop to near-zero overnight and on weekends. Consumer applications spike around product launches and marketing campaigns. Event-driven workloads process jobs in bursts that may be 10x the average load.</p><p>Heroku&#8217;s response to all of these patterns is identical: provision for peak capacity and pay for it continuously. Go to the performance tier, pay us more to save more. Teams either over-provision, paying for idle capacity at all times, or under-provision and accept performance degradation during spikes.</p><p>AWS horizontal autoscaling on EKS responds to this directly. Workloads scale out when traffic increases and back in when it drops, automatically, without human intervention. Teams pay for actual compute consumption proportional to real usage, not for the tier ceiling required to handle the peak.</p><p><strong>The predictability gap.</strong></p><p>For a CTO preparing a 12-month infrastructure budget, Heroku&#8217;s tier-based model creates a forecasting problem. The budget for next year is not last year&#8217;s Heroku invoice scaled by growth. It is last year&#8217;s invoice scaled by growth, plus the tier-jump events triggered by crossing the service count and traffic thresholds the product roadmap implies.</p><p>AWS-native infrastructure priced by actual consumption solves this forecasting problem directly. Infrastructure spend grows in proportion to actual usage. Budget modeling is straightforward. 
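</p><p>A back-of-the-envelope sketch shows why consumption pricing is easier to forecast than tier pricing for a business-hours workload. The capacity units and the hourly rate here are placeholder assumptions, not AWS or Heroku prices:</p><pre><code># A B2B workload that needs 10 units of compute during business hours
# and 2 units overnight and on weekends. Rates are placeholders.

RATE_PER_UNIT_HOUR = 0.10
HOURS_PER_MONTH = 24 * 30

def peak_provisioned_monthly():
    # Tier-style model: provision for the peak and pay for it continuously.
    return 10 * HOURS_PER_MONTH * RATE_PER_UNIT_HOUR

def autoscaled_monthly():
    # Consumption model: roughly 50 business hours per week at peak,
    # everything else at the low-water mark.
    peak_hours = 50 * 4.3
    quiet_hours = HOURS_PER_MONTH - peak_hours
    return (10 * peak_hours + 2 * quiet_hours) * RATE_PER_UNIT_HOUR

print(round(peak_provisioned_monthly()))  # 720
print(round(autoscaled_monthly()))        # roughly 316 with these assumptions
</code></pre><p>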
Surprises are eliminated.</p><p><a href="https://localops.co/features/auto-scaling?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how autoscaling works by default on LocalOps &#8594;</a></p><p><strong>The Real Cost of the Heroku Observability Stack</strong></p><p>This is the cost teams underestimate, until they look at the Heroku invoice line by line and add up what monitoring actually costs.</p><p>A typical production Heroku stack assembles observability from multiple paid add-ons:</p><p><strong>Papertrail</strong> for log management. Pricing scales by log volume, which grows with service count and traffic regardless of optimization. At production scale with multiple services, Papertrail costs accumulate quickly as log volume grows past free tier limits.</p><p><strong>New Relic or Scout</strong> for application performance monitoring. APM pricing on Heroku add-ons scales with host count or agent count. Every new service added to the production stack adds another APM agent, another billing line item that compounds with each new service deployment.</p><p><strong>Additional tools</strong> for error tracking, uptime monitoring, and alerting, each with their own pricing tier, their own billing cycle, and their own failure modes.</p><p><strong>The operational cost beyond the financial one:</strong></p><p>The financial cost of the Heroku observability stack is significant. The operational cost is often larger.</p><p>When an incident occurs at 2 am, a Heroku team correlates information across multiple dashboards from multiple vendors with different data models and different refresh rates. Logs in Papertrail. Metrics in New Relic. The relationship between a spike in error rates and a specific deployment requires context-switching between tools. Each tool switch adds minutes to incident response time. For SaaS applications with customer-facing SLAs, those minutes matter.</p><p>The tools are often configured independently with no unified alerting model. An alert threshold set in New Relic does not automatically correlate with a log pattern in Papertrail. Building that correlation requires manual work, or accepting that incidents will be identified more slowly than they would be on a platform with integrated observability.</p><p><strong>What integrated observability looks like:</strong></p><p>LocalOps includes Prometheus, Loki, and Grafana pre-configured in every environment at no additional cost.</p><p>Prometheus collects metrics automatically from every service, CPU, memory, request rate, error rate, and custom application metrics. No agent installation. No per-service configuration.</p><p>Loki aggregates logs from all services through standard output. No log drain configuration. No Papertrail account. No log volume pricing tiers.</p><p>Grafana provides unified dashboards with pre-built views for infrastructure metrics and application logs in a single interface. When something breaks at 2 am, logs and metrics are in the same place, with the same timestamps, correlated automatically.</p><p>The observability tools that are monthly line items on a Heroku invoice, adding up to hundreds of dollars per month for a typical production stack, are included in LocalOps as infrastructure. There is no add-on to configure. There is no additional cost. 
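</p><p>Because metrics and logs live in one stack, a quick health check is a single query against the standard Prometheus HTTP API rather than a tour of vendor dashboards. A minimal sketch; the hostname, service name, and metric labels below are assumptions to adapt to your own environment:</p><pre><code>import requests

# Assumed in-cluster address for the bundled Prometheus instance.
PROM_URL = "http://prometheus.internal:9090/api/v1/query"

# 5xx error rate for a hypothetical "checkout" service over 5 minutes.
query = 'sum(rate(http_requests_total{status=~"5..", service="checkout"}[5m]))'

resp = requests.get(PROM_URL, params={"query": query}, timeout=10)
resp.raise_for_status()

for result in resp.json()["data"]["result"]:
    timestamp, value = result["value"]
    print("checkout 5xx rate:", value)
</code></pre><p>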
There is no vendor to manage.</p><p><a href="https://localops.co/features/builtin-monitoring?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how built-in monitoring works on LocalOps</a></p><h2><strong>How LocalOps Addresses the Cost Problem Structurally</strong></h2><p>LocalOps is an AWS-native Internal Developer Platform built specifically for teams replacing Heroku.</p><p>Connect your AWS account. Connect your GitHub repository. LocalOps provisions a dedicated VPC, EKS cluster, load balancers, IAM roles, and a complete observability stack, Prometheus, Loki, and Grafana, automatically. No Terraform. No Helm charts. No manual configuration. First environment ready in under 30 minutes.</p><p>From that point, the developer experience is identical to Heroku. Push to your configured branch. LocalOps builds, containerizes, and deploys to AWS automatically. Logs and metrics are available from day one. Autoscaling and auto-healing run by default.</p><p>The cost structure is fundamentally different from Heroku. LocalOps charges a flat platform fee. The underlying infrastructure runs at AWS list pricing with no markup. Observability is included. The tier-jump cost model is replaced by proportional pricing that scales with actual usage.</p><p>The infrastructure runs in your AWS account. If you stop using LocalOps, it keeps running. Nothing needs to be rebuilt.</p><blockquote><p><em>&#8220;Their thoughtfully designed product and tooling entirely eliminated the typical implementation headaches. Partnering with LocalOps has been one of our best technical decisions.&#8221; <strong>&#8211;</strong></em> <strong> Prashanth YV, Ex-Razorpay, CTO and Co-founder, Zivy</strong></p><p><em>&#8220;Even if we had diverted all our engineering resources to doing this in-house, it would have easily taken 10&#8211;12 man months of effort, all of which LocalOps has saved for us.&#8221;</em> <strong>&#8211; Gaurav Verma, CTO and Co-founder, SuprSend</strong></p></blockquote><p><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get started for free, first environment on AWS in under 30 minutes &#8594;</a></p><h2><strong>Frequently Asked Questions</strong></h2><ol><li><p><strong>At what point does Heroku&#8217;s pricing become financially indefensible?</strong></p></li></ol><p>The inflection point varies by stack composition but consistently arrives when service count grows past five and add-on costs begin compounding across multiple services simultaneously. The signal is not the absolute invoice amount; it is when the invoice becomes difficult to attribute cleanly across services, difficult to forecast accurately, and impossible to optimize without changing the underlying platform. For  B2B SaaS teams, this happens between five and fifteen engineers, driven by product complexity rather than team size directly.</p><ol start="2"><li><p><strong>Why do Heroku add-on costs grow faster than revenue as SaaS teams scale?</strong></p></li></ol><p>Heroku add-on costs scale with product complexity rather than with revenue. Adding a new service to a Heroku production stack does not add one cost layer; it adds compute, database, Redis, logging, and monitoring costs simultaneously, each carrying a platform margin. Database tier pricing is driven by row counts and connection limits that force upgrades independently of revenue growth. 
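</p><p>A small sketch of that tier mechanic, using hypothetical tier ceilings and prices rather than Heroku&#8217;s actual price list: a database that grows from 5 million to 7 million rows crosses a boundary and pays the next ceiling, while instance-based pricing holds flat as long as the same instance fits the workload.</p><pre><code>import bisect

# Hypothetical tier ceilings (max rows) and monthly prices for each tier.
ROW_CEILINGS = [1_000_000, 5_000_000, 10_000_000, 50_000_000]
TIER_PRICE   = [50, 200, 750, 2_000]

def tiered_price(rows):
    # Pay for the first tier whose ceiling accommodates the row count.
    return TIER_PRICE[bisect.bisect_left(ROW_CEILINGS, rows)]

def instance_price(rows):
    # Placeholder cost of one fixed instance size that handles both workloads.
    return 180

print(tiered_price(5_000_000), tiered_price(7_000_000))      # 200 750
print(instance_price(5_000_000), instance_price(7_000_000))  # 180 180
</code></pre><p>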
Log volume and APM agent counts scale with service count rather than with business metrics. The result is a cost structure where infrastructure spend grows faster than revenue at precisely the growth stage where unit economics matter.</p><ol start="3"><li><p><strong>How should a CTO calculate the true total cost of Heroku?</strong></p></li></ol><p>The full calculation has three components. Invoice cost: dyno tiers, database add-ons, Redis, monitoring tools, scheduler, totalled across all production services. Engineering opportunity cost: hours spent on platform workarounds, architectural compromises made to serve Heroku&#8217;s limitations, and deferred technical decisions that accumulate as debt. Compliance revenue cost: Deals are delayed or lost because the infrastructure cannot satisfy enterprise security questionnaires. For  Series A and beyond B2B SaaS teams with enterprise ambitions, the compliance revenue component is the largest and least visible, and the one that makes the migration decision strategically obvious when it surfaces.</p><ol start="4"><li><p><strong>Why is Heroku&#8217;s tier-based pricing misaligned for seasonal or variable traffic?</strong></p></li></ol><p>Heroku requires teams to provision for peak capacity and pay for it continuously; there is no automatic scale-down when traffic drops. For B2B applications with sharp usage peaks during business hours, consumer applications with campaign-driven spikes, or any application with variable traffic patterns, the choice is between over-provisioning at continuous cost or under-provisioning and accepting performance degradation. AWS horizontal autoscaling on EKS scales out when the load increases and back in when it drops automatically. Teams pay for actual compute consumption, not for the tier ceiling required to handle the peak.</p><ol start="5"><li><p><strong>What does the Heroku observability stack actually cost at production scale?</strong></p></li></ol><p>A typical production Heroku stack assembles observability from Papertrail for log management, New Relic or Scout for APM, and potentially additional tools for error tracking and uptime monitoring. Each add-on has its own pricing tier that scales with usage, log volume for Papertrail, host or service count for APM tools. The combined cost compounds with the service count. Beyond the financial cost, the operational cost of correlating logs and metrics across multiple disconnected tools adds meaningful time to incident response. LocalOps includes Prometheus, Loki, and Grafana pre-configured in every environment at no additional cost, replacing the entire assembled observability stack with integrated tooling that provides better correlated visibility at zero marginal cost.</p><ol start="6"><li><p><strong>What is the difference between a Heroku self-hosted alternative and an AWS-native IDP in terms of cost?</strong></p></li></ol><p>A Heroku self-hosted alternative like Coolify or Dokku eliminates platform margin on infrastructure but requires the team to own the full operational burden, provisioning, security patching, observability setup, and on-call response for the platform itself. The infrastructure cost is lower. The engineering cost of running the platform is high and ongoing. An AWS-native IDP like LocalOps provides the same infrastructure cost efficiency, direct AWS pricing, and no platform margin, with the platform layer managed. 
For teams without dedicated platform engineering capacity, the total cost of a self-hosted alternative consistently exceeds the total cost of a managed IDP once engineering hours for platform maintenance are included.</p><ol start="7"><li><p><strong>How do Heroku&#8217;s open source alternatives compare on observability cost?</strong></p></li></ol><p>Heroku open source alternatives eliminate the platform margin on compute and managed services. They do not eliminate the observability cost problem; they shift it. Rather than paying for Papertrail and New Relic, teams running open-source alternatives take on the engineering cost of setting up, configuring, and maintaining their own observability stack. Prometheus, Loki, and Grafana are available as open-source tools, but setting them up correctly, integrating them with application infrastructure, and maintaining them over time requires engineering investment. LocalOps includes this observability stack pre-configured as part of the platform; the setup work is done, the maintenance is handled, and the cost is zero beyond the platform fee.</p><h2><strong>Key Takeaways</strong></h2><p>The real cost of Heroku at scale has three components, and  infrastructure reviews only examine one of them.</p><p>The invoice cost is real, and compounds with every service added. The engineering opportunity cost is rarely measured but consistently significant for teams running more than five services. The compliance revenue cost is the largest component for any B2B SaaS team with enterprise ambitions, and the one that makes the migration decision strategically obvious rather than operationally optional.</p><p>The observability cost is a specific case study in how Heroku&#8217;s add-on model creates financial and operational overhead that integrated platforms eliminate. Hundreds of dollars per month in add-on fees, plus the operational cost of correlating incidents across disconnected tools, are replaced by a pre-configured observability stack at no additional cost.</p><p>For CTOs preparing the business case for infrastructure migration, the frame that generates board-level alignment is not &#8220;we should save money on infrastructure.&#8221; It is &#8220;we are currently paying a tax on every enterprise deal we close, and the migration eliminates that tax while also reducing infrastructure costs and recovering engineering capacity.&#8221;</p><p>That is the real cost of Heroku at scale. And that is the case for moving.</p><p><strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Schedule a Migration Call &#8594;</a></strong> Our engineers model your current Heroku costs against LocalOps + AWS and walk through the migration for your specific stack.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get Started for Free &#8594;</a></strong> First production environment on AWS in under 30 minutes. 
No credit card required.</p><p><strong><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read the Heroku Migration Guide &#8594;</a></strong> Full technical walkthrough, database migration, environment setup, DNS cutover.</p>]]></content:encoded></item><item><title><![CDATA[Kubernetes vs Internal Developer Platform: Do You Need Both for AWS Deployments?]]></title><description><![CDATA[A practical breakdown for engineering teams choosing between raw Kubernetes and an IDP on AWS]]></description><link>https://blog.localops.co/p/kubernetes-vs-internal-developer-platform-aws</link><guid isPermaLink="false">https://blog.localops.co/p/kubernetes-vs-internal-developer-platform-aws</guid><dc:creator><![CDATA[Madhushree Sivakumar]]></dc:creator><pubDate>Fri, 03 Apr 2026 12:27:43 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Tr-6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tr-6!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tr-6!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png 424w, https://substackcdn.com/image/fetch/$s_!Tr-6!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png 848w, https://substackcdn.com/image/fetch/$s_!Tr-6!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!Tr-6!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tr-6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:5024677,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/193060504?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Tr-6!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png 424w, https://substackcdn.com/image/fetch/$s_!Tr-6!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png 848w, https://substackcdn.com/image/fetch/$s_!Tr-6!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!Tr-6!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F42f3414e-fb3b-4c5b-ac8b-cb70c1681c90_2400x1600.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The majority of AWS teams running containers have landed on Kubernetes. According to the 2025 CNCF Annual Cloud Native Survey, 82% of container users now run Kubernetes in production, a number<a href="https://aws.amazon.com/blogs/containers/aws-at-kubecon-eu-2026-open-source-leadership-meets-production-innovation/"> AWS continues to see grow</a> across EKS deployments. 
It is the default choice for containerised workloads and EKS makes it accessible enough that most teams land there eventually.</p><p>But at some point the same teams start looking at internal developer platforms. And the question that comes up is a reasonable one: if Kubernetes already handles deployments, container orchestration, scaling, and health checks, what does an IDP actually add? Are these two separate tools solving two different problems, or is one replacing the other?</p><p>The answer is not obvious. On AWS, the boundaries blur quickly. EKS integrates deeply with IAM, networking, and other managed services, which makes Kubernetes feel like it should be enough. But in practice, teams still run into gaps around how developers interact with that infrastructure.</p><p>This post breaks down where each one starts, where it ends, and whether you actually need both running together on AWS.</p><h2>TL;DR</h2><ul><li><p>Amazon EKS runs your containers. An internal developer platform defines how engineers actually deploy and operate them.</p></li><li><p>A well-designed IDP on AWS does not just connect to a cluster. It standardises how infrastructure like VPCs, EKS, CI/CD, and observability are provisioned and used.</p></li><li><p>Developers push code. The platform handles everything underneath.</p></li><li><p>You still need Kubernetes. The real question is whether every engineer should be dealing with it directly on every deploy.</p></li><li><p>The shift toward IDPs is generally the right call, but only if the platform is designed with escape hatches. When something breaks at the Kubernetes level, engineers with zero cluster knowledge cannot debug it.</p></li><li><p>Most teams do not make this shift intentionally. They make it when the alternative stops working.</p></li></ul><h2>What Kubernetes Handles on AWS and Where It Stops</h2><p>Kubernetes is a container orchestration system. It schedules containers across a pool of compute, manages service-to-service networking, restarts failed workloads, and scales pod replicas based on load. On AWS, EKS is the managed Kubernetes service. AWS handles the control plane &#8212; the API server and etcd &#8212; so you do not operate those components yourself.</p><p>What stays with your team in a standard EKS setup: VPC design, subnets, NAT gateways, and security group rules. IAM setup, including role bindings and service account mapping. Choosing and managing node groups. Installing and configuring cluster add-ons like CoreDNS, VPC CNI, and the AWS Load Balancer Controller. Planning and executing Kubernetes version upgrades.</p><p>EKS Auto Mode extends AWS management further into the data plane and handles more of the node lifecycle automatically. But even with Auto Mode, platform design, developer workflows, environment management, and delivery standardisation remain your responsibility.</p><p>This is where the internal developer platform question starts. Kubernetes handles the runtime. It does not handle how your developers interact with that runtime. It does not create environments, wire CI/CD pipelines, or give a backend engineer a self-service path to deploy a new service without understanding the cluster underneath.</p><p>That layer has to come from somewhere. On AWS, that is what an IDP is for.</p><h2>Kubernetes vs IDP: What Each One Actually Does on AWS</h2><p>Most teams assume Kubernetes and an IDP overlap significantly. 
They do overlap in deployment automation, but they operate at different abstraction levels and solve different problems.</p><p>Kubernetes is the orchestration and runtime layer. It schedules containers, maintains workload state, handles service discovery, and scales pods. An internal developer platform is the developer experience and automation layer above it. It shapes how engineers create environments, deploy services, access observability, and interact with shared infrastructure &#8212; without needing to touch the cluster directly.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/So8V9/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fad25816-906d-4387-b466-1d158a3a2fed_1220x816.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/93077a61-1952-480f-b477-050a4f4c879a_1220x816.png&quot;,&quot;height&quot;:406,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/So8V9/1/" width="730" height="406" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>The confusion usually comes from the fact that both touch deployments. </p><blockquote><p>But Kubernetes manages how containers run. An IDP manages how developers deploy. </p></blockquote><p>On AWS, a platform like LocalOps does not sit beside EKS &#8212; it provisions EKS, manages surrounding AWS resources, and abstracts cluster complexity away from engineers who should not need to think about it on every deploy.</p><p>Kubernetes runs the workloads. The IDP simplifies how developers consume the platform. You need both.</p><h2>How an IDP Handles What Kubernetes Does Not on AWS</h2><p>Kubernetes does not provision environments. It does not wire CI/CD pipelines. It does not give a backend engineer a self-service path to deploy a new service without touching cluster config. Those are not gaps in Kubernetes &#8212; it was never designed to do those things. But someone on your team ends up doing them anyway, usually the person who set up the cluster.</p><p>An IDP takes that work off the individual and puts it at the platform level. When a developer pushes to a branch, the platform handles VPC provisioning, EKS cluster setup, EC2 node configuration, CI/CD pipeline wiring, auto-scaling, SSL, and deployment. The developer writes a service config file. The infrastructure side is handled by the platform.</p><p>No Dockerfile. No Terraform. No Helm required from the developer&#8217;s side.</p><p>The trade-off is real though. Full abstraction means engineers lose visibility into what is running underneath. When a pod enters CrashLoopBackOff or a service fails a health check, an engineer who has never touched kubectl cannot diagnose it. A well-built IDP handles this by exposing controlled access to the cluster when needed. 
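</p><p>What that escape hatch looks like in practice, sketched with the official Kubernetes Python client: find pods stuck in CrashLoopBackOff and pull the last lines of the crashed container&#8217;s logs. The namespace and line count are assumptions; the calls are standard CoreV1 operations:</p><pre><code>from kubernetes import client, config

config.load_kube_config()          # or load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

NAMESPACE = "production"           # assumed namespace

for pod in v1.list_namespaced_pod(NAMESPACE).items:
    for cs in (pod.status.container_statuses or []):
        waiting = cs.state.waiting
        if waiting and waiting.reason == "CrashLoopBackOff":
            # Logs from the previous (crashed) container instance.
            logs = v1.read_namespaced_pod_log(
                name=pod.metadata.name,
                namespace=NAMESPACE,
                container=cs.name,
                previous=True,
                tail_lines=50,
            )
            print(pod.metadata.name, cs.name)
            print(logs)
</code></pre><p>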
Engineers should not need Kubernetes knowledge for routine deploys, but they should be able to get to it when something goes wrong.</p><h2>What an IDP Actually Sets Up in Your AWS Account</h2><p>When you create a new environment, a production-grade IDP provisions the following inside your AWS account:</p><ul><li><p>Dedicated VPC with private and public subnets, NAT gateway, and internet gateway</p></li><li><p>Managed EKS cluster with EC2 compute nodes</p></li><li><p>Elastic Load Balancer for inbound HTTP/HTTPS traffic</p></li><li><p>Prometheus, Loki, and Grafana for metrics, log aggregation, and dashboards</p></li><li><p>Managed AWS services on demand: RDS, S3, SQS, Elasticache</p></li><li><p>CI/CD pipeline triggered on branch push</p></li><li><p>Auto-renewing SSL certificates, encrypted secrets storage, and role-based access control</p></li></ul><p>Everything runs inside your AWS account. The vendor does not hold your data or access your infrastructure directly.</p><p>Each environment is isolated at the VPC level. For BYOC deployments where enterprise customers bring their own AWS account, the entire stack gets provisioned inside the customer&#8217;s account. Each customer gets their own cluster, their own VPC, their own compute. That is the architecture enterprise compliance frameworks typically require.</p><p>SuprSend, a notification infrastructure company, used LocalOps to handle this entire setup for their BYOC (Bring your own cloud) distribution. Before that, every enterprise customer deal required spinning up dedicated infrastructure manually. </p><p>LocalOps is provisioning the per-customer AWS environments in 30mins without changing how their engineering team works &#8212; same git-push workflow, same branch-based deploys, just running inside each customer&#8217;s own AWS account. They are able to close enterprise deals faster without adding DevOps headcount. For the full picture, read the case study from their CTO:<a href="https://localops.co/case-study/suprsend-unlocks-enterprise-revenue-byoc"> How SuprSend Unlocks Enterprise Revenue with BYOC</a></p><p>Without an IDP, someone on your team is doing all of this manually, per environment, every time a new one is needed.</p><h2>Backstage, Port and an IDP: Which One Works for AWS Teams</h2><p>Only 28% of organisations have a dedicated DevOps / platform engineering team responsible for internal platforms, according to the Q1 2026 CNCF Technology Landscape Radar report.<a href="https://www.prnewswire.com/news-releases/cncf-and-slashdata-report-finds-platform-engineering-tools-maturing-as-organizations-prepare-for-ai-driven-infrastructure-302722721.html"> PR Newswire</a> That number matters when evaluating IDP options because most DevOps tools assume you have that team already.</p><p>Before comparing tools, one distinction worth clarifying: an internal developer portal vs platform is not just a naming difference. A portal surfaces information about existing infrastructure. A platform provisions and manages cloud resources. This matters because Backstage, the most widely used open source internal developer platform, is actually a portal. It gives you a service catalog and a UI layer but does not provision infrastructure or manage deployments out of the box.</p><p>Teams searching for a Backstage internal developer platform often discover this gap after they have already invested months in setup. Backstage needs integrations and plugins to act as a full platform. You build those yourself. 
Right call if you have the platform engineering capacity/team to sustain it internally.</p><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/94C8j/1/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06fad217-4625-4ea7-a96a-0a055caa1de4_1220x1092.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b01320d0-0395-42bc-93e3-4257ff736b46_1220x1092.png&quot;,&quot;height&quot;:548,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/94C8j/1/" width="730" height="548" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>Port works well as a catalog and visibility layer on top of existing infrastructure. A cloud native IDP fits teams that need production-grade AWS environments running without the upfront platform investment.  Not sure which fits your stack?<a href="https://cal.com/anand-localops/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog"> Book a demo with us</a> and our engineer will walk you through it.</p><h2>Do You Still Need a DevOps Engineer If You Have an IDP on AWS?</h2><p>Short answer: yes. But the actual role changes significantly.</p><p>According to CNCF survey data, organisations typically allocate one platform engineer per 17 to 50 developers &#8212; roughly 2 to 6% of total engineering headcount.<a href="https://byteiota.com/platform-engineering-2026-80-adoption-devops-dead/"> byteiota</a> That ratio only works if the platform is handling routine infrastructure work. Without an IDP, that one person becomes the bottleneck for every deployment question, every new environment, and every EKS config change on the team.</p><p>With an IDP, that same engineer sets the platform up once. Developers provision environments, deploy services, and access observability without filing a ticket. The DevOps or platform engineer shifts to work that actually requires their expertise: cost architecture, security posture, compliance requirements, and reliability engineering.</p><p>High-maturity platform teams report 40 to 50% reductions in cognitive load for developers.<a href="https://dev.to/meena_nukala/platform-engineering-in-2026-the-numbers-behind-the-boom-and-why-its-transforming-devops-381l"> DEV Community</a> That is not just a developer experience metric. It directly affects how fast product teams ship and how much of your engineering budget goes toward infrastructure overhead versus product work.</p><p>What an IDP still cannot replace on AWS:</p><ul><li><p>Reserved Instance and Savings Plan strategy</p></li><li><p>Custom VPC architectures for specific compliance frameworks</p></li><li><p>Multi-account setups with complex permission boundaries</p></li><li><p>Incident response when something breaks at the infrastructure level</p></li></ul><p>The abstraction trade-off is real. 
When a pod enters CrashLoopBackOff or a node group fails to scale, someone needs to know what they are looking at. An IDP reduces how often engineers hit those situations. It does not eliminate them. Teams should maintain baseline Kubernetes literacy even if developers do not use it daily.</p><p>Teams that delay building this layer tend to accumulate technical debt quietly. Helm chart configurations drift across services, cluster knowledge stays siloed with one or two people, and onboarding new engineers to the deployment process takes longer than it should.</p><h2>What to Look for Before Choosing an IDP for AWS</h2><p>Not all IDPs that claim AWS support are built the same way. Before committing, these are the questions worth asking:</p><p><strong>Does it provision EKS or just connect to one you already have?</strong> Connecting to an existing cluster means you still own the setup, configuration, and upgrade cycle. Provisioning means the platform handles the full lifecycle.</p><p><strong>Does it require developers to write Helm charts?</strong> Helm support for engineers who need it is fine. Requiring it from everyone means you have moved the complexity rather than removed it.</p><p><strong>Is observability included or a separate integration?</strong> Prometheus, Loki, and Grafana should come with the platform. Wiring observability after the fact is a project in itself.</p><p><strong>Does it provision managed AWS services from the same interface?</strong> RDS, S3, SQS, Elasticache &#8212; if these require a separate Terraform repo, you have two systems to maintain instead of one.</p><p><strong>Does your data stay in your AWS account?</strong> The vendor should not have direct access to your application data or infrastructure. Everything should run inside your own account.</p><p><strong>Can you eject if you need to?</strong> Vendor lock-in is a real consideration. If you stop using the platform, you should be able to take the infrastructure and run it independently.</p><p><strong>Does it fit your deployment model?</strong> SaaS, single-tenant, BYOC, and self-hosted have different infrastructure requirements. The platform should support your model without requiring custom tooling for each.</p><p>To see how LocalOps specifically handles these on AWS, the<a href="https://docs.localops.co/?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog"> LocalOps developer documentation</a> covers environment provisioning, EKS setup, observability, BYOC, and the eject path in full detail.</p><h2>FAQs</h2><p><strong>1. What is the best internal developer platform for AWS teams?</strong></p><p>The best internal developer platform for AWS depends on your team size and whether you have a dedicated platform engineering team. Backstage is the most widely adopted open source internal developer platform, but it requires significant setup and maintenance investment, typically 6 to 12 months before developers are using it consistently. For teams that need AWS environments running quickly without dedicated DevOps overhead, a cloud native IDP like LocalOps provisions EKS, observability, and CI/CD inside your AWS account out of the box. Not sure what fits your stack?<a href="https://cal.com/anand-localops/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog"> You can talk to our engineers</a> to help figure it out.</p><p><strong>2. 
What should an AWS internal developer platform actually do?</strong></p><p>An AWS internal developer platform should provision and manage EKS clusters, handle VPC and subnet configuration, wire CI/CD pipelines, set up observability, and manage access control, all inside your own AWS account. It should give developers a self-service path to deploy services without touching Kubernetes directly. If it only connects to an existing cluster rather than provisioning one, you still own most of the infrastructure complexity yourself.</p><p><strong>3. What does internal developer platform architecture look like on AWS?</strong></p><p>A production-grade internal developer platform architecture on AWS includes a dedicated VPC with private and public subnets, a managed EKS cluster, EC2 compute nodes, an Elastic Load Balancer, Prometheus and Grafana for observability, managed AWS services like RDS and S3, and a CI/CD pipeline wired to branch pushes. Each environment runs in isolation at the VPC level. For BYOC deployments, that entire architecture gets replicated inside the customer&#8217;s AWS account.</p><p><strong>4. Should you build an internal developer platform or buy one for AWS?</strong></p><p>Building gives you full control but requires significant engineering investment. SuprSend estimated that building their BYOC infrastructure setup in-house would have taken 10 to 12 engineer months. Buying a platform like LocalOps reduces that to under 30 minutes for a production-ready environment. Build makes sense if you have a dedicated platform team and specific requirements that off-the-shelf platforms cannot meet. Buying makes sense if your engineering team&#8217;s time is better spent on product work rather than platform infrastructure.</p><p><strong>5. How does platform engineering relate to an internal developer platform?</strong></p><p>Platform engineering and internal developer platform adoption are growing in parallel, but they are not the same thing. Platform engineering is the practice of building and owning developer infrastructure as a product. An internal developer platform is what that practice produces, the actual system engineers use to deploy, provision environments, and access infrastructure. You can run an IDP without a formal platform engineering team. Many smaller teams buy a pre-built IDP specifically to avoid needing one.</p><h2>So Do You Need Both Kubernetes and an IDP on AWS?</h2><p>Yes. But that is not really the right question.</p><p>Kubernetes and an internal developer platform are not competing for the same job. EKS handles container orchestration. An IDP handles how your engineers interact with that orchestration layer without needing to understand it on every deployment. Removing either one creates a gap the other cannot fill.</p><p>The more useful question is what happens when you have Kubernetes but no IDP above it. Environment setup stays manual. Deployment workflows differ across services. New engineers spend days getting cluster access before they contribute anything. The one person who understands the EKS setup becomes the path of least resistance for every infrastructure question on the team.</p><p>An IDP does not make Kubernetes disappear. 
It makes Kubernetes someone else&#8217;s problem &#8212; specifically, the platform layer&#8217;s problem &#8212; so your product engineers can stay focused on the product.</p><p>Developers are increasingly accessing Kubernetes indirectly through internal developer platforms rather than directly, according to a March 2026 CNCF report covering 12,500 developers across 100 countries.<a href="https://www.cncf.io/announcements/2026/03/24/cncf-and-slashdata-report-finds-cloud-native-community-reaches-nearly-20-million-developers/"> Cloud Native Computing Foundation</a> That shift is not happening because Kubernetes is being replaced. It is happening because teams have realised that exposing cluster complexity to every engineer is a choice, not a requirement.</p><p>On AWS, you have the tooling to make that choice cleanly. The question is whether you build the layer above EKS yourself or use a platform that already has it.</p><p>If you&#8217;re figuring out how this would fit into your setup, the LocalOps team can help you work through it:</p><p><strong><a href="https://cal.com/anand-localops/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">Book a Demo</a> &#8594;</strong> Walk through how environments, deployments, and AWS infrastructure are handled in practice for your setup.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">Get started for free</a> &#8594;</strong> Connect an AWS account and stand up an environment to see how it fits into your existing workflow.</p><p><strong><a href="https://docs.localops.co/?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">Explore the Docs</a> &#8594;</strong> A detailed breakdown of how LocalOps works end-to-end, including architecture, environment setup, security defaults, and where engineering decisions still sit.</p><h2>Related Articles</h2><ol><li><p><a href="https://blog.localops.co/p/what-is-an-internal-developer-platform-idp">What Is an Internal Developer Platform? 
Definition, Core Components and Real-World Use Cases</a></p></li><li><p><a href="https://blog.localops.co/p/internal-developer-platform-build-vs-buy-cost-comparison">How Much Does It Cost to Build an Internal Developer Platform In-House vs Buying One?</a></p></li><li><p><a href="https://blog.localops.co/p/standardize-dev-staging-prod-internal-developer-platform?">How to Standardize Dev, Staging and Production Environments with an Internal Developer Platform</a></p></li></ol>]]></content:encoded></item><item><title><![CDATA[Why Your Team Is Outgrowing Heroku - And the Architecture That Comes Next]]></title><description><![CDATA[The cost, scaling, and compliance inflection points that push teams beyond Heroku, and how AWS-native platforms replace it without losing developer experience.]]></description><link>https://blog.localops.co/p/why-your-team-is-outgrowing-heroku</link><guid isPermaLink="false">https://blog.localops.co/p/why-your-team-is-outgrowing-heroku</guid><dc:creator><![CDATA[Nidhi Pandey]]></dc:creator><pubDate>Tue, 31 Mar 2026 06:30:03 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!johs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!johs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!johs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png 424w, https://substackcdn.com/image/fetch/$s_!johs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png 848w, https://substackcdn.com/image/fetch/$s_!johs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png 1272w, https://substackcdn.com/image/fetch/$s_!johs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!johs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png" width="2400" height="1583" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1583,&quot;width&quot;:2400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:7957189,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/192598190?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F349fd1a1-4702-458b-9e8e-3c4b6ca6169f_2400x1808.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!johs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png 424w, https://substackcdn.com/image/fetch/$s_!johs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png 848w, https://substackcdn.com/image/fetch/$s_!johs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png 1272w, https://substackcdn.com/image/fetch/$s_!johs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9f38c67-1a77-48c1-a797-8353066275e1_2400x1583.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Outgrowing Heroku is not a sudden event. It is a pattern that develops over 12 to 18 months and becomes undeniable at a specific inflection point, usually when a cost review, a failed enterprise deal, or a production incident forces the conversation that has been quietly building.</p><p>Most engineering leaders recognize the pattern in retrospect. The Heroku bill was manageable at $500 per month. 
Then it was $2,000. Then it was $5,000 and growing, fragmented across dyno tiers, database add-ons, monitoring tools, and Redis instances, each scaling independently with no unified optimization lever. The CFO started asking questions that the CTO could not answer cleanly.</p><p>Or the pattern shows up in architecture. The product that started as a monolith now has background workers, event-driven components, and services that need to communicate privately. Heroku handles these patterns poorly. Workarounds accumulate. Senior engineers start spending time on platform constraints rather than product features.</p><p>Or it shows up in a deal. An enterprise prospect sends a security questionnaire. The infrastructure questions, VPC configuration, private networking, and IAM audit logging reveal that the team does not control the infrastructure on which their product runs.</p><p>These are not isolated problems. They are the predictable sequence of constraints that surface as SaaS products mature past what Heroku was designed to support. This guide covers each one, what causes it, and what the architecture that comes next actually looks like.</p><h2><strong>TL;DR</strong></h2><p><strong>What this covers:</strong> The specific points at which Heroku&#8217;s pricing, scaling model, reliability, and architecture become constraints, and what the migration path to a modern alternative looks like</p><p><strong>Who it is for:</strong> CTOs and engineering leaders who recognize the Heroku constraints described above and are evaluating what comes next</p><p><strong>The architecture that replaces Heroku:</strong> AWS-native infrastructure with an Internal Developer Platform layer,  infrastructure you own, developer experience you keep, at direct AWS pricing with no platform margin</p><p><strong>Want to see exactly what a Heroku to AWS migration looks like?</strong> We have covered it in detail -<a href="https://localops.co/migrate-heroku-to-aws?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> localops.co/migrate-heroku-to-aws</a></p><h2><strong>The Pricing Inflection Point: When Heroku Becomes Financially Indefensible</strong></h2><p>Heroku&#8217;s pricing model is not inherently expensive at a small scale. The inflection point arrives at a specific combination of team size, service count, and traffic volume,  and it arrives faster than most teams expect.</p><p>The structural problem is not the per-dyno cost in isolation. It is the compounding of the platform margin across every component of the stack simultaneously.</p><p>A team running five production services on Heroku is typically paying for: Standard-2X dynos at $50 per dyno per month, Heroku Postgres tiers per service, Heroku Redis tiers for caching and job queues, Papertrail or equivalent for log management, New Relic or Scout for APM, and Heroku Scheduler for background jobs. Each component carries a platform margin. Each component scales independently. And each new service added to the product adds another compounding layer of platform cost.</p><p>The inflection point for most B2B SaaS teams arrives when the Heroku invoice stops being explainable as a simple infrastructure cost and starts requiring a detailed breakdown to justify. 
The inflection point for most B2B SaaS teams arrives when the Heroku invoice stops being explainable as a simple infrastructure cost and starts requiring a detailed breakdown to justify. This typically happens between five and fifteen engineers, not because the team is large, but because product complexity at that team size drives service count past the point where add-on costs become significant.</p><p><strong>What the comparison looks like when migrating to AWS:</strong></p><p>The cost difference between Heroku and AWS via an Internal Developer Platform comes from two structural sources. First: the platform margin disappears. Compute, database, cache, and job queue resources run at AWS list pricing with no markup. Second: observability is included. LocalOps includes Prometheus, Loki, and Grafana pre-configured in every environment at no additional cost, eliminating the Papertrail, New Relic, and APM add-on line items entirely.</p><p>The size of the cost reduction depends on stack composition and scale. The direction is structural and does not change with scale. AWS infrastructure pricing without a platform margin is lower than PaaS pricing with one. For a model based on your current Heroku invoice,<a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> speak with the LocalOps team</a>.</p><p><a href="https://docs.localops.co/migrate-to-aws/from-heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See what your Heroku setup costs on LocalOps + AWS</a></p><h2><strong>The True Total Cost of Heroku: What CTOs Miss in the Analysis</strong></h2><p>The Heroku cost analysis that surfaces in most infrastructure reviews covers only the invoice. The invoice is the smallest component of the true total cost.</p><p><strong>Component 1: The invoice cost.</strong></p><p>The visible portion. Dyno tiers, database add-ons, Redis instances, monitoring tools, scheduler add-ons, and log management. This is the number that appears on the credit card statement and in the finance team&#8217;s questions. It is real, and it compounds, but it is not the largest cost component for most Series A and beyond teams.</p><p><strong>Component 2: The engineering opportunity cost.</strong></p><p>The hours engineering teams spend working around Heroku&#8217;s limitations rather than building the product. This cost does not appear on any invoice. It accumulates in recognizable patterns.</p><p>A senior architect scopes a feature differently because the technically correct implementation requires a storage pattern that Heroku handles poorly. A backend engineer spends three days building a workaround for a networking limitation that VPC-native infrastructure would handle natively. A team defers a microservices decomposition they know is right for the product because the operational complexity on Heroku is prohibitive without the underlying networking primitives.</p><p>None of these decisions appears as an infrastructure cost. All of them are real costs, paid in engineering time, in technical debt, and in product decisions made to serve the platform rather than the customer.</p><p><strong>Component 3: The compliance revenue cost.</strong></p><p>For B2B SaaS teams with an enterprise go-to-market motion, this is frequently the largest cost component and the least visible until an enterprise deal surfaces.</p><p>Enterprise procurement processes require infrastructure controls that Heroku cannot provide: VPC isolation, private networking between services, IAM-based access control with audit logging, and data residency in a specified region.</p>
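<p>On infrastructure the team owns, those controls stop being attestations and become things you can demonstrate directly. A minimal sketch of the kind of checks a questionnaire response maps to, with hypothetical trail and resource names:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash"># Dedicated VPCs exist and are isolated per environment
aws ec2 describe-vpcs --query 'Vpcs[].[VpcId,CidrBlock]' --output table

# API-level audit logging is enabled (trail name is hypothetical)
aws cloudtrail get-trail-status --name main-trail

# Data stays in the contracted region
aws rds describe-db-instances --query 'DBInstances[].[DBInstanceIdentifier,AvailabilityZone]'</code></pre></div>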
<p>When the security questionnaire arrives, and the honest answer to every infrastructure question is &#8220;we don&#8217;t control that,&#8221; the deal goes into extended security review. Some deals never return from it.</p><p>The revenue impact of infrastructure compliance gaps is difficult to quantify precisely before it surfaces, and difficult to ignore once it does. For teams building toward enterprise, it is the cost component that makes the Heroku migration decision strategic rather than operational.</p><p><strong>The total cost calculation:</strong></p><p>When CTOs present the infrastructure transition to their board or CEO, the analysis that generates alignment is the one that includes all three components, not just the infrastructure invoice. Invoice savings are structural and begin immediately. Engineering opportunity cost recovery is directional and grows with team size. Compliance revenue unlock is the component that makes the migration financially obvious for any B2B SaaS team with enterprise ambitions.</p><p><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Walk through the cost comparison with a LocalOps engineer.</a></p><h2><strong>The Vertical Scaling Problem: Why Heroku&#8217;s Dyno Model Breaks at Scale</strong></h2><p>Heroku&#8217;s scaling model is vertical. When an application needs more capacity, the answer is always the same: upgrade to a larger dyno tier or add more dynos. The unit of scale is the dyno. The mechanism is manual.</p><p>This model works for linear, predictable workloads where traffic grows steadily and scaling decisions can be made deliberately. It does not work for the traffic patterns that characterize most SaaS applications at the growth stage.</p><p><strong>Why vertical scaling fails for high-concurrency APIs:</strong></p><p>High-concurrency APIs do not have linear traffic. They experience request bursts driven by user behavior, webhook deliveries, batch processing jobs, and external events. A payment processor webhook that triggers processing for 10,000 accounts simultaneously. A B2B application that sees 80% of its daily traffic between 9 am and 12 pm in a single timezone. A consumer application that spikes 5x normal volume during a marketing campaign.</p><p>Heroku&#8217;s response to all of these patterns is identical: manually add more dynos before the spike, pay for them whether or not the traffic materializes, and manually remove them afterwards. There is no event-driven scaling that responds to real traffic signals. There is no automatic scale-down when traffic drops. The choice is between continuously over-provisioning, paying for idle capacity, or under-provisioning and accepting degraded performance during spikes.</p><p><strong>How Kubernetes-based alternatives handle burst traffic differently:</strong></p><p>Kubernetes horizontal pod autoscaling responds to real workload signals (CPU utilization, memory pressure, request queue depth, and custom application metrics) automatically and in seconds. When a traffic spike arrives, the platform scales out to handle it. When traffic drops, it scales back in. Teams pay for actual compute consumption rather than for the tier ceiling required to handle the peak.</p>
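<p>The difference in mechanics is easy to see side by side. A minimal sketch, with a hypothetical app and deployment both named my-api and illustrative thresholds; on LocalOps the autoscaling policy is configured for you by default:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash"># Heroku: every capacity change is an explicit, manual command
heroku ps:scale web=20 --app my-api   # before the expected spike
heroku ps:scale web=2 --app my-api    # after it passes

# Kubernetes: declare the policy once; the HPA adjusts replica count on its own
kubectl autoscale deployment my-api --min=2 --max=20 --cpu-percent=70
kubectl get hpa my-api                # observe current vs. target utilization</code></pre></div>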
<p>For high-concurrency APIs, the operational difference is significant. Kubernetes can scale a service from two instances to twenty in under two minutes in response to a traffic spike, then scale back to two when the spike passes. Heroku requires a manual decision and a manual dyno configuration, and it accepts the cost of overprovisioning during the waiting period.</p><p>LocalOps runs workloads on EKS with horizontal pod autoscaling configured by default. No manual scaling decisions. No dyno tier upgrades. Services scale based on actual traffic signals and scale back automatically when the load drops.</p><p><a href="https://localops.co/features/auto-scaling?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how autoscaling works on LocalOps.</a></p><h2><strong>What Happens to Reliability When a SaaS Product Outgrows Heroku</strong></h2><p>Reliability degradation on Heroku follows a predictable sequence as applications grow past what the Standard dyno tier was designed to support.</p><p><strong>The cold start problem.</strong></p><p>Eco dynos on Heroku sleep after 30 minutes of inactivity and require a cold start when traffic arrives. For production applications, this means the first request after a quiet period experiences significantly elevated response time. For applications with consistent traffic, this is manageable. For applications with variable traffic patterns, which are common in B2B SaaS, cold starts create periodic reliability events that are visible to customers and difficult to eliminate without moving to always-on dyno tiers at higher cost.</p><p><strong>The resource ceiling problem.</strong></p><p>Standard-2X dynos provide 1GB of memory. For applications with growing data processing requirements, ML inference, or complex query patterns, this ceiling creates memory pressure that manifests as intermittent performance degradation and occasional dyno restarts. The upgrade path to Performance dynos is a significant cost jump with no intermediate steps.</p><p><strong>The shared infrastructure problem.</strong></p><p>Heroku&#8217;s Standard dynos run on shared infrastructure. Noisy neighbor effects, where other tenants on the same physical infrastructure consume resources that affect application performance, are documented and acknowledged by Heroku but are not preventable by teams running on the platform. For SaaS applications with customer-facing SLAs, this is an infrastructure risk that cannot be mitigated without leaving the platform.</p><p><strong>The safest migration path:</strong></p><p>The migration path that minimizes customer-facing reliability risk runs both environments in parallel before any DNS cutover. The new environment, provisioned by LocalOps as AWS infrastructure in the team&#8217;s own account, receives all the verification traffic while Heroku remains the production environment. Database migration runs with AWS DMS replication to keep both databases synchronized. DNS cutover happens only after the new environment has handled real traffic patterns for a sufficient observation period.</p><p>This approach means there is no forced downtime window. Heroku stays live throughout. The cutover is a DNS switch, not a service migration under pressure. LocalOps&#8217;s<a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026"> white-glove migration service</a> handles this process end-to-end for teams that prefer not to manage it themselves.</p>
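<p>The DNS step itself is small once both environments are live. A minimal sketch, assuming Route 53 and placeholder zone, record, and change IDs; the TTL is lowered well ahead of the switch so the eventual change propagates quickly:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash"># Check the current TTL and target of the production record
dig +noall +answer app.example.com

# At cutover, repoint the record to the new environment's load balancer
# by applying a prepared change batch (JSON file)
aws route53 change-resource-record-sets \
  --hosted-zone-id Z123EXAMPLE \
  --change-batch file://cutover.json

# Watch the change reach INSYNC before retiring the Heroku app
aws route53 get-change --id /change/C456EXAMPLE</code></pre></div>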
<p>Read the zero-downtime migration playbook.</p><h2><strong>Why Heroku Is Architecturally Incompatible With Modern SaaS</strong></h2><p>The deepest reason engineering teams outgrow Heroku is not cost and not compliance. It is architecture. Heroku was designed around a specific application model (a single web application with stateless processes and external services for persistence), and that model constrains the architectural patterns that production SaaS applications require as they mature.</p><p><strong>Microservices.</strong></p><p>Heroku&#8217;s model is built around individual applications. Each application is a separate Heroku app with its own dyno configuration, add-ons, environment variables, and deployment pipeline. As a product decomposes into microservices, managing the relationships between these Heroku apps (routing, service discovery, shared configuration, and deployment coordination) becomes increasingly complex without the underlying networking primitives that VPC-native infrastructure provides.</p><p>Private communication between Heroku applications requires going over the public internet. There is no service mesh. There is no private DNS. Services communicate through public endpoints that must be secured at the application layer rather than the network layer. For microservices architectures where internal services should never be publicly accessible, this is a fundamental mismatch.</p><p><strong>Event-driven systems.</strong></p><p>Event-driven architectures depend on reliable message delivery, consumer group management, and dead-letter queue handling. Heroku&#8217;s add-on marketplace offers CloudAMQP for RabbitMQ and various Kafka-as-a-service options, but these run outside the Heroku networking model, require external service accounts, and add cost and operational complexity that compounds with every event-driven component added.</p><p>AWS-native services (SQS, SNS, EventBridge, and MSK) run inside the team&#8217;s VPC with native IAM integration, no external service accounts, and direct pricing with no platform margin. The operational model for event-driven systems on AWS is fundamentally cleaner than assembling it from Heroku add-ons.</p><p><strong>Sidecar patterns.</strong></p><p>Modern application deployment patterns increasingly rely on sidecars: containers running alongside the main application container to handle concerns like logging, metrics collection, service mesh proxying, and secret rotation. Heroku&#8217;s application model does not support multi-container deployments. The sidecar pattern does not exist on Heroku.</p><p>On Kubernetes, which LocalOps runs on EKS, sidecars are a first-class pattern. Logging agents, metrics collectors, Envoy proxies for service mesh, and secret management sidecars all run alongside application containers within the same pod. This is the deployment model that modern SaaS architectures assume.</p><p><strong>What platforms support these architectures natively on AWS:</strong></p><p>AWS-native Internal Developer Platforms running on Kubernetes support all three patterns natively. Private networking between services through VPC. Event-driven architectures through native AWS services with IAM integration. Sidecar containers through Kubernetes pod specifications.</p>
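<p>To make &#8220;natively&#8221; concrete, a couple of minimal sketches; every service, queue, and pod name here is hypothetical:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash"># Private service-to-service networking: cluster-internal DNS, no public endpoint
kubectl exec deploy/checkout -- nslookup orders.default.svc.cluster.local

# Event-driven plumbing as a native AWS resource behind IAM, not an add-on account
aws sqs create-queue --queue-name order-events
aws sqs send-message --queue-url https://sqs.us-east-1.amazonaws.com/123456789012/order-events \
  --message-body '{"orderId": "demo"}'

# Sidecars: multiple containers sharing one pod
kubectl get pod checkout-7d9f -o jsonpath='{.spec.containers[*].name}'
# example output: app fluent-bit envoy</code></pre></div>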
<p>LocalOps provides this infrastructure foundation, provisioned automatically, configured to AWS Well-Architected standards, running in the team&#8217;s own AWS account.</p><p><a href="https://docs.localops.co/environment/services/micro-services?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">See how LocalOps supports microservices on AWS and modern architectures.</a></p><h2><strong>The Architecture That Comes Next</strong></h2><p>The architecture that replaces Heroku for scaling SaaS teams has five consistent characteristics.</p><p><strong>Infrastructure in the team&#8217;s own cloud account.</strong> VPC isolation. Private networking between services. IAM-based access control with audit logging. Data residency in a specified region. The compliance foundation that enterprise deals require.</p><p><strong>Developer experience that does not regress.</strong> Git-push deployments. Self-serve environment management. Preview environments on every pull request. Developers deploy without tickets, without infrastructure knowledge, without platform team involvement. The autonomy that made Heroku valuable survives the migration.</p><p><strong>Observability built into the platform.</strong> Prometheus for metrics. Loki for log aggregation. Grafana for unified dashboards and alerting. Available from the first deployment at no additional cost. Not assembled from add-ons after the fact.</p><p><strong>Horizontal autoscaling by default.</strong> Workloads scale based on real traffic signals automatically. No manual dyno configuration. No over-provisioning for anticipated peaks. Cost proportional to actual usage.</p><p><strong>No new vendor lock-in.</strong> Standard Kubernetes in the team&#8217;s own AWS account. Infrastructure that continues running independently of any platform vendor. An exit path that is always open.</p><p>LocalOps provisions all five as the default configuration. Connect your AWS account. Connect your GitHub repository. LocalOps provisions a dedicated VPC, EKS cluster, load balancers, IAM roles, and the full Prometheus + Loki + Grafana observability stack, automatically. No Terraform. No Helm charts. No manual configuration. First environment ready in under 30 minutes.</p><blockquote><p><em>&#8220;Their thoughtfully designed product and tooling entirely eliminated the typical implementation headaches. Partnering with LocalOps has been one of our best technical decisions.&#8221;</em> <strong>&#8211; Prashanth YV, Ex-Razorpay, CTO and Co-founder, Zivy</strong></p><p><em>&#8220;Even if we had diverted all our engineering resources to doing this in-house, it would have easily taken 10&#8211;12 man-months of effort, all of which LocalOps has saved for us.&#8221;</em> <strong>&#8211; Gaurav Verma, CTO and Co-founder, SuprSend</strong></p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get started for free - first environment live in under 30 minutes.</a></strong></p></blockquote><h2><strong>Frequently Asked Questions</strong></h2><ol><li><p><strong>At what point does Heroku&#8217;s pricing become financially indefensible?</strong></p></li></ol><p>The inflection point varies by stack composition but consistently arrives when service count grows past five and add-on costs begin compounding across multiple services simultaneously.
The signal is not the absolute invoice amount; it is when the invoice becomes difficult to attribute cleanly across services and difficult to optimize without changing the underlying platform. For most B2B SaaS teams, this happens between five and fifteen engineers, not because of team size directly, but because product complexity at that stage drives the service count and add-on accumulation that makes the cost structure opaque.</p><ol start="2"><li><p><strong>How should a CTO calculate the true total cost of staying on Heroku?</strong></p></li></ol><p>The full calculation has three components. Invoice cost: dyno tiers, database add-ons, Redis, monitoring tools, and scheduler, totalled across all production services. Engineering opportunity cost: hours spent on platform workarounds, architectural compromises made to serve Heroku&#8217;s limitations, and deferred technical decisions that accumulate as debt. Compliance revenue cost: deals delayed or lost because the infrastructure cannot satisfy enterprise security questionnaires. For most Series A and beyond B2B SaaS teams with enterprise ambitions, the compliance revenue component is the highest and least visible cost, and the one that makes the migration decision strategically obvious when it surfaces.</p><ol start="3"><li><p><strong>Why does Heroku&#8217;s dyno scaling fail for high-concurrency production workloads?</strong></p></li></ol><p>Heroku scales vertically in fixed tiers and requires manual intervention to adjust capacity. There is no event-driven autoscaling that responds to CPU, memory, or request queue signals automatically. For high-concurrency APIs that experience traffic bursts, common in B2B SaaS with peak business-hours usage patterns, the choice is between over-provisioning continuously or accepting degraded performance during spikes. Kubernetes horizontal pod autoscaling on EKS responds to real traffic signals in seconds, scales to handle burst load, and scales back automatically when traffic drops. Teams pay for actual compute consumption rather than for the tier ceiling required to handle the peak.</p><ol start="4"><li><p><strong>What is the safest migration path when a production application has outgrown Heroku?</strong></p></li></ol><p>The safest path runs both environments in parallel before any DNS cutover. Provision the new AWS environment with LocalOps and verify the full application stack (web services, background workers, scheduled jobs, and third-party integrations) against the new environment before moving any production traffic. Use AWS DMS to replicate database changes from Heroku Postgres to RDS in near-real time during the transition period. Lower the DNS TTL 48 hours before the planned cutover. Switch DNS only after the new environment has handled real traffic patterns for a sufficient observation period, with Heroku remaining live throughout. For teams that prefer not to manage this themselves, LocalOps&#8217;s white-glove migration handles the process end-to-end.</p><ol start="5"><li><p><strong>Why is Heroku incompatible with microservices and event-driven architectures?</strong></p></li></ol><p>Heroku&#8217;s application model assumes a single web application with stateless processes. Private communication between separate Heroku applications requires traversing the public internet; there is no VPC, no service mesh, and no private DNS.
Event-driven architectures assembled from Heroku add-ons run outside the Heroku networking model, require external service accounts, and add operational complexity with every component added. Sidecar container patterns (logging agents, metrics collectors, and service mesh proxies) are not supported because Heroku does not support multi-container deployments. On Kubernetes running inside a VPC, all three patterns are first-class: private inter-service networking, native AWS event services with IAM integration, and pod-level sidecar support.</p><ol start="6"><li><p><strong>What does the architecture look like after migrating from Heroku to AWS?</strong></p></li></ol><p>The post-migration architecture runs on EKS inside a dedicated VPC with private subnets, least-privilege IAM policies, and encrypted secrets via AWS Secrets Manager, all provisioned automatically by LocalOps. Developers push to a configured branch, and the application deploys. Services communicate over private networking. Prometheus collects metrics automatically. Loki aggregates logs from all services. Grafana provides unified dashboards from day one. Horizontal autoscaling responds to real traffic signals without manual intervention. The developer experience is identical to Heroku. The infrastructure underneath is the team&#8217;s own AWS account, with no platform margin, no compliance ceiling, and no vendor lock-in to unwind.</p><h2><strong>Key Takeaways</strong></h2><p>Engineering teams outgrow Heroku in a predictable sequence. Cost predictability breaks down as service count grows and add-on costs compound. Infrastructure control becomes a compliance requirement when enterprise deals arrive. Vertical scaling fails for variable traffic workloads. Reliability degrades as applications push against Standard dyno limits. And modern architectural patterns (microservices, event-driven systems, and sidecars) hit fundamental platform incompatibilities.</p><p>The architecture that comes next is not more complex to operate. It is different in model: infrastructure the team owns, running on AWS, with a platform layer that preserves the developer experience Heroku provided. For engineering teams at Series A and beyond, this is the foundation that supports the next stage of growth rather than constraining it.</p><p>The teams that navigate this transition well are the ones who recognize the sequence before any single constraint becomes a crisis, and move from a position of clarity rather than under pressure.</p><p><strong><a href="https://go.localops.co/heroku?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Schedule a Migration Call &#8594;</a></strong> Our engineers review your current Heroku setup and walk through what the transition looks like for your specific stack.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Get Started for Free &#8594;</a></strong> First environment on AWS in under 30 minutes.
No credit card required.</p><p><strong><a href="https://localops.co/vs/heroku-alternative?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=heroku_alternatives_2026">Read the Migration Guide &#8594;</a></strong> Full walkthrough, database migration, environment setup, DNS cutover.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://blog.localops.co/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption"></p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[🎉 New release: Switch organizations/teams & New CLI update]]></title><description><![CDATA[Handling multiple engineering teams and environments, just got easier in your Internal developer platform.]]></description><link>https://blog.localops.co/p/new-release-switch-organizationsteams</link><guid isPermaLink="false">https://blog.localops.co/p/new-release-switch-organizationsteams</guid><dc:creator><![CDATA[Anand]]></dc:creator><pubDate>Tue, 31 Mar 2026 06:21:10 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!TKvM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>We released quite a few enhancements today.</p><p>If you run multiple products and multiple engineering teams handling their own qa, uat and production environments, you will love this update.</p><h3>Switch between multiple organizations/teams:</h3><p>Users can now belong to multiple organizations using their same login / email address. 
And they can easily switch between the organizations from the top left menu like this:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TKvM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TKvM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png 424w, https://substackcdn.com/image/fetch/$s_!TKvM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png 848w, https://substackcdn.com/image/fetch/$s_!TKvM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png 1272w, https://substackcdn.com/image/fetch/$s_!TKvM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TKvM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png" width="1412" height="985" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:985,&quot;width&quot;:1412,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:548426,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/192689175?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TKvM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png 424w, https://substackcdn.com/image/fetch/$s_!TKvM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png 848w, https://substackcdn.com/image/fetch/$s_!TKvM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png 1272w, https://substackcdn.com/image/fetch/$s_!TKvM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff7f2e296-9bd5-4d53-9db3-df5a448ffee1_1412x985.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 
pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Each organization can have unique environments like this:</p><ul><li><p>Github org</p></li><li><p>ECR Registries</p></li><li><p>Environments</p><ul><li><p>qa</p></li><li><p>uat</p></li><li><p>production</p></li></ul></li><li><p>Deployments</p></li></ul><h3>Enhanced CLI Login:</h3><p>LocalOps CLI now use the existing web login sessions to authenticate. </p><p>To login, just type this in your terminal:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash">$ ops login</code></pre></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ylUn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ylUn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png 424w, https://substackcdn.com/image/fetch/$s_!ylUn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png 848w, https://substackcdn.com/image/fetch/$s_!ylUn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png 1272w, https://substackcdn.com/image/fetch/$s_!ylUn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ylUn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png" width="1456" height="1322" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1322,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1032206,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/192689175?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!ylUn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png 424w, https://substackcdn.com/image/fetch/$s_!ylUn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png 848w, https://substackcdn.com/image/fetch/$s_!ylUn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png 1272w, https://substackcdn.com/image/fetch/$s_!ylUn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64d0e0ca-cc6b-4cc6-93c2-af66c1eddc62_1864x1692.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>And you will get a link to click and open the browser. If you&#8217;re already logged in to LocalOps console (console.localops.co), we will show up the authorization form that CLI is requesting to use your current login. Once you authorize, boom! 
You can access LocalOps services via CLI.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pjex!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pjex!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png 424w, https://substackcdn.com/image/fetch/$s_!pjex!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png 848w, https://substackcdn.com/image/fetch/$s_!pjex!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!pjex!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pjex!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png" width="1160" height="1040" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1040,&quot;width&quot;:1160,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:102224,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/192689175?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pjex!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png 424w, https://substackcdn.com/image/fetch/$s_!pjex!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png 848w, https://substackcdn.com/image/fetch/$s_!pjex!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png 1272w, https://substackcdn.com/image/fetch/$s_!pjex!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F133c2275-998e-4086-a022-52f0a68be04f_1160x1040.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" 
type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>You will need to update the CLI version to v3.0.0 to get this update. Checkout <a href="https://docs.localops.co/cli/install-macos">https://docs.localops.co/cli/install-macos</a> (for MacOS) or <a href="https://docs.localops.co/cli/install-windows">https://docs.localops.co/cli/install-windows</a> (for Windows) or <a href="https://docs.localops.co/cli/install-linux">https://docs.localops.co/cli/install-linux</a> (for Linux) to learn more.</p><p>Reach out to us get a quick tour of LocalOps - <a href="https://go.localops.co/tour">https://go.localops.co/tour</a>. 
</p><p>Or sign up for free at <a href="https://console.localops.co/signup">https://console.localops.co/signup</a>.</p><p>Cheers.</p>]]></content:encoded></item><item><title><![CDATA[How Internal Developer Platforms Enable Control Across Multi-Cloud, Regions, and Microservices]]></title><description><![CDATA[Bring consistency to multi-cloud deployments and keep visibility, security, and control as your systems scale.]]></description><link>https://blog.localops.co/p/manage-multi-cloud-multi-region-deployments-with-idp</link><guid isPermaLink="false">https://blog.localops.co/p/manage-multi-cloud-multi-region-deployments-with-idp</guid><dc:creator><![CDATA[Madhushree Sivakumar]]></dc:creator><pubDate>Sun, 29 Mar 2026 05:30:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!LaCc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LaCc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LaCc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png 424w, https://substackcdn.com/image/fetch/$s_!LaCc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png 848w, https://substackcdn.com/image/fetch/$s_!LaCc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!LaCc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LaCc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png" width="992" height="1200" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/cb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1200,&quot;width&quot;:992,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LaCc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png 424w, 
https://substackcdn.com/image/fetch/$s_!LaCc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png 848w, https://substackcdn.com/image/fetch/$s_!LaCc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png 1272w, https://substackcdn.com/image/fetch/$s_!LaCc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fcb98781b-fa2c-44f6-b565-843b3ed5712b_992x1200.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Multi-cloud and multi-region setups look simple at first. In practice, they don&#8217;t stay that way.</p><p>A team runs the same service on two cloud providers. It works in the beginning. Then small differences start to show up. Load balancers behave differently. IAM policies don&#8217;t map cleanly. Network defaults are not the same. Teams add fixes to make things work, and those fixes stay.</p><p>After a while, staging and production stop matching. Debugging depends on which cloud the service is running on.</p><p>At that point, the problem is not provisioning. It is keeping environments consistent when the underlying systems behave differently.</p><p>Traditional DevOps workflows struggle here. They assume environments behave the same. In multi-cloud setups, they don&#8217;t. Pipelines get duplicated. Infrastructure definitions drift. Ownership becomes unclear.</p><p>An internal developer platform changes how this is handled. It does not try to hide the differences between cloud providers. It adds structure to how environments are created, updated, and maintained.</p><p>That shift is what keeps environments consistent as systems grow.</p><h2>TL;DR</h2><ul><li><p>Multi-cloud breaks on four gaps: provisioning inconsistency, credential sprawl, environment drift, and lack of clear visibility into what is running in each environment</p></li><li><p>Traditional DevOps workflows slow down as systems grow. 
Infrastructure changes move through tickets, and simple environment updates take days</p></li><li><p>An internal developer platform standardises how environments are defined and requested. Developers work with a consistent model, while the platform handles provider-specific differences</p></li><li><p>Multi-region deployments need more than reusable templates. Data residency, region-specific credentials, and environment parity need to be enforced at creation time</p></li><li><p>Managing microservices across environments fails on dependency visibility, not deployment mechanics</p></li><li><p>Preview environments fail when treated as full clones of production. Most systems cannot support that model reliably</p></li><li><p>Security needs to be part of environment provisioning. Adding it later leads to inconsistent policies and access gaps</p></li></ul><h2>Why Multi-Cloud and Multi-Region Deployments Break Down: The Four Gaps</h2><p>Multi-cloud setups break in predictable ways. Most teams running these systems run into the same four gaps.</p><h4>Provisioning inconsistency</h4><p>The same service is not provisioned the same way across providers.</p><p>An API in an AWS internal developer platform setup might use an ALB with a 60-second idle timeout. The same service on GCP sits behind a Cloud Load Balancer with a<a href="https://cloud.google.com/load-balancing/docs/backend-service"> 30-second backend timeout default</a>. Health check intervals and thresholds differ too. These are not configuration preferences. They affect how the service handles slow clients, retries, and upstream failures.</p><p>Over time, provider-specific fixes get added to patch these differences. Environments stop matching. What works in one cloud fails silently in another.</p><h4>Credential sprawl</h4><p>IAM models do not align across cloud providers.</p><p>AWS uses IAM roles with policy documents. GCP uses service accounts with IAM bindings. Azure uses managed identities with role assignments. None of these map cleanly to each other. When teams manage them independently, permissions get widened to unblock deployments. An S3 read policy becomes s3:* because narrowing it requires time nobody has. A GCP service account gets project-level editor access because the specific resource-level permission took too long to figure out.</p><p>Without a consistent access control layer sitting above all three providers, permissions become impossible to audit at scale. You end up with a spreadsheet mapping roles to resources across three different IAM models, and it is out of date the moment someone widens a permission to unblock a deployment.</p><h4>Environment drift</h4><p>Environments diverge at the infrastructure level over time.</p><p>A database parameter group tuned for performance in us-east-1 never gets applied to eu-west-1. A Kubernetes node pool configuration updated in staging never propagates to production. A security group rule added manually in the AWS console does not exist in the Terraform state. Each change is small. Collectively they mean staging and production are running on different infrastructure even when the application code is identical.</p>
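<p>Some of this drift can be surfaced before it causes an incident by diffing the live configuration across regions. A minimal sketch for the database example, assuming a parameter group named app-pg exists in both regions:</p><div class="highlighted_code_block" data-attrs="{&quot;language&quot;:&quot;bash&quot;,&quot;nodeId&quot;:null}" data-component-name="HighlightedCodeBlockToDOM"><pre class="shiki"><code class="language-bash"># Dump the user-modified parameters from each region (group name is hypothetical)
aws rds describe-db-parameters --db-parameter-group-name app-pg --region us-east-1 \
  --query 'Parameters[?Source==`user`].[ParameterName,ParameterValue]' --output text | sort | tee east.txt
aws rds describe-db-parameters --db-parameter-group-name app-pg --region eu-west-1 \
  --query 'Parameters[?Source==`user`].[ParameterName,ParameterValue]' --output text | sort | tee west.txt

# Any output here is tuning that exists in one region and not the other
diff east.txt west.txt</code></pre></div>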
<p>Drift is usually discovered during failures. A latency spike in eu-west-1 looks like a code problem for two hours before someone checks the RDS parameter group and finds it was never updated after the us-east-1 tuning.</p><h4>Lack of visibility</h4><p>Teams lose track of what is actually running.</p><p>At 30 services across three clouds, answering a basic question like &#8220;what version is in production right now&#8221; requires checking the AWS console, the GCP deployment history, a Terraform state file, and maybe a Slack message from two weeks ago. None of these agree with each other because none of them are the source of truth. They are all partial records of different parts of the system.</p><p>The problem is not that the data does not exist. It is that it lives in too many places to be useful during an incident. By the time you have reconstructed the state of the system, the debugging window has already cost you an hour.</p><h2>Why Traditional DevOps Models Collapse at Multi-Cloud Scale</h2><p>Traditional DevOps works when environments are consistent. Multi-cloud breaks that assumption.</p><h4>Pipelines diverge</h4><p>Teams maintain separate CI/CD pipelines that evolve differently across providers.</p><p>An AWS pipeline pushes container images to ECR, deploys to EKS using kubectl, and runs health checks against an ALB target group. A GCP pipeline pushes to Artifact Registry, deploys to GKE using Helm, and checks against a Cloud Load Balancing backend service. Both deploy the same application. The deployment logic shares nothing. Rollback mechanisms differ. Environment variable injection differs. A new engineer moving between teams spends the first week learning the pipeline instead of shipping.</p><h4>Infrastructure definitions split</h4><p>Infrastructure as code does not prevent variation.</p><p>Terraform modules fork across providers. An AWS module defines a VPC with specific CIDR ranges, subnet layouts, and NAT gateway configuration. The GCP equivalent uses a different network model entirely because GCP VPCs are global, not regional. The fork starts as a necessary difference. Over time, unrelated changes get applied to one module and not the other. The divergence goes undocumented. Six months later nobody knows which differences are intentional and which are drift. Without a strong internal developer platform, these differences compound silently.</p><p>This is the hidden cost teams discover when they attempt to build an internal developer platform on top of existing IaC tooling. The modules exist but the consistency layer does not.</p><h4>Ownership becomes fragmented</h4><p>Responsibility spreads across teams with no technical enforcement layer.</p><p>A developer needs a new environment with a specific RDS instance class and a particular security group configuration. They file a ticket. Three days later they get an environment with a different instance class because the platform team defaulted to what they normally provision. The misconfiguration does not surface until a load test shows the environment cannot handle the expected throughput.</p><p>This is the core problem platform engineering and internal developer platforms are meant to solve: enforcing standards through the system, not through documentation.</p><p>Application teams implement standards differently in practice because nothing in the toolchain enforces alignment at provisioning time.
Changes move through tickets instead of a self-service system with guardrails.</p><h4>Feedback loops slow down</h4><p>Debugging becomes environment-specific in a way that is expensive to resolve.</p><p>A deployment passes all tests in us-east-1 staging and fails in eu-west-1 production with a connection timeout. The timeout traces back to a security group rule that exists in us-east-1 but was never applied in eu-west-1 because the Terraform state for that region was last updated four months ago. Finding this requires manually comparing security group rules across two regions, two Terraform state files, and the actual AWS console output, none of which are guaranteed to match.</p><p>Without a canonical environment definition to compare against, every cross-environment debugging session starts from reconstructing what the environment is supposed to look like. That takes time and is often incomplete.</p><p>Traditional DevOps does not fail because the approach is wrong. It fails because coordination overhead grows faster than the system can handle.</p><h2>What is an Internal Developer Platform?</h2><p>An internal developer platform is an abstraction layer above cloud infrastructure. It exposes a consistent interface for provisioning environments, deploying services, and managing configuration across cloud providers. The underlying cloud-specific resources, EKS, GKE, AKS, RDS, Cloud SQL, are provisioned by the platform based on which cloud account the environment targets. Engineers interact with the platform, not directly with cloud APIs.</p><p>The platform owns four things: provisioning logic, credential management, state tracking, and environment lifecycle. These need to work as one system for the abstraction to hold.</p><p>For a deeper breakdown of how internal developer platforms are defined and where they fit in modern engineering teams, read<a href="https://blog.localops.co/p/what-is-an-internal-developer-platform-idp?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog"> What Is an Internal Developer Platform</a> blog.</p><h2>How Does an IDP Handle Environment Provisioning Across Cloud Providers?</h2><p>Cloud providers do not share infrastructure primitives. A VPC in AWS is not the same as a VPC in GCP. EKS and GKE both run Kubernetes but differ in how node pools, IAM, and networking are configured. Writing separate provisioning logic per provider is how teams end up with forked infrastructure definitions that diverge silently over time.</p><p>An internal developer platform solves this through four layers that work together.</p><h4>Environment model</h4><p>The platform defines what an environment is independent of any cloud provider. A developer requests an environment by specifying a target cloud account and a region. They do not specify cloud resources directly. The environment model describes what needs to exist: a network layer, a compute layer, an orchestration layer, an observability layer. The platform owns that definition.</p><h4>Translation layer</h4><p>The platform translates the environment model into provider-specific resources at provisioning time. The same environment definition produces the correct networking, compute, and Kubernetes resources for whichever cloud account it targets. The developer interface does not change across providers. Provider differences are handled inside the platform, not distributed across individual team workflows.</p><h4>Orchestration</h4><p>Provisioning is not just creating resources in isolation. 
Network, compute, and Kubernetes components have dependencies. The platform provisions them in the correct order, wires them together, and validates the environment is functional before marking it ready. Observability tooling gets installed and connected to the environment at this stage, not added separately afterward. Logs and metrics are available from the first deployment. This matters because debugging cross-environment differences requires consistent instrumentation across all environments. When observability is set up manually per environment, one environment ends up better instrumented than another.</p><h4>Lifecycle management</h4><p>Application-specific cloud resources, databases, queues, storage buckets, are defined at the service level rather than the environment level. The platform provisions them when a service is created and removes them when the service is deleted. Resource lifecycle stays coupled to service lifecycle. This prevents orphaned infrastructure accumulating across environments over time, which is one of the more common sources of unexpected cloud spend in multi-environment setups.</p><p>If you want to see what a full environment actually includes, you can explore the <a href="https://docs.localops.co/environment/inside?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">LocalOps breakdown of what&#8217;s inside an environment.</a></p><h2>How Do You Manage Multi-Region Deployments Inside an IDP?</h2><p>Running the same environment definition in two regions is straightforward. What breaks after provisioning is keeping environments equivalent across regions, enforcing where data is allowed to exist, and preventing credentials from one region being used to provision resources in another.</p><p>An internal developer platform handles multi-region through three mechanisms.</p><h4>Account-level region constraints</h4><p>Region selection does not happen at environment creation time. It happens when a cloud account is connected to the platform. The account configuration determines which regions are available for environments targeting that account. An account configured for EU data residency only surfaces EU regions. A developer creating an environment against that account cannot select a US region because the platform does not present it as an option. Data residency gets enforced at the infrastructure level, not through documentation or process that depends on developers remembering the constraint.</p><h4>Parity through a shared environment definition</h4><p>Two environments provisioned from the same definition in different regions come out structurally identical: same network layout, same cluster configuration, same observability stack. The platform guarantees this at creation time. What breaks parity is changes made outside the platform, a configuration edit applied directly in a cloud console in one region that never reaches the other. The platform has no visibility into changes that bypass it. Parity only holds for what the platform provisions and manages. Any change applied outside the platform becomes undocumented drift that will surface as a debugging problem later.</p><h4>Explicit cross-region dependencies</h4><p>A service calling a dependency in another region introduces latency and potentially crosses a compliance boundary. Within an environment, services communicate through stable internal references that the platform assigns and maintains. These references do not change when infrastructure is updated. 
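</p><p>From inside a service, this might look like the following minimal sketch. The environment variable name and fallback URL are hypothetical, not a documented convention; the point is that the in-environment dependency comes from a platform-assigned reference rather than a hardcoded address.</p><pre><code># Sketch: consuming a dependency through a stable, platform-injected reference.
# Variable names and the fallback URL are made up for illustration.
import os

# In-environment dependency: resolved through the platform-assigned reference,
# so the application never embeds a hostname or IP that can go stale.
BILLING_URL = os.environ.get("SVC_BILLING_URL", "http://billing.internal.local")

def billing_endpoint():
    return BILLING_URL

print(billing_endpoint())
</code></pre><p>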
Cross-environment or cross-region dependencies cannot use these internal references. They need to be configured explicitly in the service&#8217;s configuration. This keeps cross-region calls visible in configuration rather than embedded in application code where they are difficult to audit.</p><p>See how LocalOps <a href="https://docs.localops.co/accounts/aws?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">handles cloud account and region configuration</a></p><h2>How Do You Manage Microservices Across Multiple Environments in an IDP?</h2><p>Deploying a container to a Kubernetes namespace is not the hard part. What gets hard at scale is knowing what is actually running: which version of which service is in which environment, whether service interfaces are still compatible after a recent change, and who is responsible when something breaks.</p><p>An internal developer platform handles microservice management across environments through four mechanisms.</p><h4>Per-service configuration isolation</h4><p>Each service carries its own configuration per environment. A change to one service&#8217;s configuration does not affect any other service in the same environment. Services are deployed independently against their own repository and branch. There is no shared configuration file that multiple services read from. When shared configuration exists at the wrong level, deploying one service requires coordinating with teams that own other services. That coordination overhead is what the isolation is designed to remove.</p><h4>Stable service references</h4><p>Services in a microservice architecture depend on each other. Dependencies expressed as hardcoded hostnames or IP addresses break when infrastructure changes underneath them. The platform assigns each service a stable internal reference that maps to its hostname within the environment. That reference does not change for the lifetime of the service. Dependent services use this reference rather than direct addresses. When infrastructure is updated or a service is redeployed, the reference continues to resolve correctly without any changes in application code.</p><h4>Independent deployability</h4><p>One service can be deployed, updated, or rolled back without touching any other service&#8217;s configuration or notifying another team. Each service has its own deployment pipeline triggered by commits to its configured branch. The platform enforces this independence structurally. When cross-service coordination is happening regularly before deployments, it usually indicates shared configuration has been introduced somewhere it should not be.</p><h4>Deployment state per environment</h4><p>The platform tracks deployment state per service per environment. Which version is running, when it was last deployed, whether the deployment is healthy. At 20 or 30 services across multiple environments this state cannot be tracked manually. Without a centralized view, teams find out a service is unhealthy when a dependent service starts returning errors or when a user reports a problem. 
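</p><p>As a rough illustration of what that centralized record contains, here is a minimal sketch. Field names and data are invented; the structure is the point: one lookup answers questions that otherwise require checking several consoles.</p><pre><code># Sketch: deployment state keyed by (environment, service).
# Versions, dates, and health values are made up for illustration.

deployments = {
    ("production-us", "checkout"): {"version": "2024.11.3", "deployed_at": "2024-11-18", "healthy": True},
    ("production-eu", "checkout"): {"version": "2024.10.9", "deployed_at": "2024-10-02", "healthy": True},
    ("production-eu", "payments"): {"version": "2024.11.1", "deployed_at": "2024-11-20", "healthy": False},
}

def versions_of(service):
    """Answer 'which version of this service runs where' in one lookup."""
    return {env: state["version"]
            for (env, svc), state in deployments.items() if svc == service}

def unhealthy():
    return [(env, svc) for (env, svc), state in deployments.items()
            if not state["healthy"]]

print(versions_of("checkout"))  # the us/eu version gap is visible immediately
print(unhealthy())              # [('production-eu', 'payments')]
</code></pre><p>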
With it, the state of every service across every environment is visible in one place.</p><p>Here&#8217;s an example of <a href="https://docs.localops.co/environment/services/micro-services?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">how LocalOps handles service dependencies and aliases across environments</a></p><h2>Internal Developer Platform Architecture: Control Layers and Orchestration</h2><p>An internal developer platform is not a single tool. It is a set of layers that need to work together. When one layer is missing or disconnected from the others, the complexity it was supposed to hide leaks back to engineers.</p><h4>Developer interface layer</h4><p>This is how engineers interact with the platform: a web console, a CLI, or an API. It accepts environment and service requests and passes them to the provisioning layer. The interface should be opinionated enough to prevent misconfigured requests but not so rigid that every new environment type requires platform team involvement to support.</p><p>This layer is what most people refer to as the internal developer portal. The internal developer portal vs platform distinction matters here. A portal handles the interface: service catalog, documentation, self-service UI. What it does not do is provision environments, manage credentials, or track infrastructure state. Teams that deploy a portal without building the layers underneath get a catalog with no operational capability. The portal works in the demo. Nothing actually provisions.</p><p>The best internal developer platform is not the one with the most features in the interface layer. It is the one where the provisioning, credential, and state layers work reliably underneath.</p><h4>Provisioning layer</h4><p>This is where environment definitions get translated into cloud resources. The provisioning layer owns the environment model, the per-provider translation logic, and the dependency ordering that ensures resources get created in the correct sequence. Security baselines get applied here, at provisioning time, not as a separate step afterward. Disk encryption, network isolation, and IAM scoping are part of the provisioning definition. An environment provisioned without these and hardened later ran without them for some period of time.</p><h4>Credential layer</h4><p>The platform needs access to cloud accounts to provision resources. That access should be role-based and keyless. The platform assumes a scoped role at provisioning time rather than holding long-lived credentials. No engineer holds direct cloud credentials. Access is auditable because it flows through the platform, not through individually managed keys scattered across team members.</p><h4>State and observability layer</h4><p>The platform tracks what it has provisioned across all cloud accounts, all regions, and all environments. This is the layer that closes the visibility gap from Section 1. Which version is deployed where, what changed recently, which environments are out of sync. These questions have answers because the platform is the system of record for everything it has provisioned. Observability tooling running inside each environment feeds into this layer, giving teams logs and metrics without manual setup per environment.</p><p>These four layers need to be integrated. A portal connected to a separate provisioning tool with no shared state between them is not a platform. 
It is two tools that happen to be used together, and the gap between them is where coordination overhead lives.</p><p>To understand how these four layers work together as one system, you can check out the<a href="https://docs.localops.co?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog"> LocalOps docs</a> which walk through the architecture end to end.</p><h2>How Do You Measure Control in a Multi-Cloud IDP?</h2><p>Control in a multi-cloud internal developer platform is measurable. If the platform is working, specific signals stay stable across environments and providers.</p><p><strong>Provisioning time per environment.</strong> A well-built platform provisions consistently regardless of which cloud account it targets. If provisioning time varies significantly between AWS and GCP for the same environment definition, the translation layer is not stable. Inconsistency here usually means provider-specific logic is leaking into the provisioning path.</p><p><strong>Environment parity failures.</strong> Track how often a deployment passes in staging but fails in production due to an environment difference. Each occurrence is a parity failure. These get caught in postmortems. If the number is not trending toward zero, the environment definition is not enforcing consistency at provisioning time.</p><p><strong>Drift detection rate.</strong> How often does the platform detect that live infrastructure differs from its provisioned state. If the answer is never, the platform has no visibility into out-of-band changes. Changes made directly in cloud consoles are invisible to the platform and accumulate as undocumented divergence.</p><p><strong>Deployment success rate across environments.</strong> Not just whether a deployment completed, but whether the service behaved consistently across environments after deployment. Failures that are environment-specific point to configuration or infrastructure differences the platform did not catch.</p><p><strong>Time to debug cross-environment issues.</strong> This exposes visibility gaps directly. If root cause analysis requires manually checking multiple cloud consoles, the state layer is not doing its job.</p><p>Three operational signals that sit underneath these metrics:</p><p>Can a developer provision an environment without involving another team? If not, the self-service model is not working. Can the platform tell you what is running in each environment right now without manual reconstruction? If not, the state layer is incomplete. How long does it take a new engineer to deploy their first service? Weeks means the platform has not reduced the knowledge barrier. Days means it has.</p><h2>How Does an IDP Handle Security Across Cloud Environments?</h2><p>Security in multi-cloud environments fails in predictable ways. Policies exist as documentation that teams implement inconsistently. Credentials get distributed to individuals rather than managed by the platform. Environments get provisioned without a security baseline and controls get added afterward, which means they were absent for some period of time.</p><h4>Security at provisioning time, not after</h4><p>Every cloud provider has security configurations that are not enabled by default but should be on in every production environment. Disk encryption, VPC flow logs, security groups with minimal open ports, database encryption at rest. A platform that applies these at provisioning time through the environment definition makes them non-optional. 
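</p><p>In code terms, the difference is whether these controls are parameters of the request or constants of the definition. A minimal sketch, with hypothetical keys:</p><pre><code># Sketch: the security baseline is part of the environment definition, not an
# option on the request, so no code path produces an environment without it.
# Keys and values are illustrative.

SECURITY_BASELINE = {
    "disk_encryption": True,
    "db_encryption_at_rest": True,
    "vpc_flow_logs": True,
    "public_ingress_ports": [443],
}

def build_environment_spec(cloud_account, region):
    # The developer supplies only the target; the baseline is merged in
    # unconditionally by the platform.
    return {
        "account": cloud_account,
        "region": region,
        "security": dict(SECURITY_BASELINE),
    }

print(build_environment_spec("prod-aws", "eu-west-1")["security"])
</code></pre><p>The baseline travels with the definition, not with the request.</p><p>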
A developer cannot provision an environment without them because the template does not offer that option. Security teams stop reviewing individual provisioning requests and start reviewing the template instead. One review covers every environment provisioned from it.</p><h4>Keyless credential management</h4><p>The platform connects to cloud accounts using role-based, keyless access. In AWS this is IAM role assumption. In GCP it is the workload identity federation. In Azure it is managed identity. The platform assumes the role it needs at provisioning time, scoped to that specific operation. No engineer holds a long-lived access key for any cloud account. If an engineer&#8217;s machine is compromised, no cloud credentials are exposed because none were stored there.</p><h4>Per-environment secret isolation</h4><p>Application secrets should not be shared across environments. A production database credential should not be accessible in a staging environment. The platform provisions isolated secret storage per environment and scopes access to secrets at the environment level. Services access secrets through the platform&#8217;s credential mechanism, not through hardcoded values or shared configuration files.</p><h4>Network isolation by default</h4><p>Each environment gets its own dedicated network. Private subnets host resources with no public IP. Public subnets are limited to resources that explicitly need internet access. Services communicate internally through private DNS. This is not a configuration option. It is the default network layout for every environment the platform provisions.</p><h2>FAQs</h2><p><strong>1. How does an internal developer platform handle secrets across cloud environments?</strong></p><p>Each environment gets its own isolated secret storage. A production database credential is not accessible in staging because secrets are scoped at the environment level, not shared across environments. Services access secrets through the platform&#8217;s credential mechanism at runtime. No hardcoded values, no shared configuration files. The secret storage backend varies by cloud provider, AWS Secrets Manager, GCP Secret Manager, Azure Key Vault, but the access model is consistent across all of them because the platform abstracts it.</p><p><strong>2. How do IDPs ensure consistency across regions?</strong></p><p>Consistency across regions comes from provisioning both environments from the same definition. Same network layout, same cluster configuration, same observability stack. The platform guarantees structural equivalence at creation time. What breaks consistency is changes made outside the platform directly in cloud consoles or state files that the platform cannot track. Consistency only holds for what the platform provisions and manages.</p><p><strong>3.Can IDPs handle service discovery for microservices?</strong></p><p>Yes. Each service gets a stable internal reference that maps to its hostname within the environment. That reference does not change when infrastructure is updated or a service is redeployed. Dependent services use this reference as an environment variable rather than hardcoded hostnames or IP addresses. At runtime the platform resolves it to the actual internal hostname. This means service discovery works without a separate service mesh or DNS configuration per environment.</p><p><strong>4. How do IDPs reduce manual provisioning overhead?</strong></p><p>By replacing ticket-driven infrastructure requests with self-service. 
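</p><p>The request surface can stay small because everything else lives in the platform-owned definition. A hypothetical sketch of the shape of such a request, with invented names:</p><pre><code># Sketch: a self-service environment request carries only the target; the
# platform expands it into the full component list. Names are illustrative.
from dataclasses import dataclass

@dataclass
class EnvironmentRequest:
    cloud_account: str
    region: str

def plan(request):
    # Everything below comes from the platform-owned definition,
    # not from fields on the request.
    return [
        "network and subnets in " + request.region,
        "kubernetes cluster",
        "managed database",
        "log and metric agents wired to the central view",
    ]

for step in plan(EnvironmentRequest("staging-gcp", "europe-west1")):
    print(step)
</code></pre><p>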
A developer selects a target cloud account and region. The platform provisions the full environment: networking, compute, Kubernetes cluster, and observability tooling. No engineer on the platform team needs to be involved. The provisioning definition is maintained once by the platform team and applied consistently across every environment request. Manual work shifts from handling individual requests to maintaining the platform itself.</p><p><strong>5. How do IDPs enforce compliance in multi-region setups?</strong></p><p>Compliance constraints get enforced at the cloud account level, not at environment creation time. When a cloud account is connected to the platform, its configuration determines which regions are available for environments targeting that account. An account configured for EU data residency only surfaces EU regions. A developer cannot select a non-compliant region because the platform does not present it as an option. Security baselines, disk encryption, network isolation, and IAM scoping are part of the provisioning definition and applied to every environment automatically.</p><p><strong>6. What is the difference between a managed and open source internal developer platform?</strong></p><p>An open source platform like the Backstage internal developer platform gives you the portal layer: service catalog, documentation, and a self-service interface. What it does not include out of the box is a provisioning engine, a credential layer, or state management. Teams that deploy Backstage without building those layers get a catalog with no operational capability.</p><p>A managed internal developer platform provides all four layers, provisioning, credentials, state, and the developer interface, as one integrated system. The tradeoff is flexibility versus time to operational capability. Open source gives you full control but requires significant engineering investment to build and maintain the provisioning layer. A managed platform reduces that investment but operates within the boundaries the vendor has defined.</p><h2>Conclusion</h2><p>Multi-cloud control is often mistaken for having infrastructure as code per provider. It is not.</p><p>Having Terraform modules for AWS, GCP, and Azure feels like a solved problem. Until someone leaves and the modules go undocumented. Until staging and production diverge and nobody can explain why. Until a new engineer needs three weeks to get their first environment running because the knowledge is in someone&#8217;s head, not in the system.</p><p>IaC is an input to a platform. It is not the platform itself.</p><p>The gap between what teams think they have and what they actually have usually comes down to the same missing pieces: no consistent environment definition above the cloud layer, no centralized credential management, no observability provisioned by default.</p><p>Each of these gaps was manageable when the system was small. At scale they compound. Debugging takes longer. Incidents are harder to reproduce. New engineers take longer to become productive. The platform team becomes a bottleneck instead of an enabler.</p><p>An internal developer platform closes these gaps by owning the provisioning layer. One environment definition that translates across cloud providers. Role-based, keyless credential access scoped to the platform. Observability provisioned inside every environment at creation time, not added afterward. 
The conditions for drift get removed at the source because every environment starts from the same definition, not because the platform detects and corrects drift after the fact.</p><p>That is what control actually looks like.</p><p>If your team is dealing with environment inconsistency across cloud providers, manual provisioning overhead, or visibility gaps across regions, LocalOps is designed to handle these problems at the platform layer.</p><p>If you&#8217;re figuring out how this would fit into your setup, the LocalOps team can help you work through it:</p><p><strong><a href="https://cal.com/anand-localops/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">Book a Demo</a> &#8594;</strong> Walk through how environments, deployments, and AWS infrastructure are handled in practice for your setup.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">Get started for free</a> &#8594;</strong> Connect an AWS account and stand up an environment to see how it fits into your existing workflow.</p><p><strong><a href="https://docs.localops.co/?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">Explore the Docs</a> &#8594;</strong> A detailed breakdown of how LocalOps works end-to-end, including architecture, environment setup, security defaults, and where engineering decisions still sit.</p>]]></content:encoded></item><item><title><![CDATA[How Internal Developer Platforms Help a Growing SaaS Engineering Team Scale Without Hiring More DevOps]]></title><description><![CDATA[Reduce bottlenecks, speed up releases and scale your engineering team without increasing headcount.]]></description><link>https://blog.localops.co/p/how-to-scale-saas-engineering-team-without-hiring-more-devops</link><guid isPermaLink="false">https://blog.localops.co/p/how-to-scale-saas-engineering-team-without-hiring-more-devops</guid><dc:creator><![CDATA[Madhushree Sivakumar]]></dc:creator><pubDate>Sat, 28 Mar 2026 05:30:47 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ApGQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ApGQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ApGQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png 424w, https://substackcdn.com/image/fetch/$s_!ApGQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png 848w, https://substackcdn.com/image/fetch/$s_!ApGQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png 1272w, 
https://substackcdn.com/image/fetch/$s_!ApGQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ApGQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png" width="1456" height="1456" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1456,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:6214855,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://blog.localops.co/i/192283373?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ApGQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png 424w, https://substackcdn.com/image/fetch/$s_!ApGQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png 848w, https://substackcdn.com/image/fetch/$s_!ApGQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png 1272w, https://substackcdn.com/image/fetch/$s_!ApGQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F5d2b4e43-ed4c-4454-894e-208708027de3_2400x2400.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>As SaaS teams grow, infrastructure complexity doesn&#8217;t just increase, it compounds.</p><p>What starts as a simple setup with a few services quickly turns into multiple environments, fragmented pipelines, access controls, and constant operational overhead. Over time, even routine tasks like spinning up an environment or deploying a feature begin to depend on a small DevOps team.</p><p>That&#8217;s when the bottleneck shows up.</p><p>Most teams respond by hiring more DevOps engineers. But that approach only adds more people to manage an already complex system. It increases cost and coordination overhead without fixing the underlying issue.</p><p>The real challenge isn&#8217;t a lack of DevOps capacity, it&#8217;s a lack of standardization and self-service.</p><p>Internal Developer Platforms (IDPs) address this by turning infrastructure and deployment workflows into reusable, self-service systems, allowing teams to scale engineering output without scaling DevOps headcount.</p><h2>TL;DR</h2><ul><li><p>Most SaaS teams that hit an infrastructure bottleneck assume they need more DevOps engineers. They hire, the backlog clears briefly, then the same problems come back.</p></li><li><p>The issue is not capacity. It is that infrastructure work is still manual, inconsistent, and dependent on a small number of people who know how things are set up.</p></li><li><p>An internal developer platform removes that dependency. Developers provision environments, deploy services, and manage configuration without routing through anyone. Standards are enforced by the platform, not by whoever happens to be available.</p></li><li><p>The teams that get this right do not just deploy faster. They change how the whole infrastructure function works, from a request-driven queue to a self-service system developers can operate without waiting on anyone.</p></li></ul><h2>What Actually Breaks as Your SaaS Team Scales</h2><p>As a SaaS system scales, the failure point isn&#8217;t code velocity, it&#8217;s the lack of standardized infrastructure and repeatable workflows. The same patterns show up across teams once you move beyond a handful of services.</p><h4>Environment Drift and Configuration Inconsistency</h4><p>Teams typically maintain separate dev, staging, and production environments, but they&#8217;re rarely identical.</p><ul><li><p>Different instance types, env variables, or secrets</p></li><li><p>Manual hotfixes applied only in production</p></li><li><p>Inconsistent Terraform or incomplete IaC coverage</p></li></ul><p>This leads to:</p><ul><li><p>Bugs that cannot be reproduced outside production</p></li><li><p>Failed deployments due to missing or mismatched configs</p></li><li><p>Increased time spent debugging environment-specific issues</p></li></ul><p>Without strict environment templating, drift becomes inevitable.</p><h4>DevOps as a Request-Driven Bottleneck</h4><p>In most growing teams, infrastructure access is centralized for safety. 
In practice, this creates a ticket-driven workflow:</p><ul><li><p>&#8220;Create a new service&#8221;</p></li><li><p>&#8220;Provision a database&#8221;</p></li><li><p>&#8220;Update IAM permissions&#8221;</p></li><li><p>&#8220;Fix CI/CD pipeline&#8221;</p></li></ul><p>Each request requires:</p><ul><li><p>Context switching for DevOps</p></li><li><p>Manual validation and setup</p></li><li><p>Back-and-forth communication</p></li></ul><p>As request volume increases, lead time grows linearly. Deployment frequency drops, even if engineering capacity increases.</p><p>Industry reports from Atlassian and Puppet consistently show that a significant share of DevOps time is spent on maintenance and operational tasks rather than innovation.</p><h4>Fragmented CI/CD Pipelines</h4><p>Pipelines evolve organically per service or team:</p><ul><li><p>Different GitHub Actions / Jenkins configs</p></li><li><p>Inconsistent build, test, and deploy stages</p></li><li><p>No shared rollback or failure handling strategy</p></li></ul><p>This creates:</p><ul><li><p>Unpredictable deployment behavior</p></li><li><p>Difficult debugging across services</p></li><li><p>Lack of enforceable standards (security, testing, approvals)</p></li></ul><p>Without a unified pipeline abstraction, every service becomes a snowflake.</p><h4>Lack of Reusable Infrastructure Patterns</h4><p>Common components are repeatedly reimplemented:</p><ul><li><p>Service templates (API, worker, cron jobs)</p></li><li><p>Database provisioning patterns</p></li><li><p>Networking and service discovery setup</p></li></ul><p>Instead of reusable modules, teams copy-paste configs and modify them. Over time:</p><ul><li><p>Divergence increases</p></li><li><p>Bugs get duplicated</p></li><li><p>Upgrades become risky and inconsistent</p></li></ul><h4>Increasing Cognitive Load on Developers</h4><p>Developers are expected to handle:</p><ul><li><p>Kubernetes manifests or ECS task definitions</p></li><li><p>Networking (VPCs, subnets, security groups)</p></li><li><p>Secrets management and IAM roles</p></li><li><p>CI/CD configuration</p></li></ul><p>This leads to:</p><ul><li><p>Slower feature delivery</p></li><li><p>Higher onboarding time for new engineers</p></li><li><p>More production mistakes due to partial understanding</p></li></ul><p>At scale, this isn&#8217;t a skills issue, it&#8217;s a systems design issue.</p><h4>Poor Observability and Debugging Across Environments</h4><p>Monitoring and logging are often:</p><ul><li><p>Configured differently per service</p></li><li><p>Missing in non-production environments</p></li><li><p>Not tied to deployment events</p></li></ul><p>As a result:</p><ul><li><p>Failures are detected late</p></li><li><p>Root cause analysis takes longer</p></li><li><p>Teams rely on manual investigation instead of structured signals</p></li></ul><h4>The Core Pattern</h4><p>All of these issues point to the same underlying problem:</p><ul><li><p>Infrastructure is not standardized</p></li><li><p>Workflows are not repeatable</p></li><li><p>Systems depend on individuals instead of abstractions</p></li></ul><p>Until those are fixed, adding more DevOps engineers only increases the system&#8217;s coordination cost.</p><h2>Why Hiring More DevOps Doesn&#8217;t Solve It</h2><p>When infrastructure bottlenecks appear, the default response is to hire more DevOps engineers. It feels like a capacity problem. 
More requests, more people to handle them.</p><p>In reality, it&#8217;s a systems problem.</p><h4>Linear Scaling of an Operational Model</h4><p>As systems grow, the number of operational tasks increases rapidly:</p><ul><li><p>Provisioning infrastructure</p></li><li><p>Managing IAM roles and access</p></li><li><p>Maintaining CI/CD pipelines</p></li><li><p>Handling incidents and rollbacks</p></li></ul><p>Each new service or environment adds more surface area. But hiring increases capacity only linearly, while system complexity grows non-linearly.</p><p>This creates a persistent gap:</p><ul><li><p>Request volume keeps increasing</p></li><li><p>Backlogs grow despite hiring</p></li><li><p>Lead times for changes remain high</p></li></ul><h4>Increased Coordination Overhead</h4><p>Adding more DevOps engineers introduces more coordination layers:</p><ul><li><p>More handoffs between developers and DevOps</p></li><li><p>More communication required for each change</p></li><li><p>More dependencies across team members</p></li></ul><p>Instead of speeding up execution:</p><ul><li><p>Requests take longer to process</p></li><li><p>Context gets fragmented</p></li><li><p>Small changes require multiple touchpoints</p></li></ul><p>The system becomes slower not because of lack of effort, but because of increased coordination cost.</p><h4>Knowledge Silos and Operational Risk</h4><p>Infrastructure knowledge is often:</p><ul><li><p>Distributed across individuals</p></li><li><p>Built through experience rather than systems</p></li><li><p>Poorly documented or inconsistently applied</p></li></ul><p>As the team grows:</p><ul><li><p>Each engineer owns a subset of the system</p></li><li><p>Debugging requires multiple people</p></li><li><p>Onboarding new engineers takes longer</p></li></ul><p>This leads to:</p><ul><li><p>Slower incident resolution</p></li><li><p>Higher reliance on specific individuals</p></li><li><p>Increased operational risk</p></li></ul><h4>Inconsistent Practices at Scale</h4><p>Without a shared abstraction layer:</p><ul><li><p>Naming conventions differ</p></li><li><p>Configurations diverge</p></li><li><p>Deployment workflows vary across services</p></li></ul><p>Over time:</p><ul><li><p>Infrastructure becomes harder to reason about</p></li><li><p>Changes become riskier</p></li><li><p>Debugging becomes more expensive</p></li></ul><p>Every service starts behaving like its own system instead of part of a cohesive platform.</p><h4>DevOps Becomes a Gatekeeper Function</h4><p>In a request-driven model, DevOps becomes the checkpoint for:</p><ul><li><p>Deployments</p></li><li><p>Environment provisioning</p></li><li><p>Configuration updates</p></li></ul><p>This results in:</p><ul><li><p>Slower release cycles</p></li><li><p>Reduced developer autonomy</p></li><li><p>Bottlenecks during high-demand periods</p></li></ul><p>Even simple changes are delayed because they depend on a centralized team.</p><h4>The Structural Issue</h4><p>The core problem isn&#8217;t team size. It&#8217;s the operating model.</p><ul><li><p>Workflows are request-driven instead of self-service</p></li><li><p>Infrastructure is manually managed instead of abstracted</p></li><li><p>Systems depend on individuals instead of standardized platforms</p></li></ul><p>This pattern is widely observed across modern DevOps and platform engineering practices.</p><p>As teams scale, adding more DevOps engineers increases coordination overhead, fragments knowledge, and reinforces ticket-driven workflows instead of eliminating them. 
Without standardized, self-service systems, infrastructure complexity grows faster than the team managing it.</p><p>The result is predictable:</p><ul><li><p>Slower delivery</p></li><li><p>Higher operational overhead</p></li><li><p>Increasing cost without proportional gains in efficiency</p></li></ul><h2>How Internal Developer Platforms Solve This Structurally</h2><h4>What is an Internal Developer Platform?</h4><p>An Internal Developer Platform (IDP) is a centralized layer that standardizes infrastructure, deployment workflows, and operational practices, and exposes them as self-service tools that developers can use independently.</p><p>IDPs don&#8217;t just improve workflows, they replace the underlying operating model. Instead of scaling DevOps teams to handle growing complexity, they standardize infrastructure and expose it through self-service systems that developers can use directly.</p><p>This shifts the model from DevOps-driven execution to platform-enabled autonomy, where developers can provision environments, deploy services, and manage changes without relying on manual intervention.</p><p>For a deeper breakdown of how internal developer platforms are defined and where they fit in modern engineering teams, read<a href="https://blog.localops.co/p/what-is-an-internal-developer-platform-idp?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog"> What Is an Internal Developer Platform</a> blog.</p><h4>Self-Service Infrastructure via Environment-Based Provisioning</h4><p>IDPs move infrastructure from ad-hoc provisioning to standardized environment templates.</p><p>Each environment (development, staging, production) is provisioned using predefined configurations that include:</p><ul><li><p>Networking and access controls</p></li><li><p>Compute and storage resources</p></li><li><p>Container orchestration setup</p></li><li><p>Supporting services required to run applications</p></li></ul><p>These environments are created through reusable templates, not manual setup.</p><p>Developers don&#8217;t request infrastructure. 
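</p><p>To make the template idea concrete, here is a deliberately simplified Python sketch with made-up values: every stage is instantiated from the same definition, and only the fields the template explicitly exposes can differ between stages.</p><pre><code># Sketch: one template, every stage built from it. Drift cannot creep in
# through fields the template never exposes. Names and values are invented.

TEMPLATE = {
    "network": "private-subnets-with-nat",
    "orchestrator": "managed-kubernetes",
    "logging": "central-collector",
}

STAGE_OVERRIDES = {
    "development": {"node_count": 1},
    "staging": {"node_count": 2},
    "production": {"node_count": 6},
}

def environment_for(stage):
    env = dict(TEMPLATE)
    env.update(STAGE_OVERRIDES[stage])
    env["stage"] = stage
    return env

for stage in STAGE_OVERRIDES:
    print(environment_for(stage))
</code></pre><p>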
They create environments.</p><p>Result:</p><ul><li><p>No ticket-based provisioning</p></li><li><p>Identical environments across stages</p></li><li><p>Elimination of configuration drift</p></li></ul><h4>Push-to-Deploy with Standardized CI/CD</h4><p>Instead of maintaining separate pipelines for each service, IDPs provide centralized and reusable CI/CD workflows.</p><p>A typical flow:</p><ul><li><p>Connect repository</p></li><li><p>Select branch</p></li><li><p>Trigger build and deployment automatically on code push</p></li></ul><p>Pipelines are preconfigured with:</p><ul><li><p>Build and test stages</p></li><li><p>Deployment logic</p></li><li><p>Rollback strategies</p></li></ul><p>This ensures:</p><ul><li><p>Consistent deployment behavior across services</p></li><li><p>Reduced failure rates</p></li><li><p>Faster release cycles</p></li></ul><p>CI/CD becomes a platform capability rather than a team-level responsibility.</p><h4>Infrastructure Abstraction Without Losing Control</h4><p>IDPs introduce an abstraction layer over infrastructure.</p><p>Developers interact with simple actions such as:</p><ul><li><p>Create service</p></li><li><p>Deploy application</p></li><li><p>Scale workloads</p></li></ul><p>Behind the scenes, the platform handles:</p><ul><li><p>Resource provisioning</p></li><li><p>Container orchestration</p></li><li><p>Networking and permissions</p></li></ul><p>This creates a clear separation:</p><ul><li><p>Developers define intent</p></li><li><p>The platform executes it using standardized configurations</p></li></ul><p>At the same time, governance is preserved through built-in controls and policies.</p><h4>Built-in Observability and Operational Tooling</h4><p>Observability is often inconsistent across services in growing systems.</p><p>IDPs embed monitoring and logging into the platform by default:</p><ul><li><p>Centralized logging</p></li><li><p>Metrics collection</p></li><li><p>Preconfigured dashboards</p></li></ul><p>This leads to:</p><ul><li><p>Faster detection of issues</p></li><li><p>Easier debugging across environments</p></li><li><p>Consistent visibility across services</p></li></ul><p>Observability becomes a default capability, not an additional setup step.</p><h4>Eliminating DevOps Work Through Standardization</h4><p>In traditional setups, DevOps teams repeatedly:</p><ul><li><p>Write infrastructure configurations</p></li><li><p>Maintain deployment pipelines</p></li><li><p>Manage service-specific setup</p></li></ul><p>IDPs convert these into reusable system-level components:</p><ul><li><p>Standard service templates</p></li><li><p>Predefined deployment workflows</p></li><li><p>Shared infrastructure patterns</p></li></ul><p>Developers no longer need to manage these details, and DevOps doesn&#8217;t need to rebuild them for every service.</p><h4>From Ticket-Driven Ops to Platform Engineering</h4><p>The most important change is operational.</p><p>Before:</p><ul><li><p>DevOps operates through request-driven workflows</p></li><li><p>Every change requires manual intervention</p></li></ul><p>After IDP:</p><ul><li><p>Developers use self-service systems</p></li><li><p>Infrastructure and deployments are automated</p></li><li><p>DevOps focuses on building and improving the platform</p></li></ul><p>This marks the shift from reactive operations to platform engineering.</p><h4>The Structural Shift</h4><p>IDPs solve the root problem by changing how systems operate:</p><ul><li><p>From manual to automated</p></li><li><p>From fragmented to standardized</p></li><li><p>From request-driven to 
self-service</p></li></ul><p>Instead of adding more DevOps engineers to manage growing complexity, teams build systems that absorb that complexity once and apply it consistently across all services and environments.</p><p>This is what enables engineering teams to scale output without increasing operational overhead.</p><p>To understand how this works end-to-end, including environment setup, deployment flow, and infrastructure defaults, take a look at <a href="https://docs.localops.co/howitworks?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">how LocalOps IDP works</a>.</p><h2>Before vs After Internal Developer Platforms</h2><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/aEJy2/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f3d473e6-67d6-49dc-a34a-72df4b6f0ece_1220x1088.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/09931166-5b10-417e-bf87-8f72f9925576_1220x1088.png&quot;,&quot;height&quot;:547,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/aEJy2/2/" width="730" height="547" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><h2>Measurable Impact</h2><p>Shifting to an internal developer platform does not just reduce manual work. It shows up in metrics that engineering leaders actually track.</p><p><strong>Deployment frequency increases.</strong> When developers can ship without waiting on infrastructure setup or DevOps approval, release cycles shorten. Teams move from batching changes into infrequent releases to shipping smaller updates continuously.</p><p><strong>Lead time for changes drops.</strong> Environment provisioning that took days becomes self-service. A change that previously sat in a queue now goes from commit to deployed in hours.</p><p><strong>DevOps ticket volume falls.</strong> Routine requests, environment setup, access configuration, secrets management, service deployment, stop generating tickets. The DevOps team handles genuinely complex work instead of a backlog of repetitive tasks.</p><p><strong>New engineers ramp up faster.</strong> Onboarding stops depending on tribal knowledge. A new developer connects a repo, picks a branch, and deploys without needing someone to walk them through the infrastructure setup.</p><p><strong>Environment-related incidents reduce.</strong> Standardized environments mean staging behaves like production. Inconsistencies that only surface in production become rare because every environment is built from the same template.</p><p><strong>Rollbacks become predictable.</strong> Consistent deployment pipelines mean when something goes wrong, the rollback path is known and tested. 
There is no guessing which environment has which configuration.</p><h2>What to Look for in an Internal Developer Platform</h2><div id="datawrapper-iframe" class="datawrapper-wrap outer" data-attrs="{&quot;url&quot;:&quot;https://datawrapper.dwcdn.net/aEJy2/2/&quot;,&quot;thumbnail_url&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4532e8a2-b521-4443-b02d-7a8e12932cb6_1220x1088.png&quot;,&quot;thumbnail_url_full&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9135257-f6b1-4298-91c4-b09c060713d1_1220x1088.png&quot;,&quot;height&quot;:547,&quot;title&quot;:&quot;Created with Datawrapper&quot;,&quot;description&quot;:&quot;&quot;}" data-component-name="DatawrapperToDOM"><iframe id="iframe-datawrapper" class="datawrapper-iframe" src="https://datawrapper.dwcdn.net/aEJy2/2/" width="730" height="547" frameborder="0" scrolling="no"></iframe><script type="text/javascript">!function(){"use strict";window.addEventListener("message",(function(e){if(void 0!==e.data["datawrapper-height"]){var t=document.querySelectorAll("iframe");for(var a in e.data["datawrapper-height"])for(var r=0;r<t.length;r++){if(t[r].contentWindow===e.source)t[r].style.height=e.data["datawrapper-height"][a]+"px"}}}))}();</script></div><p>If you want to see how these criteria map to a real implementation, you can explore it with the LocalOps team by <a href="https://cal.com/anand-localops/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">booking a demo </a>or <a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">trying it out yourself for free</a>.</p><h2>Common Mistakes Teams Make</h2><p>One of the most common mistakes is introducing an internal developer platform without clearly defining the problems it should solve. Teams adopt or build a platform, but continue operating the same way, so the underlying bottlenecks remain.</p><p>Another issue is not driving adoption across teams. Even a well-designed platform fails if developers continue using old processes. If it&#8217;s not clearly better, faster, and easier, it won&#8217;t be used.</p><p>Many teams also skip proper standardization. They introduce a platform but still allow multiple patterns for deployments, environments, and configurations. This brings back the same inconsistency the platform was meant to eliminate.</p><p>A frequent mistake is focusing only on infrastructure and ignoring developer experience. In platform engineering, the goal is not just automation, but enabling developers to move faster with less friction. Without that, even the best internal developer platform fails in practice.</p><p>As teams start scaling, many begin thinking about how to build an internal developer platform internally. This often leads to trying to solve too many problems at once or building for hypothetical future needs. 
Instead of reducing complexity, the effort shifts into maintaining the platform itself.</p><p>Building can make sense in specific cases, but during the scaling phase, it introduces additional overhead:</p><ul><li><p>Time spent designing and maintaining internal tooling</p></li><li><p>Slower time to value while the platform is still evolving</p></li><li><p>Ongoing effort required to keep workflows and integrations up to date</p></li></ul><p>This is why teams evaluating the best platform for internal developer experience often prioritize faster adoption and standardization over building everything from scratch.</p><p> If your team is weighing this decision,<a href="https://blog.localops.co/p/internal-developer-platform-build-vs-buy-cost-comparison?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog"> here is a detailed breakdown of what building vs adopting an internal developer platform actually costs</a>.</p><p>Some teams also don&#8217;t define clear ownership. Without a dedicated team responsible for maintaining and improving the platform, it becomes inconsistent over time.</p><p>There&#8217;s also a tendency to overcomplicate workflows by adding too many steps, approvals, or abstractions, which recreates the same friction the platform was meant to remove.</p><h2>FAQs</h2><p><strong>1. Is an open source internal developer platform or a managed IDP better for a growing SaaS company?</strong></p><p>Open source tools like the Backstage internal developer platform give you flexibility but the build and maintenance cost sits entirely with your team. Backstage covers the portal layer. You still need separate tooling for provisioning, CI/CD, secrets, and observability. Integrating and maintaining that stack requires dedicated platform engineering capacity most growing SaaS teams do not have.</p><p>The complexity is not upfront. It compounds. Every upgrade, patch, and new service type adds more platform team work. Without dedicated ownership the stack drifts, which defeats the standardization it was meant to create.</p><p>A managed IDP comes pre-integrated and maintained by the vendor. For teams between 15 and 60 engineers, that tradeoff usually makes more sense.</p><p><strong>2. What does an internal developer platform architecture include?</strong></p><p>An internal developer platform sits on top of your cloud infrastructure and abstracts it into layers developers can use directly. Those layers are infrastructure provisioning (environments, networking, compute), a deployment layer triggered by git push, service configuration (secrets, environment variables, custom domains), role-based access control across environments, and observability covering logs, metrics, and alerting.</p><p>In a well-built IDP these are not separate tools the platform team wires together. They come pre-integrated. A developer creates a service and gets all of it by default.</p><p><strong>3. How is platform engineering related to internal developer platforms?</strong></p><p>Platform engineering and internal developer platforms go hand in hand. Platform engineering is the practice. An internal developer platform is the output.</p><p>Platform engineering teams design systems that reduce infrastructure friction for developers. The IDP is what those systems look like in practice. It packages provisioning, deployments, and environment management into self-service workflows developers can use without understanding what runs underneath.</p><p><strong>4. 
Internal Developer Portal vs Platform: What is the difference?</strong></p><p>A portal is a catalog. It gives developers a place to find services, documentation, and tooling that already exists. The Backstage internal developer platform is the most common example.</p><p>A platform provisions and manages the infrastructure itself. The difference matters because a portal does not remove manual work. It organizes it. A platform automates it.</p><p><strong>5. Does an internal developer platform deploy on your own cloud account?</strong></p><p>Yes. For cloud providers like AWS, an internal developer platform provisions infrastructure directly inside your own account, not on shared infrastructure managed by the vendor. The VPCs, Kubernetes clusters, IAM roles, databases, and compute resources all live in your account and are billed to you by the cloud provider. This matters for a few reasons. Your data stays within your own cloud boundary. You retain full visibility and control over the underlying infrastructure. And if you ever need to move away, the infrastructure is already yours.</p><p>For growing SaaS teams this also covers enterprise customer requirements. When a customer needs a dedicated deployment in their own cloud account, the internal developer platform provisions it there using the same templates. No custom work per customer, no separate DevOps project, same process regardless of whose account it runs in.</p><h2>Take Away</h2><p>As SaaS teams grow, the real challenge is not writing more code, it&#8217;s managing the increasing complexity of infrastructure, environments, and deployments.</p><p>Relying on hiring more DevOps engineers might work temporarily, but it doesn&#8217;t solve the underlying problem. It adds coordination overhead, slows down workflows, and makes systems harder to manage over time.</p><p>The shift is not about scaling teams. It&#8217;s about scaling systems.</p><p>Internal developer platforms enable this shift by standardizing infrastructure, automating workflows, and making them accessible through self-service. Instead of depending on a few people to manage complexity, teams build systems that handle it consistently across every service and environment.</p><p>Platform engineering and internal developer platforms go hand in hand in making this possible. Together, they reduce cognitive load, improve developer experience, and allow teams to move faster without compromising reliability.</p><p>For growing teams, the goal is simple: remove friction, not add more layers to manage it.</p><p>Not sure where to start? The LocalOps team can help you figure out what fits your setup:</p><p><strong><a href="https://cal.com/anand-localops/tour?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">Book a Demo</a> &#8594;</strong> Walk through how environments, deployments, and AWS infrastructure are handled in practice for your setup.</p><p><strong><a href="https://console.localops.co/signup?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">Get started for free</a> &#8594;</strong> Connect an AWS account and stand up an environment to see how it fits into your existing workflow.</p><p><strong><a href="https://docs.localops.co/?utm_source=substack&amp;utm_medium=content&amp;utm_campaign=idp_blog">Explore the Docs</a> &#8594;</strong> A detailed breakdown of how LocalOps works end-to-end, including architecture, environment setup, security defaults, and where engineering decisions still sit.</p>]]></content:encoded></item></channel></rss>