How to Provision a Production-Grade AWS Environment Without Writing Terraform
What a production-grade AWS environment actually requires, and why an internal developer platform is a better alternative to owning Terraform at scale.
If you are running a 15 to 50-person engineering team, there is a good chance someone on your team is maintaining Terraform instead of building product. Not because they want to. Because someone has to.
Spinning up a production-grade environment on AWS is not a weekend project. The networking alone — VPC, subnets, NAT gateway, security groups — takes time to get right. Add EKS, IAM roles, secrets management, observability, and per-environment configuration, and you are looking at several days to a week of infrastructure work before the first feature deploys. Each new environment after that adds incremental overhead and increases the risk of configuration drift between staging and production.
At some point, the question stops being “how do we configure this correctly” and starts being “why is this our problem to own.”
Internal developer platforms exist to answer that question. This post covers what a production-grade AWS environment actually requires, what it costs to maintain that setup in-house across environments, and how an IDP changes the tradeoffs for engineering teams that would rather be building.
TL;DR
A production-grade AWS environment needs more than most teams plan for: a full networking stack (VPC, private and public subnets, NAT gateway), a managed Kubernetes cluster (EKS), compute nodes, a load balancer, IAM, secrets management, encryption, observability, and backup configuration.
Setting up this stack in Terraform typically takes several days to a week depending on complexity and team experience. The second and third environments are where state management overhead and configuration drift start compounding.
Every environment your team cannot spin up quickly is a bottleneck: slower staging, slower customer onboarding, slower incident isolation.
Internal developer platforms handle the provisioning layer so application developers do not have to. Developers get self-serve environments, GitHub-push deployments, and AWS resource access without managing infrastructure code directly.
What a Production-Grade AWS Environment Actually Needs
Most teams start with a rough mental model: a cluster, a database, some networking. The actual list of what constitutes a production-grade AWS environment is longer, and the gaps between what teams think they need and what they actually need tend to surface at the worst possible time.
Here is what a complete setup requires.
Networking
A dedicated VPC with a defined CIDR block. Private subnets for application workloads, with no public IP on application nodes. Public subnets for ingress components like load balancers. A NAT gateway with an Elastic IP in the public subnet so private subnet nodes can make outbound calls for pulling container images and reaching external APIs. An Internet gateway attached to the VPC. For private clusters, VPC endpoints for services like STS, ECR, and S3 are often required for integrations to work reliably.
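Using the community VPC module the article discusses later, the networking baseline sketches out roughly like this (CIDR ranges, AZs, and names are illustrative placeholders, not a recommendation):

```hcl
# Sketch of the networking baseline with terraform-aws-modules/vpc.
# All values below are placeholders.
module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 5.0"

  name = "prod"
  cidr = "10.0.0.0/16"

  azs             = ["us-east-1a", "us-east-1b", "us-east-1c"]
  private_subnets = ["10.0.1.0/24", "10.0.2.0/24", "10.0.3.0/24"]
  public_subnets  = ["10.0.101.0/24", "10.0.102.0/24", "10.0.103.0/24"]

  # One NAT gateway (with an Elastic IP) so private-subnet nodes can
  # pull container images and reach external APIs.
  enable_nat_gateway = true
  single_nat_gateway = true
}
```

Even this compact sketch hides decisions: subnet sizing, AZ count, and whether one NAT gateway per AZ is worth the extra cost for availability.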
Compute and orchestration
A managed Kubernetes cluster (EKS), where AWS runs the control plane (API server, etcd, scheduler) across multiple availability zones. Worker nodes run as EC2 instances in private subnets, attached to the cluster as a node group, with EBS volumes. Inbound traffic reaches workloads through an ALB or NLB provisioned via Kubernetes ingress or a Service of type LoadBalancer, not at the cluster level directly.
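That last point, traffic entering through a Service rather than at the cluster level, looks like this in raw Kubernetes terms (names and ports are illustrative):

```yaml
# A Service of type LoadBalancer. On EKS this provisions an AWS load
# balancer in the public subnets and routes traffic to pods on the
# private-subnet nodes. Names and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  type: LoadBalancer
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080
```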
IAM
Node instance profiles for baseline EC2 permissions. IRSA (IAM Roles for Service Accounts) for pod-level AWS access, using OIDC-based access control so individual pods get scoped permissions rather than sharing the node role. Least-privilege policies per resource. Getting this wrong is how production incidents happen.
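In raw Kubernetes terms, IRSA hinges on an annotated service account (the account ID and role name below are placeholders; the role must already exist and trust the cluster's OIDC provider):

```yaml
# IRSA: pods running under this service account assume the annotated IAM
# role, scoped to what the service needs, instead of sharing the node role.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: api
  namespace: default
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/api-s3-reader
```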
Secrets management
AWS Secrets Manager or SSM Parameter Store for credentials and environment-specific config. AWS’s own guidance for EKS uses IRSA and OIDC for Secrets Manager access from pods, so this ties directly to the IAM setup above.
Security
Encryption at rest on EBS volumes and managed data stores. Security groups in AWS are stateful allow rules, so the practical goal is tightly scoped ingress, minimal egress, and private-only access for databases and caches. Cluster endpoint access control to limit who can reach the Kubernetes API.
Observability
Prometheus for cluster and node metrics, Loki or CloudWatch for log aggregation, Grafana as the unified interface. Without this, debugging a live incident means guessing. AWS supports Prometheus-based metrics collection through Container Insights, but it still requires setup.
Backup and recovery
Automated backups for managed data stores with defined retention. AWS Backup can cover EKS cluster state and persistent storage, but a backup without a tested restore procedure is not a recovery plan.
Autoscaling and node lifecycle
Cluster autoscaler or Karpenter for node scaling. A defined strategy for node upgrades, because EKS minor version support windows are finite and upgrades require planning.
The sum of this is not exotic. It is the baseline. But it is also a significant number of infrastructure concerns that need to be correctly configured and kept consistent across every environment you run: production, staging, preview, and any customer-dedicated environments.
That gap between what teams plan for and what production actually requires is where most of the infrastructure overhead lives.
See what’s inside a LocalOps environment
The Real Cost of Owning Terraform at a Scaling Startup
The Terraform problem at a scaling startup is not that Terraform is hard. It is that owning it has a compounding cost that most teams only feel after they are already committed to it.
The First Environment Is Manageable
Community modules like terraform-aws-modules/vpc and terraform-aws-modules/eks handle a lot of the scaffolding. A senior engineer who knows what they are doing can get a well-configured first environment running in several days to a week. The work is real but it is bounded.
The Second Environment Is Where It Gets Complicated
Each environment needs its own state file. Teams typically end up with either Terraform workspaces, which share a single backend and codebase and make it easy to run an apply against the wrong environment, or a directory structure like environments/prod and environments/staging where configuration is duplicated and diverges over time. Staging drifts from production. Nobody fully documents what changed or why. The engineer who built the original setup has usually moved on to other work by the time environment three is needed.
State Management Fails Quietly
Two engineers running terraform apply concurrently without remote state locking (S3 + DynamoDB) can corrupt state. Recovering from corrupted state means manual terraform state mv and terraform import operations. It is recoverable, but it takes time and focus that should be going elsewhere.
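The standard remedy is a remote backend with locking. A minimal sketch (bucket and table names are placeholders):

```hcl
# Remote state in S3 with a DynamoDB lock table, so concurrent applies
# block instead of corrupting state. Names are placeholders.
terraform {
  backend "s3" {
    bucket         = "acme-terraform-state"
    key            = "environments/prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"   # keyed on LockID (string)
  }
}
```

This is one more piece of infrastructure, the state bucket and lock table themselves, that someone has to provision and protect before the first environment exists.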
EKS Add-On Management Is a Separate Problem
Managed add-ons like CoreDNS, kube-proxy, VPC CNI, and the EBS CSI driver have independent version lifecycles from the EKS module. They fall behind cluster version requirements in ways that cause node registration failures or pod networking issues that are hard to diagnose if you have not seen them before.
Observability Is Almost Always Deferred
Terraform provisions the cluster. It does not install or configure Prometheus, Loki, or Grafana. That requires a separate Helm chart workflow. Most teams put it on the backlog and add it later under pressure, usually after a live incident makes it unavoidable.
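That separate workflow, in its shortest form, is a handful of Helm releases (chart choices here are one common combination, not the only one):

```shell
# The Helm workflow cluster provisioning does not cover.
# kube-prometheus-stack bundles Prometheus and Grafana; Loki is its own chart.
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
helm install loki grafana/loki --namespace monitoring
```

The commands are short; the ongoing work is in the values files, dashboards, retention settings, and chart upgrades behind them.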
The Actual Cost Is Not the Setup Time
It is the ongoing maintenance tax: keeping environments consistent, managing module version upgrades, handling state issues, and making sure the one person who understands the full setup is not a single point of failure. At a 20 to 40-person company, that tax comes directly out of product velocity.
A team with a dedicated platform engineer can run this well. Most scaling startups do not have that luxury.
For teams that go further, building an internal developer platform in-house adds another layer of cost on top of infrastructure: hiring for it, maintaining it, and keeping it current.
That investment is hard to justify when the underlying goal is simply to ship product reliably on AWS.
What Is an Internal Developer Platform and How Does It Change the Provisioning Model?
An internal developer platform is a layer that sits between application developers and cloud infrastructure. It owns the provisioning model, enforces production defaults, and gives developers a self-service interface that does not require infrastructure knowledge to use.
The term gets used loosely, so it is worth being precise about what an IDP actually does versus what it does not do.
What an IDP Is Not
The internal developer portal vs platform distinction is a common source of confusion. A portal gives teams a UI to browse services, view documentation, and track ownership. Backstage is a good example: it is a developer portal framework, not a platform. It does not provision infrastructure. Building a full IDP on top of Backstage means writing and maintaining plugins for environment provisioning, CI/CD, cloud resource management, and observability. That is a significant internal build investment, not a solved problem.
An IDP is also not a PaaS wrapper that puts a UI on top of managed services and abstracts away the cloud account. The distinction matters for engineering leaders: a real IDP provisions into your own cloud account, so you retain ownership of the infrastructure, the data, and the billing relationship with AWS.
What an IDP Actually Does
It owns a set of production-hardened provisioning templates and creates environments idempotently from those templates. Any engineer on the team can spin up a new environment without understanding VPC design, EKS configuration, or IAM. The platform handles what those things map to in AWS.
This is where platform engineering and internal developer platforms intersect. The goal is not to eliminate infrastructure complexity but to centralise ownership of it.
The critical distinction is between shallow and deep abstraction. A shallow abstraction wraps Terraform in a UI and still requires someone who understands the underlying resource model to maintain it. A deep abstraction owns the resource model entirely, enforces production defaults, and exposes only what developers need to configure.
What a Production-Grade IDP Provisions on AWS
When connected to AWS, an internal developer platform provisions a complete, exclusive infrastructure stack: a dedicated VPC, private and public subnets, a NAT gateway, an Internet gateway, a managed Kubernetes cluster, compute node groups with attached storage, a load balancer for inbound traffic, and a pre-configured observability suite. Each environment gets its own isolated stack. There is no shared multi-tenant infrastructure between environments.
This gives you network, data, and compute isolation by default, which matters for teams running staging environments alongside production, or operating customer-dedicated deployments.
The provisioning happens without Terraform files to write, without state to manage, and without IAM roles to configure manually. Your team connects an AWS account, creates an environment, and the platform handles the provisioning from there.
How IDPs Give Developers Self-Serve Access to AWS Without Exposing Infrastructure Complexity
Once an environment exists, application teams still need to declare what cloud resources their services depend on: databases, caches, queues, object storage. This is where most platforms fall short. They handle the networking layer but leave AWS service dependencies to the developer to figure out.
A well-built IDP solves this through a declarative service configuration file committed to the application repository. Developers declare what their service needs. The platform provisions it within predefined guardrails, wires the permissions, and injects the connection details at runtime. Without requiring routine AWS console access. Without needing to manually write IAM policies in most cases. Without credentials to rotate manually.
Here is what that looks like in practice.
Cloud Resource Dependencies
A developer declares an S3 bucket, an RDS instance, an ElastiCache cluster, an SNS topic, or an SQS queue directly in the service config file. The platform provisions these resources inside the environment’s VPC, configures encryption at rest, sets private-only network access, enables monitoring, and turns on automated backups. The IAM policies required to access each resource are generated automatically and attached to the pre-configured role the service containers run with.
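As a sketch only (every field name below is invented for illustration and is not taken from any platform's actual schema), the shape of such a declaration is a short dependency list:

```yaml
# Hypothetical service config. Field names are illustrative placeholders;
# the point is the shape: a declaration of dependencies, not provisioning code.
service:
  name: api
  resources:
    - type: s3_bucket
      name: uploads
    - type: rds_postgres
      name: primary
      instance_class: db.t4g.medium
    - type: sqs_queue
      name: jobs
```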
This replaces a non-trivial amount of manual work. Wiring IAM for S3 access from EKS pods manually involves setting up an OIDC identity provider, creating an IAM role with a trust policy, annotating the Kubernetes service account, and mounting it correctly in the pod spec. The declarative config approach reduces this to a few lines that any developer can write without knowing what is happening underneath.
See how LocalOps provisions cloud resources
Environment Variable Injection
Resource connection details (ARNs, endpoints, hostnames, and ports) are injected automatically as environment variables into the service containers. Application code reads from environment variables at runtime. There are no credentials stored in code, no secrets to manually sync across environments, and no risk of a developer hardcoding a production database URL in a staging config file.
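On the application side, consuming injected configuration is just reading the process environment. A minimal sketch in Python (the variable names are hypothetical placeholders; actual names depend on the platform):

```python
import os

def database_url() -> str:
    """Build a connection string from platform-injected environment
    variables. Variable names here are hypothetical placeholders."""
    user = os.environ.get("DB_USER", "app")
    password = os.environ.get("DB_PASSWORD", "")
    host = os.environ.get("DB_HOST", "localhost")
    port = os.environ.get("DB_PORT", "5432")
    name = os.environ.get("DB_NAME", "app")
    return f"postgresql://{user}:{password}@{host}:{port}/{name}"
```

The same code runs unchanged in staging and production; only the injected values differ.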
Database Migrations and Init Jobs
Schema migrations run as init containers before the main service starts, in the correct order, with the option to run once across all pods rather than once per pod. This behavior is not straightforward to implement correctly in Kubernetes without additional logic. The developer specifies the command. The platform handles the execution model.
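In raw Kubernetes terms, the per-pod part of this is an init container (image and command below are placeholders):

```yaml
# Fragment of a pod spec: the migration runs to completion before the app
# container starts. Note that init containers run once per pod; running a
# migration exactly once across all pods needs extra coordination (e.g. a
# Job plus gating logic), which is the non-trivial part the platform owns.
spec:
  initContainers:
    - name: migrate
      image: registry.example.com/api:latest
      command: ["./bin/migrate", "up"]
  containers:
    - name: api
      image: registry.example.com/api:latest
```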
Health Checks
HTTP, TCP, gRPC, or shell-based health checks with configurable failure thresholds and automatic container restarts. This maps to Kubernetes liveness and readiness probes under the hood, but the developer does not need to know that or write a Helm chart to configure it.
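The probes a health-check declaration maps to look like this (path, port, and thresholds are illustrative):

```yaml
# Fragment of a container spec: liveness restarts an unhealthy container,
# readiness gates whether it receives traffic. Values are placeholders.
livenessProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 10
  failureThreshold: 3
readinessProbe:
  tcpSocket:
    port: 8080
  initialDelaySeconds: 5
```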
Preview Environment Dependencies
For pull request preview environments, the service config supports a separate block for ephemeral dependencies: Postgres, MySQL, Redis, Memcache, or RabbitMQ instances that spin up per PR and are torn down when the PR closes. These run as lightweight in-cluster containers rather than managed AWS services, suited for faster, lower-cost testing rather than production parity. Production resources are explicitly excluded from preview environments via config flags, so there is no risk of a preview service accidentally pointing at production data.
The net result for engineering leaders is that developers can configure, deploy, and connect AWS-backed services without infrastructure knowledge, without routine AWS console access, and without waiting on a platform team to provision resources for them. The self-service model works because the platform owns the complexity that would otherwise require infrastructure expertise.
How Do You Let Developers Deploy to AWS With Just a GitHub Push?
The deployment model is where an IDP either earns its place or exposes its limits. Self-serve infrastructure means nothing if shipping code still requires a platform engineer to be in the loop.
A well-built IDP connects directly to the application repository. A developer pushes to a configured branch. The platform pulls the latest commit, builds the container image, and deploys it to the Kubernetes cluster automatically, without requiring developers to write or maintain Dockerfiles, Helm charts, or CI/CD pipelines directly.
For most developers, the primary interaction with the deployment system becomes a git push.
How the Deployment Pipeline Actually Works
The platform builds the container image from the application source, pushes it to a registry, and schedules the new version on the cluster. Health checks run against the new containers before traffic is gradually shifted. If they fail, the deployment does not proceed and the previous version continues running.
This is not a novel concept. It is how deployment should work for application teams. The reason it is worth stating explicitly is that most teams building on raw AWS end up owning a significant amount of CI/CD configuration to get here: GitHub Actions workflows, ECR push credentials, EKS deploy steps, rollback logic. Each of those is a maintenance surface.
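For contrast, a trimmed sketch of the workflow such teams typically maintain themselves (action versions are real, but cluster names, regions, and secrets are placeholders, and real pipelines add caching, tagging, and rollback on top):

```yaml
# Hand-rolled deploy pipeline: the maintenance surface an IDP absorbs.
name: deploy
on:
  push:
    branches: [production]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.DEPLOY_ROLE_ARN }}
          aws-region: us-east-1
      - run: |
          aws ecr get-login-password | docker login --username AWS \
            --password-stdin "$ECR_REGISTRY"
          docker build -t "$ECR_REGISTRY/api:$GITHUB_SHA" .
          docker push "$ECR_REGISTRY/api:$GITHUB_SHA"
      - run: |
          aws eks update-kubeconfig --name prod-cluster
          kubectl set image deployment/api api="$ECR_REGISTRY/api:$GITHUB_SHA"
```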
Branch-Based Environment Mapping
Each service in an environment typically maps to a specific GitHub branch. Push to the staging branch and the staging environment updates. Push to the production branch and production updates. Different team members can work on separate branches mapped to separate environments without stepping on each other.
For pull requests, a full-stack preview environment can be provisioned automatically per PR, with its own ephemeral dependencies, and torn down on merge. Developers get an isolated environment for every PR without any manual setup or infrastructure request.
For Teams Migrating From PaaS
Teams moving from Heroku, Render, Vercel, or Fly.io are often already used to push-to-deploy workflows. The gap they hit when moving to raw AWS is that the deployment simplicity disappears and gets replaced with infrastructure work. An IDP preserves that deployment experience while moving the actual infrastructure to a dedicated AWS account that the team owns and controls.
Moving to an IDP significantly reduces the amount of re-platforming work by preserving the push-to-deploy experience while shifting infrastructure complexity away from developers and onto the platform.
What Happens When You Need Custom Infrastructure Beyond the Platform?
No platform covers every infrastructure requirement a growing engineering team will have. At some point, a service needs a resource the platform does not natively support: a DynamoDB table, an MSK cluster, a custom VPC endpoint, a third-party data pipeline. This is the lock-in question every engineering leader should ask before committing to any IDP.
The answer depends entirely on whether the platform gives you the primitives to extend it.
Extending the Environment With Custom IaC
A well-built IDP exposes the underlying infrastructure identifiers for every environment it provisions: the VPC ID, subnet IDs, and resource tags. With those, a team can write Terraform or Pulumi scripts that provision additional resources within the same VPC and private subnets, privately accessible from application containers, without disturbing anything the platform manages.
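A sketch of what that extension looks like, taking the exposed VPC ID as an input (all values are placeholders):

```hcl
# Custom resources alongside a platform-managed environment, referencing
# identifiers the platform exposes. Values are placeholders.
variable "vpc_id" {}   # exposed by the platform per environment
variable "region" { default = "us-east-1" }

# A gateway endpoint so in-VPC workloads reach DynamoDB privately.
resource "aws_vpc_endpoint" "dynamodb" {
  vpc_id       = var.vpc_id
  service_name = "com.amazonaws.${var.region}.dynamodb"
}

resource "aws_dynamodb_table" "events" {
  name         = "events"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "id"

  attribute {
    name = "id"
    type = "S"
  }
}
```

The team keeps this stack in its own state, separate from anything the platform manages.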
The custom resources sit cleanly alongside the platform-managed infrastructure. They share the same network boundary. Application services can reach them without any additional networking configuration.
The lifecycle of those custom resources is the team’s responsibility. The platform does not import or manage them. If the team deletes the environment, they need to tear down their custom Terraform stack first, before the platform removes the VPC and underlying networking. That is a reasonable tradeoff, not a gotcha.
Taking Full Ownership of the Infrastructure
Beyond extending environments, a trustworthy IDP documents a full eject path: the ability to take complete ownership of the provisioned AWS infrastructure and move it outside the platform entirely.
This matters for two reasons. First, it tells you something about how the platform is designed. A platform confident enough to document a clean exit is one that expects to earn continued use rather than lock customers in. Second, it de-risks the adoption decision. Committing to an IDP is easier when the downside scenario is controlled rather than open-ended.
How Often This Actually Comes Up
For most engineering teams, the custom infrastructure question comes up less often than anticipated. The declarative service config model handles the majority of AWS resource dependencies that application services need. The cases that genuinely require custom IaC tend to be specific integrations or compliance requirements that are well-defined enough to manage separately.
The right question to ask is not whether the platform handles everything. It is whether the escape hatch is clean enough to use when you need it.
Internal Developer Platform Architecture
An IDP that handles AWS environment provisioning operates across three layers. Knowing where each layer sits tells you what you are delegating to the platform and what you are keeping.
The Infrastructure Layer
VPC, subnets, NAT gateway, EKS cluster, EC2 nodes, EBS volumes, load balancer, observability stack. The platform provisions these into your AWS account using internal templates. Your account gets billed. You own the resources. The platform manages their configuration and lifecycle.
This is not hosting. The infrastructure runs in your account, not the vendor’s.
The Platform Layer
Kubernetes orchestration, container scheduling, service deployment, health check enforcement, auto-scaling, auto-healing. This layer runs on top of the infrastructure. Developers do not touch it. The platform operates it.
This is also the layer that costs the most to build and maintain in-house. Teams that roll their own IDP on top of Backstage or raw Kubernetes end up owning this in full, which means hiring for it, maintaining it, and being on call for it.
The Developer Layer
A service configuration file in the application repo, a GitHub integration, and a dashboard. This is the only surface developers interact with. They declare what their service needs, push code, and check deployment status. Everything below is handled by the platform.
For the Build vs Buy Decision
Building in-house means owning all three layers. Using a platform means owning the developer layer, retaining the infrastructure layer through your AWS account, and handing the platform layer to the vendor.
The platform layer is the one most teams do not want to own. It requires Kubernetes expertise, on-call coverage, and ongoing maintenance that has nothing to do with the product being built. That is the core of the IDP value proposition, stated plainly.
See how LocalOps handles shared responsibilities
FAQ
1. What happens to our existing Terraform setup if we move to an IDP?
You do not have to throw it away. A well-built IDP provisions and manages the core infrastructure layer: VPC, subnets, EKS cluster, observability. Any custom resources your team has already built in Terraform can continue to run alongside the platform, as long as they sit within the same VPC. The IDP exposes VPC IDs, subnet IDs, and resource tags so your existing Terraform scripts can reference them directly. The practical split is: the platform owns the environment baseline, your team owns anything custom on top of it.
2. Can we trust the platform to enforce security defaults we would otherwise configure ourselves?
This depends on how the platform is built. A production-grade AWS internal developer platform should enforce encryption at rest on all storage, private-only network access for databases and caches, least-privilege IAM policies per resource, and security group rules with tightly scoped ingress by default. These should not be optional configurations. They should be on by default for every environment the platform provisions, without requiring your team to audit or maintain them manually.
3. What is the difference between an open source internal developer platform and a commercial one?
Open source options like Backstage are portal frameworks, not full platforms. They provide a service catalog and developer UI but the provisioning and operational layers still need to be built on top. That is a significant internal build investment. A commercial internal developer platform includes the full stack out of the box: infrastructure provisioning, Kubernetes orchestration, observability, and CI/CD. The tradeoff is customisability versus the ongoing cost of building and maintaining the platform layer yourself.
4. How do you choose the best internal developer platform for your AWS environment?
A few things worth evaluating. Does it provision into your own AWS account or the vendor’s? You want to retain ownership of the infrastructure and the billing relationship. How deep is the abstraction? A platform that wraps Terraform in a UI still requires someone to understand the underlying resource model. What is the escape hatch when you need custom infrastructure? And does observability come pre-configured or is it something your team sets up separately? The best internal developer platforms answer all four of these cleanly.
5. Can Terraform and an IDP coexist in the same AWS account?
Yes, and for most teams they should. An IDP handles the environment baseline that every team needs: networking, compute, Kubernetes, observability. Terraform handles the exceptions: custom integrations, compliance-specific resources, or anything outside the platform’s native support. The key is that the IDP exposes the infrastructure primitives (VPC ID, subnet IDs) so custom Terraform resources sit inside the same network boundary and are privately accessible from application services without additional configuration.
Conclusion
Most engineering leaders do not set out to own a complex infrastructure setup. It tends to happen incrementally: one environment becomes two, staging drifts from production, the engineer who built the original setup moves on, and maintaining it becomes a background cost that nobody explicitly budgeted for.
The IDP model does not make infrastructure disappear. The VPC still exists. The EKS cluster still runs. IAM policies still govern access. What changes is who is responsible for configuring and maintaining those things, and whether that responsibility sits with the people building the product or with a platform built specifically to handle it.
The tradeoff is real. You give up some direct control over the infrastructure layer in exchange for not having to maintain it yourself. Whether that is the right call depends on your team size, your infrastructure requirements, and how much engineering capacity you have available for platform work.
For teams with a dedicated platform engineer who wants to own this, building and maintaining the stack in-house is a reasonable path. For teams without that, an IDP is worth evaluating not because it solves every infrastructure problem, but because it solves the provisioning and environment management problem well enough that the rest of the team can stay focused on the product.
If you’re figuring out how this would fit into your setup, the LocalOps team can help you work through it:
Book a Demo -- Walk through how environments, deployments, and AWS infrastructure are handled in practice for your setup.
Get started for free -- Connect an AWS account and stand up an environment to see how it fits into your existing workflow.
Explore the Docs -- A detailed breakdown of how LocalOps works end-to-end, including architecture, environment setup, security defaults, and where engineering decisions still sit.