It feels daunting, does it not? Infrastructure-as-code promises to provide consistent, reliable, repeatable, and automated infrastructure provisioning and management.
If you look at some of these tools out there, it can be quite a lot to take in. Can we get started with a reasonable effort?
In this list, there is a mix of tools. It includes both versatile and more simplistic tools, tools with lower-level details, as well as higher-level abstractions. Find out more about these tools, and if they may be a good fit for you!
My focus here is on AWS cloud infrastructure. That means some tools are only applicable to AWS, while others work across multiple cloud providers. In practice, you likely do not have everything in AWS.
The size and structure of your organization will likely play a role as well.
With that, let us go to the list of seven tools you can use to implement infrastructure-as-code.
The grand old fellow, CloudFormation
CloudFormation is the oldest of the services and tools in this list. It was launched in 2011 and has thus been around for more than a decade. In CloudFormation you describe the AWS resources you want to provision through declarative configuration statements, in either JSON or YAML format. YAML tends to be more common these days since it is generally easier for humans to read.
You group these configuration statements of AWS resources into collections called templates.
CloudFormation’s resource descriptions are somewhat low-level: they map pretty much 1-to-1 to the underlying service APIs that AWS provides to manage resources. Thus it generally requires a fair amount of understanding of how various AWS resources fit together.
In general, CloudFormation only manages resources that were created by CloudFormation. It is possible to import resources created by other means, such as the AWS Console, but it is not very convenient.
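To make the template idea a bit more concrete, here is a minimal sketch that builds a template as a plain Python dictionary and prints it as JSON. The logical ID ExampleBucket and the versioning setting are illustrative assumptions of mine; the point is only the shape of a template, where each resource declares a Type and its Properties.

```python
import json

# Hypothetical logical ID and settings, chosen only for illustration.
template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Minimal example: one S3 bucket with versioning enabled",
    "Resources": {
        "ExampleBucket": {
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        }
    },
}

# Serialize the template; the YAML form would carry the same structure.
template_json = json.dumps(template, indent=2)
print(template_json)
```

The printed JSON could be saved to a file and provisioned as a stack. Even this tiny example shows how closely the properties follow the underlying service API.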
When CloudFormation provisions resources, it works with a unit of deployment called a stack. CloudFormation tracks the state of all resources in a stack, so if you make a change and ask CloudFormation to apply it, only the resources that need to change will be changed. You can apply changes directly, or via Change Sets, which let you review the expected changes before they are applied.
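The idea of only touching what needs to change can be sketched as a comparison between the tracked state and the desired template. This is a toy model under my own assumptions, not how CloudFormation actually computes a Change Set; the function name plan_changes and the sample resources are hypothetical, and the action names only loosely follow the Add/Modify/Remove wording that Change Sets use.

```python
def plan_changes(current: dict, desired: dict) -> dict:
    """Classify each logical ID as Add, Modify or Remove; unchanged ones are skipped."""
    changes = {}
    for logical_id in sorted(current.keys() | desired.keys()):
        if logical_id not in current:
            changes[logical_id] = "Add"
        elif logical_id not in desired:
            changes[logical_id] = "Remove"
        elif current[logical_id] != desired[logical_id]:
            changes[logical_id] = "Modify"
    return changes


# Hypothetical state of an existing stack.
current = {
    "Bucket": {"Type": "AWS::S3::Bucket", "Properties": {}},
    "Queue": {"Type": "AWS::SQS::Queue", "Properties": {}},
    "Table": {"Type": "AWS::DynamoDB::Table", "Properties": {}},
}

# Desired template: Bucket is changed, Queue is dropped, Topic is new, Table is untouched.
desired = {
    "Bucket": {
        "Type": "AWS::S3::Bucket",
        "Properties": {"VersioningConfiguration": {"Status": "Enabled"}},
    },
    "Table": {"Type": "AWS::DynamoDB::Table", "Properties": {}},
    "Topic": {"Type": "AWS::SNS::Topic", "Properties": {}},
}

changes = plan_changes(current, desired)
print(changes)
```

Reviewing a summary like this before anything is applied is the essence of what a Change Set gives you.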
You provision a CloudFormation template as a stack in a specific AWS account and AWS region combination only. To automatically apply a template to multiple accounts and regions, you use a StackSet. This feature automatically creates stacks in all specified AWS account+region combinations.
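A StackSet fans out one template over every account and region combination. A small sketch of that expansion, with placeholder account IDs and a hypothetical helper name, could look like this:

```python
from itertools import product

# Placeholder account IDs and regions, for illustration only.
accounts = ["111111111111", "222222222222"]
regions = ["eu-west-1", "us-east-1"]


def stack_instances(accounts: list, regions: list) -> list:
    """Return one entry per account+region combination."""
    return [
        {"account": account, "region": region}
        for account, region in product(accounts, regions)
    ]


instances = stack_instances(accounts, regions)
for instance in instances:
    print(instance)
```

Two accounts and two regions already mean four stacks to keep in sync, which is why you want this automated.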
Out of the box, CloudFormation only handles AWS resources. You can handle non-AWS resources in CloudFormation, but you have to build the resource handlers yourself and create custom resources.
CloudFormation is free: it does not cost anything to perform provisioning and maintain a state of provisioned resources. You pay for the underlying resources you provision via CloudFormation.
Until recently, stacks and templates were the only means of modularising and re-using groups of CloudFormation resources. Nowadays CloudFormation also has Modules and the CloudFormation Registry, no doubt inspired by another significant contender in this space, Terraform. These features have not yet reached widespread adoption.
While most services that AWS provides have CloudFormation support, there are exceptions. Some services are not supported fully via CloudFormation, and some services are not supported via CloudFormation in all regions where the service is available.
Should I use CloudFormation, and why?
CloudFormation is a valid choice if most or all of your cloud infrastructure resources are in AWS. If only a minor part of your infrastructure is in AWS, then other options may be better.
In general, most solution patterns and examples of infrastructure from AWS will be provided as CloudFormation templates. AWS has examples for other tools as well, but not as many at this point.
As CloudFormation is somewhat low-level, templates can become verbose and may need a thorough understanding of the underlying services and resources to manage. At a micro-level, it is generally clear what CloudFormation provisions. There is no complex logic in the descriptions.
Does your organization have dedicated operations roles for handling infrastructure? Do they not do daily programming tasks? If so, CloudFormation may be a good choice. In that case, CloudFormation work should happen at least weekly, preferably more often.
The default tooling for CloudFormation from AWS is weak, although there are additional tools from AWS on GitHub, and there are also 3rd party tools.
Since CloudFormation is so common with AWS, it is good to know how to read CloudFormation syntax, even if you end up not using CloudFormation to provision your resources.
The noble jack-of-all-trades, Terraform
Terraform is the second-oldest tool in this list and one that competes with CloudFormation in terms of popularity. CloudFormation inspired the creation of Terraform, which also set out to address some of CloudFormation's shortcomings. The company behind Terraform is HashiCorp, which is well known for many useful (open-source) tools.
Terraform can provision resources not only in AWS but with many other providers as well, and it is not restricted to cloud infrastructure providers either.
You use declarative configuration descriptions with Terraform, using HCL (HashiCorp Configuration Language), although it is technically possible to represent resources as JSON as well. The intention is that HCL is the primary representation.
The representation of resources is somewhat low-level, with a 1-to-1 mapping of the service APIs of the provider.
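As a rough illustration of that 1-to-1 style, the sketch below builds a Terraform configuration in its JSON form (the kind you would put in a file ending in .tf.json). The resource label example, the bucket name, and the region are placeholders I made up. In HCL, the same resource would be written as a resource "aws_s3_bucket" "example" block with a bucket argument.

```python
import json

# Hypothetical label, bucket name and region; placeholders for illustration.
config = {
    "provider": {"aws": {"region": "eu-west-1"}},
    "resource": {
        "aws_s3_bucket": {
            "example": {"bucket": "my-example-bucket"},
        }
    },
}

# Resources are keyed first by type, then by the local name used to refer to them.
config_json = json.dumps(config, indent=2)
print(config_json)
```

The nesting of resource type and then local name mirrors the two labels of an HCL resource block.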
From the early stages, Terraform has focused on the workflow and on providing useful tooling. You have a planning stage to see the impact of any changes. Then you have a separate stage for applying the planned changes. Modules are a way to package complex infrastructure into more manageable units, and there are public registries for fetching and publishing modules, providers, etc. for re-use.
Resources managed by Terraform would typically also be created by Terraform. You can import resources created by other means, although it may be a bit cumbersome.
The state of resources maintained by Terraform needs to be kept somewhere, and Terraform provides multiple options, including local file systems, storage services such as S3, or a HashiCorp-managed service like Terraform Cloud.
Terraform itself is free to use. HashiCorp offers services in Terraform Cloud for teams and organizations. These services will cost some money.
Support for AWS services and resources is good, generally on par with, and sometimes even better than, CloudFormation.
There are many tools and examples for Terraform when it comes to AWS usage. However, AWS themselves would not be the primary source.
Should I use Terraform, and why?
If you are using many infrastructure providers, then Terraform is a good choice. Even if you are focused on AWS, Terraform may still be a valid choice, as you are not likely using AWS for everything. In that case, you can use the same type of workflow for all the providers.
Examples in Terraform from AWS themselves are not that prevalent, even if they do exist. Instead, you can find examples from other sources, and there is a huge body of knowledge around Terraform.
Terraform only recently released its 1.0 version, but it is very mature and has been in production use at many companies for years.
The default Terraform tooling from HashiCorp is a good foundation, and other 3rd party tools provide additional improvements on top of that.
As Terraform is somewhat low-level, it can become verbose and may need a thorough understanding of the underlying services and resources to manage, although modules can ease that burden a bit. The HCL language is limited in what kind of logic it can apply, which can make descriptions reasonably clear to read at a micro-level. It is likely a new language to learn though and not used in many other scenarios.
Does your organization have dedicated operations roles for handling infrastructure? Do they not do daily programming tasks? If so, Terraform may be a good choice. In that case, Terraform work should happen at least weekly, preferably more often. This is especially true if you have a multitude of infrastructure providers besides AWS.
A starting point for Terraform is available at HashiCorp Learn.
The casual one, AWS AppRunner
If your use cases for cloud infrastructure are restricted to publicly available web applications and APIs, then AWS AppRunner may be a suitable choice.
This relatively new service from AWS is quite simple in terms of infrastructure management but supports only some specific use cases. Your solution should be publicly exposed on the internet, and either be Node.js- or Python-based or be built to run in Docker containers. There should not be any complex infrastructure beyond that. Neither should there be a complex environment staging process, and source code should be managed via a version control hosting platform, such as GitHub.
AppRunner does not manage storage resources either. You have to provide these resources outside of AppRunner.
AppRunner has a limited scope. Within that scope, it does deployment, scaling, and logging mechanics and management.
Should I use AppRunner, and why?
AppRunner is not an enterprise-wide solution, but it can be well suited for specific application-focused use cases. For teams building web applications/APIs and which do not want or need to care much about infrastructure, AppRunner may be suitable. That is, as long as storage and other potential infrastructure outside this are managed elsewhere.
I think AppRunner works as a stepping stone. It can be the first step towards a more fully managed and automated solution. It is easy to get started with AppRunner. If your use case fits, you have an easy starting point.
One way to get started is to look at the AppRunner workshop, provided by AWS.
If you grow out of AppRunner, the next step may potentially be the next item on the list…
The helpful hand - AWS Copilot
Similar to AWS AppRunner, AWS Copilot handles web application and microservice API use cases. However, AWS Copilot handles more use cases, as well as somewhat more complex infrastructure solutions and staging scenarios.
AWS Copilot is a command-line tool that provides a simple interface to manage solutions that run containers in ECS Fargate. It handles the same use cases as AppRunner (in fact, it can deploy using AppRunner under the hood), but also offers more options: running both public and private services, setting up CI/CD pipelines across multiple environments, and provisioning and interacting with other AWS resources (via CloudFormation).
It does not cover all sorts of container-based scenarios, but it covers many pretty common ones. Management and configuration are done via the command-line or some configuration files for the most part. A lot of foundational infrastructure setup is done for you, and you can choose to run it manually via the command line or automate processing via CI/CD pipelines.
If you started with AppRunner, AWS Copilot would be the next natural step in the evolution of your cloud infrastructure management. If you are starting with running container-based workloads in AWS, you should have a look at AWS Copilot before considering more advanced or complex solutions. It may hit a sweet spot in terms of balancing complexity and versatility.
Under the hood, AWS Copilot uses and generates CloudFormation for you, but you do not need to care that much, as long as you do not need to add infrastructure outside your use cases.
Should I use AWS Copilot, and why?
If you want to keep the infrastructure parts of container-based solutions simple, then AWS Copilot may be for you. You do not need complex infrastructure solutions, and you want to focus resources elsewhere. Plus, AppRunner may be a bit too simplistic for your use cases, and not quite fit.
You may also have a bit more complex infrastructure, but want to provide developers with a relatively simple tool to manage part of that infrastructure that they can control themselves.
The tool is focused on running containers in ECS, using Fargate. So this needs to fit your use cases to be applicable, and probably a majority of your use cases (80-90%) should fit. If only a smaller part of your use cases fit, then other tools may be more appropriate.
AWS has an ECS Workshop, where one of the tracks through the labs uses the AWS Copilot CLI (there is also a track for AWS CDK, which comes later in this list). There is also a separate documentation site for AWS Copilot.
The Platypus with superpowers - Pulumi
Next in our list of infrastructure-as-code tools is Pulumi, the superpowered platypus. Pulumi is one of a relatively new group of tools, where you combine the power of the declarative state of your infrastructure with imperative/functional/object-oriented capabilities of regular programming languages.
Pulumi launched in 2018. It is an open-source toolset that covers a multitude of providers, cloud infrastructure, and others as well. Pulumi has a bridge feature that lets it use Terraform providers. A key difference here though is that it uses regular programming languages to describe the infrastructure, not a (custom) configuration language/format.
In Pulumi, infrastructure is organized in projects. A project in turn contains a program, which is the description of the infrastructure. A project may also consist of one or more stacks, which are the deployable units of the program.
A program in itself works with resources, which describe the infrastructure. These can be the provider’s resources, as well as higher-level resources composed of multiple lower-level resources. Inputs and outputs between these resources describe the dependencies involved.
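The way inputs and outputs imply dependencies can be sketched with a small dependency graph. This is only a conceptual model using Python's standard library, not Pulumi's API; the resource names are hypothetical. Each resource lists the resources whose outputs it consumes, and a topological sort gives an order in which they can be created.

```python
from graphlib import TopologicalSorter

# Resource name -> set of resources whose outputs it uses as inputs.
dependencies = {
    "vpc": set(),
    "subnet": {"vpc"},            # needs the VPC id
    "security_group": {"vpc"},    # needs the VPC id
    "instance": {"subnet", "security_group"},
}

# Resources with no unmet dependencies come first.
creation_order = list(TopologicalSorter(dependencies).static_order())
print(creation_order)
```

The relative order of subnet and security_group is not fixed, since neither depends on the other; a tool is free to create them in parallel.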
Pulumi supports TypeScript/JavaScript, Python, .NET (C#, F#), and Go as languages. Most of the core Pulumi features are supported in all of these languages, while some specific pieces are only supported in some of the languages. The latter includes Crosswalk for AWS, which includes some higher-level components for AWS deployments. There is also CrossGuard, a tool to define guardrails/policies as code, which you can use to check that your infrastructure definitions comply with your defined policies.
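The guardrail idea behind policy as code can be illustrated with a very small check. This is a sketch under my own assumptions and does not use the CrossGuard API; the resource records, the type names, and the function find_unversioned_buckets are all hypothetical.

```python
def find_unversioned_buckets(resources: list) -> list:
    """Return the names of bucket resources that do not enable versioning."""
    return [
        resource["name"]
        for resource in resources
        if resource["type"] == "bucket" and not resource.get("versioning", False)
    ]


# Hypothetical, simplified view of the resources in an infrastructure definition.
resources = [
    {"type": "bucket", "name": "logs", "versioning": True},
    {"type": "bucket", "name": "uploads"},
    {"type": "queue", "name": "jobs"},
]

violations = find_unversioned_buckets(resources)
print(violations)
```

A real policy tool would run checks like this before deployment and block or warn on violations.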
You can build reusable components in any supported language. You can use these components from any of the supported languages. This allows for freedom for different teams and groups to use their language of choice.
A drawback with using programming languages is that there is a risk for less understandable infrastructure descriptions, due to the expressive power that comes with a full-fledged programming language. At the same time, it is also possible to make infrastructure more readable and understandable, due to the same expressive power.
Pulumi supports multiple types of backends to manage the state of the infrastructure, similar to Terraform. You can use a local state, a service backend (e.g. Amazon S3), or Pulumi’s own Pulumi Cloud service. Team-based usage of Pulumi Cloud will incur a cost, which depends on the number of resources managed.
It is also possible to import existing resources into Pulumi state and programs, which is a procedure that seems to be a bit more convenient than the corresponding options in CloudFormation or Terraform. Also, Pulumi can co-exist with CloudFormation and Terraform managed infrastructure, by accessing CloudFormation exports or retrieving data from Terraform state. It makes sense for Pulumi to play well with them, since it is kind of the new kid on the block and will likely co-exist with the oldies like Terraform and CloudFormation.
Pulumi provides APIs for the deployment itself of Pulumi-described infrastructure, which allows for the creation of custom services that perform infrastructure provisioning, like SaaS solutions.
There are also integrations with some CI/CD tools, which you can use with Pulumi.
Should I use Pulumi, and why?
Pulumi supports many cloud infrastructure and other service providers and would be a consideration if you are not all-in or mainly focusing on AWS. If you have more of a mix of infrastructure platforms and solutions, Pulumi should be a consideration.
Initially, Pulumi relied to a large extent on its Terraform bridge to build support for a lot of providers. However, it has started to provide more Pulumi-native provider packages.
In general, infrastructure-as-code tools that use programming languages can provide powerful abstractions for different groups:
- Developers who do not care about the infrastructure that much, could be given easier-to-use components at a suitable level for them.
- Operations people who care about infrastructure, but do not want to deal with programming that much, could also be given abstractions suitable to them.
- Developer-oriented persons that care about infrastructure can provide suitable abstractions for both of the groups above.
If you have an organization with these types of roles, then infrastructure-as-code tools using programming languages will be a great option. Pulumi is a nice choice in this regard. At this time, many of the higher-level components would be things you build yourself.
Pulumi has a set of tutorials for various platforms and languages. There is also an AWS Workshop with a few labs to get started with Pulumi.
The new terraformer - Cloud Development Kit for Terraform
The Cloud Development Kit for Terraform, or CDKTF for short, is HashiCorp's step into the world of infrastructure-as-code using actual programming languages.
Similar to Pulumi (and AWS CDK later in the list), it supports many programming languages, including TypeScript, Python, C#, Go (soon), and Java. Any module or provider supported in Terraform is available to use through CDKTF.
CDKTF has a concept of an App. The app contains the program code that describes the infrastructure. The code may create one or more stacks, which are units of deployment. Constructs represent infrastructure resources. These map directly to the provider’s resources, or to higher-level components. A process called synthesizing executes the code to produce Terraform JSON resource descriptions, which are then used to provision resources. You can use the CDKTF-specific tooling to deploy these stacks or the regular Terraform tooling.
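The app, stack, and synthesize concepts can be sketched with a toy model. To be clear, the App and Stack classes below are my own simplified stand-ins, not the real cdktf library, and the bucket names are made up. The sketch only shows the principle: ordinary code (here a loop) runs first, and the result is a plain JSON document per stack.

```python
import json


class Stack:
    """A unit of deployment that collects resource definitions."""

    def __init__(self, name: str):
        self.name = name
        self.resources = {}

    def add_resource(self, resource_type: str, resource_name: str, **config):
        # Group resources by type, then by name, like Terraform JSON does.
        self.resources.setdefault(resource_type, {})[resource_name] = config


class App:
    """Holds the stacks and turns each one into a JSON document."""

    def __init__(self):
        self.stacks = []

    def add_stack(self, stack: Stack) -> Stack:
        self.stacks.append(stack)
        return stack

    def synth(self) -> dict:
        # One JSON document per stack, keyed by stack name.
        return {
            stack.name: json.dumps({"resource": stack.resources}, indent=2, sort_keys=True)
            for stack in self.stacks
        }


app = App()
storage = app.add_stack(Stack("storage"))

# A regular loop generates one bucket per environment.
for env in ["dev", "prod"]:
    storage.add_resource("aws_s3_bucket", f"data_{env}", bucket=f"example-data-{env}")

synthesized = app.synth()
print(synthesized["storage"])
```

The loop is the part a plain configuration format cannot express as directly; the synthesized output is what the provisioning engine actually consumes.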
CDKTF relies on two open-source projects from AWS, the constructs library and jsii. The latter project is what handles the multi-language support - jsii is a toolset that translates certain code constructs into a JavaScript-based format. A construct built with TypeScript can be used from any of the supported languages. jsii also includes functionality to build packages for the package managers of the supported languages.
This means that you can build reusable components in any language, as long as they are used by code in the same language. If you want to build a reusable component for all supported languages, you have to build it in TypeScript.
Should I use CDK for Terraform, and why?
Since CDKTF is Terraform under the hood, it pretty much leverages all that Terraform provides. On top of that, it provides some features that you get from a toolset using actual programming languages.
It is still an experimental project, and I think the main reason to consider CDKTF at this point is if you are already invested in Terraform and its tooling, but you want to try out what infrastructure-as-code with programming languages can be like.
HashiCorp has some introductory material for CDKTF at HashiCorp Learn.
The cooler cloud wrangler - AWS Cloud Development Kit
AWS Cloud Development Kit (AWS CDK) is the older sibling to Cloud Development Kit for Terraform and was initially presented in 2018. So it is somewhat similar in age to Pulumi.
AWS CDK shares many properties with CDKTF - it supports many programming languages, including TypeScript, Python, C#, Go (soon), and Java. It also shares the concept of an App, the program code that describes the infrastructure. The code may create one or more stacks, which are units of deployment. Constructs represent infrastructure resources. These map directly to the provider’s resources, or to higher-level components. A process called synthesizing executes the code to produce CloudFormation JSON resource descriptions, which are then used to provision resources.
The AWS CDK-specific tooling is significantly better than what is provided for plain CloudFormation. It also handles assets, such as deploying code and data, which plain CloudFormation does not.
Naturally, AWS CDK is focused on AWS and does not support other providers. Technically, it is possible to build custom components for resources outside of AWS.
A key benefit with AWS CDK is its many higher-level components on top of what is provided by CloudFormation. A general principle is to provide sane best practice defaults and allow customizations if needed.
Access permissions and network access have higher-level mechanisms in place. Thus you do not need to know precisely which permissions or security rules to set; the AWS CDK will figure that out for you based on the guidance you have provided. If you connect different resources, it will often automatically set the permissions needed for these to communicate with each other.
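To give a feel for what such a higher-level permission mechanism does, here is a toy sketch. The grant_read function is a hypothetical stand-in that I wrote for illustration; it is not the AWS CDK API, and the exact set of actions the real CDK grants is different. The bucket and role names are placeholders.

```python
def grant_read(bucket_arn: str, role_name: str) -> dict:
    """Derive a read-only policy statement from a single high-level intent."""
    return {
        "role": role_name,
        "statement": {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            # Listing applies to the bucket, reading applies to the objects in it.
            "Resource": [bucket_arn, f"{bucket_arn}/*"],
        },
    }


# Placeholder ARN and role name.
grant = grant_read("arn:aws:s3:::example-bucket", "example-function-role")
print(grant)
```

You state the intent (this role may read from this bucket), and the tool works out the low-level statement, including details such as the separate bucket and object resources.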
AWS has started to provide more examples using AWS CDK also, and not only CloudFormation. It is a key toolset for AWS moving forward.
Should I use AWS CDK, and why?
If you are all-in on AWS, or the vast majority of your infrastructure is in AWS, then AWS CDK is absolutely a choice to consider.
The higher-level components provided out of the box with AWS CDK are a compelling reason to consider it if you are ok with using programming languages for your infrastructure-as-code.
Also here, infrastructure-as-code tools like AWS CDK that use programming languages can provide powerful abstractions for different groups:
- Developers who do not care about the infrastructure that much could be given easier-to-use components at a suitable level for them.
- Operations people who care about infrastructure, but do not want to deal with programming that much, could also be given abstractions suitable to them.
- Developer-oriented persons that care about infrastructure can provide suitable abstractions for both of the groups above.
If you want to build reusable components for multiple programming languages, you have to build these in TypeScript. If you focus on a single language of all the supported languages, you can use any language you want.
If you can and are ok with using programming languages, I would rather pick AWS CDK than CloudFormation. If you are already invested in using CloudFormation, AWS CDK would be a natural next step. In practice, it is valuable to know some CloudFormation when working with AWS CDK.
AWS has a few workshop sites for AWS CDK.
Some material there may be slightly out of date, but should still work.
Final remarks
I have done a large portion of my infrastructure-as-code work with CloudFormation and AWS CDK. These are the tools here I know best.
That being said, I am a fan in general of tools that use actual programming languages, as long as there are roles in the organization that this is suitable for. The versatility, if used correctly, should be a big benefit in this space. I hope that Pulumi, CDKTF, and AWS CDK all will thrive.
At the same time, I love the simplicity that a tool like AWS Copilot can provide within a specific set of use cases. You do not always need a full-fledged solution for all types of cloud resources, sometimes simpler is better.
These newer tools stand on the shoulders of giants, CloudFormation and Terraform. They would not be there without them, I think.