Do you have a love/hate relationship with YAML or JSON for configurations?
Are your configurations getting messed up because of simple typos, or problems with white space, or confusion between strings and numbers?
Do you try to manage some templating solution on top of YAML?
Is it hard to manage complex overrides across multiple configuration files?
If any of these questions resonate with you, then this article may interest you!
Configuration complexity
In today’s configuration landscape, YAML is pretty much everywhere. It is relatively easy for humans to read and is a fairly compact format. It is also easy to mess up, thanks to some idiosyncrasies and its white-space/indentation-dependent structure. JSON has a simpler structure, but it is more limited and not really intended for human consumption. TOML is another format, which tries to strike a balance between readability and maintainability.
None of them has any built-in capability for elaborate data validation, or features that help organise (complex) configurations. It gets messy, in particular as configurations grow more complex.
Can we do better? Yes, I believe we can. There is one effort in this space that I think is promising, and that is CUE (Configure Unify Execute), an open-source configuration language.
There is a lot to like about CUE, and I will explore a few bits and pieces of it in this article. It is only scratching the surface of what you can do with CUE though.
We are going to focus on a bit of data validation and simple conversions:
- Convert a YAML configuration file to CUE
- Define a schema and validation rules for that configuration file, step by step
Install CUE
The CUE website has installation instructions. There are binary downloads for macOS, Windows, and Linux, or you can install via Homebrew on macOS and Linux.
brew install cue-lang/tap/cue
If you have Go installed, you can also download and build CUE directly, since CUE is written in Go.
You do not have to install CUE to follow the rest of the article, but if you want to play around yourself, you should do that now.
You can also have a look at the CUE Playground. It is only for writing CUE and possibly converting it to JSON or YAML, though, so that is a different use case from the one we are covering in this article.
Read and convert YAML configuration
We are going to start with a relatively small configuration file in YAML. This configuration describes a network configuration (VPC - Virtual Private Cloud) in Amazon Web Services (AWS). It is not important exactly what it is though; we are only using this for illustration. Our YAML configuration file has the name network_config.yaml and looks like this:
account: "123456789012"
region: eu-north-1
network:
  vpc_cidr: 10.100.0.0/16
  az_count: 2
  nat_gateway_count: 1
  public_mask: 25
  private_subnets:
    - name: private1
      mask: 21
  isolated_subnets:
    - name: isolated1
      mask: 21
AWS has the notion of organising cloud resources into accounts and regions, so this is relevant for provisioning cloud resources. The rest of the structure is a mix of networking-related information, essentially telling us how a network configuration should be set up.
The configuration has a mix of single values, some in a hierarchy, some in lists/arrays. There is a mix of strings and numbers. There are restrictions on what may be valid values for several fields. All of that is invisible to us in the YAML file, though.
Let us start by converting this YAML into CUE. We can do this with the cue import command:
cue import network_config.yaml
This will produce a network_config.cue file, with the same content, which looks like this:
account: "123456789012"
region:  "eu-north-1"
network: {
    vpc_cidr:          "10.100.0.0/16"
    az_count:          2
    nat_gateway_count: 1
    public_mask:       25
    private_subnets: [{
        name: "private1"
        mask: 21
    }]
    isolated_subnets: [{
        name: "isolated1"
        mask: 21
    }]
}
As you can see, it looks fairly similar. CUE is a superset of JSON: any valid JSON is also valid CUE. However, there is a lot more to CUE. It has a more readable syntax than plain JSON and does not depend on white space/indentation the way YAML does. It does not require commas between key-value pairs either.
Another thing to note here is that string values are always quoted - like JSON and unlike YAML.
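To make the relationship to JSON concrete, here is a minimal sketch that writes the same small piece of data twice, once in JSON style and once in more idiomatic CUE. The field names jsonStyle, cueStyle and same are made up for this illustration and are not part of the configuration in this article.

```cue
// JSON-style syntax: quoted keys, braces, and commas are accepted as-is.
jsonStyle: {"region": "eu-north-1", "network": {"az_count": 2}}

// Idiomatic CUE: unquoted keys, no commas, and a shorthand for
// single-field nesting (network: az_count: 2).
cueStyle: {
    region: "eu-north-1"
    network: az_count: 2
}

// Unifying the two succeeds because they describe the same data.
same: jsonStyle & cueStyle
```

Both spellings evaluate to the same structure, so you can paste existing JSON into a CUE file and tidy it up gradually.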
We will use this generated file as a starting point to define data validation for our YAML configuration.
Define data validation
The CUE language has an unusual property which sets it apart from many other languages and formats - types and values are the same. Not merely similar in structure, like JSON and JSON Schema, but actually the same kind of thing.
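A small sketch of what this means in practice: the same field can be declared more than once, with a type in one place and a concrete value in another, and CUE combines (unifies) the declarations. The fields are borrowed from our configuration; splitting them up like this is only for illustration.

```cue
// A type and a concrete value for the same field unify to the value.
account: string
account: "123456789012"

// The order does not matter: value first, type second works equally well.
public_mask: 25
public_mask: int
```

A concrete value behaves like the narrowest possible type, one that only matches itself. This is why a CUE file containing nothing but data can still act as a (very strict) schema.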
Our CUE file is just as much a schema definition as it is actual data values. The CUE command-line tool has the command cue vet to perform validation. You can provide both CUE and non-CUE files (e.g. JSON or YAML) to validate. So if we want to validate our YAML configuration against the CUE data, we can run the command:
❯ cue vet network_config.cue network_config.yaml
There is no output, which means it validated ok. You might think, how do I really know it has performed some validation here? Let us just change a value in the CUE file, for example, the “eu-north-1” value to “eu-west-1”, then run the cue vet command again.
❯ cue vet network_config.cue network_config.yaml
region: conflicting values "eu-west-1" and "eu-north-1":
    ./network_config.cue:2:10
    ./network_config.yaml:2:10
As you can see, it now reports that there is a conflict, and points out where the data is conflicting in these files. So the validation works.
It is meaningless, though, to only have a hard-coded configuration to validate against. So let us generalise this a bit!
- For the region parameter, let us decide that we only allow the values “eu-north-1” and “eu-west-1”
- For other values, let us define general string and integer types.
The updated CUE file then looks like the one below. We have now added string and int types, and with the “|” character we have told CUE that either “eu-west-1” or “eu-north-1” is a valid value, but nothing else.
account: string
region:  "eu-west-1" | "eu-north-1"
network: {
    vpc_cidr:          string
    az_count:          int
    nat_gateway_count: int
    public_mask:       int
    private_subnets: [{
        name: string
        mask: int
    }]
    isolated_subnets: [{
        name: string
        mask: int
    }]
}
If we run cue vet on this CUE file and the YAML file, we can see that it is still valid. In CUE, we can mix actual values with type definitions; it is all the same to CUE.
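As a rough sketch of what cue vet does with these declarations, we can put the schema lines and the data lines in the same CUE file. The commented-out lines show values that would be rejected; they are examples made up for illustration.

```cue
// Schema: either of two regions, and an integer count.
region:   "eu-west-1" | "eu-north-1"
az_count: int

// Data: selects one branch of the disjunction and satisfies the type.
region:   "eu-north-1"
az_count: 2

// Each of these would produce an error if uncommented:
// region:   "us-east-1" // matches neither allowed value
// az_count: "2"         // a string, not an int
```

The last line is the kind of string/number confusion that YAML on its own does not catch.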
Reusable definitions and lists
We can see in the structure that we have two lists/arrays, under private_subnets and isolated_subnets. The data structure of the elements is the same in both. Right now, if we added another element to either of these lists in the YAML configuration, the validation would fail, because the lists in the schema contain exactly one element each. So how can we make a reusable definition for the structure in the lists and also allow an arbitrary number of them in each list?
In CUE, a reusable description of a structure is called a Definition, and it looks like any other structure, only that the name itself is prefixed with “#” (or “_#”, but that is out of scope for now). In the list, we can use an ellipsis “...” to allow a variable number of elements. Our new CUE file now looks like this:
account: string
region:  "eu-west-1" | "eu-north-1"
network: {
    vpc_cidr:          string
    az_count:          int
    nat_gateway_count: int
    public_mask:       int
    private_subnets: [...#Subnet]
    isolated_subnets: [...#Subnet]
}
#Subnet: {
    name: string
    mask: int
}
You can run cue vet to check that it is still valid. In this case, we put the #Subnet definition after the rest, but we could just as well have put it first. The order does not matter to CUE. This property becomes important when we organise our data in different ways.
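The difference between a fixed-length list and an open one can be sketched as follows. The field names fixed and subnets and the second subnet entry are assumptions added for illustration only.

```cue
// Without the ellipsis, the list must contain exactly one element.
fixed: [#Subnet]

// With the ellipsis, any number of elements is allowed (including
// none), and each one must satisfy #Subnet.
subnets: [...#Subnet]
subnets: [
    {name: "private1", mask: 21},
    {name: "private2", mask: 22},
]

// Declared after its uses; CUE does not care about declaration order.
#Subnet: {
    name: string
    mask: int
}
```

Adding a second element to fixed would be an error, while subnets accepts both entries.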
Adding constraints
The next step is to add constraints on some fields. The integer fields we use cannot take just any integer value. We have a few constraints we would like to impose on these fields:
- The az_count field refers to the number of Availability Zones that our AWS network setup should include. In most AWS regions, this would be a number between 1 and 3.
- The nat_gateway_count field refers to the number of NAT Gateways provisioned. This is an infrastructure component that may be the same value as the number of availability zones, but could also be fewer. It should not be larger than the number of availability zones.
- The mask and public_mask fields can only have integer values between 16 and 28. This is a hard limitation set by AWS.
Besides types and values, we can also add constraints to our fields. The above rules can be described in the example below:
account: string
region:  "eu-west-1" | "eu-north-1"
network: {
    vpc_cidr:          string
    az_count:          int & >=1 & <=3
    nat_gateway_count: int & >=1 & <=az_count
    public_mask:       #Netmask
    private_subnets: [...#Subnet]
    isolated_subnets: [...#Subnet]
}
#Netmask: int & >=16 & <=28
#Subnet: {
    name: string
    mask: #Netmask
}
We added another reusable definition for the mask value, since that is used in two different fields. The constraints can be combined with the type data, or actual values. Multiple types/constraints/values for a field can be combined with the “&” symbol.
The syntax is concise, and also easy to read.
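To see the constraints at work, here is a small sketch that unifies a trimmed-down version of the schema with concrete values, roughly the way cue vet combines the CUE file with the YAML data. The rejected values listed in the comments are made-up examples.

```cue
#Netmask: int & >=16 & <=28

// Schema: bounds on each field, with nat_gateway_count capped by az_count.
network: {
    az_count:          int & >=1 & <=3
    nat_gateway_count: int & >=1 & <=az_count
    public_mask:       #Netmask
}

// Concrete data, as it might arrive from the YAML file.
network: {
    az_count:          2
    nat_gateway_count: 2  // accepted: not larger than az_count
    public_mask:       25 // accepted: within 16..28
}

// Each of these values would be rejected instead:
//   az_count: 4           (above 3)
//   nat_gateway_count: 3  (larger than az_count, which is 2)
//   public_mask: 29       (above 28)
```

Note that the upper bound on nat_gateway_count is not a fixed number. It refers to another field, so the limit follows whatever az_count turns out to be in the data.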
Now let us look at the strings. There are two fields, account and vpc_cidr, which we should impose some constraints on. An AWS account ID is a string value that always comprises 12 digits. The VPC CIDR value is a combination of an IP address, a slash, and a net mask value. One way to approach the constraints here is to use regular expressions to define what is allowed. If you are not familiar with regular expression syntax, this will look a bit like keyboard noise to you. Fear not, you are not alone! Regular expressions are useful for defining restrictions on string patterns, and we use them to define the constraints on account and vpc_cidr. Note that the pattern used for vpc_cidr below is stricter than a general CIDR check: it only accepts addresses starting with 10. whose last octet is 0, 16, 32, 64 or 128, with a mask between 16 and 28.
account: string & =~"^[0-9]{12}$"
region:  "eu-west-1" | "eu-north-1"
network: {
    vpc_cidr:          #CIDR
    az_count:          int & >=1 & <=3
    nat_gateway_count: int & >=1 & <=az_count
    public_mask:       #Netmask
    private_subnets: [...#Subnet]
    isolated_subnets: [...#Subnet]
}
#CIDR: string & =~"^10\\.(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){2}(0|16|32|64|128)(\\/(2[0-8]|1[6-9]))$"
#Netmask: int & >=16 & <=28
#Subnet: {
    name: string
    mask: #Netmask
}
You can run cue vet and see that it still validates. Try also to change any of the values that we have put constraints on to see what the result will be if we violate the constraints and run cue vet.
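If you want a starting point for such experiments, the sketch below applies the two patterns to a few sample strings. The #Account definition and the field names okAccount, okCIDR, badAccount, badRange and badMask are hypothetical helpers introduced only for this example.

```cue
#Account: string & =~"^[0-9]{12}$"
#CIDR:    string & =~"^10\\.(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){2}(0|16|32|64|128)(\\/(2[0-8]|1[6-9]))$"

// These values match their patterns.
okAccount: #Account & "123456789012"
okCIDR:    #CIDR & "10.100.0.0/16"

// Each of these would be rejected if uncommented:
// badAccount: #Account & "12345"       // fewer than 12 digits
// badRange:   #CIDR & "192.168.0.0/16" // does not start with 10.
// badMask:    #CIDR & "10.100.0.0/29"  // mask outside 16..28
```

The doubled backslashes are needed because the pattern is written inside a regular double-quoted CUE string, where a single backslash starts an escape sequence.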
Let us make one more modification here: setting a default value. For the vpc_cidr field, if no value is specified, we will use the string “10.0.0.0/16” as the default value. We use an asterisk to denote a default value:
account: string & =~"^[0-9]{12}$"
region:  "eu-west-1" | "eu-north-1"
network: {
    vpc_cidr:          *"10.0.0.0/16" | #CIDR
    az_count:          int & >=1 & <=3
    nat_gateway_count: int & >=1 & <=az_count
    public_mask:       #Netmask
    private_subnets: [...#Subnet]
    isolated_subnets: [...#Subnet]
}
#CIDR: string & =~"^10\\.(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){2}(0|16|32|64|128)(\\/(2[0-8]|1[6-9]))$"
#Netmask: int & >=16 & <=28
#Subnet: {
    name: string
    mask: #Netmask
}
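How the default behaves with and without a supplied value can be sketched like this. To keep the example short, #CIDR here is a simplified stand-in for the full pattern above, and the field names unset and custom are made up for illustration.

```cue
// Simplified stand-in for the #CIDR pattern used in this article.
#CIDR: string & =~"^10\\."

// No value supplied: the field resolves to the default marked with *.
unset: vpc_cidr: *"10.0.0.0/16" | #CIDR

// A value supplied: it takes the place of the default, but it still
// has to satisfy #CIDR.
custom: vpc_cidr: *"10.0.0.0/16" | #CIDR
custom: vpc_cidr: "10.100.0.0/16"
```

When exported, unset.vpc_cidr becomes "10.0.0.0/16" and custom.vpc_cidr becomes "10.100.0.0/16".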
We have dipped our toes into how to define schemas and constraints with CUE. There is more to explore here, but these basics can give you a good start.
Enforcing validation
We have validated our YAML configuration, and this has worked fine. However, if we add another dummy entry to our YAML configuration and run cue vet on that, it will still work fine.
account: "123456789012"
region: eu-north-1
stuff: 333 ################### Our extra stuff here ###########
network:
  vpc_cidr: 10.100.0.0/16
  az_count: 2
  nat_gateway_count: 1
  public_mask: 25
  private_subnets:
    - name: private1
      mask: 21
  isolated_subnets:
    - name: isolated1
      mask: 21
❯ cue vet network_config.cue network_config.yaml
As long as CUE finds valid and non-conflicting data structures, it is all good. If we want to restrict the validation to only accept specific fields, we need to be explicit about what to validate. Let us just add a layer in our schema definition that includes the three top level fields we have, account, region and network.
#Config: {
    account: string & =~"^[0-9]{12}$"
    region:  "eu-west-1" | "eu-north-1"
    network: {
        vpc_cidr:          *"10.0.0.0/16" | #CIDR
        az_count:          int & >=1 & <=3
        nat_gateway_count: int & >=1 & <=az_count
        public_mask:       #Netmask
        private_subnets: [...#Subnet]
        isolated_subnets: [...#Subnet]
    }
}
#CIDR: string & =~"^10\\.(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){2}(0|16|32|64|128)(\\/(2[0-8]|1[6-9]))$"
#Netmask: int & >=16 & <=28
#Subnet: {
    name: string
    mask: #Netmask
}
By wrapping the schema in a definition, we can also be specific about what the non-CUE data should match when we run cue vet:
❯ cue vet --schema '#Config' network_config.cue network_config.yaml
field not allowed: stuff:
    ./network_config.yaml:3:2
    ./network_config.cue:1:1
    ./network_config.cue:1:10
Now our extra stuff is not accepted. Sometimes, though, we want only partial validation and allow for extra items to be present. If our schema definition is a work in progress, that may very well be the case. In those cases, we can add “...” to our definition, which makes the structure open to other fields, as opposed to the default, where a definition is closed.
#Config: {
    account: string & =~"^[0-9]{12}$"
    region:  "eu-west-1" | "eu-north-1"
    network: {
        vpc_cidr:          *"10.0.0.0/16" | #CIDR
        az_count:          int & >=1 & <=3
        nat_gateway_count: int & >=1 & <=az_count
        public_mask:       #Netmask
        private_subnets: [...#Subnet]
        isolated_subnets: [...#Subnet]
    }
    ...
}
#CIDR: string & =~"^10\\.(([0-9]|[1-9][0-9]|1[0-9]{2}|2[0-4][0-9]|25[0-5])\\.){2}(0|16|32|64|128)(\\/(2[0-8]|1[6-9]))$"
#Netmask: int & >=16 & <=28
#Subnet: {
    name: string
    mask: #Netmask
}
Running cue vet with this schema definition will accept our extra stuff.
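The three behaviours we have seen (a regular struct, a closed definition, and a definition reopened with an ellipsis) can be compared side by side in a small sketch. All names here (regular, regularOK, #Closed, closedBad, #Open, openOK) are hypothetical and only serve the illustration.

```cue
// A regular struct is open: unknown fields are allowed.
regular: {account: string}
regularOK: regular & {account: "123456789012", stuff: 333}

// A definition is closed by default: unknown fields are rejected.
#Closed: {account: string}
// closedBad: #Closed & {account: "123456789012", stuff: 333} // field not allowed

// An ellipsis reopens the definition for additional fields.
#Open: {
    account: string
    ...
}
openOK: #Open & {account: "123456789012", stuff: 333}
```

This mirrors what happened above: without the definition the extra field passed validation, with #Config it was rejected, and with the ellipsis it is accepted again.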
Final words
This has been an introduction to CUE, a quite versatile configuration language which can also play well with existing tooling. We have only scratched the surface of CUE though, and there is a lot more to it!
There is some neat material on CUE and some good presentations, at different levels - a few of these can be found here: