Stack configuration, configuration management and infrastructure testing

A stack is a set of infrastructure resources, e.g. a virtual machine, a network and a database; or a network interface, a virtual hard drive and a network share.

These are my reflections on, lessons learned from and a brief summary of chapters 7 and 8 of Infrastructure as Code. This post is part of the Infrastructure as Code series.

When you configure a stack, keep parameters simple. If you find that you need an advanced set of parameters - or you find yourself introducing branching logic - make a new, more specialized, stack.

When naming resources in your stack, consider using a combination of a static and dynamic name, like app-service-ENVIRONMENT. If you are working with resource managers where you can generate a unique ID, consider generating a unique ID for a root object and then use that ID as a suffix. See this article for more information.
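As a minimal sketch of the naming idea above: the function names and the choice of a truncated SHA-256 hash are my own assumptions, not something the book or any particular resource manager prescribes. The point is only that the suffix is derived from the root object's ID, so every resource in the stack gets the same, repeatable suffix.

```python
import hashlib


def resource_name(base: str, environment: str) -> str:
    """Combine a static base name with a dynamic environment part."""
    return f"{base}-{environment}"


def unique_suffix(root_id: str, length: int = 8) -> str:
    """Derive a short, repeatable suffix from the ID of a root object.

    Assumption: root_id is whatever unique ID your resource manager
    gives the root object (hypothetical here).
    """
    return hashlib.sha256(root_id.encode("utf-8")).hexdigest()[:length]


def unique_resource_name(base: str, environment: str, root_id: str) -> str:
    """Static name + environment + suffix derived from the root object."""
    return f"{resource_name(base, environment)}-{unique_suffix(root_id)}"
```

Because the suffix is computed rather than random, redeploying the same stack produces the same names.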

When transitioning into the world of stack configuration, keep non-secret configuration in source control (as opposed to manually typing it in, or passing it in through your CD system of choice). In particular, be careful with exposing secrets to your CI/CD system, as these systems are very susceptible to security vulnerabilities.

Source-controlled configuration also comes with the benefit of being able to track the current state of any stack, since you want to deploy any changes as soon as they are committed and have gone through your testing pipeline. You can also use it for auditing, to see who did what and when.
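Here is a small sketch of what loading source-controlled configuration could look like. The layout (one JSON file per environment in a config directory) and the key-name heuristic for spotting secrets are assumptions made for illustration; adapt them to whatever format your tooling uses.

```python
import json
from pathlib import Path

# Assumption: key names containing these words probably hold secrets
# and do not belong in source control.
SECRET_HINTS = ("password", "secret", "token")


def load_stack_config(config_dir: Path, environment: str) -> dict:
    """Load the committed, non-secret configuration for one environment."""
    path = config_dir / f"{environment}.json"
    config = json.loads(path.read_text(encoding="utf-8"))

    leaked = [key for key in config
              if any(hint in key.lower() for hint in SECRET_HINTS)]
    if leaked:
        raise ValueError(f"Possible secrets in {path.name}: {leaked}")
    return config
```

The check is a crude safety net, not a replacement for a proper secret scanner.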

Secrets are best stored outside source control or generated as part of your deployment process, so that RESOURCE1 (e.g. your web application) is the only thing allowed to access RESOURCE2 (e.g. your application database).
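To illustrate generating a secret as part of the deployment, here is a sketch using Python's standard `secrets` module. The two callables, `create_database` and `configure_web_app`, are hypothetical stand-ins for whatever your provisioning tool offers; the idea is that the password only exists in memory during the deployment and is handed to exactly the two resources that need it.

```python
import secrets
from typing import Callable


def generate_database_password(num_bytes: int = 32) -> str:
    """Create a random, URL-safe password that no human ever has to see."""
    return secrets.token_urlsafe(num_bytes)


def deploy(create_database: Callable[[str], None],
           configure_web_app: Callable[[str], None]) -> None:
    """Generate the secret once and pass it to RESOURCE2 and RESOURCE1 only.

    Both callables are hypothetical placeholders for real provisioning steps.
    """
    password = generate_database_password()
    create_database(password)      # RESOURCE2: the application database
    configure_web_app(password)    # RESOURCE1: the web application
```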

Testing your infrastructure code is a good idea, because it's code and it manipulates resources. As creating an automated test suite is hard work, here are some pointers:

Consider adopting a progressive testing mindset, maintaining multiple test suites to test different aspects of the system. You can run faster tests first to get quicker feedback if they fail, and only run slower, broader-scoped tests after those have passed. These broader-scoped tests help you most if they automatically create a ticket in your issue tracking system when they fail, so that you can address the issues in due time, and produce a new or updated test/staging/production environment when they succeed.
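The progressive approach can be sketched as a list of stages ordered from fastest to slowest, where a failure stops the run and triggers a callback. Everything here is illustrative: in practice your CI/CD system does this orchestration, and `on_failure` would be a hypothetical hook that, for example, opens a ticket.

```python
from typing import Callable, Sequence, Tuple

# A stage is a name plus a callable that returns True when its tests pass.
Stage = Tuple[str, Callable[[], bool]]


def run_progressively(stages: Sequence[Stage],
                      on_failure: Callable[[str], None]) -> bool:
    """Run stages in order; stop at the first failure.

    Put fast, narrow suites first so that slow, broad suites only run
    once the cheap feedback is green.
    """
    for name, run in stages:
        if not run():
            on_failure(name)  # e.g. create a ticket (hypothetical hook)
            return False
    return True
```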

"The essence of CI is to test every change someone makes as soon as possible. The essence of CD is to maximize the scope of that testing... Quality assurance is about managing the risks of applying code to your system. Will the code break when applied? Does it create the right infrastructure? Does the infrastructure work the way it should? ... CD is about broadening the scope of risks that are immediately testing when pushing a change to the codebase, rather than waiting for eventual testing days, weeks or even months afterwards. So on every push, a pipeline applies the code to realistic test environments and run comprehensive tests. Ideally, once the code has run though the automates tages of the pipelines, it's fully proven as production-ready... Automated tools can test how quickly specific actions complete. Testing the speed of  network connection from point A to point B can surface issues with the network configuration or the cloud platform is run before you even deploy an application." (Chapter 8: Core Practice: Continuously Test and Deliver under What Should We Test with Infrastructure, my emphasis)

Although a suite of low-level tests of declarative code can feel like a bookkeeping exercise, it can help you catch issues such as infrastructure code that was never applied, or code that was applied but that the tool failed to apply correctly. This is especially important when you're getting to know a new toolset.

As you start to test declarative code, you may find that the code gets complex enough to need complex testing. When that happens, pull some logic out of your declarations and into a library written in procedural code, so that you can (compile it and) test it using traditional testing methods.
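As an example of logic worth pulling out, consider splitting an address block into subnets. The function below is a hypothetical helper (the name and the splitting rule are my assumptions), built on Python's standard `ipaddress` module. Once it lives in a library, it can be unit tested without touching any infrastructure.

```python
import ipaddress
from itertools import islice
from typing import List


def subnets_for(address_block: str, count: int) -> List[str]:
    """Split an address block into `count` equally sized subnets.

    The block is divided into the smallest power-of-two number of
    subnets that is at least `count`; the first `count` are returned.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    network = ipaddress.ip_network(address_block)
    extra_bits = (count - 1).bit_length()
    subnets = network.subnets(prefixlen_diff=extra_bits)
    return [str(subnet) for subnet in islice(subnets, count)]
```

A traditional unit test can then assert, for instance, that asking for two subnets of `10.0.0.0/24` yields `10.0.0.0/25` and `10.0.0.128/25`.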

Finding the right level of testing is important: when setting up multiple networking structures - an address block, a load balancer, some routing rules and a gateway - testing each individual part might provide little value. However, the end goal of these resources - are you able to connect from point A to point B - is critical to your application. Test this. Similarly, if you provision a set of resources that your application should have access to, you can author an independent test that ensures that the user account provisioned for your application has the access it needs to perform the necessary read/write operations on those resources.
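A test of the end goal ("can I connect from point A to point B?") can be as small as the sketch below, which uses only the standard `socket` module. The assumption is that the test runs on point A (or on a machine in the same network position) and that point B listens on a known TCP host and port; both are placeholders you would fill in for your own stack.

```python
import socket


def can_connect(host: str, port: int, timeout_seconds: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This checks the whole path at once: address block, routing rules,
    gateway and load balancer all have to work for it to succeed.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```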

Prefer many small infrastructure stacks to a few large ones, while keeping track of their value (see above), in order to simplify testing.

As with other code, clarify, minimize and isolate dependencies. Consider adding test doubles if invoking a dependency may cause unintended side effects or take a long time to execute.
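A test double for a slow or side-effect-prone dependency might look like the following sketch. The DNS client, its `register` method and the `example.internal` hostname are all invented for illustration; the pattern is what matters: the code under test depends on a small interface, and the test passes in a fake that only records what it was asked to do.

```python
from typing import List, Protocol, Tuple


class DnsClient(Protocol):
    """The small interface our code depends on (hypothetical)."""

    def register(self, hostname: str, ip_address: str) -> None: ...


class FakeDnsClient:
    """Test double: records calls instead of changing real DNS."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, str]] = []

    def register(self, hostname: str, ip_address: str) -> None:
        self.records.append((hostname, ip_address))


def publish_stack(dns: DnsClient, environment: str, ip_address: str) -> str:
    """Register the stack's hostname and return it."""
    hostname = f"app-service-{environment}.example.internal"
    dns.register(hostname, ip_address)
    return hostname
```

In a test you pass `FakeDnsClient()` and inspect its `records`; in production you pass the real client.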

Once in production, invest in good monitoring and observability. Do what you can to reach zero-downtime deployment (e.g. with staging slots!). Consider maintaining test data records, such as users in your system that won't trigger real-world actions (or limit the risk/damage of these actions by having company-internal (e-mail) addresses).
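One way to limit the damage of test records is to verify, before seeding them, that every test user has a company-internal address. This is only a sketch: `example.com` stands in for your own domain, and the function name is made up.

```python
from typing import FrozenSet

# Assumption: replace with the domain(s) your company actually owns.
INTERNAL_DOMAINS: FrozenSet[str] = frozenset({"example.com"})


def is_internal_address(email: str,
                        internal_domains: FrozenSet[str] = INTERNAL_DOMAINS) -> bool:
    """Return True if the e-mail address belongs to an internal domain."""
    local_part, separator, domain = email.rpartition("@")
    return bool(local_part) and bool(separator) and domain.lower() in internal_domains
```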
