Categories
Cloud & Infrastructure

Terraform Managed AMIs With Packer

This article was originally published on Geek and I, January 14, 2021, and has been republished with the author’s permission.

I have been working with a friend on learning Terraform to manage his new and growing AWS environment. One of the challenges I gave him was to use Terraform to manage the AMIs that Packer creates and to initiate a rebuild when the source AMI is newer than the one recorded in the current state.

Terraform doesn’t have a Packer provider, so this requires using other resources built into Terraform to accomplish a working and trackable state.

Problem Statement

Maintain current AMIs and rebuild them as needed when the source (gold image) AMI is updated or when your userdata changes, using Packer to perform the customization.

  1. Figure out our source AMI via data lookup(s)
  2. If source ami-id has changed, then initiate new AMI build
  3. If userdata has changed, then initiate new AMI build
  4. If source ami-id and userdata have not changed, do nothing (idempotent!)

Terraform built-in resources

I accomplished this by abusing the null_resource (from the null provider) and the local-exec provisioner.

First, let’s go find the AMI we need as the source:

data "aws_ami" "ubuntu" {
  most_recent = true
 
  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-focal-20.04-amd64-server-*"]
  }
 
  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
 
  # Canonical
  owners = [
    "099720109477"
  ]
}

This returns an ami-id of ami-0c007ac192ba0744b (as of 20210114 in AWS region us-east-2). Canonical updates these AMIs periodically, and each update produces a new ami-id.

Now that we have an ami-id, we can add it as a trigger that causes the null_resource to be replaced. A second trigger watches the userdata file that will be used for customization:

resource "null_resource" "build_custom_ami" {
  triggers = {
    aws_ami_id      = data.aws_ami.ubuntu.id
    sha256_userdata = filesha256("deploy/packer-customize.sh")
  }
 
  provisioner "local-exec" {
    environment = {
      VAR_AWS_REGION = var.aws_region
      VAR_AWS_AMI_ID = data.aws_ami.ubuntu.id
    }
 
    command = <<EOF
    set -ex;
    packer validate \
      -var "aws_region=$VAR_AWS_REGION" \
      packer-configs/custom_ami.json
    packer build \
      -var "aws_region=$VAR_AWS_REGION" \
      packer-configs/custom_ami.json
EOF
  }
}
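
Note that VAR_AWS_AMI_ID is exported to the environment above but never referenced in the pared-down command. In a fuller version you would presumably pass it through to Packer as well, roughly like this (the Packer variable name aws_source_ami is an illustrative assumption, not taken from the original configuration):

    packer build \
      -var "aws_region=$VAR_AWS_REGION" \
      -var "aws_source_ami=$VAR_AWS_AMI_ID" \
      packer-configs/custom_ami.json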

So basically, the relevant directory structure looks like this. You will probably also have backend resources, perhaps some version requirements, etc.

data.tf
ami.tf
-> packer-configs/
---> custom_ami.json
-> deploy/
---> packer-customize.sh
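
For context, a minimal packer-configs/custom_ami.json could look roughly like the sketch below. This is an illustrative assumption rather than the actual template from the project; the builder settings, variable names (aws_region, aws_source_ami), and AMI naming are placeholders you would adapt.

{
  "variables": {
    "aws_region": "",
    "aws_source_ami": ""
  },
  "builders": [
    {
      "type": "amazon-ebs",
      "region": "{{user `aws_region`}}",
      "source_ami": "{{user `aws_source_ami`}}",
      "instance_type": "t3.micro",
      "ssh_username": "ubuntu",
      "ami_name": "custom-ubuntu-{{timestamp}}"
    }
  ],
  "provisioners": [
    {
      "type": "shell",
      "script": "deploy/packer-customize.sh"
    }
  ]
}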

Implementation via Jenkins or other CI/CD systems is left to you to figure out.

What are the variables used for in local-exec?

I have items running in multiple regions and each region has its own AMIs (and resulting ami-ids). The above has been pared down a bit for brevity.

You can use the aws provider to connect to multiple regions concurrently:

### per region provider info using provider listings
provider "aws" {
  alias  = "region-us-east-1"
  region = "us-east-1"
}

provider "aws" {
  alias  = "region-us-east-2"
  region = "us-east-2"
}

provider "aws" {
  alias  = "region-us-west-1"
  region = "us-west-1"
}

provider "aws" {
  alias  = "region-us-west-2"
  region = "us-west-2"
}

Then you can build AMIs in each region. This example code is not complete, but the concept is very straightforward:

data "aws_ami" "ubuntu-use2" {
  provider    = aws.region-us-east-2
  most_recent = true
  ...
}
 
data "aws_ami" "ubuntu-usw2" {
  provider    = aws.region-us-west-2
  most_recent = true
  ...
}
resource "null_resource" "build_usw2_ami" {
  provider = aws.region-us-west-2
  triggers = {
    aws_ami_id      = data.aws_ami.ubuntu-usw2.id
    sha256_userdata = filesha256("deploy/userdata.sh")
  }
 
  provisioner "local-exec" {
    environment = {
      VAR_AWS_REGION = "us-west-2"
      VAR_AWS_AMI_ID = data.aws_ami.ubuntu-usw2.id
    }
  ...
}

Of course, you can make this even more dynamic by using data calls such as aws_caller_identity within the region you are working against and applying the results programmatically, but I’ll leave that to you for now.
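
As a rough sketch of that idea, the aws_caller_identity and aws_region data sources can be queried through each provider alias and fed into the build environment (the environment variable names below are illustrative, not from the original code):

data "aws_caller_identity" "usw2" {
  provider = aws.region-us-west-2
}

data "aws_region" "usw2" {
  provider = aws.region-us-west-2
}

# These could then be added to the local-exec environment, for example:
#   VAR_AWS_ACCOUNT_ID = data.aws_caller_identity.usw2.account_id
#   VAR_AWS_REGION     = data.aws_region.usw2.name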



AWS Multi-Account, Multi-Region Networking with Terraform

AWS Des Moines Meetup

Eric Gerling and Nathan Levis of Trility Consulting shared how to manage multiple AWS accounts with resources spread across multiple regions in a simple, cost-effective way. Required attributes included Infrastructure as Code (HashiCorp’s Terraform), a single source of truth for AWS accounts, rapid deployment of new regions, and VPN access to dynamically created VPCs in multiple regions.


Why I Started Using and Recommending Terraform

Having used AWS Config, AWS CodePipeline, and AWS CloudFormation for many years, I had everything automated and working smoothly, and I did not see the need for tools external to the AWS environment.

I looked at HashiCorp’s Terraform and saw it merely as a tool to automate deployments and track the state of resources. And I was already doing this with AWS CloudFormation and AWS Config. 

My argument to those pushing for Terraform: Why would I need to utilize Terraform?

Eventually, I had a situation that caused an outage. We had pushed an update to one of our critical CloudFormation infrastructure stacks, and the stack entered a failed state. It could not be rolled back or deleted. This led to an AWS support call, and the stack had to be destroyed manually by AWS.

Unfortunately, other stacks had a dependency on the failed stack, and the failure required the entire production infrastructure to be destroyed and rebuilt.

After we fixed the problem, I thought this was a rare anomaly, and probably would not happen again. Six months later, I had another failed stack. Fortunately, this stack was not a critical piece of infrastructure, and the stack could be redeployed.

After having this occur several times over the course of several years, I realized it was a bug in the AWS CloudFormation deployment and updating process, and began to research other options. 

I began to look more seriously at Terraform again. 

Could using Terraform for deployments prevent this situation? 

The answer is yes.

Terraform uses templates to deploy resources, and those resources are not coupled together the way they are in CloudFormation stacks. Instead, Terraform resources are independent of each other and can be easily updated without causing a failure.

As I dug deeper into Terraform, I also discovered many additional features which AWS did not or could not provide because of the way the AWS ecosystem was built.

Some of those additional features include:

  • Test suite to allow security and compliance testing of templates prior to deployment.
  • Local AWS cloud environments which allow the deployment of AWS resources locally to test templates and reduce costs and the number of unnecessary deployments.
  • Ability to leverage and integrate features in other cloud platforms, such as Azure Active Directory, into AWS deployments.
  • Ability to build your own Terraform modules to standardize organizational deployments (see the sketch after this list). This allowed all deployments to be exactly the same, with the same tags, security, etc.
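
As a rough illustration of that last point, a standardized module might be consumed like this; the module name, source path, and inputs are hypothetical, not from an actual library:

module "standard_bucket" {
  source = "./modules/standard-s3-bucket"

  bucket_name = "example-app-logs"
  tags = {
    Owner       = "platform-team"
    Environment = "prod"
  }
}

Every team that calls the module gets the same tagging, security, and access settings without re-implementing them.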

At this point, I was totally sold on using Terraform and began developing my own modules. Unfortunately, many cloud engineers do not see the same benefits and they make the same arguments I previously made.

  • Everything already works for us, and we don’t need to add another layer of complexity.
  • We don’t need to test our infrastructure code, only developers do this.
  • We can ensure compliance by using AWS Config.
  • Why do you need to test locally, when it is very inexpensive to just deploy to an actual AWS environment?
  • We don’t need modules and libraries for infrastructure code.
  • We only use AWS, and do not use any other cloud environments.

While I can empathize with these arguments, eventually they will experience multiple failed CloudFormation stacks and will research ways to prevent it from happening in the future.

While they may not come to the same conclusion I did, I would hope they will look at Terraform as a potential solution.

– Will Rubel, Senior DevOps/Cloud Engineer



My Cheat Sheet for Understanding the Benefits of Cloud Computing

Here’s my take on the benefits of cloud computing. You can search and find several lists of them, but I thought I’d share my perspective as a typical end user. Early on, I used this free Microsoft online course to understand the benefits: Cloud Concepts – Principles of cloud concepts. It does a great job of explaining the benefits (and more) that everyone markets and sells. Keep in mind, many of these benefits require 100% code, and that is not blatantly stated in the course. My cheat sheet builds on this specific page to help you understand what the heck it all means.

Elasticity – You pay for the computing power when you need it vs. all the time. I always think about this example: A Pizza Co. website on Super Bowl night can add “enough power” to ensure the high volume of orders is processed in a manner that makes the customer happy. With elasticity, you don’t have to pay for the high ceiling of computing power 24/7/365; however, you have it available whenever the need arises.

I’d like to note, I use the term “power” instead of the technical term “resources.” If you are working from home now and you don’t have enough resources allocated to you, it means you might experience lag time when clicking between “stuff” you are working on. System elasticity is one thing that could help solve this problem for you.

Also, you may be learning more about VPNs (virtual private networks) and VPCs (virtual private clouds). The VPN is the secure connection several people use to reach your company’s private place to work, and the VPC is your company’s own private, isolated section of the cloud provider’s network where its servers and workloads live.

Scalable – This always felt the same as elasticity to me, but it’s where IT chooses to add resources so you have enough power to get stuff done. Adding resources to an existing server is vertical scaling; adding another server alongside it is horizontal scaling. When you scale horizontally, you can also choose where that extra capacity lives.

An example of where to scale horizontally ties nicely to the next benefit…

Accessible from anywhere – You can add power anywhere using horizontal scaling, which makes your environment more accessible.

Your teams in the Eastern Hemisphere can work from a server in that region (faster processing power). When they log off and leave work, you reduce “computing power” on that server and increase computing power for those working in the Western Hemisphere on a different server. This also applies to a business operating coast-to-coast in the United States. For example, East Coast workers access the cloud provider’s database located in the eastern region.

I find visuals really help. You can view region maps for the major public cloud providers: AWS, Azure, and Google Cloud. A properly managed cloud enables accessibility using this flexible benefit. Your company can buy and locate its servers wherever it wants from day to day, hour to hour, and yes, even minute to minute.

Reduced infrastructure and maintenance costs – You also aren’t paying to maintain computer hardware, which gives your people capacity to innovate rather than constantly completing hardware setup, upgrades, and other IT-related tasks. Historically, you needed huge server rooms that frequently needed to be updated and maintained. The cloud removes those needs.

Reliable – Redundancy can be built in, and it can be done more effectively and efficiently in code. This means you have backups, and backups of backups, to ensure uptime for your people and your customers. If for some reason the public cloud service provider has a service failure in one region, it has a backup in place in another one, so you never miss a customer transaction and team members can always log in and access their files and applications.

I learned about this when writing up a project description for a client who needed a roadmap and implementation for a centralized, automated solution for role-based access that rotated credentials every certain number of days and that needed to be highly available.

Human terms: A process to notify employees that they need to change their credentials every 30, 60, or 90 days, and that always works because backups and redundancies are in place. (Highly annoying, but very critical nowadays.)

By using HashiCorp’s Vault and Terraform (100% code), the client’s process was automated and set up to achieve 99.99% availability with the least amount of human interaction, meaning it should only experience about 8.64 seconds per day of downtime, as shown in the chart below.

Learn how you can build a Centralized Automated Vault Solution.

If you want to sound extra smart, you can refer to this as “four nines.” Think about the impact on your business: If you sell pizzas and have 97% availability, consider how many potential lost orders could occur in 43 minutes of downtime on Super Bowl night.

Possibly more relevant: Consider all the employees working from home now, how does 43 minutes of downtime impact your business when multiplied by the number of employees logged on at a given time?
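
For reference, the arithmetic behind those numbers is simple: downtime equals (1 − availability) × the time period.

  99.99% availability: (1 − 0.9999) × 86,400 seconds/day ≈ 8.6 seconds of downtime per day
  97% availability: (1 − 0.97) × 1,440 minutes/day ≈ 43 minutes of downtime per day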

[Availability chart – Source: Wikipedia, Highly available]

Physically Secure – Physical security is provided by the cloud service provider. If you go back to those maps, these companies have fortresses to secure the data centers located around the globe. However, it’s still your responsibility to mitigate logical security risks and threats in your implemented solutions. What do I mean by logical? This is an entire article in itself that I’m planning to write next. Or you can read this article on becoming a Security-First Organization or Cloud Breaches Prove DevOps Needs Dedicated Testers.

Regulatory Compliant – Many public cloud providers publish their current compliance against domestic and international standards across many industries, including HIPAA, HITRUST, PCI, NIST CSF, NIST 800-53, SOC 2 Type II, etc. By leveraging a provider that has already achieved these standards, you are freed up to concentrate on the compliance of the software stacks you put on top of their infrastructure ecosystems.

Clarity on cloud computing

In writing How to Truly Transform Your Business with Cloud Computing, I found the need to explain the full advantages of moving your business to the cloud in the simplest terms. If this cheat sheet sparked thoughts or questions, I’d love to hear them by connecting with me via LinkedIn or email.


How to Truly Transform Your Business with Cloud Computing

It’s as if the lightbulbs in my brain were all replaced with dimmer switches when I joined Trility Consulting a year ago. I had an opportunity to learn yet another industry in my marketing career and before me were some really smart, pragmatic experts willing to teach me.

As a journalism major from a long, long time ago, I thought my curiosity and ability to ask any question without feeling stupid would be invaluable. Don’t get me wrong, they were. However, learning how software developers and engineers build technology solutions in the cloud has so many facets there were questions I didn’t even know to ask. The golden nuggets always seem to come from follow-up or clarifying questions.

I’ve had several light bulb moments in the last year. So many that I now describe these moments as my light bulb is now just less dim. My colleague describes it as…

“Your learning light gets brighter as your understanding increases over time.”

– Jennifer Davis

So I’m sharing the lesson and explaining in terms that hopefully even non-technical business leaders and decision-makers can understand. 

Why? Moving your business to the cloud was critical in sustaining your business before COVID-19. Now, it’s necessary. And it’s necessary that you transform your business for the better vs. just move it to the cloud.

Here is the takeaway:

Finding value in a cloud solution requires constantly nudging that dimmer switch up to steadily optimize performance and reduce costs – and it requires 100% code.

To demonstrate this, I’m sharing my first so-called light bulb moment and then how nine months later, I didn’t truly grasp the lesson in that moment. I must admit, it shames me a little bit as a journalism major.  However, I’ve been told this lesson isn’t easily grasped – especially by non-technical people, even leaders and decision-makers who are in a position to really transform the way companies operate. 

It really is the Matrix. Everything is code.

Last fall, I sat in a lunch ’n’ learn and asked how Identity and Access Management (IAM)* permissions work in Amazon Web Services (AWS). In a previous job, I had a part in helping add, adjust, and delete users and permissions, so I had a basic understanding of how to maintain users in an on-premise environment.**

*IAM regulates who has access to what and to what extent, i.e., role-based permissions.
**On-premise means your servers, applications, databases, etc., all exist on-site and you most likely have a locked server room that is very chilly and a great place to cool off if you’ve biked to work. 

The person leading this session showed how Trility typically approaches centralized IAM permissions to ensure the highest security practices. It’s done in code vs. what I remember being a series of menus and checkboxes to update, add, or remove roles or users on our on-premise server, which also meant it was on-demand and manual. Giving access to a new hire meant I could be a blocker for them to log in.

I asked, “So it’s like the Matrix? Everything is code?” 

Answer, “Yes.”

My clarifying question, “Is everything in code?”

Again, “Yes.”

Sweet. My light bulb popped on. Or so I thought. It was really just less-dim. 

I walked away from this conversation assuming everything built in the cloud had to be 100% code and this is how everyone does it. 

The Catch: Everything can be code – but not always 

Fast forward to this spring: I learned, unfortunately, not everyone builds in the cloud using 100% code. To take full advantage of the cloud, you need to consider doing it all in code – or at least take the steps towards it.

I offer up my cheat sheet of what those full advantages are and what they translate to for those who don’t code.

This light bulb moment happened at another lunch ’n’ learn where an example AWS instance was pulled up for an upcoming DevOps Meetup our company was hosting in Omaha. It provided a visual context and I asked a question that off-handedly led to learning that some people may click through settings vs. writing permissions in 100% code. 

For those interested, here’s the video of the DevOps Meetup where you can learn how to manage multiple AWS resources spread across AWS multiple regions in a simple, cost-effective way using Terraform. 

“So you don’t have to code in the cloud?” I asked.

“No,” was the short answer. 

I got a longer answer explaining why, but will offer a less-long answer in my terms:

Yes, there are just as many menus and boxes that can be checked in AWS, Google, Azure, or any other cloud services provider. 

Yes, it will allow you to move to the cloud in days, weeks, or even one month, and you could possibly even do a complete lift ’n’ shift, which is pretty much always a bad idea (though there could be a valid, contextual reason). And if you do this, you can’t just walk away and consider it done. You need to refactor everything, and do it very, very quickly.

You can choose this route, but…

And it’s a big but.

It won’t allow you to reap the benefits of cloud computing. Going this route, you could end up paying more or not achieving the return on investment that everyone sells: Move to the cloud. It costs less.

Innovating with the full power of the cloud at your disposal AND saving time, money, and team capacity requires doing the work in code and building reusable patterns.

What do you mean by reusable patterns?

I asked this once and here’s my answer:

Think of code as a living template housed in a repository that serves as the single source of truth for every digital aspect of your business. It is updated in one place and is pushed out (deployed) to all the places that use this code. For example, if you have IAM permissions that are both centralized and automated, this code can be reused for any new application that is needed. If you were developing a new application, the software developer can run automated tests (or at least manual, daily tests) against that code to confirm permissions and logins work for the new application. This applies to any code templates for automated critical integrations (such as security measures). This translates to bugs and issues being found daily vs. waiting to reactively try to solve them weeks or months later when the application needs to launch.

As a marketer, I just uncovered a major value proposition for my company and I didn’t learn it until 10 months into the job. By assuming everything built in the cloud had to be code, I did not realize this was a differentiator for Trility. It can also be a differentiator for your business. 

(Yes, my marketing head hangs in shame, but my journalistic brain says: “Hey, people need to know this, so share.”)

While Trility isn’t the only firm with this mindset, there are several companies and individuals who haven’t had the capacity to fully leverage this approach. In the race to move to the cloud, companies have many challenges, one of which is capacity.

Balancing capacity to maintain and capacity to innovate

Technology teams are busy maintaining IT systems. Moving to the cloud requires learning a new way of working. You may hire a team or contract people to come in and do the work. When choosing these options, I encourage you to ensure your contracts include training or documentation that equips your people for maintaining and iterating with the necessary “future-state skills.” Otherwise, you just have a new system and a new vendor who has guaranteed themselves billable hours for months or years to come. 

Equip your people to build in the cloud

To change the way you do business in the short- and long-term, you need to enable your own teams for the long haul. My advice to those midway through a migration, operating in a hybrid situation, or still contemplating how to do it:

Equip them with the opportunity to learn to do it all in code. 

  • It will require understanding their capacity. These are the same people tasked with maintaining current systems and providing support. 
  • It may require hiring a firm to come in and train and teach your people. Make sure you hire a firm that builds this way.
  • It may require outsourcing a project. Again, your people will need to understand how to maintain it after that consultant is gone. If you go this route, ensure training and documentation are included from any firm you select.

Whatever you do, ensure in the contract that you are setting up your people for success when the contract ends. At Trility, we’ve found great success in providing services in this way.

What else have I learned?

What I’ve learned so far is that the cloud isn’t just another data center. It is a new way of doing work. I realize I’m only scratching the surface. However, as I learn more and more about value propositions, a 100% software-defined cloud makes more and more sense to me daily.

If you found this insightful, read: My Cheat Sheet for Understanding the Benefits of Cloud Computing. It builds on this article to help you understand what the heck this stuff all means. 

I plan to keep sharing my less dim light bulb moments in the simplest of terms.

If this article sparked thoughts or questions, I’d love to hear them so I can continue to bring clarity to a complex and now a very necessary way to do work. Feel free to connect with me on LinkedIn or email me.


AWS Multi-Account, Multi-Region Networking with Terraform

Omaha DevOps Meetup Event

Eric Gerling of Trility Consulting spoke at the Omaha DevOps Meetup to share how to manage multiple AWS accounts with resources spread across multiple regions in a simple, cost-effective way. Required attributes included Infrastructure as Code (HashiCorp’s Terraform), a single source of truth for AWS accounts, rapid deployment of new regions, and VPN access to dynamically created VPCs in multiple regions.

If you view the video on YouTube, the description includes a timeline of the event to jump to the areas of most value to you.


Part IV: Complex Practical Examples of DevOps Unit Testing

In my previous article, I provided a simple example of mocking an AWS resource using localstack and testing with the Python terraform-compliance module. In this article, I will provide a more extensive example that uses kitchen-terraform and terraform-compliance to deploy the following resources in the AWS us-east-1 and us-west-2 regions.

  1. VPC
  2. Subnet
  3. Internet Gateway
  4. Route Table
  5. Route Table Association
  6. Security Group
  7. Key Pair
  8. 2 x EC2 Instances

To begin this example, you will need the following:

  1. Terraform 
  2. Ruby
  3. Python3
  4. Python3 virtualenv module
  5. An AWS account with credentials configured in ~/.aws
  6. An AWS role or user with at least the minimum permissions:
{
 "Version": "2012-10-17",
 "Statement":
   [
     {
       "Sid": "Stmt1469773655000",
       "Effect": "Allow",
       "Action": ["ec2:*"],
       "Resource": ["*"]
     }
   ]
}

Next, we need to set up a Python3 virtual environment, activate the environment and install the python terraform-compliance module.

which python3
/Library/Frameworks/Python.framework/Versions/3.8/bin/python3
cd ~
mkdir virtualenvs
cd virtualenvs
virtualenv terraform-test  -p /Library/Frameworks/Python.framework/Versions/3.8/bin/python3
source terraform-test/bin/activate
pip install terraform-compliance

Now, we need to create a projects directory and download the sample code from github.

cd ~
mkdir projects
cd projects
git clone git@github.com:rubelw/terraform-kitchen.git
cd terraform-kitchen

Now we are ready to run our tests by executing the ‘execute_kitchen_terraform.sh’ file.

This script will perform the following functions:

  1. Install bundler
  2. Install required gems
  3. Create public and private key pair
  4. Initialize terraform project
  5. Test terraform plan output against terraform-compliance features
  6. Execute kitchen test suite
  • kitchen destroy centos (us-east-1)
  • kitchen create centos (us-east-1)
  • kitchen converge centos (us-east-1)
  • kitchen verify centos (us-east-1)
  • kitchen destroy centos (us-east-1)
  • kitchen destroy ubuntu (us-west-2)
  • kitchen create ubuntu (us-west-2)
  • kitchen converge ubuntu (us-west-2)
  • kitchen verify ubuntu (us-west-2)
  • kitchen destroy ubuntu (us-west-2)
./execute_kitchen_terraform.sh

This script will begin by checking if bundler is installed, and then installing the necessary ruby gems.

Successfully installed bundler-2.1.4
Parsing documentation for bundler-2.1.4
Done installing documentation for bundler after 2 seconds
1 gem installed
Fetching gem metadata from https://rubygems.org/.........
Fetching gem metadata from https://rubygems.org/.
Resolving dependencies..
…
Using kitchen-terraform 5.2.0
Bundle complete! 1 Gemfile dependency, 185 gems now installed.
Use `bundle info [gemname]` to see where a bundled gem is installed.

Next, the script will test whether the public/private key pair exists in the test/assets directory; if not, it will create the key pair.

checking if test/assets directory exists
Generating public/private rsa key pair.
Your identification has been saved in test/assets/id_rsa.
Your public key has been saved in test/assets/id_rsa.pub.
The key fingerprint is:
SHA256:0oryWP5ff8kBwQPUSCrLGlVMFzU0rL7TQtJSi6iftyo Kitchen-Terraform AWS provider tutorial
The key's randomart image is:
+---[RSA 4096]----+
|       ooo*X=    |
|       ..o. *o   |
|      o .  . o   |
|     o +  o .    |
|    . +.S= . .   |
|     +.o+ =   .  |
|  . +..  +.o . o |
|   *E  ...+.. +  |
|  . o+=+o. o..   |
+----[SHA256]-----+

Next, the script will test the terraform project, using the python terraform-compliance module, and features located in test/features.

The script begins by testing if the terraform project has been initialized, and if not, initializing the project.

Initializing the backend...

Initializing provider plugins...
- Checking for available provider plugins...
- Downloading plugin for provider "random" (hashicorp/random) 2.1.2...
- Downloading plugin for provider "aws" (hashicorp/aws) 2.51.0...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.

After terraform initialization, the script will execute ‘terraform plan’ and output the plan in json format. It will then test the terraform output against the features in the test directory.

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.


------------------------------------------------------------------------

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_instance.reachable_other_host will be created
  + resource "aws_instance" "reachable_other_host" {
      + ami                          = "ami-1ee65166"
      + arn                          = (known after apply)
      + associate_public_ip_address  = true
      + availability_zone            = (known after apply)
…
Plan: 11 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

This plan was saved to: myout

To perform exactly these actions, run the following command to apply:
    terraform apply "myout"

terraform-compliance v1.1.11 initiated

🚩 Features	: /terraform-kitchen/test/features
🚩 Plan File	: /terraform-kitchen/myout.json

🚩 Running tests. 🎉

Feature: security_group  # /terraform-kitchen/test/features/security_group.feature
    In order to ensure the security group is secure:

    Scenario: Only selected ports should be publicly open
        Given I have AWS Security Group defined
        When it contains ingress
        Then it must only have tcp protocol and port 22,443 for 0.0.0.0/0

1 features (1 passed)
1 scenarios (1 passed)
3 steps (3 passed)

You may be asking: why do we need both terraform-compliance features and kitchen-terraform fixtures for our testing? The purpose of terraform-compliance features is to have a repository of global, enterprise-level features and tests which get applied to all projects. For example, the test displayed above checks security groups to ensure only ports 22 and 443 are open; no other ports should be open in the security group.

The kitchen-terraform fixtures and tests are designed for unit testing a single terraform project, and are not to be applied to every terraform project. 
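
For readers who have not used kitchen-terraform before, the suites, platforms, and verifier systems referenced in the output below are declared in the project’s kitchen.yml. A heavily pared-down sketch of what such a file can look like follows; the paths, names, and outputs are illustrative, not copied from the terraform-kitchen repository.

driver:
  name: terraform
  root_module_directory: test/fixtures/wrapper

provisioner:
  name: terraform

verifier:
  name: terraform
  systems:
    - name: local
      backend: local
      controls:
        - state_file
    - name: remote
      backend: ssh
      hosts_output: public_dns
      user: centos
      key_files:
        - test/assets/id_rsa

platforms:
  - name: centos

suites:
  - name: complex_suite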

Continuing with the script execution, the script will now run the kitchen-terraform tests. It begins by attempting to destroy any existing terraform state in the applicable region.

-----> Starting Test Kitchen (v2.3.4)
-----> Destroying <complex-suite-centos>...
$$$$$$ Verifying the Terraform client version is in the supported interval of >= 0.11.4, < 0.13.0...
$$$$$$ Reading the Terraform client version...
       Terraform v0.12.21
       + provider.aws v2.51.0
       + provider.random v2.1.2
$$$$$$ Finished reading the Terraform client version.
$$$$$$ Finished verifying the Terraform client version.
$$$$$$ Initializing the Terraform working directory...
       Initializing modules...
       
       Initializing the backend...
       
       Initializing provider plugins...
       
       Terraform has been successfully initialized!
$$$$$$ Finished initializing the Terraform working directory.
$$$$$$ Selecting the kitchen-terraform-complex-suite-centos Terraform workspace...
$$$$$$ Finished selecting the kitchen-terraform-complex-suite-centos Terraform workspace.
$$$$$$ Destroying the Terraform-managed infrastructure...
       module.complex_kitchen_terraform.random_string.key_name: Refreshing state... [id=none]
…
       Destroy complete! Resources: 11 destroyed.
$$$$$$ Finished destroying the Terraform-managed infrastructure.
$$$$$$ Finished destroying the Terraform-managed infrastructure.
$$$$$$ Selecting the default Terraform workspace...
       Switched to workspace "default".
$$$$$$ Finished selecting the default Terraform workspace.
$$$$$$ Deleting the kitchen-terraform-complex-suite-centos Terraform workspace...
       Deleted workspace "kitchen-terraform-complex-suite-centos"!
$$$$$$ Finished deleting the kitchen-terraform-complex-suite-centos Terraform workspace.
       Finished destroying <complex-suite-centos> (3m31.75s).
-----> Test Kitchen is finished. (3m32.88s)

The script will then initialize the terraform working directory and select a new terraform workspace.

-----> Starting Test Kitchen (v2.3.4)
-----> Creating <complex-suite-centos>...
$$$$$$ Verifying the Terraform client version is in the supported interval of >= 0.11.4, < 0.13.0...
$$$$$$ Reading the Terraform client version...
       Terraform v0.12.21
       + provider.aws v2.51.0
       + provider.random v2.1.2
$$$$$$ Finished reading the Terraform client version.
$$$$$$ Finished verifying the Terraform client version.
$$$$$$ Initializing the Terraform working directory...
       Upgrading modules...
       - complex_kitchen_terraform in ../../..
       
       Initializing the backend...
       
       Initializing provider plugins...
       - Checking for available provider plugins...
       - Downloading plugin for provider "random" (hashicorp/random) 2.1.2...
       - Downloading plugin for provider "aws" (hashicorp/aws) 2.51.0...
       
       Terraform has been successfully initialized!
$$$$$$ Finished initializing the Terraform working directory.
$$$$$$ Creating the kitchen-terraform-complex-suite-centos Terraform workspace...
       Created and switched to workspace "kitchen-terraform-complex-suite-centos"!
       
       You're now on a new, empty workspace. Workspaces isolate their state,
       so if you run "terraform plan" Terraform will not see any existing state
       for this configuration.
$$$$$$ Finished creating the kitchen-terraform-complex-suite-centos Terraform workspace.
       Finished creating <complex-suite-centos> (0m16.81s).
-----> Test Kitchen is finished. (0m17.97s)

The next step in the script is to run ‘kitchen converge’. This step will converge the platforms defined in the kitchen.yml file.

-----> Starting Test Kitchen (v2.3.4)
-----> Creating <complex-suite-centos>...
$$$$$$ Verifying the Terraform client version is in the supported interval of >= 0.11.4, < 0.13.0...
$$$$$$ Reading the Terraform client version...
       Terraform v0.12.21
       + provider.aws v2.51.0
       + provider.random v2.1.2
$$$$$$ Finished reading the Terraform client version.
$$$$$$ Finished verifying the Terraform client version.
$$$$$$ Initializing the Terraform working directory...
       Upgrading modules...
       - complex_kitchen_terraform in ../../..
       
       Initializing the backend...
       
       Initializing provider plugins...
       - Checking for available provider plugins...
       - Downloading plugin for provider "random" (hashicorp/random) 2.1.2...
       - Downloading plugin for provider "aws" (hashicorp/aws) 2.51.0...
       
       Terraform has been successfully initialized!
$$$$$$ Finished initializing the Terraform working directory.
$$$$$$ Creating the kitchen-terraform-complex-suite-centos Terraform workspace...
       Created and switched to workspace "kitchen-terraform-complex-suite-centos"!
       
       You're now on a new, empty workspace. Workspaces isolate their state,
       so if you run "terraform plan" Terraform will not see any existing state
       for this configuration.
$$$$$$ Finished creating the kitchen-terraform-complex-suite-centos Terraform workspace.
       Finished creating <complex-suite-centos> (0m16.81s).
-----> Test Kitchen is finished. (0m17.97s)

Finally, the script will execute ‘kitchen verify’ to test the deployed project against the test suite.

-----> Starting Test Kitchen (v2.3.4)
-----> Setting up <complex-suite-centos>...
       Finished setting up <complex-suite-centos> (0m0.00s).
-----> Verifying <complex-suite-centos>...
$$$$$$ Reading the Terraform input variables from the Kitchen instance state...
$$$$$$ Finished reading the Terraform input variables from the Kitchen instance state.
$$$$$$ Reading the Terraform output variables from the Kitchen instance state...
$$$$$$ Finished reading the Terraform output variables from the Kitchen instance state.
-----> Starting verification of the systems.
$$$$$$ Verifying the 'local' system...

Profile: complex kitchen-terraform (complex_suite)
Version: 0.1.0
Target:  local://

  ✔  state_file: 0.12.21
     ✔  0.12.21 is expected to match /\d+\.\d+\.\d+/
  ✔  inspec_attributes: static terraform output
     ✔  static terraform output is expected to eq "static terraform output"
     ✔  static terraform output is expected to eq "static terraform output"


Profile Summary: 2 successful controls, 0 control failures, 0 controls skipped
Test Summary: 3 successful, 0 failures, 0 skipped
$$$$$$ Finished verifying the 'local' system.
…
$$$$$$ Finished verifying the 'remote' system.
$$$$$$ Verifying the 'remote2' system...
DEPRECATION: AWS resources shipped with core InSpec are being moved to a resource pack for faster iteration. Please update your profiles to depend on git@github.com:inspec/inspec-aws.git . Resource 'aws_vpc' (used at /private/tmp/terraform-kitchen/test/integration/complex_suite/controls/aws_resources.rb:11)
DEPRECATION: AWS resources shipped with core InSpec are being moved to a resource pack for faster iteration. Please update your profiles to depend on git@github.com:inspec/inspec-aws.git . Resource 'aws_subnets' (used at /private/tmp/terraform-kitchen/test/integration/complex_suite/controls/aws_resources.rb:16)
DEPRECATION: AWS resources shipped with core InSpec are being moved to a resource pack for faster iteration. Please update your profiles to depend on git@github.com:inspec/inspec-aws.git . Resource 'aws_security_group' (used at /private/tmp/terraform-kitchen/test/integration/complex_suite/controls/aws_resources.rb:22)

Profile: complex kitchen-terraform (complex_suite)
Version: 0.1.0
Target:  aws://

  ✔  aws_resources: VPC vpc-00aa64d66abfa8e9c
     ✔  VPC vpc-00aa64d66abfa8e9c is expected to exist
     ✔  VPC vpc-00aa64d66abfa8e9c cidr_block is expected to eq "192.168.0.0/16"
     ✔  EC2 VPC Subnets with vpc_id == "vpc-00aa64d66abfa8e9c" states is expected not to include "pending"
     ✔  EC2 VPC Subnets with vpc_id == "vpc-00aa64d66abfa8e9c" cidr_blocks is expected to include "192.168.1.0/24"
     ✔  EC2 VPC Subnets with vpc_id == "vpc-00aa64d66abfa8e9c" subnet_ids is expected to include "subnet-000c991d9264c3a5f"
     ✔  EC2 Security Group sg-0bcdd1f63ba2a4b6f is expected to exist
     ✔  EC2 Security Group sg-0bcdd1f63ba2a4b6f is expected to allow in {:ipv4_range=>"198.144.101.2/32", :port=>22}
     ✔  EC2 Security Group sg-0bcdd1f63ba2a4b6f is expected to allow in {:ipv4_range=>"73.61.21.227/32", :port=>22}
     ✔  EC2 Security Group sg-0bcdd1f63ba2a4b6f is expected to allow in {:ipv4_range=>"198.144.101.2/32", :port=>443}
     ✔  EC2 Security Group sg-0bcdd1f63ba2a4b6f is expected to allow in {:ipv4_range=>"73.61.21.227/32", :port=>443}
     ✔  EC2 Security Group sg-0bcdd1f63ba2a4b6f group_id is expected to cmp == "sg-0bcdd1f63ba2a4b6f"
     ✔  EC2 Security Group sg-0bcdd1f63ba2a4b6f inbound_rules.count is expected to cmp == 3
     ✔  EC2 Instance i-0db748e47640739ea is expected to exist
     ✔  EC2 Instance i-0db748e47640739ea image_id is expected to eq "ami-ae7bfdb8"
     ✔  EC2 Instance i-0db748e47640739ea instance_type is expected to eq "t2.micro"
     ✔  EC2 Instance i-0db748e47640739ea vpc_id is expected to eq "vpc-00aa64d66abfa8e9c"
     ✔  EC2 Instance i-0db748e47640739ea tags is expected to include {:key => "Name", :value => "kitchen-terraform-reachable-other-host"}


Profile Summary: 1 successful control, 0 control failures, 0 controls skipped
Test Summary: 17 successful, 0 failures, 0 skipped
$$$$$$ Finished verifying the 'remote2' system.
-----> Finished verification of the systems.
       Finished verifying <complex-suite-centos> (0m43.58s).
-----> Test Kitchen is finished. (0m44.76s)

The last step in the script is ‘kitchen destroy’. This will destroy all AWS resources instantiated for the test.

-----> Starting Test Kitchen (v2.3.4)
-----> Destroying <complex-suite-centos>...
$$$$$$ Verifying the Terraform client version is in the supported interval of >= 0.11.4, < 0.13.0...
$$$$$$ Reading the Terraform client version...
       Terraform v0.12.21
       + provider.aws v2.51.0
       + provider.random v2.1.2
$$$$$$ Finished reading the Terraform client version.
$$$$$$ Finished verifying the Terraform client version.
$$$$$$ Initializing the Terraform working directory...
       Initializing modules...
       
       Initializing the backend...
       
       Initializing provider plugins...
       
       Terraform has been successfully initialized!
$$$$$$ Finished initializing the Terraform working directory
…
       module.complex_kitchen_terraform.aws_vpc.complex_tutorial: Destroying... [id=vpc-00aa64d66abfa8e9c]
       module.complex_kitchen_terraform.aws_vpc.complex_tutorial: Destruction complete after 1s
       
       Destroy complete! Resources: 11 destroyed.
$$$$$$ Finished destroying the Terraform-managed infrastructure.
$$$$$$ Selecting the default Terraform workspace...
       Switched to workspace "default".
$$$$$$ Finished selecting the default Terraform workspace.
$$$$$$ Deleting the kitchen-terraform-complex-suite-centos Terraform workspace...
       Deleted workspace "kitchen-terraform-complex-suite-centos"!
$$$$$$ Finished deleting the kitchen-terraform-complex-suite-centos Terraform workspace.
       Finished destroying <complex-suite-centos> (2m47.02s).
-----> Test Kitchen is finished. (2m48.17s)

Now the script will perform the same steps with Ubuntu instances in the us-west-2 region.

Future of Infrastructure Testing and Standards

In summary, I hope you have enjoyed this four-part series on infrastructure testing. While these articles only covered specific situations and scenarios for infrastructure testing and deployments, I hope they prompt your organization to open a discussion about the future direction of infrastructure testing and standards.

Read the Entire DevOps Testing Series



Part III: Practical Examples of DevOps Unit Testing

In my last two articles, I’ve talked conceptually and theoretically about the need for DevOps testers.

Part I: Does DevOps Need Dedicated Testers?
Part II: 2019 Cloud Breaches Prove DevOps Needs Dedicated Testers

In this article, I will provide practical examples of unit testing.

Since public cloud storage seems to be a common problem, I will begin with an example unit test for a terraform project which creates a simple S3 bucket.

First, we need to install localstack, so we can test AWS locally.

pip install localstack
export SERVICES=s3
export DEFAULT_REGION='us-east-1'
localstack start

In a new console/terminal and new directory, create a simple terraform project. The provider.tf file should point to the localstack ports.

provider "aws" {
	region = "us-east-1"
	skip_credentials_validation = true
	skip_metadata_api_check = true
	s3_force_path_style = true
	skip_requesting_account_id = true
	skip_get_ec2_platforms = true
	access_key = "mock_access_key"
	secret_key = "mock_secret_key"
	endpoints {
    	s3 = "http://localhost:4572"
	}
}

resource "aws_s3_bucket" "b" {
  bucket = "test"
  acl    = "private"

  tags = {
	Name    	= "My bucket"
	Environment = "Dev"
  }
}

Deploy the terraform project.

terraform plan
Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

aws_s3_bucket.b: Refreshing state... [id=test]

------------------------------------------------------------------------


An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_s3_bucket.b will be created
  + resource "aws_s3_bucket" "b" {
  	+ acceleration_status     	= (known after apply)
  	+ acl                     	= "private"
  	+ arn                     	= (known after apply)
  	+ bucket   	               = "test"
  	+ bucket_domain_name      	= (known after apply)
  	+ bucket_regional_domain_name = (known after apply)
  	+ force_destroy           	= false
  	+ hosted_zone_id          	= (known after apply)
  	+ id                          = (known after apply)
  	+ region                  	= (known after apply)
  	+ request_payer           	= (known after apply)
  	+ tags                    	= {
      	+ "Environment" = "Dev"
      	+ "Name"	    = "My bucket"
    	}
  	+ website_domain          	= (known after apply)
  	+ website_endpoint        	= (known after apply)

  	+ versioning {
      	+ enabled	= (known after apply)
      	+ mfa_delete = (known after apply)
    	}
	}

Plan: 1 to add, 0 to change, 0 to destroy.

------------------------------------------------------------------------

Note: You didn't specify an "-out" parameter to save this plan, so Terraform can't guarantee that exactly these actions will be performed if
"terraform apply" is subsequently run.

$ terraform apply
aws_s3_bucket.b: Refreshing state... [id=test]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # aws_s3_bucket.b will be created
  + resource "aws_s3_bucket" "b" {
  	+ acceleration_status     	= (known after apply)
  	+ acl                     	= "private"
  	+ arn         	            = (known after apply)
  	+ bucket                  	= "test"
  	+ bucket_domain_name      	= (known after apply)
  	+ bucket_regional_domain_name = (known after apply)
  	+ force_destroy           	= false
  	+ hosted_zone_id          	= (known after apply)
  	+ id                      	= (known after apply)
  	+ region                  	= (known after apply)
  	+ request_payer           	= (known after apply)
  	+ tags                    	= {
      	+ "Environment" = "Dev"
      	+ "Name"    	= "My bucket"
    	}
  	+ website_domain          	= (known after apply)
  	+ website_endpoint        	= (known after apply)

  	+ versioning {
      	+ enabled	= (known after apply)
      	+ mfa_delete = (known after apply)
    	}
	}

Plan: 1 to add, 0 to change, 0 to destroy.

Do you want to perform these actions in workspace "kitchen-terraform-base-aws"?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_s3_bucket.b: Creating...
aws_s3_bucket.b: Creation complete after 0s [id=test]

Create a test.py file with the following code to test the deployment of the S3 bucket.

import boto3


def test_s3_bucket_creation():
	s3 = boto3.client(
    	's3',
    	endpoint_url='http://localhost:4572',
    	region_name='us-east-1'
	)
	# Call S3 to list current buckets
	response = s3.list_buckets()

	# Get a list of all bucket names from the response
	buckets = [bucket['Name'] for bucket in response['Buckets']]

	assert len(buckets) == 1

Test that the bucket was created.

$ pytest test.py
=============================================================== test session starts ===============================================================
platform darwin -- Python 3.6.0, pytest-5.2.2, py-1.8.0, pluggy-0.13.0
rootdir: /private/tmp/myterraform/tests/test/fixtures
plugins: localstack-0.4.1
collected 1 item

test.py .
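
The same test file could be extended beyond a simple existence check. For instance, a rough additional test (added to the same test.py, assuming the bucket name 'test' from the configuration above and that your localstack version supports the ACL API) could assert that the bucket’s ACL grants no public access:

def test_s3_bucket_is_not_public():
    s3 = boto3.client(
        's3',
        endpoint_url='http://localhost:4572',
        region_name='us-east-1'
    )
    # Fetch the ACL of the bucket created by terraform and make sure
    # no grant targets the public AllUsers group.
    acl = s3.get_bucket_acl(Bucket='test')
    public_uri = 'http://acs.amazonaws.com/groups/global/AllUsers'
    grantees = [grant['Grantee'].get('URI') for grant in acl['Grants']]
    assert public_uri not in grantees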

Now, let’s destroy the S3 bucket.

$ terraform destroy
aws_s3_bucket.b: Refreshing state... [id=test]

An execution plan has been generated and is shown below.
Resource actions are indicated with the following symbols:
  - destroy

Terraform will perform the following actions:

  # aws_s3_bucket.b will be destroyed
  - resource "aws_s3_bucket" "b" {
  	- acl                     	= "private" -> null
  	- arn                     	= "arn:aws:s3:::test" -> null
  	- bucket                  	= "test" -> null
  	- bucket_domain_name      	= "test.s3.amazonaws.com" -> null
  	- bucket_regional_domain_name = "test.s3.amazonaws.com" -> null
  	- force_destroy           	= false -> null
  	- hosted_zone_id          	= "Z3AQBSTGFYJSTF" -> null
  	- id                      	= "test" -> null
  	- region                  	= "us-east-1" -> null
  	- tags                    	= {
      	- "Environment" = "Dev"
      	- "Name"    	= "My bucket"
    	} -> null

  	- object_lock_configuration {
    	}

  	- replication_configuration {
    	}

  	- server_side_encryption_configuration {
    	}

  	- versioning {
      	- enabled	= false -> null
      	- mfa_delete = false -> null
    	}
	}

Plan: 0 to add, 0 to change, 1 to destroy.

Do you really want to destroy all resources?
  Terraform will destroy all your managed infrastructure, as shown above.
  There is no undo. Only 'yes' will be accepted to confirm.

  Enter a value: yes

aws_s3_bucket.b: Destroying... [id=test]
aws_s3_bucket.b: Destruction complete after 0s

Destroy complete! Resources: 1 destroyed.

Next, we will install the terraform-compliance python module.

pip install terraform-compliance

Next, we will set up the directory for our test.

mkdir features
cd features

Next, make a file named s3.feature inside the features directory with the following content.

Feature: test

 	In order to make sure the s3 bucket is secure:

 	Scenario: No public read
     	Given I have AWS S3 Bucket defined
     	When it contains acl
     	Then its value must not match the "public-read" regex

Now, we will return to the root directory for the project and run a terraform plan to get the plan’s output in JSON format.

terraform plan -out=myout
terraform show -json myout > myout.json

Lastly, we will test the terraform project against the feature file to see if the project is compliant.

$ terraform-compliance -p /tmp/junk/myout.json -f /tmp/junk/features
terraform-compliance v1.1.7 initiated

🚩 Features : /tmp/junk/features
🚩 Plan File : /tmp/junk/myout.json

🚩 Running tests. 🎉

Feature: test  # /tmp/junk/features/s3.feature
	In order to make sure the s3 bucket is secure:

	Scenario: No public read
    	Given I have AWS S3 Bucket defined
    	When it contains acl
    	Then its value must not match the "public-read" regex

1 features (1 passed)
1 scenarios (1 passed)
3 steps (3 passed)

As you will notice from the results, all tests passed because the S3 bucket deployed is private.

While these are just basic examples, they are intended to demonstrate the concept of unit testing infrastructure-as-code, and testing for various rules.
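
As one more illustration of the kinds of rules you can express, a feature along the following lines (adapted in spirit from the terraform-compliance documentation; treat the exact wording as a sketch) would require tags on every resource that supports them:

Feature: Tagging

  Scenario: Ensure all resources have tags
    Given I have resource that supports tags defined
    Then it must contain tags
    And its value must not be null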

Read the Entire DevOps Testing Series



Part II: 2019 Cloud Breaches Prove DevOps Needs Dedicated Testers

To prove that DevOps needs a tester, you need look no further than IdentityForce.com’s biggest breaches of 2019: review the types of breaches involved and investigate why they occurred.

Notice that a large percentage of the breaches were related to misconfigured cloud storage and a lack of multi-factor authentication for accessing systems.

So who is primarily at fault: development, operations, security, networking, or DevOps?

While there could be many reasons for ‘open’ cloud storage and single-factor authentication to systems, I would suggest these are DevOps-related mistakes: DevOps failed to (1) properly test the security configuration of cloud storage prior to deployment, and (2) set up multi-factor authentication for accessing systems and scan images for proper authentication.

Last Line of Defense before Deployment is the Continuous Integration/Continuous Delivery Pipeline

Some may argue that operations, security and/or networking departments are at fault, but the last line of defense before deployment is the Continuous Integration/Continuous Delivery (CI/CD) pipeline, which should include the application of common rule-sets and tests and is primarily the responsibility of DevOps.

Terraform, Sentinel, Nexpose, Other Tools

Others will argue that proper CI/CD tools, such as Terraform, Sentinel, or Nexpose, or setting up AWS Config rules and using OpsWorks, will prevent these issues; and they would be partially correct. These tools provide a layer of security and protection similar to application vulnerability scanning tools, but they do not replace unit testing or integration testing.

Unit Testing Ensures Actual Results Meet Expected Ones

The purpose of unit testing is to ensure the actual results match the expected results. Using public cloud storage as an example, the infrastructure project that creates the cloud storage should contain unit tests that:

  1. check for existence
  2. check authorizations
  3. check security settings

Upon deployment of the project, the CI/CD pipeline will execute the unit tests and, if they pass, perform integration testing.
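To make this concrete, here is a minimal pytest-style sketch of such a unit test, run against a plan exported with terraform show -json. The plan file name, and the expectation that the project defines an aws_s3_bucket with a non-public acl, are assumptions for the example rather than anything prescribed by the article.

# test_s3_unit.py: hedged sketch of an infrastructure unit test (pytest)
import json

PLAN_FILE = "plan.json"  # assumed path to the output of: terraform show -json

def planned_resources():
    # Collect the resources Terraform plans to create or change
    with open(PLAN_FILE) as f:
        plan = json.load(f)
    return plan.get("planned_values", {}).get("root_module", {}).get("resources", [])

def test_bucket_exists():
    # 1. check for existence
    assert any(r["type"] == "aws_s3_bucket" for r in planned_resources())

def test_bucket_is_not_public():
    # 2. and 3. check authorizations and security settings
    for r in planned_resources():
        if r["type"] == "aws_s3_bucket":
            assert r["values"].get("acl") != "public-read"

Run under pytest in the pipeline, a failing check stops the deployment before any resource is touched.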

Integration Testing for Cross-Boundary Access

The purpose of integration testing is to test individual units combined as a group. From an infrastructure perspective, this means testing cross-boundary access, permissions, and functionality. Continuing with the public cloud storage example, and assuming the cloud storage has a permission allowing another account to access it, there would need to be an integration test verifying that the external account can reach the storage. But who writes this code, and how do they know they need to write it?

This is where the concept of a DevOps tester is most applicable. Two separate infrastructure projects have been deployed: one for an account that depends on cloud storage owned by another account, and one for the cloud storage itself. Ideally, DevOps should have recognized the dependency when creating the account and written a unit test against a mocked-up storage account. Someone would then need to write a separate integration test that runs in the CI/CD pipeline once both deployments complete.
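What such an integration test might look like is sketched below using boto3 and pytest; the role ARN and bucket name are hypothetical placeholders, and the sketch assumes the pipeline runs with credentials allowed to assume the consumer account's role.

# test_cross_account_s3.py: hedged sketch of a cross-account integration test
import boto3

CONSUMER_ROLE_ARN = "arn:aws:iam::111111111111:role/storage-consumer"  # hypothetical
SHARED_BUCKET = "example-shared-bucket"                                # hypothetical

def consumer_session():
    # Assume the role belonging to the account that depends on the shared storage
    creds = boto3.client("sts").assume_role(
        RoleArn=CONSUMER_ROLE_ARN,
        RoleSessionName="devops-integration-test",
    )["Credentials"]
    return boto3.Session(
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )

def test_consumer_account_can_read_shared_storage():
    # The external account should be able to list the shared bucket
    s3 = consumer_session().client("s3")
    resp = s3.list_objects_v2(Bucket=SHARED_BUCKET, MaxKeys=1)
    assert resp["ResponseMetadata"]["HTTPStatusCode"] == 200

Because it needs both deployments to exist, a test like this belongs in the pipeline stage that runs after both projects have been applied, which is exactly the coordination problem described next.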

Managing the inter- and intra-project dependencies, ordering, and priority of various infrastructure projects could become overwhelming for DevOps, and this is one of the primary reasons a DevOps tester is needed. Currently, I am seeing only minimal infrastructure unit testing and no coordinated integration testing across infrastructure projects.

When developers first began performing unit and integration testing, they did it themselves. As the need arose, organizations hired software testers, who took on more and more of the testing responsibilities until software testing fully matured. DevOps is no different from normal software development and is still maturing as a concept.

Infrastructure as Code Maturity Will Require Quality Gateways

As Infrastructure as Code becomes the norm, unit testing and integration testing will become more common. Eventually, we will mature to a point where we are evaluating infrastructure code for code quality and preventing deployments that do not meet quality gateways.

The bottom line: Infrastructure as Code will eventually mature to include unit and integration testing and become very similar to a normal software development lifecycle. Organizations should begin refining their own strategy for how this maturation will occur and who will be responsible for infrastructure testing.

In my next article, publishing tomorrow, I provide Practical Examples of DevOps Unit Testing.


Part I: Does DevOps Need Dedicated Testers?

As a DevOps/cloud engineering professional, and a human being, I will make eight mistakes for every 100 words typed. This means I make hundreds, if not thousands, of mistakes each week.

So how do I catch my mistakes? I would like to say I write good unit and integration tests for my infrastructure-related code and have over 90 percent code coverage, but this would be a lie. 

In fact, if you’re like most DevOps and cloud engineering professionals, you are not expected to write unit and integration tests and will rely on external tools to catch infrastructure-related errors. So, why aren’t the same unit and integration testing procedures that are applied to application code being applied to infrastructure code?

While the infrastructure team can utilize tools like Terraform, localstack, and terraform-compliance to mock and test resources, they cannot mock the platform and services that will live within the infrastructure. Thus, infrastructure teams do actual deployments to the development environment in order to test their infrastructure.
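For the part that can be mocked, a minimal sketch might look like the following; it assumes localstack is running on its default local edge endpoint and that the Terraform code under test creates a bucket named example-logs (both are assumptions for illustration).

# test_local_mock.py: hedged sketch of a check against a localstack-provisioned resource
import boto3

LOCALSTACK_ENDPOINT = "http://localhost:4566"  # assumed default localstack edge endpoint
EXPECTED_BUCKET = "example-logs"               # hypothetical bucket created by the code under test

def test_bucket_created_locally():
    s3 = boto3.client(
        "s3",
        endpoint_url=LOCALSTACK_ENDPOINT,
        region_name="us-east-1",
        aws_access_key_id="test",        # localstack accepts dummy credentials
        aws_secret_access_key="test",
    )
    names = [bucket["Name"] for bucket in s3.list_buckets()["Buckets"]]
    assert EXPECTED_BUCKET in names

Everything beyond the mocked resources, meaning the applications, platforms, and services that will run on top of them, still requires a real deployment, and that is where the friction with developers begins.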

Unfortunately, from a developer’s perspective, the development environment is ‘production’ and is expected to be stable and always available. Developers do not want downtime because the infrastructure team deployed an infrastructure change to test it and broke something.

So, how do we resolve this conflict in the simplest way possible (assuming the development environment is used 24 hours per day)?

I’ve had good results applying the same software testing strategy used for applications to the infrastructure code base.

By writing infrastructure-related unit and integration tests and running them against the infrastructure code prior to deployment to a development environment, you can help ensure infrastructure changes will not break the development environment.

Infrastructure unit tests might include:

  • Testing that the resource is created and has the proper parameters
  • Testing pipeline logic to handle exceptions

Infrastructure integration tests might include:

  • Testing connectivity
  • Testing security
  • Testing permissions

Application/Platform/Service integration tests might include:

  • Testing Network Access Control Lists
  • Testing Security Groups (a sketch of this check follows the list)
  • Testing Route Tables
  • Testing Permissions
  • Testing for infrastructure-controlled keys
  • Testing for shared resources, and access to shared resources
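As a concrete illustration of the security-group item above, here is a hedged pytest-style sketch; it assumes the test runs with credentials permitted to call ec2:DescribeSecurityGroups and treats "no SSH open to the world" as the rule under test.

# test_security_groups.py: hedged sketch of a security-group check
import boto3

def test_no_security_group_exposes_ssh_to_the_world():
    ec2 = boto3.client("ec2")
    paginator = ec2.get_paginator("describe_security_groups")
    for page in paginator.paginate():
        for sg in page["SecurityGroups"]:
            for perm in sg["IpPermissions"]:
                from_port = perm.get("FromPort")
                to_port = perm.get("ToPort")
                # Rules with no FromPort apply to all traffic; otherwise only check rules covering port 22
                if from_port is not None and not (from_port <= 22 <= to_port):
                    continue
                open_ranges = [r["CidrIp"] for r in perm.get("IpRanges", [])]
                assert "0.0.0.0/0" not in open_ranges, f"{sg['GroupId']} allows SSH from anywhere"

The same pattern extends to network ACLs, route tables, and shared-resource permissions.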

Writing Good Tests Requires Infrastructure and Architectural Knowledge

While development software testers could write Application/Platform/Service tests, they may not have the infrastructure and architectural knowledge needed to write good ones. Instead, a DevOps software tester team should be responsible for coordinating with all development software testers on infrastructure-related integration tests.

The infrastructure-related integration tests would then become part of the infrastructure deployment pipeline.

For example, before any infrastructure-related changes are deployed to the ‘development’ environment, the infrastructure should be deployed to a new environment and validated. Once all tests pass, the infrastructure is deployed to development. In addition, as with application code, infrastructure code should have at least 90 percent code coverage for all infrastructure resources, contain good infrastructure-related integration tests, and have 90 percent coverage for application-related integration tests.

While this solution does not guarantee your development environment will never have an outage, it applies a consistent, organization-wide testing strategy to all code and should help catch many infrastructure-related coding mistakes.

It also provides an additional career path for software testers to enhance their knowledge and skills, and to teach DevOps and cloud engineers how to do proper testing.

Or, you can continue to deploy infrastructure-as-code and not write any unit or integration tests.

Read the Entire DevOps Testing Series

To further support this growing need, I will publish three more articles in the coming days.

