Security Chaos Engineering 101: Getting Your Hands Dirty
Security Chaos Engineering (SCE) is a novel approach to cyber security: its core fundamentals are based on the principles of chaos engineering, but its objective is to enable cyber resiliency. Chaos engineering helps enterprises survive outages that result from availability and performance-related faults. Similarly, by adopting SCE, enterprises can become resilient to cyber attacks, e.g. ransomware attacks.
This article presents SCE as a security engineering practice that is achievable rather than esoteric. The primary motivation is to help security engineers, and security professionals in general, view SCE like any other security engineering effort and to demystify the effort and impact of adopting it. Furthermore, several misconceptions about SCE are addressed to provide objective information and clarity.
This article is a follow-up to an earlier article in which several fundamental aspects of Security Chaos Engineering were discussed, including some common misconceptions. In this article, practical examples of conducting SCE experiments are provided.
Hello World SCE
Most of us already practice some form of SCE unknowingly. Creating, modifying, or deleting cloud resources forms the basis of SCE experiments; these are the foundational techniques. So why not simply call these activities SCE? The difference comes down to two major points: mindset and intent.
Adopting the Right Mindset
The mindset adopted for SCE is critical for crafting the right hypothesis and conducting successful experiments. Mindset refers to a way of thinking or an attitude, especially a habitual one. It is critical to have an `assume-breach` mindset; otherwise, a conflicting hypothesis might be crafted that does not support effective experiments. A mindset that can challenge existing beliefs and culture is a prerequisite. An `assume-breach` mindset starts from the assumption that an attacker can gain access to a cloud environment. This is the first hurdle a conflicting mindset will encounter: accepting that attackers can bypass an `ironclad` preventive defense. Examples of such compromises abound, e.g. the LastPass data breaches.
Adopting the Right Intent
The intent of an SCE experiment is to learn from failures and to be proactive. You want evidence about specific assumptions before drawing conclusions. Ideally, security decisions should be based not on gut feelings or vendor promises but on experiments, facts and data. There is room for knowledge that comes from experience; however, it has to be balanced with empirical analysis. Also, as discussed in the previous blog article about SCE misconceptions, the intention is not to overwhelm the environment with a barrage of attacks. Attempting this will result in burnout, stress and displeasure from management and other security folks. The key is to start small, learn, and improve your strategies gradually.
SCE Experiment - Public S3 Bucket
The experiment is based on a public S3 bucket. The aim is to gather evidence about the events and reactions that would unfold if an S3 bucket becomes public, whether intentionally, by mistake, or through adversarial action. We will use an existing bucket for this experiment; however, feel free to create a new one. Using an existing bucket shows whether the security controls already in place work effectively for a bucket that is actually in use.
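If you prefer a dedicated bucket, the sketch below creates one with the AWS CLI. It assumes AWS CLI v2 and credentials for a non-production account; the helper name `create_experiment_bucket` is hypothetical, and since bucket names are globally unique, `sce-experiment` may already be taken, so substitute your own name. Because new buckets have ACLs disabled by default, the sketch sets `--object-ownership ObjectWriter` so that an ACL-based step in the experiment is not rejected.

```shell
# Hypothetical helper: create a throwaway bucket for the SCE experiment.
# Assumes AWS CLI v2 and credentials for a NON-production account.
create_experiment_bucket() {
  local bucket="$1"
  local region="${2:-us-east-1}"

  # New buckets have ACLs disabled by default (BucketOwnerEnforced);
  # ObjectWriter re-enables them so a later put-bucket-acl is not rejected.
  if [ "$region" = "us-east-1" ]; then
    # us-east-1 does not accept a LocationConstraint.
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --object-ownership ObjectWriter
  else
    aws s3api create-bucket --bucket "$bucket" --region "$region" \
      --create-bucket-configuration "LocationConstraint=$region" \
      --object-ownership ObjectWriter
  fi
}
```

For example, `create_experiment_bucket sce-experiment eu-central-1` creates the bucket in Frankfurt. New buckets also start with Block Public Access enabled, which the next steps of the experiment deliberately remove.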
Step 1: Establish The Steady State
Once the target bucket has been selected, the steady state has to be established. For this example, the steady state can be as simple as the configuration of the target S3 bucket. Infrastructure-as-Code (IaC) can be leveraged to record this steady state, which makes deviations from it easy to spot.
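As a lightweight complement to IaC, the steady state can also be captured directly with the AWS CLI. The sketch below (helper names are hypothetical) saves the bucket ACL and public access block as JSON files so that a later snapshot can be compared with `diff`. When no public access block exists, `get-public-access-block` fails, so the sketch writes a placeholder of its own invention instead.

```shell
# Hypothetical helpers: record the bucket's steady state and detect drift.
# Assumes AWS CLI v2 and read access to the bucket's configuration.
snapshot_bucket_state() {
  local bucket="$1" outdir="$2"
  mkdir -p "$outdir" || return 1

  # Who has access via the bucket ACL.
  aws s3api get-bucket-acl --bucket "$bucket" > "$outdir/acl.json" || return 1

  # get-public-access-block fails when no configuration exists; record a
  # placeholder (our own convention, not an AWS format) instead.
  if ! aws s3api get-public-access-block --bucket "$bucket" \
      > "$outdir/public-access-block.json" 2>/dev/null; then
    echo '{"PublicAccessBlockConfiguration": null}' > "$outdir/public-access-block.json"
  fi
}

# Exit status 0 means no drift; otherwise diff prints the differences.
compare_bucket_state() {
  diff -r "$1" "$2"
}
```

In practice, run `snapshot_bucket_state sce-experiment ./steady-state` before the experiment, take a second snapshot into another directory after recovery, and pass both directories to `compare_bucket_state`.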
Step 2: Make the S3 Bucket Public
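There are several ways to make an AWS bucket public. Two of these methods are shown below using the AWS CLI. The first command uses the `public-read-write` canned ACL to grant everybody on the internet permission to list the bucket's objects and to write (create, overwrite, and delete) objects in it. Note that S3 rejects this call if `BlockPublicAcls` is enabled for the bucket or the account, so in practice the second command may need to run first; the call also fails on buckets whose Object Ownership setting (`BucketOwnerEnforced`) disables ACLs.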
aws s3api put-bucket-acl --bucket sce-experiment --acl public-read-write
The second command removes the bucket-level public access block configuration entirely; any account-level Block Public Access settings still apply.
aws s3api delete-public-access-block --bucket sce-experiment
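The two commands above can be wrapped into a single fault-injection step. In this sketch (the function name is hypothetical), the public access block is removed first, because S3 rejects a public `put-bucket-acl` call while `BlockPublicAcls` is enabled; the function then prints the injection time in epoch seconds, which helps answer the latency question in the next step. If Block Public Access is also enabled at the account level, the ACL call will still fail, which is itself a useful observation.

```shell
# Hypothetical helper: inject the fault by making the bucket public.
# Run it ONLY against a bucket and account you are allowed to experiment on.
make_bucket_public() {
  local bucket="$1"

  # Remove the bucket-level public access block first: while BlockPublicAcls
  # is true, S3 rejects a put-bucket-acl call that grants public access.
  aws s3api delete-public-access-block --bucket "$bucket" > /dev/null || return 1

  # Grant AllUsers read (list) and write access through the bucket ACL.
  aws s3api put-bucket-acl --bucket "$bucket" --acl public-read-write > /dev/null || return 1

  # Print the injection time (epoch seconds) for measuring detection latency.
  date -u +%s
}
```

A typical call is `injected_at=$(make_bucket_public sce-experiment)`, keeping the timestamp for the observation phase.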
Steps 3 & 4: Observe
When the bucket is made public, a few events are expected in a well-secured AWS account. Security controls are ordinarily deployed to prevent, detect, or recover from security events, and they are contextual to the environment and its security architecture. Let's assume GuardDuty is deployed as a detective security control; hence, it is expected to raise an alert based on a previous configuration. The assumption is that GuardDuty has been configured to send alerts, based on some rules, to a Slack channel. For details on configuring GuardDuty alerts with Slack notifications, see the AWS documentation. When you try this in your environment, depending on your set-up, you might instead see the alerts as GuardDuty findings. Some key questions to consider:
- Can you identify the exact GuardDuty finding?
- Does the GuardDuty finding make sense to you, i.e. can you interpret it?
- Is the GuardDuty finding actionable?
- How long did the GuardDuty finding take to arrive from when the bucket was made public?
More questions can be formulated, but let's keep it simple.
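The last question can be answered empirically by polling GuardDuty from the moment the bucket was made public. The sketch below is a starting point built on assumptions rather than a reference implementation: the helper name is hypothetical, it assumes a GuardDuty detector exists in the current region, and it assumes the finding type `Policy:S3/BucketBlockPublicAccessDisabled` and the `Equals`/`GreaterThanOrEqual` filter syntax on `type` and `updatedAt` (epoch milliseconds); verify these against the GuardDuty documentation for your setup. It measures when the finding appears in GuardDuty, not when the Slack message arrives.

```shell
# Hypothetical helper: poll GuardDuty until a finding of the given type shows
# up after the injection time, then print the elapsed seconds.
# Assumptions: a GuardDuty detector exists in the current region, and the
# filter syntax below matches the ListFindings API (verify against the docs).
wait_for_guardduty_finding() {
  local injected_at="$1"                 # epoch seconds at fault injection
  local finding_type="${2:-Policy:S3/BucketBlockPublicAccessDisabled}"
  local interval="${3:-30}"              # seconds between polls
  local timeout="${4:-3600}"             # give up this many seconds after injection
  local detector criteria ids now

  detector=$(aws guardduty list-detectors --query 'DetectorIds[0]' --output text) || return 1

  # Only consider findings updated after the injection (epoch milliseconds).
  criteria=$(printf '{"Criterion":{"type":{"Equals":["%s"]},"updatedAt":{"GreaterThanOrEqual":%s}}}' \
    "$finding_type" "$((injected_at * 1000))")

  while :; do
    ids=$(aws guardduty list-findings --detector-id "$detector" \
      --finding-criteria "$criteria" --query 'FindingIds' --output text) || return 1
    now=$(date -u +%s)
    if [ -n "$ids" ] && [ "$ids" != "None" ]; then
      echo $((now - injected_at))
      return 0
    fi
    if [ $((now - injected_at)) -ge "$timeout" ]; then
      echo "no matching finding within ${timeout}s" >&2
      return 2
    fi
    sleep "$interval"
  done
}
```

Findings of the same type for other buckets could also match, so in a busy account the criteria may need narrowing to the target bucket.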
Step 5: Recover
Finally, we would like to return the bucket to its steady state. This can be done using the AWS CLI or via Terraform. Other strategies that preserve the state of cloud resources can also be adopted; e.g., an agile cloud inventory and asset management system can be leveraged to roll back the earlier changes. Restoring only the public access block would leave the public ACL in place, so both changes are reverted below (assuming the bucket's original ACL was private).
aws s3api put-bucket-acl --bucket sce-experiment --acl private
aws s3api put-public-access-block --public-access-block-configuration BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true --bucket sce-experiment
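To confirm that recovery actually restored a non-public state, a quick check like the following can help. It is a sketch with a hypothetical function name: it expects all four public access block flags to be `True` and no ACL grant to the global `AllUsers` or `AuthenticatedUsers` groups. It does not inspect bucket policies, which would need a separate check.

```shell
# Hypothetical helper: check that the bucket is no longer publicly exposed
# through its public access block or ACL (bucket policies are NOT checked).
verify_bucket_not_public() {
  local bucket="$1" flags grants

  flags=$(aws s3api get-public-access-block --bucket "$bucket" \
    --query 'PublicAccessBlockConfiguration.[BlockPublicAcls,IgnorePublicAcls,BlockPublicPolicy,RestrictPublicBuckets]' \
    --output text) || return 1
  # $flags is left unquoted on purpose so tabs/newlines collapse to spaces.
  if [ "$(echo $flags)" != "True True True True" ]; then
    echo "public access block incomplete: $flags" >&2
    return 1
  fi

  # Any grant to the global AllUsers/AuthenticatedUsers groups is public.
  grants=$(aws s3api get-bucket-acl --bucket "$bucket" \
    --query "Grants[?Grantee.URI=='http://acs.amazonaws.com/groups/global/AllUsers' || Grantee.URI=='http://acs.amazonaws.com/groups/global/AuthenticatedUsers'].Permission" \
    --output text) || return 1
  if [ -n "$grants" ] && [ "$grants" != "None" ]; then
    echo "bucket ACL still has public grants: $grants" >&2
    return 1
  fi

  echo "no public access block or ACL exposure found for $bucket"
}
```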
Step 6: Analysis and Planning
The observations and results from the experiment are critical for making fact-based decisions that improve security and cyber resiliency. Several approaches can be adopted. For example, in our S3 experiment, we expected notifications from GuardDuty via the Slack integration. However, notifications are only as useful as they are timely: a faster alert can bridge the gap between a successful attack and a stopped one. Hence, one lesson is to measure how long the GuardDuty notification actually takes to arrive and decide whether that delivery time is acceptable. One possible improvement is event-driven alerting with accompanying Lambda functions, which allows such changes to be reported almost immediately. Note that S3 Event Notifications cover object-level operations, so bucket-level changes such as `PutBucketAcl` are typically captured from CloudTrail and routed through Amazon EventBridge; see the AWS documentation for details. After this improvement, a follow-up experiment could be conducted to verify its effectiveness and identify further improvements. Note that this is just one dimension of improvement. Answers to the posed questions are contextual, and answering them provides guidance toward the right improvements. The key is to quickly evaluate your security controls and investments and make informed, evidence-based decisions.
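A minimal sketch of event-driven alerting with EventBridge follows. The rule name, statement ID, and Lambda ARN are hypothetical; it assumes a CloudTrail trail is logging management events in the same region and that the notification Lambda function already exists. The CloudTrail event names in the pattern are assumptions to check against your own CloudTrail logs; an extra name in the list is harmless because it simply never matches.

```shell
# Hypothetical helper: send S3 bucket configuration changes for one bucket to
# a Lambda function via EventBridge. Assumes a CloudTrail trail logs
# management events in this region and that the Lambda function exists.
create_bucket_change_rule() {
  local bucket="$1" lambda_arn="$2" rule="${3:-sce-s3-exposure}"
  local pattern rule_arn

  # Match CloudTrail-recorded S3 API calls that change the bucket's exposure.
  pattern=$(cat <<EOF
{
  "source": ["aws.s3"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventSource": ["s3.amazonaws.com"],
    "eventName": ["PutBucketAcl", "PutBucketPolicy",
                  "PutBucketPublicAccessBlock", "DeleteBucketPublicAccessBlock"],
    "requestParameters": { "bucketName": ["$bucket"] }
  }
}
EOF
)

  rule_arn=$(aws events put-rule --name "$rule" --event-pattern "$pattern" \
    --query 'RuleArn' --output text) || return 1
  aws events put-targets --rule "$rule" \
    --targets "Id=notify-lambda,Arn=$lambda_arn" > /dev/null || return 1

  # Allow EventBridge (only this rule) to invoke the function.
  aws lambda add-permission --function-name "$lambda_arn" \
    --statement-id "${rule}-invoke" --action lambda:InvokeFunction \
    --principal events.amazonaws.com --source-arn "$rule_arn" > /dev/null
}
```

Re-running the experiment after setting this up, together with a latency measurement, shows whether the event-driven path is actually faster in your environment.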
The Mitigant SCE Platform
The Mitigant SCE platform aims to make cyber resiliency a first-class citizen in cloud-native infrastructure. It is suitable for companies of all sizes and allows quick and safe adoption of SCE without the usual cost and resource overhead. The cost of implementing an SCE strategy can be daunting for most enterprises; Mitigant addresses these challenges by providing a SaaS platform.
The Mitigant SCE platform provides several cloud attacks that can be used as building blocks for constructing complex attack scenarios against AWS infrastructure. The platform enables safe and controlled SCE experiments: attacks can be started and stopped with a button click, and all changes made to the cloud infrastructure are rolled back and restored seamlessly. Additionally, all attacks are mapped to the MITRE ATT&CK framework, so experiments reflect techniques used in real-world attacks.
Sign up today for a free trial of the Mitigant SCE platform to help build cyber-resiliency for cloud infrastructure at https://mitigant.io/sign-up.