Defeating Ransomware Attacks With Security Chaos Engineering - Part II
The first blog post in this 2-part series discussed ransomware’s impact, existing countermeasures, and limitations of these countermeasures. Security Chaos Engineering (SCE) was thereafter introduced as an effective approach for overcoming the limitations of these contemporary countermeasures. This blog post digs deeper into how SCE approaches can be leveraged to enable resilience against ransomware attacks in AWS Infrastructure. A use-case scenario of AWS S3 ransomware is provided to substantiate our assertions.
Security Chaos Engineering
Security Chaos Engineering (SCE) is an emerging cybersecurity sub-domain focused on evidence-based approaches rather than assumed security postures. SCE emerged from Chaos Engineering, a discipline that has helped several companies overcome outages as well as performance and availability-related challenges. Because of this lineage, SCE employs hypotheses crafted to express desired security outcomes. Unlike traditional cybersecurity approaches, SCE techniques take practical steps to test the outlined hypotheses. These steps are themselves experiments, and their results are essential for iterative security hardening efforts.
The ability to continuously and incrementally implement SCE techniques provides a sound understanding of the security risks and gives confidence to defenders. Achieving this requires automation and sensible integration with the target system, such as cloud infrastructure. At the core of SCE is the notion of verifying the cardinal security properties: confidentiality, availability, and integrity. A straightforward way to achieve this notion is by evaluating security controls that ensure security properties are intact. Typically, security controls fall into three categories: detective, preventive, and recovery security control mechanisms. SCE techniques can be leveraged to verify security mechanisms in any of these categories, as illustrated in Figure 1.
Verifying Ransomware Readiness: AWS S3 Ransomware Attack Scenario
Virtually all cloud resources are vulnerable to cloud ransomware; this might be surprising to some readers. An understanding of the Shared Responsibility Model is imperative to erasing this misconception. Hopefully, another blog post will dive into this model; the takeaway here is that cloud infrastructure is as vulnerable to ransomware attacks as on-premises infrastructure. Luckily, the cloud offers unique opportunities, which could be harnessed to enable ransomware countermeasures. Broadly, the probability of a successful ransomware attack depends on the overall quality of the security posture. Hence, reducing the attack probability requires continuous maintenance of a solid cloud security posture.
Let's consider an AWS S3 ransomware scenario to describe the use of SCE techniques to defend against ransomware attacks. Every SCE experiment starts with the construction of a hypothesis. The hypothesis for this scenario is: "The S3 buckets in our environment are not vulnerable to ransomware attacks." To test this hypothesis, an SCE experiment consisting of 7 steps is implemented. One or more attacks are orchestrated in each step, the outcomes are observed, and critical deductions are made. The deductions should highlight the weaknesses identified in the ransomware countermeasures. Furthermore, it is essential to analyze the impact on confidentiality, integrity, and availability. The attack steps for this SCE experiment are:
Step 1 (Get Steady State): The steady state is the actual state of the cloud infrastructure before the start of the SCE experiment. It is a critical requirement for rolling back the infrastructure at the end of the SCE experiment, i.e., reversibility. Several approaches exist for establishing the steady state, e.g., Infrastructure-as-Code.
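The idea of recording a steady state and later comparing against it can be sketched in a few lines of Python. This is a minimal illustration, not a real inventory tool: the resource IDs, the attribute names, and the in-memory dictionary standing in for the cloud environment are all hypothetical.

```python
import copy

def snapshot(resources):
    """Return a deep copy so later changes to the live state cannot alter it."""
    return copy.deepcopy(resources)

def drift(steady, current):
    """Compare two snapshots keyed by resource ID and report what changed."""
    added = sorted(set(current) - set(steady))
    removed = sorted(set(steady) - set(current))
    changed = sorted(
        key for key in set(steady) & set(current) if steady[key] != current[key]
    )
    return {"added": added, "removed": removed, "changed": changed}

# Hypothetical environment: a single bucket and no extra IAM users.
live = {"s3:invoices-bucket": {"versioning": "Enabled", "policy": "owner-only"}}
steady = snapshot(live)

# Simulated side effects of the experiment (a new user, a changed bucket policy).
live["iam:user/bob"] = {"policies": ["s3-access"]}
live["s3:invoices-bucket"]["policy"] = "bob-only"

report = drift(steady, live)
print(report)
```

Anything listed under "added", "removed", or "changed" is what the rollback at the end of the experiment has to undo.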
Step 2 (Create User Bob): An IAM user named Bob is created and assigned the permissions necessary to access S3 buckets. Ordinarily, access control mechanisms, e.g., CIEM (Cloud Infrastructure Entitlement Management), and threat detection tools, e.g., Amazon GuardDuty, should detect this event and generate an alert or even prevent the event completely.
Step 3 (Get S3 Buckets): The newly created user, Bob, enumerates the AWS S3 buckets and tries to identify buckets with sensitive data. This involves making multiple API calls, e.g., 'aws s3api list-buckets' or 'aws s3 ls'. These API calls are often noisy and therefore trigger alerts if cloud security tools are well configured. However, due to alert fatigue, there is a high chance that security teams will ignore such alerts. Security mechanisms that correlate alerts from several sources in real time are better positioned to prioritize the alerts and react.
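To make the correlation idea concrete, the sketch below flags a user that is created and then performs a burst of bucket enumeration calls shortly afterwards. It is a simplified illustration: the event names are real S3 and IAM API operations, but the flat event layout (the "userName" and "targetUser" fields), the 30-minute window, and the threshold of three calls are assumptions made for this example and do not mirror the full CloudTrail record format.

```python
from datetime import datetime, timedelta

# S3 API operations typically used to enumerate buckets and their settings.
ENUMERATION_CALLS = {"ListBuckets", "GetBucketAcl", "GetBucketPolicy", "GetBucketLocation"}

def correlate(events, window=timedelta(minutes=30), threshold=3):
    """Flag users that were created and then enumerated S3 within `window`."""
    created = {}  # user name -> creation time
    hits = {}     # user name -> enumeration calls seen inside the window
    for event in sorted(events, key=lambda e: e["eventTime"]):
        if event["eventName"] == "CreateUser":
            created[event["targetUser"]] = event["eventTime"]
        elif event["eventName"] in ENUMERATION_CALLS:
            user = event["userName"]
            if user in created and event["eventTime"] - created[user] <= window:
                hits[user] = hits.get(user, 0) + 1
    return sorted(user for user, count in hits.items() if count >= threshold)

# Hypothetical event stream for the scenario.
t0 = datetime(2023, 1, 1, 12, 0)
events = [
    {"eventTime": t0, "eventName": "CreateUser", "userName": "admin", "targetUser": "bob"},
    {"eventTime": t0 + timedelta(minutes=2), "eventName": "ListBuckets", "userName": "bob"},
    {"eventTime": t0 + timedelta(minutes=3), "eventName": "GetBucketAcl", "userName": "bob"},
    {"eventTime": t0 + timedelta(minutes=4), "eventName": "GetBucketPolicy", "userName": "bob"},
    {"eventTime": t0 + timedelta(minutes=5), "eventName": "ListBuckets", "userName": "alice"},
]

suspects = correlate(events)
print(suspects)
```

A single ListBuckets call by a long-standing user is not flagged here; only the combination of a fresh identity and repeated enumeration is, which is the kind of prioritization that helps against alert fatigue.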
Step 4 (Compromise an S3 bucket): Bob identifies a suitable bucket and takes it over. First, the objects are exfiltrated, and then the owners' legitimate access to the bucket is revoked. These actions are noisy and potentially trigger security alerts. By default, however, AWS CloudTrail does not log object-level (data) events; capturing them requires enabling CloudTrail data events for S3 or S3 server access logging.
Step 5 (Encrypt Bucket): At this point, Bob encrypts the bucket's contents with his encryption key, essentially locking access to the bucket and its contents. This step is common to all ransomware attacks. The AWS Key Management service might be used to provide the encryption key, from an account that is not accessible to the victim, this way
Step 6 (Request Ransom): This is the most dreaded part: Bob contacts the owner of the AWS account and demands a ransom. Bob could impose further conditions to create a sense of urgency, e.g., deleting objects for every hour of delay. Normally, security incident response activities are triggered at this point, but they are unfortunately of little help if the necessary countermeasures have not been implemented. These incident response efforts are largely reactive and at best reduce the damage inflicted by the attackers. There are several proactive countermeasures recommended by AWS that might be helpful:
1. S3 Object Versioning: This keeps a separate version of an object each time it is modified. However, these versions are kept in the same bucket and are thus accessible to an attacker following a successful bucket compromise.
2. MFA Delete: With this feature configured, object versions cannot be permanently deleted, and the bucket's versioning state cannot be changed, without a code from the bucket owner's MFA device. This feature is a hindrance for attackers, though not a deterrent.
3. Object Lock: With this feature enabled, objects are stored using a write-once-read-many (WORM) model, so object versions cannot be overwritten or deleted once written. The protection can apply either indefinitely or until a specified date.
4. AWS Backup for S3: This is a new AWS service that makes it possible to back up AWS S3 buckets that have versioning enabled (Figure 3). This is a great approach; however, it might be necessary to implement a holistic plan that covers the recovery options following a ransomware attack.
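Whether the first three countermeasures are actually active can be checked per bucket. The sketch below is a minimal example under stated assumptions: the `versioning` dictionary mirrors the shape of the response from boto3's get_bucket_versioning ("Status" and "MFADelete" keys), the `object_lock` dictionary mirrors the "ObjectLockConfiguration" part of get_object_lock_configuration, and the caller is assumed to pass an empty dictionary when no configuration exists. The bucket names and the function name are hypothetical, and backups are not checked here.

```python
def missing_protections(versioning, object_lock):
    """Return the names of recovery controls that are not active on a bucket."""
    gaps = []
    if versioning.get("Status") != "Enabled":
        gaps.append("versioning")
    if versioning.get("MFADelete") != "Enabled":
        gaps.append("mfa_delete")
    if object_lock.get("ObjectLockEnabled") != "Enabled":
        gaps.append("object_lock")
    return gaps

# Hypothetical buckets: (versioning response, object lock configuration).
buckets = {
    "invoices": ({"Status": "Enabled", "MFADelete": "Disabled"}, {}),
    "archive": ({"Status": "Enabled", "MFADelete": "Enabled"}, {"ObjectLockEnabled": "Enabled"}),
    "scratch": ({}, {}),
}

findings = {
    name: missing_protections(versioning, lock)
    for name, (versioning, lock) in buckets.items()
}

for name, gaps in findings.items():
    print(name, "OK" if not gaps else "missing: " + ", ".join(gaps))
```

In a real environment the two dictionaries would come from the corresponding API calls; they are kept as plain inputs here so the check itself can be run without AWS credentials.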
Step 7 (Restore Steady State): This is the last step in the SCE experiment, and all actions carried out are rolled back and cleaned up. The steady state defined in Step 1 is used to enable a seamless operation.
The steps highlighted above are stripped-down versions of what might happen in a real ransomware attack. The most important lesson here is to verify whether the necessary security measures detect, prevent, or recover from malicious activities as designed.
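The overall shape of such an experiment (a hypothesis, a sequence of attack steps, observation of each outcome, and a guaranteed rollback) can be sketched as follows. This is an illustrative skeleton only: the `run_experiment` function, the step names, and the in-memory `env` dictionary standing in for the cloud environment are hypothetical, and each attack simply returns whether a security control detected or blocked it.

```python
def run_experiment(hypothesis, steps):
    """Run attack steps in order, then roll back every completed step.

    Each step is a (name, attack, rollback) tuple. `attack` returns True
    when a security control detected or blocked the action.
    """
    observations = {}
    completed = []
    try:
        for name, attack, rollback in steps:
            observations[name] = attack()
            completed.append(rollback)
    finally:
        # Undo the most recent changes first to restore the steady state.
        for rollback in reversed(completed):
            rollback()
    return {
        "hypothesis": hypothesis,
        "observations": observations,
        "holds": all(observations.values()),
    }

# Hypothetical stand-in for the cloud environment.
env = {"users": set(), "bucket_policy": "owner-only"}

def create_bob():
    env["users"].add("bob")
    return False  # assumed outcome: no control noticed the new user

def remove_bob():
    env["users"].discard("bob")

def take_over_bucket():
    env["bucket_policy"] = "bob-only"
    return True  # assumed outcome: the policy change raised an alert

def restore_policy():
    env["bucket_policy"] = "owner-only"

result = run_experiment(
    "The S3 buckets in our environment are not vulnerable to ransomware attacks",
    [
        ("create_user_bob", create_bob, remove_bob),
        ("compromise_bucket", take_over_bucket, restore_policy),
    ],
)
print(result["holds"], result["observations"])
```

Placing the rollback in a `finally` block means the environment is restored even if a step fails midway, which reflects the reversibility requirement from Step 1.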
Mitigant’s Security Chaos Engineering Platform
Mitigant implements several SCE techniques as a self-service offering that security teams can use easily and safely to conduct security gameday exercises. This drastically reduces the time and effort required for implementing SCE techniques; most importantly, the learning curve for implementing SCE is bypassed. Ultimately, the lessons learned can be iteratively applied to harden the cloud security posture, thereby reducing the chances of successful ransomware attacks. Mitigant's SCE aligns with the recommended practice of conducting simulation exercises to verify security incident response. The AWS Incident Response Guide provides some simulation examples, which may be challenging to implement. With Mitigant SCE, such exercises are completely automated and contextualized to fit your infrastructure. Furthermore, SCE delivers a superior value proposition due to the use of realistic attacks instead of simulations.
Conclusion
This article demonstrated how Security Chaos Engineering techniques provide unique opportunities for verifying the efficiency of detective, preventive, and recovery security controls. The specific use case of an AWS S3 ransomware attack scenario was used to illustrate several SCE techniques. Starting from a defined hypothesis, a 7-step SCE experiment is implemented to verify the efficiency of the cloud security controls. These steps help in identifying violations of confidentiality, integrity, and availability that might lead to successful ransomware attacks.