Hebert Ntse
Case Study · Remediation · Guardrails

Wiz auto-remediation with safety checks built in.

Built remediation logic for cloud misconfigurations where automation can reduce exposure quickly, while still respecting ownership, exceptions, production sensitivity, and operational risk.

Wiz → SNS → SQS → Lambda → Assume Role → Audit

Architecture

From control result to customer-account action.

The remediation workflow starts in Wiz. A custom control detects a misconfigured resource, then a Wiz automation rule sends the finding payload into an AWS event pipeline. The platform account validates and routes the event before a remediation Lambda assumes a controlled role in the customer account and applies the approved fix.

  • Wiz Control detects a policy violation or cloud resource misconfiguration.
  • Automation Rule sends a structured finding payload to SNS.
  • SQS buffers events, supports retry behavior, and protects Lambda from spikes.
  • Lambda validates eligibility, loads config, and selects a remediation path.
  • Assume Role uses external ID, scoped permissions, and traceable session context.
  • Audit logging records decision, action, status, failure reason, and evidence.
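
As a rough illustration of the SNS-to-SQS-to-Lambda hop in the list above, the sketch below unwraps findings from an SQS-triggered Lambda event. It assumes raw message delivery is disabled on the SNS subscription, so each SQS record body is an SNS envelope whose `Message` field holds the Wiz payload as a JSON string. The helper name `extract_findings` and the finding fields used in the usage note are hypothetical; the real payload shape depends on how the Wiz automation rule is configured.

```python
import json
from typing import Any


def extract_findings(event: dict[str, Any]) -> tuple[list[dict], list[dict]]:
    """Split an SQS-triggered Lambda event into parsed findings and rejects."""
    findings: list[dict] = []
    rejected: list[dict] = []
    for record in event.get("Records", []):
        message_id = record.get("messageId", "unknown")
        try:
            envelope = json.loads(record["body"])      # SNS envelope (assumed)
            finding = json.loads(envelope["Message"])  # Wiz payload as a JSON string
        except (KeyError, TypeError, json.JSONDecodeError) as exc:
            # Malformed events are kept with a reason so they can be audited.
            rejected.append({"message_id": message_id,
                             "reason": f"malformed payload: {exc!r}"})
            continue
        if not isinstance(finding, dict):
            rejected.append({"message_id": message_id,
                             "reason": "payload is not a JSON object"})
            continue
        findings.append(finding)
    return findings, rejected
```

A handler could call `extract_findings(event)`, pass each finding on to the eligibility checks, and write every rejected record to the audit log with its reason.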

Challenge

Fast remediation can create risk without context.

Some cloud findings are repetitive and time-sensitive, but remediation can create risk if automation changes resources without understanding environment, ownership, exceptions, and blast radius. The goal was to automate only cases where conditions were clear and defensible.

Implementation

Separate the remediation decision from the action.

A finding must pass eligibility checks before any change is attempted. Ambiguous cases move to notification or review rather than automatic execution.

# Every condition must hold before an automatic change is attempted.
eligible_for_remediation = (
    severity in ("HIGH", "CRITICAL")
    and environment != "restricted-production"
    and owner is not None
    and exception_status != "approved"
    and resource_type in approved_resource_types
    and proposed_change in allowed_actions
)

Cross-account remediation uses a dedicated execution role in the platform account and a customer-account role that grants only the permissions needed for the approved fix. Session names and correlation IDs tie each action back to the original Wiz finding.
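
The role assumption described above might look roughly like the following sketch. It takes an STS client as a parameter (in Lambda this would typically be `boto3.client("sts")`) and calls `assume_role` with an external ID and a session name derived from the finding ID. The function names, the `wiz-remediation` prefix, the standard `aws` partition in the ARN, and the 900-second duration are illustrative assumptions, not details from the original system.

```python
import re


def build_session_name(finding_id: str, prefix: str = "wiz-remediation") -> str:
    """Build an STS session name that carries the finding ID for traceability."""
    # RoleSessionName accepts only [\w+=,.@-] and at most 64 characters.
    safe_id = re.sub(r"[^\w+=,.@-]", "-", finding_id, flags=re.ASCII)
    return f"{prefix}-{safe_id}"[:64]


def assume_customer_role(sts_client, account_id: str, role_name: str,
                         external_id: str, finding_id: str) -> dict:
    """Assume the customer-account role and return temporary credentials."""
    response = sts_client.assume_role(
        # Assumes the standard "aws" partition.
        RoleArn=f"arn:aws:iam::{account_id}:role/{role_name}",
        RoleSessionName=build_session_name(finding_id),
        ExternalId=external_id,
        DurationSeconds=900,  # the shortest session STS allows
    )
    return response["Credentials"]
```

Because the session name appears in CloudTrail events for calls made with the temporary credentials, embedding the finding ID gives a direct link from each API call back to the Wiz finding.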

Guardrails

Automation proceeds only when the decision is safe.

  • Severity, control ID, resource type, account, and environment must match the remediation catalog.
  • Missing owner or unclear application context routes to notification instead of automatic change.
  • Findings covered by approved exceptions, waivers, or suppression tags are skipped, and the skip is logged with evidence.
  • Production and restricted environments can require approval, dry-run mode, or a narrower action set.
  • Cross-account access uses scoped Assume Role trust instead of long-lived credentials.
  • Denied API calls, failed role assumption, and unsupported resources are recorded and sent for review.
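
One way to picture the catalog match in the first guardrail is a lookup keyed by control ID, where every remaining field must also agree before an action is returned. The sketch below is a minimal version under that assumption; `CatalogEntry`, `match_catalog`, and the finding field names are hypothetical and not taken from the original system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CatalogEntry:
    control_id: str
    resource_type: str
    severities: frozenset
    environments: frozenset
    accounts: frozenset
    action: str  # name of the approved fix


def match_catalog(finding: dict,
                  catalog: dict[str, CatalogEntry]) -> Optional[CatalogEntry]:
    """Return the catalog entry only if every field of the finding matches it."""
    entry = catalog.get(finding.get("control_id"))
    if entry is None:
        return None  # unknown control: nothing is approved for it
    matches = (
        finding.get("resource_type") == entry.resource_type
        and finding.get("severity") in entry.severities
        and finding.get("environment") in entry.environments
        and finding.get("account_id") in entry.accounts
    )
    return entry if matches else None
```

Returning `None` on any mismatch keeps the default conservative: a finding that is not explicitly covered by the catalog never reaches the remediation step.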

Decision Matrix

Predictable actions for different risk signals.

  • Critical exposure, approved control, owner found, no exception → Auto-remediate and write audit evidence.
  • Critical exposure, owner missing → Notify the security queue and follow the application-team lookup path.
  • Exception tag or approved waiver present → Skip remediation, record the exception reference, and close the loop.
  • Restricted production resource → Require approval or dry-run evidence before any change.
  • Unsupported resource type or malformed payload → Reject the event, log the failure reason, and route it for review.
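
The matrix above can be sketched as a single routing function. This is an illustrative reading only: the order in which rows are checked, the field names, the `control_approved` flag, and the fallback to notification for anything that matches no row are all assumptions. The severity set mirrors the eligibility expression shown earlier rather than the "critical" wording of the first row.

```python
from enum import Enum


class Decision(Enum):
    AUTO_REMEDIATE = "auto_remediate"
    NOTIFY = "notify"
    SKIP_EXCEPTION = "skip_exception"
    REQUIRE_APPROVAL = "require_approval"
    REJECT = "reject"


def decide(finding: object, supported_types: set) -> Decision:
    """Map a finding to one matrix outcome, checking restrictive rows first."""
    if not isinstance(finding, dict):
        return Decision.REJECT  # malformed payload
    if finding.get("resource_type") not in supported_types:
        return Decision.REJECT  # unsupported resource type
    if finding.get("exception_status") == "approved":
        return Decision.SKIP_EXCEPTION
    if finding.get("environment") == "restricted-production":
        return Decision.REQUIRE_APPROVAL
    if not finding.get("owner"):
        return Decision.NOTIFY
    if (finding.get("severity") in ("HIGH", "CRITICAL")
            and finding.get("control_approved")):
        return Decision.AUTO_REMEDIATE
    # Anything ambiguous goes to notification rather than automatic change.
    return Decision.NOTIFY
```

Keeping the decision as a pure function with no AWS calls makes each row of the matrix straightforward to unit test before any remediation code runs.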