Auto Remediation Architecture
The remediation workflow starts in Wiz. A custom control detects a misconfigured resource, then a
Wiz automation rule sends the finding payload into an AWS event pipeline. The platform account
validates and routes the event before a remediation Lambda assumes a controlled role in the customer
account and applies the approved fix.
Detection Plane: Wiz
- 01 Wiz Control: Detects a policy violation or cloud resource misconfiguration.
- 02 Automation Rule: Triggers on the control result and sends a structured finding payload.
Platform Account: AWS Event Pipeline
- 03 SNS Topic: Receives the event and fans out messages for downstream processing.
- 04 SQS Queue: Buffers events, supports retry behavior, and protects Lambda from spikes.
- 05 Lambda Orchestrator: Validates eligibility, loads config, and selects a remediation path.
Trust Boundary: Cross-Account Access
- 06 AssumeRole: Uses external ID, scoped permissions, and traceable session context.
- 07 Customer Account: Applies the approved resource update in the affected account.
Evidence Plane: Audit + Feedback
- 08 Audit Log: Records decision, action, status, failure reason, and evidence.
- 09 Notify / Review: Routes skipped, failed, or sensitive findings for human review.
- Event durability: SQS absorbs bursts, supports retry, and gives failed events a replay path.
- Least privilege: Customer account remediation roles expose only the APIs needed for approved fixes.
- Decision evidence: Every automated, skipped, and failed action is tied back to the Wiz finding.
Challenge
Some cloud security findings are repetitive and time-sensitive, but remediation can create risk if
automation changes resources without understanding context. The goal was to automate only the cases
where conditions were clear and defensible.
Approach
- Created Wiz controls that identify specific misconfiguration patterns and provide a consistent finding payload.
- Configured Wiz automation rules to trigger only when the control detects a qualifying resource.
- Used SNS and SQS as a durable handoff between Wiz and Lambda so remediation events can be buffered and retried.
- Used Lambda to validate payload shape, severity, affected resource, account metadata, and remediation eligibility.
- Assumed a scoped remediation role in the customer account before applying any cloud resource change.
- Defined remediation eligibility checks for severity, resource type, environment, ownership, and exception state.
- Separated detection, decision, action, and audit logging so each step could be tested independently.
- Routed ambiguous findings, protected environments, and missing metadata to notification or manual review instead of automatic action.
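The payload validation step in the list above can be sketched as a small parser that runs before any eligibility logic. This is a minimal sketch under stated assumptions: the field names (`finding_id`, `severity`, `resource_type`, `resource_id`, `account_id`), the severity set, and the `MalformedFinding` exception are hypothetical, since the actual Wiz automation payload template is not shown in this write-up.

```python
from dataclasses import dataclass

# Hypothetical required fields; the real payload template is not shown here.
REQUIRED_FIELDS = ("finding_id", "severity", "resource_type", "resource_id", "account_id")
# Assumed severity vocabulary for illustration.
KNOWN_SEVERITIES = {"INFORMATIONAL", "LOW", "MEDIUM", "HIGH", "CRITICAL"}


class MalformedFinding(ValueError):
    """Raised when a payload cannot be trusted enough to evaluate."""


@dataclass(frozen=True)
class Finding:
    finding_id: str
    severity: str
    resource_type: str
    resource_id: str
    account_id: str


def parse_finding(payload: dict) -> Finding:
    """Validate payload shape and return a typed finding, or raise."""
    if not isinstance(payload, dict):
        raise MalformedFinding("payload must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if not payload.get(name)]
    if missing:
        raise MalformedFinding(f"missing fields: {', '.join(missing)}")
    severity = str(payload["severity"]).upper()
    if severity not in KNOWN_SEVERITIES:
        raise MalformedFinding(f"unknown severity: {payload['severity']!r}")
    account_id = str(payload["account_id"])
    # AWS account IDs are 12-digit strings.
    if not (len(account_id) == 12 and account_id.isascii() and account_id.isdigit()):
        raise MalformedFinding("account_id must be a 12-digit AWS account ID")
    return Finding(
        finding_id=str(payload["finding_id"]),
        severity=severity,
        resource_type=str(payload["resource_type"]),
        resource_id=str(payload["resource_id"]),
        account_id=account_id,
    )
```

Raising a dedicated exception lets the caller log and reject malformed events without treating them as retryable failures.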
Implementation Details
The playbook separates the remediation decision from the remediation action. A finding must pass
eligibility checks before any change is attempted, and ambiguous cases move to notification or review
rather than automatic execution.
eligible_for_remediation = (
    severity in ["HIGH", "CRITICAL"]
    and environment != "restricted-production"
    and owner is not None
    and exception_status != "approved"
    and resource_type in approved_resource_types
    and proposed_change in allowed_actions
)
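Because skipped findings also need evidence, it can help to keep the reason each check failed rather than a single boolean. The following is a sketch of that idea, not the playbook's actual code: the `Decision` type, the `evaluate` function, the reason strings, and the allowlist values are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

# Illustrative allowlists; real values would come from reviewed configuration.
APPROVED_RESOURCE_TYPES = {"s3_bucket", "security_group"}
ALLOWED_ACTIONS = {"block_public_access", "revoke_open_ingress"}


@dataclass
class Decision:
    eligible: bool
    reasons: list = field(default_factory=list)  # why the finding was skipped


def evaluate(severity: str, environment: str, owner: Optional[str],
             exception_status: Optional[str], resource_type: str,
             proposed_change: str) -> Decision:
    """Apply the same six checks as the expression above, keeping each reason."""
    checks = [
        (severity in ("HIGH", "CRITICAL"), "severity below threshold"),
        (environment != "restricted-production", "restricted environment"),
        (owner is not None, "missing owner"),
        (exception_status != "approved", "approved exception exists"),
        (resource_type in APPROVED_RESOURCE_TYPES, "resource type not approved"),
        (proposed_change in ALLOWED_ACTIONS, "action not allowed"),
    ]
    # Collect every failed check so the audit log can explain the skip.
    reasons = [reason for passed, reason in checks if not passed]
    return Decision(eligible=not reasons, reasons=reasons)
```

For example, `evaluate("HIGH", "dev", "team-a", None, "s3_bucket", "block_public_access")` yields an eligible decision with no reasons, while a finding with no owner is returned as ineligible with `"missing owner"` in its reasons.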
Cross-account remediation uses a dedicated execution role in the platform account and a customer
account role that grants only the permissions needed for the approved fix. The assumed-role session
should carry a traceable session name and the correlation ID from the Wiz payload so that each action
can be connected back to the original finding.
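One way to make the session traceable is to derive the STS session name from the finding ID. The sketch below assumes boto3 and the standard `aws` partition; the role name `WizRemediationRole` and both function names are hypothetical, and the external ID would come from per-customer configuration.

```python
import re

# Hypothetical role name; each customer account would define its own.
REMEDIATION_ROLE_NAME = "WizRemediationRole"


def build_assume_role_params(account_id: str, finding_id: str, external_id: str) -> dict:
    """Build sts:AssumeRole arguments that carry the finding ID in the session name."""
    # RoleSessionName allows only [\w+=,.@-] and at most 64 characters.
    safe_id = re.sub(r"[^\w+=,.@-]", "-", finding_id, flags=re.ASCII)
    session_name = f"remediate-{safe_id}"[:64]
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/{REMEDIATION_ROLE_NAME}",
        "RoleSessionName": session_name,
        "ExternalId": external_id,
        "DurationSeconds": 900,  # shortest session duration STS accepts
    }


def assume_remediation_role(params: dict) -> dict:
    """Call STS and return temporary credentials for the customer account."""
    import boto3  # imported lazily so the builder above has no AWS dependency

    response = boto3.client("sts").assume_role(**params)
    return response["Credentials"]
```

The session name appears in CloudTrail events in the customer account, so a reviewer can match an API call to the finding that triggered it.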
Trust Boundaries and Failure Handling
- Wiz to AWS: Only accepted automation payloads are processed; malformed events are logged and rejected.
- SNS to SQS: Queueing provides backpressure protection and gives failed events a retry path.
- Lambda to customer account: Remediation requires explicit role trust, scoped permissions, and account allowlisting.
- Failure path: Failed role assumption, missing resource context, or denied API calls are recorded with the finding ID and routed for review.
- Audit path: Every skipped, failed, and successful decision writes evidence for security operations and customer reporting.
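The split between rejected and retried events can be sketched as an SQS-triggered Lambda handler that uses partial batch responses. This assumes `ReportBatchItemFailures` is enabled on the event source mapping and that SNS raw message delivery is off, so each SQS body is an SNS envelope with a `Message` field. `process_finding` and `handle_batch` are placeholder names, not the playbook's own.

```python
import json
import logging

logger = logging.getLogger("remediation")


def process_finding(finding: dict) -> None:
    """Placeholder for validation, eligibility checks, and remediation."""
    if "finding_id" not in finding:
        raise ValueError("finding_id is required")


def handle_batch(event: dict, process=process_finding) -> dict:
    """Process an SQS batch and report only retryable records as failed."""
    failures = []
    for record in event.get("Records", []):
        message_id = record["messageId"]
        try:
            envelope = json.loads(record["body"])
            # Without raw message delivery, SNS wraps the payload in "Message".
            finding = json.loads(envelope["Message"])
            process(finding)
        except (KeyError, TypeError, ValueError) as exc:
            # Malformed input will not succeed on retry: log it and drop it.
            # (json.JSONDecodeError is a subclass of ValueError.)
            logger.error("rejected message %s: %s", message_id, exc)
        except Exception as exc:
            # Anything else may be transient, so let SQS redeliver this record.
            logger.warning("retrying message %s: %s", message_id, exc)
            failures.append({"itemIdentifier": message_id})
    return {"batchItemFailures": failures}


def handler(event: dict, context: object) -> dict:
    """Lambda entry point."""
    return handle_batch(event)
```

Reporting failures per record means one throttled API call does not force the whole batch back onto the queue, and records that keep failing eventually reach the dead-letter queue if one is configured.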
Outcomes
- Guardrailed response: Remediation only proceeds when severity, environment, owner, exception, and action checks align.
- Event-driven scale: SNS and SQS decouple Wiz from Lambda so findings can be processed reliably without losing events during spikes.
- Human review where needed: Findings with missing metadata or sensitive production context are routed for review instead of changed blindly.
- Auditable changes: Each decision can be logged with the finding, resource, eligibility result, action, and final status.
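An audit entry carrying those fields might look like the following. The `AuditRecord` type, its field names, and the status values are illustrative assumptions, not the schema used in the project.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditRecord:
    finding_id: str
    resource_id: str
    eligible: bool
    action: Optional[str]  # None when no change was attempted
    status: str            # assumed values: "remediated", "skipped", "failed"
    failure_reason: Optional[str] = None
    recorded_at: str = ""  # ISO 8601 timestamp; filled in on serialization if empty

    def to_json(self) -> str:
        """Serialize to one JSON line, stamping the time if not already set."""
        data = asdict(self)
        data["recorded_at"] = self.recorded_at or datetime.now(timezone.utc).isoformat()
        return json.dumps(data, sort_keys=True)
```

Writing one JSON object per decision, including skips, keeps the log easy to query by `finding_id` when preparing customer reports.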
Security Controls Demonstrated
- Reduced response time for recurring high-confidence misconfigurations.
- Improved trust in automation by making the decision path explicit.
- Kept human review in the loop for cases where automated action was not safe enough.
- Produced audit records that explain what was changed and why.
Production Considerations
- Policy-as-code rules for remediation eligibility and exception handling.
- Change approval integration for sensitive production resources.
- Rollback metadata for actions that can be safely reversed.
- Dead-letter queue processing and replay tooling for failed remediation events.
- Per-customer role permission reviews to keep remediation access least privilege over time.
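As an illustration of the replay tooling item, a dead-letter queue can be drained back into the main queue with three SQS calls. This is a sketch, not existing tooling: `replay_dead_letters` is a hypothetical name, `sqs` is assumed to be a boto3 SQS client, and the queue URLs are supplied by the caller. SQS also offers a managed dead-letter queue redrive that may be preferable in practice.

```python
def replay_dead_letters(sqs, dlq_url: str, target_url: str, max_messages: int = 100) -> int:
    """Move up to max_messages from the DLQ back to the main queue."""
    moved = 0
    while moved < max_messages:
        response = sqs.receive_message(
            QueueUrl=dlq_url,
            MaxNumberOfMessages=min(10, max_messages - moved),  # SQS caps a batch at 10
            WaitTimeSeconds=1,
        )
        messages = response.get("Messages", [])
        if not messages:
            break  # DLQ is empty (or nothing arrived within the wait window)
        for message in messages:
            # Re-send first, delete second: a crash in between duplicates
            # a message rather than losing it.
            sqs.send_message(QueueUrl=target_url, MessageBody=message["Body"])
            sqs.delete_message(QueueUrl=dlq_url, ReceiptHandle=message["ReceiptHandle"])
            moved += 1
    return moved
```

Because a replay can deliver the same finding twice, the remediation action itself needs to be idempotent.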