Compliance Evidence Collector

Architecture

How it works


🔌

API Sources

Okta · GitHub · AWS

Source Connectors

Each source has a dedicated connector module that authenticates via bearer token, handles pagination, and normalizes API responses into a standard evidence format.

→

📋

YAML Controls

soc2_controls.yaml

Compliance-as-Code

Controls are externalized in YAML — GRC analysts can update thresholds, add new controls, or adjust pass/fail criteria without touching application code.

→

⚙️

Evaluator

PASS · WARN · FAIL

Evaluation Engine

Compares evidence values against YAML-defined thresholds using a safe operator lookup. Produces a three-tier result — PASS, WARNING, or FAIL — with human-readable messages for each control.

→

📊

Reports

JSON · CSV

Audit-Ready Output

Structured reports with control ID, status, severity, actual values, and thresholds. Exportable as JSON for automation or CSV for spreadsheet review.
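As a rough illustration of the report shape described above, the sketch below serializes evaluation results to JSON and CSV using only the Python standard library. The field names, the helper functions `to_json` and `to_csv`, and the sample row are assumptions made for this example, not the tool's actual schema.

```python
import csv
import io
import json

# Hypothetical field names, chosen to mirror the report contents listed above.
FIELDS = ["control_id", "name", "status", "severity", "actual", "threshold"]


def to_json(results):
    """Serialize results as a JSON array for pipeline automation."""
    return json.dumps(results, indent=2)


def to_csv(results):
    """Serialize results as CSV text for spreadsheet review."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


# Illustrative data only, not real evidence.
sample_results = [
    {
        "control_id": "CC6.1",
        "name": "MFA enrollment",
        "status": "WARNING",
        "severity": "high",
        "actual": 91.0,
        "threshold": 95,
    }
]
```

Calling `to_csv(sample_results)` yields a header row followed by one row per control, which opens directly in a spreadsheet; `to_json(sample_results)` produces the same records for downstream tooling.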

Evidence Collection

Select a scenario and collect evidence


🔐

Identity Provider

Simulated Okta-style identity data: MFA enrollment, inactive accounts, admin access reviews.

CC6.1 · CC6.2 · CC6.3 · CC6.6

🐙

GitHub

Repository security: branch protection, PR review requirements, secret scanning coverage.

CC8.1 · CC8.1.2 · CC8.1.3

☁️

AWS

Cloud infrastructure: CloudTrail logging, encryption at rest, root account MFA enforcement.

CC7.1 · CC7.2 · CC7.3

compliance-evidence-collector

Evaluation Results

Control	Name	Status	Severity	Details

Control Definitions

Compliance-as-Code

Controls are defined in YAML — add new checks without touching code.
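To make the idea concrete, here is a minimal sketch of what one control entry might look like and how it could be sanity-checked after loading. The YAML layout in the comment, every field name, and the `validate_control` helper are assumptions for illustration; the real soc2_controls.yaml may be structured differently. The dict is written out by hand as the parsed form of that YAML so the example needs no third-party YAML library.

```python
# Hypothetical soc2_controls.yaml entry (field names are assumptions):
#
#   controls:
#     - id: CC6.1
#       name: MFA enrollment
#       metric: mfa_enrollment_pct
#       operator: ">="
#       pass_threshold: 95
#       warn_threshold: 85
#       severity: high
#
# The dict below is that same entry in parsed form.
CATALOG = {
    "controls": [
        {
            "id": "CC6.1",
            "name": "MFA enrollment",
            "metric": "mfa_enrollment_pct",
            "operator": ">=",
            "pass_threshold": 95,
            "warn_threshold": 85,
            "severity": "high",
        }
    ]
}

REQUIRED_KEYS = {"id", "name", "metric", "operator", "pass_threshold", "severity"}
ALLOWED_OPERATORS = {">=", "<=", "==", ">", "<"}


def validate_control(control):
    """Return a list of problems found in one control definition."""
    problems = []
    missing = REQUIRED_KEYS - set(control)
    if missing:
        problems.append("missing keys: " + ", ".join(sorted(missing)))
    if control.get("operator") not in ALLOWED_OPERATORS:
        problems.append(f"unsupported operator: {control.get('operator')!r}")
    return problems
```

Validating definitions at load time means a typo in the YAML surfaces as a clear message before any evidence is evaluated.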

Under the Hood

How It Works

The collector authenticates to each source API using secure credential management, retrieves configuration and policy data, then evaluates findings against control thresholds defined in a YAML-based control catalog.

Each evidence source has a dedicated collector module that normalizes API responses into a standard evidence format. This modular design allows new sources to be added without modifying the evaluation engine — when a new compliance source is required, only a new collector module needs to be written.
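The collector pattern described above might be sketched as follows. This is an assumption-laden outline rather than the tool's code: the `Evidence` record, the cursor-based `paginate` helper, and the canned `fake_fetch` pages are all hypothetical, and a real collector would replace `fake_fetch` with an HTTP call that sends the bearer-token header.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, Optional, Tuple


@dataclass
class Evidence:
    """Hypothetical normalized evidence record shared by all collectors."""
    source: str
    metric: str
    value: float


def auth_headers(token: str) -> dict:
    """Build the bearer-token header sent with each request."""
    return {"Authorization": f"Bearer {token}"}


# A page fetcher takes a cursor (None for the first page) and returns
# (items, next_cursor); next_cursor is None once the last page is reached.
FetchPage = Callable[[Optional[str]], Tuple[list, Optional[str]]]


def paginate(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every item across all pages by following the cursor."""
    cursor = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break


def collect_mfa_evidence(fetch_page: FetchPage) -> Evidence:
    """Normalize raw user records into one MFA-enrollment evidence record."""
    users = list(paginate(fetch_page))
    enrolled = sum(1 for user in users if user.get("mfa_enrolled"))
    pct = 100.0 * enrolled / len(users) if users else 0.0
    return Evidence(source="identity_provider", metric="mfa_enrollment_pct", value=pct)


def fake_fetch(cursor):
    """Stand-in for a real HTTP call: two canned pages of simulated users."""
    pages = {
        None: ([{"mfa_enrolled": True}, {"mfa_enrolled": False}], "page2"),
        "page2": ([{"mfa_enrolled": True}, {"mfa_enrolled": True}], None),
    }
    return pages[cursor]
```

Because the page fetcher is passed in, the normalization logic can be exercised against simulated data, as in `collect_mfa_evidence(fake_fetch)`, without touching a live API.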

The evaluation engine processes each collected data point against its corresponding control definition, applying the defined threshold logic to produce an instant PASS, WARNING, or FAIL determination. Results are aggregated into a structured report exportable as JSON for pipeline automation or CSV for audit review.
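One plausible way to implement the threshold logic is a dictionary that maps operator symbols from the YAML to functions in Python's standard `operator` module, so no control text is ever passed to `eval()`. The control fields (`operator`, `pass_threshold`, `warn_threshold`) and the message format are assumptions for this sketch.

```python
import operator

# Safe operator lookup: the YAML supplies a symbol, which is mapped to a
# known function instead of being evaluated as code.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
}


def evaluate(control, actual):
    """Return (status, message) for one control and its collected value."""
    compare = OPERATORS.get(control["operator"])
    if compare is None:
        raise ValueError(f"unsupported operator: {control['operator']!r}")

    if compare(actual, control["pass_threshold"]):
        status = "PASS"
    elif "warn_threshold" in control and compare(actual, control["warn_threshold"]):
        status = "WARNING"
    else:
        status = "FAIL"

    message = (
        f"{control['id']} {status}: actual {actual} "
        f"(required {control['operator']} {control['pass_threshold']})"
    )
    return status, message


# Hypothetical control: at least 95% MFA enrollment passes, 85% warns.
mfa_control = {"id": "CC6.1", "operator": ">=", "pass_threshold": 95, "warn_threshold": 85}
```

With this control, a value that misses the pass threshold but meets the warning threshold is reported as WARNING rather than FAIL, and an unknown operator symbol is rejected outright.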

Logic Flow by Evidence Source

🔐 Identity Provider (Okta)

Queries user and policy APIs to evaluate MFA enforcement coverage across all user types, password policy compliance against defined minimum standards, inactive accounts, and the distribution of admin privileges.
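Two of these checks, inactive accounts and admin privilege distribution, could be computed from normalized user records roughly as below. The record fields (`login`, `last_login`, `is_admin`), the 90-day idle limit, and the sample users are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def inactive_accounts(users, now, max_idle_days=90):
    """Return logins of users whose last login is older than the idle limit."""
    cutoff = now - timedelta(days=max_idle_days)
    return [u["login"] for u in users if u["last_login"] < cutoff]


def admin_share_pct(users):
    """Return the percentage of users holding an admin role."""
    if not users:
        return 0.0
    admins = sum(1 for u in users if u.get("is_admin"))
    return 100.0 * admins / len(users)


# Simulated identity data, relative to a fixed reference time.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
users = [
    {"login": "alice", "last_login": now - timedelta(days=3), "is_admin": True},
    {"login": "bob", "last_login": now - timedelta(days=120), "is_admin": False},
    {"login": "carol", "last_login": now - timedelta(days=10), "is_admin": False},
    {"login": "dave", "last_login": now - timedelta(days=200), "is_admin": False},
]
```

Passing `now` explicitly keeps the check deterministic, which matters when the same evidence is re-evaluated later for an auditor.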

🐙 Code Repository (GitHub)

Evaluates repository security posture, including branch protection enforcement on production branches, code review requirements (minimum reviewer thresholds), secret scanning activation, and access control configuration.
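A sketch of how those repository checks might run against already-normalized records follows. The record keys (`branch_protection_enabled`, `required_reviewers`, `secret_scanning_enabled`) and the sample repositories are hypothetical and do not correspond to actual GitHub API field names.

```python
def repo_findings(repo, min_reviewers=1):
    """Return a list of findings for one normalized repository record."""
    findings = []
    if not repo.get("branch_protection_enabled"):
        findings.append("production branch is not protected")
    if repo.get("required_reviewers", 0) < min_reviewers:
        findings.append(f"fewer than {min_reviewers} required reviewer(s)")
    if not repo.get("secret_scanning_enabled"):
        findings.append("secret scanning is disabled")
    return findings


def coverage_pct(repos, key):
    """Percentage of repositories where the boolean setting `key` is on."""
    if not repos:
        return 0.0
    return 100.0 * sum(1 for r in repos if r.get(key)) / len(repos)


# Simulated repository data.
repos = [
    {"name": "api", "branch_protection_enabled": True,
     "required_reviewers": 2, "secret_scanning_enabled": True},
    {"name": "web", "branch_protection_enabled": True,
     "required_reviewers": 0, "secret_scanning_enabled": False},
]
```

The per-repository findings feed the human-readable detail column, while a coverage percentage such as `coverage_pct(repos, "secret_scanning_enabled")` is the kind of single value a threshold can be applied to.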

☁️ Cloud Infrastructure (AWS)

Assesses cloud security configuration: encryption-at-rest status for storage services, access logging via CloudTrail, and IAM policy compliance, including root account protection.
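The three cloud checks could be reduced to metrics along these lines. The snapshot layout and its keys are assumptions for this example; a real collector would build such a snapshot from AWS API responses before this step.

```python
def aws_evidence(snapshot):
    """Reduce a normalized account snapshot to the three metrics above."""
    buckets = snapshot["buckets"]
    encrypted = sum(1 for b in buckets if b["encrypted"])
    return {
        # With no buckets there is nothing unencrypted, so report 100%.
        "encryption_at_rest_pct": 100.0 * encrypted / len(buckets) if buckets else 100.0,
        # True when at least one trail is actively logging.
        "cloudtrail_logging": any(t["is_logging"] for t in snapshot["trails"]),
        "root_mfa_enabled": bool(snapshot["root_mfa_enabled"]),
    }


# Simulated account snapshot.
snapshot = {
    "buckets": [
        {"name": "logs", "encrypted": True},
        {"name": "backups", "encrypted": True},
        {"name": "assets", "encrypted": True},
        {"name": "scratch", "encrypted": False},
    ],
    "trails": [{"name": "org-trail", "is_logging": True}],
    "root_mfa_enabled": False,
}
```

Boolean metrics such as `root_mfa_enabled` can be compared with an `==` threshold, so they fit the same evaluation path as the percentage metrics.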

Architecture

Design Decisions

Key architectural choices and the reasoning behind them.

Design Decision

Why YAML-driven controls?

Control definitions are externalized in YAML files rather than hardcoded. This means GRC analysts can update control thresholds, add new controls, or adjust pass/fail criteria without touching code — critical for maintaining audit readiness as frameworks evolve.

Design Decision

Why modular collectors?

Each API source has an independent collector module. When a new compliance source needs to be added (e.g., a new cloud provider or identity platform), only a new collector needs to be written — the evaluation engine and reporting layer remain unchanged.
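One way to get this kind of independence is a small registry that collectors add themselves to, sketched below. The `collector` decorator, the `COLLECTORS` registry, and the returned metric values are all hypothetical; the point is only that `collect_all` never names a specific source.

```python
from typing import Callable, Dict

# Registry mapping a source name to its collector function.
COLLECTORS: Dict[str, Callable[[], dict]] = {}


def collector(source: str):
    """Decorator that registers a collector under a source name."""
    def register(func):
        COLLECTORS[source] = func
        return func
    return register


@collector("github")
def collect_github() -> dict:
    # A real collector would call the GitHub API; this value is made up.
    return {"branch_protection_pct": 100.0}


@collector("okta")
def collect_okta() -> dict:
    # A real collector would call the Okta API; this value is made up.
    return {"mfa_enrollment_pct": 91.0}


def collect_all() -> dict:
    """Run every registered collector and key the results by source."""
    return {source: run() for source, run in COLLECTORS.items()}
```

Adding a new provider then amounts to writing one more decorated function; nothing in `collect_all` or in the code that consumes its output has to change.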

Design Decision

Why automated pass/fail?

Traditional evidence collection requires manual review of each control. The automated evaluation engine applies defined thresholds to collected evidence and produces an immediate PASS, WARNING, or FAIL determination for each control, reducing audit prep from weeks to minutes.

Technology

Technology Stack

Technologies chosen for this tool and the rationale behind each selection.

Python

Chosen for rich API client libraries, rapid prototyping, and broad adoption in GRC automation tooling.

REST APIs

Direct integration with source platforms (Okta, GitHub, AWS) for real-time evidence retrieval without intermediary services.

YAML

Human-readable control definitions that GRC analysts can maintain without engineering involvement — keeping compliance logic accessible.

CLI

Scriptable command-line interface enables scheduled and automated evidence collection runs within existing CI/CD or audit workflows.
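A command-line entry point for scheduled runs might look something like this, built on the standard `argparse` module. The flag names (`--source`, `--controls`, `--format`) and the exit-code convention are assumptions for illustration, not the tool's documented interface.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build the argument parser; every flag here is hypothetical."""
    parser = argparse.ArgumentParser(
        prog="compliance-evidence-collector",
        description="Collect evidence and evaluate it against YAML controls.",
    )
    parser.add_argument("--source", choices=["okta", "github", "aws"], required=True)
    parser.add_argument("--controls", default="soc2_controls.yaml")
    parser.add_argument("--format", choices=["json", "csv"], default="json")
    return parser


def exit_code(statuses) -> int:
    """Return non-zero when any control failed, so CI jobs can gate on it."""
    return 1 if "FAIL" in statuses else 0
```

Returning a non-zero exit code on any FAIL lets a CI/CD pipeline or cron wrapper treat a failed control like a failed build step, while warnings still allow the run to complete.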