SOC 2
PORTFOLIO PROJECT
SYSTEM STATUS: ACTIVE

Compliance Evidence Collector

Automated SOC 2 evidence collection from Identity, GitHub, and AWS sources. Evaluates findings against YAML-defined controls and generates audit-ready reports.

How it works

🔌 API Sources
Okta · GitHub · AWS
Source Connectors
Each source has a dedicated connector module that authenticates via bearer token, handles pagination, and normalizes API responses into a standard evidence format.
→
📋 YAML Controls
soc2_controls.yaml
Compliance-as-Code
Controls are externalized in YAML, so GRC analysts can update thresholds, add new controls, or adjust pass/fail criteria without touching application code.
→
⚙️ Evaluator
PASS · WARN · FAIL
Evaluation Engine
Compares evidence values against YAML-defined thresholds using a safe operator lookup. Produces a three-tier result (PASS, WARNING, or FAIL) with human-readable messages for each control.
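A minimal sketch of what a safe operator lookup could look like, assuming each control carries an operator string plus separate pass and warning thresholds. The field names (`operator`, `pass_threshold`, `warn_threshold`) and the pairing of CC6.1 with MFA enrollment are illustrative assumptions, not taken from the project's catalog.

```python
import operator

# Whitelisted comparisons: an unknown operator string is rejected, never eval()'d.
OPERATORS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
    "==": operator.eq,
}


def evaluate(control, actual):
    """Return a result dict with status PASS, WARNING, or FAIL for one control."""
    compare = OPERATORS.get(control["operator"])
    if compare is None:
        raise ValueError(f"Unsupported operator: {control['operator']!r}")
    if compare(actual, control["pass_threshold"]):
        status = "PASS"
    elif compare(actual, control["warn_threshold"]):
        status = "WARNING"
    else:
        status = "FAIL"
    return {
        "control_id": control["id"],
        "status": status,
        "actual": actual,
        "threshold": control["pass_threshold"],
        "message": (
            f"{control['name']}: observed {actual}, "
            f"expected {control['operator']} {control['pass_threshold']}"
        ),
    }


# Hypothetical control entry, shaped as it might look after loading the YAML.
mfa_control = {
    "id": "CC6.1",
    "name": "MFA enrollment (%)",
    "operator": ">=",
    "pass_threshold": 100,
    "warn_threshold": 95,
}

print(evaluate(mfa_control, 97)["status"])  # WARNING
```

Because the operator is resolved through a fixed dictionary, a typo or a malicious string in the YAML raises an error instead of executing.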
→
📊 Reports
JSON · CSV
Audit-Ready Output
Structured reports with control ID, status, severity, actual values, and thresholds. Exportable as JSON for automation or CSV for spreadsheet review.
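The export step could be sketched with the standard library alone. The column names below mirror the fields listed above, but their exact spelling and the sample rows are assumptions made for illustration.

```python
import csv
import io
import json

# Assumed column order for the CSV view of a report.
FIELDS = ["control_id", "status", "severity", "actual", "threshold"]


def to_json(results):
    """Serialize results for pipeline consumers."""
    return json.dumps(results, indent=2)


def to_csv(results):
    """Serialize results as CSV text for spreadsheet review."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(results)
    return buffer.getvalue()


# Simulated results, not real findings.
results = [
    {"control_id": "CC6.1", "status": "WARNING", "severity": "high",
     "actual": 97, "threshold": 100},
    {"control_id": "CC7.1", "status": "PASS", "severity": "medium",
     "actual": True, "threshold": True},
]

print(to_csv(results))
```

Passing `extrasaction="ignore"` lets result dicts carry extra keys, such as a message, without breaking the CSV writer.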

Select a scenario and collect evidence

SCENARIO:
🔐 Identity Provider
Simulated Okta-style identity data: MFA enrollment, inactive accounts, admin access reviews.
CC6.1 · CC6.2 · CC6.3 · CC6.6
🐙 GitHub
Repository security: branch protection, PR review requirements, secret scanning coverage.
CC8.1 · CC8.1.2 · CC8.1.3
☁️ AWS
Cloud infrastructure: CloudTrail logging, encryption at rest, root account MFA enforcement.
CC7.1 · CC7.2 · CC7.3

Compliance-as-Code

Controls are defined in YAML, so new checks can be added without touching code.
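To make this concrete, here is a hypothetical catalog entry together with a small load-time check. The schema (key names, the `metric` field, the threshold values) is an assumption for illustration; the real layout of soc2_controls.yaml is not shown on this page.

```python
# Hypothetical YAML entry (illustrative field names, not the project's actual schema):
#
#   - id: CC6.1
#     name: MFA enrollment
#     source: identity
#     metric: mfa_enrollment_pct
#     operator: ">="
#     pass_threshold: 100
#     warn_threshold: 95
#     severity: high
#
# The same entry as the dict a YAML loader would hand to the application:
control = {
    "id": "CC6.1",
    "name": "MFA enrollment",
    "source": "identity",
    "metric": "mfa_enrollment_pct",
    "operator": ">=",
    "pass_threshold": 100,
    "warn_threshold": 95,
    "severity": "high",
}

# Keys every entry is assumed to need before it can be evaluated.
REQUIRED_KEYS = {
    "id", "name", "source", "metric", "operator", "pass_threshold", "severity",
}


def missing_keys(entry):
    """Return the required keys an entry lacks, sorted for stable error messages."""
    return sorted(REQUIRED_KEYS - set(entry))


print(missing_keys(control))  # []
```

Validating entries when the catalog is loaded means an analyst's typo surfaces immediately rather than in the middle of a collection run.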

How It Works

The collector authenticates to each source API using secure credential management, retrieves configuration and policy data, then evaluates findings against control thresholds defined in a YAML-based control catalog.
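The retrieval step might look roughly like the sketch below, which assumes a bearer token and `Link`-header pagination (the style used by the GitHub and Okta REST APIs). The function names are hypothetical, and the page fetcher is injectable so the loop can be exercised without network access.

```python
import json
import re
import urllib.request

# Matches the rel="next" entry of an RFC 8288 style Link header.
NEXT_LINK = re.compile(r'<([^>]+)>;\s*rel="next"')


def http_get_page(url, token):
    """Fetch one page; return (items, next_url or None)."""
    request = urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        items = json.load(response)
        # Some APIs send several Link headers, so join them before searching.
        links = ", ".join(response.headers.get_all("Link") or [])
    match = NEXT_LINK.search(links)
    return items, (match.group(1) if match else None)


def fetch_all(url, token, get_page=http_get_page):
    """Follow next-page links until none remain, returning every item."""
    items = []
    while url:
        page, url = get_page(url, token)
        items.extend(page)
    return items


# Offline usage with a stubbed fetcher standing in for a real API.
pages = {"page1": ([{"id": 1}], "page2"), "page2": ([{"id": 2}], None)}
print(fetch_all("page1", "dummy-token", get_page=lambda url, token: pages[url]))
# [{'id': 1}, {'id': 2}]
```

In practice the token would come from an environment variable or a secrets manager rather than being passed as a literal.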

Each evidence source has a dedicated collector module that normalizes API responses into a standard evidence format. This modular design allows new sources to be added without modifying the evaluation engine: when a new compliance source is required, only a new collector module needs to be written.
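One way to get this kind of decoupling is a small registry that collectors add themselves to. The sketch below is an assumption about structure rather than the project's actual code; the `Evidence` record, the decorator, and the simulated values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Evidence:
    """Standard evidence record that every collector is expected to emit."""
    source: str
    metric: str
    value: object


# Registry mapping a source name to its collector function.
COLLECTORS: Dict[str, Callable[[], List[Evidence]]] = {}


def collector(name):
    """Decorator that registers a collector under a source name."""
    def register(func):
        COLLECTORS[name] = func
        return func
    return register


@collector("github")
def collect_github():
    # Simulated value; a real collector would call the GitHub API here.
    return [Evidence("github", "branch_protection_pct", 90.0)]


@collector("aws")
def collect_aws():
    # Simulated value; a real collector would query AWS here.
    return [Evidence("aws", "cloudtrail_enabled", True)]


def collect_all():
    """Run every registered collector; downstream code only sees Evidence records."""
    return [item for collect in COLLECTORS.values() for item in collect()]


for item in collect_all():
    print(item.source, item.metric, item.value)
```

Adding a new source then amounts to writing one more decorated function; `collect_all` and anything that consumes `Evidence` stay unchanged.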

The evaluation engine processes each collected data point against its corresponding control definition, applying the defined threshold logic to produce an instant PASS, WARNING, or FAIL determination. Results are aggregated into a structured report exportable as JSON for pipeline automation or CSV for audit review.

Logic Flow by Evidence Source

🔐 Identity Provider (Okta)

Queries user and policy APIs to evaluate: MFA enforcement coverage across all user types, password policy compliance against defined minimum standards, inactive account identification, and admin privilege distribution.
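As an illustration of how three of these checks could be reduced to numbers, the sketch below works on simulated user records. The record fields and the 90-day inactivity window are assumptions, and password policy evaluation is left out for brevity.

```python
from datetime import datetime, timedelta, timezone


def identity_metrics(users, now, inactive_after_days=90):
    """Summarize MFA coverage, stale accounts, and admin count for user records."""
    cutoff = now - timedelta(days=inactive_after_days)
    enrolled = sum(1 for u in users if u["mfa_enrolled"])
    return {
        "mfa_enrollment_pct": round(100 * enrolled / len(users), 1) if users else 0.0,
        "inactive_accounts": sum(1 for u in users if u["last_login"] < cutoff),
        "admin_count": sum(1 for u in users if u["is_admin"]),
    }


# Simulated directory snapshot with a fixed "now" so the result is reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
users = [
    {"login": "alice", "mfa_enrolled": True, "is_admin": True,
     "last_login": now - timedelta(days=2)},
    {"login": "bob", "mfa_enrolled": True, "is_admin": False,
     "last_login": now - timedelta(days=10)},
    {"login": "carol", "mfa_enrolled": False, "is_admin": False,
     "last_login": now - timedelta(days=120)},
    {"login": "dave", "mfa_enrolled": True, "is_admin": False,
     "last_login": now - timedelta(days=200)},
]

print(identity_metrics(users, now))
# {'mfa_enrollment_pct': 75.0, 'inactive_accounts': 2, 'admin_count': 1}
```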

🐙 Code Repository (GitHub)

Evaluates repository security posture including: branch protection enforcement on production branches, code review requirements (minimum reviewer thresholds), secrets scanning activation, and access control configuration.
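A sketch of how repository posture might be condensed into percentages. It assumes each repository record has already been merged with its branch protection response under a hypothetical `protection` key (`None` when the branch is unprotected). The nested field names `required_pull_request_reviews.required_approving_review_count` and `security_and_analysis.secret_scanning.status` follow the GitHub REST API; everything else is illustrative.

```python
def _review_count(repo):
    """Required approving reviews on the protected branch, or 0 if none are set."""
    protection = repo.get("protection") or {}
    reviews = protection.get("required_pull_request_reviews") or {}
    return reviews.get("required_approving_review_count", 0)


def _secret_scanning_on(repo):
    """True when secret scanning is reported as enabled for the repository."""
    analysis = repo.get("security_and_analysis") or {}
    return (analysis.get("secret_scanning") or {}).get("status") == "enabled"


def github_metrics(repos, min_reviewers=1):
    """Return coverage percentages across a list of repository records."""
    total = len(repos)

    def pct(count):
        return round(100 * count / total, 1) if total else 0.0

    return {
        "branch_protection_pct": pct(
            sum(1 for r in repos if r.get("protection") is not None)),
        "review_required_pct": pct(
            sum(1 for r in repos if _review_count(r) >= min_reviewers)),
        "secret_scanning_pct": pct(
            sum(1 for r in repos if _secret_scanning_on(r))),
    }


def _repo(name, protection, scanning):
    """Build a simulated repository record."""
    return {"name": name, "protection": protection,
            "security_and_analysis": {"secret_scanning": {"status": scanning}}}


repos = [
    _repo("api", {"required_pull_request_reviews":
                  {"required_approving_review_count": 2}}, "enabled"),
    _repo("web", {"required_pull_request_reviews":
                  {"required_approving_review_count": 1}}, "enabled"),
    _repo("infra", {"required_pull_request_reviews": None}, "enabled"),
    _repo("docs", None, "disabled"),
]

print(github_metrics(repos))
# {'branch_protection_pct': 75.0, 'review_required_pct': 50.0, 'secret_scanning_pct': 75.0}
```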

☁️ Cloud Infrastructure (AWS)

Assesses cloud security configuration across: encryption-at-rest status for storage services, access logging enablement via CloudTrail, and IAM policy compliance including root account protection.
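The AWS checks could be reduced to three evidence values along these lines. The snapshot shape is a hypothetical pre-merged structure; the keys `IsLogging`, `IsMultiRegionTrail`, and `AccountMFAEnabled` are borrowed from the CloudTrail and IAM API responses, while `Encrypted` is an assumed flag a collector would derive per bucket.

```python
def aws_metrics(snapshot):
    """Derive logging, encryption, and root-MFA evidence from a collected snapshot."""
    return {
        # At least one multi-region trail must be actively logging.
        "cloudtrail_logging": any(
            t["IsLogging"] and t["IsMultiRegionTrail"] for t in snapshot["trails"]
        ),
        # Names of buckets without default encryption, sorted for stable reports.
        "unencrypted_buckets": sorted(
            b["Name"] for b in snapshot["buckets"] if not b["Encrypted"]
        ),
        # The IAM account summary reports 1 when the root user has MFA enabled.
        "root_mfa_enabled": snapshot["account_summary"].get("AccountMFAEnabled") == 1,
    }


# Simulated snapshot, not real account data.
snapshot = {
    "trails": [
        {"Name": "org-trail", "IsLogging": True, "IsMultiRegionTrail": True},
        {"Name": "legacy", "IsLogging": False, "IsMultiRegionTrail": False},
    ],
    "buckets": [
        {"Name": "logs", "Encrypted": True},
        {"Name": "scratch", "Encrypted": False},
    ],
    "account_summary": {"AccountMFAEnabled": 1},
}

print(aws_metrics(snapshot))
# {'cloudtrail_logging': True, 'unencrypted_buckets': ['scratch'], 'root_mfa_enabled': True}
```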

Design Decisions

Key architectural choices and the reasoning behind them.

Design Decision
Why YAML-driven controls?

Control definitions are externalized in YAML files rather than hardcoded. This means GRC analysts can update control thresholds, add new controls, or adjust pass/fail criteria without touching code, which is critical for maintaining audit readiness as frameworks evolve.

Design Decision
Why modular collectors?

Each API source has an independent collector module. When a new compliance source needs to be added (e.g., a new cloud provider or identity platform), only a new collector needs to be written; the evaluation engine and reporting layer remain unchanged.

Design Decision
Why automated pass/fail?

Traditional evidence collection requires manual review of each control. The automated evaluation engine applies defined thresholds to collected evidence and produces instant pass/fail determinations, reducing audit prep from weeks to minutes.

Technology Stack

Technologies chosen for this tool and the rationale behind each selection.

Python

Chosen for rich API client libraries, rapid prototyping, and broad adoption in GRC automation tooling.

REST APIs

Direct integration with source platforms (Okta, GitHub, AWS) for real-time evidence retrieval without intermediary services.

YAML

Human-readable control definitions that GRC analysts can maintain without engineering involvement, keeping compliance logic accessible.

CLI

Scriptable command-line interface enables scheduled and automated evidence collection runs within existing CI/CD or audit workflows.
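A command-line entry point for such runs could be sketched with `argparse`. The flag names and the exit-code convention below are assumptions chosen for illustration, not the tool's documented interface.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical evidence-collection CLI."""
    parser = argparse.ArgumentParser(
        prog="compliance-evidence-collector",
        description="Collect evidence and evaluate it against a control catalog.",
    )
    parser.add_argument("--source", action="append",
                        choices=["identity", "github", "aws"],
                        help="source to collect from; repeat for several sources")
    parser.add_argument("--controls", default="soc2_controls.yaml",
                        help="path to the YAML control catalog")
    parser.add_argument("--format", choices=["json", "csv"], default="json",
                        help="report output format")
    parser.add_argument("--fail-on", choices=["FAIL", "WARNING"], default="FAIL",
                        help="lowest status that makes the run exit non-zero")
    return parser


def exit_code(statuses, fail_on):
    """Return 1 if any status is at or above the blocking level, else 0."""
    blocking = {"FAIL"} if fail_on == "FAIL" else {"FAIL", "WARNING"}
    return 1 if blocking & set(statuses) else 0


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--source", "github", "--format", "csv"])
print(args.source, args.format, exit_code(["PASS", "WARNING"], args.fail_on))
# ['github'] csv 0
```

A non-zero exit code is what lets a CI/CD job or a scheduled task treat a failing control as a failed run.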