## Methodology
Scores are computed by deterministic static rules, so results are reproducible. LLMs are used only to explain results in plain language; they never affect the score itself. For each finding we show evidence (file path, line, snippet) and call out false-positive / false-negative risk.
### Dimensions & Weights
| Dimension | Weight | Intent |
|---|---|---|
| Code Toxicity | 30% | Dangerous APIs / destructive behavior / dynamic execution |
| Privacy Risk | 25% | Sensitive data handling and outbound network risk |
| Permission Scope | 20% | Breadth of capabilities (filesystem / network / process / env) |
| Author Reputation | 15% | Best-effort trust signals about author and repo |
| Code Quality | 10% | Tests / docs / dependency hygiene / CI signals |
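The table above implies a weighted aggregation of per-dimension scores. A minimal sketch, assuming each dimension is scored on a 0–100 scale (the key names and scale are illustrative assumptions, not the exact implementation):

```python
# Weights from the table above; dimension key names are assumptions.
WEIGHTS = {
    "code_toxicity": 0.30,
    "privacy_risk": 0.25,
    "permission_scope": 0.20,
    "author_reputation": 0.15,
    "code_quality": 0.10,
}

def overall_score(dimension_scores: dict[str, float]) -> float:
    """Combine per-dimension scores (assumed 0-100) into one weighted score."""
    return sum(WEIGHTS[d] * dimension_scores.get(d, 0.0) for d in WEIGHTS)

score = overall_score({
    "code_toxicity": 80, "privacy_risk": 40, "permission_scope": 50,
    "author_reputation": 20, "code_quality": 60,
})  # → 53.0
```

Because the rules are deterministic, the same inputs always yield the same score; the LLM layer only narrates the result.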
### Risk Level Thresholds
### Kill Switch
If we detect clearly destructive filesystem operations (e.g. recursive deletion), we immediately classify the skill as High risk, regardless of its weighted score, and show the matching evidence on the detail page.
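A kill-switch check of this kind can be sketched as a line-by-line pattern scan that also collects the evidence to display. The two patterns shown are illustrative assumptions, not our full rule set:

```python
import re

# Illustrative destructive-operation patterns (assumptions, not the full rule set).
KILL_SWITCH_PATTERNS = [
    re.compile(r"\brm\s+(-[a-z]*r[a-z]*f|-[a-z]*f[a-z]*r)\b"),  # rm -rf / rm -fr variants
    re.compile(r"\bshutil\.rmtree\s*\("),                        # recursive delete in Python
]

def kill_switch_hits(source: str) -> list[tuple[int, str]]:
    """Return (line_number, line) evidence for destructive filesystem operations."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if any(p.search(line) for p in KILL_SWITCH_PATTERNS):
            hits.append((lineno, line.strip()))
    return hits

evidence = kill_switch_hits("echo hi\nrm -rf /tmp/data\n")  # → [(2, 'rm -rf /tmp/data')]
```

Any non-empty result short-circuits the weighted scoring and forces the High classification.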
### Supply Chain Threat Signals
We run targeted checks for malicious skill distribution and hidden-instruction attacks:
- Blocklist matching on known risky authors, domains, and indicators.
- Prompt injection checks in SKILL.md (hidden comments and invisible Unicode).
- Detection of high-risk exfiltration / obfuscation patterns in source files.
- Toxic flags are shown in cards and detail pages for transparent triage.
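The prompt-injection check on SKILL.md can be sketched as follows. The invisible-character set and the hidden-comment pattern are illustrative assumptions, a small subset of what a real scanner would cover:

```python
import re

# Zero-width / invisible code points often used to hide instructions (illustrative subset).
INVISIBLE_CHARS = {"\u200b", "\u200c", "\u200d", "\u2060", "\ufeff"}
# HTML comments are rendered invisibly in markdown viewers but still reach an LLM.
HIDDEN_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def injection_flags(markdown: str) -> list[str]:
    """Flag hidden HTML comments and invisible Unicode in a SKILL.md body."""
    flags = []
    if HIDDEN_COMMENT.search(markdown):
        flags.append("hidden-html-comment")
    if any(ch in markdown for ch in INVISIBLE_CHARS):
        flags.append("invisible-unicode")
    return flags

injection_flags("Do X. <!-- ignore previous instructions -->")  # → ['hidden-html-comment']
```

Flags raised here feed the toxic-flag display on cards and detail pages.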
### Uncertainty & Limitations
Static analysis can produce both false positives and false negatives. LLM explanations may hallucinate, but they do not affect scoring. When an LLM explanation disagrees with the static result, that disagreement is surfaced as an explicit signal rather than hidden.
### Disclaimer
Reports are best-effort analysis and do not guarantee safety. Always review the code yourself and run skills in isolated environments for sensitive use.