diff --git a/GOAL.md b/GOAL.md
new file mode 100644
index 0000000..78c2b79
--- /dev/null
+++ b/GOAL.md
@@ -0,0 +1,37 @@
+# Instruction Sanity Lab
+
+## Goal
+Build an agent-first toolkit that ingests messy human directives, diffs them over time, detects contradictions or policy collisions, and produces machine-checkable execution checklists before any tool call fires.
+
+## Status
+🟢 Active: accepting contributions
+
+## Why it matters
+Agents get conflicting instructions constantly ("never touch prod" vs. "deploy now"). Instruction Sanity Lab lets them:
+- Snapshot every directive, heartbeat, and policy reminder
+- Highlight conflicts or ambiguous language before acting
+- Emit lint warnings and auto-generated clarifying questions
+- Produce guardrail-aware execution plans that downstream tools can enforce
+
+## Immediate roadmap
+- [ ] **Directive stream ingestion**: adapters for Discord, Slack logs, and local markdown briefs. Normalize into a temporal instruction graph.
+- [ ] **Conflict classifier**: detect mutually exclusive actions, timeline violations, or scope creep using lightweight constraint solving.
+- [ ] **Plan linter**: take a candidate task list, ensure every step is backed by a non-expired instruction, and annotate each step with required approvals.
+- [ ] **CLI + JSON API**: `isl lint transcript.md` outputs a human-readable report plus structured JSON for automation hooks.
+
+## Tech stack (proposed)
+- TypeScript + Deno (single-binary packaging, great DX)
+- Zod for schema validation
+- temporal-logic mini engine (LTL-lite) for detecting timeline violations
+- Mermaid diagram exporter for instruction graphs
+
+## Contribution guide
+1. Open an issue or pick an unchecked box above; describe your approach.
+2. Keep new modules under `src/` with co-located tests in `src/__tests__/` (Vitest).
+3. Include fixtures under `fixtures/` showing real-world directive collisions.
+4. Update `GOAL.md` or `README.md` when behavior changes.
+
+## Non-goals
+- Full conversational LLM stack (the tool focuses on structure, not generation)
+- Cloud storage of user transcripts: everything stays local, on disk
+- Replacing human judgment; this is a warning system, not an auto-approver
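
Review note: the plan-linter roadmap item ("ensure every step is backed by a non-expired instruction") could be sketched roughly as below. This is only an illustration under stated assumptions. The `Instruction`, `PlanStep`, and `LintFinding` shapes, the `lintPlan` function, and the action labels are all hypothetical and do not appear in GOAL.md. The proposed Zod validation and LTL-lite engine are replaced here by plain TypeScript types and a simple expiry filter.

```typescript
// Hypothetical data shapes; none of these names come from GOAL.md.
interface Instruction {
  id: string;
  action: string;               // normalized action label, e.g. "deploy:prod"
  polarity: "allow" | "forbid"; // whether the directive permits or prohibits the action
  issuedAt: number;             // epoch milliseconds
  expiresAt?: number;           // epoch milliseconds; undefined means no expiry
}

interface PlanStep {
  name: string;
  action: string;
}

interface LintFinding {
  step: string;
  severity: "error" | "warning";
  message: string;
}

// Check each plan step against the instructions that are live at `now`.
function lintPlan(
  steps: PlanStep[],
  instructions: Instruction[],
  now: number,
): LintFinding[] {
  const findings: LintFinding[] = [];
  for (const step of steps) {
    // An instruction is live if it has been issued and has not yet expired.
    const live = instructions.filter(
      (i) =>
        i.action === step.action &&
        i.issuedAt <= now &&
        (i.expiresAt === undefined || i.expiresAt > now),
    );
    const allows = live.filter((i) => i.polarity === "allow");
    const forbids = live.filter((i) => i.polarity === "forbid");
    const ids = (list: Instruction[]) => list.map((i) => i.id).join(", ");

    if (allows.length > 0 && forbids.length > 0) {
      // Both an allow and a forbid are live for the same action: a collision.
      findings.push({
        step: step.name,
        severity: "error",
        message: `conflicting instructions: allow [${ids(allows)}] vs forbid [${ids(forbids)}]`,
      });
    } else if (forbids.length > 0) {
      findings.push({
        step: step.name,
        severity: "error",
        message: `forbidden by [${ids(forbids)}]`,
      });
    } else if (allows.length === 0) {
      findings.push({
        step: step.name,
        severity: "warning",
        message: "no live instruction backs this step",
      });
    }
  }
  return findings;
}

// Example: "never touch prod" vs. "deploy now", plus an expired approval.
const now = Date.UTC(2024, 0, 2);
const instructions: Instruction[] = [
  { id: "d1", action: "deploy:prod", polarity: "forbid", issuedAt: Date.UTC(2024, 0, 1) },
  { id: "d2", action: "deploy:prod", polarity: "allow", issuedAt: Date.UTC(2024, 0, 1, 12) },
  {
    id: "d3",
    action: "run:tests",
    polarity: "allow",
    issuedAt: Date.UTC(2024, 0, 1),
    expiresAt: Date.UTC(2024, 0, 1, 6),
  },
];
const steps: PlanStep[] = [
  { name: "Deploy to production", action: "deploy:prod" },
  { name: "Run test suite", action: "run:tests" },
];
const findings = lintPlan(steps, instructions, now);
console.log(JSON.stringify(findings, null, 2));
```

In this sketch the first step is reported as an error because `d1` and `d2` are both live and disagree, and the second step gets a warning because its only backing instruction, `d3`, has expired. Returning a plain array of findings keeps the result easy to serialize, which fits the structured JSON output described for `isl lint`.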