Cliff Analytics — Merchant-Facing AI Agent
Basin Climbing and Fitness
What It Is
An analytics agent that helps gym owners and GMs understand their business through natural conversation. Instead of navigating dashboards and running reports, the owner asks questions in plain English and gets answers with charts, tables, and investigation when needed.
How It Works
Three-Layer Approach System
Every question goes through an automatic retrieval pipeline before the agent responds:
- Business Taxonomy: 20+ concepts that map business terms to data. “Fitness” ≠ “Team” ≠ “Programming.” “Family” = family + duo. The agent knows what each term means in the data.
- Analytical Patterns: 11 patterns that teach the agent how to approach each type of question, including aggregation, time series, comparison, decomposition, ratio, funnel, temporal, frequency, and methodology check.
- Worked Examples: 35+ proven question-answer pairs. The agent retrieves the most relevant examples and follows their approach, for example “For LTV questions, use survival curve not raw average.”
This means the agent doesn’t wing it — it follows tested approaches while still being flexible enough to handle novel questions.
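A minimal sketch of how a three-layer retrieval step like this might look. The taxonomy entries, pattern cue words, example records, and keyword-overlap scoring below are illustrative assumptions, not the production retriever.

```python
import re
from dataclasses import dataclass

# Hypothetical, heavily trimmed knowledge base for illustration only.
TAXONOMY = {
    "family": "membership types 'family' and 'duo'",
    "fitness": "fitness revenue only (excludes team and camps)",
}
PATTERNS = {
    "time series": {"trend", "over", "monthly", "growth"},
    "comparison": {"vs", "versus", "compare", "compared"},
    "funnel": {"funnel", "conversion", "drop"},
}
EXAMPLES = [
    {"question": "what is our member ltv",
     "approach": "use survival curve, not raw average"},
    {"question": "how did revenue change month over month",
     "approach": "compare totals for adjacent months"},
]


def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class RetrievedContext:
    concepts: dict[str, str]   # layer 1: business terms found in the question
    patterns: list[str]        # layer 2: analytical patterns whose cues match
    examples: list[dict]       # layer 3: closest worked examples


def retrieve(question: str, top_k: int = 1) -> RetrievedContext:
    q = tokens(question)
    concepts = {term: meaning for term, meaning in TAXONOMY.items() if term in q}
    patterns = [name for name, cues in PATTERNS.items() if q & cues]
    # Rank worked examples by how many words they share with the question.
    ranked = sorted(EXAMPLES, key=lambda ex: len(q & tokens(ex["question"])), reverse=True)
    return RetrievedContext(concepts, patterns, ranked[:top_k])


ctx = retrieve("What is the LTV of family members?")
```

Here `ctx` carries the "family" concept and the LTV worked example, which would then be injected alongside the question. A real retriever would more likely use embeddings than word overlap.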
Understand First, Compute Second
The core principle. Before calling any tool, the agent considers:
- Clear question → answer directly
- Somewhat clear → answer + “I interpreted this as X — is that what you meant?”
- Very ambiguous → clarify first (“Conversion from what to what?”)
The agent also checks historical context before interpreting numbers: it knows about the 90-for-90 promo, seasonal patterns, Klaviyo flow fixes, and other events that explain anomalies.
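One way to picture the historical-context check is a lookup against an events calendar before a number is reported. In this sketch the event names come from this document, but the dates are placeholders and the function names are hypothetical.

```python
from datetime import date

# Event names are from this document; the dates are placeholders, not real data.
EVENTS = [
    {"name": "90-for-90 promo", "start": date(2000, 1, 1), "end": date(2000, 3, 31)},
    {"name": "Klaviyo flow fix", "start": date(2000, 5, 10), "end": date(2000, 5, 10)},
]


def events_overlapping(period_start: date, period_end: date) -> list[str]:
    """Names of events whose date range intersects the queried period."""
    return [
        e["name"]
        for e in EVENTS
        if e["start"] <= period_end and e["end"] >= period_start
    ]


def annotate(metric: str, value: float, period_start: date, period_end: date) -> str:
    """Attach any overlapping events to a metric before it is interpreted."""
    context = events_overlapping(period_start, period_end)
    note = f" (context: {', '.join(context)})" if context else ""
    return f"{metric}: {value:g}{note}"


line = annotate("New members", 120, date(2000, 3, 1), date(2000, 3, 31))
```

With the placeholder calendar above, a March query picks up the promo, while an April query returns no context.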
What It Can Answer
Revenue & Financial
- Total revenue, by category, month-over-month comparison
- MRR (monthly recurring revenue) adjusted for billing frequency
- Revenue per category per month grids
- Fitness-specific revenue (correctly excludes team and camps)
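The billing-frequency adjustment for MRR can be sketched as follows. The frequency labels, plan fields, and prices are assumptions for illustration; the idea is only that each plan's charge is divided by the number of months it covers before summing.

```python
# Months covered by one charge, per billing frequency (assumed labels).
MONTHS_PER_CYCLE = {"monthly": 1, "quarterly": 3, "annual": 12}


def monthly_equivalent(price: float, frequency: str) -> float:
    """Convert one billing-cycle charge into its per-month value."""
    return price / MONTHS_PER_CYCLE[frequency]


def mrr(plans: list[dict]) -> float:
    """Sum the monthly-equivalent price of every active plan."""
    return sum(
        monthly_equivalent(p["price"], p["frequency"])
        for p in plans
        if p["active"]
    )


# Made-up plans: an annual plan at 840 contributes 70 per month, not 840.
plans = [
    {"price": 80.0, "frequency": "monthly", "active": True},
    {"price": 840.0, "frequency": "annual", "active": True},
    {"price": 210.0, "frequency": "quarterly", "active": False},
]
total = mrr(plans)
```

Without the adjustment, the annual plan would inflate the month in which it bills and vanish from the other eleven.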
Membership
- Active member count (people, not plans)
- Net growth (new - cancelled), breakdown by type
- At-risk member list with risk categories
- Cohort retention and survival curves
- LTV analysis by segment (solo vs multi-person)
- New member profile (who is joining, by source/type)
Visits & Traffic
- Total checkins, by entry method (member/day pass/guest/free/event)
- Day of week and hour of day patterns
- Member visit frequency (avg visits/member/month + distribution)
- Member vs non-member traffic split
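As a rough sketch of the visit-frequency metric, check-ins can be counted per member per month and then summarized. The input shape (member ID plus a "YYYY-MM" string) is an assumption, and the sample data is made up.

```python
from collections import Counter


def visit_frequency(checkins: list[tuple[str, str]]) -> tuple[float, dict[int, int]]:
    """checkins: (member_id, 'YYYY-MM') pairs, one per visit.

    Returns the average visits per member-month and a distribution mapping
    visits-in-a-month to the number of member-months with that count.
    """
    per_member_month = Counter(checkins)
    if not per_member_month:
        return 0.0, {}
    average = sum(per_member_month.values()) / len(per_member_month)
    distribution = Counter(per_member_month.values())
    return average, dict(distribution)


checkins = (
    [("a", "2000-01")] * 4
    + [("b", "2000-01")] * 2
    + [("a", "2000-02")] * 4
)
average, distribution = visit_frequency(checkins)
```

Note that this version only sees member-months with at least one visit. Including members who did not visit at all would require joining against the active-member roster, which lowers the average.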
Conversion & Funnel
- Full funnel: first visit → return → frequent → member (with drop-off rates)
- Return rate for first-time visitors
- 2-week pass performance and conversion rate
- Day pass volume (count, not just revenue)
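The funnel with drop-off rates reduces to comparing each stage's count with the next. The stage names below follow this document; the counts are invented for illustration and the function name is hypothetical.

```python
def funnel_report(stage_counts: list[tuple[str, int]]) -> list[dict]:
    """Conversion and drop-off between each pair of consecutive stages."""
    report = []
    for (stage, count), (next_stage, next_count) in zip(stage_counts, stage_counts[1:]):
        rate = next_count / count if count else 0.0
        report.append({
            "from": stage,
            "to": next_stage,
            "conversion": rate,
            "drop_off": 1 - rate,
        })
    return report


# Illustrative counts only.
stages = [("first visit", 1000), ("return", 400), ("frequent", 100), ("member", 50)]
report = funnel_report(stages)
for step in report:
    print(f"{step['from']} -> {step['to']}: {step['drop_off']:.0%} drop-off")
```

Reporting step-to-step rates rather than rates relative to the first stage makes it easier to see which transition loses the most people.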
Diagnostics
- “Why did churn spike?” → decomposes by segment, tests hypotheses
- “Where are we losing people?” → funnel with live data
- “How did the 90-for-90 perform?” → full promo lifecycle analysis
Flow Audit
- “Why isn’t [person] in a Klaviyo flow?” → traces full pipeline
- “Who’s falling through the cracks?” → finds customers with visits but no flows
Charts & Visualization
- Revenue trend, breakdown, stacked composition
- Membership growth over time
- Check-in trends (member vs non-member)
- Custom charts: any metric × any filter × any time grain
Memory System
The merchant agent has tenant-scoped memory via the /remember command:
- Owner types /remember MRR excludes team dues for our gym
- Stored as JSONL in S3: cliff_merchant_memory/{tenant_id}/memories.jsonl
- Loaded into the system prompt on every new conversation
- Accumulates over time — the agent gets smarter about this specific gym
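A minimal sketch of the append-and-load cycle behind /remember. To stay runnable without AWS credentials, a local temporary directory stands in for the S3 bucket; the record fields, the tenant ID "basin", and the prompt wording are assumptions.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def memory_path(root: Path, tenant_id: str) -> Path:
    # Mirrors the key layout cliff_merchant_memory/{tenant_id}/memories.jsonl
    return root / "cliff_merchant_memory" / tenant_id / "memories.jsonl"


def remember(root: Path, tenant_id: str, text: str) -> None:
    """Append one memory as a JSON line (field names are assumed)."""
    path = memory_path(root, tenant_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"text": text, "created_at": datetime.now(timezone.utc).isoformat()}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_memories(root: Path, tenant_id: str) -> list[str]:
    """Read back all memories for one tenant; other tenants are never touched."""
    path = memory_path(root, tenant_id)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line)["text"] for line in f if line.strip()]


def build_system_prompt(base_prompt: str, memories: list[str]) -> str:
    if not memories:
        return base_prompt
    bullets = "\n".join(f"- {m}" for m in memories)
    return f"{base_prompt}\n\nThings this gym has asked you to remember:\n{bullets}"


root = Path(tempfile.mkdtemp())  # local stand-in for the S3 bucket
remember(root, "basin", "MRR excludes team dues for our gym")
prompt = build_system_prompt("You are an analytics agent.", load_memories(root, "basin"))
```

Because the tenant ID is part of the path, one gym's memories cannot leak into another gym's prompt.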
Feedback Loop
Every conversation is stored with:
- Question, response, tool calls used, timestamp, user
- Thumbs up/down + text feedback from the admin UI
This enables:
- Offline analysis: Review what questions are being asked, where the agent struggles
- Improvement iteration: Negative feedback → add to worked examples / fix taxonomy
- Future training data: 84+ conversations logged, ready for fine-tuning once volume justifies it
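The stored fields listed above suggest a record shaped roughly like the one below. The class name, field names, and sample values are hypothetical; only the set of fields (question, response, tool calls, timestamp, user, thumbs, text feedback) comes from this document.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ConversationTurn:
    question: str
    response: str
    tool_calls: list[str]
    user: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    thumbs_up: Optional[bool] = None  # None until someone rates the turn
    feedback_text: str = ""


def needs_review(turns: list[ConversationTurn]) -> list[ConversationTurn]:
    """Turns with negative feedback: candidates for new worked examples or taxonomy fixes."""
    return [t for t in turns if t.thumbs_up is False]


turns = [
    ConversationTurn("What was revenue last month?", "(agent answer)",
                     ["get_total_revenue"], "owner@example.com"),
    ConversationTurn("Why did churn spike?", "(agent answer)",
                     ["investigate_churn"], "owner@example.com",
                     thumbs_up=False, feedback_text="Wrong segment"),
]
line = json.dumps(asdict(turns[0]))  # one JSON line per stored turn
flagged = needs_review(turns)
```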
Admin Dashboard
Available at /admin (OTP-gated for steel@ and trinity@):
- Customer Agent tab: conversations, feedback, knowledge audit
- Merchant Agent tab: conversations, feedback, knowledge audit
- Feedback items have Reviewed / Implemented / Dismissed workflow
Architecture
34 Tools Across 6 Categories
| Category | Tools | Examples |
|---|---|---|
| Data (18) | Revenue, membership, visits, conversion, fitness, camps, day pass volume, member frequency, new member profile | get_total_revenue, get_active_member_count, get_day_pass_volume |
| Charts (6) | Preset + fully custom | chart_revenue_trend, chart_custom |
| Context (2) | Historical business events | get_month_context, get_promotion_details |
| Investigation (4) | Churn decomposition, visit patterns, revenue change, cohort analysis | investigate_churn, decompose_revenue_change |
| Flow Audit (2) | Individual pipeline trace + gap finder | audit_customer_flow, find_flow_gaps |
| Segment Analysis (2) | Flexible primitives: survival, billing, count, revenue by any segment | analyze_segments, pivot_analysis |
Domain Knowledge
| Asset | Count | Purpose |
|---|---|---|
| Business concepts | 20+ | What terms mean in the data |
| Analytical patterns | 11 | How to approach each question type |
| Worked examples | 35+ | Proven approaches for known questions |
| Business events | 24 months | Historical context for anomaly explanation |
| Promotions library | 6 | Detailed promo lifecycle data |
Infrastructure
- Model: Claude Haiku 4.5 — ~$0.005/question
- Data: 22 S3 datasets loaded at startup, refreshed daily
- Conversations: Every turn saved to S3 with tool calls
- Feedback: Thumbs up/down + text, linked to conversation
- Admin dashboard: Conversations, feedback, knowledge audit — split by agent type
Evaluation
Test Suite: 56 Scenarios
Easy Questions (35 tests):
| Category | Tests | Pass Rate |
|---|---|---|
| Revenue (total, breakdown, MRR, MoM) | 8 | 100% |
| Membership (count, growth, at-risk) | 8 | 100% |
| Visits (count, traffic, busy times) | 3 | 100% |
| Conversion (return rate, 2-week pass) | 4 | 100% |
| Clarification (ambiguous questions) | 2 | 100% |
| Out of scope | 4 | 100% |
| Charts | 4 | 100% |
| Hallucination prevention | 2 | 100% |
Hard Questions (21 tests):
| Category | Tests | Pass Rate |
|---|---|---|
| Multi-step (revenue per member, day pass vs membership) | 4 | 100% |
| Diagnostic (why churn, funnel gaps, promo evaluation) | 3 | 100% |
| Comparative (lead sources, weekday vs weekend) | 3 | 100% |
| Synthesis (weekly briefing, health check) | 2 | 100% |
| Time interpretation (this quarter, YTD, last 7 days) | 3 | 100% |
| Edge cases (broad, nonsense, future) | 3 | 100% |
| Follow-up chains (3 multi-turn tests) | 3 | 100% |
What We Test For
| Behavior | How We Measure |
|---|---|
| Right tool selection | Agent picks the correct tool for the question type |
| Clarification | Asks before computing on ambiguous questions |
| Historical context | Checks business events calendar before interpreting anomalies |
| Follow-up offers | Every answer ends with a relevant follow-up |
| No hallucination | Numbers always come from tool calls, never from the model's own recall |
| Show methodology | Explains HOW it calculated complex metrics |
| Data limitations | Mentions 42.6% linkage caveat when relevant |
| 90-for-90 awareness | Excludes promo from tenure/LTV calculations |
| Scope acknowledgment | Says “I can’t calculate that yet” for unsupported questions |
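The "No hallucination" behavior could be checked automatically along these lines: extract every number from the agent's answer and confirm it also appears in some tool output. This is a sketch under assumptions, not the actual test harness; the function names and sample strings are invented.

```python
import re

# Digits with optional thousands separators and an optional decimal part.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def extract_numbers(text: str) -> set[float]:
    return {float(match.replace(",", "")) for match in NUMBER.findall(text)}


def ungrounded_numbers(response: str, tool_outputs: list[str]) -> set[float]:
    """Numbers in the response that appear in no tool output."""
    grounded: set[float] = set()
    for output in tool_outputs:
        grounded |= extract_numbers(output)
    return extract_numbers(response) - grounded


response = "Revenue was $12,500 across 310 members."
tool_outputs = ['{"total_revenue": 12500}', '{"active_members": 305}']
suspect = ungrounded_numbers(response, tool_outputs)  # 310 has no source
```

A check this simple has known gaps: it flags legitimately derived values such as rounded percentages, and it treats digits inside names like "90-for-90" as numbers, so a real harness would need an allowlist or tolerance rules.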
Evaluation Rubric
| Dimension | 1 (Poor) | 3 (Acceptable) | 5 (Excellent) |
|---|---|---|---|
| Accuracy | Wrong numbers, wrong tool, wrong methodology | Correct numbers but missing context | Correct numbers + methodology explained + caveats noted |
| Understanding | Answers a different question than what was asked | Gets the gist, sometimes wrong scope | Fully understands intent, clarifies ambiguity, matches scope |
| Methodology | Uses raw averages, includes promo data, no explanation | Right approach but doesn’t explain it | Survival curves, correct filters, explains why this method |
| Context Awareness | Ignores business events, seasonal patterns | Sometimes mentions relevant context | Proactively checks calendar, explains anomalies |
| Completeness | Answers with one number, no breakdown | Answers + some breakdown | Answers + breakdown + follow-up + chart offer |
| Clarification | Guesses on ambiguous questions | Sometimes asks, sometimes guesses | Always asks on ambiguous, answers+checks on somewhat-clear |
| Data Quality | Doesn’t mention limitations | Occasionally notes data gaps | Proactively flags linkage issues, data quality caveats |
| Follow-through | Answers and stops | Offers generic follow-up | Specific, relevant follow-up tied to the question |
Key Design Decisions
- Primitive tools + smart approach guides > prescriptive tools. The segment analysis tool does one thing at a time (survival, billing, count, revenue). The approach guide teaches the agent to compose them for complex analysis like LTV.
- Domain knowledge injected per-question, not in the system prompt. The prompt is 6.7K chars of pure behavior rules. The approach retriever injects only the relevant taxonomy, pattern, and examples for each specific question.
- Survival curve for tenure, not raw averages. In a young, growing gym, averaging only ended memberships biases the estimate toward short tenures. Survival curves account for members who are still active and give a less biased estimate of expected lifetime.
- Every conversation saved. Conversations + feedback + tool calls all go to S3. This is training data for future model fine-tuning and the feedback-to-improvement pipeline.
- Business events calendar. The agent knows about the 90-for-90 promo, Klaviyo fixes, seasonal patterns, and pricing changes. It checks context before interpreting any time period.
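To illustrate the survival-curve decision, the sketch below compares a raw average of ended memberships with a Kaplan-Meier estimate. The document does not name the estimator the agent uses, so Kaplan-Meier is an assumption here, and the membership data is made up.

```python
def kaplan_meier(memberships: list[tuple[int, bool]]) -> list[tuple[int, float]]:
    """memberships: (tenure_months, ended). Still-active members have ended=False
    and are treated as censored at their current tenure.

    Returns (month, survival probability) at each month with a cancellation.
    """
    survival = 1.0
    curve = []
    for t in sorted({months for months, ended in memberships if ended}):
        at_risk = sum(1 for months, _ in memberships if months >= t)
        cancelled = sum(1 for months, ended in memberships if ended and months == t)
        survival *= 1 - cancelled / at_risk
        curve.append((t, survival))
    return curve


# Two early cancellations, four members still active after 6 to 12 months.
data = [(2, True), (3, True), (6, False), (8, False), (10, False), (12, False)]

ended = [months for months, is_ended in data if is_ended]
naive_average = sum(ended) / len(ended)   # looks only at members who left
curve = kaplan_meier(data)                # also counts members who stayed
```

In this toy data the raw average suggests a tenure of 2.5 months, yet the survival curve shows about two thirds of members still active after month 3, because the four active members count toward the at-risk pool instead of being ignored.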