Cliff Analytics — Merchant-Facing AI Agent

Basin Climbing and Fitness

What It Is

An analytics agent that helps gym owners and GMs understand their business through natural conversation. Instead of navigating dashboards and running reports, the owner asks questions in plain English and gets answers with charts, tables, and investigation when needed.

How It Works

Three-Layer Approach System

Every question goes through an automatic retrieval pipeline before the agent responds:

Business Taxonomy — 20+ concepts that map business terms to data. “Fitness” ≠ “Team” ≠ “Programming.” “Family” = family + duo. The agent knows what each term means in the data.
Analytical Patterns — 11 patterns that teach the agent how to approach each type of question, including aggregation, time series, comparison, decomposition, ratio, funnel, temporal, frequency, and methodology check.
Worked Examples — 35+ proven question-answer pairs. The agent retrieves the most relevant examples and follows their approach. “For LTV questions, use survival curve not raw average.”

As a result, the agent follows tested approaches rather than improvising, while staying flexible enough to handle novel questions.
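The retrieval step described above could be sketched roughly as follows. This is a minimal illustration that assumes simple keyword-overlap scoring; the `Entry` structure, the sample entries, and the `retrieve_approach` function are hypothetical and are not the actual retriever.

```python
import re
from dataclasses import dataclass


@dataclass
class Entry:
    layer: str       # "taxonomy", "pattern", or "example"
    keywords: set    # hypothetical trigger words for this entry
    guidance: str    # text that would be injected for the agent


# Hypothetical knowledge base; entries loosely paraphrase this document.
KNOWLEDGE = [
    Entry("taxonomy", {"family"}, "'Family' means family + duo plans."),
    Entry("taxonomy", {"fitness"}, "'Fitness' is distinct from 'Team' and 'Programming'."),
    Entry("pattern", {"ltv", "lifetime", "tenure"}, "Decomposition: break the metric into parts."),
    Entry("pattern", {"trend", "monthly"}, "Time series: aggregate by month."),
    Entry("example", {"ltv"}, "For LTV questions, use survival curve not raw average."),
]


def tokenize(text):
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve_approach(question, top_k=1):
    """Return up to top_k matching guidance strings per layer."""
    words = tokenize(question)
    result = {}
    for layer in ("taxonomy", "pattern", "example"):
        scored = [(len(e.keywords & words), e) for e in KNOWLEDGE if e.layer == layer]
        scored = [(score, e) for score, e in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        result[layer] = [e.guidance for _, e in scored[:top_k]]
    return result


context = retrieve_approach("What is the LTV of family members?")
```

Here `context` holds one matching entry per layer, which would then be injected alongside the question. A production retriever would more likely use embeddings or a tuned ranking than raw keyword overlap.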

Understand First, Compute Second

This is the core principle. Before calling any tool, the agent judges how clear the question is:

Clear question → answer directly
Somewhat clear → answer + “I interpreted this as X — is that what you meant?”
Very ambiguous → clarify first (“Conversion from what to what?”)
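The three cases above can be read as a small routing policy. The sketch below assumes the clarity level has already been judged (in practice by the model itself); the `Clarity` enum and `plan_response` function are hypothetical names used only for illustration.

```python
from enum import Enum


class Clarity(Enum):
    CLEAR = "clear"
    SOMEWHAT_CLEAR = "somewhat_clear"
    AMBIGUOUS = "ambiguous"


def plan_response(clarity, interpretation=""):
    """Map a clarity level to the agent's next step."""
    if clarity is Clarity.CLEAR:
        # Answer directly, no caveat needed.
        return {"call_tools": True, "note": None}
    if clarity is Clarity.SOMEWHAT_CLEAR:
        # Answer, but state the interpretation and ask for confirmation.
        note = f"I interpreted this as {interpretation}. Is that what you meant?"
        return {"call_tools": True, "note": note}
    # Very ambiguous: ask before computing anything.
    return {"call_tools": False, "note": "Can you clarify what you mean?"}


plan = plan_response(Clarity.SOMEWHAT_CLEAR, "monthly churn rate")
```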

The agent also checks historical context before interpreting numbers. It knows about the 90-for-90 promo, seasonal patterns, Klaviyo flow fixes, and other events that explain anomalies.

What It Can Answer

Revenue & Financial

Total revenue, by category, month-over-month comparison
MRR (monthly recurring revenue) adjusted for billing frequency
Revenue per category per month grids
Fitness-specific revenue (correctly excludes team and camps)
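To illustrate what "adjusted for billing frequency" can mean for MRR, here is a minimal sketch that converts each plan's charge to a monthly equivalent before summing. The set of billing frequencies, the field names, and the sample amounts are assumptions, not the gym's actual data model.

```python
# Billing periods per year for each frequency (assumed set of frequencies).
PERIODS_PER_YEAR = {
    "weekly": 52,
    "biweekly": 26,
    "monthly": 12,
    "quarterly": 4,
    "annual": 1,
}


def monthly_equivalent(amount, frequency):
    """Convert one recurring charge to its average monthly value."""
    return amount * PERIODS_PER_YEAR[frequency] / 12


def mrr(plans):
    """Sum monthly-equivalent revenue across all active plans."""
    return round(sum(monthly_equivalent(p["amount"], p["frequency"]) for p in plans), 2)


# Made-up example plans.
plans = [
    {"amount": 80.0, "frequency": "monthly"},
    {"amount": 840.0, "frequency": "annual"},    # 70.00 per month
    {"amount": 40.0, "frequency": "biweekly"},   # about 86.67 per month
]

total_mrr = mrr(plans)
```

Without the adjustment, an annual plan would show up as a revenue spike in its billing month and as zero in the other eleven.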

Membership

Active member count (people, not plans)
Net growth (new - cancelled), breakdown by type
At-risk member list with risk categories
Cohort retention and survival curves
LTV analysis by segment (solo vs multi-person)
New member profile (who is joining, by source/type)

Visits & Traffic

Total checkins, by entry method (member/day pass/guest/free/event)
Day of week and hour of day patterns
Member visit frequency (avg visits/member/month + distribution)
Member vs non-member traffic split

Conversion & Funnel

Full funnel: first visit → return → frequent → member (with drop-off rates)
Return rate for first-time visitors
2-week pass performance and conversion rate
Day pass volume (count, not just revenue)
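The funnel drop-off rates mentioned above come down to step-to-step ratios. A minimal sketch, using the four stages named in this section and made-up counts (the `funnel_report` function is hypothetical):

```python
def funnel_report(stage_counts):
    """Compute conversion and drop-off between consecutive funnel stages.

    stage_counts is an ordered list of (stage_name, count) pairs.
    """
    rows = []
    for (prev_name, prev_n), (name, n) in zip(stage_counts, stage_counts[1:]):
        conversion = n / prev_n if prev_n else 0.0
        rows.append({
            "step": f"{prev_name} -> {name}",
            "conversion": round(conversion, 3),
            "drop_off": round(1 - conversion, 3),
        })
    return rows


# Illustrative counts only, not real data.
counts = [("first visit", 1000), ("return", 400), ("frequent", 100), ("member", 50)]
report = funnel_report(counts)
```

Reporting each step relative to the previous stage, rather than to the top of the funnel, makes it easier to see where the largest loss occurs.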

Diagnostics

“Why did churn spike?” → decomposes by segment, tests hypotheses
“Where are we losing people?” → funnel with live data
“How did the 90-for-90 perform?” → full promo lifecycle analysis

Flow Audit

“Why isn’t [person] in a Klaviyo flow?” → traces full pipeline
“Who’s falling through the cracks?” → finds customers with visits but no flows

Charts & Visualization

Revenue trend, breakdown, stacked composition
Membership growth over time
Check-in trends (member vs non-member)
Custom charts: any metric × any filter × any time grain

Memory System

The merchant agent has tenant-scoped memory via the /remember command:

Owner types /remember MRR excludes team dues for our gym
Stored as JSONL in S3: cliff_merchant_memory/{tenant_id}/memories.jsonl
Loaded into the system prompt on every new conversation
Accumulates over time — the agent gets smarter about this specific gym
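The memory flow above can be sketched with a local directory standing in for S3. Only the key layout (`cliff_merchant_memory/{tenant_id}/memories.jsonl`) comes from this document; the record fields, function names, and prompt wording are assumptions.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Local temporary directory used as a stand-in for the S3 bucket.
ROOT = Path(tempfile.mkdtemp())


def memory_path(tenant_id):
    """Mirror the documented key layout on the local filesystem."""
    return ROOT / "cliff_merchant_memory" / tenant_id / "memories.jsonl"


def remember(tenant_id, text):
    """Append one memory as a JSON line for this tenant."""
    path = memory_path(tenant_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {"text": text, "created_at": datetime.now(timezone.utc).isoformat()}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_memories(tenant_id):
    """Return all stored memory texts for this tenant, oldest first."""
    path = memory_path(tenant_id)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as f:
        return [json.loads(line)["text"] for line in f if line.strip()]


def memory_prompt_section(tenant_id):
    """Format memories as a block to prepend to the system prompt."""
    memories = load_memories(tenant_id)
    if not memories:
        return ""
    lines = "\n".join(f"- {m}" for m in memories)
    return "Things this gym has asked you to remember:\n" + lines


remember("basin", "MRR excludes team dues for our gym")
prompt_section = memory_prompt_section("basin")
```

Because the path includes the tenant ID, one gym's memories are never loaded into another gym's conversations.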

Feedback Loop

Every conversation is stored with:

Question, response, tool calls used, timestamp, user
Thumbs up/down + text feedback from the admin UI

This enables:

Offline analysis: Review what questions are being asked, where the agent struggles
Improvement iteration: Negative feedback → add to worked examples / fix taxonomy
Future training data: 84+ conversations logged, ready for fine-tuning when volume justifies it
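A stored conversation turn with its feedback might look like the sketch below. The field list follows the one given above; the class name, field types, and sample values are assumptions, and the tool names are taken from the architecture table.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ConversationTurn:
    question: str
    response: str
    tool_calls: list
    timestamp: str
    user: str
    thumbs: Optional[str] = None   # "up", "down", or None if no feedback yet
    feedback_text: str = ""


def negative_feedback(turns):
    """Select turns to review for new worked examples or taxonomy fixes."""
    return [t for t in turns if t.thumbs == "down"]


def to_jsonl(turns):
    """Serialize turns as JSON lines, one turn per line."""
    return "\n".join(json.dumps(asdict(t)) for t in turns)


# Made-up sample turns.
turns = [
    ConversationTurn("What was revenue last month?", "...", ["get_total_revenue"],
                     "2025-01-05T10:00:00Z", "owner", thumbs="up"),
    ConversationTurn("Why did churn spike?", "...", ["investigate_churn"],
                     "2025-01-05T10:02:00Z", "owner", thumbs="down",
                     feedback_text="Did not mention the promo."),
]

to_review = negative_feedback(turns)
```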

Admin Dashboard

Available at /admin (OTP-gated for steel@ and trinity@):

Customer Agent tab: conversations, feedback, knowledge audit
Merchant Agent tab: conversations, feedback, knowledge audit
Feedback items have Reviewed / Implemented / Dismissed workflow

Architecture

34 Tools Across 6 Categories

Category	Tools	Examples
Data (18)	Revenue, membership, visits, conversion, fitness, camps, day pass volume, member frequency, new member profile	get_total_revenue, get_active_member_count, get_day_pass_volume
Charts (6)	Preset + fully custom	chart_revenue_trend, chart_custom
Context (2)	Historical business events	get_month_context, get_promotion_details
Investigation (4)	Churn decomposition, visit patterns, revenue change, cohort analysis	investigate_churn, decompose_revenue_change
Flow Audit (2)	Individual pipeline trace + gap finder	audit_customer_flow, find_flow_gaps
Segment Analysis (2)	Flexible primitives: survival, billing, count, revenue by any segment	analyze_segments, pivot_analysis

Domain Knowledge

Asset	Count	Purpose
Business concepts	20+	What terms mean in the data
Analytical patterns	11	How to approach each question type
Worked examples	35+	Proven approaches for known questions
Business events	24 months	Historical context for anomaly explanation
Promotions library	6	Detailed promo lifecycle data

Infrastructure

Model: Claude Haiku 4.5 — ~$0.005/question
Data: 22 S3 datasets loaded at startup, refreshed daily
Conversations: Every turn saved to S3 with tool calls
Feedback: Thumbs up/down + text, linked to conversation
Admin dashboard: Conversations, feedback, knowledge audit — split by agent type

Evaluation

Test Suite: 56 Scenarios

Easy Questions (35 tests):

Category	Tests	Pass Rate
Revenue (total, breakdown, MRR, MoM)	8	100%
Membership (count, growth, at-risk)	8	100%
Visits (count, traffic, busy times)	3	100%
Conversion (return rate, 2-week pass)	4	100%
Clarification (ambiguous questions)	2	100%
Out of scope	4	100%
Charts	4	100%
Hallucination prevention	2	100%

Hard Questions (21 tests):

Category	Tests	Pass Rate
Multi-step (revenue per member, day pass vs membership)	4	100%
Diagnostic (why churn, funnel gaps, promo evaluation)	3	100%
Comparative (lead sources, weekday vs weekend)	3	100%
Synthesis (weekly briefing, health check)	2	100%
Time interpretation (this quarter, YTD, last 7 days)	3	100%
Edge cases (broad, nonsense, future)	3	100%
Follow-up chains (3 multi-turn tests)	3	100%

What We Test For

Behavior	How We Measure
Right tool selection	Agent picks the correct tool for the question type
Clarification	Asks before computing on ambiguous questions
Historical context	Checks business events calendar before interpreting anomalies
Follow-up offers	Every answer ends with a relevant follow-up
No hallucination	Numbers always come from tool calls, never from memory
Show methodology	Explains HOW it calculated complex metrics
Data limitations	Mentions 42.6% linkage caveat when relevant
90-for-90 awareness	Excludes promo from tenure/LTV calculations
Scope acknowledgment	Says “I can’t calculate that yet” for unsupported questions

Evaluation Rubric

Dimension	1 (Poor)	3 (Acceptable)	5 (Excellent)
Accuracy	Wrong numbers, wrong tool, wrong methodology	Correct numbers but missing context	Correct numbers + methodology explained + caveats noted
Understanding	Answers a different question than what was asked	Gets the gist, sometimes wrong scope	Fully understands intent, clarifies ambiguity, matches scope
Methodology	Uses raw averages, includes promo data, no explanation	Right approach but doesn’t explain it	Survival curves, correct filters, explains why this method
Context Awareness	Ignores business events, seasonal patterns	Sometimes mentions relevant context	Proactively checks calendar, explains anomalies
Completeness	Answers with one number, no breakdown	Answers + some breakdown	Answers + breakdown + follow-up + chart offer
Clarification	Guesses on ambiguous questions	Sometimes asks, sometimes guesses	Always asks on ambiguous, answers+checks on somewhat-clear
Data Quality	Doesn’t mention limitations	Occasionally notes data gaps	Proactively flags linkage issues, data quality caveats
Follow-through	Answers and stops	Offers generic follow-up	Specific, relevant follow-up tied to the question

Key Design Decisions

Primitive tools + smart approach guides > prescriptive tools. The segment analysis tool does one thing at a time (survival, billing, count, revenue). The approach guide teaches the agent to compose them for complex analysis like LTV.
Domain knowledge injected per-question, not in the system prompt. The prompt is 6.7K chars of pure behavior rules. The approach retriever injects only the relevant taxonomy, pattern, and examples for each specific question.
Survival curve for tenure, not raw averages. In a young, growing gym, averaging only ended memberships biases tenure toward short values, because the longest-lived memberships are still active and are left out. Survival curves account for active members and give a less biased estimate of expected lifetime.
Every conversation saved. Conversations, feedback, and tool calls all go to S3. This feeds both future model fine-tuning and the feedback-to-improvement pipeline.
Business events calendar. The agent knows about the 90-for-90 promo, Klaviyo fixes, seasonal patterns, and pricing changes. It checks context before interpreting any time period.
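To make the survival-curve decision concrete, here is a minimal sketch of a Kaplan-Meier estimate in plain Python. The document does not say which estimator the agent uses, so Kaplan-Meier is an assumption, and the membership durations are made up.

```python
def kaplan_meier(durations, ended):
    """Estimate survival probability at each month where a cancellation occurs.

    durations: months observed for each membership.
    ended: True if the membership was cancelled, False if still active (censored).
    """
    event_times = sorted({d for d, e in zip(durations, ended) if e})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for d in durations if d >= t)
        events = sum(1 for d, e in zip(durations, ended) if e and d == t)
        survival *= 1 - events / at_risk
        curve.append((t, survival))
    return curve


def restricted_mean(curve, horizon):
    """Area under the step curve from 0 to horizon: expected months retained."""
    area, prev_t, prev_s = 0.0, 0, 1.0
    for t, s in curve:
        if t > horizon:
            break
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (horizon - prev_t)


# Illustrative data: 3 cancelled memberships, 5 still active.
durations = [2, 3, 3, 6, 8, 10, 12, 12]
ended = [True, True, False, True, False, False, False, False]

curve = kaplan_meier(durations, ended)
naive_mean = sum(d for d, e in zip(durations, ended) if e) / sum(ended)
km_mean_12 = restricted_mean(curve, horizon=12)
```

On this toy data, averaging only the ended memberships gives roughly 3.7 months, while the survival-based estimate within a 12-month horizon is roughly 8.7 months, which shows the direction of the bias.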