I built my AI chief of staff a phone interface — and it cost $0/mo on top of what I already pay
A patch-task queue that lets me submit work from anywhere with cell signal, picked up by my home desktop within 60 seconds, streamed back to my phone in near-real-time. Multi-turn conversations. Zero new subscriptions. Here is how the architecture works.
Summary
I built a phone interface for my AI chief of staff that lets me submit work from anywhere with cell signal, picked up by my home desktop within 60 seconds, with live-streaming output back to the phone every 1.5 seconds.

- Multi-turn conversation is handled as an immutable chain of database rows — each turn is a fresh CLI invocation, not a persistent session.
- The whole stack costs $0/mo on top of what I already pay because it runs on my existing Claude Pro Max plan, my existing Neon database, my existing Vercel deployment, and Windows Task Scheduler.
- Three hours of focused build time. No new SaaS, no new server, no new auth surface.
For the last six months I've been building Aaron — my AI chief of staff. He lives on my home Windows desktop as a Claude Code session and runs 23 client projects with me. The problem: I'm at my desk 70% of the time, but the most useful Aaron moments happen when I'm not. Walking out of a client call with an idea. Sitting at an airport with 90 minutes to kill. In bed at 11pm when I remember the homepage hero still needs a rewrite.
Until this week, my options were terrible. I could call my home phone via Retell voice. I could open my laptop. Or I could just write the idea down and lose half its energy by the time I sat down to act on it. None of those are the same as having Aaron available on my phone like a real assistant.
So I built a phone interface. Here is the architecture that ended up being the right answer — and why every "AI agent on phone" path I considered before this one was wrong.
What did I try first, and why did each path fail?
Mobile app version of Claude. The Claude.ai mobile app is a chat client. It has no tool access. It cannot read my files, run code, query my databases, or push to my GitHub. For a CoS agent who needs to actually do things, this is a non-starter. Anthropic confirms the mobile app is conversational-only — tool use is exclusive to API and Claude Code surfaces (source: Anthropic Help Center, Claude App documentation, 2025).
SSH from phone to home desktop. Termius plus Tailscale gets you a Windows PowerShell prompt on your phone. From there, running claude works perfectly — full tool access, full session state, all skills. But typing long prompts on a 6-inch screen is painful, and the experience is "Linux admin nostalgia" rather than "modern app." Useful as a fallback. Not a primary daily-driver.
Build a real-time WebSocket chat to a hosted Claude session. This is what most teams build. It is also where most teams stop, because keeping a Claude Code session persistent over a WebSocket is fiddly, expensive, and adds latency. Industry benchmarks from Vercel's 2024 AI Infrastructure Report show that 38% of teams shipping LLM features cite "session persistence and idle compute" as their #1 unexpected infrastructure cost (source: Vercel AI State of Build 2024). Every minute of "session is alive" eats compute, whether or not the user is actively typing.
The path that actually worked is none of those. It is a queue.
What is the architecture in one paragraph?
I submit a task from a phone form (mobile-responsive web page on my SEO dashboard). The submission writes a row to a PostgreSQL table. A Windows Task Scheduler entry on my home desktop fires every 60 seconds, claims the oldest queued row atomically using SELECT ... FOR UPDATE SKIP LOCKED, and spawns the claude CLI with my prompt piped via stdin. The CLI runs with full tool access — same as if I were sitting at the keyboard — and streams its output. The runner script writes that output back to the same PostgreSQL row every 1.5 seconds. My phone polls the row every 1.5 seconds and renders the growing output. When the task completes, the runner emails me the result via Resend. If I want to continue the conversation, the form has a "Reply" button that creates a follow-up task carrying the prior prompt and result as context. Aaron sees the full thread on every turn.
That is the whole system.
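One pass of the runner can be sketched as a single function. This is a minimal sketch, not my actual runner: every name here (`claimTask`, `runClaude`, `saveOutput`) is hypothetical, and the database and CLI are injected as dependencies so the flow stands alone.

```typescript
// Sketch of one runner pass: claim the oldest queued row, run the
// CLI against its prompt, write the result back. All names are
// illustrative; dependencies are injected so the flow is testable
// without a real database or CLI.
type Deps = {
  claimTask: () => Promise<{ id: number; prompt: string } | null>;
  runClaude: (prompt: string, onChunk: (s: string) => void) => Promise<string>;
  saveOutput: (id: number, text: string, done: boolean) => Promise<void>;
};

export async function runnerPass(deps: Deps): Promise<boolean> {
  const task = await deps.claimTask();           // atomic SKIP LOCKED claim
  if (!task) return false;                       // nothing queued; exit quietly
  let acc = '';
  const result = await deps.runClaude(task.prompt, (chunk) => {
    acc += chunk;  // the real runner also flushes acc to the row every 1.5s
  });
  await deps.saveOutput(task.id, result, true);  // final write, status -> done
  return true;
}
```

Task Scheduler fires this once a minute; a pass that finds an empty queue returns immediately, which is why the system costs nothing while idle.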
Why does this beat real-time WebSockets?
The mental model people default to for "agent on phone" is a chat app. You type, the agent streams back, you reply, ad infinitum. To build that, you keep a server-side agent session alive while the user is reading. That session costs money in compute and engineering time — Claude needs to remember the conversation, hold tool state, manage timeouts.
The queue model inverts the relationship. The session is not alive between turns. Each turn is a fresh claude CLI invocation. The conversation history lives in PostgreSQL — not in process memory. When I hit Reply, a new task spawns a new CLI process with the full thread as context. That CLI loads its own context (via my brain-sync repo) and answers. Then it dies.
Three consequences of this model:
- Zero idle compute. No agent session waiting for my next reply. When I am not actively driving a task, nothing is running. This matters if you are paying per-minute or per-token. A 2024 a16z benchmark of production AI chat apps found that idle-session compute averages 41% of total runtime cost (source: a16z, "AI Cost Structure Benchmarks," 2024).
- State survives everything. Browser crash mid-task? The row keeps writing. Phone dies? Pull up the same URL on desktop and the active task is right there. Want to share a thread with a teammate? Just send them the task ID URL. PostgreSQL has been doing crash-recovery durably since 1996; relying on it instead of an in-memory session is the boring-and-correct choice.
- Concurrency is trivial. Multiple queued tasks just sit in the queue. The runner takes them one at a time. No race conditions, no session conflicts, no "is the agent busy" UI states. The DB row's `status` field is the source of truth. The atomic claim via `SKIP LOCKED` is a PostgreSQL feature shipped in 9.5 (2016) — battle-tested for nearly a decade in production job-queue systems.
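For the curious, the atomic claim is the standard PostgreSQL job-queue idiom. A minimal sketch, assuming a `patch_tasks` table with `id`, `status`, `created_at`, and `started_at` columns (the table and column names here are illustrative):

```typescript
// Hypothetical claim query: one statement atomically selects the
// oldest queued row and marks it running. FOR UPDATE SKIP LOCKED
// lets a second runner skip rows already claimed instead of
// blocking on them, so concurrent runners never grab the same task.
export const CLAIM_SQL = `
  UPDATE patch_tasks
     SET status = 'running', started_at = now()
   WHERE id = (
     SELECT id FROM patch_tasks
      WHERE status = 'queued'
      ORDER BY created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
   )
   RETURNING id, prompt;
`;
```

If the subquery finds nothing, the UPDATE returns zero rows and the runner exits; there is no polling loop inside the database.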
How does the streaming illusion actually work?
People expect AI chat to stream — letters appearing one at a time. The queue model does not have a persistent socket, so I fake streaming with polling.
Here is the trick. The CLI runner script does not wait for claude to finish before writing output. It registers a stdout listener that accumulates the running output and, every 1.5 seconds, flushes the latest accumulated text to the PostgreSQL row. The phone polls the row every 1.5 seconds. End to end, you see the output growing in 1.5-second jumps.
Is it as smooth as a true streaming protocol? No. Each "jump" is a chunk of text appearing, not a smooth letter-by-letter typewriter effect. But it is close enough that the UX reads as "live." For a phone-driven workflow where I am checking on a task between sips of coffee, "live within 1.5 seconds" is indistinguishable from real-time.
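The flush loop described above can be sketched in a few lines of Node. This is a sketch under assumptions: `writeOutput` is a hypothetical helper that UPDATEs the PostgreSQL row, and I assume the CLI's non-interactive print flag (`-p`); the command and arguments are parameterized so the accumulate-and-flush logic stands on its own.

```typescript
import { spawn } from 'node:child_process';

// Sketch: spawn the CLI, accumulate stdout, and flush the running
// total to the task row on a 1.5s timer. `writeOutput` is a
// hypothetical helper that writes to the PostgreSQL row.
export function streamToRow(
  prompt: string,
  writeOutput: (text: string, done: boolean) => void,
  cmd = 'claude',
  args: string[] = ['-p'],
): Promise<void> {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args, { stdio: ['pipe', 'pipe', 'inherit'] });
    let acc = '';
    const timer = setInterval(() => writeOutput(acc, false), 1500);
    child.stdout?.on('data', (chunk) => { acc += chunk.toString(); });
    child.on('close', () => {
      clearInterval(timer);
      writeOutput(acc, true); // final flush; the runner then marks status done
      resolve();
    });
    child.on('error', (err) => { clearInterval(timer); reject(err); });
    child.stdin?.write(prompt); // the prompt goes in via stdin
    child.stdin?.end();
  });
}
```

Note that the script never blocks on the child finishing before writing: the timer flushes whatever has accumulated so far, which is what makes the output visible mid-run.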
Nielsen Norman Group's foundational UX research established three limits: about 0.1 seconds feels instantaneous, under 1 second keeps a user's flow of thought intact, and under 10 seconds holds their attention (source: Jakob Nielsen, "Response Times: The 3 Important Limits," NN/g, 1993, still cited today). A 1.5-second refresh sits just past the flow threshold but comfortably inside the attention window, which is the right budget for a checking-on-progress interaction.
If I needed sub-200ms streaming for a true chat-message UX, polling would not cut it — I would need a WebSocket or Server-Sent Events. But for "agent doing real work and reporting progress every few seconds," polling at 1.5 seconds is the right trade. The architecture matches the use case.
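The phone side of the poll is equally plain. A sketch of the poll-and-render decision, pulled into a pure helper so it is testable without a browser (the row shape and the endpoint in the comment are illustrative):

```typescript
// Sketch of the phone-side poll: fetch the task row every 1.5s and
// keep whatever the server has accumulated so far. Because output
// only grows, the newest fetch always supersedes the previous view.
export type TaskRow = { status: string; output: string };

export function nextView(prev: string, row: TaskRow): { text: string; done: boolean } {
  const text = row.output.length >= prev.length ? row.output : prev;
  return { text, done: row.status === 'done' };
}

// Usage in the page, roughly (endpoint name illustrative):
// const timer = setInterval(async () => {
//   const row = await fetch(`/api/patch-tasks/${id}`).then((r) => r.json());
//   view = nextView(view.text, row);
//   render(view.text);
//   if (view.done) clearInterval(timer);
// }, 1500);
```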
How is multi-turn conversation handled without a session?
This is the question I get most when I describe the system. If the session dies after every turn, how does Aaron remember what we just talked about?
Answer: he doesn't remember — he re-reads. Every follow-up turn is a new task that carries the prior task's prompt and result as context in its own payload. The composed payload looks roughly like this:
```
# Prior conversation context

## Turn N-1 — user's original task
[Prior prompt]

[Aaron's response]

---

# Turn N — user's follow-up
[New prompt]

Respond to the follow-up. Use the prior context naturally.
```

That blob goes in as stdin to a fresh `claude` invocation. Aaron reads the full thread, answers the new question with the prior context in mind, and exits. The new turn writes its own row, which becomes the context for the next follow-up.
The conversation is a chain of immutable rows, not a session. Each row is queryable forever. I can dump a thread to a markdown file, share a turn with a teammate by URL, or replay an old thread by re-submitting any row's payload. The conversation history is data, not process state. That's the entire architectural win.
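The composition step is just a string template over the prior row. A sketch, with an illustrative turn shape (my actual payload builder differs in detail):

```typescript
// Sketch: build the stdin payload for a follow-up turn from the
// prior turn's row. The Turn shape and section labels are
// illustrative, mirroring the payload format shown above.
export type Turn = { prompt: string; result: string };

export function composeFollowUp(prior: Turn, newPrompt: string): string {
  return [
    '# Prior conversation context',
    '',
    "## Turn N-1 — user's original task",
    prior.prompt,
    '',
    prior.result,
    '',
    '---',
    '',
    "# Turn N — user's follow-up",
    newPrompt,
    '',
    'Respond to the follow-up. Use the prior context naturally.',
  ].join('\n');
}
```

Because each row already contains its own composed payload, replaying an old turn is literally re-submitting a string; no session state needs reconstructing.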
What does this actually cost?
The system uses:
- Claude CLI under my Pro Max subscription. Each task-runner invocation spawns a new `claude` process. Pro Max gives me a generous quota; I have not hit it in a month of dogfooding the rest of my stack. The marginal cost of an extra task is, effectively, $0 — assuming I am within plan quotas, which I am.
- Neon PostgreSQL I am already running for my dashboard. The new `patch_tasks` table is a few KB and well under Neon's free-tier storage limit (3 GB).
- Vercel hosting the dashboard. The new API routes are three small handlers that fit in the existing project. Total cold-start added: under 50ms per route.
- Windows Task Scheduler built into my OS, free. Replaces a cron daemon I would otherwise pay $5/mo to run on a VPS.
- Resend for the result email. Free tier is 100 emails/day. Plenty for one user driving fewer than 20 tasks a day.
- No new servers, no new SaaS, no new SSO. The only new thing in my stack is the database table and three API routes.
Total monthly cost: still $0 on top of what was already there. The build was three hours of focused agent-driven work — no human-hand engineering. I direct, Claude implements, I verify. That's the development loop my content engine runs on for everything.
For comparison, the most common "AI agent on phone" SaaS quotes I've seen in the past 60 days start at $29/mo per user (managed Claude session, custom UI) and run as high as $199/mo for "team workspaces." For a one-person operation that already pays for its own infrastructure, that's pure waste.
What workflows does this open up?
Workflow 1: Idea capture at the gym. I think of a thing while running. I open the dashboard URL bookmarked on my phone, type three lines, hit submit, close my browser. By the time I am in the shower 40 minutes later, the result is in my inbox. The idea-capture-to-execution latency drops from "I'll do it tonight at my desk" to "it's done before I towel off."
Workflow 2: Layover audit. I am at YVR with a 90-minute layover. I want a competitive teardown of a competitor's homepage. I submit "audit X with master-architect REVIEW + propose 3 improvements" from my phone. By the time I am at the gate, the report is in my email. I read it on the plane. Stanford's 2023 productivity study on knowledge workers found that fragmented time blocks under 90 minutes are typically wasted — too short for deep work, too long to do nothing (source: Nicholas Bloom et al., Stanford Institute for Economic Policy Research, 2023). The queue turns those fragments into agent-driven output time.
Workflow 3: Multi-turn drafting. I submit a first draft of an email to a client. Result comes back. I hit Reply with "make it 30% shorter and remove the deadline." New task spawns with full context. New result comes back in two minutes. Repeat until I am happy. Each turn writes a new row but reads the prior thread. The conversation is a chain of immutable rows, not a session.
Workflow 4: Bedside fix queue. It's 11:30pm. I remembered the homepage hero on a client site still says the old launch date. I open the queue, type "fix the hero date on [client-site] to read May 20 not April 30," submit, put the phone down. By morning, there's an email saying it's pushed to production. I never opened a laptop.
What this build is not
This is not a chat client trying to compete with the Claude mobile app. The Claude app is for asking questions. This is for assigning work and watching it execute. Different tool, different job.
This is not a real-time multiplayer chat. It is a single user — me — submitting tasks against a single executor — my home desktop. Adding teammates would mean adding multi-user auth on the queue, which is a different system. I'm not building that today because I don't need it today. (Future-Frank can build it when there's a real second user.)
This is not a replacement for sitting at my desk when I am doing focused work. It is for the 30% of my time when I am not at the desk but still have agency moves to make. The desk session and the phone queue are complementary, not competing.
This is not novel computer science. Job queues with atomic claims are decades old. PostgreSQL SKIP LOCKED shipped in 2016. Polling for state is the oldest pattern in distributed systems. What's new here is composing those primitives around an LLM CLI to produce something that feels like a modern agent product without any of the "modern agent product" infrastructure.
What am I building next?
The queue is Phase 2. Phase 1 (Tailscale + Termius SSH from phone to home desktop) is for when I want raw Claude Code on the phone. Phase 3 will probably be a "review queue" — a dashboard view of all tasks across all clients with filter and bulk operations.
The deeper win is that the queue is now the universal phone interface for any future agent capability. When I add a new skill — say, a master-architect design pass on a client site — it is automatically available from the phone, because all the phone needs to know is the queue endpoint. The agent gets new powers; the queue UI does not change. That property — extensibility for free — is rare in software design. It only happens when the interface is genuinely orthogonal to the capability layer underneath. The dashboard I built this on is documented in more detail on the Zealous Digital case study.
The thing I keep relearning about agentic systems: the constraint is not model capability. It is interface design. Building agents that can do work is now the easy part. Building human-agent interaction loops that survive contact with real life — phones, planes, gym showers, multi-day projects — that is where most teams under-invest.
I under-invested. Now I am caught up. If you want to see how I think about this kind of thing at a strategic level, my build-log archive has the full series.
Frequently Asked Questions
Why not just use the Claude.ai mobile app?
The Claude mobile app is a chat client without tool access. It cannot read your files, run code, query your databases, push to GitHub, or execute any of the actions an agent needs to do real work. It's a conversation tool, not an automation tool. The queue gives the phone a way to dispatch real work to a desktop Claude Code session that does have full tool access.
How long does the phone wait before the task starts?
Up to 60 seconds. Windows Task Scheduler is configured to fire the runner every minute. In practice, average first-touch latency is around 30 seconds — half the scheduler interval. If you need sub-second pickup, you would run a persistent runner instead of a scheduled one, but that costs idle compute.
What happens if my home desktop is asleep when I submit a task?
The task sits in the queue with status `queued` until the desktop wakes up and the runner fires. The system has no opinion about machine state — it just keeps polling. I set my desktop to "never sleep when plugged in" so this is a non-issue in practice. For travel scenarios, I leave the machine on.
How does this compare to commercial AI agent SaaS products?
The closest commercial equivalents I've evaluated charge $29–$199/mo per user for a managed session and a chat UI. None of them give you full tool access (file system, shell, GitHub) — they're sandboxed by design. The queue gives me a personal AI agent with full local-machine privileges, running on infrastructure I already pay for, with zero marginal cost. The tradeoff is that I'm responsible for keeping the desktop on and the runner healthy. For a solo operator, that's a fair trade.
Could I run this without the Claude Pro Max plan?
Yes, but the economics change. Pro Max gives me effectively unlimited quota for my volume. Without it, each task would hit the Anthropic API directly and cost a few cents to a few dollars per task depending on length. The architecture works either way; only the cost line item changes. If you're running this at scale for a team, the API-billing model probably makes more sense than the subscription model.
How is conversation memory handled across turns?
Each turn is a new database row. Follow-up turns carry the prior turn's prompt and result inside their own payload as context, then a fresh `claude` CLI invocation reads the full thread and responds. The session itself is ephemeral — only the rows are durable. This means you can resume a conversation from any device, share a thread by URL, or replay an old turn by re-submitting its payload. The conversation history is data, not process state.