---
title: "Building Clarissa: Learning How AI Agents Actually Work"
description: "A deep dive into building an AI-powered terminal assistant from scratch. Learn about the ReAct pattern, tool execution, context management, and what it takes to build a real AI agent."
date: 2025-12-07T00:00:00.000Z
tags: ["ai", "typescript", "bun", "mcp", "agents", "terminal", "cli"]
author: "Cameron Rye"
canonical_url: https://rye.dev/blog/building-clarissa-ai-terminal-assistant/
---

Building Clarissa started as a learning exercise to understand how AI agents actually work under the hood. After using tools like Claude, ChatGPT, and various coding assistants, I wanted to demystify the magic. What I discovered was both simpler and more nuanced than I expected.

This post shares what I learned building a terminal AI assistant from scratch, the architectural patterns that emerged, and the practical challenges of creating an agent that can reason about tasks and take action.

## Why Build a Terminal AI Agent?

Existing AI interfaces felt disconnected from my actual workflow. I spend most of my day in the terminal, and switching to a browser or GUI to ask an AI for help created friction. More importantly, I wanted to understand:

- How do AI agents decide when to use tools versus just respond?
- How do you manage context windows that can hold millions of tokens?
- What makes tool execution safe and reliable?
- How does the Model Context Protocol actually work?

The best way to learn was to build.

## The ReAct Pattern: Reasoning + Acting

The core of Clarissa is the ReAct (Reasoning + Acting) pattern.
This isn't some complex neural architecture; it's a surprisingly simple loop:

```typescript
async run(userMessage: string): Promise<string> {
  this.messages.push({ role: "user", content: userMessage });

  for (let i = 0; i < maxIterations; i++) {
    // Get LLM response
    const response = await llmClient.chatStreamComplete(
      this.messages,
      toolRegistry.getDefinitions()
    );
    this.messages.push(response);

    // Check for tool calls
    if (response.tool_calls?.length) {
      for (const toolCall of response.tool_calls) {
        const result = await toolRegistry.execute(
          toolCall.function.name,
          toolCall.function.arguments
        );
        this.messages.push({
          role: "tool",
          tool_call_id: toolCall.id,
          content: result.content
        });
      }
      continue; // Loop back for next response
    }

    // No tool calls = final answer
    return response.content;
  }

  // Iteration budget exhausted without a final answer
  throw new Error(`No final answer after ${maxIterations} iterations`);
}
```

The LLM doesn't "decide" to use tools in some mysterious way. You send it available tool definitions, and it responds with either a message or a request to call specific tools. You execute those tools, feed the results back, and repeat until it responds without tool calls.

![A diagrammatic visualization of the ReAct (Reasoning + Acting) loop, showing the cyclical nature of the LLM deciding to use a tool, getting results, and looping back.](/images/blog/generated/building-clarissa-ai-terminal-assistant-a-diagrammatic-visualization-o-1765150787749.jpg)

This loop is the entire agent. Everything else is infrastructure around it.

## What I Learned About Tool Design

The most interesting challenge was designing tools that are both useful and safe. Early versions had tools that were too granular (read a single line) or too powerful (execute arbitrary code). The sweet spot required iteration.

### Tool Confirmation

Potentially dangerous operations need confirmation. But what's "dangerous"?
I settled on this heuristic:

- **No confirmation**: Reading files, listing directories, viewing git status
- **Confirmation required**: Writing files, executing shell commands, making commits

```typescript
interface Tool {
  name: string;
  description: string;
  // Remaining fields inferred from how tools are used in the snippets below
  parameters: z.ZodType;
  execute: (input: any) => Promise<ToolResult>;
  requiresConfirmation?: boolean;
}
```

### The Tool Registry Pattern

Rather than hardcoding tools, I built a registry that tools register themselves into:

```typescript
class ToolRegistry {
  private tools: Map<string, Tool> = new Map();

  register(tool: Tool): void {
    this.tools.set(tool.name, tool);
  }

  registerMany(tools: Tool[]): void {
    tools.forEach((tool) => this.register(tool));
  }

  getDefinitions(): ToolDefinition[] {
    return Array.from(this.tools.values()).map(toolToDefinition);
  }

  async execute(name: string, args: string): Promise<ToolResult> {
    const tool = this.tools.get(name);
    if (!tool) throw new Error(`Unknown tool: ${name}`);

    // Arguments arrive as a JSON string: parse, then validate against the tool's schema
    const parsedArgs = JSON.parse(args);
    const validatedArgs = tool.parameters.parse(parsedArgs);
    return await tool.execute(validatedArgs);
  }
}
```

This pattern made MCP integration trivial. When connecting to an MCP server, I just convert its tools to my format and register them:

```typescript
const tools = mcpTools.map((mcpTool) => ({
  name: `mcp_${serverName}_${mcpTool.name}`,
  description: mcpTool.description,
  parameters: jsonSchemaToZod(mcpTool.inputSchema),
  execute: async (input) =>
    client.callTool({ name: mcpTool.name, arguments: input }),
  requiresConfirmation: true // MCP tools are external
}));

toolRegistry.registerMany(tools);
```

## Context Management: The Underrated Challenge

Context windows are measured in tokens, but managing them well requires more than counting. Here's what I learned:

### Token Estimation

You can't send requests to the API just to count tokens.
You need local estimation:

```typescript
estimateTokens(text: string): number {
  // Rough approximation: ~4 chars per token for English
  return Math.ceil(text.length / 4);
}

estimateMessageTokens(message: Message): number {
  let tokens = 0;
  if (message.content) tokens += this.estimateTokens(message.content);
  if (message.tool_calls) {
    for (const tc of message.tool_calls) {
      tokens += this.estimateTokens(tc.function.name);
      tokens += this.estimateTokens(tc.function.arguments);
    }
  }
  return tokens + 4; // Role overhead
}
```

![A conceptual illustration of token management and smart truncation, visualizing how older messages fade away while keeping atomic groups of data intact.](/images/blog/generated/building-clarissa-ai-terminal-assistant-a-conceptual-illustration-of-t-1765150803838.jpg)

### Smart Truncation

When approaching the limit, you can't just drop the oldest messages. Tool calls and their results must stay together, or the LLM gets confused:

```typescript
truncateToFit(messages: Message[]): Message[] {
  // Always keep the system prompt
  const systemMessages = messages.filter((msg) => msg.role === "system");
  let totalTokens = systemMessages.reduce(
    (sum, msg) => sum + this.estimateMessageTokens(msg), 0);

  // Group messages into atomic units
  // User message -> Assistant response -> Tool results
  const messageGroups: Message[][] = [];
  for (const msg of messages) {
    if (msg.role === "system") continue;
    if (msg.role === "user" || messageGroups.length === 0) messageGroups.push([]);
    messageGroups[messageGroups.length - 1].push(msg);
  }

  // Add groups from newest to oldest until we hit the limit
  // (availableTokens is the context budget, configured elsewhere)
  const toAdd: Message[] = [];
  const reversedGroups = [...messageGroups].reverse();
  for (const group of reversedGroups) {
    const groupTokens = group.reduce(
      (sum, msg) => sum + this.estimateMessageTokens(msg), 0);
    if (totalTokens + groupTokens > availableTokens) break;
    toAdd.unshift(...group);
    totalTokens += groupTokens;
  }

  return [...systemMessages, ...toAdd];
}
```

This was one of those bugs that took hours to track down. The LLM would suddenly start hallucinating tool results because it could see a tool call but not the corresponding result.

## Building with Ink: React for the Terminal

Choosing Ink (React for CLIs) was initially just curiosity, but it proved invaluable.
Terminal UIs have the same state management challenges as web UIs:

```tsx
import { useState } from 'react';
import { Box, Text } from 'ink';

function App() {
  const [messages, setMessages] = useState<{ type: string; name: string }[]>([]);
  const [isThinking, setIsThinking] = useState(false);
  const [streamContent, setStreamContent] = useState('');

  const handleSubmit = async (input: string) => {
    setIsThinking(true);
    try {
      await agent.run(input, {
        onStreamChunk: (chunk) => setStreamContent(prev => prev + chunk),
        onToolCall: (name) => setMessages(prev => [...prev, { type: 'tool', name }])
      });
    } finally {
      setIsThinking(false); // Reset even if the run throws
    }
  };

  // The original element names did not survive; Ink's built-in Box and Text stand in here
  return (
    <Box flexDirection="column">
      {messages.map((msg, i) => (
        <Text key={i}>[{msg.type}] {msg.name}</Text>
      ))}
      {isThinking && <Text>Thinking...</Text>}
      {streamContent && <Text>{streamContent}</Text>}
    </Box>
  );
}
```

The streaming response visualization was particularly satisfying. Tokens appear as they arrive, giving users immediate feedback that something is happening.

## The Memory System: Persistent Context

Sessions persist conversation history, but users also wanted to tell the agent facts it should always remember:

```typescript
class MemoryManager {
  async add(content: string): Promise<Memory> {
    const memory = {
      id: this.generateId(),
      content: content.trim(),
      createdAt: new Date().toISOString(),
    };
    this.memories.push(memory);
    await this.save();
    return memory;
  }

  async getForPrompt(): Promise<string | null> {
    if (this.memories.length === 0) return null;
    const lines = this.memories.map((m) => `- ${m.content}`);
    return `## Remembered Context\n${lines.join("\n")}`;
  }
}
```

Memories get injected into the system prompt. Simple, but it transforms the experience. Tell Clarissa once that you prefer TypeScript over JavaScript, and it remembers across every session.

## MCP Integration: Extending Without Modifying

The Model Context Protocol was the final piece. Rather than building every possible tool, Clarissa can connect to external MCP servers:

```bash
/mcp npx -y @modelcontextprotocol/server-filesystem /path/to/directory
```

The integration was straightforward once the tool registry pattern was in place.
The challenge was converting JSON Schema (what MCP uses) to Zod (what I use internally):

```typescript
import { z } from "zod";

function jsonSchemaToZod(schema: unknown): z.ZodType {
  // Tolerate a missing schema (e.g. an array without `items`)
  const s = (schema ?? {}) as Record<string, any>;

  if (s.type === "object" && s.properties) {
    // JSON Schema properties are optional unless listed in `required`
    const required = new Set<string>(s.required ?? []);
    const shape: Record<string, z.ZodType> = {};
    for (const [key, propSchema] of Object.entries(s.properties)) {
      const prop = jsonSchemaToZod(propSchema);
      shape[key] = required.has(key) ? prop : prop.optional();
    }
    return z.object(shape);
  }

  if (s.type === "string") return z.string();
  if (s.type === "number") return z.number();
  if (s.type === "boolean") return z.boolean();
  if (s.type === "array") return z.array(jsonSchemaToZod(s.items));

  return z.unknown();
}
```

## Key Learnings

Building Clarissa taught me several things that weren't obvious from using AI tools:

**Agents are loops, not magic.** The ReAct pattern is elegant in its simplicity. The complexity is in the infrastructure around it: streaming, context management, tool safety.

**Tool design is UX design.** The tools you provide shape what the agent can do. Too few and it's limited. Too many and it gets confused. The sweet spot requires iteration.

**Context windows are precious.** Even with million-token windows, you can exhaust them quickly. Smart truncation and memory systems extend useful context far beyond raw limits.

**Streaming matters.** Users hate staring at a blank screen. Showing tokens as they arrive transforms the experience from "is this broken?" to "I can see it thinking."

**Confirmation builds trust.** Letting users approve dangerous operations doesn't just prevent mistakes; it changes how they interact with the agent. They're more willing to ask for ambitious tasks.
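
To make the confirmation point concrete, here is a minimal sketch of how a confirmation gate could sit in front of tool execution. It is not Clarissa's actual implementation: `GatedTool`, `ConfirmFn`, and `executeWithConfirmation` are hypothetical names, and the `confirm` callback stands in for whatever prompt the UI shows.

```typescript
// Hypothetical names throughout: GatedTool, ConfirmFn, and
// executeWithConfirmation are illustrative, not taken from Clarissa's source.
interface GatedTool {
  name: string;
  requiresConfirmation?: boolean;
  execute: (input: unknown) => Promise<string>;
}

// Asks the user whether a tool call may proceed; resolves to true on approval.
type ConfirmFn = (toolName: string, input: unknown) => Promise<boolean>;

async function executeWithConfirmation(
  tool: GatedTool,
  input: unknown,
  confirm: ConfirmFn
): Promise<string> {
  // Tools that do not require confirmation skip the prompt entirely.
  if (tool.requiresConfirmation && !(await confirm(tool.name, input))) {
    // Report the refusal as a normal tool result instead of throwing.
    return `User declined to run ${tool.name}.`;
  }
  return tool.execute(input);
}
```

Returning the refusal as an ordinary tool result, rather than throwing, would keep the ReAct loop going: the model sees that the action was declined and can propose an alternative.
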

## Try It Yourself

Clarissa is open source and available on npm:

```bash
bun install -g clarissa
# or
npm install -g clarissa
```

Set your OpenRouter API key and you're ready to go:

```bash
export OPENROUTER_API_KEY=your_key_here
clarissa
```

The source code is at [github.com/cameronrye/clarissa](https://github.com/cameronrye/clarissa), and the documentation at [clarissa.run](https://clarissa.run) covers everything from basic usage to MCP integration.

---

*Building Clarissa was one of the most educational projects I've undertaken. If you're curious about how AI agents work, I encourage you to build one yourself. The gap between "using AI tools" and "understanding AI tools" is smaller than you might think.*