The agent becomes a teammate.
Published Monday, June 15 · Truckee, California
Hello from Corduroy Labs.
We spent today doing something that is hard to describe in a screenshot. We changed how the agent and the operator share work.
Before today, the agent lived on a server. We talked to it through a terminal. It wrote things into files we could read later. It was useful, but it was somewhere else. To collaborate with it, we had to go to where it was.
After today, the agent and the operator share a vault. Notes written on a phone in the morning are visible to the agent within seconds. Drafts the agent prepares overnight show up on the laptop at breakfast. It is the same vault, on every device, with the agent participating as a peer.
That sounds small. It is not.
It is the difference between sending email to your team and sitting at the same table.
Shared ground
For weeks, the agent has been writing useful things into a vault on the server. Daily reports. Changelogs. Drafts in an inbox folder. Research notes. We could read them by logging into the server. We could not read them on a phone, on a flight, on a walk, or at a coffee shop. And we could not easily write to that vault from any of those places either.
So we built a small bridge.
The agent's vault on the server now syncs with our notes app on every device, encrypted end to end. The encryption keys stay on our devices. The notes themselves are encrypted before they leave a device and decrypted only when they arrive on another. The server holds the encrypted chunks. The agent operates on the same filesystem the laptop sees, so there is no special path to learn. A new note in the inbox folder is just a new note in the inbox folder, no matter who wrote it or where it was written.
The result is a working surface where the agent is a participating peer. Tasks we drop into its inbox get picked up. Reports it writes show up in our sidebar. Decisions we make in real time propagate to its next run. There is no separate dashboard to check, no different interface to learn, no special protocol for handing off work.
It is the same vault. And now it is the same conversation.
Specialized minds
The other thing we did today was give the agent a more honest way to think.
Until today, the agent used one large model for almost everything. Some calls were routed to a research-oriented model. Most went through one synthesizer. A handful went through a higher-tier reasoning model when judgment was required. The routing was implicit, scattered through the code, and brittle to provider changes.
We replaced that with a small set of named roles.
A Researcher, for finding current information with citations. An Analyst, for synthesizing long context into hypotheses. A Judge, for evaluating proposals and deciding whether a draft is ready. An Editor, for shaping written work to the operator's voice. A Coder, for production-quality code. An Extractor, for parsing structured signal out of messy text. A Planner, for decomposing a goal into the right next steps.
Each role is a different model under the hood. The Researcher calls a model that knows how to search the web. The Analyst calls a model built for long context. The Judge calls the model with the strongest reasoning. The Extractor calls a cheap, fast model that is good at the narrow job.
The roles do not need to know which models they are. The catalog is symbolic. A capability tag like "judgment.high" or "synthesis.long" maps to a concrete model in a separate file. When a new model is released by any of our providers, the agent discovers it on its own, evaluates whether it should be promoted into one of the roles, and updates the catalog. The roles never change. The models behind them do.
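As a rough illustration of the symbolic catalog, and not the production code, the indirection can be sketched in a few lines of Python. The tags "judgment.high" and "synthesis.long" come from this post. Everything else here is an assumption made up for the example: the "extraction.fast" tag, the model identifiers, and the function names resolve and promote. A dictionary stands in for the separate catalog file.

```python
from dataclasses import dataclass

# Hypothetical catalog: capability tag -> concrete model identifier.
# The model identifiers are placeholders, not real model names.
CATALOG = {
    "judgment.high": "provider-a/reasoning-large",
    "synthesis.long": "provider-b/long-context",
    "extraction.fast": "provider-c/small-fast",  # assumed tag, not from the post
}


@dataclass(frozen=True)
class Role:
    name: str
    capability: str  # a symbolic tag, never a model identifier


# Roles point at capabilities only, so they never change when models do.
ROLES = {
    "Judge": Role("Judge", "judgment.high"),
    "Analyst": Role("Analyst", "synthesis.long"),
    "Extractor": Role("Extractor", "extraction.fast"),
}


def resolve(role_name, catalog=CATALOG):
    """Look up the concrete model currently behind a named role."""
    role = ROLES[role_name]
    try:
        return catalog[role.capability]
    except KeyError:
        raise LookupError(
            f"no model registered for capability {role.capability!r}"
        ) from None


def promote(catalog, capability, model_id):
    """Return a new catalog with one capability pointed at a new model."""
    updated = dict(catalog)
    updated[capability] = model_id
    return updated
```

In this sketch, promoting a new model is a change to the catalog alone. Code that asks for the Judge keeps asking for the Judge and simply gets a different model back.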
This is not multi-model for novelty. It is multi-model for the same reason a team has multiple people: nobody is best at everything.
It also gives us cost honesty. Every call is logged with the role that made it, the model that answered, and the cost. We can see, by role, what the agent is spending its judgment on. We can see when a cheap extraction job accidentally got routed to the expensive judgment model. We can see the relative cost of an Editor pass versus a Judge pass.
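A per-role cost view of this kind could look something like the following Python sketch. It assumes each logged call carries a role, a model, and a cost in dollars, as described above. The record layout, the field names, and the helper names cost_by_role and misrouted are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    role: str        # the role that made the call
    model: str       # the model that answered
    cost_usd: float  # the cost of the call (assumed to be in dollars)


def cost_by_role(records):
    """Sum the cost of all calls, grouped by the role that made them."""
    totals = defaultdict(float)
    for record in records:
        totals[record.role] += record.cost_usd
    return dict(totals)


def misrouted(records, expected_model):
    """Return calls answered by a model other than the one expected for the role.

    Roles with no entry in expected_model are not flagged.
    """
    return [
        record
        for record in records
        if expected_model.get(record.role) not in (None, record.model)
    ]
```

With a log like this, the case of a cheap extraction job landing on the expensive judgment model shows up as an Extractor record whose model does not match the one the catalog expects for that role.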
It is the same agent. It just thinks with the right mind for the work in front of it.
A way to learn, a way to act
Once the agent had a shared workspace and a team of specialized minds, two gaps showed up quickly. The agent did not know its own voice well enough to draft something we would actually publish. And we still had to be in the loop for any work that touched a real surface — a website, a feed, a service restart.
We closed both gaps.
The first is a small set of learning loops. Once a week, the agent rereads everything we have published — every post on our two sites — and stores those posts as the canonical reference for our voice. The same week, the agent sweeps the industry for what is new in agent platforms, operator tooling, partner ecosystems, and a few other lanes we follow, and writes itself a short brief. Neither of these is dramatic. They are just maintenance. But they mean the next time a draft comes through, the writer-mind has a recent sense of how we sound, and the researcher-mind has a recent sense of what others are doing.
The drafting loop itself uses the team of minds end to end. The Planner reads the prompt and lays out an outline. The Researcher pulls citations. The Analyst writes the first draft. The Editor rewrites it in our voice. The Judge scores it on a small rubric and, if anything is below threshold, sends it back to the Editor for one more pass. There is no human approval step. The agent decides whether the draft is ready and either publishes it or files it back to our inbox with a written explanation of why it did not ship. We get a note about every decision. We can override after the fact. The trust is not in any single output. The trust is in the rubric, the loop, and the fact that we see everything the agent does.
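The last part of that loop, where the Judge scores the draft and the Editor gets one more pass, can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual pipeline: the threshold value of 0.7, the 0-to-1 score scale, the rubric item names used in the usage note, and the function name review_draft are all invented for the example. The judge and editor are passed in as plain functions so the sketch runs without any model calls.

```python
def review_draft(draft, judge, editor, threshold=0.7, max_extra_passes=1):
    """Score a draft and decide whether it ships.

    judge(draft) returns a dict of rubric item -> score (assumed 0.0 to 1.0).
    editor(draft, failing_items) returns a revised draft.
    Returns a tuple of (decision, final_draft, explanation).
    """
    scores = judge(draft)
    passes = 0
    # Send the draft back to the editor while anything is below threshold,
    # up to the allowed number of extra passes (one, per the post).
    while passes < max_extra_passes and any(
        score < threshold for score in scores.values()
    ):
        failing = sorted(k for k, s in scores.items() if s < threshold)
        draft = editor(draft, failing)
        scores = judge(draft)
        passes += 1

    failing = sorted(k for k, s in scores.items() if s < threshold)
    if failing:
        # Not ready: file it back with a written explanation.
        return "filed_to_inbox", draft, "below threshold on: " + ", ".join(failing)
    return "published", draft, "all rubric items met threshold"
```

Calling review_draft with a judge that returns scores such as {"voice": 0.4, "accuracy": 0.8} would trigger one Editor pass on "voice" and then a second scoring. Either outcome comes back with an explanation, which matches the note we get about every decision.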
The second gap was about action, not language. The operator chat used to be a place to ask questions and read reports. Now it is a place from which we — or the agent on our behalf — can actually do things. A small set of named tools lets the agent check the health of every service, restart the ones on a short allowlist, reload the web server, deploy our marketing site after a content change, add a short URL for a social-share link, and read any structured log it has been given permission to read. The list of tools is short and intentional. Each one is bounded by a per-command rule that we wrote down on the server and that the agent cannot edit. If the agent tries to do something outside the list, it is refused with a clear message and a description of what would need to change for that capability to exist.
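A bounded tool list with a clear refusal can be sketched like this. It is an illustration of the pattern only: the tool names, the service names "cache" and "worker", and the function name authorize are hypothetical, and the real rules live on the server rather than in the agent's code. A read-only mapping stands in for the fact that the agent cannot edit them.

```python
from types import MappingProxyType

# Hypothetical per-command rules. None means the tool takes no restricted
# target; a frozenset is the allowlist of targets for that tool.
# MappingProxyType gives a read-only view, standing in for rules the
# agent cannot edit.
RULES = MappingProxyType({
    "check_health": None,
    "reload_web_server": None,
    "restart_service": frozenset({"cache", "worker"}),  # assumed service names
})


def authorize(tool, target=None):
    """Return (allowed, message) for a requested tool call."""
    if tool not in RULES:
        return (
            False,
            f"refused: no tool named {tool!r}; "
            "a rule for it would need to be added on the server",
        )
    allowed_targets = RULES[tool]
    if allowed_targets is not None and target not in allowed_targets:
        return (
            False,
            f"refused: {tool} is not allowed for {target!r}; "
            "the allowlist would need to include it",
        )
    return True, "ok"
```

Under these assumed rules, authorize("restart_service", "cache") is allowed, while authorize("restart_service", "mail") is refused with a message that names what would have to change.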
The point is not to give the agent more power. The point is to give it the right power, in the right places, with the right ceiling. An agent that can restart a non-critical service on its own but cannot touch the mail server is a more useful collaborator than one that can do everything or one that can do nothing. The boundary is what makes it trustworthy.
The agent also now knows what it can and cannot do. When we ask it for something on the wrong side of the line, it says so plainly, names the missing tool, and offers to file a request to build it. That kind of answer is more honest than most software is allowed to be. We would rather hear "I cannot do this yet, and here is what would need to change" than have the agent improvise something and call it done.
Why this matters for teams
Two patterns from today are worth carrying out of our studio and into other businesses.
A shared workspace is the precondition for collaboration. If your agent writes into a system your team cannot easily see and edit, you do not have a collaborator. You have a black box that occasionally produces output. The most important architectural decision is the simplest one: where do humans and agents meet to do work together, and is that meeting place actually convenient for the humans?
For us, that meeting place is a vault that syncs to our phones. For other organizations, it might be a shared workspace in their existing collaboration tool, a CRM record both agent and rep can edit, a project channel both team and agent can post to, a document folder both writer and agent can review. The form does not matter. The principle does. There must be common ground, and it must be reachable from wherever the human is.
Specialization beats supremacy. It is tempting to pick a single, top-tier model and route everything through it. We tried that. It costs more than it should. The model is sometimes overkill and sometimes underprepared. And the system turns rigid the moment a better model arrives elsewhere.
A team of specialized minds, each pointed at the work it is best for, gives you a more accurate, more affordable, and more flexible agent. It also gives you a vocabulary that your humans can use. "Send that to the Editor." "Have the Judge review it before we ship." "The Researcher should pull the latest numbers before the Analyst writes the synthesis." That language matters. It makes the agent legible to people who do not write code.
Quiet engineering, again
None of today's work shows up in a demo. Two devices syncing a folder is unremarkable to look at. A function returning the same answer through a different code path is invisible to the user.
But the working condition is different now.
The agent is more reachable, by us and from every place we work. The agent thinks with the right mind for each part of the work. The agent has a maintenance habit of its own — it rereads what we have published and what the industry is doing, and it brings that context into the next draft without being asked. The agent can act inside a small, deliberate set of operational lanes, refusing the rest with a clear explanation. The cost of the agent's thinking is no longer hidden. And the system is open enough that we can keep changing how it thinks without rebuilding what it does.
That is what we mean by an agentic-first operating system. Not an agent that runs alongside your business. An agent that runs inside your business, on the same surface your team works on, with the right kind of intelligence assigned to each part of the work, and with cost and accountability visible at every step.
It is, again, quiet work. Most of it will never be visible to anyone except us.
That is still the point.
Cheers,
Corduroy Labs
Truckee, California