Tab completion is the cheap half: how skillterm decouples expensive generation from interactive lookup

skillterm / neullabs · May 10, 2026

shell-integration · performance · architecture

There is a natural tension in any LLM-backed CLI tool. Generation is expensive: it costs seconds, dollars, and a network round-trip. Interactive use is supposed to be invisible: pressing Tab in the shell has a budget measured in tens of milliseconds before the user notices a hesitation. If you wire the LLM directly into the Tab path, you lose the interactive feel; if you do not, you have to invent a way for cheap Tab presses to benefit from expensive generation.

skillterm’s answer is a deliberate split between generation time and Tab time, with a written-to-disk SKILL.md as the contract between them. This post walks through how that split is wired in the codebase, why each side gets the cost shape it has, and what the resulting user experience feels like.

Two different call paths

The skillterm CLI has two subcommands that look superficially similar but live on opposite sides of the generation/Tab divide.

skillterm generate <tool> is the expensive path. It loads the bootstrap skills, invokes the headless agent runtime (Claude Code or Codex), and waits for the agent to do its multi-step reasoning: classify the tool, search the web, fetch documentation, parse the result, write a SKILL.md, and return. The documented latency budget here is generous: 30 to 60 seconds for a simple skill, 60 to 120 seconds for a SaaS skill that requires web fetches, and up to 300 seconds for complex generations. The timeout_secs field in the [agent.claude-code] block of config.toml defaults to 300 and is meant to be raised, not lowered.

skillterm complete <line> [cursor_pos] is the cheap path. It is called by the shell hook every time you press Tab. Its job is to figure out which command is being typed, find the matching SKILL.md, and return contextual completions. It has no LLM in its critical path. It is a file read, a parse, and a relevance match.
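The "file read, parse, relevance match" sequence can be sketched in a few lines of Python. This is an illustration under stated assumptions, not skillterm's implementation: the post does not describe the SKILL.md format, so the sketch pretends that candidates appear as backtick-quoted tokens, and it uses a plain prefix match as the relevance step. The function name complete and the skills_root parameter are hypothetical.

```python
import re
import tempfile
from pathlib import Path
from typing import List


def complete(line: str, cursor_pos: int, skills_root: Path) -> List[str]:
    """Return candidates for the word under the cursor, or [] if none apply."""
    typed = line[:cursor_pos]
    words = typed.split()
    if not words:
        return []
    at_new_word = typed[-1].isspace()
    if len(words) == 1 and not at_new_word:
        return []  # still typing the command name itself
    tool = words[0]
    current = "" if at_new_word else words[-1]

    # File read: one local file, no network and no model involved.
    skill_file = skills_root / tool / "SKILL.md"
    if not skill_file.is_file():
        return []  # no skill: the shell's default completion takes over
    text = skill_file.read_text(encoding="utf-8")

    # Parse (assumed format): every backtick-quoted span is a candidate.
    candidates = re.findall(r"`([^`\n]+)`", text)

    # Relevance match: keep prefix matches, drop duplicates, preserve order.
    return list(dict.fromkeys(c for c in candidates if c.startswith(current)))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "kubectl").mkdir()
    (root / "kubectl" / "SKILL.md").write_text(
        "Subcommands: `get`, `apply`, `describe`, `get`\n", encoding="utf-8"
    )
    print(complete("kubectl g", 9, root))        # ['get']
    print(complete("unknown-tool x", 14, root))  # []
```

Returning an empty list for an unknown tool is the important design choice here: the cheap path has a fast way to say "nothing to add" instead of reaching for anything slower.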

The two subcommands meet on disk, at ~/.skillterm/skills/<tool>/SKILL.md.

How the shell hook actually attaches

skillterm init <shell> emits the shell integration script. For zsh and bash, the output is meant to be eval’d:

eval "$(skillterm init zsh)"

For fish, it is sourced:

skillterm init fish | source

The emitted script binds Tab to a function that calls skillterm complete <full-line> <cursor-pos> with the current shell context. The hook is small on purpose: the heavy lifting happens in the complete subcommand, not in shell script. That keeps the shell-specific code minimal and the cross-shell behaviour consistent.

The --shell flag on skillterm complete is auto-detected but can be overridden, which matters because the shell determines tokenisation rules. zsh and bash differ in how they split arguments around spaces, quotes, and equals signs; fish has its own conventions. skillterm has to know which shell is calling it so completion suggestions parse back into the line correctly when the user accepts one.
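A small Python sketch shows why the calling shell changes the answer. bash's completion machinery breaks words at the characters in COMP_WORDBREAKS, which by default include = and :, so the fragment being replaced after --namespace=ku is just ku. The zsh and fish entries below are simplifying assumptions for the sketch, quoting is ignored entirely, and the table and function names are hypothetical rather than anything skillterm exposes.

```python
# Extra characters that end a word for completion purposes, per shell.
# bash's default COMP_WORDBREAKS includes "=" and ":"; the zsh and fish
# entries are simplifying assumptions made for this illustration.
EXTRA_WORD_BREAKS = {"bash": "=:", "zsh": "", "fish": ""}


def word_under_cursor(line: str, cursor_pos: int, shell: str) -> str:
    """Return the fragment an accepted completion would replace."""
    typed = line[:cursor_pos]
    # Spaces always separate words (quoting is ignored in this sketch).
    word = typed.split(" ")[-1]
    # Then keep only what follows the last shell-specific break character.
    for ch in EXTRA_WORD_BREAKS[shell]:
        word = word.rsplit(ch, 1)[-1]
    return word


line = "kubectl get pods --namespace=ku"
print(word_under_cursor(line, len(line), "bash"))  # ku
print(word_under_cursor(line, len(line), "zsh"))   # --namespace=ku
```

A suggestion of --namespace=kube-system would be correct for the second result and would duplicate the flag for the first, which is the kind of mismatch the --shell flag exists to avoid.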

A --timeout <ms> flag on complete defaults to 5000ms. That ceiling is generous for a file-read-and-parse path, but it is a safety net for pathological cases: a corrupted skill file, a slow disk, a misconfigured registry lookup. The hook never blocks the shell indefinitely.
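One way to picture that safety net is a deadline wrapped around the lookup, with "no suggestions" as the result when the deadline passes or the lookup fails. The Python sketch below uses a worker thread for this; it is an assumption about the general technique, not a description of how skillterm enforces --timeout, and complete_with_deadline and slow_lookup are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, List


def complete_with_deadline(
    lookup: Callable[[], List[str]], timeout_ms: int = 5000
) -> List[str]:
    """Run lookup, but never make the caller wait longer than timeout_ms."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(lookup)
    try:
        return future.result(timeout=timeout_ms / 1000)
    except FutureTimeout:
        return []  # too slow: offer nothing and let the shell carry on
    except Exception:
        return []  # e.g. a corrupted skill file: fail quietly
    finally:
        pool.shutdown(wait=False)  # do not block on a stuck worker


def slow_lookup() -> List[str]:
    time.sleep(0.2)  # stand-in for a slow disk
    return ["get", "apply"]


print(complete_with_deadline(lambda: ["get", "apply"]))    # ['get', 'apply']
print(complete_with_deadline(slow_lookup, timeout_ms=50))  # []
```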

What the cheap path does not do

It does not call the LLM. It does not hit the network. It does not invoke the agent runtime. Every one of those would re-introduce latency that breaks the interactive feel.

This is a load-bearing constraint. The temptation to “just call the LLM if no skill matches” is real, and it is the wrong move. The cost of generation is high enough — and the variance high enough — that putting it on the Tab path would make completion feel broken at the median and unusable at the tail. Better to fall through to the shell’s default completion than to introduce a 5-second pause.

That is why the docs are explicit about offline behaviour. Tab completion does not require a network connection. Locally cached skills work without internet access; the only operation that requires connectivity is generation, and even then only when web search or web fetch is enabled in the permission block.

What the expensive path does not do

It does not run on every Tab. It runs when you explicitly ask for it: skillterm generate kubectl. Or when you install a skill from a registry (skillterm install kubectl), which is just downloading a pre-generated SKILL.md from someone else’s expensive path. Or, in a future flow described in the Vision document, on-demand when an agent encounters an unknown tool and asks skillterm to generate one.

This means the generation flow has time to do good work. It can afford to load both bootstrap skills as context. It can afford to make multiple web requests. It can afford to retry. It can afford to take 90 seconds. None of that latency reaches the interactive surface.

Why the on-disk artefact matters

The split would not work if the artefact between generation and Tab were an in-memory cache. Caches expire, processes restart, and the user has to re-pay generation cost on the next cold start. The artefact has to live on disk.

That is why SKILL.md is a file and not a data structure. Once skillterm generate kubectl has written ~/.skillterm/skills/kubectl/SKILL.md, the cost of generation is sunk. Every subsequent Tab on kubectl reads the same file. The file survives reboots. It survives skillterm upgrades. It survives shell changes. The expensive operation happens once; the cheap operation happens forever.

The skill-storage docs make the location strategy explicit: skillterm looks in ./.skillterm/skills/ (project-local), then ~/.skillterm/skills/ (user-level), then /usr/share/skillterm/skills/ (system-level). The project-local path takes precedence, which is useful: a repository can pin a specific skill version by checking it into the project tree. The user-level path is the default destination for skillterm generate. The system-level path is for packaged distributions to pre-seed common skills.
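The precedence rule amounts to "first match wins" over an ordered list of directories. A minimal Python sketch, assuming the three locations listed above and the <tool>/SKILL.md layout; the function names search_roots and find_skill are hypothetical:

```python
import tempfile
from pathlib import Path
from typing import List, Optional


def search_roots(project_dir: Path, home_dir: Path) -> List[Path]:
    """Skill directories in precedence order, highest first."""
    return [
        project_dir / ".skillterm" / "skills",  # project-local
        home_dir / ".skillterm" / "skills",     # user-level
        Path("/usr/share/skillterm/skills"),    # system-level
    ]


def find_skill(tool: str, roots: List[Path]) -> Optional[Path]:
    """Return the first SKILL.md found for tool, or None."""
    for root in roots:
        candidate = root / tool / "SKILL.md"
        if candidate.is_file():
            return candidate
    return None


with tempfile.TemporaryDirectory() as tmp:
    project, home = Path(tmp, "repo"), Path(tmp, "home")
    # Put a kubectl skill in both the project tree and the user directory.
    for base in (project, home):
        skill_dir = base / ".skillterm" / "skills" / "kubectl"
        skill_dir.mkdir(parents=True)
        (skill_dir / "SKILL.md").write_text("# kubectl\n", encoding="utf-8")
    found = find_skill("kubectl", search_roots(project, home))
    # The project-local copy shadows the user-level one.
    print(found.relative_to(tmp).as_posix())  # repo/.skillterm/skills/kubectl/SKILL.md
```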

A consequence: skills become a shareable artefact

Because the expensive output is a single Markdown file with a published format, skills can be shared. The CLI has skillterm install <name> to pull from a registry, skillterm install gh:user/repo to pull from GitHub, and skillterm publish to push your own. The reference docs describe versioned installs (kubectl@1.2.0) and GitHub subdirectory installs (gh:org/skills/aws-deploy@v1.0.0).
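The install specifiers quoted above follow a small grammar: an optional gh: prefix, a name or path, and an optional @version suffix. The Python sketch below parses the four forms mentioned in this post. The InstallSpec type, its field names, and the reading of gh:org/skills/aws-deploy as owner org, repository skills, subdirectory aws-deploy are assumptions made for illustration; the reference docs remain the authority on the real grammar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstallSpec:
    source: str             # "registry" or "github"
    name: str               # registry name, or "owner/repo" for GitHub
    subdir: Optional[str]   # path inside the repository (GitHub only)
    version: Optional[str]  # text after "@", if any


def parse_install_spec(spec: str) -> InstallSpec:
    """Split a specifier such as 'kubectl@1.2.0' into its parts."""
    target, _, version = spec.partition("@")
    pinned = version or None
    if target.startswith("gh:"):
        parts = target[3:].split("/")
        if len(parts) < 2 or not all(parts):
            raise ValueError(f"expected gh:owner/repo[/subdir], got {spec!r}")
        subdir = "/".join(parts[2:]) or None
        return InstallSpec("github", "/".join(parts[:2]), subdir, pinned)
    return InstallSpec("registry", target, None, pinned)


for example in ("kubectl", "kubectl@1.2.0", "gh:user/repo",
                "gh:org/skills/aws-deploy@v1.0.0"):
    print(parse_install_spec(example))
```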

This is a direct consequence of the split. If the cheap path needed something more than a file — a service, a database, a custom binary — then sharing would be hard. Because the cheap path just reads a Markdown file, sharing is git clone or curl. The same property that makes the on-disk artefact survive a reboot makes it survive a transfer.

The user experience this produces

The end-state user experience: you spend 60 seconds once to teach skillterm about a tool. From then on, every Tab press in any shell on that command is contextual, fast, and free. There is no model in the loop at interactive time. There is no network in the loop at interactive time. The expensive LLM call has been amortised across every future invocation, indefinitely.

The polished version of this idea is the one the Vision document gestures at: skills arrive pre-generated from a registry, you run skillterm install kubectl and skip your own expensive path entirely, and the expensive operation has already happened on someone else’s machine. From your side, all the work feels free.

Which it should. The cheap half is supposed to be the half you live in.