"Laptop stolen"
Revoke session, rotate SSH key, rotate API key. Time yourself end-to-end. If you can't do it in 15 minutes from memory, your runbook is incomplete.
Stolen laptop. Locked Anthropic account. Corrupted ~/.claude/. Dead droplet. Blown MCP auth. An overzealous edit that wiped a memory file. Each of these has a procedure; this page is the catalogue.
Read once now. Practice the lock-down checklist next time you're bored on a flight. The middle of an actual incident is the wrong moment to learn what your backup strategy was.
| Disaster | Likelihood | Blast radius | Practiced response time |
|---|---|---|---|
| Laptop stolen / lost | Once a decade | SSH key + auth.json compromised; need to rotate | 15 min |
| Account locked / suspended | Rare | Claude.ai access; managed agents; mobile | Days (Anthropic timeline) |
| Corrupted ~/.claude/ | A few times a year | Settings/skills broken on one machine | 5 min |
| Lost API key | Annual | Local CLI on servers; CI | 10 min |
| MCP auth expired | Monthly-ish | One MCP tool stops working | 2 min |
| Dead droplet | Once per provider lifetime | All public sites; remote claude | 1–2 hours rebuild |
| Dead NAS / failed RAID | Every 5–8 years | Backup chain; queue files | Hours to days |
| Memory file overwritten | Rare but real | Lost a memory; subtle behaviour drift | 2 min if versioned |
| Sync conflict on settings.json | Common | Hook silently stops firing | 10 min |
The point isn't that all of these will happen. The point is that none of them feel survivable when they happen and you don't have a plan. The procedures below are the plans.
WholeTech already runs a six-leg backup pattern for the websites tree (droplet, B2, GitHub, Drive, HS NAS, CC NAS). Apply the same pattern to your Claude config in ~/.claude/ and you get disaster recovery almost for free.
| Leg | What lives there | How fresh |
|---|---|---|
| Primary PC ~/.claude/ | Live, edited daily | Now |
| GitHub (private repo) | Whatever's been pushed | Last commit (target: same day) |
| Secondary PCs | Last git pull | Last nightly pull (within 24h) |
| B2 bucket | Compressed snapshot | Nightly |
| Google Drive | Rclone mirror | Nightly |
| HS NAS + CC NAS | Local mirror | Nightly |
The hierarchy is: first and foremost, push ~/.claude/ to a private GitHub repo. That single leg covers roughly 95% of recovery scenarios: corrupted config, lost machine, rolling back a mistake. The other five legs are insurance for the remaining 5%.
Treat as compromised. The threat isn't "they might find the files" — full-disk encryption handles that. The threat is "they might be holding a logged-in machine right now." Lock down accounts the laptop had access to, in order of value.
1. On the droplet, in ~/.ssh/authorized_keys, delete the laptop's public key. Do this from any other machine in the fleet that still has access. Verify: ssh -i missing-key root@wholetech.com should now fail.
2. Rotate any credentials that were stored on the laptop, then update ~/.secrets/godaddy.env on the remaining PCs.
3. Set up the replacement machine from a clean state (claude login, replace the per-PC secrets from your secrets vault).

If every machine has its own SSH key (laptop has laptop-ed25519, desk PC has desk-ed25519, etc.), rotation is one line in ~/.ssh/authorized_keys on the droplet. If they all share one key, you're rotating that key on every machine. Always use per-machine keys.
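The per-machine key removal described above can be sketched as a small shell function. This is a minimal sketch, assuming every public-key line ends in a one-word comment such as laptop-ed25519; the name remove_key_by_comment is hypothetical, not part of any tool mentioned here.

```shell
#!/usr/bin/env bash
# Sketch: drop one machine's public key from an authorized_keys file.
# Assumption: each key line ends in a one-word comment (e.g. laptop-ed25519).
# remove_key_by_comment is a hypothetical helper name.
remove_key_by_comment() {
  local file="$1" comment="$2"
  cp -p "$file" "$file.bak"                              # keep a rollback copy
  # Keep every line whose last field is NOT the comment we are revoking.
  awk -v c="$comment" '$NF != c' "$file.bak" > "$file"
}
```

On the droplet you might run `remove_key_by_comment ~/.ssh/authorized_keys laptop-ed25519`, then confirm from another machine that the old key no longer gets in. The .bak copy is there in case you remove the wrong line.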
Rare. Happens if Anthropic flags unusual activity, payment failure, or a violation. The frustrating part: there's no instant fix; you're on Anthropic's support timeline. But you can keep working in the meantime.
- Fall back to API-key auth for the CLI: codex login --api-key or ANTHROPIC_API_KEY=.... The CLI keeps working even if your ChatGPT-style web account is locked.

A bad JSON edit; a Dropbox conflict; a permissions glitch; an OS upgrade that scrambled file ownership. The most common shape: Claude starts and immediately errors on parsing settings.json, or hooks silently stop firing.
1. Run claude --version. If this works, the CLI binary is fine. If not, reinstall.
2. Run jq . ~/.claude/settings.json. If jq errors, your settings.json is invalid JSON. That's by far the most common cause.
3. Run claude --debug. The verbose logs usually point at the broken file.
4. Look for a saved copy: a backup is written to ~/.claude/backups/settings.json.<timestamp> before saving. Roll back to the most recent.

If ~/.claude/ is a git repo:

    $ cd ~/.claude
    $ git status                          # see what changed
    $ git checkout HEAD -- settings.json  # revert the file
    $ git log -p settings.json            # see history; pick a commit to roll to
    # from another machine in the fleet, copy the file across
    $ scp other-pc:.claude/settings.json ~/.claude/settings.json
    # or pull from your B2 snapshot
    $ rclone copy b2:walhus-backups/claude-config/latest/settings.json ~/.claude/
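The "roll back to the most recent backup" step can be sketched as a function that skips backups which are themselves broken. This is only a sketch: it assumes the backups are named settings.json.<timestamp> with timestamps that sort lexically, that jq is installed, and restore_latest_valid is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: restore settings.json from the newest backup that still parses.
# Assumptions: backups live in <dir>/backups/settings.json.<timestamp>,
# timestamps sort lexically, and jq is installed.
# restore_latest_valid is a hypothetical helper name.
restore_latest_valid() {
  local dir="${1:-$HOME/.claude}" f best=""
  # The glob expands in sorted order, so the last valid file seen is the newest.
  for f in "$dir"/backups/settings.json.*; do
    [ -f "$f" ] || continue
    if jq empty "$f" >/dev/null 2>&1; then
      best="$f"
    fi
  done
  if [ -z "$best" ]; then
    echo "no valid backup found" >&2
    return 1
  fi
  cp "$best" "$dir/settings.json"
  echo "restored $best"
}
```

Checking each candidate with jq first means a backup taken after the file was already corrupted never gets restored over the top.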
Move the whole ~/.claude/ aside (don't delete — you might need to copy memory back later), re-run claude login to create a fresh one, then layer your dotfiles back on top.
    $ mv ~/.claude ~/.claude.bak
    $ claude login                            # creates a fresh ~/.claude/
    $ rm -rf ~/.claude
    $ git clone <repo> ~/.claude
    $ cp ~/.claude.bak/auth.json ~/.claude/   # or just re-login
1. Identify the lost key by its name, e.g. laptop-2026-05, droplet-cron, ci-builds.
2. Update every place the key lives: the ANTHROPIC_API_KEY env var, each CI job's secret, the droplet's /etc/environment, any rclone-driven config sync.
3. Name keys <host>-<purpose>-<year-month>, e.g. droplet-cron-2026-05 or laptop-cli-2026-05. That makes audit and rotation trivial.
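The <host>-<purpose>-<year-month> naming convention is easy to generate rather than type. A minimal sketch follows; key_name is a hypothetical helper, and defaulting the host to the short hostname and the month to today is an assumption.

```shell
#!/usr/bin/env bash
# Sketch: build a key name in the <host>-<purpose>-<year-month> convention.
# key_name is a hypothetical helper; host and month default to this machine
# and the current month.
key_name() {
  local purpose="$1" host="${2:-$(hostname -s)}" month="${3:-$(date +%Y-%m)}"
  printf '%s-%s-%s\n' "$host" "$purpose" "$month"
}
```

For example, `key_name cron droplet 2026-05` prints droplet-cron-2026-05, which matches the names used above.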
Almost always auth-related. Google OAuth tokens refresh quietly until they don't; GitHub PATs expire; database passwords change. Symptoms: Claude says it can't reach that MCP tool, or the tool list is shorter than usual.
1. Run /mcp inside Claude: the status of each server is shown. Red/yellow = auth issue.
2. For OAuth-based servers, claude mcp auth <server-name> opens the auth flow. For token-based ones (GitHub, Postgres), update the env var or the secret file.
3. Check ~/.claude/mcp-needs-auth-cache.json: Claude tracks which servers are pending auth here. If it's stale, delete it; it regenerates.

| MCP server type | Token lifetime | Renewal |
|---|---|---|
| Google OAuth (Drive, Gmail, Calendar) | Refresh tokens last months; access tokens auto-refresh | Mostly silent; re-auth every 6-12 months |
| Slack OAuth | Long-lived | Rarely needs intervention |
| GitHub PAT (classic) | You set it; up to 1 year | Calendar this; rotate on a schedule |
| GitHub PAT (fine-grained) | Up to 1 year, default 90 days | Renew via console; update env vars |
| Postgres / DB | Until someone rotates the password | Update DATABASE_URL env var |
DigitalOcean (or whoever) is having a bad day; the droplet was rebuilt accidentally; the disk filled and nginx fell over. Distinguish "down" (will come back) from "lost" (need to rebuild).
If it's down:

1. Check the logs: journalctl -u nginx -n 100.
2. If the disk is full: df -h → journalctl --vacuum-size=200M to free /var/log; trim old /var/www/*/sessions/ dirs.

If it's lost, rebuild on a new droplet:

1. Restore the sites: rclone copy b2:walhus-backups/var-www/ /var/www/.
2. Reinstall the web stack: apt install nginx certbot python3-certbot-nginx. Restore vhosts from the backup of /etc/nginx/sites-available/.
3. Re-issue certificates: certbot --nginx -d <domain> for each. DNS already points at the new IP, so the HTTP-01 challenge works.
4. Reinstall Claude: npm i -g @anthropic-ai/claude-code. Set ANTHROPIC_API_KEY in /etc/environment.
5. Restore scheduled jobs from the backup of /etc/cron.d/ or the user crontab dump.
6. Verify: curl -sI https://<each-site> → all 200. ssh root@wholetech.com 'claude --version'.

Drive failure; controller failure; power surge. Less acute than a dead droplet (no public services lose access) but slower to fully recover (drives to source, RAID to rebuild, hours to days of resync).
- Restore from B2: rclone copy b2:walhus-backups/ /mnt/nas/restore/.

The Claude angle: if your task queue lives on this NAS, queue work pauses until restore. Workers should keep heartbeating; new tasks queue at the orchestrator until the queue layer is back.
A particularly insidious failure mode: a memory file gets overwritten or deleted, and you don't notice for a week — by which time Claude has been operating without that context. By then the symptom is "Claude keeps making the mistake I corrected last month."
If ~/.claude/ is in git:

1. cd ~/.claude
2. git log -p projects/<slug>/memory/ to see what changed, and when.
3. git checkout <good-commit> -- projects/<slug>/memory/<file>.md
4. Check that MEMORY.md still references the restored file. If you'd already removed the index entry, restore that too.

If ~/.claude/ is synced through OneDrive or Dropbox: both services keep file version history. OneDrive: right-click the file in Explorer → Version history. Dropbox: right-click → Version history (or web UI → the file → Version history). Restore the version from before the mistake.
This is why I keep saying "version your memory." If you've lost a memory and have no version history, reconstruct from:
- ~/.claude/projects/<slug>/history.jsonl: find the session where the memory was first written; Claude probably summarised it back to you at the time.

The cheapest disaster is the one you catch a minute after it happens, not a week later. Three lightweight monitors pay back hundreds of times their setup cost:
Every machine writes $(hostname) $(date) to a shared file (NAS or droplet) every 10 minutes. A separate process checks the file every hour; any host whose heartbeat is >30 min old fires an alert. Catches "the scheduled task on the laptop quit working" within an hour.
    # cron entry, every 10 minutes
    */10 * * * * echo "$(hostname) $(date -Iseconds) alive" >> /mnt/nas/fleet-heartbeats.log
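The checking side of the heartbeat (the hourly process that flags any host silent for more than 30 minutes) might look like the sketch below. It assumes GNU date for the -d option and the "<host> <ISO-8601 time> alive" line format written by the cron entry above; stale_hosts is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list hosts whose newest heartbeat is older than a threshold.
# Assumptions: GNU date (for -d) and log lines of the form
#   <host> <ISO-8601 timestamp> alive
# stale_hosts is a hypothetical helper name.
stale_hosts() {
  local log="$1" max_age="${2:-1800}" now="${3:-$(date +%s)}"
  local host ts seen
  # Keep only the last line per host, then compare each timestamp to "now".
  awk '{ last[$1] = $2 } END { for (h in last) print h, last[h] }' "$log" |
  sort |
  while read -r host ts; do
    seen=$(date -d "$ts" +%s 2>/dev/null) || continue   # skip unparseable lines
    if [ $(( now - seen )) -gt "$max_age" ]; then
      echo "$host"
    fi
  done
}
```

Run hourly, any hostname it prints is one that has gone quiet; pipe that output to whatever notification channel you already use. The third argument exists only so the check can be tested against a fixed clock.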
Once a day, on each machine, validate that ~/.claude/settings.json parses. If it doesn't, ping the notification channel.
    # cron entry, once a day at midnight
    0 0 * * * jq . ~/.claude/settings.json > /dev/null 2>&1 || curl -d "settings.json broken on $(hostname)" ntfy.sh/walhus-claude
The droplet running out of disk has been the cause of more weekend incidents than anything else. df -h at 09:00 daily; warn if /var is >80%.
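A minimal sketch of that disk check is below. The helper names usage_pct and check_disk are hypothetical, and the 80% default simply mirrors the threshold above; how the warning gets delivered is left to you.

```shell
#!/usr/bin/env bash
# Sketch: warn when a filesystem is fuller than a threshold.
# usage_pct and check_disk are hypothetical helper names.
usage_pct() {
  # df -P prints one POSIX-format line per filesystem; column 5 is "Use%".
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

check_disk() {
  local mount="$1" limit="${2:-80}" pct
  pct=$(usage_pct "$mount") || return 2
  if [ "$pct" -gt "$limit" ]; then
    echo "WARNING: $mount is ${pct}% full on $(hostname)"
    return 1
  fi
}
```

Saved as a script and scheduled from cron at 09:00 (a `0 9 * * *` entry), `check_disk /var 80` stays silent on a healthy disk and prints one line, with a non-zero exit status, when /var crosses the limit.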
Once a quarter, practise. Pick a scenario, set a timer, run the procedure. You'll find out which steps are stale, which secrets you can't actually locate, which backups didn't actually back up.
Revoke session, rotate SSH key, rotate API key. Time yourself end-to-end. If you can't do it in 15 minutes from memory, your runbook is incomplete.
On a spare VM: wipe ~/.claude/, restore from scratch using only the backups. Identify which files needed manual recovery (auth.json, secrets) and document the gap.
Spin up a new droplet, restore the websites tree from B2, re-issue certs, verify all sites. If a domain doesn't come back, you know about a hole in your backup before you really needed it.
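The "verify all sites" step of the droplet drill can be scripted so a missing domain is obvious. This is a sketch under the assumption that a healthy site answers a HEAD request with status 200; http_status and check_sites are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: report any site whose HTTPS status is not 200.
# http_status and check_sites are hypothetical helper names.
http_status() {
  # The first line of `curl -sI` output looks like "HTTP/2 200"
  # or "HTTP/1.1 301 Moved Permanently"; field 2 is the status code.
  awk 'NR==1 { print $2; exit }' | tr -d '\r'
}

check_sites() {
  local site code failed=0
  for site in "$@"; do
    code=$(curl -sI --max-time 10 "https://$site" | http_status)
    if [ "$code" != "200" ]; then
      echo "FAIL $site (status: ${code:-none})"
      failed=1
    fi
  done
  return "$failed"
}
```

Call it with the full list of domains, for example `check_sites wholetech.com`, and treat any FAIL line as a hole in the backup to fix before the real incident.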
The scheduled task disabled itself silently after a permission change. Your "nightly" backup is six weeks stale. Heartbeat your backup jobs — not just their existence, their last-success time.
You restored everything except the godaddy.env file because it was correctly excluded from git. Now you can't change DNS. Maintain a per-PC secrets manifest (1Password, Bitwarden) separate from the git-tracked config.
Your settings.json from six months ago references MCP servers and skills that have since been renamed. Restore-and-fix is sometimes faster than restore-and-pray.
The B2 access key was on the laptop that died. Always keep at least two paths to your backup storage — primary access key on machine A, recovery key in 1Password.
The MCP auth has been broken for three weeks; you've been working around it. By the time you fix it, you've forgotten what the working state looked like. Fix soon; document while it's fresh.
Don't restore the entire ~/.claude/ when only one file is corrupt. Surgical restores are faster and don't risk reverting unrelated good changes.
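One of the pitfalls above, the "nightly" backup that turns out to be six weeks stale, is caught by tracking each job's last-success time rather than its existence. A minimal sketch, assuming the backup job can write a stamp file as its final step; the helper names and the 26-hour default are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: treat a backup as healthy only if its success stamp is recent.
# The stamp file path and both helper names are hypothetical.
mark_success() {
  # Call this as the LAST step of the backup job, only after it succeeded.
  date +%s > "$1"
}

backup_is_fresh() {
  # Default window: 26 hours, i.e. a nightly job plus some slack.
  local stamp="$1" max_age="${2:-93600}" now="${3:-$(date +%s)}" last
  [ -r "$stamp" ] || return 1            # never succeeded, or stamp missing
  last=$(cat "$stamp")
  [ $(( now - last )) -le "$max_age" ]
}
```

A daily check such as `backup_is_fresh /path/to/stamp || <send an alert>` then fires when the job has quietly stopped succeeding, even though the scheduled task still exists.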