About

Who you're hiring.

Operator-built, not agency. I run the same routing stack on my own consumer app: Bailar, a Latin dance discovery app on iOS, Android, and bailar.site. The audit is the same playbook that cut my LLM bill by ~60% versus the OpenAI-only baseline I started with.

Paul Plawin

I'm a solo founder. Bailar classifies ~40,000 events per month with text LLMs and runs ~10,000 vision-model calls per month for photo validation, hero ranking, and OCR. The whole stack is built and maintained by one person, which means every cent on the LLM bill is mine to feel.

The current routing — Groq → Cerebras → Anthropic for plain text, Vertex Gemini Flash for vision, Anthropic Sonnet for brand-visible writing, with retry/fallback on every chain — is the result of a year of trial and error. The audit is that learning, productized for other AI startups.
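For readers who want to picture the retry/fallback idea, here is a minimal sketch of an ordered provider chain in Python. It is not the code Bailar runs: the function name `route_with_fallback`, the retry count, the backoff parameter, and the three stub functions are all assumptions made for illustration, and the stubs stand in for real SDK calls.

```python
import time
from typing import Callable, Sequence

# A provider is a name plus a callable that takes a prompt and returns text.
Provider = tuple[str, Callable[[str], str]]


def route_with_fallback(
    prompt: str,
    chain: Sequence[Provider],
    retries_per_provider: int = 2,   # assumed value, tune per provider
    backoff_seconds: float = 0.0,    # assumed value, 0 keeps the demo instant
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, completion)."""
    errors: list[str] = []
    for name, call in chain:
        for attempt in range(retries_per_provider):
            try:
                return name, call(prompt)
            except Exception as exc:  # a real router would catch narrower errors
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                # Exponential backoff before retrying or moving down the chain.
                time.sleep(backoff_seconds * (2 ** attempt))
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Hypothetical stand-ins for real SDK calls (no network access here).
def groq_stub(prompt: str) -> str:
    raise TimeoutError("rate limited")


def cerebras_stub(prompt: str) -> str:
    return f"[cerebras] {prompt}"


def anthropic_stub(prompt: str) -> str:
    return f"[anthropic] {prompt}"


TEXT_CHAIN: list[Provider] = [
    ("groq", groq_stub),
    ("cerebras", cerebras_stub),
    ("anthropic", anthropic_stub),
]

provider, text = route_with_fallback(
    "classify: salsa social, Friday 9pm", TEXT_CHAIN
)
print(provider, text)
```

In this run the first stub always fails, so the request falls through to the second provider after two attempts. Where the thresholds should sit (how many retries, how long to wait, which errors count) is the part that takes trial and error.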

Why this audit exists

I burned a long weekend last spring on a Google Cloud free-tier suspension that was entirely my fault: multi-account quota stacking on Vertex during a scrape run. Re-architecting around it forced me to actually understand how multi-provider routing works: what quality you keep, what you concede, where the failover thresholds should land.

The pattern I came out with cuts LLM spend by 40–60% for most seed-to-Series-A AI startups. The blockers aren't technical; they're "we haven't had time to look at it." That's exactly the gap this audit fills: one engineer-week of someone else's time, fixed price, refund-backed.

What I won't do

Reach me

Email: paul@aimargin.dev

LinkedIn: linkedin.com/in/paulplawin

Bailar: bailar.site (the consumer app whose stack the audit is based on)

Ready when you are

A 15-minute call to confirm fit before either side commits. No deck, no questionnaire — just a conversation about what your stack looks like today.