3.1 Text · $6,600/mo
Current path: OpenAI gpt-4o for classify, extract, and translate.
Recommendation: Route through Groq (Llama 3.3 70B) as primary. Cerebras (Qwen 2.5 72B) as failover. Anthropic Sonnet 4.6 as final fallback.
Quality vetting: Llama 3.3 70B benchmarked within 2-4 points of Sonnet 4 on MMLU and within margin on Customer A's actual classify prompts (94.8% vs 95.6% accuracy across 200 sampled requests).
Cost delta: $6,600 → $1,320/mo · 80% reduction
3.2 Vision · $3,000/mo
Current path: OpenAI gpt-4o vision for image classification, OCR, and hero ranking.
Recommendation: Vertex Gemini 2.5 Flash. Among the cheapest production-grade vision models. Anthropic Sonnet vision as failover.
Quality vetting: Gemini 2.5 Flash classified Customer A's image batches at 95.4% accuracy vs gpt-4o's 96.1% across 500 sampled images. Within margin for the use case.
Cost delta: $3,000 → $900/mo · 70% reduction
3.3 Grounded search · $960/mo
Current path: OpenAI gpt-4o + a hand-rolled web-fetch wrapper.
Recommendation: Move to Vertex Gemini grounded search. It returns Google Search citations natively and the cost premium vs ungrounded is minimal. Don't roll your own with Bing API + RAG.
Cost delta: roughly flat. Major win is reliability, not cost.
3.4 Brand-visible writing · $600/mo
Current path: OpenAI gpt-4o for customer email + marketing copy.
Recommendation: Switch to Anthropic Claude Sonnet 4.6. Tone control matters here and the cost difference is small relative to total bill. Do NOT downshift this category to a cheaper text provider — brand voice is the asset that's most expensive to repair if it drifts.
Cost delta: roughly flat. Quality upgrade, no spend change.
3.5 Embeddings · $840/mo
Current path: OpenAI text-embedding-3-small.
Recommendation: Keep on text-embedding-3-small. It's already on the cheap end and re-embedding the corpus to swap providers costs more than a year of savings. No change.
Cost delta: $0.