Fixed-fee LLM cost & quality audits for B2B SaaS shipping AI features — with an eval harness you keep.
Fully remote · EU-based · async-first · US & EU clients
Not sure where AI fits in your stack yet? See the AI Opportunity Audit →
Built in the open — the eval toolkit I run in audits is public (MIT). Inspect the methodology before you hire.
View on GitHub →
B2B SaaS companies at Series A–C with a shipped AI feature (customer-facing Q&A, internal copilot, or AI analytics) and an LLM bill north of $20k/month. Decision-maker: CTO, VP Engineering, or Head of AI.
Or — you’re shipping AI features but not sure where the ROI is: ERP exports living in spreadsheets, manual workflows, data scattered across systems. The AI Opportunity Audit maps where AI and automation actually pay off before you build.
CFO asks what the AI line item is buying
Hallucination incident in production
Pre-scaling: 10× traffic, cost not scaling linearly
Pre-investor demo or due diligence
Productized, fixed-fee engagements. You always know the scope and the outcome.
You already know roughly what you want done. I scope it: feasibility, effort, go/no-go — before you commit to a full engagement.
You don’t yet know where AI or automation fits in your stack — I map the whole picture and show you where the ROI is. Mapped by a data engineer, not a generic AI consultant.
Productized one-shot fix when one specific thing is broken.
Managed eval-as-a-service for LLM features in production.
Observability dashboards show what happened. This runs the eval, alerts on drift, and writes the report.
Available for teams ready to build from scratch.
Scope and fee discussed in discovery.
Every engagement is fixed-fee, scoped up front — you’ll have exact scope and pricing in a written proposal, usually within a couple of business days.
We talk through your stack, your LLM features, and where the pain is. No pitch — qualify fit first.
3–5 day written findings. Counts as full credit toward any follow-on engagement signed within 30 days.
Fixed-fee engagement with clear acceptance criteria locked in the contract before we start. Audits: baseline measured, eval harness wired, roadmap in your hands. Builds: metric targets on the system we build together (e.g. faithfulness ≥ 0.85, cost cut ≥ 25%).
Post-delivery, if you want ongoing drift detection and monthly reporting — the retainer keeps the eval running.
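To make "acceptance criteria locked in the contract" concrete, here is a minimal sketch of how targets like the examples above (faithfulness ≥ 0.85, cost cut ≥ 25%) could be checked against measured results. The names `AcceptanceCriteria` and `check_acceptance` are hypothetical and for illustration only; they are not taken from the public toolkit, and real thresholds are set per engagement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AcceptanceCriteria:
    # Example targets only; actual values are agreed in the contract.
    min_faithfulness: float = 0.85
    min_cost_cut: float = 0.25  # fraction of baseline spend saved


def check_acceptance(
    faithfulness: float,
    baseline_cost: float,
    new_cost: float,
    criteria: AcceptanceCriteria = AcceptanceCriteria(),
) -> dict[str, bool]:
    """Return a pass/fail flag per criterion (hypothetical helper)."""
    if baseline_cost <= 0:
        raise ValueError("baseline_cost must be positive")
    # Relative saving against the baseline measured at the start.
    cost_cut = (baseline_cost - new_cost) / baseline_cost
    return {
        "faithfulness": faithfulness >= criteria.min_faithfulness,
        "cost_cut": cost_cut >= criteria.min_cost_cut,
    }


if __name__ == "__main__":
    # Made-up numbers: monthly spend drops from 20,000 to 14,000.
    print(check_acceptance(0.88, 20_000, 14_000))
```

Reporting each criterion separately, rather than a single pass/fail, shows which target was missed when an engagement falls short on only one of them.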
Eval-driven AI engineer for data-heavy SaaS.
Pure data engineers can’t ship production RAG. Pure AI engineers can’t ship on a real data warehouse. The combination of data engineering at scale and eval-driven RAG is rare, and it’s exactly what data-heavy SaaS companies at Series A–C need in 2026.
Senior data engineer. 8 years building Lakehouse-scale data platforms and production GenAI evaluation pipelines — petabyte-scale data infrastructure, then production RAG with measurable quality SLAs.
Fully remote, based in the EU. Async-first, US & EU clients, 1–2 video calls per week.
Solo practice — you work directly with the senior engineer on every engagement. No junior handoff, no account manager in the middle.
Engineering scope only — I build the artifacts; your legal and compliance team owns regulatory interpretation.
Public LCQA eval toolkit on GitHub →
Reproducible eval methodology + harness — inspect the code before you hire. MIT licensed.
You’ll get a written proposal within a few days — or an honest no.
Thanks — your message is in.
You’ll hear back within a couple of business days.
No sales sequences. No follow-up cadence. One human reads this.