Full prompt

langchain and fireworks just shipped the eval move worth ste

Real prompt shared by @rohit4verse on X

Prompt
langchain and fireworks just shipped the eval move worth stealing: a fine-tuned qwen judge that flags "perceived error" on every production trace and runs up to 100x cheaper than opus.

the cost number gets the attention. the transfer result matters more.

they trained the judge on one app, their docs q&a agent. then they pointed it at fleet, a separate product, with no retraining. it beat every frontier model on that domain. 90.8% against opus at 90.2%.

most evaluators break the second you move them to a new app, because the rubric is app-specific. "perceived error" travels because the signal is behavioral: the user corrects you, or repeats the request. that pattern holds across every product.

one design choice stands out. they fed the judge human and ai messages only and dropped every tool call. their bet is that the correction signal lives in the conversation itself.

anyone can rent the model in your loop. a judge trained on your own traces, cheap enough to run on all of them, is the moat they cannot buy.
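Editor's sketch (not part of the original post): the design choice described above, feeding the judge only human and AI messages and dropping tool calls, might look roughly like the following. The trace layout (a list of dicts with `role`, `content`, and optional `tool_calls` keys) and the function names `conversation_only` and `render_for_judge` are assumptions made for illustration; the post does not describe the actual trace format or preprocessing code.

```python
from typing import Any

# Assumption: each trace is a list of message dicts with a "role" key.
# Only these two roles are passed on to the judge.
KEPT_ROLES = {"human", "ai"}


def conversation_only(trace: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Keep human and AI message text; drop tool calls and tool results."""
    kept = []
    for msg in trace:
        if msg.get("role") not in KEPT_ROLES:
            continue  # e.g. a tool result message
        text = msg.get("content") or ""
        if not text.strip():
            continue  # AI turn that carried only tool calls, no visible text
        kept.append({"role": msg["role"], "content": text})
    return kept


def render_for_judge(trace: list[dict[str, Any]]) -> str:
    """Flatten the filtered conversation into plain text for a judge model."""
    return "\n".join(f"{m['role']}: {m['content']}" for m in conversation_only(trace))


# Hypothetical trace in which the user corrects the assistant.
trace = [
    {"role": "human", "content": "How do I set a retry limit?"},
    {"role": "ai", "content": "", "tool_calls": [{"name": "search_docs"}]},
    {"role": "tool", "content": "raw search results"},
    {"role": "ai", "content": "Set timeout=30."},
    {"role": "human", "content": "No, I asked about the retry limit."},
]

print(render_for_judge(trace))
```

In this sketch the final human turn, a correction, survives the filter, which is the kind of conversational signal the post says the judge relies on.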
Originally by
Rohit


@rohit4verse · on X

Jun 15, 2026 · EN
113 likes · 17 reposts · 15 replies · 30.2K views · 169 bookmarks (as of Jun 19, 2026)

Referenced with attribution; all rights remain with the original creator.

Model: GPT Image 2
Aspect ratio: 1:1
Category: Infographics & Charts
Tags: Editorial · Education · Minimal · Flat design · Text · Abstract

Make your own version

  1. Copy the prompt
  2. Swap in your own subject and details
  3. Generate
