langchain and fireworks just shipped the eval move worth ste

Real prompt shared by @rohit4verse on X

Prompt
langchain and fireworks just shipped the eval move worth stealing: a fine-tuned qwen judge that flags "perceived error" on every production trace and runs up to 100x cheaper than opus. the cost number gets the attention. the transfer result matters more. they trained the judge on one app, their docs q&a agent. then they pointed it at fleet, a separate product, with no retraining. it beat every frontier model on that domain. 90.8% against opus at 90.2%. most evaluators break the second you move them to a new app, because the rubric is app-specific. "perceived error" travels because the signal is behavioral: the user corrects you, or repeats the request. that pattern holds across every product. one design choice stands out. they fed the judge human and ai messages only and dropped every tool call. their bet is that the correction signal lives in the conversation itself. anyone can rent the model in your loop. a judge trained on your own traces, cheap enough to run on all of them, is the moat they cannot buy.
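The design choice described in the post, feeding the judge only human and AI messages and dropping tool calls, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the `Message` class, the role labels `"human"`, `"ai"`, and `"tool"`, and the similarity threshold are hypothetical, and the string-similarity check is only a toy stand-in for the "user repeats the request" signal. The judge in the post is a fine-tuned model, not a heuristic like this.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Message:
    role: str      # hypothetical labels: "human", "ai", or "tool"
    content: str


def conversation_only(trace):
    """Keep human and AI messages; drop tool calls and any other role."""
    return [m for m in trace if m.role in ("human", "ai")]


def repeated_request(messages, threshold=0.8):
    """Toy heuristic (an assumption, not the post's method): return True
    when a human message is nearly identical to an earlier human message."""
    humans = [m.content.lower().strip() for m in messages if m.role == "human"]
    for i, later in enumerate(humans):
        for earlier in humans[:i]:
            if SequenceMatcher(None, earlier, later).ratio() >= threshold:
                return True
    return False


trace = [
    Message("human", "How do I stream tokens from the agent?"),
    Message("tool", '{"search": "streaming"}'),
    Message("ai", "You can configure retries in the client."),
    Message("human", "How do I stream tokens from the agent??"),
]

judge_input = conversation_only(trace)
print(len(judge_input), repeated_request(judge_input))
```

In this sketch the tool message never reaches the judge input, and the near-duplicate second question is what flags the trace. A trained judge would presumably also catch explicit corrections, which this heuristic does not attempt.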
Originally by Rohit (@rohit4verse) on X

Jun 15, 2026 · EN
113 likes · 17 reposts · 15 replies · 30.2K views · 169 bookmarks (as of Jun 19, 2026)

Referenced with attribution; all rights remain with the original creator.

Model: GPT Image 2
Aspect ratio: 1:1
Category: Infographics & charts
Tags: magazine style, education, minimalist, flat design, text, abstract

Make your own version

  1. Copy the prompt
  2. Swap in your own topic and details
  3. Generate