
langchain and fireworks just shipped the eval move worth stealing

Real prompt shared by @rohit4verse on X

Prompt
langchain and fireworks just shipped the eval move worth stealing: a fine-tuned qwen judge that flags "perceived error" on every production trace and runs up to 100x cheaper than opus. the cost number gets the attention. the transfer result matters more. they trained the judge on one app, their docs q&a agent. then they pointed it at fleet, a separate product, with no retraining. it beat every frontier model on that domain. 90.8% against opus at 90.2%. most evaluators break the second you move them to a new app, because the rubric is app-specific. "perceived error" travels because the signal is behavioral: the user corrects you, or repeats the request. that pattern holds across every product. one design choice stands out. they fed the judge human and ai messages only and dropped every tool call. their bet is that the correction signal lives in the conversation itself. anyone can rent the model in your loop. a judge trained on your own traces, cheap enough to run on all of them, is the moat they cannot buy.
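The post says the judge only ever saw human and AI messages, with every tool call dropped. Below is a minimal Python sketch of that preprocessing step. It assumes a hypothetical `Message` trace format, which is not LangChain's or Fireworks' actual schema.

The `crude_perceived_error` function is a swapped-in keyword-and-similarity heuristic. It exists only to make the "user corrects you, or repeats the request" signal concrete. It is not the fine-tuned Qwen judge, and its marker phrases and 0.8 threshold are arbitrary illustrative choices.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# Hypothetical trace format (an assumption for illustration, not a real
# LangChain or Fireworks schema): each message has a role ("human", "ai",
# "tool", or "system"), text content, and optional tool calls.
@dataclass
class Message:
    role: str
    content: str = ""
    tool_calls: list = field(default_factory=list)


def judge_input(trace: list[Message]) -> list[dict]:
    """Keep only human and AI turns, dropping tool messages and tool calls."""
    kept = []
    for msg in trace:
        if msg.role not in ("human", "ai"):
            continue  # tool results and system prompts never reach the judge
        if msg.role == "ai" and not msg.content.strip():
            continue  # an AI turn that only issued tool calls carries no text
        # tool_calls are deliberately not copied into the judge input
        kept.append({"role": msg.role, "content": msg.content})
    return kept


# Hypothetical correction phrases; a trained judge would learn such cues from data.
CORRECTION_MARKERS = ("no,", "that's wrong", "that is wrong", "not what i asked", "i meant")


def crude_perceived_error(conversation: list[dict], repeat_threshold: float = 0.8) -> bool:
    """Heuristic stand-in for the judge: flag a correction or a repeated request."""
    human_turns = [m["content"].lower().strip() for m in conversation if m["role"] == "human"]
    for prev, curr in zip(human_turns, human_turns[1:]):
        if curr.startswith(CORRECTION_MARKERS):
            return True  # the user explicitly corrects the assistant
        if SequenceMatcher(None, prev, curr).ratio() >= repeat_threshold:
            return True  # the user asks nearly the same thing again
    return False


# Usage: a made-up docs Q&A trace where the agent answers the wrong question.
trace = [
    Message("system", "You are a docs assistant."),
    Message("human", "How do I stream tokens from a chat model?"),
    Message("ai", "", tool_calls=[{"name": "search_docs"}]),
    Message("tool", "search results: streaming guide"),
    Message("ai", "Use the invoke method."),
    Message("human", "No, I asked how to stream tokens from a chat model."),
]
conversation = judge_input(trace)
print(len(conversation))  # 3
print(crude_perceived_error(conversation))  # True
```

Rules like these miss paraphrased corrections and can misfire on legitimate follow-up questions. A judge trained on real traces is meant to close that gap.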
Originally by Rohit (@rohit4verse) on X

Jun 15, 2026 · EN
113 likes · 17 reposts · 15 replies · 30.2K views · 169 bookmarks (as of Jun 19, 2026)

Referenced with attribution; all rights remain with the original creator.

Model
GPT Image 2
Aspect ratio
1:1
Category
Infographics & charts
Editorial · Educational · Minimalist · Flat design · Typography · Abstract

Create your own version

  1. Copy the prompt
  2. Replace the subject and details with your own
  3. Generate the image
