Ask HN: How do you know if a tweak to your AI skill made it better?

Curious how people here evaluate changes when they tweak an AI skill / prompt / workflow.

A lot of the time, a tweak might feel better in one or two cases, but it’s hard to tell if it actually improved the skill overall or just changed its behavior in a way that looks better for a bit.

Do you mostly go by intuition, or do you have some lightweight way to check if a tweak really helped?

6 points | by yo103jg 1 day ago

2 comments

  • sdevonoes 8 hours ago
    In general, you don’t know. Sure, if you have a specific code base with a bunch of existing (non-AI-generated) tests, and the code you’re regenerating always touches the logic behind those tests, then you can assess your skill/prompt changes to some extent. But in general you just don’t know. You have a bunch of skills md files that who knows how they work if you change a little bit here and a little bit there. People who claim they know are selling snake oil.
  • Areena_28 8 hours ago
    The way we handle it is keeping a small set of fixed test cases that we never change: same inputs, same expected outputs. So when we tweak a prompt, we run it against those first. If it passes the fixed cases and feels better on the new ones, we keep it.
    • gtirloni 5 hours ago
      How do you get deterministic output, though? t=0? Pydantic AI outputs?
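
Areena_28's fixed-case approach could be sketched roughly like this. `run_skill` is a hypothetical stand-in for whatever actually invokes the model with a given prompt version; here it returns canned answers so the harness is runnable, but a real version would make the LLM call (ideally with temperature 0, per gtirloni's question):

```python
# Frozen regression set: inputs paired with expected outputs.
# The point is that these never change, so every prompt tweak
# is measured against the same baseline.
FIXED_CASES = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
]

def run_skill(prompt_version: str, user_input: str) -> str:
    # Hypothetical stand-in for a real model call. A real harness
    # would send `user_input` to the LLM using the prompt identified
    # by `prompt_version` and return the model's answer.
    canned = {"2 + 2": "4", "capital of France": "Paris"}
    return canned.get(user_input, "")

def passes_fixed_cases(prompt_version: str) -> bool:
    # Keep the tweak only if every frozen case still produces
    # its expected output under the new prompt version.
    return all(
        run_skill(prompt_version, inp) == expected
        for inp, expected in FIXED_CASES
    )

print(passes_fixed_cases("v2"))
```

With non-deterministic outputs, the exact-match comparison would be swapped for a looser check (substring match, a scoring rubric, or an LLM-as-judge), but the shape of the gate stays the same: frozen inputs, a pass/fail per case, and no tweak merged until the fixed set is green.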