Representation Engineering (2024)

(vgel.me)

30 points | by kqr 3 days ago

2 comments

  • mock-possum 40 minutes ago
    That last experiment, where the LLM with its honesty vector increased is tasked with judging whether a user asking an example question has honest intentions, is interesting. It looks like it doesn’t quite grasp the ask, and is instead just equivocating about the definition of ‘honest.’

    I wonder how a model with the ‘thoroughness’ vector turned up might have answered in that case - would it have pointed out that it’s impossible to know intention from words, because people can lie, but that it’s possible to at least guess - and that even then, judging the honesty of intention could be interpreted several different ways?

  • k__ 3 hours ago
    Somehow hidden state reminds me of DNA.