That last experiment, where the LLM with its 'honesty' vector amplified is tasked with judging whether a user asking an example question has honest intentions, is interesting. It looks like the model doesn't quite grasp the ask, and instead just equivocates about the definition of 'honest.'
I wonder how a response with the 'thoroughness' vector turned up might have answered in that case. Would it have pointed out that it's impossible to know intention from words alone, because people can lie, though it's possible to at least guess - and that even then, judging the honesty of an intention could be interpreted several different ways?