Hugi_R

joined 10 months ago
[–] Hugi_R@alien.top 1 points 10 months ago

3.5 never suspects the 6th is playing chess.

https://chat.openai.com/share/b7e6b24d-44db-4abf-9a81-5325f836bca5 (the === are artifacts of the custom system prompt; 3.5 sucks at following it)

I asked it for candidate activities, and it mostly offered different ones. It's weird; I would expect an LLM to list activities that were already mentioned in the conversation. Maybe the repetition penalty is set too high?
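For context, a repetition penalty down-weights the logits of tokens that already appeared in the context, which would indeed steer the model away from activities mentioned earlier. A minimal sketch, assuming the common CTRL-style formulation (as used by, e.g., Hugging Face's `repetition_penalty` sampling parameter) — the function name and values here are illustrative, not any specific model's internals:

```python
import numpy as np

def apply_repetition_penalty(logits, context_token_ids, penalty=1.3):
    """Penalize tokens already seen in the context (CTRL-style).

    Positive logits are divided by the penalty, negative logits are
    multiplied by it, so seen tokens always become less likely.
    """
    out = logits.astype(float).copy()
    for tok in set(context_token_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

# Hypothetical 4-token vocabulary; tokens 0 and 2 appeared in the context.
logits = np.array([2.0, 1.0, -0.5, 0.3])
penalized = apply_repetition_penalty(logits, context_token_ids=[0, 2])
```

With a penalty above ~1.2, previously seen tokens can be suppressed hard enough that the model avoids repeating them even when repetition is the right answer — consistent with the behavior described above.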

[–] Hugi_R@alien.top 1 points 10 months ago (3 children)

Open-ended questions are the best for evaluating LLMs, because they require common sense, world knowledge, doxa, and human-like behavior.

Saying "I don't know" is just a cop-out response. It should at least say something like "It could be X, but...", and be a little creative.

Another (less?) open-ended question with the same premise would be "Where are they?", and I'd expect the answer to be "In a garden".

GPT-4 Turbo (with custom instructions) answers it very well: https://chat.openai.com/share/c305568e-f89e-4e71-bb97-79f7710c441a