LocalLLaMA


A community for discussing Llama, the family of large language models created by Meta AI.


Another test of logical ability for LLMs? (alien.top)

submitted 2 years ago by laca_komputilulo@alien.top to c/localllama@poweruser.forum


Found this in a children's book of riddles:

Six brothers were spending their time together.

The first brother was reading a book.
The second brother was playing chess.
The third brother was solving a crossword.
The fourth brother was watering the lawn.
The fifth brother was drawing a picture.

Question: what was the sixth brother doing?

I can't get ChatGPT to answer correctly with the usual tricks, even after hinting that it should consider one- and two-person activities and emphasizing the word "together".

After a bunch of CoT turns we arrive at the conclusion that this is an open-ended question and not a riddle :)

After trying 3 times with fresh prompts, I got a correct response once, but when prompted to provide supporting reasoning the model backtracked and started apologizing.

Can't test GPT-4 right now...
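For reference, the reasoning the hints are nudging toward (one- vs two-person activities) can be sketched in a few lines of Python. This is only an illustration: the `PLAYERS_NEEDED` table and the helper name are made up for the example, and the claim that chess needs a second brother at the table is an assumption that is open to dispute.

```python
from collections import Counter

# ASSUMPTION (not stated in the riddle): how many people each activity needs.
# Chess is treated as a two-person activity; everything else as solo.
PLAYERS_NEEDED = {
    "reading a book": 1,
    "playing chess": 2,
    "solving a crossword": 1,
    "watering the lawn": 1,
    "drawing a picture": 1,
}

def activities_missing_a_partner(observed, players_needed):
    """Return activities that have fewer participants than they need."""
    counts = Counter(observed)
    return [a for a, n in counts.items() if n < players_needed[a]]

# Activities of brothers one through five, in the order given in the riddle.
observed = [
    "reading a book",
    "playing chess",
    "solving a crossword",
    "watering the lawn",
    "drawing a picture",
]

candidates = activities_missing_a_partner(observed, PLAYERS_NEEDED)
print(candidates)  # the activity the sixth brother could be joining
```

If chess is allowed to be a one-person activity (set its entry to 1), the list comes back empty, which is exactly the "not enough information" reading.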


[–] Be-Kind_Always-Learn@alien.top 1 points 2 years ago (7 children)

This seems, to me, a terrible riddle. Not only can you play chess online, not only can you play chess against a computer, but you can literally play chess alone.

GPT is correct: this is an open-ended question and there's not enough information to actually answer it beyond a clever guess.

[–] Hugi_R@alien.top 1 points 2 years ago (3 children)

Open-ended questions are the best for evaluating LLMs, because they require common sense, world knowledge, doxa, and human-like behavior.

Saying "I don't know" is just a cop-out. At the very least it should say something like "It could be X, but ..." and be a little creative.

Another (less?) open-ended question with the same premise would be "Where are they?" and I expect the answer to be "In a garden".

GPT-4 Turbo (with custom instructions) answers very well: https://chat.openai.com/share/c305568e-f89e-4e71-bb97-79f7710c441a

[–] laca_komputilulo@alien.top 1 points 2 years ago (1 children)

Thank you, bud. Mind trying the same prompt on the cheapo 3.5 model? I suspect it will hit the nail on the head with your custom instructions, given that it was hit and miss for me with my weaker prompting jujitsu.

[–] Hugi_R@alien.top 1 points 2 years ago

3.5 never suspects that the 6th brother is playing chess.

https://chat.openai.com/share/b7e6b24d-44db-4abf-9a81-5325f836bca5 (the === are artifacts of the custom system prompt, 3.5 sucks at following it)

I asked it for candidate activities, and it mostly offered different ones. It's weird; I would expect an LLM to list activities that were already mentioned in the conversation. Maybe the repetition penalty is set too high?
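To illustrate the hunch about the repetition penalty, here is a minimal Python sketch of one common formulation (the CTRL-style penalty, where the logits of already-seen tokens are scaled down). The toy vocabulary, the logit values, and the `penalty=2.0` setting are all made up for the example; nothing in the thread shows that the hosted 3.5 model actually uses this mechanism or what value it would be set to.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty):
    """Scale down the logits of tokens that already appeared in the context.

    Positive logits are divided by `penalty` and negative logits are
    multiplied by it, so a seen token always becomes less likely.
    """
    out = list(logits)
    for tok in set(seen_token_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical four-word vocabulary and raw scores (illustrative only).
vocab = ["chess", "reading", "gardening", "cooking"]
logits = [2.0, 1.0, 0.5, 1.5]

# "chess" and "reading" were already mentioned earlier in the conversation.
seen = [0, 1]

penalized = apply_repetition_penalty(logits, seen, penalty=2.0)

print(vocab[argmax(logits)])     # top choice without the penalty
print(vocab[argmax(penalized)])  # top choice with the penalty applied
```

With a strong enough penalty the previously mentioned activity loses its lead to one that has not appeared yet, which would be consistent with the model mostly offering new activities.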
