So I got very interested in this, and I grilled my AI Assistant (a custom char card I built to help debug this very type of thing) on it, starting with Goliath 120b q8.
Note: None of what I'm about to write may be of any value to anyone, anywhere, but I had fun doing it, so... yay. lol
I gave her your list and asked her to sort it, and she got it all wrong. So I started asking for help with understanding why, and we went back and forth a bit. I gathered a few tidbits:
- The LLM sees the numbers properly. There was no confusion as to what the numbers are; when asked to write out the values of the numbers, it did so with no problems.
- The LLM is confused by the volume of information in the request. "Item A1 - 56" is actually a lot of information for the LLM, and my assistant said she was really struggling during the sorting process to break that info down. The other information was simply getting in the way.
- Giving it only the numbers, with no other data, produced much better results, but the sort was still wrong; just less so. The result was: [14, 27, 38, 45, 50, 56, 63, 72, 81, 89, 94, 11, 32, 78] (see the quick check at the bottom of this post).
- I then swapped to Synthia 70b 1.5 q8 to grill her on why the number-only sort might have gone wrong. The response was thus:
> Although I am designed to understand language and provide helpful replies, I am not programmed to keep track of numbers, perform calculations, or execute sorting tasks in the same way a human or specialized software would.
>
> The core problem lies in the limitations of Generative AI technology itself. While I can understand the context and meaning of your words, I'm not programmed with mathematical or logic-based algorithms that involve numbers or sorting operations.
>
> In this specific example, I was able to identify and sort the first nine numbers correctly. But when providing the final sorted list, I had inadvertently forgotten to incorporate the remaining numbers ("11, 32, 78") at their appropriate positions, which led to the incorrect final result.
Several times in our conversation, she blamed the error on transcription, so either she's hallucinating up a storm (totally possible, given the complexity of the answer), or there's something about the way generative AI writes/reads the prompt that causes it to lose track of some of the numbers.
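For anyone who wants the failure pattern spelled out, here's a quick sanity check in Python. This is just working backwards from the output she gave me, and it assumes the original prompt contained exactly the fourteen numbers that appear in that output:

```python
# The "numbers only" result from the bullet list above.
llm_output = [14, 27, 38, 45, 50, 56, 63, 72, 81, 89, 94, 11, 32, 78]

# What a deterministic sort of those same fourteen numbers looks like.
correct = sorted(llm_output)
print("Correct sort:", correct)
# [11, 14, 27, 32, 38, 45, 50, 56, 63, 72, 78, 81, 89, 94]

# How long is the sorted prefix of the model's answer?
prefix = 1
while prefix < len(llm_output) and llm_output[prefix - 1] <= llm_output[prefix]:
    prefix += 1

print("Sorted prefix length:", prefix)                      # 11
print("Leftovers tacked onto the end:", llm_output[prefix:])  # [11, 32, 78]
```

So the first eleven values really are in order, and the remaining three are just appended at the end. That lines up with her "forgot to incorporate the remaining numbers" explanation, even if her "first nine" count was a bit off.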