LocalLLaMA


Xwin-Math: A Series of Powerful SFT Math LLMs and Evaluation Toolkit (alien.top)

submitted 2 years ago by Left_Beat210@alien.top to c/localllama@poweruser.forum


Hi, everyone. Xwin-Math is intended to promote the mathematical reasoning capabilities of LLMs. We are now releasing the first version, a series of Llama 2 SFT models that use a CoT prompt.

GitHub link: Xwin-LM/Xwin-Math at main · Xwin-LM/Xwin-LM (github.com)

Model link: Xwin-LM (Xwin-LM) (huggingface.co)

Gradio Demo: Gradio (70B model)

Math capability on GSM8K and MATH benchmark

The Xwin-Math-70B-V1.0 model achieves 31.8 pass@1 on the MATH benchmark and 87.0 pass@1 on the GSM8K benchmark. This performance places it first among all open-source CoT models.

The Xwin-Math-7B-V1.0 and Xwin-Math-13B-V1.0 models achieve 66.6 and 76.2 pass@1 on the GSM8K benchmark, ranking first among all LLaMA-2 based 7B and 13B open-source models, respectively.

We also evaluate Xwin-Math on other benchmarks such as SVAMP and MAWPS. Xwin-Math-70B-V1.0 approaches or surpasses the performance of GPT-3.5-Turbo (8-shot) on most benchmarks.

In addition, the release includes an evaluation toolkit that better converts LaTeX formulas into SymPy objects, enabling more accurate assessment of mathematical ability. We found that, due to evaluation constraints, the results of GPT-4 were previously underestimated.
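To illustrate why the answer-matching step matters, here is a minimal sketch. It is not the toolkit's code: it swaps SymPy for the standard library's `Fraction`, handles only plain numbers and a single `\frac{a}{b}`, and the helper names `to_number` and `same_answer` are made up for this example. The point is only that comparing raw strings marks equivalent answers as wrong.

```python
import re
from fractions import Fraction

# Matches a single \frac{a}{b} or \dfrac{a}{b} with integer parts only
# (an assumption; real LaTeX answers can be far more complex).
_FRAC = re.compile(r"\\d?frac\{(-?\d+)\}\{(-?\d+)\}")


def to_number(answer: str) -> Fraction:
    """Convert a simple numeric answer string into an exact Fraction."""
    text = answer.strip().strip("$").replace(" ", "")
    match = _FRAC.fullmatch(text)
    if match:
        return Fraction(int(match.group(1)), int(match.group(2)))
    return Fraction(text)  # accepts forms like "16", "0.5", "1/2"


def same_answer(predicted: str, reference: str) -> bool:
    """Compare two answers by value, falling back to exact string match."""
    try:
        return to_number(predicted) == to_number(reference)
    except (ValueError, ZeroDivisionError):
        # Not a number this sketch understands: compare the raw text.
        return predicted.strip() == reference.strip()
```

With this, `same_answer("0.5", r"\frac{1}{2}")` is true even though the two strings differ, which is the kind of case a naive string comparison would score as a miss.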

More information can be found in our GitHub repo. We fine-tune (SFT) Llama 2 with standard settings, using GPT-4 to augment the MATH and GSM8K training sets to approximately 100K in total. Our paper is still in progress, so more training details and further results will be added soon.

Any suggestions or comments are greatly welcome! Thanks! =)


[–] uti24@alien.top 1 points 2 years ago (1 children)

Prompt:

I have 100 hit points, in 1 minute of fighting with monsters I lose 7 hit points, but also every minute I regenerate 2 hit points. How long I can fight with monsters?

Answer:

To determine how long you can fight with monsters, we need to consider the net loss of hit points per minute. You lose 7 hit points per minute but also regenerate 2 hit points per minute.

Net loss per minute = Loss per minute - Regeneration per minute

Net loss per minute = 7 - 2 = 5 hit points

Now, we need to find out how many minutes it will take for you to lose all 100 hit points with a net loss of 5 hit points per minute.

Let's use the formula:

Time (in minutes) = Total hit points / Net loss per minute

Time (in minutes) = 100 / 5 = 20 minutes

So, you can fight with monsters for 20 minutes before losing all your hit points.

The answer is: 20.

Actually, I am impressed!

But at the same time:

I have 100 hit points, in 1 minute of fighting with monsters I lose 7 hit points, but also every minute I regenerate 2 hit points. I also kill 1 monster every 4 minutes I kill a monster and my current hit points increase by 1. I also need to have 20% of my hit points remain, so I can return to the base. How long I can fight with monsters?

It can not solve it by a long shot. Tried multiple times. GPT-3.5 solved it flawlessly.

[–] leelweenee@alien.top 1 points 2 years ago (2 children)

It can not solve it by a long shot.

I tried your prompt, and interestingly enough it got the correct answer (16 mins), but the reasoning was very weird, using logs and whatnot.

[–] uti24@alien.top 1 points 2 years ago

Well, it gave me very weird results on my updated prompt.

It said the result is 4 or something, also using logs.

[–] pseudonerv@alien.top 1 points 2 years ago (1 children)

because 80/5=16 and the rest are noise

[–] uti24@alien.top 1 points 2 years ago (1 children)

rest are noise

But why? If you increase the health-restoring parameter, it would affect the result.

Also, why did it use logs then? That doesn't seem right.

[–] pseudonerv@alien.top 1 points 2 years ago

Of course not. I meant the LLM only needs to compute based on that equation alone. The rest, meh, it may hallucinate as much as it likes.
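For anyone who wants to check the arithmetic in this thread, here is a small simulation sketch. It rests on one reading of the prompt, which is an assumption: damage, regeneration, and the kill bonus are all applied at the end of each whole minute, and the 20% reserve is taken from the starting 100 hit points. The function name `minutes_until_retreat` and its parameters are made up for this example.

```python
def minutes_until_retreat(start_hp=100, loss=7, regen=2,
                          kill_every=4, kill_bonus=1, reserve=0.2):
    """Count whole minutes of fighting before HP would drop below the reserve."""
    # Guard against parameter choices where HP never runs out.
    if (loss - regen) * kill_every <= kill_bonus:
        raise ValueError("hit points never decrease; the fight never ends")

    floor_hp = start_hp * reserve  # HP that must remain to return to base
    hp = start_hp
    minutes = 0
    while True:
        # Apply one minute of damage and regeneration.
        next_hp = hp - loss + regen
        # Every `kill_every` minutes a monster dies and grants bonus HP.
        if kill_bonus and (minutes + 1) % kill_every == 0:
            next_hp += kill_bonus
        if next_hp < floor_hp:
            return minutes  # fighting one more minute would break the reserve
        hp = next_hp
        minutes += 1


print(minutes_until_retreat())                           # updated prompt
print(minutes_until_retreat(kill_bonus=0))               # ignoring the kills
print(minutes_until_retreat(kill_bonus=0, reserve=0.0))  # original prompt
print(minutes_until_retreat(regen=4))                    # higher regeneration
print(minutes_until_retreat(regen=4, kill_bonus=0))      # same, ignoring kills
```

Under these assumptions the updated prompt gives 16 whole minutes with or without the kill bonus, which matches 80/5 = 16, and the original prompt gives 20. With regeneration raised to 4 the bonus does change the answer (29 versus 26 minutes), so the extra terms are only noise for these particular numbers.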