You can try MLC-LLM (https://llm.mlc.ai/); it provides tooling for running quantized models for inference in the browser.
kristaller486
This is awesome! What training parameters are you using?
Full-weight fine-tuning should be able to add new knowledge; LoRA generally won't.
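The difference between the two is easy to see in terms of what gets updated. A minimal numpy sketch (dimensions and rank are made up for illustration): full fine-tuning trains every entry of a weight matrix W, while LoRA freezes W and trains only a low-rank update B @ A on top of it.

```python
import numpy as np

# Hypothetical layer sizes and LoRA rank, chosen only for illustration.
d_in, d_out, rank = 512, 512, 8

# Full fine-tuning: every entry of W is trainable.
W = np.random.randn(d_out, d_in) * 0.02   # pretrained weight (frozen under LoRA)
full_trainable = W.size                    # 512 * 512 = 262144 parameters

# LoRA: freeze W, train a low-rank update B @ A instead.
A = np.random.randn(rank, d_in) * 0.02     # trainable, rank x d_in
B = np.zeros((d_out, rank))                # trainable, zero-init so B @ A == 0 at start
lora_trainable = A.size + B.size           # 2 * 8 * 512 = 8192 parameters

x = np.random.randn(d_in)
y_full = W @ x              # base forward pass
y_lora = (W + B @ A) @ x    # LoRA forward pass; identical at init since B is zero

print(full_trainable, lora_trainable)  # 262144 8192
print(np.allclose(y_full, y_lora))     # True (at initialization)
```

With ~3% of the trainable parameters, the low-rank update can steer style and behavior well, but it has much less capacity to store genuinely new facts than updating the full weight matrix.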
They trained their model on synthetic GPT-3.5-turbo data mixed with their own data. It's unsurprising that V7 says "I am GPT-3.5"; what's not normal is Phind using synthetic OpenAI GPT output at all, since that violates OpenAI's terms of service.
Is there code for the distillation?
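Worth noting that training on synthetic GPT-3.5 text (as described above) is sequence-level distillation: you only get the teacher's sampled outputs, not its logits, since the OpenAI API doesn't expose full logits. Classic logit-based distillation needs a teacher you control. A minimal sketch of that loss, with made-up logits over a tiny vocabulary, assuming the standard temperature-softened KL formulation:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer distributions."""
    z = z / T
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Hypothetical teacher and student logits over a 3-token vocabulary.
teacher_logits = np.array([2.0, 1.0, 0.1])
student_logits = np.array([1.5, 0.8, 0.3])

T = 2.0                       # temperature softens both distributions
p = softmax(teacher_logits, T)  # teacher "soft targets"
q = softmax(student_logits, T)  # student predictions

# Distillation loss term: KL(p || q), minimized w.r.t. the student.
kl = np.sum(p * (np.log(p) - np.log(q)))
print(kl >= 0.0)  # True; KL divergence is always non-negative
```

In practice the KL term is averaged over positions in a batch and usually mixed with the ordinary cross-entropy on ground-truth tokens; with only sampled teacher text available, the whole thing collapses to plain supervised fine-tuning on the synthetic corpus.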