madmax_br5

joined 1 year ago
madmax_br5@alien.top 1 points 11 months ago

I don’t know if there’s much value there when LoRAs are easily portable: you can just select the right LoRA as needed. One base model instance on one machine, many potential experts. This has been demonstrated.
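
For anyone unfamiliar with the pattern, here’s a minimal sketch of adapter swapping with Hugging Face peft. The adapter repos and names ("my-org/sql-lora", "sql", etc.) are hypothetical stand-ins, not real artifacts:

```python
# One shared base model, multiple LoRA "experts" swapped at request time.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-v0.1")

# Attach a first adapter, then register more; base weights are loaded once.
model = PeftModel.from_pretrained(base, "my-org/sql-lora", adapter_name="sql")
model.load_adapter("my-org/summarize-lora", adapter_name="summarize")

# Route each request to the right expert by switching the active adapter.
model.set_adapter("summarize")
inputs = tokenizer("Summarize: ...", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```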

 

https://huggingface.co/TheBloke/MistralLite-7B-GGUF

This is supposed to be a 32k-context finetune of Mistral. I’ve tried the recommended Q5 version in both GPT4All and LM Studio, and it works for normal short prompts but hangs and produces no output when I crank the context length up to 8k+ for data cleaning. I tried it CPU-only (the machine has 32GB of RAM, so that should be plenty) and hybrid, with the same bad outcome. Curious if there are some undocumented RoPE settings that need to be adjusted.

Anyone get this to work with long prompts? Otherwise, what do y’all recommend for 32k+ context with good performance on data augmentation/cleaning, with <20B params for speed?
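
In case it helps anyone debugging the same thing: here’s a hedged sketch of loading the GGUF directly with llama-cpp-python, setting the context window and RoPE base explicitly, since GUI front-ends often default to a small n_ctx like 2048 and can stall on longer prompts. The rope_freq_base value is my reading of MistralLite’s model card (rope_theta = 1,000,000); verify it there. The filename is a placeholder:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="mistrallite.Q5_K_M.gguf",  # placeholder local filename
    n_ctx=32768,             # request the full 32k window explicitly
    rope_freq_base=1000000,  # RoPE theta per the model card (assumption, verify)
    n_gpu_layers=0,          # CPU-only, matching the 32GB-RAM setup above
)

out = llm("Clean the following records: ...", max_tokens=256)
print(out["choices"][0]["text"])
```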

 

I have some old engineering textbooks and wanted to try taking pictures of the pages, extracting the text with a vision model, and using that data to fine-tune an LLM. I may need to fine-tune the vision model first in order to parse the text into markdown format. But my question is which base vision model to use, especially given the dense nature of the text. These models are not well documented in terms of what input resolutions they support. Nougat? Bakllava? Tesseract? Would appreciate advice on a good starting point to avoid burning too much time down the wrong path; a rough sketch of the Nougat route follows the summary below.

In summary:

  • Goal is to extract text from pictures of textbook pages into markdown format.
  • Photos will be normal ~12MP images captured with my phone camera, one page per photo
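
If Nougat turns out to be the right starting point, a minimal sketch with the transformers port (facebook/nougat-base) would look something like this. The page filename is a placeholder, and as far as I understand the processor handles rescaling to the model’s expected input resolution, so the 12MP photos shouldn’t need manual downscaling (worth verifying against the docs):

```python
# Untested sketch: one phone photo of a page -> markdown text via Nougat.
from PIL import Image
from transformers import NougatProcessor, VisionEncoderDecoderModel

processor = NougatProcessor.from_pretrained("facebook/nougat-base")
model = VisionEncoderDecoderModel.from_pretrained("facebook/nougat-base")

image = Image.open("page_001.jpg").convert("RGB")  # placeholder filename
# The processor resizes/normalizes the photo to Nougat's input size.
pixel_values = processor(images=image, return_tensors="pt").pixel_values

outputs = model.generate(pixel_values, max_new_tokens=2048)
raw = processor.batch_decode(outputs, skip_special_tokens=True)[0]
markdown = processor.post_process_generation(raw, fix_markdown=True)
print(markdown)
```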