this post was submitted on 24 Nov 2023
LocalLLaMA


I have a large corpus of notes that humans wrote to summarize articles. As they are, they give you the gist but are not very readable. I would like to use a generative model, ask it “please write a short sentence that will be nice to read describing the following facts”, and feed it the notes to obtain a brief, readable summary.

Language is Italian.

Suggestions on models or workflows?

Thanks

[–] vatsadev@alien.top 1 points 11 months ago

RWKV v5 7B. It's only half trained right now, but the model surpasses Mistral on all multilingual benchmarks, because it is meant to be multilingual.

[–] Kimononono@alien.top 1 points 11 months ago
  1. You’d probably want to embed the notes and then use cosine similarity to find the notes most similar to your input query (for “please write a short sentence describing the following facts…”, use the facts themselves as the text for the cosine similarity search).

  2. Then pass the similar notes to an LLM with an instruction like “please write a short sentence describing the following facts using the notes”.

I don’t know how well embeddings work for Italian, so you may want to translate the notes to English and keep them in pairs (Italian version, English version), then use the English version for the cosine similarity search (step 1) and the Italian version for the summarization (step 2).
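
A minimal sketch of step 1, to make the idea concrete. The `embed` function here is only a toy bag-of-words stand-in (an assumption for illustration, not a real embedding model); in practice you would swap in vectors from a multilingual embedding model. The function names and sample notes are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a real embedding model: a bag-of-words count vector.
    # Replace this with calls to an actual (ideally multilingual) embedder.
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    # Dot product over shared terms, divided by the product of the norms.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_similar(query, notes, k=2):
    # Score every note against the query and keep the k best matches.
    q = embed(query)
    scored = [(cosine_similarity(q, embed(n)), n) for n in notes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [note for _, note in scored[:k]]


# Hypothetical sample notes, in Italian as in the original question.
notes = [
    "governo approva nuova legge di bilancio",
    "la squadra vince il campionato di calcio",
    "parlamento discute la legge di bilancio",
]
similar = top_k_similar("legge di bilancio", notes, k=2)
```

The notes returned in `similar` are what you would pass to the LLM in step 2. If you go with the translated pairs, you would compute the similarity on the English text and return the Italian text stored alongside it.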

[–] AI_Trenches@alien.top 1 points 11 months ago (2 children)

You can try taking a picture of the notes and having a multimodal model read them and extract the text. You can either use ChatGPT with GPT-4 (paid but probably more accurate) or run a LLaVA model locally with llama.cpp's multimodal support (free but might hallucinate).

Scanning your notes into PDF format and trying a RAG approach might also yield some results. You can upload the PDF to GPT/Claude or run a local RAG project like h2oGPT or privateGPT and see how well they can transcribe your notes.

[–] shreydanfr@alien.top 1 points 11 months ago

I doubt GPT-4V will be perfect at reading detailed handwritten notes, especially in Italian. For proper results, Google Lens is good at OCR of structured handwriting, but it needs manual work.

[–] olddoglearnsnewtrick@alien.top 1 points 11 months ago

I apologize for my misleading English: the notes are not handwritten. They are regular text, stored in a MongoDB array of strings.
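
Since the notes are already plain strings, the prompt can be built directly from the array. Below is a rough sketch under some assumptions: the document shape and the `notes` field name are hypothetical, the "Answer in Italian" line is my addition to the instruction quoted in the post, and the actual call to a model is left out because it depends on whichever model you pick.

```python
# Instruction taken from the original post; the "Answer in Italian" part
# is an added assumption so the model replies in the notes' language.
INSTRUCTION = (
    "Please write a short sentence that will be nice to read "
    "describing the following facts. Answer in Italian."
)


def build_prompt(notes):
    # Turn the array of note strings into a bulleted fact list,
    # skipping entries that are empty or whitespace only.
    facts = "\n".join(f"- {n.strip()}" for n in notes if n.strip())
    return f"{INSTRUCTION}\n\nFacts:\n{facts}\n\nSummary:"


# Hypothetical document, shaped like one record holding an array of strings.
doc = {
    "notes": [
        "Governo approva legge di bilancio",
        "  ",
        "Voto finale previsto a dicembre",
    ]
}
prompt = build_prompt(doc["notes"])
```

The resulting `prompt` string is what you would send to the model, one article's notes at a time, so each summary stays tied to a single article.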