Kimononono

joined 10 months ago
[–] Kimononono@alien.top 1 points 9 months ago

I haven’t used it in a while, but I remember it being able to extract my headers, which were in merged cells. It’s fairly high level, so it’s worth a try.

[–] Kimononono@alien.top 1 points 9 months ago (2 children)

I use pdfplumber.

[–] Kimononono@alien.top 1 points 10 months ago
  1. You’d probably want to embed the notes and then use cosine similarity to find the notes most similar to your input query. For a query like “please write a short sentence describing the following facts…”, use the facts themselves in the cosine similarity search.

  2. Then pass the similar notes into an LLM with an instruction like “please write a short sentence describing the following facts using the notes”.

I don’t know how well embeddings work for Italian, so you may want to translate the notes to English and keep them in pairs (Italian version, English version). Then use the English version for the cosine similarity search (step 1) and the Italian version for the summarization (step 2).
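
Roughly what I mean, as a minimal sketch and not a real implementation. The `embed` function here is a toy bag-of-words stand-in I made up so the example runs on its own; in practice you would swap in an actual embedding model. `top_k_notes` and `build_prompt` are hypothetical helper names too.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for a real embedding model (assumption): word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_notes(facts, notes, k=2):
    """Step 1: rank (italian, english) note pairs by similarity to the facts.

    The search runs on the English side only; pairs with zero similarity
    are dropped.
    """
    query = embed(facts)
    scored = [(cosine_similarity(query, embed(en)), it, en) for it, en in notes]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(it, en) for score, it, en in scored[:k] if score > 0]


def build_prompt(facts, italian_notes):
    """Step 2: build the LLM instruction from the Italian side of the pairs."""
    notes_block = "\n".join("- " + note for note in italian_notes)
    return (
        "Please write a short sentence describing the following facts "
        "using the notes.\n"
        "Facts: " + facts + "\n"
        "Notes:\n" + notes_block
    )
```

You would call `top_k_notes(facts, notes)` first, then pass the Italian halves of the returned pairs to `build_prompt` and send the result to whatever LLM you are using.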

[–] Kimononono@alien.top 1 points 10 months ago

Would a service like runpod work for you? It sells you GPU power by the hour instead of by token.

[–] Kimononono@alien.top 1 points 10 months ago

Depending on the size of the model you’re fine-tuning, you’re going to want to limit the amount of context that doesn’t pertain to a code vulnerability. The major issue I see is that code vulnerabilities will probably involve multiple functions spread across different files.

So you could pass in just snippets of the different functions relating to the vulnerability report, but that isn’t very helpful for identifying vulnerabilities given a code file. For this format to work, you would have to pass in a specific function and all functions it references (and so on), and the model would then write a vulnerability report on that. You’d probably also want to include some reports that don’t contain vulnerabilities, or else be prepared for the tuned model to think every function you pass in contains a vulnerability.

I strongly believe just referencing the line number will not build a strong enough attention link between the actual code and the vulnerability report.

My 2 cents

[–] Kimononono@alien.top 1 points 10 months ago

tortoiseTTS using the voice-ai-cloning repository. Had a dataset of 20 minutes, 5 minutes of footage along with an hour of tweaking the hyperparameters, and I have a voice which sounds pretty damn human. I tried training for a long time, but it just sounds worse after the first few epochs.