this post was submitted on 04 Dec 2023

LocalLLaMA

hi folks,

Simple question really: which model (fine-tuned or otherwise) have you found that can extract data from a bunch of text?

I'm happy to finetune, so if there are any successes there, would really appreciate some pointers in the right direction.

Really looking for a starting point here. I'm aware of the DETR class of models and how Microsoft trained table-transformers on DETR. Wondering if something similar can be done with Llama 2 and similar models?

P.S. I cannot use GPT because the data contains sensitive PII.

[–] georgejrjrjr@alien.top 1 points 11 months ago (1 children)

I’ve wondered this, and hope you get better answers.

One thing you could do if it fits your use case: align GDELT entries with news stories in the realnews dataset on Hugging Face, then train a model to output the extracted info from the article.
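
A rough sketch of what that alignment step could look like, assuming both sides carry a URL you can join on. The field names here (`source_url`, `url`, `text`) are placeholders I made up for illustration, not the actual GDELT or realnews schema, so check the real column names before using this:

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Reduce a URL to host + path so http/https and 'www.' variants match."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def align(events, articles):
    """Pair each structured event with the article it was extracted from.

    events:   dicts with a hypothetical 'source_url' key plus structured fields
    articles: dicts with hypothetical 'url' and 'text' keys
    """
    by_url = {normalize_url(a["url"]): a for a in articles}
    pairs = []
    for event in events:
        article = by_url.get(normalize_url(event["source_url"]))
        if article is None:
            continue  # no matching article, skip this event
        # The structured fields become the training target.
        target = {k: v for k, v in event.items() if k != "source_url"}
        pairs.append({"input": article["text"], "target": target})
    return pairs
```

Each resulting pair is an (article text, structured record) example you could fine-tune on.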

Another is to have GPT-4 do some examples on lightly faked / anonymized data, and then distill that into a model that does well on information extraction evals (which are a thing iirc).
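
A minimal sketch of the anonymize-then-label idea, assuming simple regex masking is enough for a first pass (for real PII it almost certainly is not, so treat the patterns as illustrative). The `teacher` argument is a hypothetical callable standing in for whatever stronger model produces the labels; nothing here calls a real API:

```python
import json
import re

# Illustrative patterns only; real PII scrubbing needs far more coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def anonymize(text):
    """Replace emails and phone-like numbers with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    return PHONE.sub("<PHONE>", text)


def build_record(text, teacher):
    """Mask the text, ask the teacher for a label, return one JSONL line.

    teacher: hypothetical callable taking the masked text and returning
             a JSON-serializable extraction result.
    """
    masked = anonymize(text)
    return json.dumps({"prompt": masked, "completion": teacher(masked)})
```

Only the masked text ever reaches the teacher, and the resulting JSONL lines are what you would fine-tune the local model on.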

[–] sandys1@alien.top 1 points 11 months ago

What are the information extraction evals? Do you have a link?