LocalLLaMA

Community for discussing Llama, the family of large language models created by Meta AI.

Keyword Labeling/Classification System (alien.top)

submitted 2 years ago by oinkyDoinkyDoink@alien.top to c/localllama@poweruser.forum

Hello all, I'd love your help thinking through this problem. I'll briefly describe the problem, followed by the solutions I'm considering. Any thoughts or feedback are welcome.

Problem: I have a large set of keywords about a product that I want to classify into a fixed set of categories. I can provide a description of the product and examples for every category. Specifically, I want to identify the irrelevant keywords.

Now, I have a lot of products (let's say 500) and about 100,000 keywords per product, so roughly 50 million keywords in total.

Solutions I'm considering:

1. Prompt engineering with GPT-4 (function calling or output parsing) plus few-shot examples. Passing a large prompt could get expensive, so I'd probably need to send several keywords per request.
2. Using embedding cosine similarity to classify the keywords.
3. Fine-tuning a smaller open-source model for a "keyword in, label out" task.
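For option 2, a minimal sketch of nearest-centroid classification with cosine similarity: embed a few example keywords per category, average them into one centroid per category, and assign each keyword to the most similar centroid, falling back to "irrelevant" when nothing is similar enough. The `embed` function is an assumption (in practice it would wrap whichever embedding model you pick, e.g. a sentence-transformers model's `encode`), the toy vectors are made up, and the 0.5 threshold is a hypothetical value that would need tuning on labeled data.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
Embedder = Callable[[str], Vector]  # hypothetical: any function mapping text -> embedding vector


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of a non-empty list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def build_centroids(examples: Dict[str, List[str]], embed: Embedder) -> Dict[str, List[float]]:
    """One centroid per category, averaged over that category's example keywords."""
    return {label: centroid([embed(kw) for kw in kws]) for label, kws in examples.items()}


def classify(keyword: str, centroids: Dict[str, List[float]], embed: Embedder,
             threshold: float = 0.5, fallback: str = "irrelevant") -> str:
    """Return the most similar category, or `fallback` if no category reaches `threshold`."""
    vec = embed(keyword)
    best_label, best_score = fallback, -1.0
    for label, center in centroids.items():
        score = cosine(vec, center)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= threshold else fallback


if __name__ == "__main__":
    # Toy 3-d "embeddings" standing in for a real model's output (made-up values).
    toy = {
        "trail running shoes": [1.0, 0.1, 0.0],
        "running sneakers": [0.9, 0.2, 0.0],
        "waterproof jacket": [0.0, 1.0, 0.1],
        "rain coat": [0.1, 0.9, 0.0],
        "jogging shoes": [0.95, 0.15, 0.05],
        "cheap flights": [0.0, 0.0, 1.0],
    }
    examples = {
        "footwear": ["trail running shoes", "running sneakers"],
        "outerwear": ["waterproof jacket", "rain coat"],
    }
    centroids = build_centroids(examples, toy.__getitem__)
    print(classify("jogging shoes", centroids, toy.__getitem__))  # footwear
    print(classify("cheap flights", centroids, toy.__getitem__))  # irrelevant
```

Each keyword needs one embedding and one comparison per category, so this is likely far cheaper per keyword than prompting a large model; the trade-off is that borderline keywords depend heavily on the threshold and on how representative the category examples are.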

If the 3rd option is suitable, I'd love some direction on:

- which model, and what size, is best to fine-tune
- whether I need to train per product, or whether one model will generalize across products
- what dataset size I would need (i.e., how many keyword/label pairs)
- which resources, libraries, or tools I should look at
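On the generalization and dataset-size questions, one way to find out empirically is to hold out entire products rather than random keywords: if accuracy on unseen products stays close to accuracy on seen ones, a single model across products is probably enough. A minimal sketch, assuming labeled data as hypothetical `(product_id, keyword, label)` tuples; the `input`/`output` field names and the 20% holdout are illustrative choices, not the format required by any particular fine-tuning library.

```python
import json
import random
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical record layout for labeled data: (product_id, keyword, label).
Example = Tuple[str, str, str]


def split_by_product(examples: List[Example], holdout_fraction: float = 0.2,
                     seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Hold out whole products, so validation scores measure cross-product generalization."""
    products = sorted({product for product, _, _ in examples})
    rng = random.Random(seed)
    rng.shuffle(products)
    n_holdout = max(1, round(len(products) * holdout_fraction))
    held_out = set(products[:n_holdout])
    train = [ex for ex in examples if ex[0] not in held_out]
    val = [ex for ex in examples if ex[0] in held_out]
    return train, val


def label_balance(examples: List[Example]) -> Counter:
    """Count labels, to check that rare categories have enough examples."""
    return Counter(label for _, _, label in examples)


def to_record(description: str, keyword: str, label: str) -> Dict[str, str]:
    """A 'keyword in, label out' pair; adding the product description lets one model serve many products."""
    return {"input": f"Product: {description}\nKeyword: {keyword}", "output": label}


def write_jsonl(examples: List[Example], descriptions: Dict[str, str], path: str) -> None:
    """Write one JSON object per line, a common (but not universal) fine-tuning input format."""
    with open(path, "w", encoding="utf-8") as f:
        for product, keyword, label in examples:
            f.write(json.dumps(to_record(descriptions[product], keyword, label)) + "\n")
```

Training on growing subsets of the training split and watching held-out accuracy level off is one practical way to estimate how many keyword/label pairs are actually needed, rather than guessing a number up front.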

TIA!

phree_radical@alien.top 1 point 2 years ago

Just a classifier like this?

oinkyDoinkyDoink@alien.top 1 point 2 years ago

Yes, something like this works, but the prompt is too large to run over thousands of keywords, so I'm looking for something more efficient.
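Batching is the usual way to keep a few-shot prompt affordable at this scale: the instructions and examples are sent once per request and shared by many keywords, instead of once per keyword. A minimal sketch of the batching and parsing around the model call, assuming a numbered `<number>: <category>` reply format and unique keywords within a batch; the model call itself is left out because it depends on the API used, and the prompt wording is illustrative.

```python
from typing import Dict, Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def build_prompt(description: str, categories: Sequence[str], keywords: Sequence[str]) -> str:
    """Shared instructions once, then a numbered list of keywords to label."""
    header = [
        f"Product: {description}",
        "Categories: " + ", ".join(categories),
        "Label every keyword with exactly one category.",
        "Reply with one line per keyword in the form '<number>: <category>'.",
        "",
    ]
    numbered = [f"{i}: {kw}" for i, kw in enumerate(keywords, start=1)]
    return "\n".join(header + numbered)


def parse_labels(response: str, keywords: Sequence[str], categories: Sequence[str],
                 fallback: str = "irrelevant") -> Dict[str, str]:
    """Map each keyword to its label; missing, unparseable, or unknown labels get `fallback`."""
    allowed = set(categories)
    labels: Dict[int, str] = {}  # 0-based keyword index -> label
    for line in response.splitlines():
        number, sep, label = line.partition(":")
        number, label = number.strip(), label.strip()
        if sep and number.isdecimal() and label in allowed:
            labels[int(number) - 1] = label
    return {kw: labels.get(i, fallback) for i, kw in enumerate(keywords)}
```

Even with batching the request count stays large: 500 products × 100,000 keywords at a hypothetical 200 keywords per request is still 250,000 requests. Mapping bad answers to a fallback keeps the pipeline running, but a distinct value (e.g. "unparsed") that is re-queued may be safer than silently marking those keywords irrelevant.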