this post was submitted on 23 Nov 2023

LocalLLaMA

Hello all, I'd love your help thinking through this problem. I'll briefly describe the problem, followed by possible solutions. I'd love any thoughts or feedback.

Problem: I have a bunch of keywords about a product that I want to classify into a fixed set of categories. I can provide a description of the product and examples for each category. Specifically, I want to identify the irrelevant keywords.

Now, I have a lot of products (let's say 500) and 100,000 keywords per product.
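To put that scale in numbers, here is a rough back-of-the-envelope sketch. The batch size of 50 keywords per call is purely an assumption for illustration, and `count_llm_calls` is a hypothetical helper, not part of any library.

```python
import math

def count_llm_calls(num_products, keywords_per_product, keywords_per_call):
    """Return (total keywords, total calls) if keywords are batched per product."""
    total_keywords = num_products * keywords_per_product
    # Each product needs its own prompt (description + category examples),
    # so batches are formed within a product, not across products.
    calls_per_product = math.ceil(keywords_per_product / keywords_per_call)
    return total_keywords, num_products * calls_per_product

# 500 products, 100,000 keywords each, 50 keywords per call (assumed batch size)
total, calls = count_llm_calls(500, 100_000, 50)
print(total, calls)  # 50000000 1000000
```

So even with batching, this is on the order of a million calls, which is why per-call prompt size matters so much here.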

Solutions I'm considering:

  1. Fancy prompt engineering with GPT-4, using function calling or output parsing plus few-shot examples. I feel it could become expensive to pass a large prompt for every keyword (so I might need to pass several keywords at a time)
  2. Use cosine distance between embeddings to help me classify keywords
  3. Fine-tune a smaller open-source model so that I get "keyword in, label out"
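To make option 2 concrete, here is a minimal sketch of nearest-category classification by cosine similarity, with a threshold below which a keyword is marked irrelevant. Everything in it is an assumption for illustration: `toy_embed` is a stand-in that uses character trigram counts so the snippet runs with only the standard library (a real setup would swap in an actual embedding model), and the `threshold` of 0.2 is arbitrary and would need tuning on labelled examples.

```python
import math
from collections import Counter

def toy_embed(text, n=3):
    """Stand-in embedder (assumption): character n-gram counts.
    Replace with a real embedding model in practice."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def classify(keyword, category_examples, threshold=0.2, embed=toy_embed):
    """Return (label, score); label is 'irrelevant' if no category is close enough.

    category_examples maps each label to a non-empty list of example keywords.
    """
    kw_vec = embed(keyword)
    best_label, best_score = "irrelevant", 0.0
    for label, examples in category_examples.items():
        # Score a category by its most similar example.
        score = max(cosine(kw_vec, embed(ex)) for ex in examples)
        if score > best_score:
            best_label, best_score = label, score
    if best_score < threshold:
        return "irrelevant", best_score
    return best_label, best_score

# Hypothetical categories and examples, for illustration only
categories = {
    "footwear": ["running shoe", "trail shoes"],
    "apparel": ["rain jacket"],
}
print(classify("running shoes", categories)[0])
print(classify("zzzz qqqq", categories)[0])
```

With a real model, the category example embeddings can be computed once per product and reused, so the cost per keyword is one embedding plus a few dot products.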

If the 3rd is suitable, I would love some direction, such as:

  • Which model and size is best to fine-tune?
  • Do I train the model on each product, or will it generalise well across products?
  • What dataset size would I require (i.e. how many keyword <> label pairs)?
  • Which resources/libraries/tools should I refer to?
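For reference, the keyword <> label pairs could be stored as JSONL, one record per line. This is only a sketch of one possible layout: the field names `text` and `label`, the helper names, and the sample product are all hypothetical. Putting the product description into the input text is one way a single model could be shared across products, though whether that generalises well is exactly the open question above.

```python
import json

def make_record(product_description, keyword, label):
    """Build one training example; the field names are illustrative assumptions."""
    return {
        "text": f"Product: {product_description}\nKeyword: {keyword}",
        "label": label,
    }

def to_jsonl(records):
    """Serialise records as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Hypothetical product and labels, for illustration only
records = [
    make_record("Lightweight trail running shoe", "trail shoes", "relevant"),
    make_record("Lightweight trail running shoe", "office chair", "irrelevant"),
]
print(to_jsonl(records))
```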

TIA!

phree_radical@alien.top 1 points 2 years ago (1 children)
oinkyDoinkyDoink@alien.top 1 points 2 years ago

Yes, something like this works, but the prompt is very large to run on thousands of keywords. Hence I'm looking for something better.