kazza789

joined 2 years ago

[D] Why are ML model outputs not tested regarding statistical significance? in c/machinelearning@academy.garden

[–] kazza789@alien.top 1 points 2 years ago

One big reason is the difference between prediction and inference. Most machine learning papers are evaluating predictive performance, not testing a hypothesis.

That said, ML definitely does get applied to inference as well, but in those cases the lack of p-values is often one of the lesser complaints.

[P] Fine-grained semantic search and clustering with interpretable multi-feature text embeddings in c/machinelearning@academy.garden

[–] kazza789@alien.top 1 points 2 years ago

Could you use a traditional embedding and then search for a vector that represents the semantic feature you are interested in? Since LLMs can handle the concept of numbers, and numbers are a pretty fundamental part of language, presumably (but not necessarily) there is a direction in the high-dimensional embedding space that represents the concept of "how many". I'm thinking along the lines of the traditional example "king" - "man" + "woman" ≈ "queen", where you could, for example, define a "gender" vector from "male", "female", and perhaps a set of other related words.

I'm not sure how feasible that is at all - I'm just curious if it's something you explored or read about as you were doing this?
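
To make the idea a bit more concrete, here is a rough sketch of what I mean. Everything in it is an assumption for illustration: the 3-d vectors are hand-made toy values (not output from any real embedding model), and the helper names `concept_direction`, `score`, and `nearest` are made up. With a real model you would swap in its embeddings, and there is no guarantee a clean direction exists.

```python
import math

# Toy 3-d "embeddings", hand-made for illustration only (NOT from a real model).
# Axis 0 ~ royalty, axis 1 ~ gender, axis 2 ~ unrelated.
EMB = {
    "king":   [0.9,  0.8, 0.1],
    "queen":  [0.9, -0.8, 0.1],
    "man":    [0.1,  0.9, 0.0],
    "woman":  [0.1, -0.9, 0.0],
    "male":   [0.0,  1.0, 0.1],
    "female": [0.0, -1.0, 0.1],
    "apple":  [0.0,  0.0, 1.0],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def concept_direction(pairs, emb):
    """Average the difference vectors of word pairs, then scale to unit length."""
    diffs = [[a - b for a, b in zip(emb[p], emb[q])] for p, q in pairs]
    mean = [sum(col) / len(diffs) for col in zip(*diffs)]
    length = norm(mean)
    return [x / length for x in mean]

def score(word, direction, emb):
    """Scalar projection of a word's embedding onto the concept direction."""
    return dot(emb[word], direction)

def nearest(vec, emb, exclude=()):
    """Word with the highest cosine similarity to vec, skipping excluded words."""
    def cos(u, v):
        return dot(u, v) / (norm(u) * norm(v))
    return max((w for w in emb if w not in exclude), key=lambda w: cos(emb[w], vec))

# A hypothetical "gender" direction built from a few word pairs.
gender = concept_direction([("male", "female"), ("man", "woman")], EMB)

# The classic analogy: king - man + woman, then look up the closest word.
target = [k - m + w for k, m, w in zip(EMB["king"], EMB["man"], EMB["woman"])]
analogy = nearest(target, EMB, exclude={"king", "man", "woman"})
```

The same recipe would apply to "how many": collect pairs of texts that differ mainly in quantity, average their differences, and rank other texts by their `score` along that direction. Whether that separates cleanly in a real embedding space is exactly the part I'm unsure about.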