this post was submitted on 23 Nov 2023

Machine Learning

Hi all.

I was researching generative model evaluation and found this post interesting: https://deepsense.ai/evaluation-derangement-syndrome-in-gpu-poor-genai

A lot of it corresponds to what I see happening in the industry, and it feels like a good fit to share here.

top 10 comments
[–] Holiday-Union-6750@alien.top 1 points 2 years ago

Vibes with what I've seen in my job and the industry in general. Sadly, the greatest fun is only for huge corporations. Worth reading, definitely!

[–] robibok@alien.top 1 points 2 years ago

Love the graphic :)

[–] vikigenius@alien.top 1 points 2 years ago

It's kind of weird that they use HFRL as the initialism instead of the much more common RLHF.

[–] new_name_who_dis_@alien.top 1 points 2 years ago (1 children)

Well, it depends on what you are building. If you are actually doing ML research, i.e. you want to publish papers, people are doing evaluation and you won't get published without it. There are a bunch of tricks that have been used to evaluate generative models that you can find in these papers. I remember in grad school our TA made us read a paper, and then in the discussion he said he thought the method they proposed was not good at all; he wanted us to read it to learn about their evaluation metric, which he deemed "very clever".

[–] currentscurrents@alien.top 1 points 2 years ago (1 children)

you won't get published without doing proper evaluation

Idk man, I've seen some pretty sketchy papers this year.

[–] new_name_who_dis_@alien.top 1 points 2 years ago (1 children)

Like what?

I mean, there are always sketchy papers because of p-hacking. But I doubt there are papers that don't have a proper evaluation at all.

[–] obolli@alien.top 1 points 2 years ago (1 children)

I mean, the evaluation process itself is an active field of research...

[–] new_name_who_dis_@alien.top 1 points 2 years ago

That's kind of what my original comment was all about.

[–] martianunlimited@alien.top 1 points 2 years ago

The typical measure for most ML conferences is the Fréchet inception distance (FID), but I have seen a number of generative AI papers where what those values actually mean in practice is extremely obtuse. I appreciate papers that report the FID as a metric and also show some representative examples of the output (in the supplementary material if space is an issue).
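(For reference, FID fits a Gaussian to the Inception features of the real and generated samples and measures the Fréchet distance between the two fits. Below is a minimal sketch in NumPy/SciPy, assuming the feature matrices have already been extracted with an Inception network; the function and variable names are illustrative, not from any particular library.)

```python
import numpy as np
from scipy.linalg import sqrtm


def frechet_inception_distance(real_feats, gen_feats):
    """FID between two sets of Inception features, each an (N, D) array."""
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    sigma_r = np.cov(real_feats, rowvar=False)
    sigma_g = np.cov(gen_feats, rowvar=False)

    # Squared distance between the feature means.
    diff = mu_r - mu_g

    # Matrix square root of the product of covariances; drop the tiny
    # imaginary components that numerical error can introduce.
    covmean = sqrtm(sigma_r @ sigma_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real

    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)
```

One caveat: with small sample sizes the covariance estimates (and hence the score) get unreliable, which is part of why the raw numbers can be hard to interpret without example outputs alongside them.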

[–] No_Land9521@alien.top 1 points 2 years ago

Quite insightful and interesting comments there!