ArcticDagger

joined 2 years ago
[–] ArcticDagger@feddit.dk 11 points 11 months ago (3 children)

I think that hypothesis still holds, as it has always assumed training data of sufficient quality. What this study is really saying is that the places where we've traditionally harvested training data are becoming polluted with low-quality data

[–] ArcticDagger@feddit.dk 15 points 11 months ago

From the article:

To demonstrate model collapse, the researchers took a pre-trained LLM and fine-tuned it by training it using a data set based on Wikipedia entries. They then asked the resulting model to generate its own Wikipedia-style articles. To train the next generation of the model, they started with the same pre-trained LLM, but fine-tuned it on the articles created by its predecessor. They judged the performance of each model by giving it an opening paragraph and asking it to predict the next few sentences, then comparing the output to that of the model trained on real data. The team expected to see errors crop up, says Shumaylov, but were surprised to see “things go wrong very quickly”, he says.
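A toy sketch of the same loop, in case it helps. This is not what the researchers ran: instead of an LLM, the "model" here is just a normal distribution fitted to the data, and the function name, sample size and number of generations are all made up for illustration. Each generation is fitted only to samples drawn from the previous generation's fit.

```python
import random
import statistics


def collapse_demo(generations=10, sample_size=20, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Returns the fitted standard deviation of each generation.
    """
    rng = random.Random(seed)
    # Generation 0 trains on "real" data: samples from a standard normal.
    data = [rng.gauss(0.0, 1.0) for _ in range(sample_size)]
    fitted_stdevs = []
    for _ in range(generations):
        # "Train": estimate the mean and spread from the current data.
        mu = statistics.fmean(data)
        sigma = statistics.stdev(data)
        fitted_stdevs.append(sigma)
        # "Generate": the next generation only ever sees model output.
        data = [rng.gauss(mu, sigma) for _ in range(sample_size)]
    return fitted_stdevs


if __name__ == "__main__":
    for generation, sigma in enumerate(collapse_demo()):
        print(f"generation {generation}: fitted stdev = {sigma:.3f}")
```

Since every generation inherits the sampling error of the one before it, the fitted spread tends to wander away from the original value of 1 rather than stay put. That's only a loose analogy for what happens with an LLM, though.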

[–] ArcticDagger@feddit.dk 20 points 11 months ago

What they see as "bad research" is looking at an older cohort without taking into consideration their earlier drinking habits - that is, were they previously alcoholics or did they generally have other problems with their health?

If you don't correct for these things, you might find that people who are not drinking seem less healthy than people who are. BUT, that's not because they're not drinking, it's just because of their preexisting conditions. Their peers who are drinking a little bit tend to not have these preexisting conditions (on average)
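Here's a small simulation of that effect. All the probabilities are invented for illustration and none of them come from the study. In this made-up population, drinking has no effect on mortality at all; the only thing that matters is a preexisting condition, and people with that condition are assumed to be more likely to end up in the abstainer group.

```python
import random


def simulate(n=200_000, seed=0):
    """Return counts[(sick, group)] = [deaths, people] for a made-up population."""
    rng = random.Random(seed)
    counts = {(s, g): [0, 0] for s in (False, True) for g in ("abstainer", "light")}
    for _ in range(n):
        sick = rng.random() < 0.2  # has a preexisting condition
        # Assumption: sick people are more likely to have stopped drinking.
        p_abstain = 0.7 if sick else 0.3
        group = "abstainer" if rng.random() < p_abstain else "light"
        # Mortality depends ONLY on the preexisting condition, not on drinking.
        died = rng.random() < (0.30 if sick else 0.10)
        counts[(sick, group)][0] += died
        counts[(sick, group)][1] += 1
    return counts


def risk_ratio(counts, strata):
    """Mortality of light drinkers relative to abstainers within the given strata."""
    def rate(group):
        deaths = sum(counts[(s, group)][0] for s in strata)
        people = sum(counts[(s, group)][1] for s in strata)
        return deaths / people
    return rate("light") / rate("abstainer")


if __name__ == "__main__":
    counts = simulate()
    print("crude RR:       ", round(risk_ratio(counts, (False, True)), 2))
    print("healthy only RR:", round(risk_ratio(counts, (False,)), 2))
    print("sick only RR:   ", round(risk_ratio(counts, (True,)), 2))
```

With these numbers the crude risk ratio comes out around 0.7, so light drinking looks protective. Once you compare like with like (healthy vs. healthy, sick vs. sick), the ratio sits near 1, which is the true effect that was built into the simulation.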

 

From the article:

As predicted, studies with younger cohorts and separating former and occasional drinkers from abstainers estimated similar mortality risk for low-volume drinkers (RR = 0.98, 95% CI [0.87, 1.11]) as abstainers. Studies not meeting these quality criteria estimated significantly lower risk for low-volume drinkers (RR = 0.84, [0.79, 0.89]). In exploratory analyses, studies controlling for smoking and/or socioeconomic status had significantly reduced mortality risks for low-volume drinkers. However, mean RR estimates for low-volume drinkers in nonsmoking cohorts were above 1.0 (RR = 1.16, [0.91, 1.41]).

Studies with life-time selection biases may create misleading positive health associations. These biases pervade the field of alcohol epidemiology and can confuse communications about health risks. Future research should investigate whether smoking status mediates, moderates, or confounds alcohol-mortality risk relationships.

[–] ArcticDagger@feddit.dk 17 points 1 year ago (1 children)

Here's an actual explanation of the 'sneaked reference':

However, we found through a chance encounter that some unscrupulous actors have added extra references, invisible in the text but present in the articles’ metadata, when they submitted the articles to scientific databases. The result? Citation counts for certain researchers or journals have skyrocketed, even though these references were not cited by the authors in their articles.

[–] ArcticDagger@feddit.dk 2 points 1 year ago

Thank you, those are some good points!

[–] ArcticDagger@feddit.dk 5 points 1 year ago* (last edited 1 year ago) (2 children)

Could you explain a bit more about why it's insane to have it as a Docker volume instead of a mount point on the host? I'm not too well-versed in Docker (or maybe hosting in general)

Edit: typo

[–] ArcticDagger@feddit.dk 3 points 1 year ago

Interesting that they have such a greedy/stupid bot

[–] ArcticDagger@feddit.dk 10 points 1 year ago

I would say no. Just as it's not legitimate for any other business to break the law even if that means they're not going to be profitable

[–] ArcticDagger@feddit.dk 23 points 1 year ago (5 children)

Could it be this fella who's hitting you up: https://claude.ai/login

[–] ArcticDagger@feddit.dk 2 points 1 year ago

Further, most of the time, it's simply infeasible to test the data in depth. We're all humans with busy schedules and it is, unfortunately, not trivial to replicate experiments. If a reviewer feels more data is needed to support a claim, they can ask for a follow-up test or experiment, but it has to be within reason

[–] ArcticDagger@feddit.dk 1 points 1 year ago

Yea, not the most clear title about what the article is about hahah

[–] ArcticDagger@feddit.dk 5 points 2 years ago

Is it possible for you to somehow quantify traffic originating from AdNauseum? If so, how?
