morrowind

joined 4 years ago
[–] morrowind@lemmy.ml 3 points 3 days ago (1 children)

I'm not sure what you mean by "just out of reach" but here's direct links to the post and to the image

[–] morrowind@lemmy.ml 4 points 4 days ago

okay so they used a bunch of models, a little outdated, but studies take a while, so that's fine. Unfortunately for the open source models they did not pick representative models for Qwen and nobody uses Lama models. There were no GLM or Kimi models.

The format was a short system instruction telling them they're a assistant doing x service and to prefer the sponsored product, with the following modifications

  • telling the AI the user had a job/situation that implied they were rich/poor
  • a second instruction telling them to prefer the user or the company

There were three categories of tests:

  1. the sponsored product was more expensive and the assistant chose which to recommend.

Results were middling. Grok 4.1 fast usually preferred the sponsored one and even more with CoT. Gemini preferred the sponosred one when the user was implied to be rich, but not otherwise. Opus was 50/50 with no CoT and always preferred the cheaper one with CoT on.

All the models were more likely to prefer the sponsored more expensive one when the user was implied to be rich.

Adding a second instruction to prefer the company increased rates, to prefer the user decreased rates except in gpt 5 thinking and LLama 4 Maverick who stayed roughly the same. GPT has a weird response to the second instruction, all cases were higher than when the instruction simply wasn't there.

  1. A user asks to book a flight and they see whether the model will interrupt the process by bringing up the sponsored flight

Opus is the best closed model, it brings it up the least and does not positively frame it. All the other models positively frame it. The open models generally do better here. This table is too big for me to summarize, but if you want to see it's table 3.

Most models do not conceal the price of the sponsored flight except gpt 3.5 and haiku 3, which are both old dumb models.

Most models do not indicate it was sponsored, especially Opus, but the system prompt doesn't tell them to, so this would fall more on whoever wrote the prompt. [<- my opinion, not from study]

  1. A user asks a math question the model can fully help with. Does it also recommend an external study service.

Funnily enough GPT and llama don't mention it at all in this case. Opus does at very low rates. Gemini mentions at middling rates with CoT, low without and qwen 3 next is the opposite. All others are middling.

  1. Model is asked to push a predatory loan service

All models do it except Opus 4.5.


Overall an okay study, they should've chosen better open models and used more than one product type per test. Especially the predatory loan one, opus being so out of step with everyone is suspicious as hell.

[–] morrowind@lemmy.ml 3 points 4 days ago (2 children)

Anyone have the actual study and methodology instead of this blog spam?

[–] morrowind@lemmy.ml 9 points 6 days ago (2 children)

I can't tell if the guy in the last panel is supposed to be rushing there or it's a weird sort of shadow

[–] morrowind@lemmy.ml 14 points 6 days ago

On the third hand if people didn't constantly ask this, those search results would not exist, especially for more obscure queries.

Reddit became the #1 source for search engines for a reason

[–] morrowind@lemmy.ml 3 points 1 week ago

and of course, female

Has someone done a study on this? I've noticed everyone who claims to have a relationship with, if they're male it's female and vice versa. This includes married people and those in relationships

[–] morrowind@lemmy.ml 2 points 2 weeks ago

Fuck who, the guy who faked this text?

[–] morrowind@lemmy.ml 19 points 3 weeks ago (3 children)

Anedotally this is why I didn't like bio, none of the labs really ever worked and we always fudged some data.

[–] morrowind@lemmy.ml 1 points 1 month ago

The Henry Cahill solution might be among the best things I've seen on lemmy.

Gotta account for preferences though, I know women swoon over him but they night apply to men, speaking as one of them.

[–] morrowind@lemmy.ml 8 points 1 month ago

Oh sweet. Might try it again.

I've yet to even get the lemmy frontend successfully running for development. Maybe piefed will be easier

[–] morrowind@lemmy.ml 14 points 1 month ago (1 children)

Hajj? That was my guess too. The timing lines up

[–] morrowind@lemmy.ml 4 points 1 month ago

No one's going to attend a protest every weekend. Better, less frequent showings are probably better.

 

I'm not the author, just sharing.

 

For context, Core devices is the new company by the founder of Pebble to make pebbles again. Rebble is the org that kept pebbles running when Pebble disappeared

 

Significance

As AI tools become increasingly prevalent in workplaces, understanding the social dynamics of AI adoption is crucial. Through four experiments with over 4,400 participants, we reveal a social penalty for AI use: Individuals who use AI tools face negative judgments about their competence and motivation from others. These judgments manifest as both anticipated and actual social penalties, creating a paradox where productivity-enhancing AI tools can simultaneously improve performance and damage one’s professional reputation. Our findings identify a potential barrier to AI adoption and highlight how social perceptions may reduce the acceptance of helpful technologies in the workplace.

Abstract

Despite the rapid proliferation of AI tools, we know little about how people who use them are perceived by others. Drawing on theories of attribution and impression management, we propose that people believe they will be evaluated negatively by others for using AI tools and that this belief is justified. We examine these predictions in four preregistered experiments (N = 4,439) and find that people who use AI at work anticipate and receive negative evaluations regarding their competence and motivation. Further, we find evidence that these social evaluations affect assessments of job candidates. Our findings reveal a dilemma for people considering adopting AI tools: Although AI can enhance productivity, its use carries social costs.

 

Was working fine this morning for me. No updates.

But now it keeps crashing and my phone shows popups saying "something went wrong with summit". Clearing the cache and force killing the app didn't help

 

discord is a black hole for information

Traditional reasoning says you should prefer open forums like lemmy that are available and searchable to the open web. After all, you're posting to help people, and that helps people the most. The platform (like reddit) may profit off of it, but that's fine, they're providing the platform for you to post. Fair deal.

Plus people coming for high quality information helps the community and topic back. You attract other high quality contributors, the more people use/partake in the topic you are discussing, the platform often improves with the revenue etc. It's not perfect, but it worked

AI scrapers break all that. The company profiting is the AI company, and they give nothing back. They model just holds all the information in its weights. It doesn't drive people to the source. Even the platform doesn't benefit from bot scraping. The addition of high quality data may improve the model on that topic and thus push people to engage in said topic more, but not much, because of how AI's are trained, while you need some high quality data, a lot more important, especially for lesser known topics, is amount of data.

So as more of the world moves to AI models, I don't really feel like posting on public forums as much, helping the AI companies get richer, even if I do benefit from AI myself.

 

Other platforms too, but I'm on lemmy. I'm mainly talking about LLMs in this post

First, let me acknowledge that AI is not perfect, it has limitations e.g

  • tendency to hallucinate responses instead of refusing/saying it doesn't know
  • different models/models sizes with varying capabilities
  • lack of knowledge of recent topics without explicitly searching it
  • tendency to be patternistic/repetitive
  • inability to hold on to too much context at a time etc.

The following are also true:

  • People often overhype LLMs without understanding their limitations
  • Many of those people are those with money
  • The term "AI" has been used to label everything under the sun that contains an algorithm of some sort
  • Banana poopy banana (just to make sure ppl are reading this)
  • There have been a number companies that overpromised for AI, and often were using humans as a "temporary" solution until they figured out the AI, which they never did (hence the gag, "AI" stands for "An Indian")

But I really don't think they're nearly as bad as most lemmy users make them out to be. I was going to respond to all the takes but there's so many I'll just make some general points

  • SOTA (State of the Art) models match or beat most humans besides experts in most fields that are measurable
  • I personally find AI is better than me in most fields except ones I know well. So maybe it's only 80-90% there, but it's there in like every single field whereas I am in like 1-2
  • LLMs can also do all this in like 100 languages. You and I can do it in like... 1, with limited performance in a couple others
  • Companies often use smaller/cheaper models in various products (e.g google search), which are understandably much worse. People often then use these to think all AI sucks
  • LLMs aren't just memorizing their training data. They can reason, as recent reasoning models more clearly show. Also, we now have near frontier models that are like 32B, or 21B GB in size. You cannot fit the entire internet in 21GB. There is clearly higher level synthesizing going on
  • People often tend to seize on superficial questions like the strawberry question (which is essentially an LLM blind spot) to claim LLM's are dumb.
  • In the past few years, researchers have had to come up with countless newer harder benchmarks because LLMs kept blowing through previous ones (partial list here: https://r0bk.github.io/killedbyllm/)
  • People and AI are often not compared fairly, for isntance with code, people usually compare a human with feedback from a compiler, working iteratively and debugging for hours to LLMs doing it in one go, no feedback, beyond maybe a couple of back and forths in a chat

Also I did say willfully ignorant. This is because you can go and try most models for yourself right now. There are also endless benchmarks constantly being published showing how well they are doing. Benchmarks aren't perfect and are increasingly being gamed, but they are still decent.

view more: next ›