this post was submitted on 30 Oct 2025
17 points (100.0% liked)

Ask Lemmy


I have set my post languages to Persian and English and deselected Undetermined.

But I still get posts in other languages, and when I open the settings again I see that the Undetermined option is selected again. It seems that I can't disable it. Any idea why?

top 5 comments
[–] LemmyKnowsBest@lemmy.world 4 points 6 days ago* (last edited 6 days ago)

Whenever I see a post in a foreign language, I visit the community and manually block it. One of these days I will have conquered them all.

[–] SnokenKeekaGuard@lemmy.dbzer0.com 3 points 6 days ago (1 children)
[–] tal@lemmy.today 0 points 6 days ago (1 children)

Odd. I could have sworn that I had the opposite problem.


I inadvertently unselected undetermined language some time back and couldn't figure out why I couldn't see most posts for a while. It was because most content is undetermined rather than explicitly marked as English language. So unless there was a regression, I'd be surprised for it not to work.

thinks

I might have been using Kbin at the time. Maybe that was it.

[–] rezad@lemmy.world 2 points 6 days ago (1 children)

For the life of me, I can't get it to stay deselected. Try it for yourself: on the web page, open Settings, deselect Undetermined, select a language (for example English), and press Save at the bottom. Come back to the settings page and Undetermined is selected again. English or whatever other languages you selected are saved correctly, but I can't disable Undetermined.

It pollutes my feed with German, French, and other languages that I don't understand.

[–] tal@lemmy.today 3 points 6 days ago* (last edited 6 days ago)

Honestly, it might be better to change how the feature works today, where humans select the language, and instead have either the instance or the client try to infer the language and do the filtering on that. I can tell you that a huge amount of the content that I want to see doesn't have people explicitly marking the language. Heck, the comment I responded to isn't marked as English.

There's some Linux utility or library that does statistical guessing of language based on characters seen. Probably also more sophisticated stuff out there. Lemme see if I can dig it up.

hunts around a bit

Well, this isn't it, but here's a Python module. On Debian trixie:

$ sudo apt install python3-venv
$ mkdir langtest
$ cd langtest
$ python3 -m venv venv
$ . venv/bin/activate
$ pip install langdetect
$ python -q
>>> import langdetect
>>> langdetect.detect_langs('رضا')
[ar:0.9999953370247615]

So it'd be 99.999% confident that your username is Arabic. Something like PieFed or Lemmy or a client could make use of that. Maybe add some heuristics that default to assuming the language is the same as that of the parent comment or post, or the community's average language, since very short comment texts might be unclear or ambiguous.

That's not perfect, because sometimes people will quote stuff in other languages or something like that, but I'd wager that it'd be more accurate than manually-tagged stuff.
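For what it's worth, here's a rough sketch of the fallback heuristic described above. It is not anything Lemmy or PieFed actually does. The names guess_language and langdetect_detector, and the min_chars and min_prob thresholds, are made up for illustration. The only real library call is langdetect.detect_langs, which returns candidates with .lang and .prob attributes. The detector is passed in as a parameter so the logic can be tried without langdetect installed.

```python
from typing import Callable, List, Optional, Tuple

# A detector takes text and returns (language_code, probability) pairs,
# best guess first.
Detector = Callable[[str], List[Tuple[str, float]]]


def guess_language(
    text: str,
    detect: Detector,
    parent_lang: Optional[str] = None,
    min_chars: int = 20,     # assumed threshold, not tuned
    min_prob: float = 0.9,   # assumed threshold, not tuned
) -> Optional[str]:
    """Hypothetical helper: infer a comment's language, falling back to
    the parent's language when the text is too short or the guess is weak."""
    stripped = text.strip()
    if len(stripped) < min_chars:
        # Very short texts are ambiguous, so inherit from the parent.
        return parent_lang
    try:
        candidates = detect(stripped)
    except Exception:
        # Detectors can fail on text with no usable features (e.g. emoji only).
        return parent_lang
    if not candidates:
        return parent_lang
    lang, prob = candidates[0]
    return lang if prob >= min_prob else parent_lang


def langdetect_detector(text: str) -> List[Tuple[str, float]]:
    """Adapter around langdetect.detect_langs (needs `pip install langdetect`)."""
    import langdetect  # imported lazily so the heuristic works without it

    return [(c.lang, c.prob) for c in langdetect.detect_langs(text)]
```

With langdetect installed, a client could call guess_language(comment_text, langdetect_detector, parent_lang="en") and then filter on the result instead of on the manual tag. A short reply like "lol" would simply inherit "en" from its parent.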