this post was submitted on 05 Jun 2024
819 points (98.8% liked)

    submitted 5 months ago* (last edited 5 months ago) by ordellrb@lemmy.world to c/linuxmemes@lemmy.world
     
    [–] R00bot@lemmy.blahaj.zone 21 points 5 months ago (3 children)

    I can't imagine it'd be that hard to write some code that does that using an existing AI model.

    [–] not_amm@lemmy.ml 9 points 5 months ago

    I found a small command to run KDE Spectacle (screenshot software) with Tesseract so I can OCR a screenshot whenever I want. All I had to install was Tesseract and one language pack; you could easily do the same with an API and/or a local AI.
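    A minimal sketch of that kind of command (the exact flags and clipboard tool are assumptions: Spectacle's -b/-n/-r/-o options and wl-copy from wl-clipboard; swap in xclip on X11):

    ```bash
    #!/usr/bin/env bash
    # Sketch: select a region with Spectacle, OCR it with Tesseract,
    # and put the recognized text on the clipboard.
    img="$(mktemp --suffix=.png)"
    spectacle -b -n -r -o "$img"                # -b background, -n no notification, -r pick a region
    tesseract "$img" stdout -l eng | wl-copy    # use `xclip -selection clipboard` instead on X11
    rm -f "$img"
    ```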

    [–] JackGreenEarth@lemm.ee 5 points 5 months ago

    You're probably right.

    [–] MacNCheezus@lemmy.today 3 points 5 months ago

    Llava and Bakllava are two Ollama models that can not only extract text but also describe what's happening on screen.
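
    A rough example of what that could look like with the Ollama CLI (the prompt and screenshot path are placeholders; Ollama's multimodal models take the image path inside the prompt):

    ```bash
    # Pull a vision-capable model once, then point it at a screenshot.
    ollama pull llava
    ollama run llava "Extract any text from this image and describe what is happening: ./screenshot.png"
    ```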

    Using tesseract-ocr, as the other guy suggested, is probably simpler and less resource-intensive, though.