this post was submitted on 14 Feb 2024
548 points (97.4% liked)

Technology

[–] CubitOom@infosec.pub 10 points 9 months ago* (last edited 9 months ago) (7 children)

So what is Home Assistant using for this?

If I were to build it myself, I'd probably overcomplicate it by using multiple LLM agents on a local server. I'd probably use Whisper for speech-to-text and then Mistral fine-tuned on the Rosetta Code dataset to send the API calls to HA. However, that wouldn't keep it from always listening to me and trying to interpret everything I say as a command for HA. Is that just a prompting issue for Whisper, or would I need another agent to turn Whisper on?

I could maybe get this to run without specialized hardware like a GPU, but it would be better to have something to make the LLMs a bit more responsive.
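
One way to picture the "another agent to turn Whisper on" idea is a small gate that sits in front of speech-to-text. The sketch below is only an illustration: it assumes a separate lightweight wake-word detector already labels each audio chunk with a boolean, and the names `gate_chunks` and `window` are made up for this example. Nothing here is taken from Home Assistant or Whisper.

```python
from typing import Iterable, Iterator, Tuple


def gate_chunks(chunks: Iterable[Tuple[bool, str]], window: int = 3) -> Iterator[str]:
    """Yield only the chunks that follow a wake-word hit.

    Each item is (wake_detected, chunk). `window` is how many chunks are
    forwarded to speech-to-text after a hit before the gate closes again.
    Both the tuple shape and `window` are assumptions for this sketch.
    """
    remaining = 0
    for wake_detected, chunk in chunks:
        if wake_detected:
            remaining = window  # (re)open the listening window
            continue            # the wake word itself is not a command
        if remaining > 0:
            remaining -= 1
            yield chunk         # only these chunks would reach Whisper


if __name__ == "__main__":
    stream = [(False, "a"), (True, "wake"), (False, "b"), (False, "c"), (False, "d")]
    print(list(gate_chunks(stream, window=2)))
```

With a gate like this, the expensive model only sees audio right after the wake word, so ordinary conversation never gets interpreted as a command.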

[–] redcalcium@lemmy.institute 7 points 9 months ago* (last edited 9 months ago) (6 children)

There is no LLM; it's just used to recognize simple commands such as "turn on kitchen light". What the "conversation agent" can do is very limited, though you can extend it to recognize custom commands. It's not comparable to Google Assistant/Siri, let alone ChatGPT.
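
To give a rough feel for what non-LLM command recognition can look like, here is a minimal pattern-matching sketch. It is not Home Assistant's actual implementation; the pattern, the `parse_command` name, and the returned action strings are all assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical template: "turn on/off [the] <device name>"
PATTERN = re.compile(r"^turn (on|off) (?:the )?(.+?)$")


def parse_command(text: str) -> Optional[Tuple[str, str]]:
    """Return (action, device_name) for a recognized command, else None."""
    cleaned = text.lower().strip().rstrip(".!?")
    match = PATTERN.match(cleaned)
    if match is None:
        return None  # anything outside the template is simply not understood
    state, name = match.groups()
    return f"turn_{state}", name


if __name__ == "__main__":
    print(parse_command("Turn on kitchen light"))
    print(parse_command("What's the weather like?"))
```

Anything that does not fit a known template falls through to `None`, which is why this style of agent feels limited next to an LLM, and why adding custom commands amounts to adding more templates.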

[–] CubitOom@infosec.pub 1 points 9 months ago

OK, hmm, I wonder how much work it would be to implement it using open-source models. I think the hardest part would be translating the voice instructions into an API call that HA can use correctly.

Then there is the whole hardware issue to fix too. I do know that some SoCs are getting good at running 7B-parameter models locally, but the cost is still probably going to be prohibitive.
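
For the last hop of that translation, the target is fairly small once an action and entity are known. The sketch below builds (but does not send) a request to Home Assistant's REST API service endpoint, `POST /api/services/<domain>/<service>`, authenticated with a long-lived access token. The base URL, the token, the entity ID `light.kitchen`, and the helper name `build_service_call` are placeholders assumed for this example; mapping free-form speech to the right domain, service, and entity is the hard part and is not shown.

```python
import json
import urllib.request


def build_service_call(base_url: str, token: str, domain: str,
                       service: str, entity_id: str) -> urllib.request.Request:
    """Build a POST request for a Home Assistant service call (not sent here)."""
    url = f"{base_url.rstrip('/')}/api/services/{domain}/{service}"
    body = json.dumps({"entity_id": entity_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # long-lived access token
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholder values; replace with a real instance URL, token, and entity.
    req = build_service_call("http://homeassistant.local:8123", "TOKEN",
                             "light", "turn_on", "light.kitchen")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```

Whatever sits upstream (templates or a fine-tuned model) only needs to emit the three values `domain`, `service`, and `entity_id`, which keeps the model's output easy to validate before anything is sent to HA.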
