cross-posted from: https://lemmit.online/post/225981
This is an automated archive made by the Lemmit Bot.
The original was posted on /r/homeassistant by /u/janostrowka on 2023-07-19 12:49:02.
Hopefully this will come in handy for our Year of the Voice.
TL;DR: Justin Alvey replaced the Google Nest Mini's PCB with a custom ESP32 board, which he's open-sourcing. He demos an LLM voice assistant paired with Beeper to send and receive messages.
Tweet text thread (I would also highly recommend checking out the video demos on Twitter):
I “jailbroke” a Google Nest Mini so that you can run your own LLMs, agents and voice models. Here’s a demo using it to manage all my messages (with help from @onbeeper). Sound on, and wait for the surprise guest! I thought hard about how best to tackle this, and why:
After looking into jailbreaking options, I opted to completely replace the PCB. This lets you use a cheap ($2) but powerful and developer-friendly WiFi chip with a highly capable audio framework. This allows a paradigm of multiple cheap edge devices for audio and voice detection…
…while offloading large models to a more powerful local device (whether your M2 Mac, a PC server with a GPU, or even a "tinybox"!). In most cases this device is already trusted with your credentials and data, so you don’t have to hand these off to some cloud, and your data need never leave your home.
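The "thin edge device, fat local server" split described above can be sketched as a tiny local HTTP endpoint that the ESP32 streams captured audio to. This is a minimal illustration, not the project's actual protocol: the `/transcribe` path, the raw-PCM payload, and the stand-in transcript are all assumptions, and a real server would hand the audio to a local STT model instead.

```python
# Minimal sketch of a local "heavy lifting" server for cheap edge devices.
# The edge device only does wake-word detection and streams raw audio here;
# this machine (your M2 Mac, GPU box, etc.) runs the large models.
# Endpoint path and payload format are illustrative assumptions.
import http.server
import threading
import urllib.request


class TranscribeHandler(http.server.BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/transcribe":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        audio = self.rfile.read(length)  # raw audio bytes from the edge device
        # A real server would pass `audio` to a local speech-to-text model;
        # we fake a transcript to keep the sketch self-contained.
        transcript = f"({len(audio)} bytes of audio received)".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(transcript)))
        self.end_headers()
        self.wfile.write(transcript)

    def log_message(self, *args):  # keep the sketch quiet
        pass


def serve(port=0):
    """Start the server on a background thread; port 0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), TranscribeHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because everything stays on the LAN, credentials and audio never leave the house, which is the privacy argument the thread is making.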
The custom PCB uses @EspressifSystem's ESP32-S3. I went through 2 revisions, from a module to an SoC package with extra flash, simplifying to single-sided SMT (< $10 BOM). All features such as the LEDs, capacitive touch and mute switch are working, and it's even programmable from Arduino (/IDF).
For this demo I used a custom “Maubot”, running locally and serving an API, with my @onbeeper credentials (Beeper is a messaging app which securely bridges your messaging clients using the Matrix protocol and e2e encryption).
I’m then using GPT-3.5 (for speed) with function calling to query this.
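The function-calling step can be sketched like this: the model is given tool definitions for the local message API, and whatever tool call it returns gets dispatched against that API. The function names (`get_unread_messages`, `send_message`) and their arguments are illustrative assumptions, not the project's actual Maubot interface; in the real loop you would pass `TOOLS` to the chat completions call and feed the dispatch result back to the model.

```python
# Hedged sketch of GPT-3.5 function calling against a local message API.
# Tool names and parameters are hypothetical, for illustration only.
import json

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_unread_messages",
            "description": "Fetch unread chats from the local Beeper bridge",
            "parameters": {
                "type": "object",
                "properties": {
                    "contact": {
                        "type": "string",
                        "description": "Optional contact name filter",
                    },
                },
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "send_message",
            "description": "Send a chat message via the local Beeper bridge",
            "parameters": {
                "type": "object",
                "properties": {
                    "contact": {"type": "string"},
                    "text": {"type": "string"},
                },
                "required": ["contact", "text"],
            },
        },
    },
]


def dispatch(tool_call, api):
    """Route a model-emitted tool call to the local message API object."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    return getattr(api, name)(**args)
```

Keeping the dispatcher this thin means the LLM only ever sees tool schemas and results; the Beeper credentials stay inside the local API object.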
For the prompt I added details such as family & friends, the current date, notification preferences, and a list of additional character voices that GPT can respond in. The response is then parsed and sent to @elevenlabsio.
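The parsing step might look something like the sketch below: the prompt asks GPT to prefix each reply with a character-voice tag, which gets stripped off and mapped to an ElevenLabs voice ID before synthesis. The `[voice: ...]` tag format and the voice-ID table are assumptions made up for illustration; the project may use a different scheme entirely, and the IDs below are placeholders.

```python
# Sketch of splitting an LLM reply into (voice, spoken text) before TTS.
# The [voice: ...] tag convention and the ID table are hypothetical.
import re

# Placeholder mapping from character names (listed in the prompt) to
# ElevenLabs voice IDs -- not real IDs.
VOICE_IDS = {
    "default": "voice-id-default",
    "pirate": "voice-id-pirate",
}


def parse_reply(reply, default_voice="default"):
    """Return (voice_id, text) for a reply like '[voice: Pirate] Arr...'."""
    m = re.match(r"\s*\[voice:\s*(?P<name>[^\]]+)\]\s*(?P<text>.*)", reply, re.S)
    if m:
        name = m.group("name").strip().lower()
        return VOICE_IDS.get(name, VOICE_IDS[default_voice]), m.group("text")
    # No tag: fall back to the default voice and speak the reply as-is.
    return VOICE_IDS[default_voice], reply.strip()
```

The resulting `(voice_id, text)` pair would then go to the ElevenLabs API for synthesis and playback on the Nest Mini's speaker.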
I've been experimenting with multiple of these: announcing important messages as they come in, morning briefings, noting down ideas and memos, and browsing agents. I couldn’t resist: here's a playful (unscripted!) video of two of them talking to each other, prompted to be AIs from "Her".
I’m working on open-sourcing the PCB design, build instructions, firmware, and the bot & server code; expect something in the next week or so. If you don't want to source Nest Minis (or shells from AliExpress), it's still a great dev platform for developing an assistant! Stay tuned!
Makes me wonder how much of Tesla’s “Full Self Driving” is just some dude playing GTA VR with you in the passenger seat.