this post was submitted on 09 Nov 2023

I'm a data engineer who somehow ended up as a software developer. So many of my friends are now working with the OpenAI API to add generative capabilities to their products, but they lack A LOT of context when it comes to how LLMs actually work.

This is why I started writing popular-science style articles that unpack AI concepts for software developers working on real-world applications. It started kind of slow, honestly I wrote a bit too "brainy" for them, but now I've found a voice that resonates with this audience much better and I want to ramp up my writing cadence.

I would love to hear your thoughts about what concepts I should write about next.
What gets you excited but is hard to explain to someone with a different background?

[–] General_Service_8209@alien.top 1 points 10 months ago (2 children)

State space models and their derivatives.

They have demonstrated better performance than Transformers on very long sequences, with computational cost that grows linearly instead of quadratically with sequence length, and on paper they also generalize better to non-NLP tasks.

However, training them is more difficult, so they perform worse in practice outside of these few very long sequence tasks. But with a bit more development, they could become the most impactful AI technology in years.
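
To make the linear-cost point concrete, here is a minimal NumPy sketch of a discrete linear state space layer. It is only an illustration under simplifying assumptions (scalar input and output, fixed random matrices, no training), and the function names `ssm_scan` and `ssm_kernel` are made up for this example rather than taken from any library. The recurrence touches each time step once, so the cost grows linearly with sequence length, whereas self-attention compares every pair of positions.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run x[k] = A @ x[k-1] + B * u[k], y[k] = C @ x[k] over a 1-D input u.

    One pass over the sequence: L steps for a sequence of length L.
    """
    x = np.zeros(A.shape[0])          # hidden state, starts at zero
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = A @ x + B * u_k           # state update
        y[k] = C @ x                  # readout
    return y

def ssm_kernel(A, B, C, length):
    """Equivalent convolution kernel K[j] = C @ A^j @ B."""
    K = np.empty(length)
    v = B.copy()
    for j in range(length):
        K[j] = C @ v
        v = A @ v                     # advance to A^(j+1) @ B
    return K

# Hypothetical toy parameters, chosen only so the system is stable.
rng = np.random.default_rng(0)
N, L = 4, 64                          # state size, sequence length
A = 0.9 * np.eye(N) + 0.01 * rng.standard_normal((N, N))
B = rng.standard_normal(N)
C = rng.standard_normal(N)
u = rng.standard_normal(L)

y_scan = ssm_scan(A, B, C, u)                          # recurrent view
y_conv = np.convolve(u, ssm_kernel(A, B, C, L))[:L]    # convolutional view
print(np.allclose(y_scan, y_conv))
```

The same linear system can be evaluated either as a recurrence or as a convolution with a precomputed kernel, which is the property that structured state space models build on. Real models add learned, specially structured parameters on top of this, which this sketch leaves out.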

[–] BEEIKLMRU@alien.top 1 points 10 months ago (1 children)

Do you have anything in particular you think is worth sharing? I'm trying to implement model predictive control in MATLAB and I'm working on an LSTM surrogate model. Just last week I found MATLAB tools for neural state space models, and I've been wondering if I just uncovered a big blind spot of mine.

[–] General_Service_8209@alien.top 1 points 10 months ago

It sounds like you've come across exactly what I meant.

I have a couple of papers on the topic if you’re interested in those. There’s also a PyTorch implementation of a neural state space model by the authors of the original paper: https://github.com/HazyResearch/state-spaces