
Hi everyone,

I recently saw a post about Comgra in this subreddit (check it out, it's very cool!), which inspired me to share a similar pet project of mine, though with a slightly different goal.

I've been very interested in understanding how LLMs work behind the scenes, and I've started reading a bunch of cool papers like "What Does BERT Look At? An Analysis of BERT's Attention" and "Transformer Feed-Forward Layers Are Key-Value Memories".

I find it amazing that we have such powerful models today, yet we are still learning how to lift the covers off the magical black boxes they are.

To this end, I've been working on a library/framework that aims to let you perform these analyses (and more) in a general fashion on any model, striving to include modern LLMs like Llama 2 as well.

Here's an example result showing the attention head clustering showcased in the BERT paper, but this time for Llama 2 (in the future I hope to add the ability to name each attention head according to its function, perhaps using LLMs too):

Attention head clustering in Llama 2: Each point is an attention head, each color represents a layer
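For readers curious what that analysis involves, here is a rough, self-contained sketch of the head-clustering procedure from the BERT attention paper. This is not fmrai's code or API; bert-base-uncased, the single input sentence, and the scipy/scikit-learn choices are just stand-ins for illustration. Each head's attention maps are compared via Jensen-Shannon distance and then embedded in 2D, with points colored by layer:

import numpy as np
import torch
from scipy.spatial.distance import jensenshannon
from sklearn.manifold import MDS
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # stand-in; any model that exposes attentions works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

with torch.no_grad():
    out = model(**tok("The quick brown fox jumps over the lazy dog", return_tensors="pt"))

# out.attentions: one (batch, heads, seq, seq) tensor per layer
att = torch.stack(out.attentions).squeeze(1).numpy()   # (layers, heads, seq, seq)
n_layers, n_heads, seq, _ = att.shape
maps = att.reshape(n_layers * n_heads, seq, seq)

# Distance between two heads: mean Jensen-Shannon distance between their
# attention distributions over the same query positions.
n = maps.shape[0]
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = np.mean([jensenshannon(maps[i, q], maps[j, q]) for q in range(seq)])
        dist[i, j] = dist[j, i] = d

# Embed heads in 2D; plot each point colored by its layer index.
coords = MDS(n_components=2, dissimilarity="precomputed").fit_transform(dist)
layer_of_head = np.repeat(np.arange(n_layers), n_heads)

In practice you would average the distances over many input sentences rather than a single one, but the overall shape of the analysis is the same.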

The library implements dynamic instrumentation of PyTorch functions/tensors, which allows concise code like this:

import fmrai  # the library's top-level module
from fmrai import instrument_model  # assumed import path for instrument_model

# `model` and `tokenizer` are assumed to be a Hugging Face model/tokenizer pair
with fmrai.fmrai() as fmr:
    m = instrument_model(model)  # wrap the model so its PyTorch ops are traced
    with fmr.track() as tracker:
        # run a normal forward pass; the tracker records the computation
        m(**tokenizer("Hello World", return_tensors="pt"))
        g = tracker.build_graph()

    g.save_dot('graph.dot')  # export the captured graph in Graphviz DOT format

which gives:


Computation graph of BERT
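I won't reproduce fmrai's internals here, but to give an idea of the general mechanism behind this kind of tracing, PyTorch itself exposes a way to intercept every torch function call. Below is a minimal, independent sketch using torch.overrides.TorchFunctionMode; it illustrates the technique, not necessarily how fmrai is actually implemented:

import torch
from torch.overrides import TorchFunctionMode

class OpRecorder(TorchFunctionMode):
    """Records the name of every intercepted torch operation, in call order."""
    def __init__(self):
        super().__init__()
        self.calls = []

    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        self.calls.append(getattr(func, "__name__", repr(func)))
        return func(*args, **kwargs)  # run the real op unchanged

with OpRecorder() as rec:
    x = torch.randn(2, 4)
    y = torch.nn.functional.relu(x @ x.T)

print(rec.calls)  # e.g. ['randn', '__matmul__', ..., 'relu'], depending on dispatch

Recording the arguments and outputs as well (rather than just the names) is enough to stitch the calls together into a computation graph like the one above.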

You can then use the tools in the library to find where the attention is, extract the tensors, run analyses, and so on.
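If you just want to grab intermediate tensors without a library, plain PyTorch forward hooks are a rough, lower-level way to do the same kind of thing. Again, this is a generic sketch rather than fmrai's API; it assumes the same Hugging Face `model`/`tokenizer` as above, and the name-based filter is a naive heuristic:

import torch

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output  # may be a tensor or a tuple, depending on the module
    return hook

handles = [
    module.register_forward_hook(make_hook(name))
    for name, module in model.named_modules()
    if "attn" in name.lower() or "attention" in name.lower()
]

with torch.no_grad():
    model(**tokenizer("Hello World", return_tensors="pt"))

for h in handles:
    h.remove()

# `captured` now maps module names to their outputs, ready for further analysis.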

In the future, I hope to add support for image models and more.

Please note that this project is very early-stage and not very stable at the moment; I hope someone finds it useful/interesting.

[–] CatalyzeX_code_bot@alien.top 1 points 11 months ago

Found 1 relevant code implementation for "What Does BERT Look At? An Analysis of BERT's Attention".

Ask the author(s) a question about the paper or code.

If you have code to share with the community, please add it here 😊🙏

--

Found 2 relevant code implementations for "Transformer Feed-Forward Layers Are Key-Value Memories".

Ask the author(s) a question about the paper or code.

If you have code to share with the community, please add it here 😊🙏

--

To opt out from receiving code links, DM me.