
Hi everyone,

I recently saw a post about Comgra in this subreddit (check it out, it's very cool!), which inspired me to share a similar pet project of mine, though with a slightly different goal.

I've been very interested in understanding how LLMs work behind the scenes, and I've started reading a bunch of cool papers like "What Does BERT Look At? An Analysis of BERT's Attention" and "Transformer Feed-Forward Layers Are Key-Value Memories".

I find it amazing that we have such powerful models today, yet we are still learning how to lift the covers off the magical black boxes they are.

To this end, I've been working on a library/framework that aims to let you perform these analyses (and more) in a general fashion on any model, striving to include modern LLMs like Llama 2 as well.

Here's an example result showing the attention head clustering showcased in the BERT paper, but this time for Llama 2 (in the future I hope to add the ability to name each attention head according to its function, perhaps using LLMs too):

Attention head clustering in Llama 2: Each point is an attention head, each color represents a layer
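For readers curious what that analysis involves, here is a rough, self-contained sketch of the head-clustering procedure from the BERT attention paper. This is not fmrai's code or API; bert-base-uncased, the single input sentence, and the scipy/scikit-learn choices are just stand-ins for illustration. Each head's attention maps are compared via Jensen-Shannon distance and then embedded in 2D, with points colored by layer:

import numpy as np
import torch
from scipy.spatial.distance import jensenshannon
from sklearn.manifold import MDS
from transformers import AutoModel, AutoTokenizer

model_name = "bert-base-uncased"  # stand-in; any model that exposes attentions works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

with torch.no_grad():
    out = model(**tok("The quick brown fox jumps over the lazy dog", return_tensors="pt"))

# out.attentions: one (batch, heads, seq, seq) tensor per layer
att = torch.stack(out.attentions).squeeze(1).numpy()   # (layers, heads, seq, seq)
n_layers, n_heads, seq, _ = att.shape
maps = att.reshape(n_layers * n_heads, seq, seq)

# Distance between two heads: mean Jensen-Shannon distance between their
# attention distributions over the same query positions.
n = maps.shape[0]
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        d = np.mean([jensenshannon(maps[i, q], maps[j, q]) for q in range(seq)])
        dist[i, j] = dist[j, i] = d

# Embed heads in 2D; plot each point colored by its layer index.
coords = MDS(n_components=2, dissimilarity="precomputed").fit_transform(dist)
layer_of_head = np.repeat(np.arange(n_layers), n_heads)

In practice you would average the distances over many input sentences rather than a single one, but the overall shape of the analysis is the same.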

The library implements dynamic instrumentation of PyTorch functions/tensors, which allows concise code like this:

import fmrai  # the library's top-level module
from fmrai import instrument_model  # assumed import path for instrument_model

# `model` and `tokenizer` are assumed to be a Hugging Face model/tokenizer pair
with fmrai.fmrai() as fmr:
    m = instrument_model(model)  # wrap the model so its PyTorch ops are traced
    with fmr.track() as tracker:
        # run a normal forward pass; the tracker records the computation
        m(**tokenizer("Hello World", return_tensors="pt"))
        g = tracker.build_graph()

    g.save_dot('graph.dot')  # export the captured graph in Graphviz DOT format

which gives:


Computation graph of BERT
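I won't reproduce fmrai's internals here, but to give an idea of the general mechanism behind this kind of tracing, PyTorch itself exposes a way to intercept every torch function call. Below is a minimal, independent sketch using torch.overrides.TorchFunctionMode; it illustrates the technique, not necessarily how fmrai is actually implemented:

import torch
from torch.overrides import TorchFunctionMode

class OpRecorder(TorchFunctionMode):
    """Records the name of every intercepted torch operation, in call order."""
    def __init__(self):
        super().__init__()
        self.calls = []

    def __torch_function__(self, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        self.calls.append(getattr(func, "__name__", repr(func)))
        return func(*args, **kwargs)  # run the real op unchanged

with OpRecorder() as rec:
    x = torch.randn(2, 4)
    y = torch.nn.functional.relu(x @ x.T)

print(rec.calls)  # e.g. ['randn', '__matmul__', ..., 'relu'], depending on dispatch

Recording the arguments and outputs as well (rather than just the names) is enough to stitch the calls together into a computation graph like the one above.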

You can then use the tools in the library to find where the attention is, extract the tensors, run analyses, and so on.
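If you just want to grab intermediate tensors without a library, plain PyTorch forward hooks are a rough, lower-level way to do the same kind of thing. Again, this is a generic sketch rather than fmrai's API; it assumes the same Hugging Face `model`/`tokenizer` as above, and the name-based filter is a naive heuristic:

import torch

captured = {}

def make_hook(name):
    def hook(module, inputs, output):
        captured[name] = output  # may be a tensor or a tuple, depending on the module
    return hook

handles = [
    module.register_forward_hook(make_hook(name))
    for name, module in model.named_modules()
    if "attn" in name.lower() or "attention" in name.lower()
]

with torch.no_grad():
    model(**tokenizer("Hello World", return_tensors="pt"))

for h in handles:
    h.remove()

# `captured` now maps module names to their outputs, ready for further analysis.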

In the future, I hope to add support for image models and more.

Please note that this project is very early-stage and not very stable at the moment; I hope someone finds it useful/interesting.

[–] CatalyzeX_code_bot@alien.top 1 points 11 months ago

Found 1 relevant code implementation for "What Does BERT Look At? An Analysis of BERT's Attention".

Ask the author(s) a question about the paper or code.

If you have code to share with the community, please add it here 😊🙏

--

Found 2 relevant code implementations for "Transformer Feed-Forward Layers Are Key-Value Memories".

Ask the author(s) a question about the paper or code.

If you have code to share with the community, please add it here 😊🙏

--

To opt out from receiving code links, DM me.