Bruh. GPT API, IDE of your choice, into a terminal.
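If anyone actually wants that pipeline, here's a minimal sketch of piping a prompt from the terminal into the chat API. It assumes the `openai` Python package (>= 1.0) and an `OPENAI_API_KEY` environment variable; the model name is just a placeholder.

```python
# minimal_chat.py -- read a prompt from stdin, print the model's reply
import sys

from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

prompt = sys.stdin.read()
response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat-capable model works here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

Then it's literally `echo "summarize my logs" | python minimal_chat.py`.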
TBH I didn’t completely understand the question, but it is clear that you didn’t either.
If you're open to using an open source library, you can use LangCheck to monitor and visualize text quality metrics in production.
For example, you can compute and plot the toxicity of user prompts and LLM responses from your logs. (A very simple example here.)
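As a rough sketch, scoring logged prompts and responses looks something like the following. This assumes `langcheck.metrics.toxicity()` as the entry point; check the docs for the exact API, which may differ between versions.

```python
# Sketch: score logged prompts and responses for toxicity with LangCheck.
# Assumes `pip install langcheck`; the exact API surface may vary by version.
import langcheck

prompts = [
    "How do I reset my password?",
    "You are a useless bot.",
]
responses = [
    "You can reset it from the account settings page.",
    "I'm sorry you feel that way. How can I help?",
]

prompt_toxicity = langcheck.metrics.toxicity(prompts)
response_toxicity = langcheck.metrics.toxicity(responses)

print(prompt_toxicity)    # one toxicity score per text
print(response_toxicity)

# Flag anything above a chosen threshold for manual review
flagged = [p for p, s in zip(prompts, prompt_toxicity.metric_values) if s > 0.5]
print(flagged)
```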
(Disclaimer: I'm one of the contributors of LangCheck)
Just trying to understand the term "observability" here, in the context of LLMs.
What is observed?
Why is it observed?
I would say hallucinations, mostly.
Hi there, Langfuse founder here. We're building observability & analytics in open source (MIT). You can instrument your LLM via our SDKs (JS/TS & Python) or integrations (e.g. LangChain) and collect all the data you want to observe. The product is model-agnostic & customizable.
We've pre-built dashboards you can use to analyze e.g. cost, latency and token usage in detailed breakdowns.
We're now starting to build (model-based) evaluations to get a grip on quality. You can also manually ingest scores via our SDKs, and export everything as .csv or via the GET API.
Would love to hear feedback from folks on this subreddit on what we've built; feel free to message me here or at contact at langfuse dot com.
We have an open demo so you can have a look around a project with sample data.
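Roughly, instrumenting a single call and attaching a score looks something like the sketch below. Method names may differ slightly between SDK versions, so treat it as an illustration rather than copy-paste code.

```python
# Sketch: manually trace one LLM call and attach a quality score with Langfuse.
# Based on the trace / generation / score concepts above; the exact SDK
# surface may differ between versions.
from langfuse import Langfuse  # pip install langfuse

langfuse = Langfuse()  # reads LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY from env

trace = langfuse.trace(name="support-chat", user_id="user-123")

generation = trace.generation(
    name="answer",
    model="gpt-4o-mini",
    input="How do I export my data?",
    output="You can export it from Settings -> Data.",
)

# Manually ingested score, e.g. from a thumbs-up or a model-based eval
trace.score(name="user-feedback", value=1)

langfuse.flush()  # make sure events are sent before the process exits
```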
Hey, we recently rolled out Nebuly, a tool focused on LLM observability in production. Thought it might be of interest to some here, and potentially useful for your needs. Here are some highlights:
- Deep User Analytics: More insightful than thumbs up/down, it delves into LLM user interactions.
- Easy Integration: Simply include our API key and a user_id parameter in your model call (a rough sketch follows below).
- User Journeys: Gain insights into user interactions with LLMs using autocapture.
- FAQ Insights: Identifies the most frequently asked questions by LLM users.
- Cost Monitoring: Strives to find the sweet spot between user satisfaction and ROI.
For a deeper dive, here's our latest blog post on the topic: What is User Analytics for LLMs.
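To give a sense of the "API key + user_id" integration mentioned above, here is a purely illustrative sketch. The package name, `init` call, and `user_id` parameter are assumptions for the example, so check the docs for the real call signature.

```python
# Hypothetical sketch only -- the real Nebuly SDK may look different.
# It illustrates the "API key + user_id in the model call" idea, using the
# pre-1.0 openai client purely for the sake of the example.
import nebuly   # assumed package name
import openai

nebuly.init(api_key="NEBULY_API_KEY")  # assumed initialization call

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Where is my invoice?"}],
    user_id="user-123",  # assumed extra parameter picked up by the SDK
)
print(response["choices"][0]["message"]["content"])
```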
The responses are insane. LLMs are out of control…
I am a data scientist at Fiddler. Fiddler AI (https://www.fiddler.ai) provides a nice set of tools for LLMOps and MLOps observability. It supports pre-production and post-production monitoring for both predictive models and generative AI.
Specifically, Fiddler Auditor (https://github.com/fiddler-labs/fiddler-auditor) is an open source package that can be used to evaluate LLMs and NLP models. In addition to that, Fiddler provides helpful tools for monitoring and visualization of NLP data (e.g. text embeddings), which can be used for data drift detection, user/model feedback analysis, and evaluation of safety metrics as well as custom metrics.
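Independent of Fiddler's own tooling, embedding-based drift detection boils down to something like this generic sketch: embed a reference batch of prompts and a recent production batch, then compare them. `sentence-transformers` and the centroid cosine distance are used purely for illustration; this is not Fiddler's API.

```python
# Generic illustration of embedding-based drift detection (not Fiddler's API).
# Compares the centroid of a reference batch of prompts against a recent
# production batch; a large cosine distance hints at data drift.
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

model = SentenceTransformer("all-MiniLM-L6-v2")

reference_prompts = ["How do I reset my password?", "Where can I download my invoice?"]
recent_prompts = ["Write me a poem about databases", "Explain quantum entanglement"]

ref = model.encode(reference_prompts).mean(axis=0)
cur = model.encode(recent_prompts).mean(axis=0)

cosine_distance = 1 - np.dot(ref, cur) / (np.linalg.norm(ref) * np.linalg.norm(cur))
print(f"centroid cosine distance: {cosine_distance:.3f}")  # near 0 = little drift
```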