Blaed


cross-posted from: https://lemmy.world/post/6399678

🤖 Happy FOSAI Friday! 🚀

Friday, October 6, 2023

HyperTech News Report #0003

Hello Everyone!

This week highlights a wave of new papers and frameworks that expand upon LLM functionalities. With a tsunami of applications on the horizon, I foresee a bedrock of tools preceding them. I'm not sure which kits and processes will end up part of this bedrock, but I hope some of these methods end up interesting or helpful to your workflow!

Table of Contents

Community Changelog

Image of the Week

This image of the week comes from one of my own projects! I hope you don't mind me sharing... I was really happy with this result. It was generated from an SDXL model I trained and host on Replicate. I use a mock ensemble approach to generate various game assets for an experimental roguelike I'm making with a colleague.

My current method is not at all efficient, but I have fun. Right now, I have three SDXL models I interact with, each generating art I can use for my project. Andraxus takes care of wallpapers and in-game levels (this image you're seeing here), his in-game companion Biazera imagines characters and entities of this world, while Cerephelo tinkers and toils over the machinations within - crafting items, loot, powerups, etc.

I've been hesitant to self-promote here, but if there's genuine interest in this project I would be more than happy to share more details. It's still in pre-alpha development, but there are plans to release all of the models we use as open source (obviously). We're still working on the engine, though. Let me know if you want to see more of this project.


News


  1. arXiv Publications Workflow: A new workflow has been introduced that allows users to scrape search topics from arXiv, converting the results into markdown (MD) format. This makes it easier to digest and understand published arXiv content. The tool, available on GitHub, is particularly useful for those who wish to delve deeper into research papers and run their own research processes.

  2. Texting LLMs from Your Phone: A guide has been shared that enables users to communicate with their personal assistants via simple text messages. The process involves setting up a Twilio account, purchasing and registering a phone number, and then integrating it with the Replicate platform. The code, available on GitHub, makes it possible to send and receive messages from LLMs directly on one's phone.

  3. Microsoft's AutoGen: Microsoft has released AutoGen, a tool designed to aid in the creation of autonomous LLM agents. Compatible with ChatGPT models, AutoGen facilitates the development of LLM applications using multiple agents that can converse with each other to solve tasks. The framework is customizable and allows for seamless human participation. More details can be found on GitHub.

  4. Promptbench and ACE Framework: Promptbench is a new project focused on the evaluation and benchmarking of models. Stemming from the DyVal paper, it aims to provide reliable insights into model performance. On the other hand, the ACE Framework, designed for autonomous cognitive entities, offers a unique approach to agent tooling. While still in its early stages, it promises to bring about innovative implementations in the realms of personal assistants, game world NPCs, autonomous employees, and embodied robots.

  5. Research Highlights: Several papers have been published that delve into the intricacies of LLMs. One paper introduces a method to enhance the zero-shot reasoning abilities of LLMs, while another, titled DyVal, proposes a dynamic evaluation protocol for LLMs. Additionally, the concept of Low-Rank Adapters (LoRA) ensembles for LLM fine-tuning has been explored, emphasizing the potential of using one model and dynamically swapping the fine-tuned QLoRA adapters.


Tools & Frameworks


Keep Up w/ arXiv Publications

Due to a drastic change in personal and work schedules, I've had to shift how I research and develop posts and projects for you guys. That being said, I found this workflow from the author of the ACE Framework particularly helpful. It scrapes a search topic from arXiv and returns a massive XML, which is converted to markdown (MD) that can then be used as an injectable context report for an LLM of your choosing (to further break down and understand topics) or as a well of information for the classic CTRL + F search. Either way, you end up with aggregated, human-readable info from published arXiv content.

After reading the abstracts, you can drill further into each paper and dissect or run your own research processes as you see fit. There is definitely more room for automation and organization here I'm sure, but this has been a big resource for me lately, so I wanted to share it with others who might find it helpful too.
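For illustration, here is a minimal sketch of the same idea in Python - not the author's actual tool - using the feedparser package against arXiv's public Atom API (the endpoint is real; the function name and query are my own illustrative choices):

import feedparser
from urllib.parse import quote

def arxiv_to_markdown(topic: str, max_results: int = 10) -> str:
    # arXiv's export API returns Atom XML, which feedparser handles directly
    url = (
        "http://export.arxiv.org/api/query"
        f"?search_query=all:{quote(topic)}&max_results={max_results}"
    )
    feed = feedparser.parse(url)
    sections = [f"# arXiv results for '{topic}'"]
    for entry in feed.entries:
        # one markdown section per paper: title, abstract, link
        sections.append(f"## {entry.title}\n\n{entry.summary.strip()}\n\n{entry.link}")
    return "\n\n".join(sections)

print(arxiv_to_markdown("dynamic evaluation of large language models", max_results=5))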

Text LLMs from Your Phone

I had an itch to make my personal assistants more accessible - so I started investigating ways I could simply text them from my iPhone (via simple SMS). There are many other ways I could've done this, but texting is something I always default to in communications. So, I found this cool guide that uses infra I already prefer (Replicate) and has a bonus LangChain integration - which opens the door to a ton of other opportunities down the line.

This tutorial was pretty straightforward - but to be honest, making the Twilio account and buying (then registering) a phone number took the longest. The code itself takes less than 10 minutes to get up and running with ngrok. Super simple and straightforward there. The Twilio process? Not so much... but it was worth the pain!

I am still waiting on my phone number to be verified (so that the Replicate inference endpoint can actually send SMS back to me), but I ended the night successfully texting the server on my local PC. It was wild texting the Ahsoka example from my phone and seeing the POST response return (even though it didn't go through SMS, I could still see the server successfully receive my incoming message/prompt). I think there's a lot of fun to be had giving casual phone numbers and personalities to assistants like this, especially if you want to LangChain some functions beyond just the conversation. If there's more interest in this topic, I can share how my assistant evolves once it gets full access to return SMS. I am designing this to streamline my personal life, and if it proves to be useful I will absolutely release the project as open source.
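For the curious, the webhook pattern looks roughly like this - a hedged sketch, not the guide's exact code. It assumes the flask, twilio, and replicate packages, a REPLICATE_API_TOKEN in your environment, and a Twilio number whose messaging webhook points at this server (e.g., through ngrok); the model identifier is illustrative:

from flask import Flask, request
from twilio.twiml.messaging_response import MessagingResponse
import replicate

app = Flask(__name__)

@app.route("/sms", methods=["POST"])
def sms_reply():
    prompt = request.form["Body"]  # Twilio posts the SMS text as "Body"
    output = replicate.run(
        "meta/llama-2-70b-chat",   # illustrative model id
        input={"prompt": prompt},
    )
    reply = "".join(output)        # Replicate streams tokens back
    twiml = MessagingResponse()
    twiml.message(reply[:1600])    # SMS bodies are length-limited
    return str(twiml)

if __name__ == "__main__":
    app.run(port=5000)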

AutoGen

With agents on the rise, tools and automation pipelines to build them have become increasingly important to consider. It seems like Microsoft is well aware of this, and has released AutoGen, a tool to help enable this automation tooling and the creation of autonomous LLM agents. AutoGen is compatible with ChatGPT models and is being kitted out for local LLMs as we speak.

AutoGen is a framework that enables the development of LLM applications using multiple agents that can converse with each other to solve tasks. AutoGen agents are customizable, conversable, and seamlessly allow human participation. They can operate in various modes that employ combinations of LLMs, human inputs, and tools.
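The quickstart boils down to a few lines. A minimal two-agent sketch (assumes the pyautogen package and an OPENAI_API_KEY in your environment; the task message is just an example):

from autogen import AssistantAgent, UserProxyAgent

llm_config = {"config_list": [{"model": "gpt-3.5-turbo"}]}  # API key read from the environment

assistant = AssistantAgent("assistant", llm_config=llm_config)
user_proxy = UserProxyAgent(
    "user_proxy",
    human_input_mode="NEVER",                      # fully autonomous; "ALWAYS" keeps you in the loop
    code_execution_config={"work_dir": "coding"},  # where agent-written code runs
)

# The two agents converse back and forth until the task is done or a limit is hit.
user_proxy.initiate_chat(assistant, message="Write and run a script that prints the first 10 primes.")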

Promptbench

I recently found promptbench - a project that seems to have stemmed from the DyVal paper (shared below). I, for one, appreciate the new tools releasing that focus on the evaluation and benchmarking of models. I hope we continue to see more evals, benchmarks, and projects that return insights we can rely upon.

ACE Framework

A new framework has been proposed and designed for autonomous cognitive entities. This appears similar to agents and their style of tooling, but with a different architectural approach. I don't believe an implementation of this is ready yet, but it may be soon, and it is something to keep an eye on.

There are many possible implementations of the ACE Framework. Rather than detail every possible permutation, here is a list of categories that we perceive as likely and viable.

Personal Assistant and/or Companion

  • This is a self-contained version of ACE that is intended to interact with one user.
  • Think of Cortana from HALO, Samantha from HER, or Joi from Blade Runner 2049. (yes, we recognize these are all sexualized female avatars)
  • The idea would be to create something that is effectively a personal Executive Assistant that is able to coordinate, plan, research, and solve problems for you. This could be deployed on mobile, smart home devices, laptops, or web sites.

Game World NPCs

  • This is a kind of game character that has their own personality, motivations, agenda, and objectives. Furthermore, they would have their own unique memories.
  • This can give NPCs a much more realistic ability to pursue their own objectives, which should make game experiences much more dynamic and unpredictable, thus raising novelty. These can be adapted to 2D or 3D game engines such as PyGame, Unity, or Unreal.

Autonomous Employee

  • This is a version of the ACE that is meant to carry out meaningful and productive work inside a corporation.
  • Whether this is a digital CSR or backoffice worker depends on the deployment.
  • It could also be a "digital team member" that primarily interacts via Discord, Slack, or Microsoft Teams.

Embodied Robot

The ACE Framework is ideal for creating self-contained, autonomous machines, whether they are domestic aid robots or something like WALL-E.


Papers


Agent Instructs Large Language Models to be General Zero-Shot Reasoners

We introduce a method to improve the zero-shot reasoning abilities of large language models on general language understanding tasks. Specifically, we build an autonomous agent to instruct the reasoning process of large language models. We show this approach further unleashes the zero-shot reasoning abilities of large language models to more tasks. We study the performance of our method on a wide set of datasets spanning generation, classification, and reasoning. We show that our method generalizes to most tasks and obtains state-of-the-art zero-shot performance on 20 of the 29 datasets that we evaluate. For instance, our method boosts the performance of state-of-the-art large language models by a large margin, including Vicuna-13b (13.3%), Llama-2-70b-chat (23.2%), and GPT-3.5 Turbo (17.0%). Compared to zero-shot chain of thought, our improvement in reasoning is striking, with an average increase of 10.5%. With our method, Llama-2-70b-chat outperforms zero-shot GPT-3.5 Turbo by 10.2%.

DyVal: Graph-informed Dynamic Evaluation of Large Language Models

Large language models (LLMs) have achieved remarkable performance in various evaluation benchmarks. However, concerns about their performance are raised on potential data contamination in their considerable volume of training corpus. Moreover, the static nature and fixed complexity of current benchmarks may inadequately gauge the advancing capabilities of LLMs. In this paper, we introduce DyVal, a novel, general, and flexible evaluation protocol for dynamic evaluation of LLMs. Based on our proposed dynamic evaluation framework, we build graph-informed DyVal by leveraging the structural advantage of directed acyclic graphs to dynamically generate evaluation samples with controllable complexities. DyVal generates challenging evaluation sets on reasoning tasks including mathematics, logical reasoning, and algorithm problems. We evaluate various LLMs ranging from Flan-T5-large to ChatGPT and GPT4. Experiments demonstrate that LLMs perform worse in DyVal-generated evaluation samples with different complexities, emphasizing the significance of dynamic evaluation. We also analyze the failure cases and results of different prompting methods. Moreover, DyVal-generated samples are not only evaluation sets, but also helpful data for fine-tuning to improve the performance of LLMs on existing benchmarks. We hope that DyVal can shed light on the future evaluation research of LLMs.

LoRA ensembles for large language model fine-tuning

Finetuned LLMs often exhibit poor uncertainty quantification, manifesting as overconfidence, poor calibration, and unreliable prediction results on test data or out-of-distribution samples. One approach commonly used in vision for alleviating this issue is a deep ensemble, which constructs an ensemble by training the same model multiple times using different random initializations. However, there is a huge challenge to ensembling LLMs: the most effective LLMs are very, very large. Keeping a single LLM in memory is already challenging enough: keeping an ensemble of e.g. 5 LLMs in memory is impossible in many settings. To address these issues, we propose an ensemble approach using Low-Rank Adapters (LoRA), a parameter-efficient fine-tuning technique. Critically, these low-rank adapters represent a very small number of parameters, orders of magnitude less than the underlying pre-trained model. Thus, it is possible to construct large ensembles of LoRA adapters with almost the same computational overhead as using the original model. We find that LoRA ensembles, applied on its own or on top of pre-existing regularization techniques, gives consistent improvements in predictive accuracy and uncertainty quantification.

There is something to be discovered between LoRA, QLoRA, and ensemble/MoE designs. I am digging into this niche because of an interesting bit I heard from sentdex (if you want to skip to the part I'm talking about, go to 13:58). Around the 15:00 mark he brings up QLoRA adapters (nothing new), but his approach was interesting.

He eventually shares that he is working on a QLoRA ensemble approach with Skunkworks (presumably the open-source AI group of the same name). This confirmed my suspicion. Better yet - he shared his thoughts on how all of this could be done. Watch and support his video for more insights, but the idea boils down to using one model and dynamically swapping the fine-tuned QLoRA adapters. I think this is a highly efficient and underexplored approach, especially in the MoE and ensemble realm of design. If you're reading this and understood anything I said - get to building! This is a seriously interesting idea that could yield positive results. I will share my findings when I find the time to dig into this more.
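To make the idea concrete, here is a hedged sketch of the "one base model, many adapters" pattern using the transformers and peft libraries - my reading of the idea, not sentdex's code, and the adapter repo names are hypothetical placeholders:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-2-7b-hf"
base = AutoModelForCausalLM.from_pretrained(base_id, load_in_4bit=True)  # quantized base, as in QLoRA
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Attach one fine-tuned adapter, then register more against the same frozen base weights.
model = PeftModel.from_pretrained(base, "your-org/coding-qlora", adapter_name="coding")
model.load_adapter("your-org/chat-qlora", adapter_name="chat")

# Swapping is cheap: only the tiny adapter weights change, never the base model.
model.set_adapter("chat")
inputs = tokenizer("Hello!", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))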


Author's Note

This post was authored by the moderator of !fosai@lemmy.world - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now... if you found anything about this post interesting, consider subscribing to !fosai@lemmy.world where you can join us on the journey into the great unknown!

Until next time!

Blaed

 

cross-posted from: https://lemmy.world/post/5965315

🤖 Happy FOSAI Friday! 🚀

Friday, September 29, 2023

HyperTech News Report #0002

Hello Everyone!

Welcome back to the HyperTech News Report! This week we're seeing some really exciting developments in futuristic technologies. With more tools and methods releasing by the day, I feel we're in for a renaissance in software. I hope hardware is soon to follow... but I am here for it! So are you. Brace yourselves. Change is coming! This next year will be very interesting to watch unfold.

Table of Contents

Community Changelog

  • Cleaned up some old content (let me know if you notice something that should be archived or updated)

Image of the Week

This image of the week comes from a DALL-E 3 demonstration by Will Depue. It depicts a popular benchmark image for diffusion models - the astronaut riding a horse in space. Apparently this is hard to get right, and others have had trouble replicating it - but it seems to have been generated by DALL-E 3 nevertheless. Curious to see how it stacks up against other diffusers when it's more widely available.

New Foundation Model!

There have been many new models hitting HuggingFace daily. The recent influx has made it hard to benchmark and keep up with these models - so I will be highlighting a hand-selected, curated few week by week, exploring them with more focus (a few at a time).

If you have any model favorites (or showcase suggestions) let me know what they are in the comments below and I'll add them to the growing catalog!

This week we're taking a look at Mistral - a new foundation model with a sliding window attention mechanism that gives it advantages over other models. Better yet - the mistral.ai team released this new model under the Apache 2.0 license. Massive shoutout to this team; this is huge for anyone who wants more (commercial) options outside of the Llama 2 and Falcon families.

From Mistralai:

The best 7B, Apache 2.0. Mistral-7B-v0.1 is a small, yet powerful model adaptable to many use-cases. Mistral 7B is better than Llama 2 13B on all benchmarks, has natural coding abilities, and 8k sequence length. It's released under Apache 2.0 licence, and we made it easy to deploy on any cloud.
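If you want to kick the tires immediately, here's a quick sketch with HuggingFace transformers (the model id mistralai/Mistral-7B-v0.1 is real; you will need a chunky GPU, or reach for one of TheBloke's quantized builds linked below):

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # requires accelerate

inputs = tokenizer("Sliding window attention lets Mistral", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))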

Learn More

Mistralai

TheBloke (Quantized)

More About GPTQ

More About GGUF

Metaverse Developments

Mark Zuckerberg made his third appearance on the Lex Fridman podcast - but this time, in the updated Metaverse. This is pretty wild. We seem to have officially left uncanny valley territory. There are still clearly bugs and improvements to be made - but imagine the possibilities of this mixed-reality technology (paired with VR LLM applications).

The type of experiences we can begin to explore in these digital realms are going to evolve into things of true sci-fi in our near future. This is all very exciting stuff to look forward to as AI proliferates markets and drives innovation.

What do you think? Zuck looks more human in the metaverse than in real life... mission... success?

Click here for the podcast episode.

NVIDIA NeMo Guardrails

If you haven't heard about NeMo Guardrails, you should check it out. It is a new library and approach for aligning models and adding programmable guardrails to LLM applications. It is similar to LangChain and LlamaIndex, but uses an in-house language from NVIDIA called Colang for configuration, paired with NeMo Guardrails libraries in Python-friendly syntax.

This is still a new and unexplored tool, but could provide some interesting results with some creative applications. It is also particularly powerful if you need to align enterprise LLMs for clients or stakeholders.
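As a rough sketch of what usage looks like (assuming the nemoguardrails package; the ./config directory holding your Colang/YAML rail definitions is a placeholder you author yourself):

from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")  # your colang + yaml rail definitions
rails = LLMRails(config)

response = rails.generate(messages=[
    {"role": "user", "content": "Tell me about NeMo Guardrails."},
])
print(response["content"])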

Learn More

Tutorial Highlights

Mistral 7B - Small But Mighty 🚀 🚀

Chatbots with RAG: LangChain Full Walkthrough

NVIDIA NeMo Guardrails: Full Walkthrough for Chatbots / AI

Author's Note

This post was authored by the moderator of !fosai@lemmy.world - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

If you found anything about this post interesting, consider subscribing to !fosai@lemmy.world where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time.

Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown.

Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse.

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

 

cross-posted from: https://lemmy.world/post/5549499

🤖 Happy FOSAI Friday! 🚀

Friday, September 22, 2023

HyperTech News Report #0001

Hello Everyone!

This series is a new vehicle for !fosai@lemmy.world news reports. In these posts I'll go over projects or news I stumble across week over week. I will try to keep this series consistent on Fridays, covering most of what I've been covering already (but at a regular cadence). For this week, I am going to do my best to catch us up on a few old (and new) hot topics you may or may not have heard about already.

Table of Contents

Community Changelog

Image of the Week

A Stable Diffusion + ControlNet image garnered a ton of attention on social media this last week. This image has brought more recognition to the possibilities of these tools and helps shed a more positive light on the capabilities of generative models.

Read More

Introducing HyperTech

HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

HyperTech Workshop (V0.1.0)

I am excited to announce my technology company: HyperTech. The first project of HyperionTechnologies is a digital workshop that comes in the form of a GitHub repo template for AI/ML/DL developers. HyperTech is a for-fun sci-fi company I started to explore AI development (among other emerging technologies I find curious and interesting). It is a satirical corpo sandbox I have designed around my personal journey inside and outside of !fosai@lemmy.world, with highly experimental projects and workflows. I will be using this company and its setting/narrative/theme to drive some of the future (and totally optional) content of our community. Any tooling, templates, or examples made along the way are entirely for you to learn from or reverse engineer for your own purpose or amusement. I'll be doing a dedicated post on HyperTech later this weekend. Keep your eye out for that if you're curious. The future is now. The future is bright. The future is HYPERION. (Don't take this project too seriously.)

New GGUF Models

Within this last month or so, llama.cpp has begun to standardize a new model format - the .GGUF model - which is much more optimized than its now-legacy (and deprecated) predecessor, GGML. This is a big deal for anyone running GGML models. GGUF is basically superior in all ways. Check out llama.cpp's notes about this change on their official GitHub. I have used a few GGUF models myself and have found them much more performant than any GGML counterpart. TheBloke has already converted many of his older models into this new format (which is compatible with anything utilizing llama.cpp).

More About GGUF:

It is a successor file format to GGML, GGMF and GGJT, and is designed to be unambiguous by containing all the information needed to load a model. It is also designed to be extensible, so that new features can be added to GGML without breaking compatibility with older models. Basically: 1) no more breaking changes; 2) support for non-llama models (falcon, rwkv, bloom, etc.); and 3) no more fiddling around with rope-freq-base, rope-freq-scale, gqa, and rms-norm-eps. Prompt formats could also be set automatically.
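For reference, loading a GGUF file from Python is about this simple with the llama-cpp-python bindings (the model filename is a placeholder for whichever GGUF conversion you download):

from llama_cpp import Llama

llm = Llama(model_path="./models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=2048)
out = llm("Q: What does GGUF improve over GGML? A:", max_tokens=64)
print(out["choices"][0]["text"])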

Falcon 180B

Many of you have probably already heard of this, but Falcon 180B was recently announced - and I haven't covered it here yet so it's worth mentioning in this post. Check out the full article regarding its release here on HuggingFace. Can't wait to see what comes next! This will open up a lot of doors for us to explore.

Today, we're excited to welcome TII's Falcon 180B to HuggingFace! Falcon 180B sets a new state-of-the-art for open models. It is the largest openly available language model, with 180 billion parameters, and was trained on a massive 3.5 trillion tokens using TII's RefinedWeb dataset. This represents the longest single-epoch pretraining for an open model. The dataset for Falcon 180B consists predominantly of web data from RefinedWeb (~85%). In addition, it has been trained on a mix of curated data such as conversations, technical papers, and a small fraction of code (~3%). This pretraining dataset is big enough that even 3.5 trillion tokens constitute less than an epoch.

The released chat model is fine-tuned on chat and instruction datasets with a mix of several large-scale conversational datasets.

‼️ Commercial Usage: Falcon 180b can be commercially used but under very restrictive conditions, excluding any "hosting use". We recommend to check the license and consult your legal team if you are interested in using it for commercial purposes.

You can find the model on the Hugging Face Hub (base and chat model) and interact with the model on the Falcon Chat Demo Space.

LLama 3 Rumors

Speaking of big open-source models - Llama 3 is rumored to be in training or development. Llama 2 was clearly an improvement over its predecessor. I wonder how Llama 3 & 4 will stack up in this race to AGI. I forget that we're still early to this party. At this rate of development, I believe we're bound to see it within the decade.

Meta plans to rival GPT-4 with a rumored free Llama 3:
  • According to an early rumor, Meta is working on Llama 3, which is intended to compete with GPT-4, but will remain largely free under the Llama license.
  • Jason Wei, an engineer associated with OpenAI, has indicated that Meta possesses the computational capacity to train Llama 3 to a level comparable to GPT-4. Furthermore, Wei suggests that the feasibility of training Llama 4 is already within reach.
  • Despite Wei's credibility, it's important to acknowledge the possibility of inaccuracies in his statements or the potential for shifts in these plans.

DALM

I recently stumbled across DALM - a new domain-adapted language modeling toolkit that is supposed to enable a workflow for training a retrieval augmented generation (RAG) pipeline end-to-end. According to their results, the DALM-specific training process leads to much higher response quality in retrieval augmented generation. I haven't had a chance to tinker with this a lot, but I'd keep an eye on it if you're engaging with RAG workflows.

DALM Manifesto:

A great rift has emerged between general LLMs and the vector stores that are providing them with contextual information. The unification of these systems is an important step in grounding AI systems in efficient, factual domains, where they are utilized not only for their generality, but for their specificity and uniqueness. To this end, we are excited to open source the Arcee Domain Adapted Language Model (DALM) toolkit for developers to build on top of our Arcee open source Domain Pretrained (DPT) LLMs. We believe that our efforts will help as we begin next phase of language modeling, where organizations deeply tailor AI to operate according to their unique intellectual property and worldview.

For the first time in the literature, we modified the initial RAG-end2end model (TACL paper, HuggingFace implementation) to work with decoder-only language models like Llama, Falcon, or GPT. We also incorporated the in-batch negative concept alongside the RAG's marginalization to make the entire process efficient.

DALL-E 3

OpenAI announced DALL-E 3, which will have direct native compatibility within ChatGPT. This means users should be able to naturally and semantically iterate over images and features over time, adjusting the output from the same chat interface throughout their conversation. This will enable many users to seamlessly incorporate image diffusion into their chat workflows.

I think this is huge, mostly because it illustrates a new technique that removes some of the barriers that prompt engineers have to solve (it reads prompts differently than other diffusers). Not to mention you are permitted to sell, keep, and commercialize any image DALL-E generates.

I am curious to see if open-source workflows can follow a similar approach and have iterative design workflows that seamlessly integrate with a chat interface. That, paired with manual tooling from things like ControlNet would be a powerful pairing that could spark a lot of creativity. Don't get me wrong, sometimes I really like manual and node-based workflows, but I believe semantic computation is the future. Regardless of how 'open' OpenAI truly is, these breakthroughs help chart the path forward for everyone else still catching up.

More About DALL-E 3:

DALL·E 3 is now in research preview, and will be available to ChatGPT Plus and Enterprise customers in October, via the API and in Labs later this fall. Modern text-to-image systems have a tendency to ignore words or descriptions, forcing users to learn prompt engineering. DALL·E 3 represents a leap forward in our ability to generate images that exactly adhere to the text you provide. DALL·E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph. When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life. If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words.

DALL·E 3 will be available to ChatGPT Plus and Enterprise customers in early October. As with DALL·E 2, the images you create with DALL·E 3 are yours to use and you don't need our permission to reprint, sell or merchandise them.

Author's Note

This post was authored by the moderator of !fosai@lemmy.world - Blaed. I make games, produce music, write about tech, and develop free open-source artificial intelligence (FOSAI) for fun. I do most of this through a company called HyperionTechnologies a.k.a. HyperTech or HYPERION - a sci-fi company.

Thanks for Reading!

If you found anything about this post interesting, consider subscribing to !fosai@lemmy.world where I do my best to keep you informed about free open-source artificial intelligence as it emerges in real-time.

Our community is quickly becoming a living time capsule thanks to the rapid innovation of this field. If you've gotten this far, I cordially invite you to join us and dance along the path to AGI and the great unknown.

Come on in, the water is fine, the gates are wide open! You're still early to the party, so there is still plenty of wonder and discussion yet to be had in our little corner of the digiverse.

This post was written by a human. For other humans. About machines. Who work for humans for other machines. At least for now...

Until next time!

Blaed

 

cross-posted from: https://lemmy.world/post/3879861

Beating GPT-4 on HumanEval with a Fine-Tuned CodeLlama-34B

Hello everyone! This post marks an exciting moment for !fosai@lemmy.world and everyone in the open-source large language model and AI community.

We appear to have a new contender on the block, a model apparently capable of surpassing OpenAI's state-of-the-art GPT-4 in coding evals (evaluations).

This is huge. Not too long ago I made an offhand comment about us catching up to GPT-4 within a year. I did not expect that prediction to become reality in half the time. Let's hope this isn't a one-off scenario and that we see a new wave of open-source models that begin to challenge OpenAI.

Buckle up, it's going to get interesting!

Here are some notes from the blog, which you should visit and read in its entirety:


Blog Post

We have fine-tuned CodeLlama-34B and CodeLlama-34B-Python on an internal Phind dataset that achieved 67.6% and 69.5% pass@1 on HumanEval, respectively. GPT-4 achieved 67% according to their official technical report in March. To ensure result validity, we applied OpenAI's decontamination methodology to our dataset.

The CodeLlama models released yesterday demonstrate impressive performance on HumanEval.

  • CodeLlama-34B achieved 48.8% pass@1 on HumanEval
  • CodeLlama-34B-Python achieved 53.7% pass@1 on HumanEval

We have fine-tuned both models on a proprietary dataset of ~80k high-quality programming problems and solutions. Instead of code completion examples, this dataset features instruction-answer pairs, setting it apart structurally from HumanEval. We trained the Phind models over two epochs, for a total of ~160k examples. LoRA was not used — both models underwent a native fine-tuning. We employed DeepSpeed ZeRO 3 and Flash Attention 2 to train these models in three hours using 32 A100-80GB GPUs, with a sequence length of 4096 tokens.

Furthermore, we applied OpenAI's decontamination methodology to our dataset to ensure valid results, and found no contaminated examples. 

The methodology is:

  • For each evaluation example, we randomly sampled three substrings of 50 characters or used the entire example if it was fewer than 50 characters.
  • A match was identified if any sampled substring was a substring of the processed training example.
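In Python terms, that check reads roughly like this (a simplified sketch of the methodology above, not Phind's actual code):

import random

def is_contaminated(eval_example: str, train_example: str, n: int = 3, k: int = 50) -> bool:
    if len(eval_example) <= k:
        samples = [eval_example]  # use the entire example if it is short
    else:
        pool = range(len(eval_example) - k + 1)
        starts = random.sample(pool, min(n, len(pool)))
        samples = [eval_example[s:s + k] for s in starts]
    # a hit on any sampled substring flags the pair as contaminated
    return any(s in train_example for s in samples)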

For further insights on the decontamination methodology, please refer to Appendix C of OpenAI's technical report. Presented below are the pass@1 scores we achieved with our fine-tuned models:

  • Phind-CodeLlama-34B-v1 achieved 67.6% pass@1 on HumanEval
  • Phind-CodeLlama-34B-Python-v1 achieved 69.5% pass@1 on HumanEval

Download

We are releasing both models on Huggingface for verifiability and to bolster the open-source community. We welcome independent verification of results.


If you get a chance to try either of these models out, let us know how it goes in the comments below!

If you found anything about this post interesting, consider subscribing to !fosai@lemmy.world.

Cheers to the power of open-source! May we continue the fight for optimization, efficiency, and performance.

 

cross-posted from: https://lemmy.world/post/3549390

stable-diffusion.cpp

Introducing stable-diffusion.cpp, a pure C/C++ inference engine for Stable Diffusion! This is a really awesome implementation to help speed up home inference of diffusion models.

Tailored for developers and AI enthusiasts, this repository offers a high-performance solution for creating and manipulating images using various quantization techniques and accelerated inference.


Key Features:

  • Efficient Implementation: Utilizing plain C/C++, it operates seamlessly like llama.cpp and is built on the ggml framework.
  • Multiple Precision Support: Choose between 16-bit, 32-bit float, and 4-bit to 8-bit integer quantization.
  • Optimized Performance: Experience memory-efficient CPU inference with AVX, AVX2, and AVX512 support for x86 architectures.
  • Versatile Modes: From original txt2img to img2img modes and negative prompt handling, customize your processing needs.
  • Cross-Platform Compatibility: Runs smoothly on Linux, Mac OS, and Windows.

Getting Started

Cloning, building, and running are made simple, and detailed examples are provided for both text-to-image and image-to-image generation. With an array of options for precision and comprehensive usage guidelines, you can easily adapt the code for your specific project requirements.

git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp

If you have already cloned the repository, you can update it to the latest code with:

cd stable-diffusion.cpp
git pull origin master
git submodule update

More Details

  • Plain C/C++ implementation based on ggml, working in the same way as llama.cpp
  • 16-bit, 32-bit float support
  • 4-bit, 5-bit and 8-bit integer quantization support
  • Accelerated memory-efficient CPU inference
    • Only requires ~2.3GB when using txt2img with fp16 precision to generate a 512x512 image
  • AVX, AVX2 and AVX512 support for x86 architectures
  • Original txt2img and img2img mode
  • Negative prompt
  • stable-diffusion-webui style tokenizer (not all the features, only token weighting for now)
  • Sampling method
    • Euler A
  • Supported platforms
    • Linux
    • Mac OS
    • Windows

This is a really exciting repo. I'll be honest, I don't think I'm as well versed in what's going on for diffusion inference - but I do know more efficient and effective methods for running those models are always welcomed by people who frequently use diffusers, especially those who need to multi-task and maintain performance headroom.

 

cross-posted from: https://lemmy.world/post/3350022

Incognito Pilot: The Next-Gen AI Code Interpreter for Sensitive Data

Hello everyone! Today marks the first day of a new series of posts featuring projects in my GitHub Stars.

Most of these repos are FOSS & FOSAI focused, meaning they should be hackable, free, and (mostly) open-source.

We're going to kick this series off by sharing Incognito Pilot. It’s like the ChatGPT Code Interpreter but for those who prioritize data privacy.

Project Summary from ChatGPT-4:

Features:

  • Powered by Large Language Models like GPT-4 and Llama 2.
  • Run code and execute tasks with Python interpreter.
  • Privacy: Interacts with cloud but sensitive data stays local.
  • Local or Remote: Choose between local LLMs (like Llama 2) or API (like GPT-4) with data approval mechanism.

You can use Incognito Pilot to:

  • Analyse data, create visualizations.
  • Convert files, e.g., video to gif.
  • Internet access for tasks like downloading data.

Incognito Pilot ensures data privacy while leveraging GPT-4's capabilities.

Getting Started:

  1. Installation:

    • Use Docker (For Llama 2, check dedicated installation).
    • Create a folder for Incognito Pilot to access. Example: /home/user/ipilot.
    • Have an OpenAI account & API key.
    • Use the provided docker command to run.
    • Access via: http://localhost:3030
    • Bonus: Works with OpenAI's free trial credits (For GPT-3.5).
  2. First Steps:

    • Chat with the interface: Start by saying "Hi".
    • Get familiar: Command it to print "Hello World".
    • Play around: Make it create a text file with numbers.

Notes:

  • Only the data you enter and the code results you approve are sent to cloud APIs.
  • All code execution happens locally.
  • Advanced users can customize Python interpreter packages for added functionalities.

FAQs:

  • Comparison with ChatGPT Code Interpreter: Incognito Pilot offers a balance between privacy and functionality. It allows internet access, and can be run on powerful machines for larger tasks.

  • Why use Incognito Pilot over just ChatGPT: Multi-round code execution, tons of pre-installed dependencies, and a sandboxed environment.

  • Data Privacy with Cloud APIs: Your core data remains local. Only meta-data approved by you gets sent to the API, ensuring a controlled and conscious usage.


Personally, my only concern using ChatGPT has always been about data privacy. This explores an interesting way to solve that while still getting the state of the art performance that OpenAI has managed to maintain (so far).

I am all for these pro-privacy projects. I hope to see more emerge!

If you get a chance to try this, let us know your experience in the comments below!


Links from this Post

 

cross-posted from: https://lemmy.world/post/2219010

Hello everyone!

We have officially hit 1,000 subscribers! How exciting!! Thank you for being a member of !fosai@lemmy.world. Whether you're a casual passerby, a hobby technologist, or an up-and-coming AI developer - I sincerely appreciate your interest and support in a future that is free and open for all.

It can be hard to keep up with the rapid developments in AI, so I have decided to pin this at the top of our community to be a frequently updated LLM-specific resource hub and model index for all of your adventures in FOSAI.

The ultimate goal of this guide is to become a gateway resource for anyone looking to get into free open-source AI (particularly text-based large language models). I will be doing a similar guide for image-based diffusion models soon!

In the meantime, I hope you find what you're looking for! Let me know in the comments if there is something I missed so that I can add it to the guide for everyone else to see.


Getting Started With Free Open-Source AI

Have no idea where to begin with AI / LLMs? Try starting with our Lemmy Crash Course for Free Open-Source AI.

When you're ready to explore more resources see our FOSAI Nexus - a hub for all of the major FOSS & FOSAI on the cutting/bleeding edges of technology.

If you're looking to jump right in, I recommend downloading oobabooga's text-generation-webui and installing one of the LLMs from TheBloke below.

Try both GGML and GPTQ variants to see which model type performs to your preference. See the hardware table to get a better idea on which parameter size you might be able to run (3B, 7B, 13B, 30B, 70B).

8-bit System Requirements

Model     | VRAM Used | Minimum Total VRAM | Card Examples          | RAM/Swap to Load*
LLaMA-7B  | 9.2GB     | 10GB               | 3060 12GB, 3080 10GB   | 24 GB
LLaMA-13B | 16.3GB    | 20GB               | 3090, 3090 Ti, 4090    | 32 GB
LLaMA-30B | 36GB      | 40GB               | A6000 48GB, A100 40GB  | 64 GB
LLaMA-65B | 74GB      | 80GB               | A100 80GB              | 128 GB

4-bit System Requirements

Model     | Minimum Total VRAM | Card Examples                                             | RAM/Swap to Load*
LLaMA-7B  | 6GB                | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060               | 6 GB
LLaMA-13B | 10GB               | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000        | 12 GB
LLaMA-30B | 20GB               | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 | 32 GB
LLaMA-65B | 40GB               | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000           | 64 GB

*System RAM (not VRAM) is utilized to initially load a model. You can use swap space if you do not have enough RAM to support your LLM.

When in doubt, try starting with 3B or 7B models and work your way up to 13B+.
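The tables above boil down to simple arithmetic: weight memory is roughly parameter count times bytes per parameter, plus runtime overhead. A quick back-of-the-envelope helper:

def approx_weight_gb(params_billion: float, bits: int) -> float:
    # 1e9 params * (bits / 8) bytes per param is roughly that many GB
    return params_billion * bits / 8

# A 13B model: ~13 GB at 8-bit, ~6.5 GB at 4-bit - the tables add headroom
# for activations, context cache, and framework overhead.
print(approx_weight_gb(13, 8), approx_weight_gb(13, 4))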

FOSAI Resources

Fediverse / FOSAI

LLM Leaderboards

LLM Search Tools


Large Language Model Hub

Download Models

oobabooga

text-generation-webui - a big community favorite gradio web UI by oobabooga designed for running almost any free, open-source large language model downloaded from HuggingFace, including (but not limited to) LLaMA, llama.cpp, GPT-J, Pythia, OPT, and many others. Its goal is to become the AUTOMATIC1111/stable-diffusion-webui of text generation. It is highly compatible with many formats.

Exllama

A standalone Python/C++/CUDA implementation of Llama for use with 4-bit GPTQ weights, designed to be fast and memory-efficient on modern GPUs.

gpt4all

Open-source assistant-style large language models that run locally on your CPU. GPT4All is an ecosystem to train and deploy powerful and customized large language models that run locally on consumer-grade processors.
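A taste of the Python bindings (the model name is illustrative and is downloaded on first use):

from gpt4all import GPT4All

model = GPT4All("ggml-gpt4all-j-v1.3-groovy")  # illustrative model name
print(model.generate("Name three uses for a local LLM.", max_tokens=100))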

TavernAI

The original branch of software SillyTavern was forked from. This chat interface offers very similar functionality but has fewer cross-client compatibilities with other chat and API interfaces (compared to SillyTavern).

SillyTavern

Developer-friendly, Multi-API (KoboldAI/CPP, Horde, NovelAI, Ooba, OpenAI+proxies, Poe, WindowAI(Claude!)), Horde SD, System TTS, WorldInfo (lorebooks), customizable UI, auto-translate, and more prompt options than you'd ever want or need. Optional Extras server for more SD/TTS options + ChromaDB/Summarize. Based on a fork of TavernAI 1.2.8

Koboldcpp

A self-contained distributable from Concedo that exposes llama.cpp function bindings, allowing it to be used via a simulated Kobold API endpoint. What does it mean? You get llama.cpp with a fancy UI, persistent stories, editing tools, save formats, memory, world info, author's note, characters, scenarios, and everything Kobold and Kobold Lite have to offer. In a tiny package around 20 MB in size, excluding model weights.

KoboldAI-Client

This is a browser-based front-end for AI-assisted writing with multiple local & remote AI models. It offers the standard array of tools, including Memory, Author's Note, World Info, Save & Load, adjustable AI settings, formatting options, and the ability to import existing AI Dungeon adventures. You can also turn on Adventure mode and play the game like AI Dungeon Unleashed.

h2oGPT

h2oGPT is a large language model (LLM) fine-tuning framework and chatbot UI with document question-answer capabilities. Documents help to ground LLMs against hallucinations by providing context relevant to the instruction. h2oGPT is a fully permissive Apache V2 open-source project for 100% private and secure use of LLMs and document embeddings for document question-answer.


Models

The Bloke

The Bloke is a developer who frequently releases quantized (GPTQ) and optimized (GGML) open-source, user-friendly versions of AI Large Language Models (LLMs).

These conversions of popular models can be configured and installed on personal (or professional) hardware, bringing bleeding-edge AI to the comfort of your home.

Support TheBloke here.


70B


30B


13B


7B


More Models


GL, HF!

Are you an LLM Developer? Looking for a shoutout or project showcase? Send me a message and I'd be more than happy to share your work and support links with the community.

If you haven't already, consider subscribing to the free open-source AI community at !fosai@lemmy.world where I will do my best to make sure you have access to free open-source artificial intelligence on the bleeding edge.

Thank you for reading!

 

cross-posted from: https://lemmy.world/post/135600

For anyone following the AI space - this is pretty cool - especially since AMD has fallen behind NVIDIA and its CUDA ecosystem.

(full article for convenience)

Hugging Face and AMD partner on accelerating state-of-the-art models for CPU and GPU platforms

Whether language models, large language models, or foundation models, transformers require significant computation for pre-training, fine-tuning, and inference. To help developers and organizations get the most performance bang for their infrastructure bucks, Hugging Face has long been working with hardware companies to leverage acceleration features present on their respective chips.

Today, we're happy to announce that AMD has officially joined our Hardware Partner Program. Our CEO Clement Delangue gave a keynote at AMD's Data Center and AI Technology Premiere in San Francisco to launch this exciting new collaboration.

AMD and Hugging Face work together to deliver state-of-the-art transformer performance on AMD CPUs and GPUs. This partnership is excellent news for the Hugging Face community at large, which will soon benefit from the latest AMD platforms for training and inference.

The selection of deep learning hardware has been limited for years, and prices and supply are growing concerns. This new partnership will do more than match the competition and help alleviate market dynamics: it should also set new cost-performance standards.

Supported hardware platforms

On the GPU side, AMD and Hugging Face will first collaborate on the enterprise-grade Instinct MI2xx and MI3xx families, then on the customer-grade Radeon Navi3x family. In initial testing, AMD recently reported that the MI250 trains BERT-Large 1.2x faster and GPT2-Large 1.4x faster than its direct competitor.

On the CPU side, the two companies will work on optimizing inference for both the client Ryzen and server EPYC CPUs. As discussed in several previous posts, CPUs can be an excellent option for transformer inference, especially with model compression techniques like quantization.

Lastly, the collaboration will include the Alveo V70 AI accelerator, which can deliver incredible performance with lower power requirements.

Supported model architectures and frameworks

We intend to support state-of-the-art transformer architectures for natural language processing, computer vision, and speech, such as BERT, DistilBERT, ROBERTA, Vision Transformer, CLIP, and Wav2Vec2. Of course, generative AI models will be available too (e.g., GPT2, GPT-NeoX, T5, OPT, LLaMA), including our own BLOOM and StarCoder models. Lastly, we will also support more traditional computer vision models, like ResNet and ResNext, and deep learning recommendation models, a first for us.

We'll do our best to test and validate these models for PyTorch, TensorFlow, and ONNX Runtime for the above platforms. Please remember that not all models may be available for training and inference for all frameworks or all hardware platforms.

The road ahead

Our initial focus will be ensuring the models most important to our community work great out of the box on AMD platforms. We will work closely with the AMD engineering team to optimize key models to deliver optimal performance thanks to the latest AMD hardware and software features. We will integrate the AMD ROCm SDK seamlessly in our open-source libraries, starting with the transformers library.

Along the way, we'll undoubtedly identify opportunities to optimize training and inference further, and we'll work closely with AMD to figure out where to best invest moving forward through this partnership. We expect this work to lead to a new Optimum library dedicated to AMD platforms to help Hugging Face users leverage them with minimal code changes, if any.

Conclusion

We're excited to work with a world-class hardware company like AMD. Open-source means the freedom to build from a wide range of software and hardware solutions. Thanks to this partnership, Hugging Face users will soon have new hardware platforms for training and inference with excellent cost-performance benefits. In the meantime, feel free to visit the AMD page on the Hugging Face hub. Stay tuned!

Blaed@lemmy.world · 1 year ago

FWIW, it's a new term I am trying to coin in FOSS communities (free, open-source software communities). It's a spin-off of 'FOSS', but for AI.

There's literally nothing wrong with FOSS as an acronym; I just wanted to use one more focused on AI tech to set the right expectations for everything shared in /c/FOSAI.

I felt it was a term worth coining given the varied requirements and dependencies AI/LLMs tend to have compared to typical FOSS stacks. Making this differentiation is important in some of the semantics these conversations carry.

Blaed@lemmy.world · 1 year ago

Great suggestions! I've actually never interfaced with that first channel (SECourses). Looks like some solid tutorials. Definitely going to check that out. Thanks for sharing!

Blaed@lemmy.world · 1 year ago

Lol, you had me in the first half not gonna lie. Well done, you almost fooled me!

Glad you had some fun! gpt4all is by far the easiest to get going with imo.

I suggest trying any of the GGML models if you haven't already! They outperform almost every other model format at the moment.

If you're looking for more models, TheBloke and KoboldAI are doing a ton for the community in this regard. Eric Hartford, too. Although TheBloke is typically the one who converts these into more accessible formats for the masses.

Blaed@lemmy.world · 1 year ago

Thank you! I appreciate the kind words. Please consider subscribing to /c/FOSAI if you want to stay in the loop with the latest and greatest news for AI.

This stuff is developing at breakneck speeds. Very excited to see what the landscape will look like by the end of this year.

 

cross-posted from: https://lemmy.world/post/76020

Greetings Reddit Refugees!

I hope your migration is going well! If you haven't been here before, Welcome to FOSAI! Your new Lemmy landing page for all things artificial intelligence.

This is a follow-up post to my first Welcome Message

Here I will share insights and instructions on how to set up some of the tools and applications in the aforementioned AI Suite.

Please note that I did not develop any of these, but I do have each one of them working on my local PC, which I interface with regularly. I plan to do posts exploring each piece of software in detail - but first - let's get a better idea of what we're working with.

As always, please don't hesitate to comment or share your thoughts if I missed something (or you want to understand or see a concept in more detail). 

Getting Started with FOSAI

What is oobabooga?

How-To-Install-oobabooga

In short, oobabooga is a free and open-source web client someone (oobabooga) made to interface with HuggingFace LLMs (large language models). As far as I understand, this is the current standard for many AI tinkerers and those who wish to run models locally. This client allows you to easily download, configure, and chat with text-based models that behave like ChatGPT; however, not all models on HuggingFace match ChatGPT's quality out of the box. Many require 'fine-tuning' or 'training' to produce consistent, coherent results. The benefit of using HuggingFace (instead of ChatGPT) is that you have many more options for your AI model, including the choice between censored or uncensored versions, untrained or pre-trained ones, etc. Oobabooga is an interface that lets you do all of this (theoretically), but it can have a bit of a learning curve if you don't know anything about AI/LLMs.
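
To give you a rough idea of what clients like oobabooga are doing under the hood, here's a minimal sketch that pulls a model from the HuggingFace Hub and generates text with the transformers library (using a small stand-in model; real chat models are much larger):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small stand-in model; chat-oriented models follow the same pattern.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, I am a local language model and", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```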

What is gpt4all?

How-To-Install-gpt4all

gpt4all is the closest thing you can currently download to a ChatGPT-style interface that is compatible with some of the latest open-source LLMs available to the community. Models can be downloaded in quantized formats, unquantized formats, and base formats (which typically run GPU-only), but a newer format is emerging (GGML) that enables combined GPU + CPU compute. This GGML format seems to be the growing standard for consumer-grade hardware. Some prefer the user experience of gpt4all over oobabooga, and some feel the exact opposite. For me - I prefer the options oobabooga provides - so I use that as my 'daily driver' while gpt4all is a backup client I run for other tests.
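
gpt4all also ships Python bindings if you'd rather skip the GUI; a minimal sketch, assuming the gpt4all package is installed (the model filename below is just one of the options the project lists and gets downloaded on first use):

```python
from gpt4all import GPT4All

# One of the models the project lists; downloaded automatically on first use.
model = GPT4All("orca-mini-3b.ggmlv3.q4_0.bin")
print(model.generate("Name three uses for a local LLM.", max_tokens=100))
```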

What is Koboldcpp?

How-To-Install-Koboldcpp

Koboldcpp, like oobabooga and gpt4all, is another web-based interface you can run to chat with LLMs locally. It enables GGML inference, which can be hard to get running on oobabooga depending on the version of your client and updates from the developer. Koboldcpp, however, comes from a totally different platform and team of developers who typically focus on the roleplaying side of generative AI and LLMs. Koboldcpp feels more like NovelAI than anything I've run locally, and has similar functionality and vibes to AI Dungeon. In fact, you can download some of the same models and settings they use to emulate something very similar (but 100% local, assuming you have capable hardware).
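
Once Koboldcpp is running, it exposes a KoboldAI-compatible HTTP API you can script against; a minimal sketch, assuming an instance is already serving on its default port (5001):

```python
import requests

# Assumes a local Koboldcpp instance on its default port (5001).
payload = {"prompt": "You enter the dungeon and", "max_length": 80}
resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=120)
print(resp.json()["results"][0]["text"])
```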

What is TavernAI?

How-To-Install-TavernAI

TavernAI is a customized web client that seems as functional as gpt4all in most regards. You can use TavernAI to connect to Kobold's API - or insert your own OpenAI API key to talk with GPT-3 (and GPT-4 if you have API access).
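
For reference, the OpenAI side of that setup boils down to a call like this; a minimal sketch using the openai Python package (the 0.x-era API current as of this writing), assuming your key is exported as OPENAI_API_KEY:

```python
import os
import openai

# Assumes your key is exported as OPENAI_API_KEY in the environment.
openai.api_key = os.environ["OPENAI_API_KEY"]
resp = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Say hello to the FOSAI community."}],
)
print(resp["choices"][0]["message"]["content"])
```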

What is Stable Diffusion?

How-To-Install-StableDiffusion (Automatic1111)

Stable Diffusion is a groundbreaking and popular AI model that enables text-to-image generation. When someone mentions "Stable Diffusion", people tend to picture Automatic1111's UI/UX, the same interface oobabooga drew inspiration from. This UI/UX has become the de facto standard for almost all Stable Diffusion workflows. Fun factoid - it is widely believed MidJourney is a highly tuned version of a Stable Diffusion model, but one whose weights, LoRAs, and configurations were made closed-source after training and alignment.
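
If you'd rather script it than click through a web UI, here's a minimal sketch using the diffusers library; it assumes a capable GPU and the diffusers + torch packages installed:

```python
import torch
from diffusers import StableDiffusionPipeline

# Assumes a CUDA- or ROCm-capable GPU with enough VRAM for fp16 weights.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

image = pipe("a cozy roguelike dungeon, pixel art").images[0]
image.save("dungeon.png")
```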

What is ControlNet?

How-To-Install-ControlNet

ControlNet is a way to manually steer Stable Diffusion models, giving you complete control over your generative AI workflow. The best example of what it is (and what it can do) can be seen in this video. Notice how it combines an array of tools you can use as pre-processors for your prompts, enhancing the composition of your image by giving you options to bring out any detail you wish to manifest.
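
As a rough illustration, here's a minimal sketch of a ControlNet pass via the diffusers library rather than the Automatic1111 extension; it assumes you've already produced an edge map (the edges.png file below is hypothetical):

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Canny-edge ControlNet conditioning a standard SD 1.5 checkpoint.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

edges = Image.open("edges.png")  # hypothetical pre-processed control image
image = pipe("a stone castle at dusk", image=edges).images[0]
image.save("castle.png")
```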

What is TemporalKit?

How-To-Install-TemporalKit

This is another Stable Diffusion extension that allows you to create custom videos using generative AI. In short, it takes an input video and chops it into dozens (or hundreds) of frames that can then be batch-edited with Stable Diffusion, producing new key frames and sequences that are stitched back together with EbSynth, resulting in a stylized video generated and edited according to your Stable Diffusion prompt/workflow.
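
The frame-extraction step at the heart of that workflow is easy to reproduce yourself; a minimal sketch using OpenCV directly (not TemporalKit's own code), with a hypothetical input.mp4 as the source:

```python
import os
import cv2

# Chop a source video into individual frames for batch-editing with SD.
os.makedirs("frames", exist_ok=True)
cap = cv2.VideoCapture("input.mp4")  # hypothetical source video

frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"frames/frame_{frame_idx:05d}.png", frame)
    frame_idx += 1
cap.release()
```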

Where to Start?

Unsure where to begin? Do you have no idea what you're doing? Or have paralysis by analysis? That's okay, we've all been there.

Start small, don't install everything at once; instead, ask yourself what sounds like the most fun. Pick one of the tools I've mentioned above and spend as much time as you need to get it working. This work takes patience, cultivation, and motion. The first two (patience, cultivation) typically take the longest to develop.

If you end up at your wit's end installing or troubleshooting these tools - remind yourself this is bleeding edge artificial intelligence technology. It shouldn't be easy in these early phases. The good news is I have a strong feeling it will become easier than any of us could imagine over time. If you cannot get something working, consider posting your issue here with information regarding your problem.

To My Esteemed Lurkers...

If you're a lurker (like I used to be), please consider taking a popcorn break and stepping out of your shell, making a post, and asking questions. This is a safe space to share your results and interests with AI - or make a post about your epic project or goal. All progress is welcome here, all conversations about this tech are fair and waiting to be discussed.

Over the course of this next week I will continue releasing general information to catch this community up to some of its more established counterparts.

Consider subscribing if you liked the content of this post or want to stay in the loop with Free, Open-Source Artificial Intelligence.

 

cross-posted from: https://lemmy.world/post/67758

Hello World!

Welcome to your new Lemmy landing page for AI news, software, applications, research, and development updates regarding all FOSS (free open-source software) designed for AI (FOSAI).  The goal of this community is to be a mixture of r/LocalLLaMA, r/oobabooga, and r/StableDiffusion with the intention of spreading awareness and accessibility for all free open-source artificial intelligence applications, tools, platforms, and strategies from the bleeding edges of AI.

Members of this community should feel welcome to treat this place as another casual discussion hub for the kinds of concepts and topics r/Singularity and r/Futurology used to chat about, but the sentiment I want to encourage here is one of optimism and creative empowerment, discovery and exploration.

Whether you like it or not - AI is here (and it's here to stay). What that means, what you do with it, (and why) is completely up to you to figure out within your own life and aspirations. If you want a second opinion about something, let's have a discussion about it with intellectual empathy and open-minded kindness.

Know that I don't have the time and energy for moderating 'me vs you' narratives - or pushing the doom and gloom sentiment that this emerging technology typically gets a bad rap for. I understand the risks we're facing; I leave it up to you to come to your own conclusions and live your life accordingly. At the end of the day, I'm here to help facilitate honest and open discussion, and to share the joy of technology in whatever shape or form it presents itself to me. At the end of this information transaction, I hope you retain a small piece that you find interesting and use it to brighten a life, whether yours or someone else's. 

Those of our community are focused on the silver linings of life, the excitement of our future, the hope of a brighter tomorrow. We are not afraid of anything but the failure of compassion and humanity in its smallest, most defining moments. To innovate is to think, to think is to live. Be mindful of what you say. Be mindful of what you do. Stay patient with others and cultivate a trust within yourself that never forgets to put thoughts into motion. 

I want everyone to know that I am far from an expert in Machine Learning (and Deep Learning in general). I don't know everything, but I teach myself what I can. That being said, if you know something about AI, LLMs, Python, JSON, data processing, data engineering, neural networks, machine learning, deep learning, full-stack development, front-end development, or any DevOps / IaC practices - please consider subscribing and contributing to the discussion.

Together perhaps we can unite with other FOSS communities to usher in an era of FOSAI that evolves with the rapidly emerging ideas filling this sector. Or perhaps we remain a niche corner of AI development and discussion. Regardless - all technologists are welcome here, from engineers to data scientists, to hobbyists, and complete newbies.

I will do my best to keep everyone here in the loop with the latest and greatest news to hit the space. At the same time, you should voice your opinion if I am doing something wrong, or if you want to be heard.

The more like-minded individuals we gather here the more chances we will have to shape what AI will look like for generations to come. If you know anything about AI, LLMs, or Machine/Deep Learning in general, please consider subbing and contributing to the discussion by sharing your project and/or subject approach!

The Bleeding Edge

The applications I am about to recommend are limited to the suite of AI tools I've collected over the last few months and the knowledge I've accumulated up to this point (June 2023). If you notice something missing that is worth sharing, please by all means don't hesitate to add a link or share your thoughts in the comments. I will be expanding on these tools in a subsequent post, Getting Started with FOSAI / Your Lemmy Crash Course to Artificial Intelligence.

My AI Suite / Workshop / Tech Stack

Generative Text

Generative Image & Video

Hosting

  • Runpod (Rent-a-Server / GPU)
  • Self-Hosting [Windows/Linux/MacOS] (GPU, CPU, RAM)