this post was submitted on 14 Nov 2023

Machine Learning

https://higgsfield.ai
We have a massive GPU cluster and have developed our own infrastructure to manage it and train very large models.

Here's how it works:

  1. You upload your dataset, in the required format, to HuggingFace [1] (a minimal sketch of this step follows the list below).
  2. Choose your LLM (e.g. LLaMA 70B, Mistral 7B).
  3. Place your submission in the queue.
  4. Wait for it to be trained.
  5. Get your trained model back on HuggingFace.
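
For step 1, here is a minimal sketch of what pushing a dataset to the HuggingFace Hub can look like. The record schema and the repo name below are placeholders, not the exact format Higgsfield expects; see the tutorial in [1] for that.

    # Minimal sketch: build a small instruction-style dataset and push it
    # to the HuggingFace Hub. Column names and the repo id are illustrative
    # placeholders; the required schema is described in [1].
    from datasets import Dataset

    records = [
        {"prompt": "Summarize: ...", "completion": "..."},
        {"prompt": "Translate to French: Good morning.", "completion": "Bonjour."},
    ]

    ds = Dataset.from_list(records)
    ds.push_to_hub("your-username/your-finetune-dataset")  # requires `huggingface-cli login` first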

So why do we want to do this?

  1. We already have experience training large LLMs.
  2. Our infrastructure gets near-peak performance out of the hardware during training.
  3. Sometimes our GPUs simply have nothing to train.

So we thought it would be cool to keep our GPU cluster at 100% utilization and give something back to the open-source community (we have already built an end-to-end distributed training framework [2]).

This is at an early stage, so expect some bugs.

Any thoughts, opinions, or ideas are more than welcome!

[1]: https://github.com/higgsfield-ai/higgsfield/blob/main/tutori...

[2]: https://github.com/higgsfield-ai/higgsfield

top 14 comments
[–] 0zyman23@alien.top 1 points 10 months ago

Wow, you guys are the best. Could you also add an estimated time for my run to start? I'm wondering whether I'll get something back in a reasonable time, but the mere fact that things like this exist is great.

[–] 0zyman23@alien.top 1 points 10 months ago

Giving away their GPUs for free: this is some IQ 200 stuff.

[–] ginger_turmeric@alien.top 1 points 10 months ago (1 children)

Do you allow training of other sorts of models? I want to train a TTS model.

[–] higgsfield_ai@alien.top 1 points 10 months ago

We support only large models (starting from 7B).

[–] badabummbadabing@alien.top 1 points 10 months ago (1 children)

By 'training', I assume you mean fine-tuning or LoRA?

[–] higgsfield_ai@alien.top 1 points 10 months ago (1 children)

We only do full fine-tunes.

[–] light24bulbs@alien.top 1 points 10 months ago (2 children)

Are you having good luck with adding knowledge to the model? I tried this with LLaMA for a couple of weeks when things were just getting going, and I just could not find good hyperparameters for fine-tuning. I was also using LoRA, so... I don't know.

[–] Thistleknot@alien.top 1 points 10 months ago
[–] higgsfield_ai@alien.top 1 points 10 months ago

In our experience, to get very good results you need:

  1. A high-quality dataset. It's worth spending extra time on data cleaning: a smaller dataset of high-quality examples is far better than a huge dataset full of garbage.

  2. A full fine-tune (a rough sketch of what that looks like follows below).
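
A full fine-tune means updating all of the base model's weights, as opposed to training low-rank adapters (LoRA). As a rough, generic sketch using the HuggingFace Trainer (this is not Higgsfield's pipeline; the model name, dataset repo, and hyperparameters are placeholders, and a 7B+ model won't fit on a single consumer GPU without sharding or offloading):

    # Rough sketch of a *full* fine-tune: every parameter is trainable, no LoRA.
    # Model, dataset, and hyperparameters are illustrative placeholders.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    model_name = "mistralai/Mistral-7B-v0.1"          # placeholder base model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    ds = load_dataset("your-username/your-finetune-dataset", split="train")
    ds = ds.map(lambda ex: tokenizer(ex["prompt"] + ex["completion"],
                                     truncation=True, max_length=1024))

    args = TrainingArguments(
        output_dir="full-finetune",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        num_train_epochs=1,
        learning_rate=1e-5,   # full fine-tunes usually need a much lower LR than LoRA
        bf16=True,
    )

    Trainer(
        model=model,
        args=args,
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()

In practice a run like this gets scaled out across many GPUs with something like FSDP or DeepSpeed, which is the part the queue and the framework in [2] are meant to take off your hands.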

[–] yashdes@alien.top 1 points 10 months ago (1 children)

Don't leave us hanging, what does the cluster look like? (ignore if you're not allowed to share, but I'm a gigantic hardware nerd)

[–] 0zyman23@alien.top 1 points 10 months ago

In terms of capacity, nothing crazy. It's probably a standard H100 or A100 cluster, 32 or 64 GPUs.

[–] MrEloi@alien.top 1 points 10 months ago

Why are you hiding who you are, how many GPUs you have ... and whether you have legal access to them?

[–] kalakau@alien.top 1 points 10 months ago

What's with the tendency for software engineers to name their libraries after fundamental physics? As a physicist, this has always bothered me. I'll search for numerical algorithms for doing real physics... and end up with some garbage blockchain app or a Rust crate that does nothing.
