this post was submitted on 27 Nov 2023


[D] Do you obsessively watch your models train? (alien.top)

submitted 2 years ago by TehDing@alien.top to c/machinelearning@academy.garden

60 comments

I find myself watching TensorBoard more than working. Just wondering if others who have fallen into this pattern have words of advice wrt productivity.

[–] LawfulnessOdd5872@alien.top 1 points 2 years ago

Me right now

[–] 3DHydroPrints@alien.top 1 points 2 years ago

Yes

[–] timo_kk@alien.top 1 points 2 years ago (7 children)

I mark the expected duration of my experiments in both a Google Calendar and a project journal. That way, I know when it's "time" to check in on the runs.

I also use Weights & Biases, which sends me a mail if something crashes so I can check up when I really need to.

Curve watching is just a waste of time; you should train yourself to get out of the habit even if it's difficult for you.

[–] deepneuralnetwork@alien.top 1 points 2 years ago

Oooh, the calendar idea is a great tip. I’m going to borrow this from you.

[–] maleits_gavatxos@alien.top 1 points 2 years ago (2 children)

If people wrote documentation instead of watching the training progress chart get updated: *flying cars*

[–] LtFr0st@alien.top 1 points 2 years ago

lol

[–] Weird-Field6128@alien.top 1 points 2 years ago

I am stealing this, sorry

[–] LoyalSol@alien.top 1 points 2 years ago (1 children)

Curve watching in the initial steps of training is important, but once you get stable behavior, it's time to go to the kitchen, grab some coffee, and do something else with the rest of the day.

[–] lumin0va@alien.top 1 points 2 years ago

No it isn’t. Has anyone here run more than one experiment at a time? Clearly not, because you can’t curve watch hundreds of runs at the same time. Anything that can be achieved by curve watching can be easily automated.
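
As a rough illustration of automating what curve watching would catch, here is a minimal sketch that classifies a run from its loss history. The function name `check_run` and the thresholds (`patience`, `min_delta`, `blowup_factor`) are hypothetical choices for this example, not something the commenter specified.

```python
import math


def check_run(losses, patience=5, min_delta=1e-3, blowup_factor=10.0):
    """Classify a run as "ok", "diverged", or "plateau" from its loss history.

    Hypothetical helper: thresholds are illustrative defaults only.
    """
    if not losses:
        return "ok"
    last = losses[-1]
    # A NaN or infinite loss almost always means the run is dead.
    if math.isnan(last) or math.isinf(last):
        return "diverged"
    # A loss far above the best value seen so far suggests a blow-up.
    best = min(losses)
    if best > 0 and last > blowup_factor * best:
        return "diverged"
    # No meaningful improvement over the last `patience` evaluations.
    if len(losses) > patience:
        best_before = min(losses[:-patience])
        best_recent = min(losses[-patience:])
        if best_before - best_recent < min_delta:
            return "plateau"
    return "ok"
```

With many runs in flight, something like `{name: check_run(history) for name, history in runs.items()}` would flag only the runs that need a human look.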

[–] Zemeniite@alien.top 1 points 2 years ago

Instead of relying on Weights & Biases sending me an email, I’ve set up alerts that go to either a Slack or a Discord channel. They are always sent on error, with the error message included. I also receive a message with metrics after a predetermined interval of training.

Just another idea if someone doesn’t want to rely on W&B
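
A setup like this might look roughly like the sketch below. It assumes an incoming-webhook URL (Slack webhooks take a JSON body with a `text` field, Discord webhooks use `content`); `WEBHOOK_URL`, `post_message`, and `run_with_alerts` are hypothetical names, and `train_step` stands in for whatever the real training loop does.

```python
import json
import traceback
import urllib.request

# Placeholder: replace with your own Slack or Discord webhook URL.
WEBHOOK_URL = "https://example.invalid/your-webhook"


def post_message(text, url=WEBHOOK_URL, key="text"):
    """POST a JSON message. Use key="text" for Slack, key="content" for Discord."""
    data = json.dumps({key: text}).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json", "User-Agent": "training-alerts"},
    )
    urllib.request.urlopen(req, timeout=10)


def run_with_alerts(train_step, num_steps, report_every, send=post_message):
    """Call train_step(step) -> metrics dict; report periodically and on error."""
    try:
        for step in range(1, num_steps + 1):
            metrics = train_step(step)
            if step % report_every == 0:
                send(f"step {step}: {metrics}")
    except Exception:
        # Forward the full traceback, then re-raise so the job still fails loudly.
        send("training crashed:\n" + traceback.format_exc())
        raise
    send("training finished")
```

Passing `send` as a parameter keeps the loop testable without network access and makes it easy to swap channels.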

[–] 0ctobogs@alien.top 1 points 2 years ago (1 children)

I use pushover.net to send my phone a push notification when something noteworthy happens. Just drop a little function in my trainer loop. Extremely useful and you can programmatically set the notification message so it'll tell me what my numbers are when I'm at lunch or whatever.
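
For anyone curious, the "little function" could be sketched as below, assuming the Pushover message endpoint with an application token and user key. The helper names (`build_pushover_request`, `notify`, `format_metrics`) are made up for this example; only the endpoint and the `token`, `user`, `message`, and `title` form fields come from Pushover's API.

```python
import urllib.parse
import urllib.request

PUSHOVER_URL = "https://api.pushover.net/1/messages.json"


def build_pushover_request(app_token, user_key, message, title="training"):
    """Build the form-encoded POST request (kept separate so it can be inspected)."""
    data = urllib.parse.urlencode(
        {"token": app_token, "user": user_key, "message": message, "title": title}
    ).encode("utf-8")
    return urllib.request.Request(PUSHOVER_URL, data=data)


def notify(app_token, user_key, message):
    """Send the push notification; call this from the trainer loop."""
    urllib.request.urlopen(build_pushover_request(app_token, user_key, message), timeout=10)


def format_metrics(epoch, metrics):
    """Turn a metrics dict into a short, phone-friendly message."""
    parts = ", ".join(f"{k}={v:.4f}" for k, v in sorted(metrics.items()))
    return f"epoch {epoch}: {parts}"
```

Inside the loop this would be a single call such as `notify(token, user, format_metrics(epoch, metrics))`, ideally only on noteworthy events so the phone does not buzz every step.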

[–] SnooHesitations8849@alien.top 1 points 2 years ago

Yes, but only when I need to debug. Otherwise, checking after a few hours is not too bad. Sometimes I know the code is correct, so I just launch it and forget about it. Enjoying a few hours of doing nothing is better for your mental health than staring at the monitor and gaining nothing.

[–] Apathiq@alien.top 1 points 2 years ago (1 children)

- "Checks how his models are training..."
- "Opens Reddit..."
- "Stumbles upon the question 'Do you obsessively watch your models train?'..."

[–] swarmed100@alien.top 1 points 2 years ago

seriously it's literally on my other monitor right now

As a kid I watched the Kazaa/Limewire/torrent downloads obsessively, now it's model backtesting

[–] deepneuralnetwork@alien.top 1 points 2 years ago

It’s pretty unhealthy but yes I do that. A model not improving, or never converging, can literally put me in a bad mood.

I’m trying to break myself of the habit, honestly.

[–] Material_Policy6327@alien.top 1 points 2 years ago

Only to make sure no errors get thrown. Otherwise I let them run and I do something else

[–] Jack_Torcello@alien.top 1 points 2 years ago

Some people intensively watch their model trains choo choo!!!

[–] mr_birkenblatt@alien.top 1 points 2 years ago (2 children)

Start training models
Go watch model trains
Come back to error

[–] swarmed100@alien.top 1 points 2 years ago

Watch model train successfully for hours or even days

it will get done any moment now

some trivial change you made crashes the postprocessing

now the file with your results is corrupted

[–] kgmeister@alien.top 1 points 2 years ago

What about Thomas the tank engine

[–] keepthepace@alien.top 1 points 2 years ago

Oh yes

[–] matigekunst@alien.top 1 points 2 years ago (3 children)

I train when my models train. It's a form of regularisation

[–] xignaceh@alien.top 1 points 2 years ago

L3 regularisation

[–] swarmed100@alien.top 1 points 2 years ago

Not sure why, but this form of regularisation always leads to an underfit model for me :(

[–] muntoo@alien.top 1 points 2 years ago

Gradient Descent by Grad Student (GDGS) v2.

[–] jucestain@alien.top 1 points 2 years ago

Everyone does it, at least at first during a new project.

[–] Witty-Elk2052@alien.top 1 points 2 years ago

yes lol

I imagine it to be identical to the life of a day trader save for the desired direction of the curve

[–] KyxeMusic@alien.top 1 points 2 years ago

Some days yes, others I'm so busy I forget and realize the next day

[–] ProgramPrimary2861@alien.top 1 points 2 years ago

Yep. I do that with plants too. You should definitely consider getting some plants too 😅

[–] the_warpaul@alien.top 1 points 2 years ago

So much wasted time during phd... 😂

Besides, the rule is: if the model's training, you're being productive, go to the pub.

[–] AGINSB@alien.top 1 points 2 years ago

Only when messing around with deepracer

[–] rwl4z@alien.top 1 points 2 years ago

I do LoRA training for most of my stuff as of late, so most of my experiments are in minutes, or maybe sometimes hours, not days. So yeah, I tend to leave the terminal visible. I've been experimenting with narrowing the LoRA alpha lately with promising results, so I'm even more glued to the eval/loss vs train/loss so I don't waste hours on a doomed experiment.
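
One way to spot a doomed run from eval/loss vs train/loss without staring at the terminal is a simple trend check. This is only a sketch of one possible rule (eval loss rising for several consecutive evaluations while train loss keeps falling); the function name `eval_diverging` and the `window` default are assumptions, not the commenter's method.

```python
def eval_diverging(train_losses, eval_losses, window=3):
    """Return True if eval loss rose `window` times in a row while train loss fell.

    Hypothetical heuristic: both lists are assumed to be sampled at the same
    evaluation points, oldest first.
    """
    if len(train_losses) < window + 1 or len(eval_losses) < window + 1:
        return False  # not enough history to judge
    recent_eval = eval_losses[-(window + 1):]
    recent_train = train_losses[-(window + 1):]
    eval_rising = all(b > a for a, b in zip(recent_eval, recent_eval[1:]))
    train_falling = recent_train[-1] < recent_train[0]
    return eval_rising and train_falling
```

A training script could call this after each evaluation and stop early (or send an alert) when it returns `True`.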

It kind of feels like I'm baking bread or something, with a childlike excitement when it's ready.

[–] neanderthal_math@alien.top 1 points 2 years ago

When I was young, I used to.

Also, I remember having this feeling that if I didn’t have a model training over the weekend, that I was forgetting something. : )

[–] Tomsen1410@alien.top 1 points 2 years ago

Yes

[–] adventuringraw@alien.top 1 points 2 years ago

I read this 'do you obsessively watch your model train' and I was thinking for a sec this was a weird ADHD hobby post or something, meant for people who build model train dioramas and like to sit and watch them more so than add to them. (Not that I'm subscribed to any subreddit like that...).

Probably thought that because I was just thinking about a timer circuit I'm working on in Factorio for my train logistics system, and... I do like seeing the trains run around, when they're not running my inattentive self over at least.

Anyway. Yes, sometimes it's tempting to watch the machine move, whatever it is you're engineering. But that's part of the challenge of being an engineer I guess. Back to building (for my work, haha. The factory must grow, but only after hours).

[–] PrimaCora@alien.top 1 points 2 years ago

For Stable Diffusion, not as much. Get the settings right and the training is done in 5 minutes. See the result, alter the settings, and go again. Those settings are the max possible batch size, previews off, and checkpoint saving off. Otherwise training takes 3 times longer or thrashes an SSD if one is used.

For voice cloning, extensively. As soon as the loss changes or the loss updates stop, it has to be killed. It's worse for newer ones like Style TTS: they have constant VRAM usage up until a random point where it grows without bound.
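
Both failure modes described here (loss updates stopping, memory climbing steadily) could in principle be watched by a small script instead of a person. The sketch below is hypothetical: `StallWatchdog` and `memory_growing` are invented names, the thresholds are arbitrary, and the memory readings are assumed to come from elsewhere (for example `torch.cuda.memory_allocated()` if PyTorch is in use).

```python
import time


class StallWatchdog:
    """Flags a run as stalled if no loss update arrives within `timeout_s` seconds."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_update = clock()

    def heartbeat(self):
        # Call this every time a new loss value is logged.
        self.last_update = self.clock()

    def stalled(self):
        return self.clock() - self.last_update > self.timeout_s


def memory_growing(samples_mb, window=5, min_growth_mb=100.0):
    """True if the last `window` readings never decreased and grew by at least `min_growth_mb`."""
    if len(samples_mb) < window + 1:
        return False
    recent = samples_mb[-(window + 1):]
    non_decreasing = all(b >= a for a, b in zip(recent, recent[1:]))
    return non_decreasing and recent[-1] - recent[0] >= min_growth_mb
```

A supervisor loop could poll `stalled()` and `memory_growing(...)` every minute and kill or restart the job when either fires.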

[–] dotpoint7@alien.top 1 points 2 years ago

Luckily not anymore. At the start of my current project I checked the progress quite often; now, after a few months of development, I'll just let it train for a few days and then check back in.

[–] anything_but@alien.top 1 points 2 years ago

I am usually busy alternating between checking Reddit karma metrics and training metrics.

[–] PicaPaoDiablo@alien.top 1 points 2 years ago

Very guilty pleasure that I know is silly but yah, obsessively.

[–] PeteyMax@alien.top 1 points 2 years ago

Only if there's a bug in the code that needs fixing!

[–] TheInfelicitousDandy@alien.top 1 points 2 years ago

Like a cat watching a laser.

[–] PMMEYOURSMIL3@alien.top 1 points 2 years ago

I wrote a telegram bot that sends me the loss after each epoch so I can stay informed when I'm not home, lol

[–] Unlikely-Loan-4175@alien.top 1 points 2 years ago

No, but I do obsessively watch my model trains.

[–] AllowFreeSpeech@alien.top 1 points 2 years ago

Oh I also obsessively watch my model predict (whenever I can), but not on the primary screen.

[–] Dar7oo@alien.top 1 points 2 years ago

You guys are so cool, I wanna be just like you. Hopefully I'll make it!

[–] pure_whey@alien.top 1 points 2 years ago

What can I say, I like graphs AND progress bars, so if you mix the two ...

[–] the__storm@alien.top 1 points 2 years ago

Yes, because one time (years ago) our rickety scaling system went off the rails and spent like $10k on idle EC2 instances before our equally rickety monitoring caught it.

[–] random_Byzantium@alien.top 1 points 2 years ago

Now I understand the importance of subreddits with this question. The same question can be used in various use cases.

[–] Lalalyly@alien.top 1 points 2 years ago

I can’t help myself. I remote in and visit my tensorboard way too much.

[–] crazymonezyy@alien.top 1 points 2 years ago

If the training time is less than an hour, yes.
