Machine Learning

1 readers

1 users here now

Community Rules:

Be nice. No offensive behavior, insults or attacks: we encourage a diverse community in which members feel safe and have a voice.
Make your post clear and comprehensive: posts that lack insight or effort will be removed. (ex: questions which are easily googled)
Beginner or career related questions go elsewhere. This community is focused in discussion of research and new projects that advance the state-of-the-art.
Limit self-promotion. Comments and posts should be first and foremost about topics of interest to ML observers and practitioners. Limited self-promotion is tolerated, but the sub is not here as merely a source for free advertisement. Such posts will be removed at the discretion of the mods.

founded 2 years ago

MODERATORS

communick@academy.garden

[P] Model training bottlenecked by CPU. (alien.top)

submitted 2 years ago by AdSignificant9235@alien.top to c/machinelearning@academy.garden

8 comments fedilink hide all child comments

Recently, I've been working on some projects for fun, trying out some things I hadn't worked with before, such as profiling.

But after profiling my code, I found out that my average GPU activity is around 50%. Apparently, the code frequently hangs for a few hundred milliseconds on the dataloader process. I've tried a few things in the dataloader: increasing/decreasing the number of workers, setting pin-memory to true or false, but neither seems to really matter. I have an NVME drive, so the disk is not the problem either. I've concluded that the bottleneck must be the CPU.

Now, I've read that pre-processing the data might help, so that the dataloader doesn't have to decode the images, for example, but I don't really know how to go about this. I have around 2TB of NVME storage, and I've got a couple datasets on the disk (ImageNet and INaturalist are the two biggest ones), so I don't suppose I'll be able to store them on the disk uncompressed.

Is there anything I can do to lighten the load on the CPU during training so that I can take advantage of the 50% of the GPU that I'm not using at the moment?

you are viewing a single comment's thread
view the rest of the comments

[–] btcmx@alien.top 1 points 2 years ago

As other have rightly pointed out, verify you're using the Data Loader the right way. Ideally you need to create a custom dataset (in PyTorch terms) and apply all the transformations in this custom dataset. This might be helpful. Also, have you tried PyTorch Lightning?