Hi folks!

I've been struggling with this problem for a while, so I figured I'd solicit suggestions here:

I've built a model architecture similar to AlphaFold2: the inputs are very heterogeneous in nature, and each input type goes through its own series of transformations before being combined into one data "stack" (e.g. a 5x1000 tensor) that gets passed through a shallow ResNet for the classification task.

The biggest structural issue I'm facing is that one of the inputs can have anywhere from 1 channel (shape 1x1000) to 8 channels (shape 8x1000) at any point in the dataloader. That's largely fine until I eventually need to encode that variable-channel input into a single-channel embedding to put on the pre-ResNet data stack.

The things that I've looked at so far:

1. Average them all into one channel (problem: the order of those channels matters quite a bit, and it feels like the information lost would be immense).
2. Create ~8 different subpaths in the model, one per channel count (problem: not enough training data to properly train most of the subpaths; the 1-channel path would be far more heavily trained than the 8-channel path).
3. Do PCA on the transposed tensor with n_components=1 and re-transpose it (problem: it just feels dumb, though I'm not sure if that's a legitimate objection; see the sketch below).
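
For reference, here's roughly what I mean by options 1 and 3 for a single sample (a PyTorch sketch; `torch.pca_lowrank` standing in for a full PCA, and the 5-channel shape is just illustrative):

```python
import torch

x = torch.randn(5, 1000)                 # e.g. one sample with 5 channels

# Option 1: average the channels. Order and per-channel identity are lost.
avg = x.mean(dim=0, keepdim=True)        # (1, 1000)

# Option 3: PCA across channels with n_components=1, then transpose back,
# i.e. project each position's 5-channel vector onto the single direction
# of maximum variance in channel space.
xt = x.T                                 # (1000, 5): positions as samples
_, _, v = torch.pca_lowrank(xt, q=1)     # v: (5, 1) principal direction
pca = ((xt - xt.mean(0)) @ v).T          # (1, 1000)
```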

Any other suggestions? Or are there common practices here that I'm just unaware of?


Use a transformer layer for aggregation if you want a learnable way of pooling the channels: treat each channel as a token, add positional encodings so that channel order influences the prediction, and use a padding mask so a single shared path handles anywhere from 1 to 8 channels.
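
As a rough sketch of that idea (names and hyperparameters are illustrative, not a definitive implementation; it assumes you zero-pad each sample to 8 channels and pass the real channel count alongside):

```python
import torch
import torch.nn as nn

class ChannelAggregator(nn.Module):
    """Pools a variable number of channels (1-8) of length-1000 features
    into a single 1x1000 embedding via a small transformer encoder."""

    def __init__(self, feat_dim=1000, max_channels=8, n_heads=4, n_layers=2):
        super().__init__()
        # Learned positional embedding over the channel index, so that
        # channel order can influence the pooled result.
        self.pos_emb = nn.Embedding(max_channels, feat_dim)
        # Learned [CLS]-style token; its output is the pooled embedding.
        self.cls = nn.Parameter(torch.zeros(1, 1, feat_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, x, n_channels):
        # x: (batch, max_channels, feat_dim), zero-padded along dim 1
        # n_channels: (batch,) actual channel count per sample
        b, c, _ = x.shape
        pos = torch.arange(c, device=x.device)
        x = x + self.pos_emb(pos)                       # inject channel order
        x = torch.cat([self.cls.expand(b, -1, -1), x], dim=1)
        # True = padded slot to ignore; the CLS slot (index 0) stays valid.
        pad = torch.arange(c + 1, device=x.device)[None, :] > n_channels[:, None]
        out = self.encoder(x, src_key_padding_mask=pad)
        return out[:, 0]                                # (batch, feat_dim)


# Usage: pad each sample to 8 channels and track the real count.
agg = ChannelAggregator()
x = torch.zeros(4, 8, 1000)
x[0, :3] = torch.randn(3, 1000)        # sample 0 really has 3 channels
counts = torch.tensor([3, 1, 8, 5])
emb = agg(x, counts)                   # (4, 1000) -> onto the data stack
```

Mean-pooling over the unmasked tokens would work too; the CLS token just lets the model learn its own weighting of the channels instead of you fixing one.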