reverendCappuccino

joined 1 year ago
[–] reverendCappuccino@alien.top 1 points 11 months ago (4 children)

Meanwhile, people discuss how Google wasn't able to strike back at OpenAI with a good conversational agent, "thus losing its status as an ML behemoth". It's interesting how LLMs bring out accelerationist and x-risk debates of the science fiction/fabrication kind, and at best debates about the economy, while research on materials and climate science warms few minds and inspires little art (at least it looks that way on X/Mastodon/Reddit).

[–] reverendCappuccino@alien.top 1 points 11 months ago

Well, it's more of a psychological term, and "attention" is already used to illustrate the intended meaning of a dot product. The analogy holds up, so why question the validity of "System 2 attention" specifically rather than the validity of calling it attention at all?
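
For context, this is roughly the dot product the word "attention" is pointing at. Just a toy numpy sketch; the names (q, k, v) are the usual conventions, not anything from the paper:

```python
import numpy as np

def dot_product_attention(q, k, v):
    # similarity of each query with each key is literally a dot product
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # softmax over the keys turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v  # weighted sum of the values

q = np.random.randn(4, 8)  # 4 queries, dim 8
k = np.random.randn(6, 8)  # 6 keys
v = np.random.randn(6, 8)  # 6 values
out = dot_product_attention(q, k, v)  # shape (4, 8)
```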

[–] reverendCappuccino@alien.top 1 points 11 months ago (3 children)

IMHO your time windows are too long for the models to learn from. You might try shortening them a lot, labeling all the windows within N minutes of the event as positive, and seeing what happens (rough sketch below). It also depends on the sampling rate, if you are feeding raw ECG to the models.
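
Something like this, untested and with made-up names (window_s, horizon_min and the event times are placeholders; fs is your sampling rate):

```python
import numpy as np

def make_windows(ecg, fs, event_times_s, window_s=30, horizon_min=5):
    """Cut the signal into short windows; label a window positive if it
    ends within `horizon_min` minutes before any event."""
    step = window_s * fs
    X, y = [], []
    for start in range(0, len(ecg) - step + 1, step):
        end = start + step
        end_t = end / fs  # window end time in seconds
        positive = any(0 <= ev - end_t <= horizon_min * 60 for ev in event_times_s)
        X.append(ecg[start:end])
        y.append(int(positive))
    return np.stack(X), np.array(y)

# e.g. one hour of synthetic signal at 250 Hz with one event at minute 40
fs = 250
ecg = np.random.randn(3600 * fs)
X, y = make_windows(ecg, fs, event_times_s=[40 * 60])
```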

As for the clusters, yes, you could try hierarchical clustering with time-warping (DTW) distances, but it will take a lot of time; a rough sketch of what I mean is below.
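
Something along these lines, assuming tslearn is available for the DTW part (the toy data and the cluster count are placeholders); the pairwise DTW matrix is where the time goes:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from tslearn.metrics import cdist_dtw

series = np.random.randn(20, 200)  # 20 toy series of length 200
D = cdist_dtw(series)              # pairwise DTW distances, roughly O(n^2 * len^2)
Z = linkage(squareform(D, checks=False), method="average")
labels = fcluster(Z, t=5, criterion="maxclust")  # cut the tree into 5 clusters
```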

Simple question: how do medical experts recognize an upcoming event? Can they do it from several minutes of ECG recorded several minutes before the event?

[–] reverendCappuccino@alien.top 1 points 11 months ago

I don't think there are books specifically focused on that, and there's probably no need for one. There is a lot of information scattered throughout papers, but the fundamental concepts to keep in mind are not that many, imho. ReLU is piecewise linear, and the pieces are the two halves of its domain: in one half it is just zero, in the other ReLU(x) = x, so it is very easy and fast to compute. That is enough to make it nonlinear, hence to allow powerful expressivity and make a neural network a potential universal approximator. Many or most activations end up being zero, and that sparsity is useful as long as it's not always the same set of units having zero output.

The drawbacks come from the same characteristics: units may die (always output zero, never learning via backprop), there's a point (0) where the derivative is undefined even though the function is continuous, and there's no way to distinguish small and large negative values since they all map to 0.
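
If it helps, here's a tiny numpy sketch of those points (piecewise linearity, sparsity, and the zero gradient on the negative half that causes dying units):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)  # identity for x > 0, zero otherwise

def relu_grad(x):
    # subgradient: 1 where x > 0, 0 where x < 0; at exactly 0 we just pick 0
    return (x > 0).astype(float)

x = np.linspace(-3, 3, 7)
print(relu(x))       # -3, -2, -1 all map to 0, so negative magnitudes are lost
print(relu_grad(x))  # zero gradient on the negative side: no learning signal there
```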