slashdave

joined 11 months ago
[–] slashdave@alien.top 1 points 9 months ago

To specify a rotation, you need both the axis of rotation and the angle. The angle alone is ill-defined.
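To illustrate, here is a minimal sketch using Rodrigues' rotation formula. The function name `rotate` and the test vectors are my own choices for illustration, not anything from the original question. It shows that the same angle gives different results about different axes.

```python
import math

def rotate(v, axis, angle):
    """Rotate 3-vector v about `axis` by `angle` radians (Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0.0:
        raise ValueError("rotation axis must be non-zero")
    k = tuple(a / norm for a in axis)  # unit vector along the axis
    c, s = math.cos(angle), math.sin(angle)
    dot = sum(ki * vi for ki, vi in zip(k, v))
    # cross = k x v
    cross = (
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    )
    # v_rot = v cos(a) + (k x v) sin(a) + k (k . v)(1 - cos(a))
    return tuple(
        vi * c + xi * s + ki * dot * (1.0 - c)
        for vi, xi, ki in zip(v, cross, k)
    )

quarter_turn = math.pi / 2
about_z = rotate((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), quarter_turn)
about_x = rotate((1.0, 0.0, 0.0), (1.0, 0.0, 0.0), quarter_turn)
```

With the same 90 degree angle, `about_z` moves the vector onto the y axis, while `about_x` leaves it where it was.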

[–] slashdave@alien.top 1 points 9 months ago (1 children)

A faithful subsample is a subsample of the current state. The current state cannot be established without a full scan, because you cannot assume that the data has not changed.

As for a solution, just make a subsample and save it in a separate data table. You can use that separate table for development. There is no reason to be skimpy: a reasonably large (100k) subset will probably be fine.
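A rough sketch of the idea, using Python's built-in `sqlite3` as a stand-in for whatever store you actually have. The table names `events` and `events_dev`, the helper `make_dev_sample`, and the toy data are all hypothetical. The sampling query still reads the whole source table, but you pay that cost once.

```python
import sqlite3

def make_dev_sample(conn, source, target, n_rows):
    """Copy a random subset of `source` into a separate table `target`.

    Table names are interpolated into the SQL, so they must come from
    trusted code, never from user input.
    """
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(
        f"CREATE TABLE {target} AS "
        f"SELECT * FROM {source} ORDER BY RANDOM() LIMIT {int(n_rows)}"
    )
    conn.commit()

# Toy stand-in for the large table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    ((i, i * i) for i in range(1000)),
)

# One expensive pass over the source, then develop against the small table.
make_dev_sample(conn, "events", "events_dev", 100)
dev_count = conn.execute("SELECT COUNT(*) FROM events_dev").fetchone()[0]
```

After this, development queries hit `events_dev` only and never touch the full table.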

[–] slashdave@alien.top 1 points 9 months ago (3 children)

Well, a proper sample requires selecting sparsely from the entire dataset. Depending on the setup, this can be fabulously expensive, because you still have to scan all rows. After all, PySpark cannot generally assume that the data is not changing underneath you.
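To make the cost concrete, here is a sketch of reservoir sampling (Algorithm R) in plain Python. This is not what PySpark does internally, just an illustration, and `reservoir_sample` is a name I made up. Every row gets an equal chance of ending up in the sample, which is exactly why every row has to be visited.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def reservoir_sample(rows: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return a uniform random sample of up to k rows in a single pass."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # Fill the reservoir with the first k rows.
            reservoir.append(row)
        else:
            # Keep row i with probability k / (i + 1).
            j = rng.randint(0, i)  # both endpoints inclusive
            if j < k:
                reservoir[j] = row
    return reservoir

sample = reservoir_sample(range(1000), 10)
```

The loop cannot stop early: skipping the tail of the data would bias the sample toward the rows that were read first.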

[–] slashdave@alien.top 1 points 9 months ago (6 children)

I suspect the OP means 1,000M, or 1 billion rows. Nothing else makes sense.

[–] slashdave@alien.top 1 points 9 months ago

Python enables rapid prototyping. Bindings such as numpy make CPU speed less of an issue.

[–] slashdave@alien.top 1 points 10 months ago

Statisticians use nonlinear models all the time.

[–] slashdave@alien.top 1 points 10 months ago

Is it using only words?

Clearly not

[–] slashdave@alien.top 1 points 10 months ago

You need a large training set that contains the relevant information with sufficient precision and accuracy for such a prediction. Which, to put it in simple terms, is basically impossible with current technology.

[–] slashdave@alien.top 1 points 10 months ago

These models suffer from a lack of training data in specific domains. If you could provide that for your use case, it could be a significant contribution, especially in bulk.

[–] slashdave@alien.top 1 points 10 months ago

Natural selection selects the model. Training still happens during the lifetime of the individual.

[–] slashdave@alien.top 1 points 10 months ago

Nothing prevents you from loading two models from separate files in one piece of code and combining them (in any fashion you choose) into a third model, which you can then save into a third file.
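As a toy sketch, assume each "model" is just a JSON file mapping parameter names to lists of numbers, and that "combining" means averaging the parameters. Both are assumptions for illustration only: a real framework has its own load and save calls, and the helper names below are hypothetical.

```python
import json
import tempfile
from pathlib import Path

def load_model(path):
    """Load a model stored as JSON: parameter name -> list of floats."""
    return json.loads(Path(path).read_text())

def save_model(model, path):
    Path(path).write_text(json.dumps(model))

def average_models(a, b):
    """Combine two models by averaging matching parameters (one choice of many)."""
    if a.keys() != b.keys():
        raise ValueError("models have different parameter names")
    combined = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        combined[name] = [(x + y) / 2 for x, y in zip(a[name], b[name])]
    return combined

with tempfile.TemporaryDirectory() as tmp:
    path_a, path_b, path_c = (Path(tmp) / f"model_{n}.json" for n in "abc")
    save_model({"w": [1.0, 3.0], "b": [0.0]}, path_a)
    save_model({"w": [3.0, 5.0], "b": [1.0]}, path_b)

    # Load both files, combine them, and write the result to a third file.
    merged = average_models(load_model(path_a), load_model(path_b))
    save_model(merged, path_c)
    reloaded = load_model(path_c)
```

Whether averaging makes sense depends entirely on the models. The point is only that the load, combine, and save steps are ordinary code.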
