this post was submitted on 24 Nov 2023

Machine Learning


I have a dataset with two columns, x and y, similar to the one shown below. I need to predict y for values of x greater than 60. The predicted curve must continue the increasing trend it shows up to x = 60.

I have tried polynomial regression and SVR, but the predictions decline for x greater than 60. I have also tried fitting y = a ln(x) + b to this curve, but the R² score is 0.94. What model can I train for this purpose, or how can I improve the R² score by regressing over an appropriate logarithmic function?

https://preview.redd.it/f9oxc20zga2c1.png?width=1208&format=png&auto=webp&s=b7918c9d9dd2bb930a2e903483d5a230f2dcfce5

[–] JPyoris@alien.top 1 points 10 months ago

Not a direct answer, but be aware that overfitting will be an issue here too. You might get an R² of 0.99+ while the extrapolation is horrendous (as you already saw with the high-degree polynomial). An R² of 0.94 with only two parameters does not sound too bad to me.

Maximizing R² and eyeballing the extrapolation is not really a valid approach. You should use a goodness-of-fit measure that accounts for model complexity. You could also implement a simple validation: leave out the last x% of your data when fitting, then look at the error on that held-out part.
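To make the "leave out the last x%" idea concrete, here is a rough sketch in Python with NumPy. The data is synthetic (a noisy logarithm that I made up, since I don't have your dataset), and the helper names `fit_log`, `fit_poly` and `tail_holdout_rmse` are just illustrative, not from any library. It fits on the early part of the x range and scores each model on the tail it never saw, which is closer to your extrapolation problem than an in-sample R².

```python
import numpy as np
from numpy.polynomial import Polynomial


def fit_log(x, y):
    """Least-squares fit of y = a*ln(x) + b; returns a prediction function."""
    a, b = np.polyfit(np.log(x), y, 1)
    return lambda x_new: a * np.log(x_new) + b


def fit_poly(x, y, degree=8):
    """Least-squares polynomial fit; Polynomial.fit rescales x for stability."""
    poly = Polynomial.fit(x, y, degree)
    return lambda x_new: poly(x_new)


def tail_holdout_rmse(x, y, fit, holdout_frac=0.2):
    """Fit on the smallest x values, return RMSE on the held-out largest ones."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    n_test = max(1, int(round(len(x) * holdout_frac)))
    n_train = len(x) - n_test
    predict = fit(x[:n_train], y[:n_train])
    residuals = y[n_train:] - predict(x[n_train:])
    return float(np.sqrt(np.mean(residuals ** 2)))


# Synthetic stand-in for the real data (assumption: roughly logarithmic + noise).
rng = np.random.default_rng(0)
x = np.arange(1.0, 61.0)
y = 3.0 * np.log(x) + 2.0 + rng.normal(0.0, 0.3, x.size)

rmse_log = tail_holdout_rmse(x, y, fit_log)
rmse_poly = tail_holdout_rmse(x, y, fit_poly)
print(f"held-out RMSE, log model:         {rmse_log:.3f}")
print(f"held-out RMSE, degree-8 polynomial: {rmse_poly:.3f}")
```

Swap in your own x and y arrays and whatever candidate models you like. A model that looks great in-sample but has a large held-out error on the tail is unlikely to extrapolate well beyond x = 60 either, although a good held-out score is still no guarantee.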

I also have to agree that it looks somewhat piecewise. Without knowing the generating process, the correct continuation could be anything.