this post was submitted on 16 Nov 2023
1 points (100.0% liked)

Machine Learning

1 readers
1 users here now

Community Rules:

founded 1 year ago
MODERATORS
 

Hello all,
I'm new to the data analysis and mining. I have a list of 100k entries in a CSV file having a just single column.
The values are as follows

0
1
1
1
0
0
1
1
0
1
1
1
.
..
...
1
1
0
0
Based on these data, can I predict the 100001 results? Will it be 0 or 1? If So, what is the best method for it? I'm learning Python and trying GradientBoosting, Support Vector Machines (SVM) and Basic Neural Networks. But I'm not able to achieve it.

top 8 comments
sorted by: hot top controversial new old
[–] _Packy_@alien.top 1 points 1 year ago

Is this a sort of logic? LIke 0 1 0 1 0 -> next is 1.

How would you expect that any of these 100k binaries has a predictive power for the next one? If there are sequential patterns predicting the next value, try an LSTM

[–] m98789@alien.top 1 points 1 year ago
[–] Acceptable-Mix-4534@alien.top 1 points 1 year ago

Try plotting the 1 lakh values , graph can be averaged over some window , if you see some pattern, you should be able to predict, if it’s just like a random plot, it would be hard to. But if you see the pattern, you would definitively be able to predict.

[–] StrikePrice@alien.top 1 points 1 year ago

No. Your training will not converge and you will never get better than 50/50.

[–] finite_user_names@alien.top 1 points 1 year ago

There are a lot of ways you _could_ predict this. The simplest might just be to predict the modal value, assuming that these values are IID. If there is some generating process that goes through different phases, and each observation corresponds to a timestep, you might consider fitting an HMM. Others have mentioned sliding windows, and these are fine as well if there is indeed a sequential structure to this data.

How are you trying to use gradient boosting or SVM here? Predicting each in isolation, the best you should be able to do is just predicting the modal value, since you have no further information.

[–] 71_super_beetle@alien.top 1 points 1 year ago

My boss would say to paste the series into ChatGPT.

[–] Beginning-Pool-5906@alien.top 1 points 1 year ago

From ChatGPT

To determine the appropriate predictive model for your dataset, you need to consider the nature of your problem and the type of prediction you want to make. The choice of the predictive model depends on whether you are dealing with a classification or regression problem. Here are a few common types of predictive models, and I'll discuss how to decide which one might be suitable for your scenario:

Logistic Regression:

Type of Problem: Binary classification (0 or 1).

Use Case: If you want to predict the probability that an entry is 1 or 0.

Decision Trees or Random Forests:

Type of Problem: Classification or regression.

Use Case: Decision trees can be used for both classification and regression tasks. Random Forests, which are an ensemble of decision trees, are particularly powerful for classification problems.

Support Vector Machines (SVM):

Type of Problem: Binary classification.

Use Case: If you have a relatively small dataset and want to find a hyperplane that best separates the entries labeled 0 and 1.

Neural Networks:

Type of Problem: Classification or regression.

Use Case: For complex relationships in the data. Deep learning models, such as neural networks, can capture intricate patterns, but they might be overkill for smaller datasets.

K-Nearest Neighbors (KNN):

Type of Problem: Classification or regression.

Use Case: If you want to make predictions based on the similarity of entries.

Time Series Models:

Use Case: If the order of entries matters, and you want to predict the next entry in a sequence.

Here are the steps you can take to choose a model:

Define the Problem:

Is it a classification or regression problem?

Understand the Data:

Explore the characteristics of your dataset. Consider statistical measures, visualize the distribution of values, and check for any patterns.

Choose a Model:

Based on the nature of your problem and data characteristics, select a suitable predictive model.

Split the Data:

Divide your dataset into training and testing sets to evaluate the model's performance.

Train and Evaluate:

Train the chosen model on the training set and evaluate its performance on the testing set.

Adjust and Iterate:

Depending on the results, you may need to adjust hyperparameters, try different models, or preprocess the data differently.

Given your dataset with 0s and 1s, and assuming you want to predict the next entry (which suggests a sequential or time-related aspect), you might also consider time series models or sequence-based models.

[–] PM_ME_YOUR_BAYES@alien.top 1 points 11 months ago

Markov chains or HMM