Machine Learning

What type of predictive model should I use for 100k entries to predict next? (alien.top)

submitted 2 years ago by alenathomasfc@alien.top to c/machinelearning@academy.garden

Hello all,
I'm new to data analysis and mining. I have a list of 100k entries in a CSV file with just a single column.
The values are as follows:

0
1
1
1
0
0
1
1
0
1
1
1
.
..
...
1
1
0
0
Based on this data, can I predict the 100,001st result? Will it be 0 or 1? If so, what is the best method for it? I'm learning Python and trying Gradient Boosting, Support Vector Machines (SVM) and basic neural networks, but I haven't been able to get it to work.

[–] Beginning-Pool-5906@alien.top 1 points 2 years ago

From ChatGPT

To determine the appropriate predictive model for your dataset, you need to consider the nature of your problem and the type of prediction you want to make. The choice of the predictive model depends on whether you are dealing with a classification or regression problem. Here are a few common types of predictive models, and I'll discuss how to decide which one might be suitable for your scenario:

Logistic Regression:

Type of Problem: Binary classification (0 or 1).

Use Case: If you want to predict the probability that an entry is 1 or 0.

Decision Trees or Random Forests:

Type of Problem: Classification or regression.

Use Case: Decision trees can be used for both classification and regression tasks. Random Forests, which are ensembles of decision trees, are particularly powerful for classification problems.

Support Vector Machines (SVM):

Type of Problem: Binary classification.

Use Case: If you have a relatively small dataset and want to find a hyperplane that best separates the entries labeled 0 and 1.

Neural Networks:

Type of Problem: Classification or regression.

Use Case: For complex relationships in the data. Neural networks, including deep learning models, can capture intricate patterns, but they might be overkill for smaller datasets.

K-Nearest Neighbors (KNN):

Type of Problem: Classification or regression.

Use Case: If you want to make predictions based on the similarity of entries.

Time Series Models:

Use Case: If the order of entries matters, and you want to predict the next entry in a sequence.
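
Most of the classifiers listed above expect rows of features, while the data in the question is a single column. One common way to bridge that gap, sketched here as an assumption rather than a recommendation, is to use the previous k values as the features for each entry. The helper name make_lag_features and the window size k=3 are hypothetical choices for illustration.

```python
def make_lag_features(values, k=3):
    """Build (X, y) rows: X[i] holds k consecutive values, y[i] is the value that follows them."""
    X, y = [], []
    for i in range(k, len(values)):
        X.append(list(values[i - k:i]))  # the k values just before position i
        y.append(values[i])              # the value to be predicted
    return X, y


# The first twelve values quoted in the question.
sample = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]
X, y = make_lag_features(sample, k=3)
print(X[0], "->", y[0])  # [0, 1, 1] -> 1
print(len(X))            # 9
```

The resulting X and y can be passed to any of the classifiers above. Whether they learn anything depends on whether earlier values actually carry information about later ones.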

Here are the steps you can take to choose a model:

Define the Problem:

Is it a classification or regression problem?

Understand the Data:

Explore the characteristics of your dataset. Consider statistical measures, visualize the distribution of values, and check for any patterns.

Choose a Model:

Based on the nature of your problem and data characteristics, select a suitable predictive model.

Split the Data:

Divide your dataset into training and testing sets to evaluate the model's performance.

Train and Evaluate:

Train the chosen model on the training set and evaluate its performance on the testing set.

Adjust and Iterate:

Depending on the results, you may need to adjust hyperparameters, try different models, or preprocess the data differently.
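
The split and evaluate steps above can be sketched roughly as follows for an ordered 0/1 sequence. This is a minimal illustration under some assumptions: the data is split in time order rather than shuffled (so the test set is always "the future"), randomly generated placeholder values stand in for the real CSV column, and the function names are hypothetical. It only computes two simple baselines, which any trained model would need to beat to be useful.

```python
import random
from collections import Counter


def chronological_split(values, train_fraction=0.8):
    """Split without shuffling: the earliest entries train, the latest entries test."""
    cut = int(len(values) * train_fraction)
    return values[:cut], values[cut:]


def majority_value(values):
    """Return the most frequent value in the sequence."""
    return Counter(values).most_common(1)[0][0]


def accuracy(predictions, actual):
    """Fraction of positions where the prediction matches the actual value."""
    hits = sum(p == a for p, a in zip(predictions, actual))
    return hits / len(actual)


# Placeholder data (an assumption): replace with the column loaded from the CSV file.
random.seed(0)
values = [random.randint(0, 1) for _ in range(1000)]

train, test = chronological_split(values)

# Baseline 1: always predict the most frequent value seen in training.
majority_preds = [majority_value(train)] * len(test)

# Baseline 2 (persistence): predict that each entry repeats the one before it.
persistence_preds = [train[-1]] + test[:-1]

print("majority baseline accuracy:", accuracy(majority_preds, test))
print("persistence accuracy:", accuracy(persistence_preds, test))
```

If a trained model cannot beat these baselines on the held-out tail of the sequence, that suggests the sequence may not be predictable from its own history.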

Given your dataset with 0s and 1s, and assuming you want to predict the next entry (which suggests a sequential or time-related aspect), you might also consider time series models or sequence-based models.
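
As one possible example of a sequence-based model, the sketch below counts which value followed each run of k consecutive values and predicts the most common follower of the latest run. This is a simple fixed-order Markov-style predictor chosen for illustration, not necessarily the best fit for this data; the function names, k=3, and the default fallback value are all assumptions.

```python
from collections import Counter, defaultdict


def fit_context_counts(values, k=3):
    """Count which value followed each run of k consecutive values."""
    counts = defaultdict(Counter)
    for i in range(k, len(values)):
        context = tuple(values[i - k:i])
        counts[context][values[i]] += 1
    return counts


def predict_next(values, counts, k=3, default=1):
    """Predict the most common follower of the last k values, or default if that run was never seen."""
    context = tuple(values[-k:])
    if context not in counts:
        return default
    return counts[context].most_common(1)[0][0]


# The first twelve values quoted in the question; far too few for a reliable estimate.
sample = [0, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]
counts = fit_context_counts(sample, k=3)
print(dict(counts[(0, 1, 1)]))
print(predict_next(sample, counts, k=3))
```

With the full 100k entries, the counts for each context also give an estimated probability of a 1 following it. If those probabilities all sit close to the overall share of 1s, the history adds little and no model is likely to do much better than guessing the majority value.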