Unanswered Questions
327 questions with no upvoted or accepted answers
6
votes
0
answers
66
views
What are some popular but outdated or ineffective practices in data science?
I was taught stepwise feature selection (such as forward and backward selection) in college, and at the time it seemed like a really effective way to pick features. But recently I have been reading ...
6
votes
1
answer
218
views
Predicting change of shapes/coordinates
I'm trying to find a way to predict/calculate how a shape (e.g. the outline of a glacier) will change in the future, based on its history (previous shape) and additional factors (e.g. Δtemperature).
In my ...
5
votes
2
answers
4k
views
Fix first two levels of decision tree?
I am trying to build a regression tree with 70 attributes, where the business team wants to fix the first two levels, namely country and product type. To achieve this, I have two proposals:
Build a ...
4
votes
2
answers
279
views
Support Vector Regression trained with data sets
I have been searching the internet and papers for a long time for answers to some simple questions. Am I able to train a Support Vector Regression algorithm with different data sets? If yes, how is ...
3
votes
0
answers
31
views
Suppose one category in a variable creates data leakage: can we use the other categories of the same variable as dummies to predict?
We are predicting conversion. Conversion means a customer converted from paying one-off to paying regularly (subscribing).
If one feature is a categorical feature "Activity", consisting of 15+ ...
3
votes
0
answers
59
views
How can I link tasks using machine learning / AI based on historical task sequences?
I'm working on an AI model to predict dependency links between tasks for industrial planning, based on historical project data. I have two tables:
Task Table (15 sheets, one sheet = one ...
3
votes
0
answers
29
views
History that led to the word "predict" being used for the application of a model on data
Background
The framework scikit-learn uses "predict" for the application of a model on (new) input data, and I have seen many people use that term. In the scientific papers that I have read (...
3
votes
1
answer
615
views
Neural Network regression negative performance
I have a problem with the performance of a multi-layer perceptron regressor (neural network) and I cannot figure out why.
Task: I am trying to improve a time series prediction. I have predictions of a ...
3
votes
3
answers
305
views
How to decide whom to market to? Clustering or Decision Tree?
I am working with a dataset that has enough observations and ~10 variables:
half of the variables are numeric
another half of the variables are categorical with 2-3 levels (demographics)
one ID ...
3
votes
1
answer
125
views
How to incorporate the uncertainty of the model coefficients in the prediction interval of a multiple linear regression
I'm dealing with modeling small experimental data sets. As most experimental work does not generate thousands of samples, but rather a handful, I need to be inventive about how to deal with this small ...
3
votes
0
answers
46
views
Serializing a trained classification model into a set of actionable insights
I'm looking for ways to convert a trained classification model into a list of insights based on the resulting parameters of the model.
To give an example, let's assume we trained a decision tree to ...
3
votes
1
answer
127
views
How can I improve the accuracy of my model? (Cab Cancellation Prediction)
I am trying to predict, based on several parameters like trip type, car type, source of booking, start time, lead time (start - book) and a few other params, whether or not a customer will cancel. From ...
3
votes
0
answers
58
views
Improving a simple trig model
I have some data which I know is well approximated as a trig function, and I can fit it with scipy.optimize.curve_fit as follows:
...
3
votes
1
answer
281
views
Model Guardrails
Suppose I am building a machine learning model for an application where I do not need to make a prediction on all new samples, and given a new sample, it is better to make no prediction at all when ...
3
votes
0
answers
56
views
Is linear regression on the trees of XGBoost (rather than taking their mean) useful/popular?
Given training data $(\underline{x}_1, y_1), \ldots, (\underline{x}_N, y_N)$, one can choose from a variety of ensemble methods for trees. These algorithms output a set of trees $T_1, \ldots, T_n$, and then the ...