Unanswered Questions
2,918 questions with no upvoted or accepted answers
10 votes, 0 answers, 2k views
Difference between Shapley values and SHAP
The paper on SHAP values gives a formula for the Shapley values in (4) and, apparently (?), one for SHAP values in (8).
Still, I don't really understand the difference between Shapley values and SHAP ...
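To make the classic Shapley value concrete, here is a brute-force sketch. Each player's value is its weighted average marginal contribution over all coalitions $S$ of the other players: $\phi_i = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n-|S|-1)!}{n!}\,[v(S \cup \{i\}) - v(S)]$. The payoff function `toy_value` is a hypothetical stand-in, not the paper's definition. How a model and an input are turned into a coalition value $v(S)$ is deliberately left out.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by enumerating every coalition (cost grows as 2^n)."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):  # coalition sizes 0 .. n-1 drawn from the other players
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (value(s | {i}) - value(s))  # marginal contribution of i
        phi[i] = total
    return phi


def toy_value(coalition):
    """Hypothetical payoff: additive effects for a and b plus an a-b interaction."""
    v = 0.0
    if "a" in coalition:
        v += 1.0
    if "b" in coalition:
        v += 2.0
    if "a" in coalition and "b" in coalition:
        v += 3.0
    return v


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)  # roughly a: 2.5, b: 3.5, c: 0.0 (up to float rounding)
```

The interaction term of 3 is split evenly between `a` and `b`, the irrelevant player `c` gets 0, and the values sum to $v(N) - v(\emptyset) = 6$ (the efficiency property).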
9 votes, 0 answers, 353 views
What's up with Neural Stochastic Differential Equations from a practical standpoint?
I've spent a few days reading some of the new papers about Neural SDEs. For example, here is one from Tzen and Raginsky, and here is one by Peluchetti and Favaro that came out at the same time. There are ...
9 votes, 1 answer, 179 views
Is there an ML or DL tool that can learn to detect periodically occurring patterns in a one-dimensional time series?
I am trying to create a tool that labels refrigerator temperature readings. A reading is taken every 5 minutes, and its label identifies whether or not it was taken while the refrigerator was ...
9 votes, 0 answers, 7k views
Advantage of RMSProp over Adam?
I've learned from DL classes that Adam should be the default choice for neural network training. However, I've recently seen more and more reinforcement learning agents use RMSProp instead of ...
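For reference, a NumPy sketch of the textbook update rules being compared. RMSProp here has no momentum term and no bias correction. Adam keeps the same squared-gradient average and adds a momentum (first-moment) estimate, with both moments bias-corrected. Library implementations differ in details, for example where `eps` enters or optional momentum and centering in RMSProp, so treat the defaults below as assumptions. The quadratic objective is a hypothetical toy, not a benchmark.

```python
import numpy as np


def rmsprop_step(theta, g, state, lr=0.01, rho=0.99, eps=1e-8):
    # Exponential moving average of squared gradients; no momentum, no bias correction.
    state["v"] = rho * state["v"] + (1 - rho) * g**2
    return theta - lr * g / (np.sqrt(state["v"]) + eps)


def adam_step(theta, g, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    # Same squared-gradient average, plus momentum and bias correction of both moments.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * g
    state["v"] = beta2 * state["v"] + (1 - beta2) * g**2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)


# Hypothetical ill-conditioned quadratic: f(theta) = 0.5 * sum(a * theta**2).
a = np.array([1.0, 100.0])


def loss(theta):
    return 0.5 * np.sum(a * theta**2)


def grad(theta):
    return a * theta


def run(step_fn, state, steps=500):
    theta = np.ones(2)
    for _ in range(steps):
        theta = step_fn(theta, grad(theta), state)
    return loss(theta)


final_rmsprop = run(rmsprop_step, {"v": np.zeros(2)})
final_adam = run(adam_step, {"t": 0, "m": np.zeros(2), "v": np.zeros(2)})
print(loss(np.ones(2)), final_rmsprop, final_adam)
```

Both methods divide each coordinate by a running RMS of its own gradients, so (ignoring `eps`) they treat the `a = 1` and `a = 100` directions alike. The structural difference is Adam's momentum and bias correction.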
8 votes, 0 answers, 3k views
In the attention mechanism, why are there separate weight matrices for the queries and keys?
To perform self attention over a collection of $n$ vectors stacked up into a matrix $X \in \mathbb{R}^{n \times d}$, we first obtain query, key, and value representations of these vectors via three ...
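A minimal NumPy sketch of single-head self-attention with separate query, key, and value matrices. It assumes the usual scaled dot-product form with a $1/\sqrt{d_k}$ scaling and a row-wise softmax. The excerpt is cut off before its formula, so that form is an assumption here. The sizes and seed are arbitrary. The sketch also checks one concrete consequence of the setup. Reusing one matrix for both queries and keys makes the $n \times n$ score matrix $X W W^\top X^\top$ symmetric. Separate matrices do not force that.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, W_q, W_k, W_v):
    """Single-head scaled dot-product self-attention over the rows of X (n x d)."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(K.shape[1])  # (n x n) attention logits
    A = softmax(scores, axis=1)             # each row is a distribution over positions
    return A @ V, scores


rng = np.random.default_rng(0)
n, d, d_k = 4, 8, 5
X = rng.normal(size=(n, d))
W_q, W_k, W_v = (rng.normal(size=(d, d_k)) for _ in range(3))

out, scores_sep = self_attention(X, W_q, W_k, W_v)
_, scores_tied = self_attention(X, W_q, W_q, W_v)  # hypothetical: reuse W_q as W_k
print(np.allclose(scores_tied, scores_tied.T))  # tied weights: always symmetric
print(np.allclose(scores_sep, scores_sep.T))    # separate weights: generally not symmetric
```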
8 votes, 0 answers, 331 views
What machine learning and deep learning models are used for longitudinal studies (panel data)?
As the title suggests, I have a longitudinal database (also called panel data). (I have over 100,000 observations. The time period is X years. This means that for every year I have the values of the ...
8 votes, 0 answers, 2k views
Multi-target Regression Neural Network: Trade-off
Suppose you have a number of input features, for example:
x1 - temperature
x2 - day of the week
x3 - quantity of rainfall
...
You are trying to predict a number of output targets - using neural ...
8 votes, 0 answers, 1k views
Compatible Function Approximation Theorem in Reinforcement Learning
In the Compatible Function Approximation Theorem, the following condition is required for the policy gradient to be exact, $\nabla J(\theta) = \mathbb{E}_{\pi_{\theta}}\left [\nabla_{\theta}\log\pi_{...
8 votes, 0 answers, 2k views
ReLU derivative: second-order effects
I am reading the Deep Learning Book, where there is a section on generalisations of the ReLU (section 6.3.1).
It states:
The second derivative of the rectifying operation is 0 almost everywhere, ...
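A quick numerical illustration of the quoted claim, using a central finite difference. The step `h` is an arbitrary choice. Away from 0 the estimate is zero up to float rounding, because ReLU is linear on each side of the kink. At 0 it is $1/h$, which grows without bound as `h` shrinks. That is how the single point where the claim fails shows up numerically.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def second_diff(f, x, h=1e-3):
    # Central finite-difference estimate of f''(x); h is an arbitrary small step.
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / h**2


away_from_zero = second_diff(relu, np.array([-2.0, -0.5, 0.3, 1.7]))
at_zero = second_diff(relu, np.array([0.0]))
print(away_from_zero)  # zero up to float rounding: ReLU is linear on each side of 0
print(at_zero)         # about 1/h, and it grows as h shrinks
```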
7 votes, 1 answer, 2k views
Overfitting a neural network to a single batch as a sanity check: how small a loss value is small enough, and how long to run for?
I'm currently developing a neural network for a regression task. Following on from the advice given in places like here, here, and here, I'm attempting to overfit my model to a single batch of 5 ...
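A minimal sketch of the check itself: train repeatedly on one fixed batch of 5 examples and track the loss on that batch. To keep the run deterministic and dependency-free, the network is replaced by a hypothetical stand-in. It is an over-parameterized linear model (10 weights, 5 samples) trained with plain gradient descent. The feature count, seed, and step count are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_samples, n_features = 5, 10  # one fixed batch of 5 examples; 10 features is hypothetical
X = rng.normal(size=(n_samples, n_features))
y = rng.normal(size=n_samples)

# Hypothetical stand-in model: linear regression with more weights than samples,
# so it is able to fit the batch exactly (a real network would replace this).
w = np.zeros(n_features)
hessian = 2.0 / n_samples * X.T @ X           # Hessian of the mean squared error
lr = 1.0 / np.linalg.eigvalsh(hessian).max()  # step size 1/L keeps gradient descent stable

history = []
for _ in range(5000):
    residual = X @ w - y
    history.append(np.mean(residual**2))          # MSE on the same batch every step
    w -= lr * (2.0 / n_samples) * X.T @ residual  # gradient of the MSE w.r.t. w

final_loss = np.mean((X @ w - y) ** 2)
print(history[0], history[999], final_loss)
```

This stand-in can interpolate the batch exactly, so its loss falls toward floating-point precision. A real network with nonlinearities and its own optimizer usually gets there more slowly. If a model that should be able to memorize one batch cannot drive the loss down, that usually points to a bug in the model, the loss, or the training loop.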
7 votes, 0 answers, 3k views
Does attention help with standard autoencoders?
I understand the use of attention mechanisms in encoder-decoder models for sequence-to-sequence problems such as language translation.
I am just trying to figure out whether it is possible to use ...
7 votes, 0 answers, 745 views
Understanding Object2Vec
AWS released an interesting feature as part of the SageMaker service called Object2Vec that lets you make an embedding for search out of pretty much anything: documents, users, products, ...
7 votes, 0 answers, 114 views
Choosing the number of hidden layers and nodes in a Deep Belief Network
What are the recent advances and current best practices in choosing the number and size of stacked Restricted Boltzmann Machines in Deep Belief Networks?
7 votes, 0 answers, 358 views
Would gradient boosting machines benefit from adaptive learning rates?
In deep learning, a big deal is made about optimizing an adaptive learning rate. There are numerous popular adaptive learning rate algorithms.
The hyperparameters for all of the leading gradient ...
7 votes, 0 answers, 8k views
Change image input size of a pre-trained convnet
Maybe this question will sound a bit like a newbie one, but I'd like some clarification.
I'm using a VGG16-like convnet, pre-trained with VGG16 weights and with edited top layers to work with my ...