Highly scored unanswered questions

10 votes

0 answers

2k views

Difference between Shapley values and SHAP

The paper on SHAP values gives a formula for the Shapley values in (4) and, apparently (?), for SHAP values in (8). Still, I don't really understand the difference between Shapley and SHAP ...
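For reference, here is a minimal sketch of the textbook Shapley value from cooperative game theory, computed by enumerating all coalitions. Whether this matches the paper's equation (4) term by term is an assumption; the function `shapley_values`, the toy game `toy_value`, and the player names are hypothetical and only serve as illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values by brute-force enumeration of coalitions.

    players: list of hashable player (or feature) names.
    value:   function mapping a frozenset of players to a real payoff.
    """
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for size in range(n):
            # Weight |S|! (n - |S| - 1)! / n! shared by all coalitions S of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of player i when joining coalition S.
                total += weight * (value(s | {i}) - value(s))
        phi[i] = total
    return phi


# Hypothetical toy game: additive payoffs plus a bonus when "a" and "b" cooperate.
def toy_value(coalition):
    base = {"a": 1.0, "b": 2.0, "c": 3.0}
    bonus = 4.0 if {"a", "b"} <= coalition else 0.0
    return sum(base[p] for p in coalition) + bonus


phi = shapley_values(["a", "b", "c"], toy_value)
```

In this toy game the bonus is split evenly between "a" and "b", and the values sum to the payoff of the full coalition. The enumeration is exponential in the number of players, so it is only practical for very small games.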

Quastiat

233

asked Oct 2, 2019 at 15:31

9 votes

0 answers

353 views

What's up with Neural Stochastic Differential Equations from a practical standpoint?

I've spent a few days reading some of the new papers about Neural SDEs. For example, here is one from Tzen and Raginsky and here is one that came out simultaneously by Peluchetti and Favaro. There are ...

jeffery_the_wind

447

modified Mar 15, 2021 at 14:21

9 votes

1 answer

179 views

Is there an ML or DL tool that can learn to detect periodically occurring patterns in a one-dimensional time series?

I am trying to create a tool that labels refrigerator temperature readings. A reading is taken every 5 minutes, and its label identifies whether or not it was taken while the refrigerator was ...

EngrStudent

10k

answered Jan 4, 2021 at 18:34

9 votes

0 answers

7k views

Advantage of RMSProp over Adam?

I've learned from DL classes that Adam should be the default choice for neural network training. However, I've recently seen more and more reinforcement learning agents use RMSProp instead of ...

Maybe

1,105

asked Nov 12, 2019 at 14:52

8 votes

0 answers

3k views

In the attention mechanism, why are there separate weight matrices for the queries and keys?

To perform self attention over a collection of $n$ vectors stacked up into a matrix $X \in \mathbb{R}^{n \times d}$, we first obtain query, key, and value representations of these vectors via three ...
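As a small illustration of the setup described in this excerpt, the sketch below computes unnormalized self-attention scores with separate query and key projections, and then again with a single combined matrix. The names `W_q`, `W_k`, the projection size `d_k`, and the random inputs are assumptions made for the example, since the excerpt is truncated; scaling and the softmax are left out.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def rand_matrix(rows, cols):
    return [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
n, d, d_k = 3, 4, 2          # hypothetical sizes: n vectors of dimension d

X = rand_matrix(n, d)        # input vectors stacked as rows
W_q = rand_matrix(d, d_k)    # query projection (assumed name)
W_k = rand_matrix(d, d_k)    # key projection (assumed name)

# Scores from separate query and key representations: Q K^T, shape n x n.
Q = matmul(X, W_q)
K = matmul(X, W_k)
scores = matmul(Q, transpose(K))

# The same scores from one combined d x d matrix of rank at most d_k.
W_qk = matmul(W_q, transpose(W_k))
scores_combined = matmul(matmul(X, W_qk), transpose(X))
```

The two score matrices agree up to floating-point error, because $Q K^\top = X (W_q W_k^\top) X^\top$. In other words, the unnormalized scores depend on the two projections only through their product, which is a low-rank factorization of a $d \times d$ matrix when $d_k < d$.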

tddevlin

3,407

asked Dec 2, 2020 at 6:40

8 votes

0 answers

331 views

What machine learning and deep learning models are used for longitudinal studies (panel data)?

As the title suggests, I have a longitudinal database (also called panel data). (I have over 100,000 observations. The time period is X years. This means that for every year I have the values of the ...

kjetil b halvorsen♦

84k

modified Apr 24, 2019 at 21:27

8 votes

0 answers

2k views

Multi-target Regression Neural Network: Trade Off

Suppose you have a number of input features, for example: x1 - temperature, x2 - day of the week, x3 - quantity of rainfall, ... You are trying to predict a number of output targets - using neural ...

user249740

1

modified Jun 2, 2019 at 15:38

8 votes

0 answers

1k views

Compatible Function Approximation Theorem in Reinforcement Learning

In the Compatible Function Approximation Theorem, the following condition is required to make the policy gradient exact $\nabla J(\theta) = \mathbb{E}_{\pi_{\theta}}\left[\nabla_{\theta}\log\pi_{...

Jiang Xiang

452

asked Jul 12, 2017 at 21:27

8 votes

0 answers

2k views

ReLU derivative - second order effects

I am reading the Deep Learning Book, where there is a section on generalisations of the ReLU (section 6.3.1). It states: The second derivative of the rectifying operation is 0 almost everywhere, ...
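The quoted statement can be checked numerically with a central finite-difference estimate of the second derivative. This is only a sketch: the helper `second_derivative` and the step size `h` are choices made for illustration, not anything taken from the book.

```python
def relu(x):
    return max(0.0, x)


def second_derivative(f, x, h=2.0 ** -10):
    """Central finite-difference estimate of f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)


away_positive = second_derivative(relu, 1.0)    # linear region, estimate is 0
away_negative = second_derivative(relu, -1.0)   # flat region, estimate is 0
at_kink = second_derivative(relu, 0.0)          # equals 1 / h, grows as h shrinks
```

Away from the origin the estimate is zero because the ReLU is piecewise linear. At the kink the estimate equals $1/h$ and diverges as $h \to 0$, which reflects that the second derivative does not exist there as an ordinary function.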

n1k31t4

641

asked Feb 18, 2017 at 15:35

7 votes

1 answer

2k views

Overfitting a neural network to a single batch as a sanity check - how small a loss value is small enough and how long to run for?

I'm currently developing a neural network for a regression task. Following on from the advice given in places like here, here, and here, I'm attempting to overfit my model to a single batch of 5 ...

CommunityBot

1

modified Mar 3 at 13:08

7 votes

0 answers

3k views

Does attention help with standard auto-encoders?

I understand the use of attention mechanisms in the encoder-decoder for sequence-to-sequence problems such as language translation. I am just trying to figure out whether it is possible to use ...

SathukaBootham

1

modified Jul 14, 2023 at 8:30

7 votes

0 answers

745 views

Understanding Object2Vec

AWS released an interesting feature as part of the SageMaker service called Object2Vec that lets you make an embedding for search out of pretty much anything: documents, users, products, ...

Ryan Zotti

6,797

asked Jan 29, 2020 at 18:21

7 votes

0 answers

114 views

Choosing the number of hidden layers and nodes in a Deep Belief Network

What are the recent advances and current best practices in choosing the number and size of stacked Restricted Boltzmann Machines in Deep Belief Networks?

kjetil b halvorsen♦

84k

modified Dec 21, 2019 at 0:09

7 votes

0 answers

358 views

Would gradient boosting machines benefit from adaptive learning rates?

In deep learning, a big deal is made about optimizing an adaptive learning rate. There are numerous popular adaptive learning rate algorithms. The hyperparameters for all of the leading gradient ...

zkurtz

2,170

modified Dec 30, 2021 at 0:26

7 votes

0 answers

8k views

Change image input size of a pre-trained convnet

Maybe this question will sound a bit like a newbie one, but I'd like some clarification. I'm using a VGG16-like convnet, pre-trained with VGG16 weights, with edited top layers to work with my ...

matteodv

171

asked Nov 4, 2017 at 10:50
