Highly scored unanswered questions

17 votes

0 answers

2k views

Rademacher complexity of logistic regression

Consider logistic regression. We have the logistic loss function, $\phi: R\rightarrow [0,1], \phi(u)=\log(1+\exp(-u))$, which is Lipschitz, and we have the linear function class $F=\{f_w:R^d \...

axk

828

modified Nov 10, 2014 at 23:05

14 votes

0 answers

700 views

Convolutional neural network for multi-variate time series?

I want to use CNN architectures for classification of multivariate time series, where we apply one label to each sequence. I searched the net for the available designs in the literature and I found ...

Bob

439

modified Jun 15, 2018 at 10:16

13 votes

0 answers

272 views

Logistic regression for classification: are there any analytical solutions for the out-of-sample accuracy?

I run a binary logistic regression, with a binary dependent variable and a continuous independent one. Now I want to evaluate the out-of-sample performance of the classification algorithm so obtained. ...

robertspierre

2,578

modified Apr 13, 2021 at 12:41

12 votes

0 answers

1k views

Why do we really need the concept of "Local" Rademacher complexity?

Recently, I have been studying High-Dimensional Statistics: A Non-Asymptotic Viewpoint written by Martin J. Wainwright. In this book, the author uses a special complexity measure which is called Local ...

Wei-Cheng Lee

348

asked Aug 2, 2019 at 4:57

12 votes

0 answers

2k views

Computing a bootstrap confidence interval for the prediction error with the percentile and the BCa method

I have two related questions regarding the computation of a non-parametric bootstrap confidence interval for the prediction error. Setting: I have a sample S from a data population P and a learner L, ...

CommunityBot

1

modified Feb 7, 2022 at 13:02

10 votes

0 answers

624 views

When using L2 regularization outside of linear regression, do the same MAP estimation assumptions hold?

Some context is shared below, and my question is bolded at the end. MLE from observation noise: In the linear regression setting, we learn model weights $\mathbf{w}$ to make scalar predictions $\hat{y}...

kdbanman

877

modified Aug 31, 2022 at 20:27

10 votes

0 answers

2k views

Difference between Shapley values and SHAP

The paper on SHAP values gives a formula for the Shapley values in (4) and, apparently (?), for SHAP values in (8). Still, I don't really understand the difference between Shapley and SHAP ...

Quastiat

233

asked Oct 2, 2019 at 15:31

10 votes

0 answers

377 views

Reinforcement Model Learning

Classical reinforcement learning (Q- or Sarsa-Learning) can be extended with models of the environment. These models are usually transition tables that contain the probability of arriving at a ...

CommunityBot

1

modified Jan 13, 2017 at 8:41

10 votes

2 answers

2k views

Random Forest: Class specific feature importance

I'm using the bigrf R package to analyse a dataset with ca. 50,000 observations x 120 variables, classified into two groups. After growing a forest of 1000 trees, ...

CommunityBot

1

modified Feb 8 at 22:07

9 votes

0 answers

765 views

How come the BART results are this good at the 2016 Atlantic causal inference competition?

The famous paper (Dorie, 2017) shows that BART performs remarkably well in causal inference. In my replication, the MSE of BART can be 40% lower than the MSE of other machine learning methods. But all machine ...

Scriddie

2,623

modified Sep 8, 2023 at 17:31

9 votes

1 answer

179 views

Is there an ML or DL tool that can learn to detect periodically occurring patterns in a one-dimensional time series?

I am trying to create a tool that labels refrigerator temperature readings. A reading is taken every 5 minutes, and its label identifies whether or not it was taken while the refrigerator was ...

EngrStudent

10k

answered Jan 4, 2021 at 18:34

8 votes

0 answers

3k views

In the attention mechanism, why are there separate weight matrices for the queries and keys?

To perform self attention over a collection of $n$ vectors stacked up into a matrix $X \in \mathbb{R}^{n \times d}$, we first obtain query, key, and value representations of these vectors via three ...

tddevlin

3,407

asked Dec 2, 2020 at 6:40

8 votes

0 answers

2k views

Zero-inflation with sklearn and continuous target?

My current data have quite a large number of zeros (~60%), and I'm thinking of trying to implement a zero-inflated model of sorts with sklearn. While I've used zero-inflated Poisson/negative binomial ...

bjr96571

81

asked Feb 1, 2020 at 8:50

8 votes

0 answers

1k views

The extrapolation problem: model selection, performance metrics, and improvement

Machine learning models are fit to a response variable within a given range. This leads to weak and sometimes disastrous performance when it comes to instances with an actual response variable outside ...

CommunityBot

1

modified Aug 17, 2020 at 4:06

8 votes

0 answers

331 views

What machine learning and deep learning models are used for longitudinal studies (panel data)?

As the title suggests, I have a longitudinal database (also called panel data). (I have over 100,000 observations. The time period is X years. This means that for every year I have the values of the ...

kjetil b halvorsen♦

84k

modified Apr 24, 2019 at 21:27
