Most active questions

5 votes

2 answers

267 views

Does value iteration still return the true Q-values in a stochastic environment?

I'm working with the FrozenLake environment (8x8) from Gymnasium. In the deterministic case (is_slippery=False), I understand that using value iteration can ...

Jien Weng

69

asked Apr 10 at 1:15
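For context on the setting this question describes, here is a minimal sketch of Q-value iteration with an expected (probability-weighted) Bellman backup. It assumes the transition model is known and stored as `P[s][a]`, a list of `(probability, next_state, reward, terminated)` tuples, which is the layout Gymnasium's toy-text environments are generally understood to expose. The two-state MDP and the name `q_value_iteration` are hypothetical and chosen only for illustration; this is not the asker's code.

```python
def q_value_iteration(P, n_states, n_actions, gamma=0.9, tol=1e-10, max_iters=10_000):
    """Q-value iteration for a known, possibly stochastic, transition model.

    P[s][a] is a list of (probability, next_state, reward, terminated) tuples.
    """
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iters):
        delta = 0.0
        for s in range(n_states):
            for a in range(n_actions):
                # Expected backup: average over every possible outcome of (s, a).
                new_q = sum(
                    p * (r + (0.0 if done else gamma * max(Q[s2])))
                    for p, s2, r, done in P[s][a]
                )
                delta = max(delta, abs(new_q - Q[s][a]))
                Q[s][a] = new_q
        if delta < tol:
            break
    return Q


# Hypothetical two-state MDP: state 0 is the start, state 1 is terminal.
# Action 0 reaches the goal with probability 0.8 (reward 1) and otherwise stays put.
# Action 1 always stays in state 0 with no reward.
P = {
    0: {
        0: [(0.8, 1, 1.0, True), (0.2, 0, 0.0, False)],
        1: [(1.0, 0, 0.0, False)],
    },
    1: {
        0: [(1.0, 1, 0.0, True)],
        1: [(1.0, 1, 0.0, True)],
    },
}

Q = q_value_iteration(P, n_states=2, n_actions=2)
```

In this toy example the fixed point can be checked by hand: `Q[0][0]` solves q = 0.8 + 0.2 * 0.9 * q, so q = 0.8 / 0.82, and `Q[0][1]` equals 0.9 * q. The backup uses the transition probabilities directly, so it relies on the model being available rather than on sampled episodes.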

2 votes

1 answer

669 views

How can the exact same model give different confusion matrices for the test dataset and the entire dataset?

I have recently implemented a simple artificial neural network with 1 hidden layer. I split my data using train_test_split and I end up with the following confusion matrix in my test set. ...

The Logician

21

asked Apr 15 at 10:03

3 votes

2 answers

54 views

Understanding Why TD Learning Has Lower Variance Despite Using an Estimated Value

In Temporal Difference (TD) learning, the value function is updated using its own estimate, following the rule: $V(S_t) \leftarrow V(S_t) + \alpha[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$. It's often ...

Goldhand

51

asked Apr 25 at 10:15
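The update rule quoted in this question can be sketched in a few lines. This is only an illustration of a single tabular TD(0) step; the function name `td0_update`, the dictionary-based value table, and the sample numbers are assumptions, not anything taken from the question.

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular TD(0) update to V in place and return the TD error."""
    # Bootstrapped target: observed reward plus the discounted current estimate
    # of the next state's value (zero if the next state is terminal).
    target = r + (0.0 if terminal else gamma * V[s_next])
    td_error = target - V[s]
    V[s] += alpha * td_error
    return td_error


# Hypothetical value table and a single observed transition A -> B with reward 0.5.
V = {"A": 0.0, "B": 1.0}
delta = td0_update(V, "A", r=0.5, s_next="B")
```

Here the target is 0.5 + 0.9 * 1.0 = 1.4, so the TD error is 1.4 and `V["A"]` moves from 0.0 to 0.14. The only randomness in the target comes from one reward and one next state, whereas a Monte Carlo target would sum rewards over the whole remaining episode.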

4 votes

1 answer

107 views

Understanding the optimal value function in RL

The definition of the optimal policy (Sutton and Barto, section 3.6) states that $\pi \geq \pi'$ iff $v_{\pi}(s) \geq v_{\pi'}(s)$ for all $s \in S$. I have difficulty understanding why the value (under ...

ahron

265

asked Apr 11 at 6:58

2 votes

1 answer

91 views

Proposal for AGI model

I've been doing a bit of research into formal models for AGI, searching for fertile ground for developing new ideas. One area that didn't seem too thoroughly explored was in designing agents that “...

bishop-fish

121

asked Apr 27 at 8:59

1 vote

1 answer

46 views

Do neural networks do wishful thinking?

I will give an example of wishful thinking. When you try to prove a theorem, you think about what would imply that theorem and maybe try to find a lemma that implies it. Maybe neurons try to connect previous ...

gha00

21

asked Apr 25 at 7:36

4 votes

1 answer

57 views

Are vision transformers scale invariant like CNNs?

I was trying to implement a vision transformer (RT-DETR) for object detection. I trained the model on 640x640 px images and tested it on a 2000x2000 px image containing many objects - the outputs did ...

Lockhart

143

asked Apr 10 at 15:17

0 votes

3 answers

76 views

What are some practical use cases where generative AI has saved you time or boosted creativity?

I’ve been testing out different generative AI tools recently, and I’m wondering what kinds of real, everyday use cases people here have found most useful. Not just flashy demos — I mean the tools that ...

FaceSwapAI

1

asked Apr 16 at 9:40

0 votes

2 answers

61 views

Why can the function that turns the history into one Markov state be any function?

Summary: In David Silver's RL lecture slides, he defines the state $S_t$ formally as a function of the history, $S_t = f(H_t)$. He then goes on to define the Markov state as any state $S_t$ such that the ...

Andrew

1

asked May 1 at 23:53

2 votes

1 answer

44 views

Doubt regarding the convergence proof of $Q$-learning

I was trying to understand the proof of $Q$-learning from here. At page 17, as you can see, $||\Delta_t + Q^*||^2_{\infty} \leq ||\Delta_t||^2 + ||Q^*||^2_{\infty}$ has been used to make a bound on $Var(...

Subhajit Saha

121

asked Apr 10 at 19:49

0 votes

1 answer

37 views

Who argued that we're entering a 4th era of science with machine learning?

I remember reading a reference to a recent paper that argued that science today is in its 4th stage (paradigm?), the era of modelling with machine learning. The 3rd was that of Newton, Kepler, et al. ...

Geremia

525

asked Apr 23 at 22:53

2 votes

1 answer

37 views

Can self-attention capture the rate of change of a token?

From what I understand, the self-attention mechanism captures the dependency of a given token on various other tokens in a sequence. Inspired by nature, where natural laws are often expressed in terms ...

Manish Kumar Singh

21

asked Apr 14 at 21:41

2 votes

0 answers

32 views

Fine-tuning ResNet101 stuck at ~50% accuracy while MobileNetV2 reaches ~90% (same data, head, training setup)

I'm fine-tuning two different CNNs for an image classification task: The first CNN uses a ResNet101 backbone, and the second uses a MobileNetV2 backbone. Both are pre-trained on ImageNet. I use the ...

S.E.K.

41

asked Apr 28 at 21:04

0 votes

1 answer

52 views

How do tools like V0.dev provide the project's code context to the AI while minimizing its input tokens?

Under the hood, tools like V0.dev use the OpenAI API to generate code. How does V0 know which file context to send to the AI when a user makes prompts like "make xyz change/change the button color/add ...

raspace

9

asked Apr 21 at 5:47

2 votes

1 answer

43 views

Can the output of a language model be identical to its training data if finetuned with reference documents also present in the training data?

E.g. Finetuning a language model using text from Wikipedia articles (without modifications) when the language model has Wikipedia data in its training dataset will cause the model to reproduce the ...

user1678860

377

asked Apr 17 at 0:00
