Most active questions

5 votes
2 answers
267 views

Does value iteration still return the true Q-values in a stochastic environment?

I'm working with the FrozenLake environment (8x8) from Gymnasium. In the deterministic case (is_slippery=False), I understand that using value iteration can ...
Jien Weng
2 votes
1 answer
669 views

How can the exact same model give different confusion matrices for the test dataset and the entire dataset?

I have recently implemented a simple artificial neural network with 1 hidden layer. I split my data using train_test_split and I end up with the following confusion matrix in my test set. ...
The Logician
3 votes
2 answers
54 views

Understanding Why TD Learning Has Lower Variance Despite Using an Estimated Value

In Temporal Difference (TD) learning, the value function is updated using its own estimate, following the rule: $V(S_t) \leftarrow V(S_t) + \alpha[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$. It's often ...
Goldhand
4 votes
1 answer
107 views

Understanding the optimal value function in RL

The definition (Section 3.6, Sutton and Barto) of the optimal policy states that $\pi > \pi'$ iff $v_{\pi}(s) > v_{\pi'}(s)$ for all $s \in S$. I have difficulty understanding why the value (under ...
ahron
  • 265
2 votes
1 answer
91 views

Proposal for AGI model

I've been doing a bit of research into formal models for AGI, searching for fertile ground for developing new ideas. One area that didn't seem too thoroughly explored was in designing agents that “...
bishop-fish
1 vote
1 answer
46 views

Do neural networks do wishful thinking?

I will give an example of wishful thinking. When you try to prove a theorem you think what would imply that theorem and maybe try to find a lemma that implies it. Maybe neurons try to connect previous ...
gha00
  • 21
4 votes
1 answer
57 views

Are vision transformers scale invariant like CNNs?

I was trying to implement a vision transformer (RT-DETR) for object detection. I trained the model on 640x640 px images and tested it on a 2000x2000 px image containing many objects - the outputs did ...
Lockhart
0 votes
3 answers
76 views

What are some practical use cases where generative AI has saved you time or boosted creativity?

I’ve been testing out different generative AI tools recently, and I’m wondering what kinds of real, everyday use cases people here have found most useful. Not just flashy demos — I mean the tools that ...
FaceSwapAI
0 votes
2 answers
61 views

Why can the function that turns the history into one Markov state be any function?

Summary In David Silver's RL lecture slides, he defines the State $S_t$ formally as a function of the history: David then goes on to define the Markov state as any state $S_t$ such that the ...
Andrew
  • 1
2 votes
1 answer
44 views

Doubt regarding the convergence proof of $Q$-learning

I was trying to understand the proof of $Q$-learning from here. On page 17, as you can see, $||\Delta_t + Q^*||^2_{\infty} \leq ||\Delta_t||^2 + ||Q^*||^2_{\infty}$ has been used to make a bound on $Var(...
Subhajit Saha
0 votes
1 answer
37 views

Who argued that we're entering a 4th era of science with machine learning?

I remember reading a reference to a recent paper that argued that science today is in its 4th stage (paradigm?), the era of modelling with machine learning. The 3rd was that of Newton, Kepler, et al. ...
Geremia
  • 525
2 votes
1 answer
37 views

Can self-attention capture the rate of change of a token?

From what I understand, the self-attention mechanism captures the dependency of a given token on various other tokens in a sequence. Inspired by nature, where natural laws are often expressed in terms ...
Manish Kumar Singh
2 votes
0 answers
32 views

Fine-tuning ResNet101 stuck at ~50% accuracy while MobileNetV2 reaches ~90% (same data, head, training setup)

I'm fine-tuning two different CNNs for an image classification task: The first CNN uses a ResNet101 backbone, and the second uses a MobileNetV2 backbone. Both are pre-trained on ImageNet. I use the ...
S.E.K.
  • 41
0 votes
1 answer
52 views

How do tools like V0.dev provide the project's code context to the AI while minimizing its input tokens?

Under the hood, tools like V0.dev use the OpenAI API to generate code. How does V0 know which file context to send to the AI when the user makes prompts like "make xyz change/change the button color/add ...
raspace
2 votes
1 answer
43 views

Can the output of a language model be identical to its training data if fine-tuned with reference documents also present in the training data?

E.g. Finetuning a language model using text from Wikipedia articles (without modifications) when the language model has Wikipedia data in its training dataset will cause the model to reproduce the ...
user1678860
