Most active questions
59 questions from the last 30 days
5 votes, 2 answers, 267 views
Does value iteration still return the true Q-values in a stochastic environment?
I'm working with the FrozenLake environment (8x8) from Gymnasium.
In the deterministic case (is_slippery=False), I understand that using value iteration can ...
2 votes, 1 answer, 669 views
How can the exact same model give different confusion matrices for the test dataset and the entire dataset?
I have recently implemented a simple artificial neural network with 1 hidden layer. I split my data using train_test_split and I end up with the following confusion matrix in my test set.
...
3 votes, 2 answers, 54 views
Understanding Why TD Learning Has Lower Variance Despite Using an Estimated Value
In Temporal Difference (TD) learning, the value function is updated using its own estimate, following the rule: $V(S_t) \leftarrow V(S_t) + \alpha[R_{t+1} + \gamma V(S_{t+1}) - V(S_t)]$. It's often ...
4 votes, 1 answer, 107 views
Understanding the optimal value function in RL
The definition (Section 3.6 of Sutton and Barto) for the optimal policy states that $\pi \geq \pi'$ iff $v_{\pi}(s) \geq v_{\pi'}(s)$ for all $s \in S$.
I have difficulty understanding why the value (under ...
2 votes, 1 answer, 91 views
Proposal for AGI model
I've been doing a bit of research into formal models for AGI, searching for fertile ground for developing new ideas. One area that didn't seem too thoroughly explored was in designing agents that “...
1 vote, 1 answer, 46 views
Do neural networks do wishful thinking?
I will give an example of wishful thinking. When you try to prove a theorem, you think about what would imply that theorem and maybe try to find a lemma that implies it. Maybe neurons try to connect previous ...
4 votes, 1 answer, 57 views
Are vision transformers scale invariant like CNNs?
I was trying to implement a vision transformer (RT-DETR) for object detection. I trained the model on 640x640 px images and tested it on a 2000x2000 px image containing many objects - the outputs did ...
0 votes, 3 answers, 76 views
What are some practical use cases where generative AI has saved you time or boosted creativity?
I’ve been testing out different generative AI tools recently, and I’m wondering what kinds of real, everyday use cases people here have found most useful. Not just flashy demos — I mean the tools that ...
0 votes, 2 answers, 61 views
Why can the function that turns the history into one Markov state be any function?
Summary
In David Silver's RL lecture slides, he defines the State $S_t$ formally as a function of the history:
David then goes on to define the Markov state as any state $S_t$ such that the ...
2 votes, 1 answer, 44 views
Doubt regarding the convergence proof of $Q$-learning
I was trying to understand the proof of $Q$-learning from here. On page 17
as you can see $||\Delta_t + Q^*||^2_{\infty} \leq ||\Delta_t||^2 + ||Q^*||^2_{\infty}$ has been used to make a bound on $Var(...
0 votes, 1 answer, 37 views
Who argued that we're entering a 4th era of science with machine learning?
I remember reading a reference to a recent paper that argued that science today is in its 4th stage (paradigm?), the era of modelling with machine learning. The 3rd was that of Newton, Kepler, et al.
...
2 votes, 1 answer, 37 views
Can self-attention capture the rate of change of a token?
From what I understand, the self-attention mechanism captures the dependency of a given token on various other tokens in a sequence. Inspired by nature, where natural laws are often expressed in terms ...
2 votes, 0 answers, 32 views
Fine-tuning ResNet101 stuck at ~50% accuracy while MobileNetV2 reaches ~90% (same data, head, training setup)
I'm fine-tuning two different CNNs for an image classification task:
The first CNN uses a ResNet101 backbone, and the second uses a MobileNetV2 backbone. Both are pre-trained on ImageNet.
I use the ...
0 votes, 1 answer, 52 views
How do tools like V0.dev provide the project's code context to the AI while minimizing its input tokens?
Under the hood, tools like V0.dev use the OpenAI API to generate code. How does V0 know which file context to send to the AI when a user makes prompts like "make xyz change/change the button color/add ...
2 votes, 1 answer, 43 views
Can the output of a language model be identical to its training data if fine-tuned with reference documents also present in the training data?
E.g. Fine-tuning a language model using text from Wikipedia articles (without modifications) when the language model has Wikipedia data in its training dataset will cause the model to reproduce the ...