I join the request. It's a good meme if and only if it is true.
Can't we just look at the weights?
As I understand it, interpretability research hasn't exactly gotten stuck, but it's very, very, very far from something like this, even for non-SotA models. And the gap is growing.
"Which concept they might obtain by reading my book on Highly Advanced Epistemology 101 For Beginners, or maybe just my essay on Local Validity as a Key to Sanity and Civilization, I guess?"
Perhaps there should be two links here?
Do you think that if someone filtered and steelmanned Quintin's criticism, it would be valuable? (No promises)
I think from Eliezer's point of view it goes kinda like this:
After reading this text, at least one person (you) thinks that the goal "avoid dishonesty and rudeness" was not achieved, so the text is a failure.
After reading this text, at least one person (me) thinks that (1) I got some useful ideas and models, and (2) of course, at least the smartest of Eliezer's opponents have better arguments, which I don't think Eliezer would dispute; so the text is a success.
Ideally, Eliezer should update his writing strategy based on both pieces of evidence.
I could be wrong, of course.
What if we just turned off the ability to add a reaction by clicking it in the list of already-used reactions? Yes, people would use reactions less, but more deliberately.
I think I'm at least close to agreeing, but even if that's true now, it doesn't follow that the complex-positive-value-optimizer can produce more value mass than the simple-negative-value-optimizer.
I think (not sure) the Kelly criterion applies to you only if your utility is already concave.
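To make the concavity point concrete, here is a minimal sketch, assuming a single bet with win probability 0.6 at even odds (b = 1) and starting wealth 1. The helper names `kelly_fraction` and `best_fraction` are hypothetical. The Kelly fraction p - (1 - p)/b is the bet size that maximizes expected log wealth, so an agent with log (concave) utility lands on it, while an agent with linear utility bets nearly everything. Log is only one concave utility; other concave utilities would pick other fractions.

```python
import math


def kelly_fraction(p: float, b: float) -> float:
    """Closed-form Kelly fraction for a bet paying b-to-1 with win probability p."""
    return p - (1 - p) / b


def best_fraction(utility, p: float, b: float, steps: int = 1000) -> float:
    """Grid-search the fraction f in [0, 1) of wealth 1 that maximizes expected utility of one bet."""
    best_f, best_u = 0.0, float("-inf")
    for i in range(steps):
        f = i / steps  # f ranges over 0.000 .. 0.999, so log(1 - f) stays finite
        eu = p * utility(1 + b * f) + (1 - p) * utility(1 - f)
        if eu > best_u:
            best_f, best_u = f, eu
    return best_f


if __name__ == "__main__":
    p, b = 0.6, 1.0
    print(kelly_fraction(p, b))               # Kelly fraction, up to float rounding
    print(best_fraction(math.log, p, b))      # log (concave) utility: matches Kelly
    print(best_fraction(lambda w: w, p, b))   # linear utility: bets almost all wealth
```

The grid search is deliberately naive so that the only thing changing between the two calls is the utility function.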