Comments

gwern · 2d

This sounds like a bad plan: it will be a logistics nightmare (undermining randomization) with high attrition, and it will have extremely high variance, because it is a between-subject design (where subjects differ a ton at baseline, in addition to exposure) run on a single occasion with uncontrolled exposures and huge measurement error, where only the most extreme infections get reported (and only sometimes). You'll probably get non-answers, if you finish at all. The most likely outcome is that something goes wrong and the entire effort is wasted.

Since this is a topic which is highly repeatable within-person (and indeed, usually repeats often through a lifetime...), it would make more sense run as a within-individual design using higher-quality measurements.

One good QS approach would be to exploit the fact that infections, even asymptomatic ones, seem to affect heart rate etc as the body is damaged and begins fighting the infection. HR/HRV is now measurable off the shelf with things like the Apple Watch, AFAIK. So you could recruit a few tech-savvy conference-goers for measurements from a device they already own & wear. This avoids any 'big bang' and lets you prototype and tweak on a few people - possibly yourself? - before rolling it out, considerably de-risking it.
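To make that concrete, here is a minimal sketch (my own illustration, not part of the original suggestion) of the kind of rule such a wearable signal supports, assuming you can export a daily resting-heart-rate series; the function name, 28-day window, and 2-SD threshold are all made-up placeholder choices rather than a validated detector:

```python
# Hypothetical sketch: flag days where resting heart rate runs well above that
# person's own trailing baseline. Window/threshold are illustrative, not validated.
import numpy as np
import pandas as pd

def flag_elevated_rhr(daily_rhr: pd.Series, window: int = 28, z_thresh: float = 2.0) -> pd.Series:
    """Mark days whose resting HR exceeds the trailing `window`-day mean
    by more than `z_thresh` trailing standard deviations."""
    baseline = daily_rhr.rolling(window, min_periods=window // 2).mean().shift(1)
    spread = daily_rhr.rolling(window, min_periods=window // 2).std().shift(1)
    return (daily_rhr - baseline) / spread > z_thresh

# Toy usage: 90 days of resting HR around 58 bpm, with a ~6 bpm bump in the final
# week standing in for an (asymptomatic) infection.
rng = np.random.default_rng(0)
rhr = pd.Series(58 + rng.normal(0, 1.5, 90),
                index=pd.date_range("2024-01-01", periods=90))
rhr.iloc[-7:] += 6
print(flag_elevated_rhr(rhr).tail(10))
```

Even a crude flag like this gives you an objective, passively collected outcome to work with, instead of relying on whether a participant notices, remembers, and reports feeling sick.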

There are some people who travel constantly for business and conferences, and recruiting and managing a few of them would probably be infinitely easier than 500+ randos (if for no other reason than that, being frequent flyers, they may be quite eager for some prophylactics), and you would probably get far more precise data out of them if they agree to cooperate for a year or so and you get eg 10 conferences/trips out of each of them, which you can contrast with their year-round baseline & exposome while measuring asymptomatic infections or just overall health/stress. (Remember, variance reduction yields exponential gains in precision or sample-size reduction. It wouldn't be too hard for 5 or 10 people to beat a single 250-vs-250 one-off experiment, even if nothing whatsoever goes wrong in the latter. This is a case where a few hours spent writing simulations for a power analysis could be very helpful. I bet that the ability to detect asymptomatic cases, and to run within-person, will boost statistical power a lot more than you think compared to ad hoc questionnaires emailed afterwards which may go straight to spam...)
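As an illustration of what such a simulation might look like, here is a rough power-analysis skeleton; every number in it (infection rate, relative risk, questionnaire reporting rate, wearable sensitivity, between-person heterogeneity) is a placeholder assumption to be swapped for your own best guesses, and the comparison swings a lot depending on them - which is exactly why simulating beats eyeballing:

```python
# Rough power-analysis skeleton. ALL parameters below are placeholder assumptions
# for illustration; swap in your own before drawing any conclusions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
N_SIMS = 1000
ALPHA = 0.05

BASE_P = 0.20     # assumed per-trip infection probability, untreated
RR = 0.5          # assumed relative risk under the intervention
REPORT_P = 0.4    # assumed fraction of infections a follow-up questionnaire captures
DETECT_P = 0.9    # assumed fraction of infections (incl. asymptomatic) a wearable flags
PERSON_SD = 1.0   # assumed between-person heterogeneity (log-odds scale)

def power_between(n_per_arm=250):
    """One-off conference RCT: n vs n, outcome = self-reported infection afterwards."""
    hits = 0
    for _ in range(N_SIMS):
        logit = np.log(BASE_P / (1 - BASE_P)) + PERSON_SD * rng.normal(size=2 * n_per_arm)
        p = 1 / (1 + np.exp(-logit))
        p[:n_per_arm] *= RR                      # first half treated
        infected = rng.random(2 * n_per_arm) < p
        reported = infected & (rng.random(2 * n_per_arm) < REPORT_P)
        a, b = reported[:n_per_arm].sum(), reported[n_per_arm:].sum()
        _, pval = stats.fisher_exact([[a, n_per_arm - a], [b, n_per_arm - b]])
        hits += pval < ALPHA
    return hits / N_SIMS

def power_within(n_people=10, trips=10):
    """Within-person crossover: each traveler randomized per trip, wearable outcome."""
    hits = 0
    for _ in range(N_SIMS):
        diffs = []
        for _ in range(n_people):
            logit = np.log(BASE_P / (1 - BASE_P)) + PERSON_SD * rng.normal()
            p = 1 / (1 + np.exp(-logit))
            treated = rng.random(trips // 2) < p * RR
            control = rng.random(trips - trips // 2) < p
            treated_seen = treated & (rng.random(treated.size) < DETECT_P)
            control_seen = control & (rng.random(control.size) < DETECT_P)
            diffs.append(control_seen.mean() - treated_seen.mean())
        _, pval = stats.ttest_1samp(diffs, 0)
        hits += pval < ALPHA
    return hits / N_SIMS

print(f"between-subject, 250 vs 250, one-off trip: power ~ {power_between():.2f}")
print(f"within-person, 10 people x 10 trips each:  power ~ {power_within():.2f}")
```

The informative exercise is varying the assumed questionnaire under-reporting against the wearable's sensitivity and the number of trips per person, and seeing which design wins under which assumptions.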

I wonder if you could also measure overall viral load, as a proxy for the viral exposome, through something like a tiny air filter which can be mailed in for analysis - like the exposometer? Swap out the exposometer each trip and you can measure load as a covariate.

Answer by gwern · Apr 20, 2024

Yes. Commoditize-your-complement dynamics do not come with any set number. They can justify an expense of thousands of dollars, or of billions - it all depends on the context. If you are in a big enough industry, and the profits at stake are large enough, and the investment in question is critical enough, you can justify any number as +EV. (Think of it less as 'investment' and more as 'buying insurance'. Facebook (META) has a market cap of ~$1,230 billion right now; how much insurance should its leaders buy against the periodic emergence of new platforms or possible paradigm shifts? Definitely at least in the single billions, one would think...)

And investments of $10m are highly routine and ordinary, and people have already released weights (note: most of these AI releases are not 'open source', including Llama-3) for models with easily $10m of investment behind them. (Given that a good ML researcher-engineer could have a fully-loaded cost of $1m/year, if you have a small team of 10 and they release a model per year, then you already hit $10m spent in the first year.) Consider Linux: if you wanted to make a replacement for the Linux kernel today - one as battle-tested and supporting as many things as it does - that would probably cost you at least $10 billion, and the creation of Linux has been principally bankrolled by many companies collectively paying for development (for a myriad of reasons and in a myriad of ways). Or consider Android Linux. (Or go through my list and think about how much money it must take to do things like RISC-V.)

If Zuckerberg feels that LLMs are enough of a threat to the Facebook advertising model, or enough of a path to a new social-media platform which could potentially supersede Facebook (as Instagram and WhatsApp once were), then he certainly could justify throwing a billion dollars of compute at a weights release in order to shatter the potential competition into a commoditized race to the bottom. (He's already blown much, much more on VR.)

The main prediction, I think, of commoditize-your-complement is that there is not much benefit to creating the leading-edge model or surpassing the SOTA by a lot. Your motivation is to release the cheapest model which serves as a spoiler model. So Llama-3 doesn't have to be better than GPT-4 to spoil the market for OA: it just needs to be 'good enough'. If you can do that by slightly beating GPT-4, then great. (But there's no appetite to do some amazing moonshot far surpassing SOTA.)

However, because LLMs are moving so fast, this isn't necessarily too useful to point out: Zuckerberg's goal with Llama-3 is not to spoil GPT-4 (which has already been accomplished by Claude-3 and Databricks and some others, I think), but to spoil GPT-5 as well as Claude-4 and unknown competitors. You have to skate to where the puck will be, because if you wait for GPT-5 to fully come out before you start spinning up your commoditizer model, your teams will have gone stale, your infrastructure will have rotted, you'll lose a lot of time, and who knows what will happen with GPT-5 before you finally catch up.

The real killer of Facebook investment would be the threat disappearing and permanent commoditization setting in, perhaps by LLMs sigmoiding hard and starting to look like a fad, like 3D TVs. For example, if GPT-5 came out and it was barely distinguishable from GPT-4, and nothing else impressive happened and "DL hit a wall" at long last, then Llama-4 would probably still happen at full strength - since Zuck already bought all those GPUs - but then I would expect a Llama-5 to be much less impressive, coasting on fumes, and not to receive another 10 or 100x scaleup, and Facebook DL R&D would return to normal conditions.

EDIT: see https://thezvi.wordpress.com/2024/04/22/on-llama-3-and-dwarkesh-patels-podcast-with-zuckerberg/

gwern · 4d

It might be tempting to think you could use multivariate statistics like factor analysis to distill garbage information by identifying axes which give you unusually much information about the system. In my experience, that doesn't work well, and if you think about it for a bit, it becomes clear why: if the garbage information has a 50 000 : 1 ratio of garbage : blessed, then finding an axis which explains 10 variables worth of information still leaves you with a 5 000 : 1 ratio of garbage : blessed. The distillation you get with such techniques is simply not strong enough.

That doesn't seem like it. In many contexts, a 10x saving is awesome, and definitely a 'blessed' improvement if you can kill 90% of the noise in anything you have to work with. But you don't want to do that with logs. You can't distill information in advance of a bug (or anomaly, or attack), because a bug by definition is going to break all of the past behavior & invariants governing normal behavior that any distillation was based on. If it didn't, it would usually be fixed already. ("We don't need to record variable X in the log, which would be wasteful accurst clutter, because X cannot change." NARRATOR: "X changed.") The logs are for the exceptions - which are precisely the information that any non-end-to-end lossy compression (factor analysis or otherwise) will correctly throw out, treating it as residuals to ignore in favor of the 'signal'. Which is why the best debugging systems, like time-travel debugging or the shiny new Antithesis, work hard to de facto save everything.
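As a toy illustration of that point (my own sketch, nothing from the original discussion): compress simulated 'normal' log records with PCA, the simplest factor-analysis-style distillation, and the one record where a variable that "cannot change" changed reconstructs to look perfectly normal - the bug lives almost entirely in the residual that the compression throws away:

```python
# Toy sketch: PCA-style "distillation" of normal log records, and where an anomaly goes.
import numpy as np

rng = np.random.default_rng(0)

# 10,000 "normal" log records: 50 correlated features driven by 3 latent factors.
n, p, k = 10_000, 50, 3
loadings = rng.normal(size=(k, p))
normal_logs = rng.normal(size=(n, k)) @ loadings + 0.1 * rng.normal(size=(n, p))

# Distill: keep only the top-k principal axes, which explain nearly all normal variance.
mu = normal_logs.mean(axis=0)
_, _, vt = np.linalg.svd(normal_logs - mu, full_matrices=False)
axes = vt[:k]

def compress(x):
    """Lossy reconstruction from the top-k axes - what a 'distilled' log would keep."""
    return mu + (x - mu) @ axes.T @ axes

# The bug: one variable that "cannot change" changes, in a direction mostly
# orthogonal to the axes of normal variation.
buggy = normal_logs[0].copy()
buggy[7] += 10.0

print("raw anomalous value:        ", round(buggy[7], 2))
print("value after distillation:   ", round(compress(buggy)[7], 2))   # back to looking normal
print("normal reconstruction error:", round(np.linalg.norm(normal_logs[0] - compress(normal_logs[0])), 2))
```

The compressed representation faithfully preserves the axes that explained normal behavior, which is precisely why it is useless for the one record that broke them.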

gwern · 4d

The Daily Nous (a relatively 'popular' academic philosophy blog) managed to get a non-statement out of Oxford:

Oxford University has taken the difficult decision to close the Future of Humanity Institute, a research centre in the Faculty of Philosophy. The Institute has made an important contribution to the study of the future of humanity, for which we would like to thank and recognise the research team. Researchers elsewhere across Oxford University are likely to continue to work on this emerging field.

gwern · 4d

I would say that the closest thing to FHI at Oxford right now would probably be the Global Priorities Institute (GPI). A lot of these papers would've made just as much sense coming out of FHI. (It might be worth considering how GPI seems to have navigated Oxford better.)

gwern · 5d

Any updates on this? For example, I notice that the new music services like Suno & Udio seem to be betraying a bit of mode collapse and noticeable same-yness, but they certainly do not degenerate into the kind of within-song repetition that these samples did.
