Meredith Whittaker

Now that "scale is all we need" has predictably faltered...

1. Small AI models often perform better in context.

2. Obsession w bigness has bad consequences, from climate, to power concentration, to research capture.

Me +
@GaelVaroquaux
@sashaluccioni

arxiv.org/abs/2409.14160

@Mer__edith @GaelVaroquaux …to be fair, further scaling may falter, but current large models can do amazing things, will continue to do so, and will probably get more efficient and cheaper.

@ErikJonker @GaelVaroquaux efficiency doesn't (generally) lead to a reduction in scale. In fact, often the opposite. See: Jevons paradox. See: AlexNet and the transformer architecture (and many others) presented massive efficiency gains, gains that were then leveraged to build bigger than previously possible, not to reduce and/or maintain scale.

We talk about this in the paper.
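A toy numerical illustration of the rebound effect referenced above: if an efficiency gain halves the energy cost per query and demand is sufficiently elastic, total energy use rises rather than falls. The constant-elasticity demand curve and all numbers below are illustrative assumptions, not figures from the paper.

```python
# Toy illustration of the Jevons/rebound effect: an efficiency gain lowers
# the energy cost per query, but if demand is elastic enough, total energy
# consumption goes up. All numbers are made up for illustration.

def total_energy(energy_per_query, price_elasticity,
                 baseline_queries=1_000_000, baseline_energy_per_query=1.0):
    """Constant-elasticity demand: queries scale with (cost ratio)^(-elasticity)."""
    cost_ratio = energy_per_query / baseline_energy_per_query
    queries = baseline_queries * cost_ratio ** (-price_elasticity)
    return queries * energy_per_query

before = total_energy(energy_per_query=1.0, price_elasticity=1.5)
after = total_energy(energy_per_query=0.5, price_elasticity=1.5)  # 2x efficiency gain

print(f"before: {before:,.0f} energy units, after: {after:,.0f} energy units")
# With elasticity > 1, 'after' exceeds 'before': the per-query saving is
# more than offset by induced demand.
```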

@Mer__edith @GaelVaroquaux …also true. I have read the paper. I am not very optimistic about slowing down the adoption of energy-wasting large AI models, even when I look at my own work environment (government) and at large platform players like Microsoft pushing their AI portfolio. Even some politicians push for AI adoption because of efficiency promises (mostly ill-informed and not well argued).

@ErikJonker @Mer__edith @GaelVaroquaux

But will they get cheap enough to be profitable on their own without a surveillance Big Tech corp behind them?

What good is a technological "revolution" if it can only be operated, at great financial cost, by, like 5 entities?

@ErikJonker @Mer__edith @GaelVaroquaux “Current large models can do amazing things” [citation needed]

@ErikJonker @Mer__edith @GaelVaroquaux in most real use cases you don't need to do amazing things; more likely you need to perform one thing amazingly well.

Obsession with bigness and scale is a problem in renewable energy generation as well. It's much more effective at small, local scale.

@Mer__edith @GaelVaroquaux I'm looking forward to seeing how interconnected small models perform. I imagine there's a lot of difficulty with intermodel communication and task direction, but my brain seems to get it done fairly efficiently.
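One hedged sketch of what "task direction" between interconnected small models could look like: a lightweight router that dispatches a request to a registered specialist. The specialists here are stub functions standing in for small local models; the routing rule and the names are placeholders, not a proposal from the paper.

```python
# Sketch of a "task director" that routes requests to small specialist models.
# Each specialist is a stub; in practice it could be a small local model
# (summariser, keyword extractor, classifier, ...).

from typing import Callable, Dict

def summarise(text: str) -> str:
    return text[:100] + "..."          # stand-in for a small summarisation model

def extract_keywords(text: str) -> str:
    words = {w.strip(".,").lower() for w in text.split() if len(w) > 6}
    return ", ".join(sorted(words))    # stand-in for a small keyword model

SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "summarise": summarise,
    "keywords": extract_keywords,
}

def route(task: str, text: str) -> str:
    """Dispatch a request to the matching specialist, or fail loudly."""
    try:
        return SPECIALISTS[task](text)
    except KeyError:
        raise ValueError(f"no specialist registered for task {task!r}")

print(route("keywords", "Interconnected small models need cheap task direction."))
```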

@Mer__edith @GaelVaroquaux you're absolutely right! Small models do text summarization just as well as the huge ones, and they extract keywords equally well. There is no need to bother ChatGPT 4 or Claude Sonnet 3.5 with such tasks. 3B models will do them equally well at a fraction of the cost. Plus they run really well on CPU or mobile!
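For the "runs well on CPU" point, a minimal sketch using the Hugging Face transformers pipeline; "sshleifer/distilbart-cnn-12-6" is just one example of a compact summarisation checkpoint, and the first run will download its weights.

```python
# Minimal CPU-only summarisation with a small model via Hugging Face transformers.
from transformers import pipeline

summariser = pipeline(
    "summarization",
    model="sshleifer/distilbart-cnn-12-6",  # example of a compact checkpoint
    device=-1,  # -1 = run on CPU
)

text = (
    "Small language models handle routine tasks such as summarisation and "
    "keyword extraction at a fraction of the cost of frontier-scale models, "
    "and they can run locally on a CPU or even a phone."
)

result = summariser(text, max_length=40, min_length=10, do_sample=False)
print(result[0]["summary_text"])
```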

@Mer__edith @GaelVaroquaux

I once knew a guy with three Ph.D.s. He knew everything, spoke several languages, and was very fun to talk to. He was the guidance counselor. He liked education.

I also knew this other guy who got a Ph.D. in a field so narrow I can't describe it. He liked lasers, and made a really big one. He was also very fun to talk to (I hope he still is, unless he touched the plasma)

LLMs are a bit different though. They don't have to be one monolithic brain. They can have small, let's see, what could we call them, "Expert Systems," that do lots of easy tasks subconsciously.

@Mer__edith @GaelVaroquaux That’s exactly why I think Apple's approach is the best. Fast and small models on device, and if the task is too big for them, a private AI cloud. And for even more complex tasks, ChatGPT (and probably other models in the future) as a fallback. My experience is that even at this early stage of Apple Intelligence the small models work well enough for many daily tasks.
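A hedged sketch of that local-first pattern, not Apple's actual API: try a small on-device model first and escalate to a larger remote model only when the local one is not confident enough. Both model calls and the confidence threshold are placeholders invented for illustration.

```python
# Sketch of a local-first inference pattern with a remote fallback.
# Both "models" are stubs; the escalation rule (a confidence threshold)
# is an illustrative assumption, not how any shipping system decides.

from dataclasses import dataclass

@dataclass
class Answer:
    text: str
    confidence: float  # 0.0-1.0, reported by the model or a separate verifier

def small_local_model(prompt: str) -> Answer:
    # Placeholder: imagine a ~3B on-device model.
    return Answer(text=f"[local] {prompt[:40]}", confidence=0.55)

def large_remote_model(prompt: str) -> Answer:
    # Placeholder: a bigger hosted model used only as a fallback.
    return Answer(text=f"[remote] {prompt[:40]}", confidence=0.9)

def answer(prompt: str, threshold: float = 0.7) -> Answer:
    """Prefer the local model; escalate only when it is not confident enough."""
    local = small_local_model(prompt)
    return local if local.confidence >= threshold else large_remote_model(prompt)

print(answer("Summarise today's meeting notes.").text)
```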

@Mer__edith @GaelVaroquaux Also, smaller models can run on a small affordable workstation or gaming PC, some even on an office PC or laptop, so you don't need to rely on some external cloud service.