Church of Jeff

People are freaking out about this without actually understanding how these AIs are built.

DeepSeek has stripped out and curated their data a whole hell of a lot.

The COST to create the AI was so small because the data used was small.

The way the US AI companies are working, they are dumping EVERYTHING into the models and trimming out almost nothing, so the models take a lot more time to build.

Curating is key.

And what people aren't seeing is the cost of the curating process.

That has a cost too.

Humans need to validate and make sure the data flowing into these models is valid.

As I've said, the future of AI will be HUMANS going out in the world and licensing art, science, and validating information in these models.

That's how the artists, scientists, and everyday people who created it all for the AI get paid: in licensing fees.

@jeffowski I find your suggestion that #Deepseek may have curated the training data more accurately very interesting. But do you have any concrete evidence that this is the case? (By the way: I'm new on Mastodon...)

@ASchubbach -- Other than extensive study and research into AI?

Please take a look at the GPT-3 data, and at everything OpenAI released before they stopped publishing it publicly.

The exponential increases in size, complexity, and carbon footprint are all calculable and follow a specific mathematical rate (so far).

You also have to look at the multidimensional models and look at what the pathway looks like when responding to requests. As much as 80% of the model is bullshit and not used.

@ASchubbach -- Both of these factors point to proper curation of the data before pushing it into the multidimensional model phase, which is where the large price tags they are comparing come from.

But they aren't comparing apples to apples.

Again, when you have unlimited money and resources, the only real cost you have is TIME.

@jeffowski I do not have much practical experience with LLMs. So, thanks for your helpful explanation!