mastodon.world is one of the many independent Mastodon servers you can use to participate in the fediverse.
Generic Mastodon server for anyone to use.

#zstd

@konstruct @SuperDicq GNU/JIHAD AGAINST "OPEN SOURCE" AND ALL OTHER FORMS OF PROPRIETARY SOFTWARE!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

The best technique available now is to run a Tor middle relay; the Chinese firewall will then block poorly programmed IPv4 LLM scrapers from China for you (just make sure you are also reachable over IPv6).

Another technique is to add a bombs folder, link to it, and make some GNUzip bombs;
dd if=/dev/zero bs=1G count=1 | gzip > 01GiB.gz
dd if=/dev/zero bs=1G count=10 | gzip > 10GiB.gz
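
To see why such files work as bombs, here is a minimal Python sketch (not from the original post) that streams zeros through the standard gzip module and reports the compressed size. The function name write_zero_gzip and the 16 MiB demo size are illustrative assumptions; the dd commands remain the actual recipe.

```python
import gzip
import os
import tempfile


def write_zero_gzip(path: str, size_bytes: int, chunk: int = 1 << 20) -> int:
    """Write `size_bytes` zero bytes into a gzip file and return its size on disk."""
    zeros = bytes(chunk)  # one reusable buffer of zero bytes
    remaining = size_bytes
    with gzip.open(path, "wb") as fh:
        while remaining > 0:
            n = min(chunk, remaining)
            fh.write(zeros[:n])
            remaining -= n
    return os.path.getsize(path)


if __name__ == "__main__":
    size = 16 * 1024 * 1024  # small demo size, not the 1 GiB / 10 GiB of the post
    with tempfile.TemporaryDirectory() as tmp:
        compressed = write_zero_gzip(os.path.join(tmp, "zeros.gz"), size)
        print(f"{size} bytes of zeros -> {compressed} bytes of gzip")
```

The compressed file stays tiny while a client that honours Content-Encoding has to inflate the full size.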

Also make some zstandard bombs;
dd if=/dev/zero bs=1G count=1 | zstd > 01GiB.zst
dd if=/dev/zero bs=1G count=10 | zstd > 10GiB.zst

Also add a text file noting to right-click -> Save As to save the bombs instead of triggering them.

Also add to robots.txt;
User-agent: *
Disallow: /path/to/bombs/
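
As a rough check of what these two rules mean for a well-behaved crawler, the sketch below feeds them to Python's standard urllib.robotparser. The host example.org, the agent name ExampleBot, and the helper is_allowed are hypothetical; scrapers that ignore robots.txt are of course unaffected, which is the point of the bombs.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the robots.txt snippet from the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /path/to/bombs/
"""


def is_allowed(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if a crawler that obeys robots.txt may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.org/index.html",
                "https://example.org/path/to/bombs/10GiB.gz"):
        print(url, "allowed" if is_allowed(url) else "disallowed")
```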

Then add to the relevant server {} in nginx.conf;
#gzip bombs
location ~* /path/to/bombs/.*\.gz {
add_header Content-Encoding "gzip";
default_type "text/html";
}

#zstd bombs
location ~* /path/to/bombs/.*\.zst {
add_header Content-Encoding "zstd";
default_type "text/html";
}

Also 403 empty useragents; if ($http_user_agent = "") { return 403; }

There are also some LLM scrapers that will identify themselves;
if ($http_user_agent ~ (.*Amazonbot.*|.*Applebot.*|.*ClaudeBot.*|.*GPTBot.*)) { return 403; }
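
The two user-agent rules can be modelled outside nginx to test which clients they would reject. This is a sketch in Python, not nginx itself: the function status_for is a hypothetical name, and it assumes the nginx `~` operator is an unanchored, case-sensitive regex match, so the surrounding `.*` in the original pattern is not needed here.

```python
import re

# Same bot names as in the nginx rule; search() is unanchored, like nginx's ~.
BLOCKED_AGENTS = re.compile(r"Amazonbot|Applebot|ClaudeBot|GPTBot")


def status_for(user_agent: str) -> int:
    """Return the HTTP status this model of the two rules would produce."""
    if user_agent == "":
        return 403  # empty or missing User-Agent header
    if BLOCKED_AGENTS.search(user_agent):
        return 403  # self-identified LLM scraper
    return 200


if __name__ == "__main__":
    for ua in ("", "Mozilla/5.0 (compatible; GPTBot/1.0)", "Mozilla/5.0 Firefox/128.0"):
        print(repr(ua), status_for(ua))
```

Because the match is case-sensitive, a scraper that sends a lowercase name such as "gptbot" would pass; nginx offers `~*` for case-insensitive matching if that matters.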

You also want ratelimiting;
In http {};
#10 requests per second on average
limit_req_zone $binary_remote_addr zone=perip:10m rate=10r/s;

Then to each server{};
limit_req zone=perip burst=10 nodelay;

For bursty protocols like git, you'll need a larger burst amount (allows several git clones in a row from an IP, but will block if you spam git clone);
limit_req zone=perip burst=1024 nodelay;
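
To get a feel for how rate and burst interact, here is a simplified leaky-bucket model in Python. It is a sketch under assumptions: nginx documents limit_req as a leaky bucket, but its real implementation works in milliseconds per shared-memory zone and differs in detail; the class name LeakyBucket is hypothetical.

```python
class LeakyBucket:
    """Simplified per-client model of `limit_req ... burst=N nodelay`."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate      # requests drained per second
        self.burst = burst    # excess requests tolerated above the rate
        self.excess = 0.0     # current backlog, in requests
        self.last = None      # timestamp of the last accepted request

    def allow(self, now: float) -> bool:
        """Return True if a request arriving at time `now` (seconds) is accepted."""
        if self.last is None:
            self.last = now
            return True
        elapsed = now - self.last
        # Drain the backlog for the elapsed time, then add this request.
        excess = max(self.excess - self.rate * elapsed + 1.0, 0.0)
        if excess > self.burst:
            return False  # rejected; state is left unchanged
        self.excess = excess
        self.last = now
        return True


if __name__ == "__main__":
    bucket = LeakyBucket(rate=10, burst=10)
    accepted = sum(bucket.allow(0.0) for _ in range(20))
    print(f"{accepted} of 20 simultaneous requests accepted")
```

In this model a client can send one request plus the burst allowance at once, and then has to wait for the bucket to drain at the configured rate.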


That is usually all that is needed to stop all aggressive LLM scrapers and should continue to work forever.


Sites manually targeted for scraping will need specifically targeted defenses (evil Anubis will be bypassed in such cases).

How I back up my home Linux system: a simple example of incremental rsync + btrfs with zstd compression

The article shows a simple rsync script for incremental backups (using hard links to the previous backup) and covers using btrfs compression with zstd.

habr.com/ru/articles/929182/

Habr · How I back up my home Linux system: a simple example of incremental rsync + btrfs with zstd compression
Backups are important: please make backups, or you will lose data. Many people have already lost data, so be smarter. My script: # Start this script from the git folder of this script # This...
#rsync #btrfs #zstd
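
The article's script is not reproduced here, so the following is only a sketch of the general idea of hard-link based incremental backups, similar in spirit to rsync's --link-dest option. The function incremental_backup and its parameters are hypothetical; it compares file contents (rsync by default compares size and modification time) and ignores symlinks, permissions on directories, and deletions.

```python
import filecmp
import os
import shutil
from pathlib import Path
from typing import Optional


def incremental_backup(src: Path, dest: Path, previous: Optional[Path] = None) -> None:
    """Copy `src` into `dest`, hard-linking files that are unchanged since `previous`."""
    for root, _dirs, files in os.walk(src):
        rel = Path(root).relative_to(src)
        (dest / rel).mkdir(parents=True, exist_ok=True)
        for name in files:
            source_file = Path(root) / name
            target = dest / rel / name
            old = previous / rel / name if previous is not None else None
            if old is not None and old.is_file() and filecmp.cmp(source_file, old, shallow=False):
                os.link(old, target)  # unchanged: share the data blocks with the old backup
            else:
                shutil.copy2(source_file, target)  # new or changed: store a fresh copy
```

Each backup directory then looks like a full copy, while unchanged files occupy disk space only once. Hard links require both backups to live on the same filesystem.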

#Amazon has announced a new compression algorithm (unfortunately not the one from #SiliconValley). You can try using it for backups instead of #gzip, because it is 3-5 times faster at the same compression ratio
#backups #archiving #zstd
engineering.fb.com/2016/08/31/

Engineering at Meta · Smaller and faster data compression with Zstandard

php -r 'for($i=256;$i-->0;)for($ii=256;$ii-->0;)print(chr($i).chr($ii));' | zstd | wc -c

131085 bytes (131KB). The #zstd "compression" only makes it larger at every compression level!

bzip2 compresses it fine, making it 5 times smaller. (It compresses best at the worst compression setting, -1)

I always wondered what circumstances make bzip2 occasionally do better than the much newer zstd. Finally found a pattern by accident :)

If you know how to make zstd handle this properly, let me know please
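
For anyone who wants to poke at this without PHP, the sketch below rebuilds the same 131072-byte input in Python and inspects it with standard-library tools only. The helper name build_input is hypothetical, and zstd itself is left out because it is not assumed to be installed as a Python module. One observation, offered as a possible partial explanation rather than a diagnosis: every byte value occurs equally often, so entropy coding over the whole input has nothing to gain.

```python
import bz2
from collections import Counter


def build_input() -> bytes:
    """Rebuild the PHP one-liner's output: pairs (i, ii), both counting down from 255."""
    return bytes(
        b
        for i in range(255, -1, -1)
        for ii in range(255, -1, -1)
        for b in (i, ii)
    )


if __name__ == "__main__":
    data = build_input()
    counts = Counter(data)
    print(len(data), "input bytes")
    print("occurrences per byte value:", sorted(set(counts.values())))
    print(len(bz2.compress(data, 1)), "bytes after bzip2 at level 1")
```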

Today I discovered that there is also an excellent compression format, #Zstandard (ZSTD): it is fast, has an excellent compression ratio, was developed by #Meta, and is released as #opensource
I needed to back up my files because I have to wipe my PC and reinstall #Linux. Now I have a dilemma: choosing between #antiX and #Lubuntu. I have a fairly decent computer, so I could even install a more full-featured OS, but I prefer an OS that doesn’t use too many resources.
#zstd
youtube.com/watch?v=k5XsiuxHv_A