#tokenizers

Anoncheg<p><a href="https://techhub.social/tags/dailyreport" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dailyreport</span></a> <a href="https://techhub.social/tags/tokenizers" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tokenizers</span></a> <a href="https://techhub.social/tags/huggingface" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>huggingface</span></a> <a href="https://techhub.social/tags/rust" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>rust</span></a> <a href="https://techhub.social/tags/gentoo" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>gentoo</span></a><br> <a href="https://techhub.social/tags/ebuild" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ebuild</span></a> <a href="https://techhub.social/tags/secops" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>secops</span></a> <a href="https://techhub.social/tags/cargo" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cargo</span></a><br>I compiled the HF 🤗 tokenizers library from source and<br> enhanced the Gentoo ebuild file to allow reproducible<br> installation from source.<br>I removed optional dependencies and disabled HTTP<br> requirements to enhance security.</p><p>I wrote very simple tests for tokenizers, safetensors, and<br> transformers, plus an integration test for them, because<br> the tokenizers tests require the HF hub, which I disabled.</p><p>It was a hard but good experience with Cargo, the package<br> manager of Rust. The main problems were due to strange<br> cfg flags that Gentoo should have set automatically; for<br> example, target_os=linux was not set. "cfg" is an<br> abomination: you can't add or change it safely.</p><p>I didn't find a working solution for managing "cfg", so<br> I just patched the Cargo.toml files of the dependencies by<br> commenting out lines.<br>(∠・ω )⌒</p>
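The post does not say which Cargo.toml lines were commented out, so the following is only a rough sketch of what such a patch step could look like, not the author's actual method. It assumes the offending lines form a platform-specific dependency table; the helper name `comment_out_table` and the sample manifest (including the `cfg(windows)` table and the crate names) are hypothetical.

```python
import re

# A line that opens a new TOML table, e.g. "[dependencies]".
# Simplification: a multi-line array element that starts with "[" would
# also match, so this is not a full TOML parser.
TABLE_HEADER = re.compile(r"^\s*\[")


def comment_out_table(manifest: str, header: str) -> str:
    """Comment out the table whose header line equals `header`, plus its entries."""
    out = []
    inside = False
    for line in manifest.splitlines():
        if line.strip() == header:
            inside = True          # start of the table to disable
        elif TABLE_HEADER.match(line):
            inside = False         # any other table header ends it
        # Blank lines are left untouched to keep the diff small.
        out.append("# " + line if inside and line.strip() else line)
    return "\n".join(out) + "\n"


# Hypothetical manifest of a dependency (not taken from the post).
manifest = """[dependencies]
serde = "1"

[target.'cfg(windows)'.dependencies]
winapi = "0.3"

[features]
default = []
"""

patched = comment_out_table(manifest, "[target.'cfg(windows)'.dependencies]")
print(patched)
```

In an ebuild this kind of edit would more likely be shipped as a patch file or a `sed` call; the Python version just makes the logic explicit.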
Arthur Hau, PhD🐶🐱🌱🎵🦣<p>To most people the word <a href="https://tribe.net/tags/token" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>token</span></a> is a black box. I am not using the <a href="https://tribe.net/tags/tokenizers" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tokenizers</span></a> that are commonly used in <a href="https://tribe.net/tags/DeepLearning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DeepLearning</span></a> <a href="https://tribe.net/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a>. Instead, I am using my own <a href="https://tribe.net/tags/WordCoding" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>WordCoding</span></a> system, which I will call yxxx+. I am using base 16 to code 300 common ESL English words for my <a href="https://tribe.net/tags/SLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SLM</span></a> project. y is a single hex digit (0-F) that denotes the <a href="https://tribe.net/tags/POS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>POS</span></a> (part of speech) of a word. xxx are three base-16 digits, so in theory I can expand my model to about 4000 "base" words (16^3 = 4096). + denotes an additional code, which I will explain later. <a href="https://tribe.net/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a></p>
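To make the yxxx layout concrete, here is a minimal sketch of how such a code could be packed and unpacked. The post only says that y is a hex digit for the part of speech and xxx are three hex digits; the specific digit assigned to each part of speech, the function names, and the idea that xxx is a plain word index are all assumptions made for illustration. The trailing `+` code is not modeled, since the post has not explained it yet.

```python
# Hypothetical part-of-speech digits; the post does not list the real mapping.
POS_DIGITS = {"noun": 0x1, "verb": 0x2, "adjective": 0x3}


def encode(pos: str, index: int) -> str:
    """Return the 4-character yxxx code for word number `index` with tag `pos`."""
    if not 0 <= index <= 0xFFF:
        raise ValueError("index must fit in three hex digits (0..4095)")
    # One hex digit for the part of speech, then three zero-padded hex digits.
    return f"{POS_DIGITS[pos]:X}{index:03X}"


def decode(code: str) -> tuple[str, int]:
    """Split a yxxx code back into (part of speech, word index)."""
    y = int(code[0], 16)
    xxx = int(code[1:4], 16)
    pos = next(name for name, digit in POS_DIGITS.items() if digit == y)
    return pos, xxx


print(encode("verb", 42))   # 202A
print(decode("202A"))       # ('verb', 42)
```

Three hex digits give 16^3 = 4096 distinct values, which is where the "about 4000 base words" ceiling comes from. Whether the index is shared across parts of speech or restarts for each one is not stated in the post.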
:rss: .NET Blog<p>Introducing the AI Dev Gallery: Your Gateway to Local AI Development with .NET<br><a href="https://devblogs.microsoft.com/dotnet/introducing-ai-dev-gallery-gateway-to-local-ai-development/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">devblogs.microsoft.com/dotnet/</span><span class="invisible">introducing-ai-dev-gallery-gateway-to-local-ai-development/</span></a></p><p><a href="https://rss-mstdn.studiofreesia.com/tags/microsoft" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>microsoft</span></a> <a href="https://rss-mstdn.studiofreesia.com/tags/NET" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NET</span></a> <a href="https://rss-mstdn.studiofreesia.com/tags/AI" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AI</span></a> <a href="https://rss-mstdn.studiofreesia.com/tags/NET_9" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NET_9</span></a> <a href="https://rss-mstdn.studiofreesia.com/tags/dev_tools" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dev_tools</span></a> <a href="https://rss-mstdn.studiofreesia.com/tags/generative_ai" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>generative_ai</span></a> <a href="https://rss-mstdn.studiofreesia.com/tags/Machine_Learning" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Machine_Learning</span></a> <a href="https://rss-mstdn.studiofreesia.com/tags/tokenizers" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tokenizers</span></a> <a href="https://rss-mstdn.studiofreesia.com/tags/vector_search" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>vector_search</span></a></p>
michabbb<p>🔧 <a href="https://social.vivaldi.net/tags/code2prompt" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>code2prompt</span></a>: A command-line tool for converting codebases to <a href="https://social.vivaldi.net/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> prompts</p><p>Key features:<br>• 📁 Generates well-formatted <a href="https://social.vivaldi.net/tags/Markdown" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Markdown</span></a> prompts with source tree structure<br>• 🛠️ Customizable <a href="https://social.vivaldi.net/tags/Handlebars" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Handlebars</span></a> templates for versatile prompt generation<br>• 🔍 Respects .gitignore and supports file filtering with glob patterns<br>• 🔢 Displays token count using various <a href="https://social.vivaldi.net/tags/tokenizers" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tokenizers</span></a> (cl100k, p50k, r50k_base)<br>• 📊 <a href="https://social.vivaldi.net/tags/Git" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Git</span></a> diff integration for commit messages and <a href="https://social.vivaldi.net/tags/PullRequest" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>PullRequest</span></a> descriptions<br>• 📋 Automatic clipboard copy and option to save output to file</p><p>Additional capabilities:<br>• 🔢 Line numbering for source code blocks<br>• 🔀 JSON output option for structured data<br>• 🚫 Exclusion of files/folders from source tree<br>• 📝 Support for user-defined variables in templates</p><p><a href="https://social.vivaldi.net/tags/opensource" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>opensource</span></a> project written in <a href="https://social.vivaldi.net/tags/Rust" class="mention hashtag" rel="nofollow noopener" 
target="_blank">#<span>Rust</span></a>, available on <a href="https://social.vivaldi.net/tags/crates_io" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>crates_io</span></a> and <a href="https://social.vivaldi.net/tags/AUR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AUR</span></a></p><p>Useful for:<br>• Quick <a href="https://social.vivaldi.net/tags/LLM" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>LLM</span></a> prompt generation from codebases<br>• Code documentation and analysis<br>• Bug finding and security vulnerability assessment<br>• Performance optimization suggestions</p><p><a href="https://github.com/mufeedvh/code2prompt" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/mufeedvh/code2promp</span><span class="invisible">t</span></a></p>