mastodon.world is one of the many independent Mastodon servers you can use to participate in the fediverse.
Generic Mastodon server for anyone to use.

#duckdb

6 posts · 6 participants · 0 posts today

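🌗 What a semantic layer is and how to build one, using DuckDB as an example
➤ Demystifying the power of the semantic layer: a guide from concept to DuckDB practice
motherduck.com/blog/semantic-l
This article takes an in-depth look at why semantic layers matter and demonstrates how to build a simple one using DuckDB and Ibis together with a YAML configuration file and Python scripts. The author stresses that a semantic layer provides a single definition of business metrics, simplifies complex analysis, makes data governance more efficient, and improves interaction with AI models. The article also explains when a semantic layer is not needed and suggests resources for further learning.
+ Finally, a practical article about semantic layers, and it even uses DuckDB, which I'm used to! Looking forward to trying it hands-on.
+ Very well put, especially the point about a "single source of truth", which really resonates.
#DataAnalysis #SemanticLayer #DuckDB #Python #Ibis #DataGovernance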

MotherDuck Blog: Why Semantic Layers Matter — and How to Build One with DuckDB. Learn what a semantic layer is, why it matters, and how to build a simple one with DuckDB and Ibis using just YAML and Python. Reading time: 21 min.
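To make the idea in the post above concrete, here is a minimal sketch of a semantic layer: metric and dimension definitions live in one place and are compiled to SQL on request. This is not the article's code. The model contents, table, and function names are hypothetical, a plain dict stands in for the YAML file, and the standard-library `sqlite3` module stands in for DuckDB and Ibis so the sketch runs without extra packages.

```python
import sqlite3

# Hypothetical semantic model. In the article's setup this would be a YAML
# file; a dict is used here to keep the sketch dependency-free.
SEMANTIC_MODEL = {
    "table": "orders",
    "dimensions": {"region": "region"},
    "measures": {"revenue": "SUM(amount)", "order_count": "COUNT(*)"},
}


def compile_query(model, dimensions, measures):
    """Turn requested dimension and measure names into one SQL statement."""
    dim_exprs = [model["dimensions"][d] for d in dimensions]
    select_parts = [f"{expr} AS {name}" for expr, name in zip(dim_exprs, dimensions)]
    select_parts += [f"{model['measures'][m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select_parts)} FROM {model['table']}"
    if dim_exprs:
        grouping = ", ".join(dim_exprs)
        sql += f" GROUP BY {grouping} ORDER BY {grouping}"
    return sql


# Stand-in database (sqlite3, NOT DuckDB) with a few made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("EU", 100.0), ("EU", 50.0), ("US", 70.0)],
)

query = compile_query(SEMANTIC_MODEL, ["region"], ["revenue", "order_count"])
rows = conn.execute(query).fetchall()
print(query)
print(rows)
```

Every consumer asks for `revenue` by name, so the definition `SUM(amount)` only needs to change in one place.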

My talk at @useR_conf is here: defuneste.codeberg.page/useR_2

tldr: I think storing "big" data as Parquet files in S3, accessed with DuckDB and wrapped in an R package, is a nice way to save some of your sanity.

Now that we know that DuckDB is great, let's start showing how R can make it in production! 😉

Side notes: loved using {litedown} and Codeberg for the prez. Mermaid.js, you are also great, but I am not ready!

defuneste.codeberg.page: Data as Code
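As a rough illustration of the "Parquet in S3, queried by DuckDB, hidden behind a package function" pattern from the tldr above, the sketch below builds the SQL that such a wrapper might send to DuckDB. The talk is about an R package; this is a Python stand-in, and the function name, bucket, and prefix are hypothetical. Only the query string is built here, because actually running it needs the `duckdb` package, its S3 support, and credentials.

```python
def parquet_scan_sql(bucket, prefix, columns=None, where=None):
    """Build a DuckDB query over all Parquet files under an S3 prefix.

    Inputs are interpolated directly into SQL, so this sketch assumes
    trusted, package-internal values rather than user input.
    """
    cols = ", ".join(columns) if columns else "*"
    sql = f"SELECT {cols} FROM read_parquet('s3://{bucket}/{prefix}/*.parquet')"
    if where:
        sql += f" WHERE {where}"
    return sql


# Hypothetical bucket and prefix, for illustration only.
query = parquet_scan_sql(
    "my-data-bucket",
    "measurements/2025",
    columns=["station_id", "value"],
    where="value > 0",
)
print(query)

# With the duckdb package installed and S3 access configured (assumed, not
# shown), the wrapper could then run something like:
#   import duckdb
#   df = duckdb.sql(query).df()
```

The point of the wrapper is that callers never see the bucket layout; they call a function and get a data frame back.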

@duckdb The future of this new package is unknown, but maybe I will implement a few more functions from {sf} and {areal} in {ducksf} in the coming months. It is also not unlikely that the devs of the #DuckDB Spatial extension (github.com/duckdb/duckdb-spati) will just implement areal interpolation themselves. That would only make my job easier: I would just wrap their function in {ducksf} instead of implementing it in SQL as I do right now.

Get a 9-30x speedup doing areal-weighted interpolation with my new {ducksf} #rstats package compared to {sf}/{areal}. Experimental, but tested against both {areal} and {sf}: github.com/e-kotov/ducksf. Despite the cost of moving data between R and #DuckDB, the performance of {ducksf} is impressive, thanks to #DuckDB. Look at the attached benchmark results. And be sure to read @duckdb's recent post about the performance improvements of their spatial joins here: duckdb.org/2025/08/08/spatial-
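For readers unfamiliar with the technique, areal-weighted interpolation of an extensive variable (a count, such as population) assigns each target zone the sum of source values weighted by the share of each source's area that falls inside the target. The sketch below shows the arithmetic on axis-aligned rectangles in plain Python. It is not how {ducksf}, {sf}, or {areal} implement it; the zone data and function names are made up for illustration.

```python
# Rectangles are (xmin, ymin, xmax, ymax); real packages work on arbitrary
# polygons, rectangles just keep the geometry trivial.

def area(rect):
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)


def intersection(a, b):
    """Overlap of two rectangles; has zero area when they do not overlap."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def interpolate_extensive(sources, targets):
    """estimate[t] = sum over sources s of value_s * area(s ∩ t) / area(s)."""
    estimates = {}
    for target_id, target_rect in targets.items():
        total = 0.0
        for source_rect, value in sources:
            weight = area(intersection(source_rect, target_rect)) / area(source_rect)
            total += value * weight
        estimates[target_id] = total
    return estimates


# Made-up example: two source zones with populations 100 and 200,
# re-aggregated onto two differently shaped target zones.
sources = [((0, 0, 2, 2), 100.0), ((2, 0, 4, 2), 200.0)]
targets = {"west": (0, 0, 3, 2), "east": (3, 0, 4, 2)}

estimates = interpolate_extensive(sources, targets)
print(estimates)
```

Because the targets cover the sources completely, the estimates add up to the original total of 300. The expensive part in practice is the polygon intersection, which is where a fast spatial join helps.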

I’ve always known that the #DuckDB appender interface was the way to go for bulk loading data. But today I had reason to write a #Golang benchmark to see just how much faster it is and discovered it’s at least 250x faster (on my laptop) at inserting a bigint into a table.

I tested both in-memory and on-disk as well as testing INSERT with auto-commit and with batched commits at various batch sizes.

gist.github.com/rkennedy-argus

I suppose I should test INSERTs with prepared statements, too. But I doubt they’ll put much of a dent in that difference.

Go DuckDB bulk loading benchmark. GitHub Gist: instantly share code, notes, and snippets.
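The shape of such a benchmark, one INSERT per row with commits at different batch sizes, can be sketched as follows. This is not the Go code from the gist and it does not use DuckDB or its appender: Python's standard-library `sqlite3` is used purely as a stand-in so the harness runs anywhere, and the row count and batch sizes are arbitrary. Timings from it say nothing about DuckDB.

```python
import sqlite3
import time


def load(n_rows, batch_size, path=":memory:"):
    """Insert n_rows integers one at a time, committing every batch_size rows.

    batch_size=1 approximates auto-commit (one transaction per row).
    Returns (elapsed_seconds, rows_in_table).
    """
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE t (v BIGINT)")
    conn.commit()
    start = time.perf_counter()
    for i in range(n_rows):
        conn.execute("INSERT INTO t VALUES (?)", (i,))
        if (i + 1) % batch_size == 0:
            conn.commit()
    conn.commit()  # flush any partial final batch
    elapsed = time.perf_counter() - start
    count = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    conn.close()
    return elapsed, count


N_ROWS = 10_000
results = {batch: load(N_ROWS, batch) for batch in (1, 100, 10_000)}
for batch, (elapsed, count) in results.items():
    print(f"batch_size={batch:>6}: {count} rows in {elapsed:.4f}s")
```

Passing a file path instead of `":memory:"` gives the on-disk variant, where per-row commits usually cost far more because each commit has to reach storage.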

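🌘 Xorq: cataloging, composing, and deploying SQL-scale machine learning with Python simplicity
➤ Building next-generation ML pipelines with Python's simplicity and SQL's scalability
github.com/xorq-labs/xorq
Xorq is a new machine learning framework that aims to simplify and standardize how ML pipelines are built, shared, and deployed. By combining Python's ease of use with SQL's scalability, it lets developers declaratively build reusable ML pipelines across multiple compute engines (such as DuckDB, Snowflake, and DataFusion). Xorq's core technology includes zero-copy data transfer with Apache Arrow and efficient computation through Ibis and DataFusion. Its features include: support for pandas-style syntax and Ibis's multi-engine declarative expressions; defining Python expressions in YAML format to ensure reproducibility; portable UDFs and UDAFs; and support for automatic ser…
#MachineLearning #DataEngineering #Pipelines #Python #SQL #Ibis #DuckDB #Snowflake #DataFusion #Apache Arrow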

Catalog, compose, and ship ML—Python simplicity, SQL scale. - xorq-labs/xorq