**Python-Job-Alert** @Python-Job-Alert@activitypub.awakari.com · 11h

Top 10 Apache Spark Topics No Data Engineer Should Ever Miss!

I don’t care if you know Python. Can you optimize a Spark job running on a billion records without spinning up an army of EC2s? Conti...

#apache-spark #data-science #technology #data-engineering #programming

Medium · 11h · Top 10 Apache Spark Topics No Data Engineer Should Ever Miss! · By Shashwath Shenoy

#ApacheSpark #datascience #dataengineering

**Python-Job-Alert** @Python-Job-Alert@activitypub.awakari.com · 3d

How to Calculate Jobs, Stages, and Tasks in Apache Spark

If you’re learning Apache Spark or preparing for a data engineer interview, understanding how Spark calculates Jobs, Stages, and Tasks is...

#data-engineering #python #apache-spark #technology #sql

Towards Data Engineering · 3d · How to Calculate Jobs, Stages, and Tasks in Apache Spark · By Avinash Jha

#dataengineering #ApacheSpark

**Thewatch** @thewatchsaysello99@mastodon.social · 4d

#data #dataengineering #datascience

**Posit** @Posit@fosstodon.org · 5d

What makes tools truly useful?

Episode 2 of #TheTestSet features Wes McKinney (Part 1 of 2!) sharing his experience building Pandas & Arrow, plus his surprising past in speedrun communities.

Tune in for his story at thetestset.co, on Spotify, or Apple Podcasts

#DataStack #DataEngineering #OpenSource

**LLMs** @LLMs@activitypub.awakari.com · 5d

AI and the Accidental Dox: A Holiday Conversation That Raised Alarm bells

#chatgpt #generative-ai-tools #artificial-intelligence #education #data-engineering

Medium · 5d · AI and the Accidental Dox: A Holiday Conversation That Raised Alarm bells · By Sultan M.Quresh

#generativeaitools #artificialintelligence #dataengineering

**Thewatch** @thewatchsaysello99@mastodon.social · 5d

I asked my ETL job how life was going.
#etl #dataengineering #data #sql #meme

Jul 19

2rZiKKbOU3nTafniR2qMMSE0gwZ @2rZiKKbOU3nTafniR2qMMSE0gwZ@activitypub.awakari.com

PixelSink: Hunt Hidden Data Inside Images

Upload an image. Could it be quietly leaking GPS location, device fingerprints, or even hidden payloads? PixelSink is a lightweight web app that in...

#cybersecurity #flask #dataengineering #datascience

DEV Community · 🖼️ PixelSink: Hunt Hidden Data Inside Images · Upload an image. Could it be quietly leaking GPS location, device fingerprints, or even hidden...

**Hacker News** @h4ckernews@mastodon.social · Jul 18

How to Get Foreign Keys Horribly Wrong

https://hakibenita.com/django-foreign-keys

hakibenita.com · How to Get Foreign Keys Horribly Wrong · Common Pitfalls and Potential Optimizations in Django

#HackerNews #How #to

**Semantic-Search** @Semantic-Search@activitypub.awakari.com · Jul 18

SQL Server 2025 - What’s New and How to Visualize the Schema

What's New in SQL Server 2025

SQL Server 2025 brings several important updates that make databases smarter, faster, and more secur...

#sqlserver #database #sql #dataengineering

DEV Community · SQL Server 2025 - What’s New and How to Visualize the Schema · What's New in SQL Server 2025 SQL Server 2025 brings several important updates that make...

**Python Job Support** @pythonjobsupport@mastodon.social · Jul 18

3 Ways to use Apache Kafka in a Real-time Data Stack – #dataengineering #streaming #kafka #shorts

Join this channel to get access to perks: – – – Book a ... source

https://quadexcel.com/wp/3-ways-to-use-apache-kafka-in-a-real-time-data-stack-dataengineering-streaming-kafka-shorts/

QuadExcel.com · Jul 18 · 3 Ways to use Apache Kafka in a Real-time Data Stack - #dataengineering #streaming #kafka #shorts - QuadExcel.com · Join this channel to get access to perks: – – – Book a ... source

**HackerNoon** @hackernoon@mas.to · Jul 17

Discover how CocoIndex transforms data orchestration with a pure Data Flow Programming model — ensuring traceable, immutable, and declarative pipelines for know https://hackernoon.com/redefining-data-operations-with-data-flow-programming-in-cocoindex-u486ao8 #dataengineering

hackernoon.com · Redefining Data Operations With Data Flow Programming in CocoIndex | HackerNoon · Discover how CocoIndex transforms data orchestration with a pure Data Flow Programming model — ensuring traceable, immutable, and declarative pipelines for know

**pipTrends** @piptrends@mastodon.social · Jul 16

@hynek released another great video on uv, where he explained how he uses the just tool to store commands in a cross‑platform, portable way for everyday tasks like installing or refreshing virtual environments, running tests and code checks and even development tasks like sending requests.

https://www.youtube.com/watch?v=TiBIjouDGuI

YouTube · uv: Making Python Local Workflows FAST and BORING in 2025 · By Hynek Schlawack

#python #Programming #PythonProgramming

**Posit** @Posit@fosstodon.org · Jul 15

Ever wonder about the mind behind Pandas & Apache Arrow? Ep. 2 of #TheTestSet (Part 1!) unpacks Wes McKinney's journey – including his speedrunning past! What makes good tools good?

Listen at https://thetestset.co, on Spotify, or Apple Podcasts

#DataStack #DataEngineering #Pandas

**Will Hopkins** @willhopkins@a2mi.social · Jul 15

#dataengineering If you needed to use a data lake with Redshift, would you use Iceberg, given some native support, over Delta Lake, which is arguably a better format?

Asking for a friend who is me

**blaze.email** @blazeemail@mastodon.social · Jul 15

Excited about AXLearn for modular ML training, Pinterest's Moka for massive data processing, and PromiseTune for causal configuration tuning! #MachineLearning #DataEngineering

https://blaze.email/Machine-Learning-Engineer

blaze.email · Jul 15 · Machine Learning Engineer · By blaze.email Team

**HackerNews VN bot** @hackernews_bot_vn@mastodon.maobui.com · Jul 14

New tech news! Apache Parquet is developing a breakthrough feature that allows user-defined indexes to be embedded directly in Parquet files. This promises to significantly optimize data query performance, making big data processing faster and more efficient.

#ApacheParquet #DataEngineering #BigData #Indexes #DataFusion #CôngNghệDữLiệu #DữLiệuLớn #TốiƯuHiệuNăng

https://datafusion.apache.org/blog/2025/07/14/user-defined-parquet-indexes/

datafusion.apache.org · Embedding User-Defined Indexes in Apache Parquet Files - Apache DataFusion Blog

**Ronan** @ronan@mastodon.ronandev.ovh · Jul 11 *

https://www.reddit.com/r/dataengineering/comments/1lvyzbc/vibe_citizen_developers_bringing_our/

#VibeCoding #SQL #DataEngineering

**LLMs** @LLMs@activitypub.awakari.com · Jul 10

Empowering Enterprise LLMs: Why Retrieval Augmented Generation is a Game Changer

Introduction: The LLM Hype vs. Enterprise Reality

#llm-agent #data-engineering #machine-learning #ai

Medium · Jul 10 · Empowering Enterprise LLMs: Why Retrieval Augmented Generation is a Game Changer · By Amit Gangane

#llmagent #dataengineering #machinelearning

**PostgreSQL** @PostgreSQL@activitypub.awakari.com · Jul 9

Big Data Fundamentals: delta lake project

Delta Lake: A Production Deep Dive

Introduction. The relentless growth of data volume and velocity presents a significant engineering challenge: building re...

#bigdata #dataengineering #data #deltalakeproject

DEV Community · Big Data Fundamentals: delta lake project · Delta Lake: A Production Deep Dive Introduction The relentless growth of data...

**Kubernetes** @Kubernetes@activitypub.awakari.com · Jul 7

How to Cut Data Pipeline Costs by 75% with Kubernetes Spot Instances

Data teams have embraced Kubernetes for many reasons. It provides dynamic resource allocation for workloads that swing from ligh...

#data-engineering #data-science #data-platforms #machine-learning #kubernetes

The Prefect Blog · Jul 7 · How to Cut Data Pipeline Costs by 75% with Kubernetes Spot Instances · By Christopher White

#dataengineering #datascience #dataplatforms

#dataengineering