
#spark


Tech Addicts 2025 – Mobile Meat

Our last show until June. Gareth and Ted chat about Mobile World Congress 2025: folding phones from Samsung, a new tablet from Nxtpaper, Lenovo charging ahead with solar, a Xiaomi flagship that fails to impress, Infinix going tri-fold, and loads more.

garethmyles.com/tech-addicts-2

#Podcast #TechAddicts #2025 #3dPrintedMeat #anker #creative #Infinix #Lenovo #meat #mecheer #MobileWorldCongress #MWC #Nxtpaper #Samsung #spark #tecno #Xiaomi

An analysis of 100 Fortune 500 job postings reveals the tools and technologies shaping the data engineering field in 2025. Top skills in demand:
⁕ Programming Languages (196) - SQL (85), Python (76), Scala (14), Java (14)
⁕ ETL and Data Pipeline (136) - ETL (65), Data Integration (46)
⁕ Cloud Platforms (85) - AWS (45), GCP (26), Azure (14)
⁕ Data Modeling and Warehousing (83) - Data Modeling (40), Data Warehousing (22), Data Architecture (21)
⁕ Big Data Tools (67) - Spark (40), Big Data Tools (19), Hadoop (8)
⁕ DevOps, Version Control, and CI/CD (52) - Git (14), CI/CD (13), DevOps (7), Version Control (6), Terraform (6)
...

#DataEngineering #BigData #SQL #Python #ETL #AWS #CloudComputing #Spark #DataModeling #DataWarehouse #DevOps #DataGovernance #DataVisualization #MachineLearning #API #Scala #Java #GCP #Azure #Hadoop #Git #CICD #Terraform #DataQuality #Tableau #PowerBI #Collaboration #Microservices #MLOps #TechSkills

reddit.com/r/dataengineering/c

How Not to Drown in Data: Choosing Between DWH, Data Lake, and Lakehouse

Hi, Habr! My name is Alexey Struchenko, and I work as an information systems architect at Arenadata. Today I'd like to talk about data storage: the main types, their key characteristics, and how to pick the right solution. In the era of digital transformation, data has become one of the most valuable assets for companies of any size, in any industry. Storing, processing, and analyzing large volumes of data effectively helps organizations make well-founded decisions, improve operational efficiency, and build competitive advantages. But as data volumes grow and their structure gets more complex, traditional storage methods run into limitations. In this article we take a detailed look at three approaches to data storage: the Data Warehouse (DWH), the Data Lake, and the relatively new Lakehouse concept. We break down their characteristics, differences, advantages, and drawbacks, and offer recommendations on when to choose each approach.

habr.com/ru/companies/arenadat


#DuckDB (and a tonne of RAM) has absolutely saved my behind these last few months while dealing with huge biological datasets.

If you do any #data munging at all on a daily basis, it's well worth picking up DuckDB. Don't let the DB part fool you: it's more like #Dask or #Spark, but #SQL.

duckdb.org/

DuckDB: an in-process SQL OLAP database management system. Simple, feature-rich, fast, and open source.

I know this is a long shot but does anybody know how to set a "secondary role" (or activate all secondary roles) in #Snowflake via the #Spark connector? I'm going to note that a lot of things that seem like they should work don't, so I'd be grateful for ideas from folks who are in a position to actually test this.
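One avenue I haven't been able to verify (and this is exactly the kind of thing that looks right but may not work) is the connector's `preactions` option, which runs semicolon-separated SQL in the session before the main query. All connection values below are placeholders:

```python
# Untested sketch: preactions *might* be enough to activate secondary roles.
sf_options = {
    "sfURL": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfPassword": "<password>",
    "sfDatabase": "<database>",
    "sfSchema": "<schema>",
    "sfWarehouse": "<warehouse>",
    "sfRole": "<primary_role>",
    "preactions": "USE SECONDARY ROLES ALL",
}

df = (
    spark.read.format("net.snowflake.spark.snowflake")
    .options(**sf_options)
    .option("query", "SELECT CURRENT_SECONDARY_ROLES()")
    .load()
)
```

The catch: pushdown reads can open their own JDBC session, in which case the preaction may not stick to the session that actually runs the query. That would explain the "should work but doesn't" pattern.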

📢 New call for proposals: #Spark enables researchers from all disciplines to test or develop novel and unconventional scientific approaches, methods, theories or ideas within a short time.
🗓️ Submission deadline: 4 March 2025.
➡️snf.ch/en/CVNR0Q5f3P32Cg9f/new

Swiss National Science Foundation (SNSF): Spark call for proposals. The SNSF funds excellent research at universities and other institutions.

Just caught up with the recent Delta Lake webinar,

> Revolutionizing Delta Lake workflows on AWS Lambda with Polars, DuckDB, Daft & Rust

Some interesting hints there regarding lightweight processing of big-ish data. Easy to relate to any other framework instead of Lambda, e.g. #ApacheAirflow tasks

youtu.be/BR9oFD0QMAs


This is a customer-facing role, so if that's not your thing, keep scrolling.

TLDR: If you know Hadoop and live close enough to Belfast to commute, you should apply.

I've posted this before, but it's been a little while #fedihire. Also, adding some additional information this time. This is my team. We are already on three continents and 6 timezones, but #Belfast is a new location for the team. I know literally nothing about the office.

I know that in a lot of places Hadoop is the past, and sure, we see a ton of #Spark (I do not understand why that is not listed in the job description, but maybe because they want to emphasize that we need Hadoop expertise?). You can see all the projects we support at openlogic.com/supported-techno

It depends on how you count, since I was on two teams during the transition, but I've been on this team for over 5 years now. It's a great team. I've been with the company now right at 7 years. I cannot say how we compare to Belfast employers, but this is well more than double my tenure at any other employer (even if you count UNC-CH as a single employer rather than the different departments, I've beat them by well over a year at this point).

My manager has been on this team for almost 15 years. His manager has been with this team for almost as long as me, but with the company much longer. His manager has been here almost as long as me (I actually did orientation with him). His manager is a her and she's been here almost as long as me. So, obviously, this is a place where people want to stay!

Our team has a lot of testosterone, but when I started, our CEO was a woman. The GM for the division is a woman.

My manager is black. The manager of our sister team is black.

I think you'll find our team and company is concerned about your work product and not how you dress, what bathroom you use, or the color of your skin.

If you take a look at our careers page, you'll see this:

Work Should Be Fun
There’s always something to look forward to as a Perforce employee: scavenger hunts, community lunches, summer events, virtual games, and year-end celebrations just to name a few.

We take that shit seriously. Nauseatingly so sometimes, lol.

Actually, we take everything on the careers page seriously, but I know from experience that some places treat support like they are a shoe sole to be worn down. Not so here. It's not all rainbows and sunshine, of course. The whole point is that the customer is having an issue! Our customers treat us with respect because management demands that they do.

------

The Director of Product Development at Perforce is searching for an Enterprise Architect (#BigData Solutions) to join the team. We are looking for an individual who loves data solutions, views technology as a lifestyle, and has a passion for open source software. In this position, you'll get hands-on experience building, configuring, deploying, and troubleshooting our big data solutions, and you'll contribute to our most strategic product offerings.

At OpenLogic we do #opensource right, and our people make it happen. We provide the technical expertise required for maintaining healthy implementations of hundreds of integrated open source software packages. If your skills meet any of the specs below, now is the time to apply to be a part of our passionate team.
Responsibilities:

Troubleshoot and conduct root cause analysis on enterprise-scale big data systems operated by third-party clients, assisting them in resolving complex issues in mission-critical environments.
Install, configure, validate, and monitor a bundle of open source packages that deliver a cohesive, world-class big data solution.
Evaluate existing big data systems operated by third-party clients and identify areas for improvement.
Administer automation for provisioning and updating our big data distribution.

Requirements:

Demonstrable proficiency in #Linux command-line essentials
Strong #SQL and #NoSQL background required
Demonstrable experience designing or testing disaster recovery plans, including backup and recovery
Must have a firm understanding of the #Hadoop ecosystem, including the various open source packages that contribute to a broader solution, as well as an appreciation for the turmoil and turf wars among vendors in the space
Must understand the unique use cases and requirements for platform specific deployments, including on-premises vs cloud vs hybrid, as well as bare metal vs virtualization
Demonstrable experience in one or more cloud-based technologies (AWS or Azure preferred)
Experience with #virtualization and #containerization at scale
Experience creating architectural blueprints and best practices for Hadoop implementations
Some programming experience required
#Database administration experience very desirable
Experience working in enterprise/carrier production environments
Understanding of #DevOps and automation concepts
#Ansible playbook development very desirable
Experience with #Git-based version control
Be flexible and willing to support occasional after-hours and weekend work
Experience working with a geographically dispersed virtual team

jobs.lever.co/perforce/479dfdd

OpenLogic: Supported Open Source Technologies. OpenLogic by Perforce supports hundreds of open source technologies; search this list to see which packages we most often support for customers.