mastodon.world is one of the many independent Mastodon servers you can use to participate in the fediverse.
Generic Mastodon server for anyone to use.

Server stats: 8.1K active users
#opencl

1 post · 1 participant · 0 posts today
HGPU group: Performant Unified GPU Kernels for Portable Singular Value Computation Across Hardware and Precision
#OpenCL #SYCL #HIP #Kokkos #Julia
https://hgpu.org/?p=30096
@reiver ⊼ (Charles): (programming) Golang & OpenGL
LLMs: ConTraPh: Contrastive Learning for Parallelization and Performance Optimization. "With the advancement of HPC platforms, the demand for high-performing applications continues to grow. One effective w..."
#Computer #science #OpenCL #paper #Code #generation #Heterogeneous #systems #HPC #LLM #nVidia
Origin: https://hgpu.org/?p=30084 | Interest: https://awakari.com/sub-details.html?id=LLMs | Match: https://awakari.com/pub-msg.html?id=1QXUjmnGWJOcg97Pjiqo6zYBZBY&interestId=LLMs
HGPU group: ConTraPh: Contrastive Learning for Parallelization and Performance Optimization
#OpenCL #OpenACC #OpenMP #HPC #LLM #CodeGeneration
https://hgpu.org/?p=30084
Habr: Learning GPU development through the GEMM operation. "Hi, Habr! Today I'll cover implementing matrix multiplication and the specifics of developing for GPUs. I'll introduce the GPU architecture, explain how its programming differs from the familiar CPU model, and which details matter for an efficient GEMM implementation. Then we'll compare the performance of the different approaches."
https://habr.com/ru/companies/yadro/articles/934878/
#gpu_вычисления #opencl #gemm
Martin Boller: Short write-up on running Hashcat 7 (or older) with OpenCL on CPUs and/or using the Nouveau FOSS driver for NVIDIA cards.
http://www.infosecworrier.dk/blog/2025/08/opencl/
All the good stuff is from @tychotithonus's original post. The rest is just me standing on his shoulders.
#Hashcat #OpenCL #Nouveau #Linux #ABC #AlwaysBeCracking #NVIDIA #Legacy
Dantali0n: One of the major problems with #OpenCL is that its kernel placement across the global and local ranges is more a suggestion for how to distribute execution among compute units and threads than a hard requirement. But on most vendors you can still expect poor performance if your local range is smaller than the wavefront (warp) size of your architecture.
#GPU
Dr. Moritz Lehmann: Turns out the #OpenCL __builtin_amdgcn_sdot4 intrinsic for dp4a on AMD #GPUs is only supported up to RDNA2. RDNA3+ needs another intrinsic, __builtin_amdgcn_sudot4 🖖🤯
My OpenCL-Benchmark now supports both: https://github.com/ProjectPhysX/OpenCL-Benchmark/blob/master/src/kernel.cpp#L6-L20
https://github.com/llvm/llvm-project/blob/c1968fee972859dfd03a7e698422e18a5bc1d478/llvm/include/llvm/IR/IntrinsicsAMDGPU.td#L3213
Neustradamus: #Mesa 25.1.5 has been released (#Mesa3D / #Mesa3DGraphicsLibrary / #GraphicsLibrary / #OpenGL / #EGL / #OpenCL / #Vulkan / #Gallium3D) https://mesa3d.org/
Khronos Group: OpenCL v3.0.19 maintenance update released with bug fixes and clarifications, and two new extensions: cl_khr_spirv_queries, to simplify querying the SPIR-V capabilities of a device, and cl_khr_external_memory_android_hardware_buffer, to interoperate more efficiently with other APIs on Android devices. In addition, the cl_khr_kernel_clock extension, for sampling a clock within a kernel, has been finalized and is no longer experimental.
Khronos #OpenCL Registry: https://registry.khronos.org/OpenCL/
Rainer: @GuettisKnippse Under Settings/Editing, is #OpenCL enabled?
Gamey: I want to get #davinci_resolve working on #Fedora 42 with my now very old AMD RX 480 8GB, but it uses #OpenCL. The obvious choice would be #rocm, but that dropped support for my GPU years ago and, from what I found, has also caused issues with DaVinci Resolve for even longer. The other obvious choice would be Mesa's implementation, but while #Rusticl has improved things, it's still not feature-complete and rather slow. Is it smart to use the amdgpu-pro ICD with Mesa drivers for this?
रञ्जित (Ranjit Mathew): "Blackwell: Nvidia's Massive GPU", Chester Lam, Chips And Cheese (https://chipsandcheese.com/p/blackwell-nvidias-massive-gpu).
On HN: https://news.ycombinator.com/item?id=44409391
#Nvidia #GPU #Blackwell #Hardware #HPC #OpenCL
Dr. Moritz Lehmann: Finally I can "SLI" AMD+Intel+Nvidia #GPUs at home! I simulated this crow in flight at 680M grid cells in 36GB VRAM, pooled together from
- 🟥 #AMD Radeon RX 7700 XT 12GB (RDNA3)
- 🟦 #Intel Arc B580 12GB (Battlemage)
- 🟩 #Nvidia Titan Xp 12GB (Pascal)
My #FluidX3D #CFD software can pool the VRAM of any combination of any GPUs together via #OpenCL.
#Krähenliebe #birds #crow
https://www.youtube.com/watch?v=1z5-ddsmAag
John-Mark Gurney: As usual, getting something like cross-platform GPU compute working is a mess, because everyone likes to do their own thing and reinvent the wheel.
I would like something that is [modern] macOS and FreeBSD compatible, but that doesn't look possible since Apple deprecated OpenCL.
(Also, could Apple have picked a less searchable term for their new GPU framework?)
It's again looking like the best way to be cross-platform is to use JS+browser.
Or am I missing some library?
#OpenCL #GPUCompute #FreeBSD
karolherbst: Who is using CL_sRGBA images with #OpenCL, specifically to write to them (cl_khr_srgb_image_writes)?
There is limited hardware support for writing to sRGBA images, and I'm now curious what even uses that feature.
It was apparently important enough to require support in OpenCL 2.0, but... that's not telling me much.
Dr. Moritz Lehmann: Is it possible to run AMD+Intel+Nvidia #GPUs in the same PC? Yes! 🖖😋
Got this RDNA3 chonker for free from the 11 bit studios contest! It completes my 36GB VRAM RGB SLI abomination setup:
- 🟥 #AMD Radeon RX 7700 XT 12GB
- 🟦 #Intel Arc B580 12GB
- 🟩 #Nvidia Titan Xp 12GB
The drivers all work together in #Linux Ubuntu 24.04.2. The backbone is an ASUS ProArt Z790 with an i7-13700K and 64GB RAM, PCIe 4.0 x8/x8 + 3.0 x4: plenty of interconnect bandwidth.
Finally I can develop and test #OpenCL on all major platforms!
txt.file: Today’s hate about computers and software
LLMs: CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark. "We introduce CASS, the first l..."
https://hgpu.org/?p=29913
#Computer #science #CUDA #OpenCL #paper #AI #AMD #Radeon #RX #7900 #XT
Result Details: https://awakari.com/pub-msg.html?id=BOlq25XcQ0BBmvhFBBHMoaU3P7Y&interestId=LLMs