Thesis: Hardware-Assisted Software Testing and Debugging for Heterogeneous Computing

Even now, Thrust as a dependency is one of the main reasons why we have a #CUDA backend, a #HIP / #ROCm backend and a pure #CPU backend in #GPUSPH, but not a #SYCL or #OneAPI backend (which would allow us to extend hardware support to #Intel GPUs). <https://doi.org/10.1002/cpe.8313>
This is also one of the reasons why we implemented our own #BLAS routines when we introduced the semi-implicit integrator. A side-effect of this choice is that it allowed us to develop the improved #BiCGSTAB that I've had the opportunity to mention before <https://doi.org/10.1016/j.jcp.2022.111413>. Sometimes I do wonder if it would be appropriate to “excorporate” it into its own library for general use, since it's something that would benefit others. OTOH, this one was developed specifically for GPUSPH and it's tightly integrated with the rest of it (including its support for multi-GPU), and refactoring to turn it into a library like cuBLAS is
a. too much effort
b. probably not worth it.
Again, following @eniko's original thread, it's really not that hard to roll your own, and probably less time-consuming than trying to wrangle your way through an API that may or may not fit your needs.
6/
I'm getting the material ready for my upcoming #GPGPU course that starts in March. Even though I most probably won't get to it, I also checked my trivial #SYCL programs. Apparently the 2025.0 version of the #Intel #OneAPI #DPCPP runtime doesn't like any #OpenCL platform except Intel's own (I have two other platforms that support #SPIRV, so why aren't they showing up? From the documentation I can find online this should be sufficient, but apparently it's not …)
Exploring data flow design and vectorization with oneAPI for streaming applications on CPU+GPU
Just how deep is #Nvidia's #CUDA moat really?
Not as impenetrable as you might think, but still more than Intel or AMD would like
It's not enough just to build a competitive part: you also have to have #software that can harness all those #FLOPS, something Nvidia has spent the better part of two decades building with its CUDA runtime, while competing frameworks for low-level #GPU #programming, like AMD's #ROCm or Intel's #OneAPI, are far less mature.
https://www.theregister.com/2024/12/17/nvidia_cuda_moat/ #developers
Howdy all - registrations are still open for the first oneAPI DevSummit hosted by the UXL Foundation! Learn about GPGPU programming, oneAPI and how companies are coalescing around #oneapi / #sycl
https://linuxfoundation.regfox.com/oneapiuxldevsummit2024
Registration will close at 5pm today. The DevSummit will start at 8pm PT or 8:30am IST. See you there!
Intel(R) SHMEM: GPU-initiated OpenSHMEM using SYCL
Introduction to #oneAPI, #SYCL2020 & #OpenMP offloading
September 23-25, 2024
In this 3-day online course, HLRS - High-Performance Computing Center Stuttgart provides an introduction to Intel Corporation's oneAPI implementation.
Read more & Register https://www.hlrs.de/training/2024/intel-oneapi
Just one more day to submit your session for the UXL oneAPI DevSummit being held October 9th & 10th!
Learn more: https://sessionize.com/uxldevsummit
#SYCL #oneAPI #UXL
Evaluating Operators in Deep Neural Networks for Improving Performance Portability of SYCL
@pytorch 2.4 upstream now includes a prototype feature supporting Intel GPUs through source build using #SYCL and #oneDNN, as well as a backend integrated into Inductor on top of Triton - enabling a path for millions and millions of GPUs through #oneAPI for #AI.
Lots of important milestones to make this happen - including support for #UXL Foundation open AI technologies. Just a prototype, but a big step forward... thanks to all in the PyTorch community. Feedback welcome!
Assessing Intel OneAPI capabilities and cloud-performance for heterogeneous computing
Can we run TornadoVM applications on CPUs and take advantage of all CPU cores? The answer is YES. All you need is an OpenCL implementation that can run on your CPU. In this video, I will show you how you can configure TornadoVM to run on such systems using the Intel oneAPI base toolkit for Intel CPUs, and even FPGAs.
SYCL-Bench 2020: Benchmarking SYCL 2020 on AMD, Intel, and NVIDIA GPUs
Assessing opportunities of SYCL for biological sequence alignment on GPU-based systems
#SYCL #CUDA #oneAPI #Bioinformatics #Biology #SequenceAlignment #Package
Out of kernel tuning and optimizations for portable large-scale docking experiments on GPUs