#CSOs - Mastodon

2 posts1 participant2 posts today

LLMs @LLMs@activitypub.awakari.com

#cs.OS #cs.DC

Origin | Interest | Match

arXiv.orgTowards Efficient and Practical GPU Multitasking in the Era of LLMGPU singletasking is becoming increasingly inefficient and unsustainable as hardware capabilities grow and workloads diversify. We are now at an inflection point where GPUs must embrace multitasking, much like CPUs did decades ago, to meet the demands of modern AI workloads. In this work, we highlight the key requirements for GPU multitasking, examine prior efforts, and discuss why they fall short. To advance toward efficient and practical GPU multitasking, we envision a resource management layer, analogous to a CPU operating system, to handle various aspects of GPU resource management and sharing. We outline the challenges and potential solutions, and hope this paper inspires broader community efforts to build the next-generation GPU compute paradigm grounded in multitasking.

LLMs @LLMs@activitypub.awakari.com

#cs.CR #cs.LG #cs.OS

Origin | Interest | Match

arXiv.orgSelective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM InferenceGlobal KV-cache sharing has emerged as a key optimization for accelerating large language model (LLM) inference. However, it exposes a new class of timing side-channel attacks, enabling adversaries to infer sensitive user inputs via shared cache entries. Existing defenses, such as per-user isolation, eliminate leakage but degrade performance by up to 38.9% in time-to-first-token (TTFT), making them impractical for high-throughput deployment. To address this gap, we introduce SafeKV (Secure and Flexible KV Cache Sharing), a privacy-aware KV-cache management framework that selectively shares non-sensitive entries while confining sensitive content to private caches. SafeKV comprises three components: (i) a hybrid, multi-tier detection pipeline that integrates rule-based pattern matching, a general-purpose privacy detector, and context-aware validation; (ii) a unified radix-tree index that manages public and private entries across heterogeneous memory tiers (HBM, DRAM, SSD); and (iii) entropy-based access monitoring to detect and mitigate residual information leakage. Our evaluation shows that SafeKV mitigates 94% - 97% of timing-based side-channel attacks. Compared to per-user isolation method, SafeKV improves TTFT by up to 40.58% and throughput by up to 2.66X across diverse LLMs and workloads. SafeKV reduces cache-induced TTFT overhead from 50.41% to 11.74% on Qwen3-235B. By combining fine-grained privacy control with high cache reuse efficiency, SafeKV reclaims the performance advantages of global sharing while providing robust runtime privacy guarantees for LLM inference.

#cscr #cslg #CSOs

2rZiKKbOU3nTafniR2qMMSE0gwZ @2rZiKKbOU3nTafniR2qMMSE0gwZ@activitypub.awakari.com

#cs.CR #cs.CY #cs.NI #cs.OS #cs.PL

Origin | Interest | Match

arXiv.orgUnder the Hood of BlotchyQuasar: DLL-Based RAT Campaigns Against Latin AmericaA sophisticated malspam campaign was recently uncovered targeting Latin American countries, with a particular focus on Brazil. This operation utilizes a highly deceptive phishing email to trick users into executing a malicious MSI file, initiating a multi-stage infection. The core of the attack leverages DLL side-loading, where a legitimate executable from Valve Corporation is used to load a trojanized DLL, thereby bypassing standard security defenses. Once active, the malware, a variant of QuasarRAT known as BlotchyQuasar, is capable of a wide range of malicious activities. It is designed to steal sensitive browser-stored credentials and banking information, the latter through fake login windows mimicking well-known Brazilian banks. The threat establishes persistence by modifying the Windows registry , captures user keystrokes through keylogging , and exfiltrates stolen data to a Command-and-Control (C2) server using encrypted payloads. Despite its advanced capabilities, the malware code exhibits signs of rushed development, with inefficiencies and poor error handling that suggest the threat actors prioritized rapid deployment over meticulous design. Nonetheless, the campaign extensive reach and sophisticated mechanisms pose a serious and immediate threat to the targeted regions, underscoring the need for robust cybersecurity defenses.

#cscr #cscy #csni

LLMs @LLMs@activitypub.awakari.com

#cs.OS #cs.CR

Origin | Interest | Match

arXiv.orgBreaking the Boundaries of Long-Context LLM Inference: Adaptive KV Management on a Single Commodity GPUAdvanced Large Language Models (LLMs) have achieved impressive performance across a wide range of complex and long-context natural language tasks. However, performing long-context LLM inference locally on a commodity GPU (a PC) with privacy concerns remains challenging due to the increasing memory demands of the key-value (KV) cache. Existing systems typically identify important tokens and selectively offload their KV data to GPU and CPU memory. The KV data needs to be offloaded to disk due to the limited memory on a commodity GPU, but the process is bottlenecked by token importance evaluation overhead and the disk's low bandwidth. In this paper, we present LeoAM, the first efficient importance-aware long-context LLM inference system for a single commodity GPU with adaptive hierarchical GPU-CPU-Disk KV management. Our system employs an adaptive KV management strategy that partitions KV data into variable-sized chunks based on the skewed distribution of attention weights across different layers to reduce computational and additional transmission overheads. Moreover, we propose a lightweight KV abstract method, which minimizes transmission latency by storing and extracting the KV abstract of each chunk on disk instead of the full KV data. LeoAM also leverages the dynamic compression and pipeline techniques to further accelerate inference. Experimental results demonstrate that LongInfer achieves an average inference latency speedup of 3.46x, while maintaining comparable LLM response quality. In scenarios with larger batch sizes, it achieves up to a 5.47x speedup.

LLMs @LLMs@activitypub.awakari.com

#cs.CL #cs.OS

Origin | Interest | Match

arXiv.orgVeriLocc: End-to-End Cross-Architecture Register Allocation via LLMModern GPUs evolve rapidly, yet production compilers still rely on hand-crafted register allocation heuristics that require substantial re-tuning for each hardware generation. We introduce VeriLocc, a framework that combines large language models (LLMs) with formal compiler techniques to enable generalizable and verifiable register allocation across GPU architectures. VeriLocc fine-tunes an LLM to translate intermediate representations (MIRs) into target-specific register assignments, aided by static analysis for cross-architecture normalization and generalization and a verifier-guided regeneration loop to ensure correctness. Evaluated on matrix multiplication (GEMM) and multi-head attention (MHA), VeriLocc achieves 85-99% single-shot accuracy and near-100% pass@100. Case study shows that VeriLocc discovers more performant assignments than expert-tuned libraries, outperforming rocBLAS by over 10% in runtime.

All for Gardening @afg@vive.im

At London Climate Action Week, Let’s Talk Food Industry Sustainability – Food Tank https://www.allforgardening.com/1334313/at-london-climate-action-week-lets-talk-food-industry-sustainability-food-tank/ #ChiefSustainabilityOfficer #CSOs #EITFood #garden #gardening #GardeningIreland #google #LondonClimateActionWeek #REGENHouse #UNEP #Unilever

linux @linux@activitypub.awakari.com

https://arxiv.org/abs/2506.00384

#cs.LG #cs.DC #cs.OS

Result Details

arXiv.orgDeep-Learning-Driven Prefetching for Far MemoryModern software systems face increasing runtime performance demands, particularly in emerging architectures like far memory, where local-memory misses incur significant latency. While machine learning (ML) has proven effective in offline systems optimization, its application to high-frequency, runtime-level problems remains limited due to strict performance, generalization, and integration constraints. We present FarSight, a Linux-based far-memory system that leverages deep learning (DL) to efficiently perform accurate data prefetching. FarSight separates application semantics from runtime memory layout, allowing offline-trained DL models to predict access patterns using a compact vocabulary of ordinal possibilities, resolved at runtime through lightweight mapping structures. By combining asynchronous inference, lookahead prediction, and a cache-resident DL model, FarSight achieves high prediction accuracy with low runtime overhead. Our evaluation of FarSight on four data-intensive workloads shows that it outperforms the state-of-the-art far-memory system by up to 3.6 times. Overall, this work demonstrates the feasibility and advantages of applying modern ML techniques to complex, performance-critical software runtime problems.

#cslg #csdc #CSOs

linux @linux@activitypub.awakari.com

https://arxiv.org/abs/2505.06546

#cs.OS #cs.RO

Result Details

arXiv.orgWork in Progress: Middleware-Transparent Callback Enforcement in Commoditized Component-Oriented Real-time SystemsReal-time scheduling in commoditized component-oriented real-time systems, such as ROS 2 systems on Linux, has been studied under nested scheduling: OS thread scheduling and middleware layer scheduling (e.g., ROS 2 Executor). However, by establishing a persistent one-to-one correspondence between callbacks and OS threads, we can ignore the middleware layer and directly apply OS scheduling parameters (e.g., scheduling policy, priority, and affinity) to individual callbacks. We propose a middleware model that enables this idea and implements CallbackIsolatedExecutor as a novel ROS 2 Executor. We demonstrate that the costs (user-kernel switches, context switches, and memory usage) of CallbackIsolatedExecutor remain lower than those of the MultiThreadedExecutor, regardless of the number of callbacks. Additionally, the cost of CallbackIsolatedExecutor relative to SingleThreadedExecutor stays within a fixed ratio (1.4x for inter-process and 5x for intra-process communication). Future ROS 2 real-time scheduling research can avoid nested scheduling, ignoring the existence of the middleware layer.

FunctionalProgramming @FunctionalProgramming@activitypub.awakari.com

https://arxiv.org/abs/2505.01603

#cs.DC #cs.OS

Result Details

arXiv.orgUnlocking True Elasticity for the Cloud-Native Era with DandelionElasticity is fundamental to cloud computing, as it enables quickly allocating resources to match the demand of each workload as it arrives, rather than pre-provisioning resources to meet performance objectives. However, even serverless platforms -- which boot sandboxes in 10s to 100s of milliseconds -- are not sufficiently elastic to avoid over-provisioning expensive resources. Today's FaaS platforms rely on pre-provisioning many idle sandboxes in memory to reduce the occurrence of slow, cold starts. A key obstacle for high elasticity is booting a guest OS and configuring features like networking in sandboxes, which are required to expose an isolated POSIX-like interface to user functions. Our key insight is that redesigning the interface for applications in the cloud-native era enables co-designing a much more efficient and elastic execution system. Now is a good time to rethink cloud abstractions as developers are building applications to be cloud-native. Cloud-native applications typically consist of user-provided compute logic interacting with cloud services (for storage, AI inference, query processing, etc) exposed over REST APIs. Hence, we propose Dandelion, an elastic cloud platform with a declarative programming model that expresses applications as DAGs of pure compute functions and higher-level communication functions. Dandelion can securely execute untrusted user compute functions in lightweight sandboxes that cold start in hundreds of microseconds, since pure functions do not rely on extra software environments such as a guest OS. Dandelion makes it practical to boot a sandbox on-demand for each request, decreasing performance variability by two to three orders of magnitude compared to Firecracker and reducing committed memory by 96% on average when running the Azure Functions trace.

Linux-Maintainers @Linux-Maintainers@activitypub.awakari.com

https://arxiv.org/abs/2504.21394

#cs.OS

Result Details

arXiv.orgConcurrency Testing in the Linux Kernel via eBPFConcurrency is vital for our critical software to meet modern performance requirements, yet concurrency bugs are notoriously difficult to detect and reproduce. Controlled Concurrency Testing (CCT) can make bugs easier to expose by enabling control over thread interleavings and systematically exploring the interleaving space through scheduling algorithms. However, existing CCT solutions for kernel code are heavyweight, leading to significant performance, maintainability and extensibility issues. In this work, we introduce LACE, a lightweight CCT framework for kernel code empowered by eBPF. Without hypervisor modification, LACE features a custom scheduler tailored for CCT algorithms to serialize non-determistic thread execution into a controlled ordering. LACE also provides a mechanism to safely inject scheduling points into the kernel for fine-grained control. Furthermore, LACE employs a two-phase mutation strategy to integrate the scheduler with a concurrency fuzzer, allowing for automated exploration of both the input and schedule space. In our evaluation, LACE achieves 38\% more branches, 57\% overhead reduction and 11.4$\times$ speed-up in bug exposure compared to the state-of-the-art kernel concurrency fuzzers. Our qualitative analysis also demonstrates the extensibility and maintainability of LACE. Furthermore, LACE discovers eight previously unknown bugs in the Linux kernel, with six confirmed by developers.

LLMs @LLMs@activitypub.awakari.com

https://arxiv.org/abs/2504.15302

#cs.DC #cs.OS

Result Details

arXiv.orgRAGDoll: Efficient Offloading-based Online RAG System on a Single GPURetrieval-Augmented Generation (RAG) enhances large language model (LLM) generation quality by incorporating relevant external knowledge. However, deploying RAG on consumer-grade platforms is challenging due to limited memory and the increasing scale of both models and knowledge bases. In this work, we introduce RAGDoll, a resource-efficient, self-adaptive RAG serving system integrated with LLMs, specifically designed for resource-constrained platforms. RAGDoll exploits the insight that RAG retrieval and LLM generation impose different computational and memory demands, which in a traditional serial workflow result in substantial idle times and poor resource utilization. Based on this insight, RAGDoll decouples retrieval and generation into parallel pipelines, incorporating joint memory placement and dynamic batch scheduling strategies to optimize resource usage across diverse hardware devices and workloads. Extensive experiments demonstrate that RAGDoll adapts effectively to various hardware configurations and LLM scales, achieving up to 3.6 times speedup in average latency compared to serial RAG systems based on vLLM.

linux @linux@activitypub.awakari.com

https://arxiv.org/abs/2504.13994

#cs.HC #cs.OS #cs.SE

Result Details

arXiv.orgTerminal Lucidity: Envisioning the Future of the TerminalThe Unix terminal, or just simply, the terminal, can be found being applied in almost every facet of computing. It is available across all major platforms and often integrated into other applications. Due to its ubiquity, even marginal improvements to the terminal have the potential to make massive improvements to productivity on a global scale. We believe that evolutionary improvements to the terminal, in its current incarnation as windowed terminal emulator, are possible and that developing a thorough understanding of issues that current terminal users face is fundamental to knowing how the terminal should evolve. In order to develop that understanding we have mined Unix and Linux Stack Exchange using a fully-reproducible method which was able to extract and categorize 91.0% of 1,489 terminal-related questions (from the full set of nearly 240,000 questions) without manual intervention. We present an analysis, to our knowledge the first of its kind, of windowed terminal-related questions posted over a 15-year period and viewed, in aggregate, approximately 40 million times. As expected, given its longevity, we find the terminal's many features being applied across a wide variety of use cases. We find evidence that the terminal, as windowed terminal emulator, has neither fully adapted to its now current graphical environment nor completely untangled itself from features more suited to incarnations in previous environments. We also find evidence of areas where we believe the terminal could be extended along with other areas where it could be simplified. Surprisingly, while many current efforts to improve the terminal include improving the terminal's social and collaborative aspects, we find little evidence of this as a prominent pain point.

#cshc #CSOs #csse

Ilan Kelman @ilankelman@mastodon.green

Disasters Avoided Newsletter #7
Focus on Non-Governmental Organisations (#NGOs) and Civil Society Organisations (#CSOs)
https://mailchi.mp/9e31722a66e3/the-disasters-avoided-newsletter-14190206

#DisastersAvoided

#DRR #DisastersAreNotNatural #NoNaturalDisasters (so we avoid the phrases #NaturalDisaster #NaturalDisasters) #vulnerability #resilience #SendaiFramework #Switch2Sendai #SFDRR #DisasterRisk #DisasterRiskReduction #disaster #disasters

mailchi.mp500: We've Run Into An Issue | Mailchimp

OpenSource @OpenSource@activitypub.awakari.com

https://arxiv.org/abs/2503.23068

#cs.OS

Event Attributes

arXiv.orgLinux for Everyone: Can Standardization Drive Mainstream Adoption?Despite its technical superiority and flexibility, Linux remains a niche OS in the consumer markets. Because fragmentation stems from diverse distributions, it lacks the standardized experience, which discourages mainstream adoption. This foundational paper explores whether a balanced approach to standardization can bridge this gap without compromising Linux's core philosophy of freedom and openness. We analyze historical attempts at unification, such as Flatpak, Wayland, and Snap, identifying reasons for their limited success. Using case studies and statistical insights, we understand how fragmentation affects developers, designers, management users, and gaming users. The paper proposes a standardized yet modular Linux ecosystem ensuring adaptability for new users and flexibility for power users. Rather than giving a technical solution, this paper discusses the feasibility of a unified Linux experience by providing the groundwork for structured standardization. We aim to inspire future research as well for positioning Linux as a viable alternative to Windows and MacOS without sacrificing its open--source nature.

Maati TV @MaatiTV@paktodon.asia

After 2013, the way #Pakistan views NGOs has changed drastically, with many looking at their role with doubt and suspicion of Western influence.

In this conversation, we have been joined by Ms. Sabiha Shaheen, executive director at BARGAD Youth, Mr. Mohammad Waseem, director Interactive Resource Centre, and Ms. Uzma Yaqoob, director FDI Pakistan, to talk about the different roles of #NGOs, Civil Societies, and #CSOs in Pakistan today.

Watch full episode here :https://youtu.be/DBsEoYkur94

youtu.be- YouTubeEnjoy the videos and music you love, upload original content, and share it all with friends, family, and the world on YouTube.

LLMs @LLMs@activitypub.awakari.com

https://arxiv.org/abs/2503.09663

#cs.OS #cs.SE

Event Attributes

arXiv.orgBYOS: Knowledge-driven Large Language Models Bring Your Own Operating System More ExcellentKernel configurations play an important role in the performance of Operating System (OS). However, with the rapid iteration of OS, finding the proper configurations that meet specific requirements can be challenging, which can be primarily attributed to the default kernel provided by vendors does not take the requirements of specific workloads into account, and the heavyweight tuning process cannot catch up with the rapid evolving pace of the kernel. To address these challenges, we propose BYOS, a novel framework powered by Large Language Models (LLMs) to customize kernel configurations for diverse user requirements. By integrating OS-oriented Dual-layer Knowledge Graph (OD-KG) and corresponding reasoning strategy, BYOS enhanced the LLM's understanding of the characteristics and capabilities of OS, thus enabling customized, cost-effective, and convenient generation of kernel configurations. Experiments show that the kernels configured by BYOS outperform the default vendor-configured kernels by 7.1% to 155.4%, demonstrating the effectiveness and efficiency of BYOS in customizing kernel configurations. Our code is available at https://github.com/LHY-24/BYOS.

Defend Democracy @DefendDemocracy@eupolicy.social

Civil society organisations (#CSOs) are under attack across Europe and America.

The #EU must act now to defend them and safeguard #democracy.

#DefendDemocracy

https://www.eesc.europa.eu/en/news-media/press-releases/civil-society-under-fire-why-eu-must-act-now

European Economic and Social Committee · Feb 28Civil Society Under Fire: Why the EU Must Act NowCivil society organisations (CSOs) are under attack across Europe and America. The EU must act now to defend them and safeguard democracy. At its plenary debate on the International Day of NGOs, the European Economic and Social Committee sent out a clear message: CSOs are democracy’s frontline defenders. With funding cuts threatening their survival, the EU must take immediate action to protect and support them.

LLMs @LLMs@activitypub.awakari.com

https://arxiv.org/abs/2502.17439

#cs.SE #cs.AI #cs.DC #cs.OS

Event Attributes

arXiv.orgLarge Language Models as Realistic Microservice Trace GeneratorsComputer system workload traces, which record hardware or software events during application execution, are essential for understanding the behavior of complex systems and managing their processing and memory resources. However, obtaining real-world traces can be challenging due to the significant collection overheads in performance and privacy concerns that arise in proprietary systems. As a result, synthetic trace generation is considered a promising alternative to using traces collected in real-world production deployments. This paper proposes to train a large language model (LLM) to generate synthetic workload traces, specifically microservice call graphs. To capture complex and arbitrary hierarchical structures and implicit constraints in such traces, we fine-tune LLMs to generate each layer recursively, making call graph generation a sequence of easier steps. To further enforce learning constraints in traces and generate uncommon situations, we apply additional instruction tuning steps to align our model with the desired trace features. Our evaluation results show that our model can generate diverse realistic traces under various conditions and outperform existing methods in accuracy and validity. We show that our synthetically generated traces can effectively substitute real-world data in optimizing or tuning systems management tasks. We also show that our model can be adapted to perform key downstream trace-related tasks, specifically, predicting key trace features and infilling missing data given partial traces. Codes are available in https://github.com/ldos-project/TraceLLM.

#csse #csai #csdc

LLMs @LLMs@activitypub.awakari.com

https://arxiv.org/abs/2502.15734

#cs.DC #cs.AI #cs.CL #cs.LG #cs.OS

Event Attributes

arXiv.orgCache-Craft: Managing Chunk-Caches for Efficient Retrieval-Augmented GenerationRetrieval-Augmented Generation (RAG) is often used with Large Language Models (LLMs) to infuse domain knowledge or user-specific information. In RAG, given a user query, a retriever extracts chunks of relevant text from a knowledge base. These chunks are sent to an LLM as part of the input prompt. Typically, any given chunk is repeatedly retrieved across user questions. However, currently, for every question, attention-layers in LLMs fully compute the key values (KVs) repeatedly for the input chunks, as state-of-the-art methods cannot reuse KV-caches when chunks appear at arbitrary locations with arbitrary contexts. Naive reuse leads to output quality degradation. This leads to potentially redundant computations on expensive GPUs and increases latency. In this work, we propose Cache-Craft, a system for managing and reusing precomputed KVs corresponding to the text chunks (we call chunk-caches) in RAG-based systems. We present how to identify chunk-caches that are reusable, how to efficiently perform a small fraction of recomputation to fix the cache to maintain output quality, and how to efficiently store and evict chunk-caches in the hardware for maximizing reuse while masking any overheads. With real production workloads as well as synthetic datasets, we show that Cache-Craft reduces redundant computation by 51% over SOTA prefix-caching and 75% over full recomputation. Additionally, with continuous batching on a real production workload, we get a 1.6X speed up in throughput and a 2X reduction in end-to-end response latency over prefix-caching while maintaining quality, for both the LLaMA-3-8B and LLaMA-3-70B models.

#csdc #csai #cscl

OpenSource @OpenSource@activitypub.awakari.com

https://arxiv.org/abs/2502.13163

#cs.OS #cs.CR #cs.SE

Event Attributes

arXiv.orgA Survey of Fuzzing Open-Source Operating SystemsVulnerabilities in open-source operating systems (OSs) pose substantial security risks to software systems, making their detection crucial. While fuzzing has been an effective vulnerability detection technique in various domains, OS fuzzing (OSF) faces unique challenges due to OS complexity and multi-layered interaction, and has not been comprehensively reviewed. Therefore, this work systematically surveys the state-of-the-art OSF techniques, categorizes them based on the general fuzzing process, and investigates challenges specific to kernel, file system, driver, and hypervisor fuzzing. Finally, future research directions for OSF are discussed. GitHub: https://github.com/pghk13/Survey-OSF.

#CSOs #cscr #csse

Drag & drop to upload