#CSAR

6 posts · 6 participants · 1 post today
deepseek
#cs.AR
Origin: https://arxiv.org/abs/2508.06047 | Interest: https://awakari.com/sub-details.html?id=deepseek | Match: https://awakari.com/pub-msg.html?id=JioP7c33HV8dRDxWHGdEOkftXQO&interestId=deepseek

Raspberry-Pi
#cs.CL #cs.AI #cs.AR #cs.SY #eess.SY
Origin: https://arxiv.org/abs/2506.11105 | Interest: https://awakari.com/sub-details.html?id=Raspberry-Pi | Match: https://awakari.com/pub-msg.html?id=Mxvwe3nlUws3WK890MyagLFTtLc&interestId=Raspberry-Pi

moonlight_seashell
@EUCommission
welcome to the new design:
#CSAR

moonlight_seashell
Stop Europe's most dangerous and authoritarian law: chat control.
Its goal: break every form of encryption for private messaging.
https://korben.info/chat-control-europe-scan-messages.html
#CSAR #chatcontrol

Parano-Sprite
RED ALERT: The European Union is about to kill the Internet as we built it.
Chat Control alert!!! After DADVSI, they are at it again! - Association WDA
https://wda-fr.org/forum/viewtopic.php?t=2843
#ChatControl #CSAR #StopScanningMe #stop_scanning_me

LLMs
#cs.PF #cs.AI #cs.AR #cs.LG
Origin: https://arxiv.org/abs/2508.00904 | Interest: https://awakari.com/sub-details.html?id=LLMs | Match: https://awakari.com/pub-msg.html?id=SDmjJei3LiH4tT9hJ8c9y5Cf7Im&interestId=LLMs

KLLM: Fast LLM Inference with K-Means Quantization (arXiv.org)
Large language model (LLM) inference poses significant challenges due to its intensive memory and computation demands. Weight and activation quantization (WAQ) offers a promising solution by reducing both memory footprint and arithmetic complexity. However, two key challenges remain in the existing WAQ designs. (1) Traditional WAQ designs rely on uniform integer-based quantization for hardware efficiency, but this often results in significant accuracy degradation at low precision. K-Means-based quantization, a non-uniform quantization technique, achieves higher accuracy by matching the Gaussian-like distributions of weights and activations in LLMs. However, its non-uniform nature prevents direct execution on low-precision compute units, requiring dequantization and floating-point matrix multiplications (MatMuls) during inference. (2) Activation outliers further hinder effective low-precision WAQ. Offline thresholding methods for outlier detection can lead to significant model performance degradation, while existing online detection techniques introduce substantial runtime overhead. To address the aforementioned challenges and fully unleash the potential of WAQ with K-Means quantization for LLM inference, in this paper, we propose KLLM, a hardware-software co-design framework. KLLM features an index-based computation scheme for efficient execution of MatMuls and nonlinear operations on K-Means-quantized data, which avoids most of the dequantization and full-precision computations. Moreover, KLLM incorporates a novel outlier detection engine, Orizuru, that efficiently identifies the top-k largest and smallest elements in the activation data stream during online inference. Extensive experiments show that, on average, KLLM achieves speedups of 9.67x, 7.03x and energy efficiency improvements of 229.50x, 150.21x compared to the A100 GPU and Atom, respectively.

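The core idea in the abstract, storing weights as small centroid indices plus a lookup table, can be illustrated in a few lines. The sketch below is a minimal software illustration of K-Means weight quantization, not KLLM's index-based MatMul kernels or the Orizuru outlier engine; the function names and the 16-centroid (4-bit) setting are my own choices.

```python
# Minimal sketch of K-Means weight quantization with index-based storage.
# Illustrative only: KLLM's contribution is executing MatMuls directly on the
# indices in hardware and detecting outliers online (Orizuru); neither is modelled here.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_quantize(weights: np.ndarray, n_clusters: int = 16):
    """Quantize a weight matrix to n_clusters centroids (16 centroids -> 4-bit indices)."""
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(flat)
    indices = km.labels_.reshape(weights.shape).astype(np.uint8)   # one small index per weight
    centroids = km.cluster_centers_.ravel().astype(np.float32)     # tiny lookup table
    return indices, centroids

def dequantize(indices: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Table lookup back to approximate weights (the step KLLM mostly avoids at inference)."""
    return centroids[indices]

if __name__ == "__main__":
    W = np.random.randn(128, 128).astype(np.float32)
    idx, cent = kmeans_quantize(W)
    print("mean abs error:", np.abs(W - dequantize(idx, cent)).mean())
```
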
MCP4EDA: LLM-Powered Model Context Protocol RTL-to-GDSII Automation with Backend Aware Synthesis Optimization (arXiv.org)
This paper presents MCP4EDA, the first Model Context Protocol server that enables Large Language Models (LLMs) to control and optimize the complete open-source RTL-to-GDSII design flow through natural language interaction. The system integrates Yosys synthesis, Icarus Verilog simulation, OpenLane place-and-route, GTKWave analysis, and KLayout visualization into a unified LLM-accessible interface, enabling designers to execute complex multi-tool EDA workflows conversationally via AI assistants such as Claude Desktop and Cursor IDE. The principal contribution is a backend-aware synthesis optimization methodology wherein LLMs analyze actual post-layout timing, power, and area metrics from OpenLane results to iteratively refine synthesis TCL scripts, establishing a closed-loop optimization system that bridges the traditional gap between synthesis estimates and physical implementation reality. In contrast to conventional flows that rely on wire-load models, this methodology leverages real backend performance data to guide synthesis parameter tuning, optimization sequence selection, and constraint refinement, with the LLM functioning as an intelligent design space exploration agent. Experimental evaluation on representative digital designs demonstrates 15-30% improvements in timing closure and 10-20% area reduction compared to default synthesis flows, establishing MCP4EDA as the first practical LLM-controlled end-to-end open-source EDA automation system. The code and demo are available at: http://www.agent4eda.com/

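The backend-aware refinement loop the abstract describes can be sketched generically: score each synthesis script on measured post-layout metrics and let the LLM propose the next revision. Everything below is a hypothetical skeleton, not MCP4EDA's actual tool interface; the callables stand in for Yosys/OpenLane/LLM tool calls, and the metric names are assumptions.

```python
# Hypothetical skeleton of a backend-aware synthesis refinement loop.
# The three callables stand in for the Yosys, OpenLane, and LLM tool calls that
# MCP4EDA exposes via MCP; the metric keys ("wns_ns", "area_um2") are assumptions.
from typing import Callable, Dict

def optimize_design(
    rtl_path: str,
    initial_tcl: str,
    run_synthesis: Callable[[str, str], str],            # (rtl, tcl) -> netlist path
    run_pnr: Callable[[str], Dict[str, float]],          # netlist -> post-layout metrics
    refine_tcl: Callable[[str, Dict[str, float]], str],  # (tcl, metrics) -> revised tcl
    iterations: int = 5,
) -> str:
    """Score each synthesis script on measured post-layout results, not wire-load
    estimates, and keep the best script seen so far."""
    tcl, best_tcl, best_score = initial_tcl, initial_tcl, float("inf")
    for _ in range(iterations):
        netlist = run_synthesis(rtl_path, tcl)
        metrics = run_pnr(netlist)
        # Lower is better: penalize negative slack and large area.
        score = -metrics["wns_ns"] + 1e-6 * metrics["area_um2"]
        if score < best_score:
            best_tcl, best_score = tcl, score
        tcl = refine_tcl(tcl, metrics)   # LLM proposes the next revision from real data
    return best_tcl
```
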
A3D-MoE: Acceleration of Large Language Models with Mixture of Experts via 3D Heterogeneous Integration (arXiv.org)
Conventional large language models (LLMs) are equipped with dozens of GB to TB of model parameters, making inference highly energy-intensive and costly as all the weights need to be loaded to onboard processing elements during computation. Recently, the Mixture-of-Experts (MoE) architecture has emerged as an efficient alternative, promising efficient inference with fewer activated weights per token. Nevertheless, fine-grained MoE-based LLMs face several challenges: 1) Variable workloads during runtime create arbitrary GEMV-GEMM ratios that reduce hardware utilization, 2) Traditional MoE-based scheduling for LLM serving cannot fuse attention operations with MoE operations, leading to increased latency and decreased hardware utilization, and 3) Despite being more efficient than conventional LLMs, loading experts from DRAM still consumes significant energy and requires substantial DRAM bandwidth. Addressing these challenges, we propose: 1) A3D-MoE, a 3D Heterogeneous Integration system that employs state-of-the-art vertical integration technology to significantly enhance memory bandwidth while reducing Network-on-Chip (NoC) overhead and energy consumption. 2) A 3D-Adaptive GEMV-GEMM-ratio systolic array with V-Cache efficient data reuse and a novel unified 3D dataflow to solve the problem of reduced hardware utilization caused by arbitrary GEMV-GEMM ratios from different workloads, 3) A Hardware resource-aware operation fusion scheduler that fuses attention operations with MoE operations to enhance hardware performance, and 4) MoE Score-Aware HBM access reduction with even-odd expert placement that reduces DRAM access and bandwidth requirements. Our evaluation results indicate that A3D-MoE delivers significant performance enhancements, reducing latency by a factor of 1.8x to 2x and energy consumption by 2x to 4x, while improving throughput by 1.44x to 1.8x compared to the state-of-the-art.

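For context on why expert loading dominates, here is a minimal top-k MoE routing sketch in plain NumPy. It shows the standard gating mechanism (only k expert FFNs run per token); it does not model A3D-MoE's fused scheduler, systolic array, or even-odd HBM placement.

```python
# Minimal top-k MoE routing sketch (standard gating, not A3D-MoE's scheduler):
# only k expert networks are evaluated per token, which is why fetching expert
# weights dominates memory traffic and motivates the paper's HBM placement scheme.
import numpy as np

def moe_forward(x, gate_w, experts, k=2):
    """x: (d,), gate_w: (num_experts, d), experts: list of callables mapping (d,) -> (d,)."""
    logits = gate_w @ x
    topk = np.argsort(logits)[-k:]                      # indices of the k highest-scoring experts
    weights = np.exp(logits[topk] - logits[topk].max())
    weights /= weights.sum()                            # softmax over the selected experts only
    return sum(w * experts[i](x) for w, i in zip(weights, topk))

if __name__ == "__main__":
    d, n_experts = 8, 4
    rng = np.random.default_rng(0)
    experts = [lambda x, W=rng.standard_normal((d, d)): W @ x for _ in range(n_experts)]
    gate_w = rng.standard_normal((n_experts, d))
    print(moe_forward(rng.standard_normal(d), gate_w, experts))
```
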
RailX: A Flexible, Scalable, and Low-Cost Network Architecture for Hyper-Scale LLM Training Systems (arXiv.org)
Increasingly large AI workloads are calling for hyper-scale infrastructure; however, traditional interconnection network architecture is neither scalable nor cost-effective enough. Tree-based topologies such as the Rail-optimized network are extremely expensive, while direct topologies such as Torus have insufficient bisection bandwidth and flexibility. In this paper, we propose RailX, a reconfigurable network architecture based on intra-node direct connectivity and inter-node circuit switching. Nodes and optical switches are physically 2D-organized, achieving better scalability than existing centralized circuit switching networks. We propose a novel interconnection method based on Hamiltonian Decomposition theory to organize separate rail-based rings into an all-to-all topology, simultaneously optimizing ring-collective and all-to-all communication. More than 100K chips with hyper bandwidth can be interconnected with a flat switching layer, and the diameter is only 2 to 4 inter-node hops. The network cost per injection/All-Reduce bandwidth of RailX is less than 10% of the Fat-Tree, and the cost per bisection/All-to-All bandwidth is less than 50% of the Fat-Tree. Specifically, only ~$1.3B is required to interconnect 200K chips with 1.8TB bandwidth. RailX can also be used in the ML-as-a-service (MLaaS) scenario, where single or multiple training workloads with various shapes, scales, and parallelism strategies can be flexibly mapped, and failures can be worked around.

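The Hamiltonian-decomposition idea has a concrete classical instance: the complete graph K_(2n+1) splits into n edge-disjoint Hamiltonian cycles (the Walecki construction). The sketch below generates such a decomposition and checks that the rings jointly cover every pair of nodes; it illustrates the underlying theory, not RailX's specific optical-switch configuration.

```python
# Classical (Walecki-style) decomposition of K_{2n+1} into n edge-disjoint
# Hamiltonian cycles -- the kind of result that lets separate rings jointly
# realize an all-to-all topology. Textbook construction, not RailX's algorithm.
def hamiltonian_decomposition(n: int):
    """Return n edge-disjoint Hamiltonian cycles of K_{2n+1}.

    Vertices 0..2n-1 sit on a ring plus a hub vertex labelled 2n.
    Each cycle is a vertex list; the last vertex implicitly connects back to the hub.
    """
    m, hub = 2 * n, 2 * n
    cycles = []
    for k in range(n):
        zigzag, step, sign = [k], 1, +1
        while len(zigzag) < m:                       # k, k+1, k-1, k+2, k-2, ...
            zigzag.append((k + sign * step) % m)
            if sign > 0:
                sign = -1
            else:
                sign, step = +1, step + 1
        cycles.append([hub] + zigzag)
    return cycles

def edges_of(cycle):
    return {frozenset(e) for e in zip(cycle, cycle[1:] + cycle[:1])}

if __name__ == "__main__":
    n = 3                                            # K_7: 3 rings cover all 21 node pairs
    cycles = hamiltonian_decomposition(n)
    all_edges = set().union(*(edges_of(c) for c in cycles))
    assert len(all_edges) == (2 * n + 1) * n         # edge-disjoint and complete
    print(cycles)
```
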
Neuromorphic Computing: A Theoretical Framework for Time, Space, and Energy Scaling (arXiv.org)
Neuromorphic computing (NMC) is increasingly viewed as a low-power alternative to conventional von Neumann architectures such as central processing units (CPUs) and graphics processing units (GPUs); however, the computational value proposition has been difficult to define precisely. Here, we explain how NMC should be seen as general-purpose and programmable even though it differs considerably from a conventional stored-program architecture. We show that the time and space scaling of NMC is equivalent to that of a conventional system with a theoretically infinite number of processors; however, the energy scaling is significantly different. Specifically, the energy of conventional systems scales with absolute algorithm work, whereas the energy of neuromorphic systems scales with the derivative of algorithm state. These characteristics suit NMC to different classes of algorithms than conventional multi-core systems like GPUs, which have been optimized for dense numerical applications such as linear algebra. Instead, NMC is ideally suited for scalable and sparse algorithms whose activity is proportional to an objective function, such as iterative optimization and large-scale sampling (e.g., Monte Carlo).

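The energy-scaling distinction can be made concrete with a toy model (my own sketch, not the paper's formal framework): charge a conventional processor for every variable touched per step, and an event-driven system only for variables that change.

```python
# Toy illustration of the energy-scaling claim: conventional energy grows with
# absolute work (every variable, every step), event-driven energy grows with the
# derivative of state (only variables that flip). Not the paper's formal model.
import numpy as np

def simulate(steps=1000, n_vars=10_000, change_prob=0.01, seed=0):
    rng = np.random.default_rng(seed)
    state = np.zeros(n_vars, dtype=np.int8)
    conventional_energy = 0   # proportional to absolute algorithm work
    neuromorphic_energy = 0   # proportional to state changes per step
    for _ in range(steps):
        flips = rng.random(n_vars) < change_prob
        state[flips] ^= 1
        conventional_energy += n_vars
        neuromorphic_energy += int(flips.sum())
    return conventional_energy, neuromorphic_energy

if __name__ == "__main__":
    conv, neuro = simulate()
    print(f"conventional ~ {conv:,} unit ops, event-driven ~ {neuro:,} unit ops")
```
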
Hardware-Compatible Single-Shot Feasible-Space Heuristics for Solving the Quadratic Assignment Problem (arXiv.org)
Research into the development of special-purpose computing architectures designed to solve quadratic unconstrained binary optimization (QUBO) problems has flourished in recent years. It has been demonstrated in the literature that such special-purpose solvers can outperform traditional CMOS architectures by orders of magnitude with respect to timing metrics on synthetic problems. However, they face challenges with constrained problems such as the quadratic assignment problem (QAP), where mapping to binary formulations such as QUBO introduces overhead and limits parallelism. In-memory computing (IMC) devices, such as memristor-based analog Ising machines, offer significant speedups and efficiency gains over traditional CPU-based solvers, particularly for solving combinatorial optimization problems. In this work, we present a novel local search heuristic designed for IMC hardware to tackle the QAP. Our approach enables massive parallelism, allowing full neighbourhoods to be computed simultaneously when making update decisions. We ensure binary solutions remain feasible by selecting local moves that lead to neighbouring feasible solutions, leveraging feasible-space search heuristics and the underlying structure of a given problem. Our approach is compatible with both digital computers and analog hardware. We demonstrate its effectiveness in CPU implementations by comparing it with state-of-the-art heuristics for solving the QAP.

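A CPU version of the feasible-space idea is easy to state: keep the solution a permutation, so every swap move lands on another feasible assignment. The sketch below is a plain greedy swap search for the QAP, not the paper's parallel IMC heuristic; it only shows the neighbourhood structure being exploited.

```python
# Plain-CPU sketch of a feasible-space local search for the QAP. Solutions are kept
# as permutations, so every swap move yields another feasible assignment; the paper's
# IMC hardware evaluates the whole swap neighbourhood in parallel, which is not modelled here.
import numpy as np

def qap_cost(F, D, p):
    """Cost sum_ij F[i, j] * D[p[i], p[j]] with facility i assigned to location p[i]."""
    return float((F * D[np.ix_(p, p)]).sum())

def local_search(F, D, seed=0, max_passes=1000):
    n = F.shape[0]
    p = np.random.default_rng(seed).permutation(n)
    best = qap_cost(F, D, p)
    for _ in range(max_passes):
        improved = False
        # Scan the swap neighbourhood, accepting any improving move (first-improvement).
        for i in range(n):
            for j in range(i + 1, n):
                q = p.copy()
                q[i], q[j] = q[j], q[i]
                c = qap_cost(F, D, q)
                if c < best:
                    p, best, improved = q, c, True
        if not improved:
            break      # local optimum within the feasible space
    return p, best

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 12
    F = rng.integers(0, 10, (n, n)); np.fill_diagonal(F, 0)
    D = rng.integers(0, 10, (n, n)); np.fill_diagonal(D, 0)
    print(local_search(F, D))
```
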
SPICEAssistant: LLM using SPICE Simulation Tools for Schematic Design of Switched-Mode Power Supplies (arXiv.org)
State-of-the-art large language models (LLMs) show high performance across a wide range of tasks in many domains of science. In the field of electronic design automation (EDA), it is yet to be determined to what extent they are capable of understanding, adapting, and dimensioning electronic circuits. This paper focuses on the application of LLMs to switched-mode power supply (SMPS) design on printed circuit boards (PCBs). Particular challenges for LLMs in this context include their limited ability to interpret results from key simulation tools like SPICE and the multi-step design process. To address these challenges, we suggest SPICEAssistant, a framework that provides a broad selection of tools to an LLM. The tools serve as an interface to SPICE, allowing the LLM to interact flexibly with the simulator to estimate the impact of its modifications to the circuit. To evaluate the performance of SPICEAssistant, we defined a benchmark consisting of 256 questions testing the ability to adapt circuit netlists to fulfil different SMPS design tasks. The benchmarking results show that simulation feedback effectively improves the SMPS design capabilities of LLMs. An increasing number of simulation iterations leads to enhanced performance. The SPICEAssistant framework significantly outperforms the standalone LLM GPT-4o on the benchmark by approximately 38%.

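The tool-interface pattern the abstract describes can be sketched as a single callable an LLM agent could invoke. This is not SPICEAssistant's actual tool set; it assumes an ngspice-style batch invocation (`ngspice -b`) is available on PATH and simply returns the simulator's raw output for the model to interpret.

```python
# Hedged sketch of the "simulator as an LLM tool" idea: wrap one SPICE run as a
# function the model can call after editing a netlist. Assumes ngspice in batch
# mode on PATH; SPICEAssistant's real tools and prompts are more elaborate.
import os
import subprocess
import tempfile

def run_spice(netlist_text: str, timeout_s: int = 60) -> str:
    """Write the netlist to a temp file, run it in batch mode, and return the raw
    simulator output (e.g. .measure results) for the LLM to interpret."""
    with tempfile.NamedTemporaryFile("w", suffix=".cir", delete=False) as f:
        f.write(netlist_text)
        path = f.name
    try:
        result = subprocess.run(
            ["ngspice", "-b", path],          # assumption: ngspice installed and on PATH
            capture_output=True, text=True, timeout=timeout_s,
        )
        return result.stdout + result.stderr
    finally:
        os.remove(path)
```
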
SLIM: A Heterogeneous Accelerator for Edge Inference of Sparse Large Language Model via Adaptive Thresholding (arXiv.org)
Large language models (LLMs) have demonstrated exceptional proficiency in understanding and generating human language, but efficient inference on resource-constrained embedded devices remains challenging due to large model sizes and memory-intensive operations in feedforward network (FFN) and multi-head attention (MHA) layers. While existing accelerators offload LLM inference to expensive heterogeneous computing systems, they fail to exploit the significant sparsity inherent in LLM operations, leaving hardware resources underutilized. We propose SLIM, an algorithm-hardware co-design optimized for sparse LLM serving on edge devices. SLIM exploits LLM sparsity through an adaptive thresholding algorithm that enables runtime-configurable sparsity with negligible accuracy loss, fetching only activated neurons to dramatically reduce data movement. Our heterogeneous hardware architecture strategically combines near-storage processing (NSP) and processing-in-memory (PIM): FFN weights are stored in high-density 3D NAND and computed using NSP units, while memory-intensive MHA operations are processed in PIM modules. This design significantly reduces memory footprint, data movement, and energy consumption. Our comprehensive evaluation demonstrates SLIM's effectiveness, achieving 13-18x throughput improvements over SSD-GPU systems and 9-10x better energy efficiency over DRAM-GPU systems while maintaining low latency, making cost-effective LLM deployment viable for edge computing environments.

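The software side of the thresholding idea is simple to illustrate: pick a runtime threshold, keep only hidden neurons whose activation exceeds it, and touch only the corresponding columns of the output projection. The sketch below is a minimal NumPy illustration, not SLIM's adaptive threshold selection or its NSP/PIM hardware mapping.

```python
# Minimal NumPy sketch of threshold-based activation sparsity in an FFN layer.
# The runtime-configurable threshold and "compute only active neurons" come from
# the abstract; SLIM's adaptive threshold algorithm and hardware are not modelled.
import numpy as np

def sparse_ffn(x, W_in, W_out, threshold):
    """x: (d,), W_in: (h, d), W_out: (d, h). Only hidden neurons above threshold are used."""
    act = np.maximum(W_in @ x, 0.0)                  # ReLU hidden activations
    active = np.nonzero(act > threshold)[0]
    # A dense baseline would use all h columns of W_out; here only the active ones are touched.
    # (A real accelerator would also predict 'active' before the first matmul to skip weight fetches.)
    return W_out[:, active] @ act[active], active.size

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, h = 64, 256
    W_in, W_out = rng.standard_normal((h, d)), rng.standard_normal((d, h))
    y, n_active = sparse_ffn(rng.standard_normal(d), W_in, W_out, threshold=1.0)
    print(f"computed {n_active}/{h} hidden neurons")
```
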