1. A Lisp compiler to RISC-V written in Lisp

Total comment counts : 12

Summary

The article describes a simple experimental Lisp compiler, written in uLisp, that compiles Lisp functions into RISC-V machine code. It runs on RISC-V cores such as those in the Raspberry Pi Pico 2 and builds on an earlier project that compiled Lisp to ARM. Because Lisp programs are already structured as lists, the compiler needs no separate tokenizing or parsing stage. It also works under Common Lisp, where it can display the generated code but not execute it.

To use the compiler, users must first load the RISC-V assembler; they can then compile a Lisp function with a single command. The compiler supports basic Lisp constructs (such as defining functions and performing arithmetic) and applies tail-call optimization, making tail-recursive functions as efficient as iterative ones.
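The effect of tail-call optimization can be sketched outside the compiler. The Python below is only an illustration, not the compiler’s output: `fact_tail` and `fact_loop` are hypothetical names, and Python itself does not perform this optimization. A tail call is the last thing a function does, so its stack frame can be reused and the recursion becomes a jump back to the top of the function, which is what the loop version spells out.

```python
def fact_tail(n, acc=1):
    """Tail-recursive factorial: the recursive call is the final action."""
    if n <= 1:
        return acc
    return fact_tail(n - 1, acc * n)


def fact_loop(n, acc=1):
    """Roughly what tail-call optimization turns the function above into:
    rebind the arguments and jump back to the start instead of calling."""
    while n > 1:
        n, acc = n - 1, acc * n
    return acc


print(fact_tail(10), fact_loop(10))  # prints: 3628800 3628800
```

The loop form uses constant stack space, so it keeps working at depths where the recursive Python version would hit the interpreter’s recursion limit.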

The article also describes how the compiler uses the RISC-V registers and how it manages function calls with a stack. It notes a difficulty with returning boolean values, which was resolved by tracking types so that return values are unambiguous. Simple example programs demonstrate the compiler’s functionality and its performance compared with interpreted Lisp.

Top 1 Comment Summary

SBCL is a Common Lisp compiler, developed in Common Lisp itself, that has the capability to target the RISC-V architecture.

Top 2 Comment Summary

The commenter observes that the compiler cannot yet compile itself, because it lacks essential features such as bitwise operations and functions like null, symbolp, eq, and atom. Although it does implement some Lisp features, such as car and cdr, the commenter questions whether it can fairly be called a Lisp compiler. Nonetheless, they appreciate Lisp’s flexibility, noting that it sidesteps syntax and parsing, which are significant challenges in compiler development.

2. Scuda – Virtual GPU over IP

Total comment counts : 15

Summary

The article discusses SCUDA, a GPU-over-IP bridge designed to connect GPUs on remote machines to CPU-only machines. It includes a demo featuring an NVIDIA GeForce RTX 4090 on a remote machine, with a local setup in which a Mac runs a Docker container to verify CUDA availability. Using SCUDA requires running a server; clients then connect to it to carry out GPU operations.

The primary goals of SCUDA include:

  1. Ease of GPU Interaction: Developers can access distributed GPU resources over a network.

  2. Centralized GPU Management: Facilitates resource allocation for containerized applications needing GPU support.

  3. Remote Model Training: Allows training from low-power devices using optimized remote GPUs.

  4. Remote Inferencing: Enables local application setups directing CUDA calls to remote GPU servers for processing large data batches.

  5. Remote Data Processing: Facilitates operations on large datasets by utilizing remote GPU capabilities while handling results over the network.

  6. Remote Fine-Tuning: Supports fine-tuning pre-trained models through routed PyTorch CUDA calls to remote GPUs.

Overall, SCUDA aims to minimize performance impact while providing extensive GPU capabilities for local developers without needing full VM deployment. It draws inspiration from existing proprietary solutions.

Top 1 Comment Summary

The commenter argues that this approach is inefficient for remote model training and inference, and that alternatives such as NCCL, DeepSpeed, and Triton Inference Server are more effective. They acknowledge that it could be useful for testing, but raise concerns about GPU contention and overall performance, even with high-speed interconnects. NCCL, they note, is optimized for direct data transfers between GPUs and network interfaces, which makes it faster than this kind of bridge. Triton Inference Server likewise uses techniques such as shared memory and NCCL to improve inference performance.

Top 2 Comment Summary

The commenter warns that using “CUDA” in the tool’s name may lead to trademark issues with NVIDIA, and proposes “Scuba” as an alternative name if problems arise.

3. Nurdle Patrol

Total comment counts : 7

Top 1 Comment Summary

The commenter points out that in 2023 only 221 of roughly 250 million shipped containers were lost at sea, a loss rate of about 0.000088%. They argue that plastic pellets, although they visibly pollute beaches, are less harmful than the hidden environmental impact of carbon emissions: a single lost container of pellets can affect many beaches, whereas the roughly 15 tonnes of CO2 emitted each year by an average American remain invisible.
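The quoted loss rate can be checked with a line of arithmetic. This is just a sanity check of the figures given in the comment; the variable names are illustrative.

```python
lost = 221
shipped = 250_000_000

# Fraction of containers lost, expressed as a percentage.
loss_rate_percent = lost / shipped * 100
print(f"{loss_rate_percent:.6f}%")  # prints: 0.000088%

# The same figure as "one container in N".
one_in = round(shipped / lost)
print(f"about 1 in {one_in:,}")
```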

Top 2 Comment Summary

The commenter mentions a UK initiative that has gained international traction in tackling nurdles, the small plastic pellets that are widespread in the environment, and points readers to the Nurdle Hunt website for more information on nurdle sightings.

4. Nobel Peace Prize for 2024 awarded to Nihon Hidankyo

Total comment counts : 38

Summary

The Norwegian Nobel Committee has awarded the 2024 Nobel Peace Prize to Nihon Hidankyo, a grassroots organization of atomic bomb survivors from Hiroshima and Nagasaki, for their efforts to promote a world free of nuclear weapons. The Hibakusha have worked to raise awareness about the devastating humanitarian consequences of nuclear warfare through personal testimonies and educational campaigns, contributing to the establishment of a global norm against the use of nuclear weapons, known as the “nuclear taboo.” This year marks nearly 80 years since nuclear weapons were last used in war, yet there are concerns as nuclear powers are modernizing their arsenals and new countries may seek nuclear capabilities. Nihon Hidankyo has continuously engaged with the international community, advocating for nuclear disarmament and reminding the world of the urgency of their cause. The Nobel Committee aims to honor the survivors’ commitment to peace and hope, ensuring that their legacy and message continue to inspire future generations.

Top 1 Comment Summary

The commenter describes a book about Dutch WWII survivors that includes an eyewitness account of the Hiroshima bomb from a Dutch POW. While working in a quarry near the city, he saw a plane drop a bomb on a parachute; he was then thrown back by the explosion and witnessed the city’s destruction. The commenter expresses surprise that such eyewitnesses existed, particularly from their own country.

Top 2 Comment Summary

The commenter considers the award timely, arguing that global attention to nuclear weapons has diminished. They stress that these weapons remain on high alert, ready to destroy major cities, and suggest that the end of atmospheric testing and past disarmament treaties may have created a false sense of security about the ongoing nuclear threat. The commenter calls for greater awareness and caution in dealing with nuclear powers.

5. Game Programming in Prolog

Total comment counts : 23

Summary

The article discusses the author’s journey in exploring unconventional programming paradigms, specifically LISP and Prolog. Initially familiar with object-oriented languages like C# and concepts like lambda expressions, the author found learning LISP through MIT’s “SICP” lecture series to be manageable, thanks to their background in electrical engineering. However, studying Prolog presented significant challenges due to its unique and esoteric syntax, which required a fundamentally different approach to programming, focusing on mathematical relations rather than traditional data structures.

Despite the initial difficulties, the author discovered intriguing links between Prolog’s concepts and engineering topics, like relational databases and digital circuits. This realization led to a belief that logic programming could be advantageous for developing complex systems, particularly in game mechanics. Although implementing an entire game solely in Prolog may be daunting and impractical for some, the author argues that using Prolog for core gameplay mechanics can lead to robust, modular systems that mitigate issues common in imperative programming. The key to success with this approach is designing systems strictly in terms of logical relations, avoiding traditional programming constructs.

Top 1 Comment Summary

The commenter notes that posts labeled “Part 1” often never receive a follow-up, but that this series does deliver: parts 2 through 5 are linked on the website, so it continues well beyond an introduction.

Top 2 Comment Summary

The commenter discusses the “Chemistry Engine” used in the game “Breath of the Wild,” which handles interactions between materials in an almost alchemical way, in contrast to a traditional physics engine, which deals with motion. This engine allows for surprising gameplay elements, such as igniting wooden arrows. The commenter notes that many game interactions are hard-coded because they are simple, but sees potential for a more flexible implementation using logic programming languages like Prolog or Datalog. They draw a parallel to SQL, which was initially seen as slow but came to be valued for its flexibility, and suggest that a similar approach could benefit game design.
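A rule-based take on such material interactions can be sketched in a few lines. The Python below is a rough illustration only, not Prolog and not how any actual game engine works: the facts, the relation names, and the single ignition rule are all hypothetical. It stores facts as tuples and repeatedly applies the rule "an object touching something that burns, and made of a flammable material, also burns" until nothing new can be derived.

```python
# Hypothetical facts as (relation, subject, value) triples.
FACTS = {
    ("made_of", "arrow", "wood"),
    ("made_of", "sword", "metal"),
    ("flammable", "wood", True),
    ("burning", "campfire", True),
    ("touching", "arrow", "campfire"),
    ("touching", "sword", "campfire"),
}


def derive(facts):
    """Apply the ignition rule until a fixed point is reached."""
    facts = set(facts)
    while True:
        new = set()
        for rel, obj, source in facts:
            # Rule body, part 1: obj touches something that is burning.
            if rel != "touching" or ("burning", source, True) not in facts:
                continue
            # Rule body, part 2: obj is made of a flammable material.
            for rel2, obj2, material in facts:
                if (rel2 == "made_of" and obj2 == obj
                        and ("flammable", material, True) in facts):
                    new.add(("burning", obj, True))
        if new <= facts:  # nothing new was derived
            return facts
        facts |= new


result = derive(FACTS)
print(("burning", "arrow", True) in result)  # prints: True
print(("burning", "sword", True) in result)  # prints: False
```

Adding a new material or interaction then means adding facts or a rule rather than editing special-case code, which is the flexibility the commenter has in mind.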

6. $2 H100s: How the GPU Rental Bubble Burst

Total comment counts : 25

Summary

The article features Eugene Cheah, cofounder of Featherless.AI, a service that provides access to a large collection of open-source models via a single API for a flat rate, and examines recent developments in the GPU rental market. There is excitement around NVIDIA’s new Blackwell series, which is in high demand and has been described as potentially historic for the industry. Meanwhile, the market for H100 GPUs has swung from shortage to oversupply.

Key points include:

  • H100 GPUs, once highly sought after and costing upwards of $8 per hour, have seen rental prices drop to around $2 per hour due to oversupply and changes in market conditions.
  • Many startups have heavily invested in GPU resources, leading to financial strains as they spend more than they earn, projecting losses into 2026.
  • The article warns against investing in new H100s at this time, suggesting it’s more prudent to rent them given the current market oversupply and reduced costs.

Overall, the article suggests a turbulent landscape for GPU economics, with potential long-term implications for businesses involved in AI and machine learning.

Top 1 Comment Summary

The commenter discusses the challenges of establishing a bare-metal MI300X service provider business. They argue that offering GPUs at extremely low prices, such as $2 H100s, is likely unsustainable and points either to losses on the hardware or to low-quality service. Running reliable GPU infrastructure requires significant investment, they emphasize, and major providers like AWS set a high standard. They conclude that anyone offering such low prices is either operating under dubious conditions or unable to turn a profit.

Top 2 Comment Summary

The commenter discusses the financial opportunities available to data centers equipped with H100 hardware. They assert that this infrastructure allowed operators to generate significant profits, especially at rates above $3.50 per hour, but that such a profitable situation was unlikely to last indefinitely in an efficient market. Even at $2.00 per hour, a profit is still possible if the data center benefits from low electricity costs and efficient infrastructure and labor.
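To put the two hourly rates in perspective, the sketch below converts them into gross rental revenue per GPU per year. It assumes the GPU is rented every hour of the year unless a lower `utilization` is passed in; that parameter and the function name are illustrative assumptions, and no hardware, power, or staffing costs are modeled.

```python
HOURS_PER_YEAR = 24 * 365  # 8760


def annual_revenue(rate_per_hour, utilization=1.0):
    """Gross rental revenue for one GPU over a year (costs not included)."""
    return rate_per_hour * HOURS_PER_YEAR * utilization


for rate in (3.50, 2.00):
    print(f"${rate:.2f}/h -> ${annual_revenue(rate):,.0f} per GPU-year")
# prints:
# $3.50/h -> $30,660 per GPU-year
# $2.00/h -> $17,520 per GPU-year
```

Whether the lower figure still leaves a profit depends on the purchase price and operating costs, which is exactly the point of disagreement between the two comments.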

7. Initial CUDA Performance Lessons

Total comment counts : 8

Summary

The author shares their late discovery of CUDA, realizing it’s essentially C++ with some added features. They highlight key lessons for writing efficient CUDA code, particularly the importance of “memory coalescing,” where threads access contiguous memory to improve performance. An example shows that using contiguous memory access can be significantly faster, achieving a 3x speedup in operations like dot products.
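Why contiguous access helps can be shown with a small model of how many memory segments one group of threads touches. The Python below is a simplified sketch, not CUDA and not the article’s benchmark: it assumes a 32-thread warp, 4-byte floats, and 128-byte memory transactions, and the function name is hypothetical. When neighbouring threads read neighbouring elements, the whole warp is served by one segment; with a large stride, every thread needs its own.

```python
WARP_SIZE = 32       # threads that issue a memory request together
ELEM_BYTES = 4       # one 32-bit float
SEGMENT_BYTES = 128  # assumed size of one memory transaction


def segments_touched(stride):
    """Count the distinct memory segments one warp touches when
    thread t reads element t * stride of a float array."""
    addresses = (t * stride * ELEM_BYTES for t in range(WARP_SIZE))
    return len({addr // SEGMENT_BYTES for addr in addresses})


print(segments_touched(1))   # prints: 1  (coalesced)
print(segments_touched(32))  # prints: 32 (one segment per thread)
```

Real hardware adds caches and other effects, so the measured speedup is smaller than the ratio of transactions, but the direction matches the article’s observation.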

The article discusses a graph depicting the theoretical performance of modern PCs, using a Ryzen 9950X and an RTX 4090, with the majority of performance derived from GPUs, especially with specialized hardware for tasks like machine learning and raytracing. The mention of tensor cores further illustrates how specialized chips maximize GPU performance, suggesting that straightforward CUDA programming may limit potential gains.

Memory management in CUDA is described as complex. The author covers registers and shared memory and how each influences performance. The register file can hold a significant amount of data, and the author explains the relationship between threads and registers, emphasizing the need to control register allocation to write efficient CUDA code. The author concludes that understanding and optimizing memory usage is crucial for leveraging GPU power effectively.

Top 1 Comment Summary

The commenter draws on their experience optimizing CUDA code for an LHC experiment trigger to clarify some technical details. NVIDIA GPUs have a maximum of 65,536 registers per Streaming Multiprocessor (SM), not per thread block. Each SM can handle at most 1,024 (or 2,048) threads and 48 KB (or 64 KB) of shared memory. A thread block can be designed to use all of these resources, but doing so can reduce occupancy and leave the GPU underutilized, particularly if the kernel is not compute-bound. Using one thread block per SM also limits scalability: on a GPU with 60 SMs, only 60 blocks can run in parallel, which may hold back larger problems that could benefit from more parallelism.
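The register figures quoted in the comment translate into a simple per-thread budget. The sketch below only divides the numbers given above; the function names are illustrative, and real GPUs impose additional limits (such as a per-thread register cap) that are not modeled here.

```python
REGISTERS_PER_SM = 65_536  # figure quoted in the comment


def registers_per_thread(resident_threads):
    """Register budget per thread if the SM is filled with this many threads."""
    return REGISTERS_PER_SM // resident_threads


def concurrent_blocks(num_sms, blocks_per_sm):
    """Thread blocks that can be resident on the whole GPU at once."""
    return num_sms * blocks_per_sm


print(registers_per_thread(1024))  # prints: 64
print(registers_per_thread(2048))  # prints: 32
print(concurrent_blocks(60, 1))    # prints: 60
```

A kernel that needs more registers per thread than this budget forces fewer threads to be resident, which is the occupancy trade-off the commenter describes.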

Top 2 Comment Summary

The commenter discusses the distinction between compute-bound and memory-bound problems. Many practical workloads, such as LLM (Large Language Model) inference on desktops, are primarily memory-bound, because fetching large amounts of model data for each generated token is the bottleneck. In such scenarios, specialized tensor cores perform no better than well-optimized compute shaders on general-purpose GPU cores. For the same reason, CPU code using AVX-512 is significantly slower than a GPU, because the GPU has far higher memory bandwidth. For example, the commenter’s desktop has dual-channel DDR5 memory with a bandwidth of 75 GB/s, while the VRAM in their discrete GPU can handle 670 GB/s.
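The memory-bound argument can be turned into a back-of-the-envelope bound. If every generated token requires reading all model weights once, the token rate cannot exceed bandwidth divided by model size. In the sketch below the two bandwidth figures come from the comment, while the 8 GB model size and the function name are purely hypothetical assumptions chosen for illustration.

```python
def max_tokens_per_second(bandwidth_gb_s, model_gb):
    """Upper bound on token rate when each token reads all weights once."""
    return bandwidth_gb_s / model_gb


MODEL_GB = 8.0  # hypothetical model size, not taken from the comment

cpu_bound = max_tokens_per_second(75, MODEL_GB)   # dual-channel DDR5
gpu_bound = max_tokens_per_second(670, MODEL_GB)  # discrete GPU VRAM

print(cpu_bound)                         # prints: 9.375
print(gpu_bound)                         # prints: 83.75
print(round(gpu_bound / cpu_bound, 1))   # prints: 8.9
```

The ratio between the two bounds does not depend on the assumed model size; it is simply the ratio of the two bandwidths.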

8. Bridge to Nowhere

Total comment counts : 21

Summary

In this article, Richard L. Apodaca reflects on his experience facing brain surgery after being diagnosed with a significant tumor in his right parietal lobe. Upon learning about the tumor, he was presented with two surgical options but felt there was no genuine choice since “Option 3” — the possibility of refusing surgery — was not discussed. A doctor’s assertion that he would die within three months if he chose to avoid surgery left him feeling cornered.

Apodaca later realized that he had assumed successful surgery would return him to his previous life, which turned out to be naive. He recognized the importance of being more informed, including asking about life expectancy with and without surgery, and the realities of post-surgery recovery, especially regarding a suspected glioblastoma diagnosis. Ultimately, he suggests that a more transparent dialogue about his prognosis and the implications of surgery would have allowed for a better-informed decision. He concludes that the recommended surgery might merely postpone the inevitable, likening it to a “bridge to nowhere” given the aggressive nature of glioblastoma, emphasizing the need for comprehensive patient education in such critical situations.

Top 1 Comment Summary

The author, who works at a medical device company focused on radiation treatment for brain tumors, reflects on the profound realities faced by patients using their systems. They emphasize the fragility of life and the importance of finding moments of beauty and joy amidst life’s inevitable end. The author, acknowledging their own health challenges, expresses gratitude for the advancements in medical care that enhance quality of life. They hope their work on brain treatment technology can similarly improve the lives of patients in need.

Top 2 Comment Summary

The commenter emphasizes the importance of having a trained chaplain, pastor, or advocate present during medical discussions. Healthcare providers may not be withholding information on purpose, but they are often pressed for time, which makes communication harder. A knowledgeable advocate can help by clarifying and repeating what doctors say, making it easier for patients and families to understand critical information. The commenter lists key questions patients should ask, covering medication side effects, treatment alternatives, prognosis, recovery time, and communication during procedures. They share experiences in which they facilitated communication between doctors and patients so that vital information was not overlooked, reinforcing the idea that an advocate can serve as a valuable link in healthcare situations.

9. ARIA: An Open Multimodal Native Mixture-of-Experts Model

Total comment counts : 6

Top 1 Comment Summary

The commenter notes that the model outperforms Pixtral-12B and Llama3.2-11B. They suggest that “ARIA” may not be a good name for search engine optimization (SEO), since the term already has associations with web applications.

Top 2 Comment Summary

The commenter asks about the memory requirements of a Mixture of Experts (MoE) model: are all experts loaded into memory at once, or only the one in use? In particular, they ask whether Mixtral-8x7B has the memory footprint of a 7B model or of a 56B model.
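The distinction behind this question can be sketched as parameter bookkeeping. In a typical MoE layout, some weights are shared by every token and the rest are split across experts, of which only a few are selected per token. The numbers below are illustrative assumptions, not official figures for Mixtral or ARIA, and the function names are hypothetical. Under the usual setup where any expert may be selected for the next token, all experts stay resident, so memory tracks the total count while per-token compute tracks the active count.

```python
def total_params(shared_b, per_expert_b, num_experts):
    """Parameters that must be stored (in billions)."""
    return shared_b + per_expert_b * num_experts


def active_params(shared_b, per_expert_b, experts_per_token):
    """Parameters actually used to process one token (in billions)."""
    return shared_b + per_expert_b * experts_per_token


# Illustrative assumptions: 8 experts, 2 selected per token.
SHARED_B, PER_EXPERT_B = 1.6, 5.6

print(round(total_params(SHARED_B, PER_EXPERT_B, 8), 1))   # prints: 46.4
print(round(active_params(SHARED_B, PER_EXPERT_B, 2), 1))  # prints: 12.8
```

Because the shared weights are counted once rather than once per expert, the total comes out below the naive 8 x 7 = 56 figure, while still being far above a single 7B model.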

10. “Begin disabling installed extensions still using Manifest V2 in Chrome stable”

Total comment counts : 40

Summary

The article discusses the ongoing phase-out of Manifest V2 for Chrome extensions. As part of this process, a warning banner is now displayed on the chrome://extensions page for users with Manifest V2 extensions, and such extensions have begun to be disabled in Chrome stable. The rollout will be gradual, with users directed to the Chrome Web Store for recommended Manifest V3 alternatives. For a limited time, users will still be able to re-enable their Manifest V2 extensions. Enterprises using the ExtensionManifestV2Availability policy are exempt from these changes until June 2025. Starting June 3rd, users on the Chrome Beta, Dev, and Canary channels with Manifest V2 extensions will see further warnings, and affected extensions will lose their Featured badge. The Chrome Web Store has stopped accepting new Manifest V2 extensions with public or unlisted visibility. Overall, users and developers are encouraged to move to Manifest V3 before Manifest V2 is fully deprecated.
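Whether an extension is affected comes down to the `manifest_version` field in its `manifest.json`. The Python helper below is a minimal sketch for checking that field; the function name and the sample manifests are hypothetical, and it looks only at this one key rather than validating the manifest as a whole.

```python
import json


def uses_manifest_v2(manifest_text):
    """Return True if the manifest.json contents declare Manifest V2."""
    manifest = json.loads(manifest_text)
    return manifest.get("manifest_version") == 2


# Hypothetical sample manifests for illustration.
sample_v2 = '{"name": "Old Extension", "version": "1.0", "manifest_version": 2}'
sample_v3 = '{"name": "New Extension", "version": "1.0", "manifest_version": 3}'

print(uses_manifest_v2(sample_v2))  # prints: True
print(uses_manifest_v2(sample_v3))  # prints: False
```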

Top 1 Comment Summary

The commenter notes that, as of March 2024, Firefox continues to support Manifest V2 (MV2) extensions and that Mozilla has no immediate plans to deprecate them. If that changes, Mozilla has promised developers at least 12 months’ notice to make the necessary adjustments.

Top 2 Comment Summary

The commenter points to an earlier thread reporting that Chrome Canary had discontinued support for uBlock Origin and other Manifest V2 extensions, which generated some discussion. They note that the topic has not received much attention before, so they are keeping the current thread active.