1. Let us serve you, but don’t bring us down
Total comment counts : 34
Summary
The Internet Archive experienced an hour-long outage due to tens of thousands of requests per second for public domain OCR files launching from 64 virtual hosts on Amazon’s AWS services. Though the service was restored by blocking those IP addresses, another set of 64 addresses soon commenced the same kind of activity, leading to another hour of outage. The organization has recommended that those seeking to use materials in bulk should begin slowly and contact them at info@archive.org while avoiding large scale activity.
Top 1 Comment Summary
The author of the article explains how their employer’s API endpoint, which infers the purchasability of specific domains, was getting scraped by malicious users causing traffic to spike from less than 1 QPS to over 40 QPS. When the QPS reached over 200, the author implemented the use of recaptcha tokens for each request in order to stop the attack. Although this caused more friction for legit users, it was successful in stopping the attack, which was difficult to stop because it worked across a large number of IPv4 and IPv6 requesters.
Top 2 Comment Summary
The article suggests that archive.org should receive government grant money to fund worthy preservation projects. It mentions that various countries and international organizations pour funds into cultural projects of questionable value and that instead, they should fund something truly valuable. The article proposes that the EU could fund an archive.org mirror if certain conditions were met. It acknowledges the concern that some organizations waste public money and suggests that attaching concrete conditions to grants, such as a mirror in a specific location, could mitigate this issue.
2. Health officials delayed report linking fluoride to brain harm
Total comment counts : 33
Summary
The release of a report from the National Toxicology Program (NTP) on the effect of fluoridated water on cognitive and neurodevelopmental impacts on human health was delayed by the Department of Health and Human Services after criticism of its scientific validity from dental organizations including the American Dental Association. The report found a link between cognitive harm and fluoride in water at approximately two times the current recommended water fluoridation level, though the NTP has said that the overall link is unclear and that more studies are needed. Advocates of fluoridation say it is a necessary public health measure, especially for low-income communities, while others link fluoride exposure to brain damage. Previous peer reviews failed to address concerns raised by the National Academies of Sciences, Engineering, and Medicine review committee.
Top 1 Comment Summary
The author questions the practice of adding fluoride to water since it’s difficult to regulate for different factors like age and weight. They suggest that the European practice of adding fluoride to salt may be a better solution since iodized salt is already popular and commonplace. The author also emphasizes the importance of comparing regulations between countries and questioning dosage recommendations.
Top 2 Comment Summary
The Centers for Disease Control and Prevention (CDC) recommends breastfeeding infants as breastmilk contains less fluoride than fluoridated drinking water, but also approves of using fluoridated water in infant formula. However, mixing formula with low-fluoride bottled water is suggested to reduce the risk of dental fluorosis. Infants who are fed formula made with fluoridated tap water can have three to four times higher fluoride exposure than adults according to Dr. Bruce Lanphear. He warns that before an infant’s teeth come in, there is no benefit to fluoride exposure. Dr. Lanphear believes that vulnerable groups, such as infants, are not receiving adequate attention from these agencies.
3. India ruling party’s IT cell used AI to show smile on arrested protesters’ faces
Total comment counts : 45
Summary
Indian wrestlers, including Sakshi Malik, Bajrang Punia, and Vinesh Phogat, were detained by Delhi Police on Sunday, May 28. The group had been seeking the arrest of Wrestling Federation of India President Brij Bhushan Sharan Singh over allegations of sexual harassment by seven women wrestlers. The wrestlers wanted to hold a women’s Maha Panchayat in front of the new Parliament building. A selfie of the detained wrestlers smiling in a police vehicle went viral, with several accusing protesters of staging a political stunt. The photo was determined to be manipulated using an AI app.
Top 1 Comment Summary
The article talks about a disturbing trend in photo editing, where images are altered with ease and without any cost or mental barrier. The author is concerned that this kind of editing can be used to manipulate narratives and mislead people. While it is true that the truth eventually comes out, the author worries that many people may never see the original photo and continue to believe the manipulated narrative.
Top 2 Comment Summary
The article questions the importance of being a verified account on Twitter, noting that it seems more like a subscription where users pay Twitter for the verification badge. It questions why some verified accounts are mentioned in articles while others are not, and suggests that the article discussing the verified accounts tweeting the same image is strange.
4. Donut: OCR-Free Document Understanding Transformer
Total comment counts : 12
Summary
The article announces the official implementation of Donut and SynthDoG, two tools for document understanding and synthetic document generation. Donut is an OCR-free end-to-end Transformer model that achieves state-of-the-art results on various visual document understanding tasks such as document classification and information extraction. SynthDoG is a synthetic document generator that helps with pre-training the Donut model. The article provides links to the academic paper, pre-trained backbones, and SynthDoG-generated datasets. The code, model weights, synthetic data, and generator are available for download and use.
Top 1 Comment Summary
The author tested an AI/ML tool to scan and classify personal documents but found the results disappointing compared to the promised scores. They suggest sticking to traditional scanning and classification methods without AI/ML for at least the next 5 years.
Top 2 Comment Summary
The article discusses the author’s experience in building IDP solutions and concludes that the model outperforms Graph Neural Network on OCR tokens. The performance of the model depends on the heterogeneity of the data and can reach human levels of accuracy on documents with around 200 documents, scoring by Levenshtein ratio. However, the traditional approach had a problem with the quality of the OCR which acted as a bottleneck for overall model performance. The article also notes that training on CPU was possible with GCN but not with Donut.
5. Production AI systems are hard
Total comment counts : 24
Summary
error
Top 1 Comment Summary
The article argues that AI in radiology companies should focus on pre-drafting almost 90% of diagnostic studies that can be automated. The author believes that the money is not in detecting life-threatening conditions like hemorrhage or pulmonary embolism but rather in the number of reads per day. The author gives an example of how AI can assist radiologists by auto-populating a radiologist report with an appropriate description of each organ system, where the radiologist’s job is to confirm what the AI says in each section and go into further detail as needed.
Top 2 Comment Summary
The article highlights the advantage of using AI and a global database for diagnosing rare and obscure medical conditions, as radiologists who have encountered such cases can remember and identify them easily in the future. The author also shares a personal experience where a family member almost died due to a condition that was not familiar to the doctors attending to them. The author believes that sharing in-depth diagnosis of rare cases through AI and a global database could be of great benefit.
6. Easy Effects: Audio effects for PipeWire applications
Total comment counts : 12
Summary
Easy Effects is an audio effects application for PipeWire, formerly known as PulseEffects, which provides a range of plugins such as limiter, compressor, convolver, equalizer and auto volume. It allows users to have full control over the effects order and is available on most up-to-date Linux distributions with an easyeffects package that can be installed with the distribution package manager. Donations to support further development are welcome, and a comprehensive set of help pages are included within the application itself, accessed via the hamburger menu in the top right.
Top 1 Comment Summary
The author uses EasyEffects, formerly known as PulseEffects, to compress and limit the dynamic range of the audio that comes out of their speakers. This enables them to understand speech without being deafened by loud effects. The lack of this capability on their phone is the reason they do not consume videos or streams on it. The author can only fix the audio on their desktop at present.
Top 2 Comment Summary
The author highly recommends Easy Effects due to its highly effective “Noise Reduction” filter based on Xiph RNNoise, which removes background noise from the microphone. They also appreciate the easy-to-use interface, which allows them to easily record raw audio and process it later. Additionally, the software offers compressors and pitch changers. The author has the software set to start up automatically and recommends others to donate to the developer.
7. 300ms Faster: Reducing Wikipedia’s total blocking time
Total comment counts : 22
Summary
The article discusses how performance flaws on a website can cause frustration for users and details steps taken to reduce execution time for specific tasks on Wikipedia’s mobile site. The focus was on reducing Total Blocking Time (TBT), which is the sum of the blocking portion of all long tasks on the browser’s main thread. JavaScript execution was frequently the culprit of TBT problems, and removing unnecessary code or optimizing specific sections of code can have a significant impact on a website’s overall performance. By making small, targeted optimizations, even seemingly minor changes can make a substantial difference in delivering a more responsive browsing experience.
Top 1 Comment Summary
Wikipedia seems to be much faster if you are not logged in. Pages load almost instantly, and it may even be faster than Hacker News. However, if you are logged in, there is a delay of about a second on each page view. The reason for this is most likely because there is a cache or fast-path for pre-rendered pages that cannot be used when you are logged in.
Top 2 Comment Summary
The author of the article discusses their decision between using Astro or Next.js for their new blog and compares the two approaches. They conducted a test on two blog posts with similar content to compare their performance. The author notes that despite predominantly using only RSCs, Next.js is loading the entire React bundle and suggests this is due to having one “client-side” component. They also request the addition of robots.txt
or rss.xml
to the blog for readers who still use RSS.
8. Connect() – a new API for creating TCP sockets from Cloudflare Workers
Total comment counts : 18
Summary
Cloudflare Workers has introduced a new API for creating outbound TCP sockets, making it possible to connect directly to any TCP-based service from Workers. This includes databases that use a direct TCP “socket” connection to send queries and receive data. With the new API, users can use any database or infrastructure they choose when building full-stack applications on Workers. The TCP Socket API is available to everyone, and database drivers like PostgreSQL are already using it. Cloudflare plans to support inbound TCP and UDP connections, as well as new emerging application protocols based on QUIC, so that users can build applications beyond HTTP with Socket Workers.
Top 1 Comment Summary
The article notes that Cloudflare Workers is slowly transforming into a comprehensive “edge” platform. While in the past it was only capable of simple cloud functions and key-value storage, it now offers support for WebSockets, S3 storage, and SQL databases, making it a more viable option to consider.
Top 2 Comment Summary
The author of the article is excited about Cloudflare’s announcement of support for external databases, which was one of the biggest pain points for developers building on Workers/Pages. This move towards production viability for a diverse range of products is a significant step and has placed Cloudflare back on the author’s list for evaluation. The author commends Cloudflare for their ability to understand their customers’ needs and willingness to compete with themselves.
9. WP20 and Audrey Scholars
Total comment counts : 32
Summary
The article celebrates the 20th anniversary of the first release of WordPress and reflects on the success and growth of the platform. The article mentions the community that WordPress has created and the mission to democratize publishing. It also mentions the launch of the Audrey Scholars program and the hope that one day an Audrey Scholar could lead WordPress. The author congratulates the community and thanks them for providing their favorite publishing platform.
Top 1 Comment Summary
The writer admires Matt Mullenweg, the founder of Wordpress, for his principled nature in building community while maintaining humility, even without meeting him in person.
Top 2 Comment Summary
The author’s sister-in-law uses WordPress for her small online art store, but has had issues with compatibility and poor support due to the need for numerous plugins. The author is looking for alternative store fronts that are budget-friendly.
10. Fitting 44% More Data on a C64/1541 Floppy Disk
Total comment counts : 10
Summary
The physical data format on a Commodore 1541 5¼-inch floppy disk used by the C64 is defined by software, and this article explores different strategies, each with pros and cons, to fit up to 246 KB. While later disks and drive mechanics supported 40 tracks, the firmware of all 48 tpi Commodore 5¼-inch drives stayed with the original 35 track format. Using extra tracks can increase disk capacity by 12% to 15%, while using the fastest speed zone for all tracks can allow for a total disk capacity of 215 KB when using 41 tracks. It is possible to fit 41 more blocks (10.25 KB, 20% extra) without exceeding the specs of 48 tpi disks through optimizing accounting structures and using more of the tail gaps for data. Minimizing the gap after the header by using 2 byte header gaps, no sector gap, and 16 bit SYNC marks can save 18 bytes per sector.
Top 1 Comment Summary
In the 1980s, to increase the amount of data that could be stored on a C64/1541 floppy disk, the author created a hole on the other side of the disk using a hole punch and flipped it over.
Top 2 Comment Summary
The article describes how a high school student used a drill press to modify cheap 720KB floppy disks to make them act as if they were the more expensive 1.44MB floppies with great success.