2025-11-28 Hacker News Top Articles and Their Summaries
1. 28M Hacker News comments as vector embedding search dataset
Total comment count: 21
Summary: The Hacker News dataset contains 28.74 million posts with precomputed 384-dimensional embeddings generated by the SentenceTransformers model all-MiniLM-L6-v2. The data is available as a single Parquet file in S3 and can be loaded into a ClickHouse table named hackernews with fields for text, vector, and metadata. A vector similarity index (HNSW, cosine distance) can be created and materialized to speed up queries....
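
To make the cosine-distance search in the summary above concrete, here is a minimal sketch in plain Python. It is an exact brute-force scan, not HNSW: an HNSW index approximates this same ranking without comparing the query against every row. The function names, the row ids, and the 4-dimensional toy vectors are all assumptions made for illustration; the real dataset uses 384-dimensional embeddings stored in ClickHouse.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, rows, k=2):
    """Exact top-k search: score every (id, vector) row, keep the k closest."""
    scored = [(cosine_distance(query, vec), row_id) for row_id, vec in rows]
    scored.sort()  # smallest distance first
    return [row_id for _, row_id in scored[:k]]


# Hypothetical toy rows standing in for (comment id, embedding) pairs.
rows = [
    ("a", [1.0, 0.0, 0.0, 0.0]),
    ("b", [0.9, 0.1, 0.0, 0.0]),
    ("c", [0.0, 1.0, 0.0, 0.0]),
    ("d", [-1.0, 0.0, 0.0, 0.0]),
]

print(nearest([1.0, 0.0, 0.0, 0.0], rows, k=2))  # ['a', 'b']
```

The scan costs one distance computation per row, which is why an approximate index becomes attractive at tens of millions of rows. The sketch does not guard against all-zero vectors, which would cause a division by zero.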