SAUCE: Truncated Sparse Document Signature Bit-Vectors for Fast Web-Scale Corpus Expansion
Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM ’21), 2021
In this paper, we use a bit-vector document representation for corpus expansion, which reduces the memory footprint by 24%, retrieves 6.8% more rare terms, and cuts query execution time by 78%.
Recommended citation: Muntasir Wahed, Daniel Gruhl, Alfredo Alba, Anna Lisa Gentile, Petar Ristoski, Chad Deluca, Steve Welch, and Ismini Lourentzou. 2021. SAUCE: Truncated Sparse Document Signature Bit-Vectors for Fast Web-Scale Corpus Expansion. In Proceedings of the 30th ACM International Conference on Information and Knowledge Management (CIKM ’21), November 1–5, 2021, Virtual Event, QLD, Australia. ACM, New York, NY, USA, 11 pages. https://doi.org/10.1145/3459637.3481950 https://arxiv.org/abs/2108.11948