Please use this identifier to cite or link to this item: http://dspace.azjhpc.org/xmlui/handle/123456789/67
Title: A SURVEY OF RETRIEVAL ALGORITHMS AND THEIR PARALLELIZATION IN LARGE-SCALE SYSTEMS
Authors: Suleymanzade, Suleyman
Keywords: TF-IDF;BM-25;Apache Spark;Information retrieval;HPC
Issue Date: Dec-2021
Publisher: Azerbaijan Journal of High Performance Computing
Abstract: This article presents a survey of two well-known document-ranking algorithms, TF-IDF and BM-25, run on a single CPU and as parallel processes in an HPC setting. An Amazon review dataset with more than two million reviews was used to measure the ranking parameters. During the experiment, the number of workers for parallel processing was set to one and to three. Four benchmarks evaluated preprocessing and reading time, vectorization time, TF-IDF transformation time, and overall time. The results show a significant difference in speed.
ISSN: 2616-6127; 2617-4383
DOI: https://doi.org/10.32010/26166127.2021.4.2.263.266
Journal Title: Azerbaijan Journal of High Performance Computing
Volume: 4
Issue: 2
First page number: 263
Last page number: 266
Number of pages: 4
Appears in Collections: Azerbaijan Journal of High Performance Computing

Files in This Item:
File: doi.org.10.32010.26166127.2021.4.2.263.266.pdf
Size: 387.38 kB
Format: Adobe PDF

