Please use this identifier to cite or link to this item:
http://dspace.azjhpc.org/xmlui/handle/123456789/67
Title: | A SURVEY OF RETRIEVAL ALGORITHMS AND THEIR PARALLELIZATION IN LARGE-SCALE SYSTEMS |
Authors: | Suleymanzade, Suleyman |
Keywords: | TF-IDF;BM-25;Apache Spark;Information retrieval;HPC |
Issue Date: | Dec-2021 |
Publisher: | Azerbaijan Journal of High Performance Computing |
Abstract: | This article presents a survey of two well-known document-ranking algorithms, TF-IDF and BM-25, run on a single CPU and as parallel processes on an HPC system. An Amazon review dataset with more than two million reviews was used to measure the ranking parameters. During the experiment, the number of workers for parallel processing was set to one and three. Four benchmarks were evaluated: preprocessing and reading time, vectorization time, TF-IDF transformation time, and overall time. The resulting metrics show a significant difference in speed. |
URI: | http://localhost:8080/xmlui/handle/123456789/67 |
ISSN: | 2616-6127 2617-4383 |
DOI: | https://doi.org/10.32010/26166127.2021.4.2.263.266 |
Journal Title: | Azerbaijan Journal of High Performance Computing |
Volume: | 4 |
Issue: | 2 |
First page number: | 263 |
Last page number: | 266 |
Number of pages: | 4 |
Appears in Collections: | Azerbaijan Journal of High Performance Computing |
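
For readers unfamiliar with the two ranking methods named in the abstract, the following is a minimal single-process sketch of TF-IDF and BM-25 scoring in plain Python. It is not the article's implementation: the function names (`tokenize`, `bm25_idf`, `rank`), the whitespace tokenizer, the specific TF-IDF and BM-25 weighting variants, and the parameter defaults `k1=1.5` and `b=0.75` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def bm25_idf(n_docs, doc_freq):
    # A commonly used smoothed BM-25 IDF variant that is always non-negative.
    return math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)


def rank(query, docs, k1=1.5, b=0.75):
    """Return a list of (doc_index, tfidf_score, bm25_score), one entry per document.

    Assumes `docs` is a non-empty list of strings.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = tokenize(query)

    results = []
    for i, toks in enumerate(tokenized):
        tf = Counter(toks)
        tfidf = 0.0
        bm25 = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            f = tf[term]
            # TF-IDF: relative term frequency times log inverse document frequency.
            tfidf += (f / len(toks)) * math.log(n_docs / df[term])
            # BM-25: saturating term frequency with document-length normalization.
            norm = f + k1 * (1.0 - b + b * len(toks) / avg_len)
            bm25 += bm25_idf(n_docs, df[term]) * f * (k1 + 1.0) / norm
        results.append((i, tfidf, bm25))
    return results


if __name__ == "__main__":
    reviews = ["great phone great battery", "bad battery", "nice case"]
    for idx, tfidf, bm25 in rank("great battery", reviews):
        print(idx, round(tfidf, 4), round(bm25, 4))
```

In a parallel setting such as the one the abstract describes, the document-frequency counts and the per-document scoring loop are the natural units to distribute across workers; this sketch keeps everything in one process for clarity.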
Files in This Item:
File | Description | Size | Format | |
---|---|---|---|---|
doi.org.10.32010.26166127.2021.4.2.263.266.pdf | | 387.38 kB | Adobe PDF | View/Open |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.