IMPROVING THE EFFECTIVENESS OF SEARCH RESULT PRESENTATION IN INFORMATION RETRIEVAL SYSTEMS USING DOCUMENT CLUSTERING

Amir Hamzah

doi:10.34151/technoscientia.v2i1.409

Authors

Amir Hamzah, Department of Informatics Engineering, IST AKPRIND Yogyakarta

Keywords:

search result clustering, retrieval effectiveness, F-measure

Abstract

The rapid growth in the volume of textual information has made information retrieval harder, particularly for models based on word matching. Synonymy causes relevant documents not to be retrieved, whereas polysemy causes non-relevant documents to be retrieved. Clustering the search results before they are presented to the user can increase retrieval effectiveness. This study examines how applying document clustering to the search results, before they are presented to the user, improves retrieval effectiveness. Three algorithms from the partitional approach (K-Means, Bisecting K-Means, and Buckshot) were chosen, together with the hierarchical agglomerative approach using two cluster-similarity functions (UPGMA and Complete Link). Performance was measured with the F-measure, a metric derived from the precision and recall of retrieval. The test collections consist of 1,000 news documents and 350 academic abstracts. The results show that presenting search results through clustering increases the number of relevant documents in the top-ranked positions, and the improvement over the page-rank method is statistically significant. The F-measure improved by about 14.34% for the news documents and 28.18% for the abstract documents.
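As a rough illustration of the pipeline the abstract describes (cluster the retrieved documents, present them cluster by cluster, then score the top of the list with the F-measure), the Python sketch below implements hierarchical agglomerative clustering with the two cluster-similarity functions named above: UPGMA (mean pairwise similarity) and Complete Link (minimum pairwise similarity), computed over the cosine similarity of sparse term vectors. It is a minimal sketch under stated assumptions, not the author's implementation: the term weights, the toy documents, the rule of ordering clusters by the cosine between the query and each cluster centroid, and the helper names (`cosine`, `hac`, `centroid`, `cluster_rerank`, `f_measure`) are all hypothetical. The partitional algorithms (K-Means, Bisecting K-Means, Buckshot) are not shown.

```python
import math
from typing import Dict, List, Sequence, Set

# Hypothetical representation: a sparse term -> weight map (e.g. tf-idf weights).
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hac(docs: List[Vector], k: int, linkage: str = "upgma") -> List[List[int]]:
    """Merge the two most similar clusters until only k clusters remain.

    linkage="upgma":    cluster similarity = mean of all pairwise similarities
    linkage="complete": cluster similarity = minimum pairwise similarity
    """
    sim = [[cosine(a, b) for b in docs] for a in docs]
    clusters = [[i] for i in range(len(docs))]  # start with singletons
    while len(clusters) > k:
        best, pair = -math.inf, (0, 1)
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [sim[i][j] for i in clusters[x] for j in clusters[y]]
                s = sum(pairs) / len(pairs) if linkage == "upgma" else min(pairs)
                if s > best:
                    best, pair = s, (x, y)
        x, y = pair
        clusters[x] = clusters[x] + clusters[y]  # y > x, so index x stays valid
        del clusters[y]
    return clusters


def centroid(docs: List[Vector], members: List[int]) -> Vector:
    """Mean vector of the documents in one cluster."""
    c: Vector = {}
    for i in members:
        for t, w in docs[i].items():
            c[t] = c.get(t, 0.0) + w / len(members)
    return c


def cluster_rerank(query: Vector, docs: List[Vector], k: int,
                   linkage: str = "upgma") -> List[int]:
    """Assumed presentation rule: clusters ordered by query-centroid cosine,
    documents inside a cluster kept in their original rank order."""
    clusters = hac(docs, k, linkage)
    clusters.sort(key=lambda c: cosine(query, centroid(docs, c)), reverse=True)
    return [i for c in clusters for i in sorted(c)]


def f_measure(retrieved: Sequence[int], relevant: Set[int]) -> float:
    """F = 2PR / (P + R) for the retrieved ids against the relevant set."""
    hits = sum(1 for d in retrieved if d in relevant)
    if hits == 0:
        return 0.0
    p, r = hits / len(retrieved), hits / len(relevant)
    return 2 * p * r / (p + r)


# Toy example (hypothetical data): documents listed in original rank order.
docs = [
    {"loan": 2.0, "bank": 1.0},
    {"river": 2.0, "bank": 1.0},
    {"loan": 2.0, "credit": 1.0},
    {"river": 2.0, "water": 1.0},
]
query = {"loan": 1.0}
relevant = {0, 2}

reranked = cluster_rerank(query, docs, k=2)
print(reranked)                           # [0, 2, 1, 3]
print(f_measure([0, 1], relevant))        # 0.5  (original top 2)
print(f_measure(reranked[:2], relevant))  # 1.0  (clustered top 2)
```

In this toy collection the query matches only the "loan" documents, so moving their cluster to the front lifts the F-measure of the top two results from 0.5 to 1.0. The naive merge loop costs roughly cubic time in the number of retrieved documents, which is acceptable for a few hundred search results but not for a whole collection.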
