SUFFIX TREE PHRASE EFFICIENCY WITH SINGLE PASS CLUSTERING FOR CLUSTERING INDONESIAN-LANGUAGE WEB DOCUMENTS

Authors

  • Desmin Tuwohingide, Teknik Informatika, Institut Teknologi Sepuluh Nopember
  • Mika Parwita, Teknik Informatika, Institut Teknologi Sepuluh Nopember
  • Agus Zainal Arifin, Teknik Informatika, Institut Teknologi Sepuluh Nopember

DOI:

https://doi.org/10.34151/technoscientia.v8i2.162

Keywords:

Document Clustering, Single-Pass Clustering, Suffix Tree

Abstract

The number of Indonesian documents available on the internet is growing very rapidly. Automatic document clustering has been shown to improve the retrieval of relevant documents from large result sets. The suffix tree is one document clustering method that has been developed further because it has been shown to increase precision. In this paper, we propose a new method for clustering Indonesian web documents. The method is based on phrase efficiency in the base cluster selection process: it combines document frequency and term frequency calculations on the phrases with the single pass clustering (SPC) algorithm. Every phrase that is considered a base cluster is converted to a vector, and its term frequency and document frequency are then calculated. Next, the similarity between documents is computed from their tf-idf weights using cosine similarity, and the documents are clustered with the single pass clustering algorithm. The proposed method is tested on 6 datasets containing 10, 20, 30, 40, 50, and 60 documents respectively. The experimental results show that the proposed method successfully clusters Indonesian web documents by eliminating leaf nodes that have no derivatives, and it produces an average F-measure of 0.78, while traditional suffix tree clustering (STC) produces an average F-measure of 0.55. This result shows that phrase efficiency, achieved by selecting phrases only from internal nodes and from leaf nodes that have derivatives, together with the combination of term frequency and document frequency calculations on the base clusters, has a significant impact on the document clustering process.
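The similarity and clustering steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses single tokens as features in place of the phrases selected from the suffix tree, and the function names, the running-mean centroid update, the similarity threshold of 0.2, and the toy documents are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn a list of token lists into sparse tf-idf vectors (dicts)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each term once per document
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)  # raw term frequency within this document
        vectors.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def single_pass_cluster(vectors, threshold=0.2):
    """Visit each document once; join the most similar cluster or start a new one.

    The threshold value is a hypothetical choice, not taken from the paper.
    """
    clusters = []  # each cluster: {"members": [doc indices], "centroid": sparse vector}
    for i, vec in enumerate(vectors):
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            k = len(best["members"])
            centroid = best["centroid"]
            # Assumed update rule: keep the centroid as the running mean of its members.
            for t in set(centroid) | set(vec):
                centroid[t] = (centroid.get(t, 0.0) * (k - 1) + vec.get(t, 0.0)) / k
        else:
            clusters.append({"members": [i], "centroid": dict(vec)})
    return [c["members"] for c in clusters]


# Toy, already-tokenized Indonesian documents (illustrative only).
docs = [
    ["harga", "saham", "naik"],
    ["harga", "saham", "turun"],
    ["tim", "sepak", "bola", "menang"],
    ["tim", "sepak", "bola", "kalah"],
]
clusters = single_pass_cluster(tfidf_vectors(docs), threshold=0.2)
print(clusters)  # [[0, 1], [2, 3]]
```

Because single pass clustering assigns each document as it arrives, the result can depend on document order and on the chosen threshold; both would need tuning on real data.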

Published

01-02-2016

How to Cite

Tuwohingide, D., Parwita, M., & Arifin, A. Z. (2016). EFISIENSI PHRASE SUFFIX TREE DENGAN SINGLE PASS CLUSTERING UNTUK PENGELOMPOKAN DOKUMEN WEB BERBAHASA INDONESIA. JURNAL TEKNOLOGI TECHNOSCIENTIA, 8(2), 133–140. https://doi.org/10.34151/technoscientia.v8i2.162