Applying a dynamic threshold to improve cluster detection of LSI

P.N. van der Spek, A.S. Klusener

Research output: Contribution to Journal › Article › Academic › peer-review

Abstract

Latent Semantic Indexing (LSI) is a standard approach for extracting and representing the meaning of words in a large set of documents. Recently, it has been shown that it is also useful for identifying concerns in source code. The tree cutting strategy plays an important role in obtaining the clusters, which identify the concerns. In this contribution the authors compare two tree cutting strategies: the Dynamic Hybrid cut and the commonly used fixed height threshold. Two case studies have been performed on the source code of Philips Healthcare to compare the results of both approaches. While some of the settings are particular to the Philips case, the results show that applying a dynamic threshold, implemented by the Dynamic Hybrid cut, is an improvement over the fixed height threshold in the detection of clusters representing relevant concerns. This makes the approach as a whole more usable in practice. © 2010 Elsevier B.V. All rights reserved.
Original language: English
Pages (from-to): 1261-1274
Journal: Science of Computer Programming
Volume: 76
Issue number: 12
Publication status: Published - 2011
