Mathias Goller, Michael Schrefl
In: Proceedings of the IASTED International Conference on Databases and Applications (DBA 2004) as part of the 22nd IASTED International Multi-Conference on Applied Informatics, Innsbruck, Austria, ACTA Press, 2-2004, ISBN: 0-88986-383-0
Clustering is a computationally intensive data mining task, especially in large databases. Previous work shows that clustering aggregated representations of the original data, instead of the data themselves, reduces the cost of computation. However, constructing these aggregated representations is itself time-consuming. This article shows that clustering with aggregated data should be done in two separate steps: a time-consuming preparation step and a clustering step that requires only a fraction of the time of the first.
The parameters of a specific clustering task are unknown at the time the data are aggregated. Hence, the aggregated data must be stored in a way that does not exclude any future parameter setting. It is even unknown whether a clustering will ever take place. Hence, the preparation step must require only few additional resources for anticipatory clustering to be profitable.
This article shows that a task-independent representation can be computed as a by-product of other regular tasks, such as the Extract-Transform-Load cycle in data warehouses, which are often used in combination with data mining.
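The two-step idea can be illustrated with a minimal sketch. The abstract does not specify the aggregated representation used in the paper, so everything below is an assumption made for illustration only: the summaries are additive statistics (count, linear sum, sum of squares) collected per grid cell, the function names `prepare` and `cluster` are hypothetical, and a plain weighted k-means stands in for whatever clustering algorithm is eventually chosen.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Summary:
    """Additive statistics of a group of points (assumed representation)."""
    n: int = 0
    linear_sum: List[float] = field(default_factory=list)
    square_sum: float = 0.0

    def add(self, p: Point) -> None:
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(p)
        self.n += 1
        for i, x in enumerate(p):
            self.linear_sum[i] += x
        self.square_sum += sum(x * x for x in p)

    def centroid(self) -> Point:
        return tuple(s / self.n for s in self.linear_sum)


def prepare(rows: Iterable[Point], cell_width: float = 1.0) -> Dict[tuple, Summary]:
    """Step 1 (expensive, task-independent): one pass over the rows,
    e.g. piggybacked on a load job that reads every row anyway."""
    cells: Dict[tuple, Summary] = defaultdict(Summary)
    for p in rows:
        key = tuple(int(x // cell_width) for x in p)  # hypothetical grid cell
        cells[key].add(p)
    return dict(cells)


def cluster(summaries: Sequence[Summary], k: int, iterations: int = 20) -> List[Point]:
    """Step 2 (cheap, task-specific): weighted k-means on the summaries only.
    The parameter k is chosen here, long after the data were aggregated."""
    items = [(s.centroid(), s.n) for s in summaries]
    if k > len(items):
        raise ValueError("k exceeds the number of summaries")
    centers = [c for c, _ in items[:k]]  # naive deterministic initialisation
    dim = len(centers[0])
    for _ in range(iterations):
        sums = [[0.0] * dim for _ in range(k)]
        weights = [0] * k
        for c, n in items:
            # Assign the summary to its nearest center (squared Euclidean distance).
            j = min(range(k),
                    key=lambda m: sum((a - b) ** 2 for a, b in zip(c, centers[m])))
            weights[j] += n
            for i, x in enumerate(c):
                sums[j][i] += n * x
        # Recompute each center as the weighted mean; keep empty centers unchanged.
        centers = [tuple(v / weights[j] for v in sums[j]) if weights[j] else centers[j]
                   for j in range(k)]
    return centers


if __name__ == "__main__":
    rows = [(0.0, 0.0), (0.5, 0.5), (10.0, 10.0), (10.5, 10.5)]
    cells = prepare(rows)                        # done once, ahead of time
    print(cluster(list(cells.values()), k=2))    # done later, per task
```

In this sketch the original rows are read only in `prepare`; `cluster` touches nothing but the stored summaries, so it can be rerun with different values of `k` at a cost that depends on the number of summaries rather than the number of rows.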
Note on the publication:
For copyright reasons, these pages contain only the abstracts of the published papers. If you are interested in a paper, you may receive a copy (PDF) by sending an e-mail to our office (email@example.com). Please include the paper number (Goll04a) and agree to use the paper for scientific purposes and private use only.