SCVP: Multi-stage Method for Online Vertical Data Partitioning Based on Spectral Clustering (In Chinese)


Vertical data partitioning stores the table attributes that satisfy certain semantic conditions in the same physical block, which reduces the cost of data access and improves the efficiency of query processing. A query usually involves only some of a table's attributes, so a subset of the attributes is enough to produce accurate query results. A reasonable vertical partitioning lets most queries be answered without scanning the whole table, reducing the amount of data accessed and speeding up query processing. Traditional vertical partitioning methods for databases are mainly based on heuristic rules set by experts. Their partitioning granularity is coarse, and they cannot tailor the partitioning to the characteristics of the workload. In addition, when the workload or the number of attributes grows large, the execution time of existing methods becomes too long, and in particular cannot meet the performance requirements of online, real-time database tuning. We therefore propose Spectral Clustering based Vertical Partitioning (SCVP), a method for the online environment. We adopt a phased solution to reduce the time complexity of the algorithm and speed up partitioning. First, SCVP reduces the solution space by adding constraints, that is, it generates initial partitions by spectral clustering. Second, SCVP uses an algorithm to search the reduced solution space, that is, the initial partitions are optimized by combining frequent itemset mining and greedy search. To further improve the performance of SCVP when the number of attributes is high, we propose Spectral Clustering based Vertical Partitioning Redesign (SCVP-R), an improved version of SCVP. SCVP-R optimizes the partition combiner component of SCVP by introducing a sympatric-competition mechanism, a double-elimination mechanism, and a loop mechanism. Experimental results on different datasets show that SCVP and SCVP-R run faster and perform better than the current state-of-the-art vertical partitioning method.
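
To make the second stage more concrete, here is a minimal sketch of a greedy merge over vertical partitions. It is not the paper's implementation: the workload, the attribute widths, the per-partition overhead, and the cost model (each query reads every byte of every partition it touches, plus a fixed overhead per partition) are all assumptions chosen for illustration. The spectral clustering and frequent itemset mining steps are omitted. The sketch only computes the attribute co-access affinities that a clustering step could consume, then starts the greedy search from one partition per attribute.

```python
from itertools import combinations

# Hypothetical workload: (attributes read by the query, query frequency).
WORKLOAD = [
    (frozenset("ab"), 10),
    (frozenset("abc"), 5),
    (frozenset("de"), 8),
    (frozenset("e"), 2),
]
# Hypothetical attribute widths in bytes and per-partition access overhead.
WIDTH = {"a": 4, "b": 4, "c": 8, "d": 4, "e": 16}
OVERHEAD = 10


def affinity(workload):
    """Count how often each attribute pair is read by the same query."""
    aff = {}
    for attrs, freq in workload:
        for pair in combinations(sorted(attrs), 2):
            aff[pair] = aff.get(pair, 0) + freq
    return aff


def cost(partitions, workload=WORKLOAD):
    """Toy cost: a query pays the full width of each partition it touches."""
    total = 0
    for attrs, freq in workload:
        for part in partitions:
            if part & attrs:
                total += freq * (sum(WIDTH[a] for a in part) + OVERHEAD)
    return total


def greedy_merge(partitions, workload=WORKLOAD):
    """Repeatedly apply the cheapest pairwise merge while the cost drops."""
    parts = [frozenset(p) for p in partitions]
    best = cost(parts, workload)
    while len(parts) > 1:
        candidates = []
        for p, q in combinations(parts, 2):
            merged = [r for r in parts if r not in (p, q)] + [p | q]
            candidates.append((cost(merged, workload), merged))
        new_cost, new_parts = min(candidates, key=lambda c: c[0])
        if new_cost >= best:
            break  # no merge improves the layout any further
        best, parts = new_cost, new_parts
    return parts, best


# Start from one partition per attribute (a stand-in for the initial partitions).
initial = [{a} for a in WIDTH]
final_parts, final_cost = greedy_merge(initial)
print(sorted(sorted(p) for p in final_parts), final_cost)
```

Under these toy numbers the search groups the attributes that are read together (a with b, d with e) and leaves c alone, because merging c into the a/b partition would make the frequent two-attribute query read bytes it does not need.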

In Journal of Software
Luming Sun
Senior R&D Engineer

My research interests include AI4Sys, AI4DB (especially Query Optimization).