Usage-based discovery of a fragmentation for a distributed tourism search database

Abstract

Fragmentation is an important aspect of the design of a distributed database (DDB). Many approaches exploit knowledge about the applications that use the database (DB) to find a suitable split of the DB. Finding a fragmentation when this knowledge is not at hand can be difficult. However, if the DB to be distributed already exists, statistics and usage logs can be captured.

In the context of a real-world example at a company in the leisure travel market, I construct two methods that exploit such usage logs to obtain a fragmentation for a DDB. I tailor the methods to the needs of this company, which means a read-only DB that can only be fragmented horizontally.

The first method exploits domain knowledge of the leisure travel market and uses the similarity of the DB’s filter options. The second method does not rely on domain knowledge; instead, it applies a graph partitioning algorithm to a graph built from the usage logs, which makes it more generally applicable. Finally, I evaluate and compare both approaches.
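To make the second idea more concrete, the following is a minimal sketch and not the thesis implementation. It assumes that each usage-log entry can be reduced to the set of rows a query returned, that the graph connects rows accessed by the same query, and that a fragmentation is judged by the total weight of edges it cuts. The names `build_coaccess_graph`, `cut_weight`, and the row IDs are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_coaccess_graph(query_log):
    """Build a weighted graph from a usage log (assumed construction).

    Each log entry is the collection of row IDs one query returned.
    The weight of edge (a, b) is the number of queries touching both rows.
    """
    edges = Counter()
    for rows in query_log:
        for a, b in combinations(sorted(set(rows)), 2):
            edges[(a, b)] += 1
    return edges


def cut_weight(edges, assignment):
    """Sum the weights of edges whose endpoints lie in different fragments.

    A lower value means fewer logged queries would span several fragments.
    """
    return sum(w for (a, b), w in edges.items()
               if assignment[a] != assignment[b])


# Hypothetical log: five queries over four rows.
query_log = [["h1", "h2"], ["h1", "h2"], ["h3", "h4"], ["h3", "h4"], ["h2", "h3"]]
edges = build_coaccess_graph(query_log)

# Two candidate horizontal fragmentations into two fragments.
grouped = {"h1": 0, "h2": 0, "h3": 1, "h4": 1}
scattered = {"h1": 0, "h2": 1, "h3": 0, "h4": 1}

print(cut_weight(edges, grouped))    # 1
print(cut_weight(edges, scattered))  # 5
```

A graph partitioning algorithm would search for an assignment with a low cut weight under a balance constraint on fragment sizes; the sketch only evaluates given candidates.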

My findings are that the first method outperforms the second one drastically. Based on the evaluation, it yields a solid fragmentation and should perform well in the specific case of the company. The second method, on the other hand, performs only marginally better than a random fragmentation. I suspect that this is caused by the graph partitioning algorithm I use, which is not optimized for the characteristics of the graph derived from the usage logs.

All in all, I find a suitable fragmentation for the specific DB. In addition, the work provides a good foundation for further research into graph algorithms suited to DB fragmentation based on graph partitioning.


Project information

Status:

Finished

Thesis for degree:

Bachelor

Student:

Chrisopher Gerdes

Supervisor:
Id:

2019-027