Utilizing Apache Hadoop in Clique Detection Methods
Keywords:
graph algorithms, clique detection, MapReduce architecture, Apache Hadoop, parallel systems

Abstract
There are many areas of information technology and mathematics where large graphs have to be processed, for example data mining on social networks, routing problems, etc. Many of these areas require us to explore the connections among nodes and find all the maximal cliques in a graph, i.e., all node sets whose members are pairwise connected. One possible and widely used clique detection method is the so-called Bron-Kerbosch algorithm. However, this technique alone might be too slow for large graphs; porting the method to a massively parallel system can therefore reduce the overall runtime. This paper introduces some possibilities and starting points for utilizing the open source Apache Hadoop framework, which can help in harnessing the resources of multiple computers. The so-called MapReduce architecture makes it possible to divide the large task into smaller chunks and eventually solve the problem faster than equivalent sequential methods.
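To make the abstract's reference to the Bron-Kerbosch algorithm concrete, the sketch below shows the basic (non-pivoting) recursion in Java, the language Hadoop jobs are typically written in. This is only an illustrative sketch, not the paper's implementation; the class name, the integer node identifiers, and the adjacency-map representation are assumptions made for this example.

```java
import java.util.*;
import java.util.function.Consumer;

/**
 * Illustrative sketch of the basic Bron-Kerbosch recursion (no pivoting).
 * Names and data representation are assumptions, not taken from the paper.
 */
public class BronKerboschSketch {

    // adjacency: node id -> set of neighbour ids (undirected graph assumed)
    private final Map<Integer, Set<Integer>> adjacency;

    public BronKerboschSketch(Map<Integer, Set<Integer>> adjacency) {
        this.adjacency = adjacency;
    }

    /** Reports every maximal clique of the graph to the given consumer. */
    public void maximalCliques(Consumer<Set<Integer>> report) {
        expand(new HashSet<>(), new HashSet<>(adjacency.keySet()), new HashSet<>(), report);
    }

    // R: current clique, P: candidates that can extend R, X: nodes already processed
    private void expand(Set<Integer> r, Set<Integer> p, Set<Integer> x,
                        Consumer<Set<Integer>> report) {
        if (p.isEmpty() && x.isEmpty()) {
            report.accept(new HashSet<>(r));   // R is a maximal clique
            return;
        }
        for (Integer v : new ArrayList<>(p)) {
            Set<Integer> neighbours = adjacency.getOrDefault(v, Collections.emptySet());

            Set<Integer> newR = new HashSet<>(r);
            newR.add(v);
            Set<Integer> newP = new HashSet<>(p);
            newP.retainAll(neighbours);        // P ∩ N(v)
            Set<Integer> newX = new HashSet<>(x);
            newX.retainAll(neighbours);        // X ∩ N(v)

            expand(newR, newP, newX, report);

            p.remove(v);                       // move v from P to X
            x.add(v);
        }
    }
}
```

In a MapReduce setting, one plausible division of labour, consistent with the abstract's divide-and-conquer description, is for mappers to emit subproblems (for example, a vertex together with its neighbourhood) and for reducers to run a recursion like the one above on each subgraph; the exact partitioning scheme used by the authors is described later in the paper, not in this sketch.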