Staff:

Prof. Dr. Henning Meyerhenke, Humboldt-Universität zu Berlin

 

Description:

Data-intensive computing is currently becoming a new column in science for generating new insights. This trend is due to---among others---gene sequencers, 3D microscopes, and the World Wide Web. These and other groundbreaking technologies generate massive data sets, many of which can be modeled as networks. The produced data sets contain valuable information hidden inside, waiting to be extracted and further processed with suitable analysis tools.

In this proposal we focus on network analysis with three combinatorial optimization tasks: Graph clustering, graph drawing, and network flow. Current tools for these tasks show serious limitations when the input is large or has a complex structure. Moreover, only few tools exploit the parallelism offered by current and emerging hardware. All three tasks have numerous applications; we concentrate on those in the biological sciences, where data sizes currently explode. Our selection is further motivated by the fact that these tasks profit considerably from the techniques we propose. Since most data sets contain inaccuracies, we advocate an inexact, yet faster solution process with approximation algorithms and heuristics. We will develop and implement new and significantly improved parallel solvers on networks that are not only massive, but can also be dense or dynamic. The targeted improvement over the state of the art concerns three main aspects: (i) The input size that can be handled in reasonable time on powerful servers, with a target improvement of at least an order of magnitude, and (ii) significantly improving the running time of the solvers (e.g. by parallelism), while (iii) maintaining or improving solution quality. The algorithmic techniques we propose are geared toward practicability. They are in part based on recent breakthrough results in algorithmics (such as graph sparsification) or adaptations of effective heuristics (such as the multilevel framework). Moreover, we regard networks not only from a combinatorial, but also from an algebraic point of view. This simplifies theuse of parallelism to speed up the solution process. We will integrate our new methods into our open-source network analysis software NetworKit. The tool is freely available to the public and to otherSPP projects, hence fostering immediate application of our contributions to real-world problems that require high-performance codes.