Data Structures And Algorithms
On the Convergence of Network Systems (1902.04121v2)
Evangelos Kipouridis, Kostas Tsichlas
2019-02-11
The apparent disconnection between the microscopic and the macroscopic is a major issue in the understanding of complex systems. To this end, we study the convergence of repeatedly applying local rules on a network, and touch on the expressive power of this model. We look at network systems and study their behavior when different types of local rules are applied to them. For a very general class of local rules, we prove convergence and provide a certain member of this class that, when applied to a graph, efficiently computes its k-core and its (k-1)-crust, giving hints on the expressive power of such a model. Furthermore, we provide guarantees on the speed of convergence for an important subclass of the aforementioned class. We also study more general rules and show that they do not converge. Our counterexamples also resolve an open question of (Zhang, Wang, Wang, Zhou, KDD 2009) concerning whether a certain process converges. Finally, we show the universality of our network system by providing a local rule under which it is Turing-complete.
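The k-core mentioned in this abstract has a classic iterative characterization: repeatedly delete vertices whose degree has dropped below k until none remain. A minimal Python sketch of this peeling process (this is the standard textbook procedure, not the authors' rule system; `adj` and `k_core` are illustrative names):

```python
def k_core(adj, k):
    """Return the k-core: the maximal subgraph in which every
    vertex has degree >= k.

    adj: dict mapping vertex -> set of neighbours (undirected).
    Works on a copy; repeatedly removes vertices of degree < k
    until a fixed point is reached.
    """
    core = {v: set(ns) for v, ns in adj.items()}
    changed = True
    while changed:
        changed = False
        for v in list(core):
            if len(core[v]) < k:
                for u in core[v]:          # detach v from its neighbours
                    core[u].discard(v)
                del core[v]
                changed = True
    return core
```

The order of deletions does not matter: the fixed point is unique, which is why such a rule converges regardless of scheduling.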
Lempel-Ziv-like Parsing in Small Space (1903.01909v1)
Daniel Valenzuela, Dmitry Kosolobov, Gonzalo Navarro, Simon J. Puglisi
2019-03-05
Lempel-Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely-used compressors for repetitive texts. However, the existing efficient methods computing the exact LZ parsing have to use linear or close to linear space to index the input text during the construction of the parsing, which is prohibitive for long inputs. An alternative is Relative Lempel-Ziv (RLZ), which indexes only a fixed reference sequence, whose size can be controlled. Deriving the reference sequence by sampling the text yields reasonable compression ratios for RLZ, but performance is not always competitive with that of LZ and depends heavily on the similarity of the reference to the text. In this paper we introduce ReLZ, a technique that uses RLZ as a preprocessor to approximate the LZ parsing using little memory. RLZ is first used to produce a sequence of phrases, and these are regarded as metasymbols that are input to LZ for a second-level parsing on a (most often) drastically shorter sequence. This parsing is finally translated into one on the original sequence. We analyze the new scheme and prove that, like LZ, it achieves the $k$th order empirical entropy compression $nH_k + o(n\log\sigma)$ with $k = o(\log_\sigma n)$, where $n$ is the input length and $\sigma$ is the alphabet size. In fact, we prove this entropy bound not only for ReLZ but for a wide class of LZ-like encodings. Then, we establish a lower bound on the ReLZ approximation ratio, showing that the number of phrases in it can be $\Omega(\log n)$ times larger than the number of phrases in LZ. Our experiments show that ReLZ is orders of magnitude faster than other alternatives to compute the (exact or approximate) LZ parsing, at the reasonable price of an approximation factor below $2.0$ in practice, and sometimes below $1.05$, to the size of LZ.
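For reference, the exact LZ parsing that ReLZ approximates is the greedy left-to-right factorization. A deliberately naive quadratic sketch of that factorization (real parsers use suffix-based indexes, which is exactly the space cost the paper works around; names are illustrative):

```python
def lz77_parse(s):
    """Greedy LZ77 parsing: returns a list of (offset, length,
    next_char) phrases, each the longest match into the already
    parsed prefix, followed by one fresh character.

    Naive O(n^2) scan, for illustration only.
    """
    i, phrases = 0, []
    while i < len(s):
        best_len, best_off = 0, 0
        for j in range(i):                         # candidate source position
            l = 0
            while i + l < len(s) - 1 and s[j + l] == s[i + l]:
                l += 1                             # self-overlap allowed (j+l may pass i)
            if l > best_len:
                best_len, best_off = l, i - j
        phrases.append((best_off, best_len, s[i + best_len]))
        i += best_len + 1
    return phrases
```

The number of phrases returned here is the quantity against which the paper measures the ReLZ approximation factor.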
An $\widetilde{O}(m/\epsilon^{3.5})$-Cost Algorithm for Semidefinite Programs with Diagonal Constraints (1903.01859v1)
Yin Tat Lee, Swati Padmanabhan
2019-03-05
We study semidefinite programs with diagonal constraints. This problem class appears in combinatorial optimization and has a wide range of engineering applications such as in circuit design, channel assignment in wireless networks, phase recovery, covariance matrix estimation, and low-order controller design. In this paper, we give an algorithm to solve this problem to $\epsilon$-accuracy, with a run time of $\widetilde{O}(m/\epsilon^{3.5})$, where $m$ is the number of non-zero entries in the cost matrix. We improve upon the previous best run time of $\widetilde{O}(m/\epsilon^{4.5})$ by Arora and Kale. As a corollary of our result, given an instance of the Max-Cut problem with $n$ vertices and $m$ edges, our algorithm when applied to the standard SDP relaxation of Max-Cut returns a $(1-\epsilon)\alpha_{GW}$ cut in time $\widetilde{O}(m/\epsilon^{3.5})$, where $\alpha_{GW} \approx 0.878$ is the Goemans-Williamson approximation ratio. We obtain this run time via the stochastic variance reduction framework applied to the Arora-Kale algorithm, by constructing a constant-accuracy estimator to maintain the primal iterates.
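The Goemans-Williamson ratio mentioned above comes from rounding the SDP solution with a random hyperplane. A hedged sketch of just that rounding step, assuming the unit vectors produced by an SDP solver are already available (function and argument names are illustrative):

```python
import random

def round_hyperplane(vectors, seed=0):
    """Goemans-Williamson rounding: given unit vectors v_i from the
    Max-Cut SDP solution, assign each vertex to a side of the cut
    according to the sign of its inner product with a random
    Gaussian direction.

    vectors: list of equal-length sequences of floats.
    Returns a list of +1/-1 side labels.
    """
    rng = random.Random(seed)
    d = len(vectors[0])
    r = [rng.gauss(0.0, 1.0) for _ in range(d)]   # random hyperplane normal
    return [1 if sum(vi * ri for vi, ri in zip(v, r)) >= 0 else -1
            for v in vectors]
```

In expectation over the random direction, the resulting cut has value at least $\alpha_{GW} \approx 0.878$ times the SDP optimum; this rounding step is unchanged here, and the paper's contribution is solving the SDP itself faster.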
Semi-dynamic shortest-path tree algorithms for directed graphs with arbitrary weights (1903.01756v1)
Sanjiang Li, Yongming Li
2019-03-05
Given a directed graph $G = (V, E)$ with arbitrary real-valued weights, the single source shortest-path problem (SSSP) asks for, given a source $s$ in $V$, finding a shortest path from $s$ to each vertex $v$ in $V$. A classical SSSP algorithm detects a negative cycle of $G$ or constructs a shortest-path tree (SPT) rooted at $s$ in $O(mn)$ time, where $m, n$ are the numbers of edges and vertices in $G$, respectively. In many practical applications, new constraints arrive from time to time and we need to update the SPT frequently. Given an SPT $T$ of $G$, suppose the weight on a certain edge is modified. We show by rigorous proof that the well-known {\sf Ball-String} algorithm for positively weighted graphs can be adapted to solve the dynamic SPT problem for directed graphs with arbitrary weights. Let $n_0$ be the number of vertices that are affected (i.e., vertices that have different distances from $s$ or different parents in the input and output SPTs) and $m_0$ the number of edges incident to an affected vertex. The adapted algorithms terminate in $O(m_0 + n_0 \log n_0)$ time, either detecting a negative cycle (only in the decremental case) or constructing a new SPT $T'$ for the updated graph. We show by an example that the output SPT $T'$ may have more edge changes relative to $T$ than necessary. To remedy this, we give a general method for transforming $T'$ into an SPT with minimal edge changes in time $O(n_0^2)$, provided that $G$ has no cycles with zero length.
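The classical $O(mn)$ baseline referred to above is Bellman-Ford, which either builds the SPT or certifies a negative cycle. A compact sketch of that static baseline (function and variable names are illustrative; the paper's contribution is avoiding this full recomputation on each update):

```python
def bellman_ford_spt(n, edges, s):
    """Classical O(mn) SSSP with arbitrary real weights.

    edges: list of (u, v, w) directed edges; vertices are 0..n-1.
    Returns (dist, parent) describing the shortest-path tree rooted
    at s, or None if a negative cycle is reachable from s.
    """
    INF = float("inf")
    dist = [INF] * n
    parent = [None] * n
    dist[s] = 0.0
    for _ in range(n - 1):                 # n-1 relaxation rounds suffice
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v], parent[v] = dist[u] + w, u
                changed = True
        if not changed:
            break
    for u, v, w in edges:                  # a further improvement is only
        if dist[u] + w < dist[v]:          # possible given a negative cycle
            return None
    return dist, parent
```

A dynamic algorithm instead confines its work to the affected vertices, which is how the $O(m_0 + n_0 \log n_0)$ update bound above beats rerunning this from scratch.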
A linear-time algorithm and analysis of graph Relative Hausdorff distance (1903.01682v1)
Sinan G. Aksoy, Kathleen E. Nowak, Stephen J. Young
2019-03-05
Graph similarity metrics serve far-ranging purposes across many domains in data science. As graph datasets grow in size, scientists need comparative tools that capture meaningful differences, yet are lightweight and scalable. Graph Relative Hausdorff (RH) distance is a promising, recently proposed measure for quantifying degree distribution similarity. In spite of recent interest in RH distance, little is known about its properties. Here, we conduct an algorithmic and analytic study of RH distance. In particular, we provide the first linear-time algorithm for computing RH distance, analyze examples of RH distance between families of graphs, and prove several analytic results concerning the range, density, and extremal behavior of RH distance values.
How Do Classifiers Induce Agents To Invest Effort Strategically? (1807.05307v3)
Jon Kleinberg, Manish Raghavan
2018-07-13
Algorithms are often used to produce decision-making rules that classify or evaluate individuals. When these individuals have incentives to be classified a certain way, they may behave strategically to influence their outcomes. We develop a model for how strategic agents can invest effort in order to change the outcomes they receive, and we give a tight characterization of when such agents can be incentivized to invest specified forms of effort into improving their outcomes as opposed to "gaming" the classifier. We show that whenever any "reasonable" mechanism can do so, a simple linear mechanism suffices.
Capacity-Achieving Private Information Retrieval Codes with Optimal Message Size and Upload Cost (1808.07536v2)
Chao Tian, Hua Sun, Jun Chen
2018-08-22
We propose a new capacity-achieving code for the private information retrieval (PIR) problem, and show that it has the minimum message size (being one less than the number of servers) and the minimum upload cost (being roughly linear in the number of messages) among a general class of capacity-achieving codes, and in particular, among all capacity-achieving linear codes. Different from existing code constructions, the proposed code is asymmetric, and this asymmetry appears to be the key factor leading to the optimal message size and the optimal upload cost. The converse results on the message size and the upload cost are obtained by a strategic analysis of the information theoretic proof of the PIR capacity, from which a set of critical properties of any capacity-achieving code in the code class of interest is extracted. The symmetry structure of the PIR problem is then analyzed, which allows us to construct symmetric codes from asymmetric ones, yielding a meaningful bridge between the proposed code and existing ones in the literature.
Space-efficient merging of succinct de Bruijn graphs (1902.02889v2)
Lavinia Egidi, Felipe A. Louza, Giovanni Manzini
2019-02-07
We propose a new algorithm for merging succinct representations of de Bruijn graphs introduced in [Bowe et al., WABI 2012]. Our algorithm is based on the lightweight BWT merging approach of Holt and McMillan [Bioinformatics 2014, ACM-BCB 2014]. It has the same asymptotic cost as the state-of-the-art tool for the same problem, presented by Muggli et al. [bioRxiv 2017, 2019], but it is more space-efficient. A novel feature of our algorithm, not found in the existing tools, is that it can compute the Variable Order succinct representation of the union graph starting from the plain representations of the input graphs.
Faster Biclique Mining in Near-Bipartite Graphs (1903.01538v1)
Blair D. Sullivan, Andrew van der Poel, Trey Woodlief
2019-03-04
Identifying dense bipartite subgraphs is a common graph data mining task. Many applications focus on the enumeration of all maximal bicliques (MBs), though sometimes the stricter variant of maximal induced bicliques (MIBs) is of interest. Recent work of Kloster et al. introduced an MIB-enumeration approach designed for "near-bipartite" graphs, where the runtime is parameterized by the size k of an odd cycle transversal (OCT), a vertex set whose deletion results in a bipartite graph. Their algorithm was shown to outperform the previous best known algorithm even when k was logarithmic in |V|. In this paper, we introduce two new algorithms optimized for near-bipartite graphs: one which enumerates MIBs in time O(M_I |V||E| k), and another, based on the approach of Alexe et al., which enumerates MBs in time O(M_B |V||E| k), where M_I and M_B denote the number of MIBs and MBs in the graph, respectively. We implement all of our algorithms in open-source C++ code and experimentally verify that the OCT-based approaches are faster in practice than the previously existing algorithms on graphs with a wide variety of sizes, densities, and OCT decompositions.
Lightweight merging of compressed indices based on BWT variants (1903.01465v1)
Lavinia Egidi, Giovanni Manzini
2019-03-04
In this paper we propose a flexible and lightweight technique for merging compressed indices based on variants of the Burrows-Wheeler transform (BWT), thus addressing the need for algorithms that compute compressed indices over large collections using a limited amount of working memory. Merge procedures make it possible to use an incremental strategy for building large indices, based on merging indices for progressively larger subcollections. Starting with a known lightweight algorithm for merging BWTs [Holt and McMillan, Bioinformatics 2014], we show how to modify it in order to merge, or compute from scratch, the Longest Common Prefix (LCP) array as well. We then expand our technique to merge compressed tries and circular/permuterm compressed indices, two compressed data structures for which there were hitherto no known merging algorithms.
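Both BWT-based abstracts above operate on the Burrows-Wheeler transform itself. As a point of reference, here is the textbook toy construction of the BWT via sorted rotations; it takes quadratic space on large inputs, which is precisely what the incremental merging strategy above avoids (the terminator convention is an assumption of this sketch):

```python
def bwt(s):
    """Burrows-Wheeler transform via sorted rotations.

    O(n^2 log n) toy construction for illustration; assumes '$'
    does not occur in s and appends it as a unique,
    lexicographically smallest terminator.
    """
    s += "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)    # last column of the matrix
```

The transform groups symbols by context, which is what makes BWT-based indices compressible; merging two such transforms without rebuilding them is the problem the papers address.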