LinBox: A Generic Library for Exact Linear Algebra

Authors

  • J.-G. DUMAS
  • T. GAUTIER
  • M. GIESBRECHT
  • P. GIORGI
  • B. HOVINEN
  • E. KALTOFEN
  • G. VILLARD

[Figure 1: Black box design — base-class pointers and virtual functions connect a concrete field to the LinBox field archetype.]

The LinBox black box matrix archetype is simpler than the field archetype because the design constraints are less stringent. As with the field type, we need a common object interface to describe how algorithms are to access black box matrices, but it only requires functions to access the matrix's dimensions and to apply the matrix or its transpose to a vector. Thus our black box matrix archetype is simply an abstract class, and all actual black box matrices are subclasses of the archetype class. We note that the overhead involved with this inheritance mechanism is negligible in comparison with the execution time of the methods, unlike for our field element types. The black box matrix archetype is template-parameterized by the vector type upon which it acts, but not by the field in which the arithmetic is done, as we saw no necessity for this. The field of entries is bound directly to the black box matrix class and is available as an argument to our black box algorithms, which may perform additional coefficient field operations. Optionally, black box matrix classes can have the field type as a template parameter. In addition, variants of the apply method are provided through which one can pass additional information, including the field over which to operate.

(linbox: submitted to World Scientific on May 20, 2002)

LinBox currently has three types of vectors: dense, sparse sequence, and sparse map. Dense vectors store every entry of the vector and are generally implemented as std::vector<Element>. Sparse vectors store only the nonzero entries. Sparse sequence vectors have the archetype std::list<std::pair<size_t, Element> >, and sparse map vectors have the archetype std::map<size_t, Element>. The C++ operator[] is disallowed in the latter to avoid fill-in with zero values. By its data structure, a map entry access takes logarithmic time. We do not parameterize our algorithms with a black box type.
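To make this concrete, here is a minimal sketch (with hypothetical class and member names, not LinBox's actual declarations) of an abstract black box archetype parameterized by its vector type, together with a sparse subclass whose rows are the std::map containers described above:

```cpp
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical sketch of a black box matrix archetype: an abstract class,
// template-parameterized by the vector type it acts on, exposing only the
// dimensions and the (transpose) matrix-vector products.
template <class Vector>
class BlackboxArchetype {
public:
    virtual ~BlackboxArchetype() {}
    virtual Vector &apply(Vector &y, const Vector &x) const = 0;
    virtual Vector &applyTranspose(Vector &y, const Vector &x) const = 0;
    virtual std::size_t rowdim() const = 0;
    virtual std::size_t coldim() const = 0;
};

// A sparse black box storing one std::map per row (nonzero entries only);
// plain machine integers stand in for field elements to keep the sketch short.
class SparseBlackbox : public BlackboxArchetype<std::vector<long>> {
public:
    explicit SparseBlackbox(std::size_t n) : n_(n), rows_(n) {}
    void setEntry(std::size_t i, std::size_t j, long v) { rows_[i][j] = v; }

    std::vector<long> &apply(std::vector<long> &y, const std::vector<long> &x) const override {
        y.assign(n_, 0);
        for (std::size_t i = 0; i < n_; ++i)
            for (const auto &e : rows_[i]) y[i] += e.second * x[e.first];
        return y;
    }
    std::vector<long> &applyTranspose(std::vector<long> &y, const std::vector<long> &x) const override {
        y.assign(n_, 0);
        for (std::size_t i = 0; i < n_; ++i)
            for (const auto &e : rows_[i]) y[e.first] += e.second * x[i];
        return y;
    }
    std::size_t rowdim() const override { return n_; }
    std::size_t coldim() const override { return n_; }

private:
    std::size_t n_;
    std::vector<std::map<std::size_t, long>> rows_;
};
```

An algorithm written against BlackboxArchetype<std::vector<long>> then accepts any such subclass through a base-class reference, paying one virtual call per matrix-vector product, which is negligible next to the product itself.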
We use the black box archetype directly in the algorithms; the caller provides a specific black box subclass. For convenience, some methods have default implementations in the archetype. For example, apply and applyTranspose each have three variants, which handle allocation of the input and output vectors differently. Only one variant is necessary:

  Vector &apply (Vector &y, const Vector &x) const;
  Vector &applyTranspose (Vector &y, const Vector &x) const;

The archetype base class provides default implementations of the other variants, but a derived class can override them. Overall, our design is carefully constructed to be generic with respect to the fields and black box matrices, and yet to provide high performance. A simple experiment illustrates the successful avoidance of performance degradation. In this test we time an accumulation performed by a sequence of additions and multiplications, with the arithmetic done in each of three different setups. The first uses NTL::zz_p arithmetic directly; the second uses our wrapper of the NTL::zz_p class, so that the arithmetic goes through LinBox's generic field interface; and the third uses the LinBox field archetype. This third interface involves an extra pointer dereference for each field operation, but allows separate compilation of field and algorithm. The results are 8.52 seconds for NTL::zz_p directly, 8.32 seconds for the LinBox wrapper of NTL::zz_p, and 11.57 seconds for the LinBox archetype over the LinBox wrapper of NTL::zz_p. Thus there is no performance cost for the generic field interface, because the function signatures are resolved at compile time, but there is a cost for supporting separate compilation (and avoiding code bloat). Note, however, that this overhead is relatively smaller for more expensive arithmetic, such as using the archetype over NTL::ZZ_p on integers larger than word size.
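The shape of that benchmark can be sketched as follows; the field class and its method names (mul, addin) are illustrative stand-ins for the wrapper interface, not LinBox's exact API. Because the accumulation is a template over the field type, every call resolves and inlines at compile time:

```cpp
#include <cstddef>
#include <cstdint>

// Illustrative stand-in for a wrapped Z/p field: arithmetic operations are
// methods of a field object acting on an associated element type.
class Modular {
public:
    typedef std::uint64_t Element;
    explicit Modular(Element p) : p_(p) {}
    Element &mul(Element &r, Element a, Element b) const { r = a * b % p_; return r; }
    Element &addin(Element &r, Element a) const { r = (r + a) % p_; return r; }

private:
    Element p_;
};

// The benchmark's accumulation loop, written once against any field type.
// With a concrete field as template parameter the calls inline away, which
// is why the wrapped arithmetic matches the direct arithmetic in speed.
template <class Field>
typename Field::Element accumulate(const Field &F,
                                   const typename Field::Element *a,
                                   const typename Field::Element *b,
                                   std::size_t n) {
    typename Field::Element acc = 0, t = 0;
    for (std::size_t i = 0; i < n; ++i)
        F.addin(acc, F.mul(t, a[i], b[i]));
    return acc;
}
```

Instantiating the same template with the field archetype instead routes each operation through a pointer indirection, which accounts for the measured gap between 8.32 and 11.57 seconds.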
3 Black box algorithms

Wiedemann's paper has led to many developments in the application of Krylov and Lanczos methods to fast exact problem solving. Here we present the main directions followed in LinBox regarding these methods. Linear algebra over finite fields is the core of the library; computations over the integers or rationals build upon this core.

Randomized algorithms, heuristics and checking. In black box linear algebra, the fastest known algorithms generally use random bits. Our library includes Monte Carlo algorithms (which produce correct results with controllably high probability), Las Vegas algorithms (which always produce correct results and are fast with high probability), and deterministic algorithms. Properties of these algorithms are proven under conditions on the source of random bits, e.g., that we are able to choose random elements uniformly from a sufficiently large subset of the ground field. This condition may be prohibitively costly in practice; e.g., if the field is small, the use of an algebraic extension may be required. This has led us to a new implementation strategy: we relax the theoretical conditions on our algorithms but introduce subsequent checks for correctness. We exploit randomized algorithms as heuristics even when the provable conditions for success are not satisfied. To complement this, specific checks are developed to certify the output. These checks may themselves be randomized (see below), in which case we certify only the probability of success. This strategy has been powerful in obtaining solutions for huge determinant and Smith form problems in Section 4.

Minimal polynomial and linear system solution over finite fields. For a matrix $A \in \mathbb{F}^{n \times n}$ over a field $\mathbb{F}$, Lanczos and Krylov subspace methods essentially compute the minimal polynomial of a vector with respect to A.
These methods access A only via matrix-vector products $v = Au$ and $w = A^T u$, i.e., they treat A as a black box function. Thus the library will only employ a black box for A, which can exploit the structure or sparsity of A. The minimal polynomial f of a vector u is computed as a linear dependency between the iterates $u, Au, A^2u, \ldots$. In the Krylov/Wiedemann approach, the dependency is found by applying the Berlekamp-Massey algorithm to the sequence $v^T u, v^T Au, v^T A^2 u, \ldots \in \mathbb{F}$, for a random vector v. This identifies the minimum generating polynomial $f^{u,v}(x) = x^d - f_{d-1}x^{d-1} - \cdots - f_1 x - f_0$, i.e., $v^T A^{d+i} u = f_{d-1}\, v^T A^{d+i-1} u + \cdots + f_1\, v^T A^{i+1} u + f_0\, v^T A^i u$ for all $i \ge 0$. Then $f^{u,v}(x)$ always divides f, and they are equal with high probability. Berlekamp-Massey computes $f^{u,v}(x)$ after processing the first 2d elements of the sequence.

LinBox::minpoly: With high probability, the minimal polynomial of the sequence $\{v^T A^i u\}_{0 \le i \le 2n-1}$ is the minimal polynomial f of A, for randomly chosen vectors u, v. This algorithm is randomized of the Monte Carlo type. As first observed by Lobo, the cost can be reduced by early termination: as soon as the linear generator computed by the Berlekamp-Massey process remains the same for a few steps, it is likely the minimal polynomial. This argument is heuristic in general but provable for the Lanczos algorithm on preconditioned matrices over suitably large fields [Eberly, private communication (2000)]. A Monte Carlo check of the early termination is implemented by applying the computed polynomial to a random vector. Dominant costs in these algorithms are given in Table 1; terms between brackets give memory requirements (in field elements). Early termination and randomized Monte Carlo algorithms correspond to bi-orthogonal Lanczos algorithms with or without lookahead. In both approaches, the number of matrix-vector products may be cut in half if the matrix is symmetric.
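As a sketch of the sequence-processing step, the following is a textbook Berlekamp-Massey over Z/p (not LinBox's implementation) that computes the connection polynomial of a scalar sequence such as the projected Krylov sequence $v^T A^i u$:

```cpp
#include <cstddef>
#include <vector>

typedef long long ll;

// Modular exponentiation, used for inverses modulo a prime p.
static ll powmod(ll a, ll e, ll p) {
    ll r = 1 % p;
    a %= p;
    while (e > 0) {
        if (e & 1) r = r * a % p;
        a = a * a % p;
        e >>= 1;
    }
    return r;
}

// Textbook Berlekamp-Massey over Z/p (p prime): returns the connection
// polynomial C with C[0] = 1 such that sum_{i=0..L} C[i] * s[n-i] = 0 (mod p)
// for all n >= L, where L + 1 = C.size(). Applied to the sequence v^T A^i u,
// the reversal of C is the minimum generating polynomial described above.
std::vector<ll> berlekampMassey(const std::vector<ll> &s, ll p) {
    std::vector<ll> C(1, 1), B(1, 1); // current and previous connection polys
    ll L = 0, m = 1, b = 1;           // length, gap since last update, last discrepancy
    for (std::size_t n = 0; n < s.size(); ++n) {
        ll d = 0;                     // discrepancy at position n
        for (ll i = 0; i <= L; ++i)
            d = (d + C[i] * s[n - i]) % p;
        if (d == 0) { ++m; continue; } // the current recurrence still holds
        std::vector<ll> T = C;
        ll coef = d * powmod(b, p - 2, p) % p;
        if (C.size() < B.size() + m) C.resize(B.size() + m, 0);
        for (std::size_t i = 0; i < B.size(); ++i)
            C[i + m] = ((C[i + m] - coef * B[i]) % p + p) % p;
        if (2 * L <= (ll)n) { L = (ll)n + 1 - L; B = T; b = d; m = 1; }
        else ++m;
    }
    C.resize(L + 1);
    return C;
}
```

Early termination amounts to returning as soon as C stops changing over several consecutive positions n, instead of consuming all 2n sequence elements.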
Since the update of the linear generator is computed by dot products instead of elementary polynomial operations, a Lanczos strategy has a slightly higher cost for computing the minimal polynomial.

Table 1. Costs of the Wiedemann and Lanczos algorithms for f of degree d and for Az = b. A or $A^T$ can be applied to a vector using at most ω operations. Terms in brackets give memory requirements, in field elements.

                     Early termin. f      Monte Carlo f            Sys. soln. Az = b
  Wiedem. [6n]       2dω + 4d(n + d)      2nω + 4n² + 2d(n + d)    +dω + 2dn
  Wiedem. [O(dn)]    2dω + 4d(n + d)      2nω + 4n² + 2d(n + d)    +2dn
  Lanczos [3n]       2dω + 8dn            2nω + 4n² + 4dn          +2dn

LinBox::linsolve: For nonsingular A, a linear system Az = b is solved by computing the minimal polynomial $f(x) = x^d + f_{d-1}x^{d-1} + \cdots + f_1 x + f_0$ of b; its coefficients directly give $z = -(1/f_0)(A^{d-1}b + f_{d-1}A^{d-2}b + \cdots + f_1 b)$. Checking Az = b makes the system solution Las Vegas. The Lanczos approach allows one to compute z within the iterations for the minimal polynomial; thus the arithmetic and memory costs are only slightly greater than for basic Lanczos. The main drawback of the Wiedemann approach is that it needs to either store or recompute the sequence $\{A^i b\}_{0 \le i \le d-1}$. For both minimal polynomial and system solution, we are developing block versions of the Wiedemann and Lanczos algorithms. The choice between the strategies will rely on the same criteria as for the non-blocked versions. We can also cover the case of singular systems.

Preconditioning for the rank and the determinant over finite fields. The computation of the rank and determinant of a black box matrix A reduces to the computation of the minimal polynomial of a matrix Ã, called a preconditioning of A, obtained by composing A with one or more other black boxes called preconditioners. A first type of preconditioning involves multiplication of A by Toeplitz, sparse, or butterfly matrices, and typically incurs a cost of $O(n \log n)$ or $O(n \log^2 n)$ operations. For large n, this cost can be prohibitive. We have thus focused on diagonal preconditioners, i.e. scalings, which involve only n field elements and incur O(n) cost. They are proven effective for large fields.

LinBox::rank: The minimal polynomial $f^{u,v}(x)$ of $\{v^T A^i u\}_{i \ge 0}$ always reveals a lower bound for rank A. Whether this bound coincides with the rank depends on the spectral structure of A. In a given application, if that structure is favorable then LinBox::minpoly is sufficient (for instance, full rank is certified by checking whether $\deg f^{u,v}(x)$ is maximal). For large fields the preconditioning $\tilde{A} = D_1 A^T D_2 A D_1$, with $D_1$ and $D_2$ two random diagonal matrices, probabilistically ensures a good structure. For smaller fields one might use a field extension, but we can also use this preconditioning in the ground field as a heuristic. The Lanczos approach allows an easy check: it can produce an orthogonal basis for the range of Ã, and with O(n) dot products one may orthogonalize a random vector with respect to this basis. The rank is quickly certified by checking that the result is in the null space.

LinBox::determinant: The determinant is easily obtained from the constant term of the minimal polynomial of the preconditioning $\tilde{A} = DA$, where D is a random scaling. This has been used for some of the experiments in Section 4.

Integer computations. Most LinBox algorithms for integer and rational number computations are based on the finite field functionality. Minimal polynomial, determinant, and system solution may be computed using Chinese remaindering, including early termination strategies. Integer linear systems, especially sparse ones, may also be solved using the p-adic lifting approach combined with a black box for the inverse matrix modulo p. An efficient Monte Carlo rank determination is based on rank computations modulo random primes (see above).
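The early termination strategy for Chinese remaindering mentioned above can be sketched as follows (a hypothetical word-sized helper, not LinBox's interface): residues modulo successive primes are combined incrementally, keeping the running reconstruction balanced around zero so that negative results are recovered, and stopping once an additional prime leaves it unchanged:

```cpp
#include <cstddef>
#include <vector>

typedef long long ll;

// Modular exponentiation, used for inverses modulo a prime.
static ll powmod(ll a, ll e, ll p) {
    ll r = 1 % p;
    a %= p;
    while (e > 0) {
        if (e & 1) r = r * a % p;
        a = a * a % p;
        e >>= 1;
    }
    return r;
}

// Incremental CRT with early termination: res[i] is an unknown integer
// (say, a determinant) reduced mod primes[i]. The running reconstruction r
// is kept in a balanced range; once an additional prime leaves it unchanged
// we return it. Word-sized sketch only -- a real implementation would use
// multiprecision integers.
ll crtEarlyTerminated(const std::vector<ll> &res, const std::vector<ll> &primes) {
    ll r = 0, m = 1, prev = 0;
    bool started = false;
    for (std::size_t i = 0; i < res.size(); ++i) {
        ll p = primes[i];
        // t solves r + m*t = res[i] (mod p); lift t to (-p/2, p/2].
        ll t = ((res[i] - r) % p + p) % p * powmod(m % p, p - 2, p) % p;
        if (t > p / 2) t -= p;
        r += m * t;
        m *= p;
        if (started && r == prev) return r; // stabilized: terminate early
        prev = r;
        started = true;
    }
    return r;
}
```

Once the accumulated modulus exceeds twice the magnitude of the true result, every further prime leaves the balanced reconstruction fixed, so the loop stops well before exhausting the prime list.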
Specialized algorithms for Smith normal form computations (see also Section 4) and diophantine problems served during the early development of LinBox to validate the design of the library. We plan to provide additional algorithms for integral problems, such as a sparse diophantine linear system solver.

4 Computational experiences

What do we know in practice about the prospects for high-performance exact linear algebra computation? In this section we cite some examples to illustrate the range of possibilities. The problems described here have been difficult to solve by other means. The alternative to LinBox's exact linear algebra is numeric approximation in some cases and combinatorial methods in others; these alternative approaches suffer fatal numerical instability and/or exponential memory demand, which are avoided here. The first example is not a real application, but rather a posed Challenge problem. Its solution is illustrative of LinBox's capability. A noteworthy fact is that a very long computation succeeded without exhausting memory; computations with a high time-to-memory ratio are rare in computer algebra.

Trefethen's Hundred Digits. Nick Trefethen has just posed a "Hundred Dollar, Hundred Digit Challenge." Aimed at numerical analysts, the Challenge consists of 10 problems which have real number solutions; ten digits of accuracy are asked in the answer to each. All of them are numerically difficult. Problem #7 is the computation of the (1, 1) entry of $A^{-1}$, where A is the 20000 × 20000 matrix whose entries are zero everywhere except for the primes 2, 3, 5, 7, ..., 224737 along the main diagonal and the number 1 in all the positions $a_{ij}$ with $|i - j| = 1, 2, 4, 8, \ldots, 16384$. We took this on as a challenge for LinBox to produce the exact solution. It is a rational number whose numerator and denominator each have approximately 100,000 digits.
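The matrix of Problem #7 is itself a natural black box. For illustration, a sketch that builds its nonzero pattern for any size n (n = 20000 in the Challenge):

```cpp
#include <cstddef>
#include <vector>

// Build the Trefethen Problem #7 matrix for a given n: the primes
// 2, 3, 5, ... on the main diagonal and 1 at every position (i, j) with
// |i - j| a power of two. Returned as dense rows only to keep the sketch
// short; at n = 20000 one would instead keep it as a black box.
std::vector<std::vector<long>> trefethenMatrix(std::size_t n) {
    // First n primes by trial division (fine for an illustration).
    std::vector<long> primes;
    for (long c = 2; primes.size() < n; ++c) {
        bool isPrime = true;
        for (long q = 2; q * q <= c && isPrime; ++q)
            if (c % q == 0) isPrime = false;
        if (isPrime) primes.push_back(c);
    }
    std::vector<std::vector<long>> A(n, std::vector<long>(n, 0));
    for (std::size_t i = 0; i < n; ++i) {
        A[i][i] = primes[i];
        for (std::size_t k = 1; k < n; k *= 2) { // |i - j| = 1, 2, 4, ...
            if (i >= k) A[i][i - k] = 1;
            if (i + k < n) A[i][i + k] = 1;
        }
    }
    return A;
}
```

The (1, 1) entry of $A^{-1}$ is then the target rational number; note that each row holds only $O(\log n)$ nonzero entries, so a black-box matrix-vector product costs $O(n \log n)$ word operations.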
To compute this number is well beyond the capability of present-day general-purpose systems such as Maple and Mathematica on current processors and memories. Indeed, it is beyond the capabilities of any software we know of, save LinBox. We have computed this exact answer using approximately 2 years of CPU time (about 180 CPUs running for 4 days). Trefethen has asked that we not reveal the solution or the details of our method before the deadline of his Challenge, May 20, 2002. However, we intend to include a more complete description of this computation in the final version of this paper. Also, we are experimenting with several approaches to this problem and will expand on their relative performances.

Rank: competition with Gaussian elimination techniques. In Table 2 we report some comparisons between Wiedemann's algorithm and elimination with reordering for computing the rank. For the first method, we compute the minimal polynomial of $D_1 A^T D_2 A D_1$, as seen under Preconditioning in Section 3. In order to show the usefulness of this approach, we compare it to elimination-based methods. As the matrices are sparse, some pre-elimination and on-the-fly reordering may be used to reduce fill-in and improve efficiency. We have chosen not to focus on this (Dumas's thesis and the references therein review some reordering heuristics and their application to these matrices). The timings in column "Gauß" of Table 2 have been obtained using the algorithm of that thesis, §5.1. The timings in column "SuperLU" are those of a generic version [www-sop.inria.fr/galaad/logiciels/synaps] of the SuperLU 2.0 numerical code [www.nersc.gov/~xiaoye/SuperLU]. In several cases, fill-in causes a failure of elimination; this is due to memory thrashing (MT).
Our experiments show that, as long as enough memory is available, elimination is very often more efficient when the matrix is already nearly triangular (cyclic8_m11, from a Gröbner basis computation), when the matrix has a very small number of elements per row (nick; chi-j, from algebraic topology), or when the matrix is very unbalanced (bibd, Balanced Incomplete Block Design, from combinatorics). Nonetheless, when the mean number of elements per row is significant (greater than 5, say), Wiedemann's algorithm is superior, even on small matrices, and is sometimes the only practical solution.

Table 2. Rank modulo 65521, elimination vs. Wiedemann, on an Intel PIII, 1 GHz, 1 Gb. n × m is the shape, Ω the number of nonzero elements, r the integer rank of the matrix; timings are in seconds, and MT marks a failure by memory thrashing.

  Matrix        Ω         n × m             r        Gauß      SuperLU   Wiedem.
  cyclic8_m11   2462970   4562 × 5761       3903     257.33    448.38    2215.36
  bibd_22_8     8953560   231 × 319770      231      100.24    938.81    594.29
  n4c6.b12      1721226   25605 × 69235     20165    188.34    1312.27   2158.86
  mk13.b5       810810    135135 × 270270   134211   MT                  44907.1
  ch7-7.b5      211680    35280 × 52920     29448    2179.62   MT        2404.5
  ch7-8.b5      846720    141120 × 141120   92959    5375.76   MT        29109.8
  ch7-9.b5      2540160   423360 × 317520   227870   MT                  210698
  ch8-8.b5      3386880   564480 × 376320   279237   MT                  363754
  TF14          29862     2644 × 3160       2644     50.58     50.34     27.21
  TF15          80057     6334 × 7742       6334     734.39    776.68    165.67
  TF16          216173    15437 × 19321     15437    18559.4

Publication date: 2002