PASCO 2015 Presentations

9:30am Thursday June 25th, 2015, in K9509.


A Compact Parallel Implementation of F4

Roman Pearce

A parallel implementation for polynomial multiplication modulo a prime

Marshall Law

Optimizing and Parallelizing the Modular GCD Algorithm

Michael Monagan



Abstracts

A Compact Parallel Implementation of F4

We present a compact parallel implementation of the F4
algorithm for computing Groebner bases.  Our C library
is just 30KB and computes mod a 31 bit prime.  We use
Maple code to perform Chinese remaindering and rational
reconstruction to compute over Q.  We show how to easily
parallelize the sparse linear algebra using Cilk, and
we discuss our other experiments in parallelization.


A parallel implementation for polynomial multiplication modulo a prime.

We present a parallel implementation in Cilk C of a modular
algorithm for multiplying two polynomials in Z_q[x] for
integer q>1, for multi-core computers.  Our algorithm uses
Chinese remaindering.  It multiplies modulo primes p_1,p_2,...
in parallel and uses a parallel FFT for each prime.  Our
software multiplies two polynomials of degree 10^9 modulo
a 32 bit integer q in 83 seconds on a 20 core computer.


Optimizing and Parallelizing the Modular GCD Algorithm

Our goal is to design and implement a high performance modular
GCD algorithm for polynomial GCD computation in Z_p[x1,x2,...,xn]
for multi-core computers which will be used to compute the GCD
of polynomials over Z.  For n=2 we have designed and implemented 
in C a highly optimized serial code for primes p<2^63.
For n>2 we parallelized in Cilk C Brown's dense modular GCD 
algorithm using our serial bivariate code at the base.
For n=3, we obtain good parallel speedup on multi-core computers
with 16 and 20 cores.  We also compare our code with the GCD codes
in Maple and Magma.