Improving Inter-subdomain Communication and Load-balancing for the Parallel Diffpack Library

Master thesis

2007

1.1 Background

Matrix and vector operations are a substantial part of the scientific computing

workload, and have been subject to much work and many optimisations in

order to increase the performance and efficiency. The introduction of parallel

computing and distributed data has complicated the work required to achieve

gains in performance, and several libraries have been written in an attempt

to hide many of the details and difficulties involved in high-performance

parallel programming. With parallel computing came many new concepts and

challenges to computer science. For example, the gain from using several processors

to do the work previously done by one is defined as speedup (Equation

1.1), where Tp is the time the parallel implementation takes on p processors to

complete the same task that the serial implementation completes in time Ts.

\[ S_p = \frac{T_s}{T_p} \tag{1.1} \]
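As a small illustration of the definition, the sketch below computes the speedup as the serial time divided by the parallel time. The function name `speedup` and all timings are hypothetical and chosen only for this example; they are not measurements from this thesis.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Return S_p = T_s / T_p for one pair of wall-clock times."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


# Hypothetical wall-clock times in seconds, keyed by processor count p.
# These numbers are made up for illustration; they are not measurements.
T_SERIAL = 120.0
PARALLEL_TIMES = {2: 64.0, 4: 35.0, 8: 20.0}

# Speedup for each processor count, computed from the assumed timings.
SPEEDUPS = {p: speedup(T_SERIAL, t_p) for p, t_p in PARALLEL_TIMES.items()}

if __name__ == "__main__":
    for p, s_p in SPEEDUPS.items():
        print(f"p = {p}: S_p = {s_p:.2f}")
```

With these assumed timings, every computed speedup is below the corresponding processor count.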

A speedup equal to p (the number of processors) is called linear speedup, and

implies that doubling the number of processors halves the wall-clock time required

to complete the specific task. This is considered good speedup, but it is difficult

to achieve because the processors must communicate with each other in addition

to performing the computations of the original task. Sometimes super-linear speedup

(Sp > p) is observed, because splitting the domain over several processors can make

the sub-domains fit in a faster level of cache on each processor. One industry

standard library for inter-processor communication is the Message Passing Interface

(MPI [1]). To minimise the effects of communication, it supports several

communication methods; which one is best suited depends on the

situation.
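The reason communication makes linear speedup hard to reach can be sketched with a toy cost model. The model is an assumption made for this illustration only and does not come from the thesis or from MPI: the computation is taken to divide perfectly over p processors, and any parallel run pays a fixed communication cost `t_comm`. The function name and all numbers are hypothetical.

```python
def modelled_speedup(p: int, t_serial: float, t_comm: float) -> float:
    """Speedup under the assumed model T_p = T_s / p + t_comm.

    The communication cost is only charged when p > 1, since a single
    processor has nobody to communicate with.
    """
    if p < 1:
        raise ValueError("p must be at least 1")
    if t_serial <= 0 or t_comm < 0:
        raise ValueError("t_serial must be positive and t_comm non-negative")
    overhead = t_comm if p > 1 else 0.0
    t_parallel = t_serial / p + overhead
    return t_serial / t_parallel


if __name__ == "__main__":
    # Assumed values: 100 s of serial work, 5 s of communication per run.
    for p in (1, 2, 4, 8, 16):
        s_p = modelled_speedup(p, t_serial=100.0, t_comm=5.0)
        print(f"p = {p:2d}: modelled S_p = {s_p:.2f}")
```

In this simplified model the speedup equals p only when `t_comm` is zero. With any positive communication cost it stays below p, and the gap widens as p grows, because the computation time per processor shrinks while the communication cost does not.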
