Parallel Computing
I have given one lecture in the Parallel Computing course in place of Prof. dr. Dirk Roose who is the regular professor for this course. The subjects was Bulk Synchronous Parallel, with examples of an inner product algorithm and a sparse matrixvector multiplication algorithm. Also various distributions were introduced. The slides are available for download here.
Below follows some additional exercise material:
 the block vector distribution is unbalanced when n is not a multiple of p. But how bad can it be?
 Write a for the largest amount of vector elements a processor has after distribution, and b for the smallest amount of vector elements. Does a or b or both a and b affect the parallel execution time (disregarding communication costs)?
 Think on how to define a distribution function phi(i) which defines a more balanced block distribution.
 Is the worstcase execution time better with this new distribution?
 Does efficiency improve with this new distribution?

There are different ways of communicating partial results to all processors with the innerproduct algorithm. The lecture described an alltoall communication mechanism.
 For which of the following BSP computers does this method work best?
BSP computers with...
1. high latency costs l but a low message gap g
2. low l but high g
3. balanced l and g  Try to design BSP innerproduct algorithms which works well on the other two machines.
 For which of the following BSP computers does this method work best?
If you are curious whether the answers you found make sense, or are in need of hints to solve some problems, don't hesitate to swing by at my office (CS dept., building A, room 3.018). Alternatively you can email me, but the latency would be higher.
If you find yourself enticed to write parallel software for your multicore computer, I encourage you to look at MulticoreBSP; if you do, any feedback is always welcome!