While this first parallel version is slow, the demonstrated tendency to overtake the sequential version shows that it can be greatly improved. Because the parallel code I have examined is in its most simple state, almost any optimization that can be made will improve its performance over the sequential version. As I hope to continue this research in the future, the first optimization I want to make would allow the master to take a share of the work in computation. While this will provide for some speedups, my main concern is to prevent the processors from being idle. A partial solution distributes more work to an idle slave as soon as the return packet is received, before processing any of the information it has returned. This will be slightly more difficult to manage, but will decrease slave idleness, and may compensate for master inactivity. Another optimization would allow idle slaves to help complete the work of other active slaves.
In order to reduce communication overhead, an initial board set up message can be sent to each slave with the information that applies to all moves it will evaluate at the current level, sending small packets with the actual work as it relates to the given board. Such a change reduces the size of the messages that the master sends out to the slaves and reduces the amount of overhead required by the slave each time it receives a work message from the master.
In order to eliminate the startup overhead encountered at the beginning of each move search, the maximum number of slaves should be allocated at the beginning of the game. This is a feature I have not implemented because my slaves busy wait, or are not idle while waiting for messages, at this point and use resources even when they are not solving problem. I will change this soon.
Lastly, in the current algorithm, the first set of work that the master sends to the slaves is a set of completed boards to statically evaluate. Have the master evaluate all of these and start distributing work one level up may be more efficient. In this case the amount of time it would take to package and send the board, wait for and receive the results, and then process the results might be equivalent to, or less than, the amount of time it would take the master to merely evaluate the board.