Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers

I came across one of the most interesting and humorous research papers while doing my nightly reading. The paper is titled Twelve Ways to Fool the Masses When Giving Performance Results on Parallel Computers, written by David H. Bailey and published in 1991. You can download the full paper here.
The title describes exactly what the paper is about, so I’ll just share some interesting snippets from it.

To quote part of the abstract:
Many of us in the field of highly parallel scientific computing recognize that it is often quite difficult to match the run time performance of the best conventional supercomputers. But since lay persons usually don’t appreciate these difficulties and therefore don’t understand when we quote mediocre performance results, it is often necessary for us to adopt some advanced techniques in order to deflect attention from possibly unfavorable facts.

I’ll just list the 12 ways this is achieved, but I would advise you to read the detailed explanation for each (very much worth the read, if you ask me). So the next time the numbers look like they have been taken to a spa and massaged, look out for the tricks of the trade listed below:

  1. Quote only 32-bit performance results, not 64-bit results.
  2. Present performance figures for an inner kernel, and then represent these figures as the performance of the entire application.
  3. Quietly employ assembly code and other low-level language constructs.
  4. Scale up the problem size with the number of processors, but omit any mention of this fact.
  5. Quote performance results projected to a full system.
  6. Compare your results against scalar, unoptimized code on Crays.
  7. When direct run time comparisons are required, compare with an old code on an obsolete system.
  8. If MFLOPS rates must be quoted, base the operation count on the parallel implementation, not on the best sequential implementation.
  9. Quote performance in terms of processor utilization, parallel speedups or MFLOPS per dollar.
  10. Mutilate the algorithm used in the parallel implementation to match the architecture.
  11. Measure parallel run times on a dedicated system, but measure conventional run times in a busy environment.
    And to crown it all (I’m sure we’ve all seen this one):
  12. If all else fails, show pretty pictures and animated videos, and don’t talk about performance.
    And the explanation:
    It sometimes happens that the audience starts to ask all sorts of embarrassing questions. These people simply have no respect for the authorities of our field.  If you are so unfortunate as to be the object of such disrespect, there is always a way out — simply conclude your technical presentation and roll the videotape. Audiences love razzle-dazzle color graphics, and this material often helps deflect attention from the substantive technical issues.
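
To make way 8 a bit more concrete, here is a minimal Python sketch of the arithmetic behind it. Every number below is hypothetical and invented purely for illustration (none of it comes from Bailey's paper), and the `mflops` helper is my own name, not anything standard. The idea is simply that a parallel implementation often performs more floating-point operations than the best sequential algorithm would, so dividing the larger count by the same run time gives a more flattering rate.

```python
def mflops(flop_count: float, seconds: float) -> float:
    """Millions of floating-point operations per second."""
    return flop_count / (seconds * 1e6)


# Hypothetical figures, made up for illustration only.
best_sequential_flops = 1e9  # operations the best sequential algorithm needs
parallel_flops = 4e9         # operations the parallel version actually performs
parallel_seconds = 2.0       # measured run time of the parallel version

# The "fool the masses" figure: count every operation the parallel code did.
quoted = mflops(parallel_flops, parallel_seconds)

# The more honest figure: count only the useful work, i.e. what the best
# sequential algorithm would have needed to solve the same problem.
honest = mflops(best_sequential_flops, parallel_seconds)

print(f"quoted: {quoted:.0f} MFLOPS")
print(f"honest: {honest:.0f} MFLOPS")
```

With these made-up inputs the quoted rate comes out four times higher than the honest one, even though the problem was solved in exactly the same amount of time.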
