Why aren’t the differences between native GPU and x86 FLOP counts even greater?

In the end, most of the code uses simple operations (add, multiply, divide, etc.), each of which counts as one FLOP under both systems. The instructions whose counts differ substantially (e.g. exp(x)) are rarer (say 1/10th of the instruction count), so the overall difference ends up closer to 2x.
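To make the weighted-average intuition concrete, here is a minimal back-of-the-envelope sketch. The instruction mix (90% simple ops, 10% special ops) and the per-op FLOP costs are illustrative assumptions, not figures from the source; the function name `blended_flop_ratio` is likewise hypothetical.

```python
def blended_flop_ratio(frac_special, simple_cost, native_special_cost, x86_special_cost):
    """Ratio of an x86-style FLOP count to a native GPU FLOP count for a mix of
    'simple' ops (same cost under both conventions) and 'special' ops
    (e.g. exp) that are counted differently."""
    native = (1 - frac_special) * simple_cost + frac_special * native_special_cost
    x86 = (1 - frac_special) * simple_cost + frac_special * x86_special_cost
    return x86 / native

# Assumed mix: 90% simple ops (1 FLOP under either convention), 10% special ops
# counted as 1 FLOP natively but ~10 FLOPs under an x86-style convention.
ratio = blended_flop_ratio(frac_special=0.10,
                           simple_cost=1,
                           native_special_cost=1,
                           x86_special_cost=10)
print(f"blended ratio: {ratio:.2f}x")  # ~1.9x, i.e. roughly 2x
```

Even though the special ops differ by 10x individually, their small share of the instruction stream keeps the blended difference near 2x.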