Reward Convergence: FDFO vs Flow-GRPO

Combined reward over training epochs

FDFO (Row E) converges faster and reaches a higher combined reward than Flow-GRPO, consistent with its more efficient use of Jacobian-transported gradients.