Post-Training Loop Sample → Score → update prompt c noise x₀ Draw a sample to roll out Current flow model vθ sample an image trajectory Generated Image Reward Function Learned model or human labels map samples to scalar rewards Use reward to update θ, then resample