Post-Training Loop
Sample → Score → Update
Inputs (prompt c, noise x₀): draw a sample to roll out
Current flow model v_θ: sample an image trajectory
Generated image
Reward function (learned model or human labels): map samples to scalar rewards
Use reward to update θ, then resample
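The loop above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not the method behind the figure. The "image" is a 2-D vector, and the flow model is a one-parameter velocity field. The reward is a made-up function that prefers samples near 2c. The figure does not say which update rule is used, so this sketch substitutes a simple antithetic perturbation (evolution-strategies-style) estimate of the reward gradient. All function names (`velocity`, `rollout`, `reward`, `post_train`) and hyperparameters are hypothetical.

```python
import numpy as np

def velocity(theta, x, t, c):
    # Toy stand-in for v_theta(x, t, c): drift toward the point theta * c.
    # (t is unused in this toy field; a real flow model would depend on it.)
    return theta * c - x

def rollout(theta, c, x0, steps=20):
    # Sample: Euler-integrate dx/dt = v_theta from t=0 (noise) to t=1.
    x, dt = x0.copy(), 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity(theta, x, k * dt, c)
    return x

def reward(x, c):
    # Score: hypothetical reward, higher when the sample is close to 2 * c.
    # In practice this would be a learned model or human labels.
    return -np.sum((x - 2.0 * c) ** 2, axis=-1)

def post_train(theta=0.0, iters=200, batch=64, sigma=0.1, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    for _ in range(iters):
        c = rng.normal(size=(batch, 2))    # prompts (toy embeddings)
        x0 = rng.normal(size=(batch, 2))   # initial noise
        eps = rng.normal()                 # random direction in parameter space
        # Roll out and score two perturbed copies of the model on the same inputs.
        r_plus = reward(rollout(theta + sigma * eps, c, x0), c).mean()
        r_minus = reward(rollout(theta - sigma * eps, c, x0), c).mean()
        # Update: finite-difference estimate of d(mean reward)/d(theta).
        grad = (r_plus - r_minus) / (2.0 * sigma) * eps
        theta += lr * grad                 # gradient ascent, then resample
    return theta

if __name__ == "__main__":
    print("trained theta:", post_train())
```

The perturbation estimator is used here only because it needs nothing beyond forward rollouts and scalar rewards. A policy-gradient or reward-backpropagation update would fit the same sample, score, update structure.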