This preprint develops an on-policy decision-focused learning method for sequential contextual linear optimization with partial feedback.
The approach learns a stochastic predict-then-optimize policy and updates it with a hybrid gradient estimator combining score-function and decision-focused plug-in components. Experiments cover top-k selection, shortest path, combinatorial pricing, and energy scheduling.
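
As a rough illustration of the score-function component only, the sketch below shows one on-policy round for a top-k selection task. It is not the preprint's estimator: the linear predictor `W`, the Gaussian perturbation with scale `sigma`, the helper names `top_k` and `score_function_round`, and the scalar `reward_fn` are all assumptions made for this example, and the decision-focused plug-in component is not shown.

```python
import numpy as np


def top_k(scores, k):
    """Return a 0/1 decision vector that selects the k largest scores."""
    z = np.zeros_like(scores, dtype=float)
    z[np.argsort(scores)[-k:]] = 1.0
    return z


def score_function_round(W, x, k, sigma, reward_fn, rng):
    """One on-policy round with a score-function (REINFORCE-style) gradient.

    Assumed policy (hypothetical, for illustration):
      1. predict mean scores mu = W @ x from the context x,
      2. sample perturbed scores c ~ N(mu, sigma^2 I),
      3. act with z = top_k(c, k).
    Only the scalar reward of the chosen decision z is observed
    (partial feedback); rewards of other decisions are never queried.
    """
    mu = W @ x
    c = mu + sigma * rng.standard_normal(mu.shape)
    z = top_k(c, k)
    r = reward_fn(z)
    # grad_mu log N(c; mu, sigma^2 I) = (c - mu) / sigma^2,
    # and mu = W @ x, so the gradient w.r.t. W is an outer product with x.
    grad_W = r * np.outer((c - mu) / sigma**2, x)
    return z, grad_W
```

In a sequential loop, one would call `score_function_round` once per arriving context and take a small ascent step such as `W += lr * grad_W`. A single-sample estimate like this is typically high-variance, which is the usual motivation for adding a baseline or pairing it with a lower-variance component.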