Decision-Focused On-Policy Learning for Contextual Linear Optimization with Partial Feedback

Publication
arXiv preprint

This preprint develops an on-policy decision-focused learning method for sequential contextual linear optimization with partial feedback.

The approach learns a stochastic predict-then-optimize policy and updates it with a hybrid gradient estimator combining score-function and decision-focused plug-in components. Experiments cover top-k selection, shortest path, combinatorial pricing, and energy scheduling.
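
To make the score-function component concrete, here is a minimal sketch for the top-k selection case. It is not the preprint's estimator: the summary above does not specify the policy's noise model or the decision-focused plug-in term, so this sketch assumes a Gaussian-perturbed linear predictor and omits the plug-in component entirely. All names (`top_k`, `score_function_gradient`, `feedback`, `true_rewards`) are hypothetical.

```python
import random

def top_k(scores, k):
    """Linear optimization oracle for top-k selection: indices of the k largest scores."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

def score_function_gradient(W, x, k, sigma, feedback, baseline=0.0, rng=random):
    """One-sample score-function (REINFORCE) gradient estimate.

    Assumed policy (not taken from the preprint): predict W x, add N(0, sigma^2)
    noise, then pass the perturbed prediction to the top-k oracle.
    feedback(chosen) returns the reward observed for the chosen items only,
    which is the partial-feedback setting.
    """
    mean = [sum(w * xj for w, xj in zip(row, x)) for row in W]
    eps = [rng.gauss(0.0, 1.0) for _ in mean]
    perturbed = [m + sigma * e for m, e in zip(mean, eps)]
    chosen = top_k(perturbed, k)
    reward = feedback(chosen)
    # grad_W log N(perturbed; W x, sigma^2 I) = (eps / sigma) x^T
    scale = (reward - baseline) / sigma
    grad = [[scale * e * xj for xj in x] for e in eps]
    return chosen, reward, grad

# Usage with made-up numbers: 4 items, 2 context features, select k = 2.
rng = random.Random(0)
true_rewards = [1.0, 0.2, 0.8, 0.1]   # never shown to the learner in full
W = [[0.0, 0.0] for _ in true_rewards]
x = [1.0, 0.5]
chosen, reward, grad = score_function_gradient(
    W, x, k=2, sigma=1.0,
    feedback=lambda idx: sum(true_rewards[i] for i in idx),
    rng=rng,
)
```

An ascent step would add a small multiple of `grad` to `W`. A pure score-function estimate like this is unbiased for the assumed policy but typically high-variance, which is one common motivation for combining it with a model-based term.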