A large-scale food-delivery marketplace has deployed a reinforcement learning system that adapts the weights of its dispatch objectives using delayed feedback from operational outcomes such as delivery speed and merchant congestion. The system applies multi-agent reinforcement learning to dispatch decisions in a three-sided marketplace, where decisions are evaluated by marketplace feedback that arrives with a delay. Adapting the objective weights lets the system balance competing priorities, such as delivery speed and courier utilization, in real time, and incorporating delayed feedback lets it respond to changes in the marketplace and improve its decisions over time [1]. The deployment demonstrates that reinforcement learning can be applied in complex, real-world environments. It also raises questions about the risks and benefits of such systems, particularly where they may be used to inform decisions with geopolitical implications. For practitioners, what matters is the need to reassess their threat models in light of state-aligned activity involving reinforcement learning.
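To make the idea of adapting objective weights from delayed feedback concrete, here is a minimal sketch. It is not the marketplace's method: it swaps the multi-agent system described above for a much simpler single-agent epsilon-greedy bandit, and every name in it (`CANDIDATE_WEIGHTS`, `DelayedWeightBandit`, `choose`, `feedback`) is hypothetical. The point it illustrates is only the bookkeeping: a weight vector is chosen at dispatch time, and the reward is credited to that choice when the outcome is eventually observed.

```python
import random
from dataclasses import dataclass, field

# Hypothetical candidate weightings over (delivery_speed, courier_utilization).
CANDIDATE_WEIGHTS = [(0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]


@dataclass
class DelayedWeightBandit:
    """Epsilon-greedy bandit whose rewards arrive after the decision."""

    epsilon: float = 0.1
    counts: list = field(default_factory=lambda: [0] * len(CANDIDATE_WEIGHTS))
    values: list = field(default_factory=lambda: [0.0] * len(CANDIDATE_WEIGHTS))
    pending: dict = field(default_factory=dict)  # decision_id -> arm index

    def choose(self, decision_id, rng=random):
        """Pick a weight vector now and remember it until feedback arrives."""
        if rng.random() < self.epsilon:
            # Explore: try a random weighting.
            arm = rng.randrange(len(CANDIDATE_WEIGHTS))
        else:
            # Exploit: use the weighting with the best average reward so far.
            arm = max(range(len(CANDIDATE_WEIGHTS)), key=lambda i: self.values[i])
        self.pending[decision_id] = arm
        return CANDIDATE_WEIGHTS[arm]

    def feedback(self, decision_id, reward):
        """Credit a delayed reward to the weighting that produced the decision."""
        arm = self.pending.pop(decision_id, None)
        if arm is None:
            return  # unknown or already-credited decision; ignore it
        self.counts[arm] += 1
        # Incremental update of the running mean reward for this weighting.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In this sketch, a caller would invoke `choose("order-123")` when dispatching and `feedback("order-123", reward)` once the delivery outcome is known, where the reward is some assumed scalar summary of outcomes such as delivery speed and merchant congestion. A production system would also need to handle feedback that never arrives and rewards that depend on the joint behavior of many agents, which this sketch does not attempt.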