Dynamical systems, such as recurrent neural networks and Markov chain Monte Carlo samplers, can be parallelized across the sequence length by reformulating their step-by-step state updates as a single system of equations. This reformulation makes parallel Newton methods applicable, breaking the traditional sequential bottleneck: instead of computing state t only after state t-1, a Newton solver refines a guess for the entire trajectory at once. On massively parallel hardware such as GPUs, this lets machine learning models be trained more efficiently on long sequences and can substantially reduce computation time in large-scale applications. Researchers have demonstrated the approach on a variety of dynamical systems, making it a broadly useful tool for accelerating otherwise sequential computations. The result matters to practitioners because it can speed the development of more capable AI models, which in turn has far-reaching consequences for fields beyond technology, including policy, security, and workforce dynamics.
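To make the idea concrete, here is a minimal NumPy sketch (not the authors' implementation) for a toy scalar recurrence h_t = tanh(a*h_{t-1} + x_t). The recurrence is recast as a residual system F_t(h) = h_t - tanh(a*h_{t-1} + x_t) = 0 over the whole trajectory, and Newton's method refines all T states simultaneously. The function names, the choice of tanh, and the dense `np.linalg.solve` for the bidiagonal Jacobian are illustrative assumptions; in practice the Jacobian solve is an associative recurrence that a parallel scan can evaluate in O(log T) depth.

```python
import numpy as np

def sequential_rnn(h0, xs, a=0.5):
    """Baseline: evaluate h_t = tanh(a*h_{t-1} + x_t) one step at a time."""
    hs, h = [], h0
    for x in xs:
        h = np.tanh(a * h + x)
        hs.append(h)
    return np.array(hs)

def newton_rnn(h0, xs, a=0.5, iters=30, tol=1e-12):
    """Toy parallel evaluation: solve the whole trajectory as one system.

    Each sweep evaluates the recurrence function and its derivative at
    every time step independently (the parallelizable part), then takes
    a Newton step on the stacked residual F(h) = h - f(shift(h), x).
    """
    T = len(xs)
    h = np.zeros(T)                            # guess for all T states at once
    for _ in range(iters):
        h_prev = np.concatenate(([h0], h[:-1]))  # shifted states h_{t-1}
        f = np.tanh(a * h_prev + xs)             # elementwise, parallel over t
        F = h - f                                # residual of the system
        if np.max(np.abs(F)) < tol:
            break
        df = a * (1.0 - f**2)                    # d f_t / d h_{t-1}
        J = np.eye(T) - np.diag(df[1:], k=-1)    # lower-bidiagonal Jacobian
        h = h - np.linalg.solve(J, F)            # Newton update for every t
    return h
```

Because the Jacobian is lower triangular, each Newton sweep makes at least one additional leading state exact, so the solver matches the sequential evaluation in at most T sweeps and typically far fewer for contractive dynamics, which is where the speedup over T strictly sequential steps comes from.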