A novel framework for joint spatiotemporal super-resolution using diffusion models has been proposed, enhancing both the spatial and temporal resolution of videos. Existing models are typically trained for one fixed pair of super-resolution factors, which restricts their applicability to diverse scenarios. By conditioning a diffusion model on the desired scale factors, the framework adapts to varying scale requirements with a single network, making it more versatile and effective. This has significant implications for climate applications, where high-resolution video data is crucial, and the ability to enhance video resolution in both space and time will matter to practitioners in any field where such data is essential.
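The central idea, a single model conditioned on the scale factors rather than one network per factor pair, can be sketched as follows. This is an illustrative toy, not the proposed architecture: the nearest-neighbour upsampler, the `toy_denoiser` stand-in for a learned score network, and all function names are assumptions introduced for this sketch.

```python
import numpy as np

def upsample_video(video, s_spatial, s_temporal):
    """Naive nearest-neighbour upsampling in space and time.

    Stands in for a learned upsampler; `video` has shape (T, H, W).
    """
    out = np.repeat(video, s_temporal, axis=0)   # temporal factor
    out = np.repeat(out, s_spatial, axis=1)      # spatial factor (height)
    out = np.repeat(out, s_spatial, axis=2)      # spatial factor (width)
    return out

def toy_denoiser(residual, t, cond):
    """Hypothetical stand-in for a learned denoising network.

    `cond` packs the (spatial, temporal) factors, so one model can
    serve many factor pairs; here it merely shrinks the noise residual.
    """
    s_spatial, s_temporal = cond
    return residual * (1.0 - t) / (1.0 + 0.01 * (s_spatial + s_temporal))

def sample_sr_video(lr_video, s_spatial=4, s_temporal=2, steps=10, seed=0):
    """Scale-conditioned reverse-diffusion sketch: start from noise around
    a crude upsampling of the low-res video, then iteratively denoise.
    """
    rng = np.random.default_rng(seed)
    base = upsample_video(lr_video, s_spatial, s_temporal)
    x = base + rng.standard_normal(base.shape)   # noisy initialisation
    for t in np.linspace(1.0, 0.0, steps):       # reverse-time schedule
        x = base + toy_denoiser(x - base, t, (s_spatial, s_temporal))
    return x

lr = np.zeros((4, 8, 8))                 # 4 frames of 8x8 video
hr = sample_sr_video(lr, s_spatial=4, s_temporal=2)
print(hr.shape)                          # (8, 32, 32)
```

The point of the conditioning argument is that the same sampling loop handles (4, 2), (2, 2), or any other factor pair without retraining, which is the flexibility a fixed-factor model lacks.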