Datadog’s new AI model, Toto, marks a major leap in time series forecasting for observability. Designed with a focus on accuracy, reliability, and responsible AI use, Toto helps improve infrastructure performance by generating secure, interpretable forecasts. While it’s not built for general-purpose tasks, its architecture and pre-training corpus set new benchmarks across observability datasets. Future work includes integrating multi-modal data, enhancing model scalability, and aligning forecasting models with conversational AI for more intuitive troubleshooting and planning.

Toto: Time Series Optimized Transformer for Observability

  1. Background
  2. Problem statement
  3. Model architecture
  4. Training data
  5. Results
  6. Conclusions
  7. Impact statement
  8. Future directions
  9. Contributions
  10. Acknowledgements and References

Appendix

6 Conclusions

Toto, through a novel architecture and pre-training corpus, demonstrates state-of-the-art performance both on public benchmarks and on the Datadog observability benchmark. We look forward to sharing many more technical details, experiments, and benchmark results in a forthcoming paper.

7 Impact statement

In developing Toto, Datadog follows a structured approach to responsible development, focused on identifying, assessing, and mitigating the potential risks associated with the use of our model. Because Toto is not intended for mass distribution and generates time series forecasts specifically for observability data, the potential for harm is considerably lower than with more general-purpose models. At Datadog, our primary focus is on ensuring the accuracy, reliability, and security of the forecasts generated by Toto, which are crucial for maintaining and optimizing infrastructure and application performance.

We carefully analyze the potential for Toto to produce incorrect or misleading forecasts that could impact decision-making processes in critical systems. Additionally, we consider the implications of Toto's performance across diverse datasets, ensuring it can generalize well without introducing significant errors.

8 Future directions

Many exciting areas of exploration remain for further study. If you are interested in working with us, please reach out to the authors by email.

Some future research questions that particularly intrigue us include:

• Multi-modal inputs: Incorporate additional input modalities such as query metadata and captions to enhance forecast performance.

• Autonomous troubleshooting agents: Augment Datadog's AI agents [50] for troubleshooting and incident response by integrating modality-specific foundation models like Toto to improve their reasoning and planning capabilities with telemetry data.

• Conversational interfaces: Align time series forecasting models with LLMs to develop conversational agents capable of interpreting and reasoning about time series data.

• Model enhancements and scaling: Explore ways to improve and scale model performance through optimizations such as new types of input embeddings and attention mechanisms, and by examining alternative variate groupings to capture richer interactions.

9 Contributions

Contributors are listed in alphabetical order.

Othmane Abou-Amal, Joseph Banks, Mayeul Blanzat, Ben Cohen, Youssef Doubli, Ben Hinthorne, Emaad Khwaja, Jared Ledvina, Charles Masson, Sajid Mehmood, Elise Ramé, Maxime Visonneau, Kan Wang.

10 Acknowledgements

Our work is made possible by the efforts of numerous teams at Datadog. Special thanks and acknowledgement to:

Johan Andersen, Roashan Ayene, Romoli Bakshi, Kevin Beach, Bill Birkholz, Rob Boll, Maxim Brown, Benedetto Buratti, Marion Chan-Renous, Jessica Cordonnier, Ben Donohue, Zakaria Fikrat, Quentin François, Erica Hale, Michael Hoang, Joe Jones, Max Livingston, Jesse Mack, Amine Naouas, Sean O'Connor, Brendan Rhoads, Phil Sarin, Vyom Shah, Aaron Taa, Bharath Vontimitta, Dominique West, Steven Zhou.

References

[1] Datadog. Observability platform, 2024. URL https://www.datadoghq.com/monitoring/observability-platform/.

[2] Datadog. Modern infrastructure monitoring, 2024. URL https://www.datadoghq.com/product/infrastructure-monitoring/.

[3] Rob J Hyndman and George Athanasopoulos. Forecasting: Principles and Practice. OTexts, 3rd edition, 2021. URL https://otexts.com/fpp3/.

[4] Robert Fildes, Michèle Hibon, Spyros Makridakis, and Nigel Meade. Generalising about univariate forecasting methods: further empirical evidence. International Journal of Forecasting, 14:339–358, 9 1998. ISSN 0169-2070. doi: 10.1016/S0169-2070(98)00009-0.

[5] Simon Stevenson. A comparison of the forecasting ability of ARIMA models. Journal of Property Investment & Finance, 25:223–240, 5 2007. ISSN 1463-578X. doi: 10.1108/14635780710746902.

[6] Charisios Christodoulos, Christos Michalakelis, and Dimitris Varoutas. Forecasting with limited data: Combining ARIMA and diffusion models. Technological Forecasting and Social Change, 77:558–565, 5 2010. ISSN 0040-1625. doi: 10.1016/j.techfore.2010.01.009.

[7] David Salinas, Valentin Flunkert, Jan Gasthaus, and Tim Januschowski. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36:1181–1191, 2020. ISSN 0169-2070. doi: 10.1016/j.ijforecast.2019.07.001. URL https://www.sciencedirect.com/science/article/pii/S0169207019301888.

[8] Eoin Brophy, Zhengwei Wang, Qi She, and Tomás Ward. Generative adversarial networks in time series: A systematic literature review. ACM Computing Surveys, 55:1–31, 10 2023. ISSN 0360-0300. doi: 10.1145/3559540.

[9] Zhihao Jia, Sina Lin, Charles R Qi, and Alex Aiken. Exploring the hidden dimension in accelerating convolutional neural networks, 2018. URL https://openreview.net/forum?id=SJCPLLpaW.

[10] Weizheng Xu, Youtao Zhang, and Xulong Tang. Parallelizing DNN training on GPUs: Challenges and opportunities. pages 174–178. ACM, 4 2021. ISBN 9781450383134. doi: 10.1145/3442442.3452055.

[11] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. volume 30. Curran Associates, Inc., 2017. URL https://papers.nips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.

[12] Haixu Wu, Jiehui Xu, Jianmin Wang, and Mingsheng Long. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. 2021. URL https://openreview.net/forum?id=J4gRj6d5Qm.

[13] Haoyi Zhou, Shanghang Zhang, Jieqi Peng, Shuai Zhang, Jianxin Li, Hui Xiong, and Wancai Zhang. Informer: Beyond efficient transformer for long sequence time-series forecasting. 2020. URL https://api.semanticscholar.org/CorpusID:229156802.

[14] Yuqi Nie, Nam H Nguyen, Phanwadee Sinthong, and Jayant Kalagnanam. A time series is worth 64 words: Long-term forecasting with transformers. 2023. URL https://openreview.net/forum?id=Jbdc0vTOcol.

[15] Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers. 2024. URL https://openreview.net/forum?id=Yd8eHMY1wz.

[16] Yunhao Zhang and Junchi Yan. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=vSVLM2j9eie.

[17] Yong Liu, Tengge Hu, Haoran Zhang, Haixu Wu, Shiyu Wang, Lintao Ma, and Mingsheng Long. iTransformer: Inverted transformers are effective for time series forecasting. 2024. URL https://openreview.net/forum?id=JePfAI8fah.

[18] Romain Ilbert, Ambroise Odonnat, Vasilii Feofanov, Aladin Virmaux, Giuseppe Paolo, Themis Palpanas, and Ievgen Redko. SAMformer: Unlocking the potential of transformers in time series forecasting with sharpness-aware minimization and channel-wise attention. In Forty-first International Conference on Machine Learning, 2024. URL https://openreview.net/forum?id=8kLzL5QBh2.

[19] Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. A decoder-only foundation model for time-series forecasting. In Forty-first International Conference on Machine Learning, 2024. URL https://openreview.net/forum?id=jn2iTJas6h.

[20] Tianyang Lin, Yuxin Wang, Xiangyang Liu, and Xipeng Qiu. A survey of transformers. CoRR, abs/2106.04554, 2021. URL https://arxiv.org/abs/2106.04554.

[21] Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, and Yuyang Wang. Chronos: Learning the language of time series, 2024. URL https://arxiv.org/abs/2403.07815.

[22] Azul Garza and Max Mergenthaler-Canseco. TimeGPT-1, 2023.

[23] Kashif Rasul, Arjun Ashok, Andrew Robert Williams, Arian Khorasani, George Adamopoulos, Rishika Bhagwatkar, Marin Biloš, Hena Ghonia, Nadhir Hassen, Anderson Schneider, Sahil Garg, Alexandre Drouin, Nicolas Chapados, Yuriy Nevmyvaka, and Irina Rish. Lag-Llama: Towards foundation models for time series forecasting. In R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Large Foundation Models, 2023. URL https://openreview.net/forum?id=jYluzCLFDM.

[24] Nate Gruver, Marc Anton Finzi, Shikai Qiu, and Andrew Gordon Wilson. Large language models are zero-shot time series forecasters. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=md68e8iZK1.

[25] Alec Radford and Karthik Narasimhan. Improving language understanding by generative pre-training. 2018. URL https://api.semanticscholar.org/CorpusID:49313245.

[26] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019. URL https://api.semanticscholar.org/CorpusID:160025533.

[27] Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu. On layer normalization in the transformer architecture, 2020. URL https://openreview.net/forum?id=B1x8anVFPr.

[28] Biao Zhang and Rico Sennrich. Root Mean Square Layer Normalization. In Advances in Neural Information Processing Systems 32, Vancouver, Canada, 2019. URL https://openreview.net/references/pdf?id=S1qBAf6rr.

[29] Noam Shazeer. GLU variants improve transformer, 2020. URL https://arxiv.org/abs/2002.05202.

[30] Jean-Baptiste Cordonnier, Andreas Loukas, and Martin Jaggi. On the relationship between self-attention and convolutional layers. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020. URL https://openreview.net/forum?id=HJlnC1rKPB.

[31] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. An image is worth 16x16 words: Transformers for image recognition at scale. In International Conference on Learning Representations, 2021. URL https://openreview.net/forum?id=YicbFdNTTy.

[32] Roshan M Rao, Jason Liu, Robert Verkuil, Joshua Meier, John Canny, Pieter Abbeel, Tom Sercu, and Alexander Rives. MSA transformer. In Marina Meila and Tong Zhang, editors, Proceedings of the 38th International Conference on Machine Learning, volume 139 of Proceedings of Machine Learning Research, pages 8844–8856. PMLR, 18–24 Jul 2021. URL https://proceedings.mlr.press/v139/rao21a.html.

[33] Anurag Arnab, Mostafa Dehghani, Georg Heigold, Chen Sun, Mario Lučić, and Cordelia Schmid. ViViT: A video vision transformer. In 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pages 6816–6826, 2021. doi: 10.1109/ICCV48922.2021.00676.

[34] Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding, 2021.

[35] Yutao Sun, Li Dong, Barun Patra, Shuming Ma, Shaohan Huang, Alon Benhaim, Vishrav Chaudhary, Xia Song, and Furu Wei. A length-extrapolatable transformer. In ACL 2023, December 2022. URL https://www.microsoft.com/en-us/research/publication/a-length-extrapolatable-transformer/.

[36] Abhimanyu Das, Weihao Kong, Andrew Leach, Shaan K Mathur, Rajat Sen, and Rose Yu. Long-term forecasting with TiDE: Time-series dense encoder. Transactions on Machine Learning Research, 2023. ISSN 2835-8856. URL https://openreview.net/forum?id=pCbC3aQB5W.

[37] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[38] D. Peel and G.J. McLachlan. Robust mixture modelling using the t distribution. Statistics and Computing, 10(4):339–348, 2000.

[39] Mika Meitz, Daniel P. A. Preve, and Pentti Saikkonen. A mixture autoregressive model based on Student's t-distribution. Communications in Statistics - Theory and Methods, 52:499–515, 2018. URL https://api.semanticscholar.org/CorpusID:73615847.

[40] C. S. Wong, W. S. Chan, and P. L. Kam. A Student t-mixture autoregressive model with applications to heavy-tailed financial data. Biometrika, 96(3):751–760, 2009. ISSN 0006-3444, 1464-3510. URL http://www.jstor.org/stable/27798861.

[41] Taesung Kim, Jinhee Kim, Yunwon Tae, Cheonbok Park, Jang-Ho Choi, and Jaegul Choo. Reversible instance normalization for accurate time-series forecasting against distribution shift. In International Conference on Learning Representations, 2022. URL https://openreview.net/forum?id=cGDAkQo1C0p.

[42] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7.

[43] Datadog. Querying, 2024. URL https://docs.datadoghq.com/dashboards/querying/.

[44] Haixu Wu, Tengge Hu, Yong Liu, Hang Zhou, Jianmin Wang, and Mingsheng Long. TimesNet: Temporal 2D-variation modeling for general time series analysis. In International Conference on Learning Representations, 2023.

[45] Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. Are transformers effective for time series forecasting? Proceedings of the AAAI Conference on Artificial Intelligence, 37(9):11121–11128, Jun. 2023. doi: 10.1609/aaai.v37i9.26317. URL https://ojs.aaai.org/index.php/AAAI/article/view/26317.

[46] Minhao Liu, Ailing Zeng, Muxi Chen, Zhijian Xu, Qiuxia Lai, Lingna Ma, and Qiang Xu. SCINet: Time series modeling and forecasting with sample convolution and interaction. In Alice H. Oh, Alekh Agarwal, Danielle Belgrave, and Kyunghyun Cho, editors, Advances in Neural Information Processing Systems, 2022. URL https://openreview.net/forum?id=AyajSjTAzmg.

[47] Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proc. 39th International Conference on Machine Learning (ICML 2022), 2022.

[48] J. Scott Armstrong. Long-range Forecasting: From Crystal Ball to Computer. John Wiley & Sons, New York, 1985. ISBN 9780471822608.

[49] R. J. Hyndman and A. B. Koehler. Another look at measures of forecast accuracy. International Journal of Forecasting, 22, 2006.

[50] Datadog. Bits AI: Reimagining the way you run operations with autonomous investigations, 2024. URL https://www.datadoghq.com/blog/bits-ai-autonomous-investigations.

Appendix

A.1 Model architecture

Table A.1. Hyperparameters for Toto

A.2 Results

Table A.2. Performance metrics for various models. Key: Best results, Second-best results.


:::info Authors:

(1) Ben Cohen (ben.cohen@datadoghq.com);

(2) Emaad Khwaja (emaad@datadoghq.com);

(3) Kan Wang (kan.wang@datadoghq.com);

(4) Charles Masson (charles.masson@datadoghq.com);

(5) Elise Ramé (elise.rame@datadoghq.com);

(6) Youssef Doubli (youssef.doubli@datadoghq.com);

(7) Othmane Abou-Amal (othmane@datadoghq.com).

:::


:::info This paper is available on arxiv under CC BY 4.0 license.

:::
