If you need to manage and compare benchmark results across multiple machines, Primate Labs is a good option. The company offers a variety of tools, including Geekbench, a widely used cross-platform processor benchmark, and the Geekbench Browser for managing and publishing results, so you can see how the same hardware performs under different operating systems.
If you want something focused specifically on testing and evaluating AI models, BenchLLM is designed to let developers build suites of tests for their models and generate quality reports. It supports automated, interactive, and custom evaluation methods, integrates into CI/CD pipelines, and can be used to monitor performance regressions in production, which helps you track your AI application's quality with less manual work.
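To make the idea concrete, the kind of workflow a tool like BenchLLM automates, running a suite of prompt/expected-answer test cases against a model and summarizing the results, can be sketched in a few lines of plain Python. This is an illustrative sketch only, not BenchLLM's actual API; `TestCase`, `run_suite`, and `toy_model` are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class TestCase:
    # One evaluation case: a prompt and the answer we expect back.
    prompt: str
    expected: str

@dataclass
class Report:
    # Simple quality report: pass count plus details of each failure.
    passed: int = 0
    failed: list = field(default_factory=list)

def run_suite(model, cases):
    """Run every test case against `model` and collect a Report.

    `model` is any callable taking a prompt string and returning a string.
    A real evaluator would use semantic similarity rather than exact match.
    """
    report = Report()
    for case in cases:
        output = model(case.prompt)
        if output.strip().lower() == case.expected.strip().lower():
            report.passed += 1
        else:
            report.failed.append((case.prompt, case.expected, output))
    return report

# Stub model standing in for a real LLM call, for demonstration only.
def toy_model(prompt):
    return {"capital of France?": "Paris"}.get(prompt, "unknown")

cases = [
    TestCase("capital of France?", "Paris"),
    TestCase("capital of Spain?", "Madrid"),
]
report = run_suite(toy_model, cases)
```

A CI/CD job could run such a suite on every commit and fail the build if the pass rate drops below a threshold, which is essentially how regression monitoring with these tools works.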