If you own, operate, or invest in grid-scale batteries, you’ve probably asked at some point: who’s the best partner to optimize my battery?

As battery markets saturate and value shifts from ancillary to wholesale markets, there’s no “easy money” available, and optimizer capability—be it human-led, software, or a combination of both—matters more. The difference between average and best-in-class optimization could easily double the revenue achieved, while also reducing risk and bringing other benefits.

That’s why we’re seeing more and more attempts to evaluate optimizer performance, both in ERCOT and other markets around the world. Some take the form of third-party leaderboards. Others live quietly inside investment committee proposals or internal dashboards. They usually start with the same logic: use public market data to calculate revenue per kilowatt, sort the results, and let the numbers speak for themselves.

It sounds straightforward, but is it? And what do the numbers leave out?

The illusion of simplicity

On paper, revenue ranking looks like a simple and objective way to evaluate optimizer performance. In ERCOT and Australia, the data is readily available. We know how much revenue each battery made and how big it is. Why not just rank them?
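
To make that concrete, here is roughly what the naive calculation looks like. This is a minimal sketch in pandas with made-up assets, revenues, and column names, not a real dataset or anyone’s actual methodology.

```python
import pandas as pd

# Hypothetical fleet: annual settlement revenue and nameplate power (illustrative numbers).
fleet = pd.DataFrame({
    "asset": ["Battery A", "Battery B", "Battery C"],
    "revenue_usd": [4_200_000, 6_100_000, 3_800_000],
    "capacity_kw": [100_000, 150_000, 80_000],
})

# The "simple and objective" leaderboard: revenue per kW, sorted from best to worst.
fleet["usd_per_kw"] = fleet["revenue_usd"] / fleet["capacity_kw"]
leaderboard = fleet.sort_values("usd_per_kw", ascending=False)
print(leaderboard[["asset", "usd_per_kw"]])
```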

Revenue results do reflect optimizer skill, but they’re also shaped by many other factors outside the optimizer’s control.

Take battery duration, for example. A two-hour battery will typically earn more per kW than a one-hour battery, so ranking by $/kW biases toward longer-duration assets. An ideal leaderboard would normalize for this to make an apples-to-apples comparison, but that requires complex analysis and assumptions that can materially change the outcome.
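
A toy illustration with invented numbers (not market data): two assets with the same power rating, where the two-hour battery earns more in total but a broadly similar amount per unit of energy.

```python
# Illustrative only: same power rating, different durations, invented revenues.
one_hour = {"capacity_kw": 100_000, "capacity_kwh": 100_000, "revenue_usd": 5_000_000}
two_hour = {"capacity_kw": 100_000, "capacity_kwh": 200_000, "revenue_usd": 9_000_000}

for name, b in [("1h", one_hour), ("2h", two_hour)]:
    print(name,
          f"$/kW = {b['revenue_usd'] / b['capacity_kw']:.0f}",
          f"$/kWh = {b['revenue_usd'] / b['capacity_kwh']:.0f}")
# The 2h asset looks ~80% better on $/kW even though its $/kWh figure is
# slightly lower, so a raw $/kW ranking rewards duration, not skill.
```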

The list of factors that need to be normalized for is long, and while some can reasonably be adjusted for, others are effectively impossible to correct for. Location in particular makes a major difference in markets like ERCOT and Australia, where nodal prices and congestion vary widely. Adjusting for those effects isn’t straightforward, especially since larger batteries can influence congestion and prices through their own actions.

In the end, even a well-intentioned leaderboard quickly becomes a methodological balancing act. How you adjust—and what you choose to ignore—can dramatically shape the results.

So should we give up on leaderboards?

No. Despite the challenges, we believe leaderboards and performance analysis more generally can be valuable when done carefully.

A well-constructed performance analysis helps asset owners ask smarter questions. It highlights where strategy is driving returns. It pushes the industry to move beyond vague claims of “market leading” and toward measurable, comparable outcomes.

It’s about the journey, not just the destination 

You may find the insights gained on the way to producing your own leaderboard are as valuable as the final results. Which ancillary markets offered the best value in Q1? What effect are rule changes having? How much did batteries earn this year versus last year? Where should I put my next battery, and what duration should it be? The answers are right there in the data.

So get in there and see what you can learn. In ERCOT, we think $/kW approaches are the best place to start, so long as appropriate adjustments are applied. Other, more complex approaches, like percent of TB2 captured or Percent of Perfect, can be a good complement too (particularly in Australia, but more on that another day).
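
To give a flavour of what a benchmark-capture metric involves, the sketch below computes a simplified daily TB2 benchmark from hourly prices: the spread between the average of the two most expensive and two cheapest hours each day, roughly what a two-hour battery doing one well-timed cycle per day could capture. Definitions vary in practice (interval length, efficiency losses, cycling assumptions), and the “actual” revenue line is a placeholder, so treat this as illustrative rather than a standard formula.

```python
import numpy as np
import pandas as pd

def daily_tb2(prices: pd.Series) -> pd.Series:
    """Simplified TB2 benchmark: for each day, the average of the two most
    expensive hours minus the average of the two cheapest ($/MWh spread).
    Ignores efficiency losses and multi-cycle strategies."""
    def tb2(day: pd.Series) -> float:
        ordered = day.sort_values()
        return ordered.iloc[-2:].mean() - ordered.iloc[:2].mean()
    return prices.groupby(prices.index.date).apply(tb2)

# Hypothetical hourly prices for one week: a daily shape plus noise.
idx = pd.date_range("2024-01-01", periods=7 * 24, freq="h")
rng = np.random.default_rng(0)
prices = pd.Series(
    30 + 20 * np.sin(np.arange(len(idx)) / 24 * 2 * np.pi) + rng.normal(0, 5, len(idx)),
    index=idx,
)

benchmark = daily_tb2(prices)   # available spread per day
actual = 0.6 * benchmark        # placeholder: pretend 60% of the spread was captured
capture_pct = 100 * actual.sum() / benchmark.sum()
print(f"Percent of TB2 captured: {capture_pct:.0f}%")
```

Percent of Perfect works the same way in spirit, except the denominator typically comes from a full perfect-foresight optimization of the actual asset rather than a simple price spread.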

How to build a better leaderboard

A good leaderboard reflects the revenue performance driven by what optimizers can actually influence, and controls for what they can’t.

So what do optimizers influence beyond the core trading decisions? An optimizer will likely be a big part of determining which ancillary services can be accessed, managing and scheduling around outages or the risk of outage, navigating and even renegotiating warranty terms, and adapting strategy to market conditions and rule changes. An ideal leaderboard captures these effects.

Then there are the factors optimizers don’t control. That means normalizing revenue data for battery duration, cycling rates (since heavier cycling accelerates degradation), and any unavailability or derates caused by physical performance issues.
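
A minimal sketch of what that normalization might look like, with placeholder adjustment factors. The field names, degradation cost, and availability treatment are all assumptions for illustration; choosing real values is exactly where the methodological judgment comes in.

```python
import pandas as pd

# Placeholder assumption: implied degradation cost per MWh cycled through the battery.
ASSUMED_DEGRADATION_USD_PER_MWH = 3.0

def normalized_usd_per_kw(asset: pd.Series, reference_duration_h: float = 2.0) -> float:
    """Crude $/kW normalization; every adjustment is an illustrative placeholder."""
    # Net off an implied degradation cost for the energy cycled through the asset.
    net_revenue = asset["revenue_usd"] - asset["mwh_cycled"] * ASSUMED_DEGRADATION_USD_PER_MWH

    # Credit back revenue lost to outages or derates outside the optimizer's control.
    net_revenue /= asset["availability"]  # e.g. 0.97

    # Scale toward a reference duration. Linear scaling overstates the correction,
    # since extra hours add diminishing value, so treat it as a starting point only.
    duration_h = asset["capacity_mwh"] / asset["capacity_mw"]
    duration_adj = reference_duration_h / duration_h

    return duration_adj * net_revenue / (asset["capacity_mw"] * 1000.0)

# Usage with a hypothetical asset record:
asset = pd.Series({
    "revenue_usd": 6_100_000, "capacity_mw": 150, "capacity_mwh": 300,
    "mwh_cycled": 80_000, "availability": 0.97,
})
print(f"Normalized revenue: ${normalized_usd_per_kw(asset):.0f}/kW")
```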

At the same time, it’s important to acknowledge what can’t be easily observed or adjusted for. This includes local congestion and nodal price elasticity, the impact of battery size on local prices, risk constraints imposed by the asset owner, and hedging and portfolio optimization decisions. Neither revenue from wholesale market reoptimization nor ancillary service penalties is usually visible in the public data. Keep these in mind when interpreting leaderboard results.

No leaderboard will be perfect. But by normalizing what you can, disclosing what you can’t, and understanding what falls somewhere in between, you can build something far more useful: a tool that sparks better questions and sharper decision-making.

What optimizers bring beyond revenue results

Leaderboards are useful, but even the best ones only tell part of the story. The role of an optimizer goes far beyond maximizing trading revenues.

A strong optimization partner helps bring a battery to commercial operation on time. They navigate testing, qualification, and integration. They manage physical performance over time, working closely with O&M teams to maintain availability and asset health. They also play a role in managing risk—both through market strategy and longer-term hedging. And they serve as a critical source of insight, providing reporting, market context, and strategic input to help asset owners plan what’s next.

These aren’t merely ‘nice-to-haves.’ They drive investment returns and help asset owners meet their strategic objectives just as much as dispatch strategy.

Understanding optimizer performance in context takes more effort, but it leads to better questions, better decisions, and better partnerships.

By Ben Irons, Chief Strategy Officer