Managers often assess similar performance very differently, even when results and behaviours appear nearly identical.
This inconsistency is more common than many organisations realise, and it can have a significant impact on fairness, trust, and decision-making in the workplace.
In this final article in the Rethinking Performance series, Camille Rabier, Consultant at 21st Century, explores why these differences occur and how organisations can create more consistent, credible approaches to evaluating performance.
When judgement diverges
Consider two managers assessing the performance of their respective teams in different contexts. Both teams have delivered comparable financial outcomes and exhibit similar patterns of behaviour, supported by the same set of indicators, yet the managers’ evaluations differ. The same information leads to different conclusions, reflecting differences in how performance is interpreted. One manager may place greater emphasis on the strength of the outcome, viewing the behaviour as sufficient given the result achieved. The other may focus more closely on how the outcome was delivered, identifying gaps in consistency, decision-making, or the way trade-offs were managed.
This underscores that behavioural evaluation relies on interpretation and human judgement, and that without alignment this judgement varies from one manager to the next. Managers are required to identify what occurred and to judge its relevance, quality, and impact in context, each through their own lens. These differences, although expected, cannot remain unresolved. If similar performance leads to materially different evaluations, the system loses credibility, creating uncertainty about what is valued and raising questions of fairness.
The challenge, therefore, is to determine how these differences in interpretation should be reconciled across the organisation.
Why judgement diverges
Divergence in judgement reflects differences in how managers interpret and weigh the same evidence. Until criteria are fully aligned, managers retain discretion to decide what matters most in each situation, and differences typically arise from how they:
- Weight performance: some prioritise outcomes, particularly where delivery is critical, while others place greater emphasis on how those outcomes are achieved, especially where behaviours have longer-term implications.
- Interpret behaviour: indicators signal that a behaviour is present, but not its quality. What one manager sees as effective collaboration another may view as partial or inconsistent, depending on their expectations and experience.
- Account for context: the same behaviour may be appropriate in one situation and less effective in another. Without a shared understanding of how context should influence evaluation, its relevance and impact may be judged differently.
- Assess over time: some managers focus on immediate delivery, while others consider longer-term consequences such as sustainability, capability building, or team dynamics.
These differences are not inherently problematic; they reflect the role of judgement in behavioural evaluation. However, when they are not surfaced and aligned, they lead to materially different outcomes for similar performance, undermining the credibility of the system over time. Calibration is therefore needed to ensure that judgement is applied consistently.
Calibrating judgement in practice
In practice, this means that when managers arrive at different evaluations, the focus should not be on reconciling scores, but on aligning the reasoning behind them. Variation in judgement is expected; it reflects the inherently interpretive nature of behavioural evaluation. Consistency is not achieved by removing differences, but by calibrating how judgement is applied. This helps ensure that managers use a shared set of criteria and a consistent logic when evaluating performance.
Alignment does not require managers to reach identical conclusions; it requires them to apply a consistent approach to how performance is interpreted and weighted. Doing so means making judgement explicit: what counts as evidence, how it is interpreted, and how it is weighted in reaching a conclusion. The focus therefore shifts from conclusions alone to the reasoning behind them.
This process creates a common structure for evaluating performance, enabling meaningful comparison across different situations. Differences can then be examined as differences in interpretation, rather than as simple disagreement. The objective is not to determine who is right, but to assess whether similar situations are being evaluated consistently and to refine expectations where they are not. Through structured discussion, shared principles begin to emerge. Over time, this builds a shared understanding of what constitutes effective performance in practice, and managers begin to align not only how they evaluate performance, but how they interpret behaviour, account for context, and weigh trade-offs.
Consistency, therefore, does not come from uniformity, but from aligning how judgement is applied.
Implications for performance systems
Consistency in performance systems depends on how judgement is applied in practice across the organisation, not simply on the design of the model itself. Without consistent application, the model risks becoming superficial, providing structure in form while judgement continues to be applied inconsistently. The focus therefore shifts from defining performance to how it is interpreted and evaluated in practice. Indicators are necessary to make behaviour observable and evaluable, but they are insufficient without mechanisms to ensure that judgement is applied consistently against shared principles. Performance systems must therefore do more than capture outcomes and behaviours; they must integrate both and make the reasoning behind evaluations visible, including how behaviour is interpreted, how context is accounted for, and how trade-offs are weighed.
This requires creating space for calibration, embedding structured discussion into evaluation processes, and establishing shared expectations for how evidence is interpreted and weighted in context. Through this, judgement is structured and aligned. Without alignment, even well-defined models will produce inconsistent outcomes. With it, variation in judgement becomes explainable, comparable, and defensible, strengthening fairness by ensuring that similar performance is evaluated through consistent reasoning.
Conclusion
The challenge of evaluating performance completely is therefore not only to define performance but also to ensure that it can be assessed consistently and fairly in practice. While behaviour can be made more visible through structured models and other tools, performance ultimately reflects both outcomes and how those outcomes are achieved, and its evaluation depends on human judgement. Differences in judgement are not a flaw to be eliminated, but they remain the variable that drives inconsistency when left unaligned. Performance systems should therefore aim to shape how judgement is applied.
It is only by structuring and aligning this judgement that performance can be evaluated more consistently, credibly, and meaningfully.


