Inside Amazon's "tokenmaxxing" scandal lies a textbook warning about what happens when organisations measure AI adoption rather than AI value
Amazon employees are using an internal AI tool to run unnecessary, low-value tasks – not because the work needs doing, but because the activity inflates their scores on a company leaderboard tracking artificial intelligence usage.
The practice, which workers have taken to calling "tokenmaxxing," was reported yesterday by the Financial Times and has since circulated as a case study across the technology industry. It exposes, in unusually vivid terms, what happens when a productivity metric becomes a target: the metric stops measuring productivity and starts measuring the human capacity to game it.
The episode is not merely a Silicon Valley curiosity. For HR leaders who are currently designing, implementing or being asked to justify AI adoption programs – which is to say, most of them – it is a live demonstration of a failure mode that is easy to build and hard to reverse.
What Amazon built, and what went wrong
Amazon had been widely deploying MeshClaw, an in-house agentic AI product that allows employees to create software agents capable of connecting to workplace tools and completing tasks on a user's behalf. The bot can initiate code deployments, triage emails and interact with applications including Slack, according to the Financial Times. An internal memo described it in terms that will be familiar to anyone who has sat through an AI all-hands briefing: it "dreams overnight to consolidate what it learned, monitors your deployments while you're in meetings and triages your email before you wake up."
More than three dozen Amazon engineers worked on the tool. The company positioned it as empowering "thousands of Amazonians to automate repetitive tasks each day."
But Amazon had also introduced targets requiring more than 80 per cent of its developer workforce to use AI tools each week, and had begun tracking token consumption – the units of text AI models ingest and generate, essentially a meter of how much the tools are being run rather than what they produce – on internal leaderboards. Team-wide statistics were initially visible to all staff before being restricted so that only employees and their managers could view them.
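To see why a raw token meter invites gaming, consider what such a leaderboard actually computes. The sketch below is a hypothetical illustration (the names, events and numbers are invented, and it is not a description of Amazon's internal system): it aggregates token counts per employee exactly as a billing API would report them, with no field anywhere for whether the work was worth doing.

```python
# Hypothetical sketch of a token-consumption leaderboard.
# Illustrative only; not Amazon's internal system.
from collections import defaultdict

# Each usage event as a billing API might report it:
# (employee, tokens consumed). Note what is absent: any
# signal about whether the underlying task was useful.
usage_events = [
    ("alice", 1_200),    # reviewed a real pull request
    ("alice", 900),      # summarised a design document
    ("bob",   50_000),   # agent re-ran a pointless deployment check
    ("bob",   75_000),   # agent re-triaged already-triaged email
]

totals = defaultdict(int)
for employee, tokens in usage_events:
    totals[employee] += tokens   # the metric records volume, nothing else

# Rank purely by consumption: the gamed account "wins".
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
for rank, (employee, tokens) in enumerate(ranked, start=1):
    print(f"{rank}. {employee}: {tokens:,} tokens")
```

Run it and the padded account finishes first with 125,000 tokens against 2,100 productive ones. Nothing in the computation can tell the two apart – which is precisely the gap tokenmaxxing exploits.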
The result was predictable to anyone who has studied organisational behaviour. "There is just so much pressure to use these tools," one Amazon employee told the Financial Times. "Some people are just using MeshClaw to maximise their token usage." Another said the data was being watched regardless of official policy: "Managers are looking at it. When they track usage it creates perverse incentives and some people are very competitive about it."
Amazon told staff that token statistics would not be used in performance evaluations. Workers did not believe it.
A pattern across Silicon Valley
Amazon is not alone. Meta employees engaged in similar tokenmaxxing behaviour, competing on an internal leaderboard called "Claudeonomics" that ranked the company's roughly 85,000 workers by token consumption. In a 30-day window, total usage on the dashboard exceeded 60 trillion tokens. The leaderboard was taken down after reporting by The Information, but Meta's CTO Andrew Bosworth publicly endorsed the underlying logic – pointing to his best engineer spending the equivalent of their salary in AI tokens as evidence of a productivity multiplier.
At Microsoft, a senior leader sent an internal memo stating AI use was "no longer optional, it's core to every role and every level." A company spokesperson later clarified there was "no formal review of an employee's AI usage" – the kind of clarification issued when an original message lands harder than intended.
A May 2026 CNBC report noted that "almost every Fortune 500 is tracking overall AI usage," with tokens, prompt counts, licence activations and seat-utilisation rates becoming standard surveillance inputs alongside older metrics such as badge swipes and keyboard activity.
The financial stakes behind the pressure are enormous. Combined 2026 capital expenditure from Amazon, Microsoft, Alphabet and Meta is tracking between $650 billion and $700 billion, with some Wall Street projections exceeding $1 trillion for 2027. Every executive who has made those commitments has an investor relations problem if adoption numbers look weak. Token counts are the answer – unless employees are manufacturing the counts themselves.
The HR problem at the centre of this
The Amazon story is being described by analysts as a textbook case of Goodhart's Law: the principle that when a measure becomes a target, it ceases to be a good measure. The moment token consumption was tied to leaderboards that managers could see, it stopped measuring AI productivity and started measuring competitive anxiety.
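Goodhart's Law is easy to state and easier to demonstrate. The toy simulation below uses entirely invented numbers to model the same employee-weeks before and after token counts become a visible target: once people optimise the number itself, the statistical link between tokens burned and value delivered collapses.

```python
# Hypothetical toy model of Goodhart's Law applied to a token metric.
# All figures are invented for illustration.
import random
from statistics import correlation  # Python 3.10+

random.seed(7)

def employee_week(metric_is_target: bool) -> tuple[float, float]:
    """Return (tokens used, value delivered) for one employee-week."""
    value = random.uniform(0, 10)  # genuine output, which nobody measures
    if metric_is_target:
        # Leaderboard visible: tokens are padded regardless of value.
        tokens = random.uniform(80_000, 120_000)
    else:
        # Metric unwatched: tokens roughly track genuine use.
        tokens = max(0.0, value * 1_000 + random.uniform(-500, 500))
    return tokens, value

for label, gaming in [("before leaderboard", False), ("after leaderboard", True)]:
    tokens, values = zip(*(employee_week(gaming) for _ in range(200)))
    print(f"{label}: corr(tokens, value) = {correlation(tokens, values):+.2f}")
```

Nothing about the meter changes between the two runs; only the behaviour of the people being metered does.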
HR leaders designed this. Not maliciously – but the incentive structure that produced tokenmaxxing is a people management structure, not a technology one. Weekly usage targets, visible leaderboards, ambiguous signals about whether the numbers feed into performance reviews: these are HR design choices, and they have produced a predictable human response.
As HRD Australia has reported, only 4% of employers see employee resistance as a barrier to AI adoption – yet nearly a quarter of workers say they would consider leaving a job if forced to use AI tools in ways they did not support. The gap between those two numbers describes the same dynamic playing out at Amazon: employees complying visibly and resisting quietly. Tokenmaxxing is simply a more industrious version of that quiet resistance.
This matters in Australia because the Albanese government's approach to workplace AI regulation is built, in significant part, on the premise that consultation and transparency will prevent exactly this kind of dysfunction. HRD has reported on the government's growing regulatory agenda, with Minister Rishworth explicitly noting that "always on" performance metrics that erode psychological safety are precisely the kinds of workplace harms she is seeking to address. Tokenmaxxing is what that harm looks like in practice.
The Australian legal dimension
There is a specific legal dimension to this story that Australian HR leaders should not overlook.
Under the Fair Work Act, employees have rights to consultation on major workplace changes, including changes to the way work is performed. The deployment of AI tools that materially alter how performance is measured – including through token consumption leaderboards – may trigger those consultation obligations, whether or not organisations have thought of them in those terms. Modern awards and enterprise agreements extend those duties further.
A parliamentary inquiry into workplace digital transformation has already recommended that AI systems used in employment-related decisions be classified as high-risk, with stronger requirements around consultation, transparency and bias auditing. If a token consumption leaderboard is informing – even informally – decisions about who is performing and who is not, it sits in that high-risk category. The fact that Amazon told employees their token data would not feed into reviews, while workers widely disbelieved it, is precisely the transparency failure that regulators are preparing to act on.
As HRD has reported, Australian CHROs now need to audit their AI-in-people-processes footprint for Fair Work exposure – identifying every place where AI is informing decisions about the workforce and assessing both the human oversight in place and whether consultation obligations have been met.
The measurement problem is also a security problem
Multiple Amazon employees told the Financial Times they were alarmed by the security profile of MeshClaw itself. The tool was granted permission to act on a user's behalf – initiating code deployments, interacting with internal systems, sending communications. One employee said: "The default security posture terrifies me. I'm not about to let it go off and just do its own thing."
This concern sits alongside the gaming problem rather than beneath it. An AI agent that employees are running on unnecessary tasks to inflate usage scores is an agent taking real actions in real systems – creating code deployments that did not need to happen, sending emails that did not need to be sent. The perverse incentive structure does not just produce misleading productivity data; it produces real operational noise.
HRD has reported extensively on shadow AI risks in Australian workplaces, where only 26% of organisations have fully documented and enforced AI governance policies, even as 94% of operations professionals use AI to assist with tasks. Tokenmaxxing is a corporate-sanctioned version of exactly this ungoverned activity – AI running in the background, taking actions, generating data, with no clear accountability for what it produces.
Research covered by HRD found that for every 10 hours of efficiency gained through AI, nearly four are lost correcting, clarifying or rewriting AI-generated content – a net gain closer to six hours than the headline figure suggests. Add leaderboard pressure that rewards running AI more rather than running it better, and that rework figure compounds further.
What Australian HR leaders should do now
The Amazon episode arrives at a moment when CEOs are facing board pressure to deliver measurable AI-driven outcomes, and that pressure flows downstream to people teams through KPIs, adoption targets and the implicit understanding that usage statistics will be scrutinised. That pressure is not going away. But the way it is currently being transmitted into the workforce is producing the opposite of what it intends.
Several practical observations for people professionals:
Measuring usage is not measuring value. Token counts, weekly active users and seat-utilisation rates tell you whether employees are running the tools. They tell you nothing about whether the tools are producing better work. HRD has reported that only 2.7% of the Australian-comparable workforce qualify as genuine "AI practitioners" – people who have embedded AI into their workflows and are seeing significant productivity gains. The remaining 97.3% are using AI in shallow, low-value ways. A token consumption leaderboard does nothing to change that. It may make it worse.
Leaderboards drive performance theatre, not performance. The mechanism at Amazon is structurally identical to the Wells Fargo fake accounts scandal – aggressive targets tied to evaluation producing the appearance of success regardless of whether anything of value was delivered. HR can see this coming. It is much harder to reverse once the behaviour is embedded.
The ambiguity about whether metrics feed into reviews is the problem, not the solution. Amazon told employees that token statistics would not inform performance evaluations. Workers did not believe it, and behaved accordingly. HRD's analysis of shadow AI consistently finds that employees act on what they believe management is watching, not on what policy documents say. If there is any possibility that a metric feeds into decisions about someone's career, they will optimise for it.
Transparency is the variable that changes behaviour. Reporting by HRD found that 59% of Australian organisations must demonstrate AI impact within the next 12 months, yet most initiatives continue to prioritise automation over workforce considerations. The organisations performing best on AI adoption are those in which employees understand why the technology is being deployed and what it is expected to produce – not those with the most aggressive targets.
Regulators are watching. Australia's regulatory trajectory on workplace AI is clearly moving towards greater transparency, consultation and accountability. An organisation that builds tokenmaxxing-style incentive structures now, before mandatory consultation requirements are in place, is creating exactly the kind of paper trail that will be uncomfortable when those requirements arrive.
Amazon spent $200 billion this year to make AI central to how its employees work. The tokenmaxxing problem did not cost a dollar to build. It came free, with the leaderboard.