Google has recently open-sourced Metrax, a JAX-based library that delivers standardized and high-performance implementations of evaluation metrics for machine learning models across classification, regression, natural language processing, computer vision, and audio domains.
Metrax addresses a notable gap in the JAX ecosystem. According to Google, this gap has led many teams migrating from TensorFlow to JAX to independently implement common metrics such as accuracy, F1 score, and root mean square error, resulting in duplicated effort and inconsistent practices.
While implementing metrics may seem straightforward, it becomes significantly more complex at the scale of distributed training and evaluation typical of data centers.
The library offers a comprehensive suite of pre-built evaluation metrics tailored to various ML model types, including classification, regression, recommendation systems, image processing, and audio analysis, with built-in support for distributed, large-scale setups. For vision tasks, Metrax includes specialized metrics such as Intersection over Union (IoU), Signal-to-Noise Ratio (SNR), and Structural Similarity Index (SSIM). It also includes NLP measures such as perplexity, BLEU, and ROUGE.
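To make the vision side concrete, the snippet below is a minimal, illustrative IoU for binary segmentation masks written in plain JAX; it is not Metrax's implementation, and the function name and epsilon handling are assumptions for the sketch.

import jax.numpy as jnp

# Illustrative IoU (not Metrax's implementation): the overlap between a
# predicted mask and a ground-truth mask, divided by their union.
def iou(pred_mask, true_mask, eps=1e-7):
    intersection = jnp.sum(pred_mask & true_mask)
    union = jnp.sum(pred_mask | true_mask)
    return intersection / (union + eps)

pred = jnp.array([[1, 1, 0], [0, 1, 0]], dtype=bool)
true = jnp.array([[1, 0, 0], [0, 1, 1]], dtype=bool)
print(iou(pred, true))  # 2 overlapping pixels / 4 in the union = 0.5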
One of Metrax’s core objectives, Google emphasizes, is ensuring all metrics are correctly implemented and aligned with industry best practices. Where applicable, the library leverages advanced JAX features like vmap and jit to optimize performance. These capabilities are particularly useful for "at K" style metrics, enabling parallel computation across multiple K values and therefore faster, more thorough model evaluations.
You can use PrecisionAtK to evaluate model precision at several K values (e.g., K=1, K=8, K=20) in a single pass, eliminating the need to call PrecisionAtK repeatedly for each individual K value.
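The pattern behind this is easy to sketch outside the library. The helper below is a hypothetical stand-in for Metrax's PrecisionAtK, not its actual implementation: a rank-mask formulation keeps array shapes static, so jax.vmap can batch the computation over all K values in one compiled call.

import jax
import jax.numpy as jnp

# Hypothetical precision@K (not Metrax's API): rank items by score, then
# measure the fraction of relevant items among the top K using a mask,
# which keeps shapes static and therefore vmap-friendly.
def precision_at_k(scores, labels, k):
    order = jnp.argsort(-scores)   # item indices sorted by descending score
    ranks = jnp.argsort(order)     # rank of each item (0 = highest score)
    in_top_k = ranks < k           # mask selecting the top-K items
    return jnp.sum(labels * in_top_k) / k

scores = jnp.array([0.9, 0.1, 0.8, 0.4, 0.7])
labels = jnp.array([1.0, 0.0, 0.0, 1.0, 1.0])
ks = jnp.array([1, 3, 5])

# vmap evaluates every K in parallel; jit compiles the batched computation.
at_k = jax.jit(jax.vmap(precision_at_k, in_axes=(None, None, 0)))
print(at_k(scores, labels, ks))  # [1.0, 0.667, 0.6]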
A DevOps engineer writing on Substack under the name Neural Foundry commented:
The ability to compute multiple K values in one pass is a game-changer for ranking systems. This kind of standardization has been long overdue—every time I switch projects, I end up rewriting metric utilities from scratch. The API also appears clean and intuitive. I'm curious whether they've benchmarked against custom implementations for specific use cases, such as large-scale recommendation pipelines.
The following code example demonstrates how to compute a precision metric given predictions and labels, with an optional threshold to convert probabilistic outputs into binary decisions; small example inputs are added here so the snippet runs on its own:
import jax.numpy as jnp
import metrax

# Example inputs (illustrative values): predicted probabilities and binary labels.
predictions = jnp.array([0.9, 0.3, 0.7, 0.1])
labels = jnp.array([1.0, 0.0, 1.0, 1.0])

# Directly compute the metric state.
metric_state = metrax.Precision.from_model_output(
    predictions=predictions,
    labels=labels,
    threshold=0.5,
)

# The result is then readily available by calling compute().
result = metric_state.compute()
print(result)
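If Metrax follows the CLU metrics convention that its state/compute split suggests, per-batch states can also be combined before the final compute(). The sketch below assumes the states expose CLU's merge method, which is worth verifying against the library's documentation:

import jax.numpy as jnp
import metrax

# Assumed pattern: accumulate per-batch metric states with merge(), a CLU
# metrics convention; verify the method name against Metrax's documentation.
batches = [
    (jnp.array([0.9, 0.2]), jnp.array([1.0, 0.0])),
    (jnp.array([0.4, 0.8]), jnp.array([0.0, 1.0])),
]

state = None
for predictions, labels in batches:
    batch_state = metrax.Precision.from_model_output(
        predictions=predictions,
        labels=labels,
        threshold=0.5,
    )
    state = batch_state if state is None else state.merge(batch_state)

print(state.compute())  # precision over all batches, not a mean of per-batch means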
In addition, Google released a companion notebook featuring a full set of practical examples, including multi-device scaling and integration with Flax NNX, a streamlined API designed to simplify building, inspecting, debugging, and analyzing neural networks in JAX.
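For a flavor of what multi-device evaluation involves (a generic JAX sketch, not code from Google's notebook or Metrax itself), each device can compute a local statistic that is then aggregated across the device axis:

import jax
import jax.numpy as jnp

# Generic multi-device pattern (not Metrax-specific): each device scores its
# own data shard, then pmean averages the local accuracies across all devices.
def sharded_accuracy(preds, labels):
    local_acc = jnp.mean((preds > 0.5) == labels)
    return jax.lax.pmean(local_acc, axis_name="devices")

n = jax.local_device_count()
preds = jnp.linspace(0.0, 1.0, 8 * n).reshape(n, 8)  # one shard per device
labels = jnp.tile(jnp.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=bool), (n, 1))

acc = jax.pmap(sharded_accuracy, axis_name="devices")(preds, labels)
print(acc)  # the aggregated accuracy, replicated on every device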