A recent Linux kernel update resolves a vulnerability in the net/mlx5 subsystem, tracked as CVE-2025-21675. The flaw lies in how the driver handles its LAG port select structure when creation fails: stale entries could be destroyed twice, crashing the kernel. This post walks through the cause of the bug, the flow that triggers it, and the change that fixes it.

The mlx5_lag_destroy_definers() function always tries to destroy every lag definer recorded in the tt_map. Because the port select structure was not cleared on error, a failed creation attempt left stale values behind. A later failing attempt then ran mlx5_lag_destroy_definers() over those stale values and destroyed the same lag definers a second time, which crashed the kernel.

The following call flow shows how the double destroy happens:

mlx5_lag_port_sel_create()
  mlx5_lag_create_definers()
    mlx5_lag_create_definer()     <- Failed on tt 1
      mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed
mlx5_lag_port_sel_create()
  mlx5_lag_create_definers()
    mlx5_lag_create_definer()     <- Failed on tt 0
      mlx5_lag_destroy_definers() <- definers[tt=0] gets double-destroyed
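
The flow above can be replayed in a small user-space model. This is only a sketch under stated assumptions, not the driver's code: every name prefixed with toy_, the TT_MAX constant, and the static pool are hypothetical stand-ins, and a "destroy" merely sets a flag so that the second destroy can be counted instead of actually corrupting memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TT_MAX 4 /* hypothetical number of traffic types */

/* Toy stand-in for a lag definer. */
struct toy_definer {
    bool destroyed;
};

/* Toy stand-in for the port select structure. */
struct toy_port_sel {
    struct toy_definer *definers[TT_MAX];
};

static struct toy_definer pool[2 * TT_MAX];
static size_t pool_used;
static int double_destroys;

static struct toy_definer *toy_create_definer(void)
{
    assert(pool_used < sizeof(pool) / sizeof(pool[0]));
    struct toy_definer *d = &pool[pool_used++];
    d->destroyed = false;
    return d;
}

/* Walks every slot and destroys whatever pointer it finds there. */
static void toy_destroy_definers(struct toy_port_sel *ps)
{
    for (int tt = 0; tt < TT_MAX; tt++) {
        struct toy_definer *d = ps->definers[tt];
        if (d == NULL)
            continue;
        if (d->destroyed)
            double_destroys++; /* a real kernel would crash here */
        d->destroyed = true;
        /* The slot is not reset, so the pointer goes stale. */
    }
}

/* Creates definers for tt = 0, 1, ... and fails when it reaches fail_tt. */
static int toy_create_definers(struct toy_port_sel *ps, int fail_tt)
{
    for (int tt = 0; tt < TT_MAX; tt++) {
        if (tt == fail_tt) {
            toy_destroy_definers(ps);
            return -1;
        }
        ps->definers[tt] = toy_create_definer();
    }
    return 0;
}

/* Replays the two failing calls; returns how many double destroys occurred. */
int toy_replay_buggy_flow(void)
{
    struct toy_port_sel ps = { { NULL } };

    pool_used = 0;
    double_destroys = 0;
    (void)toy_create_definers(&ps, 1); /* fails on tt 1, definers[0] destroyed */
    (void)toy_create_definers(&ps, 0); /* fails on tt 0, stale definers[0] hit again */
    return double_destroys;
}
```

In this model the second call never creates anything, yet definers[0] is destroyed again because the pointer from the first call was left in place.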

The fix clears the port select structure on error, so that no stale values are left after the definers are destroyed. A subsequent call to mlx5_lag_destroy_definers() then finds nothing left over from the failed attempt, which prevents the double destroy that crashed the kernel.

With the fix applied, the same sequence of calls behaves as follows:

// Clearing the port select structure on error
mlx5_lag_port_sel_create()
  mlx5_lag_create_definers()
    mlx5_lag_create_definer()     <- Failed on tt 1
      mlx5_lag_destroy_definers() <- definers[tt=0] gets destroyed
  // Port select structure is cleared here, so no stale definers remain
mlx5_lag_port_sel_create()
  mlx5_lag_create_definers()
    mlx5_lag_create_definer()     <- Failed on tt 0
      mlx5_lag_destroy_definers() <- no stale definers[tt=0] left, so no double destroy
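
The idea of clearing the structure on the error path can be sketched in user-space C as well. This is an illustration under assumptions, not the actual patch: the toy_ names, TT_MAX, the storage array, and the clear_port_sel label are all hypothetical, and the sketch only models the "wipe everything before returning an error" step using the kernel's usual goto-based unwinding style.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TT_MAX 4 /* hypothetical number of traffic types */

/* Toy stand-in for the port select structure. */
struct toy_port_sel {
    int *definers[TT_MAX]; /* non-NULL means "believed to be live" */
};

static int storage[TT_MAX];

/* Counts the slots that still look live to a later destroy pass. */
int toy_live_slots(const struct toy_port_sel *ps)
{
    int live = 0;

    for (int tt = 0; tt < TT_MAX; tt++)
        if (ps->definers[tt] != NULL)
            live++;
    return live;
}

/*
 * Fills the slots in order and fails when it reaches fail_tt.
 * Passing fail_tt == TT_MAX means "never fail".
 */
int toy_port_sel_create(struct toy_port_sel *ps, int fail_tt)
{
    int err = 0;

    assert(fail_tt >= 0 && fail_tt <= TT_MAX);

    for (int tt = 0; tt < TT_MAX; tt++) {
        if (tt == fail_tt) {
            err = -1;
            goto clear_port_sel;
        }
        ps->definers[tt] = &storage[tt];
    }
    return 0;

clear_port_sel:
    /* Real code would release the definers first; then wipe the
     * whole structure so that no stale pointer survives the error. */
    memset(ps, 0, sizeof(*ps));
    return err;
}
```

After a failed toy_port_sel_create() call, toy_live_slots() reports zero, so a later destroy pass has nothing stale to act on. Wiping the whole structure, rather than resetting individual slots, also covers any other bookkeeping fields that a partially completed setup might have touched.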

With this change, a failed port select creation no longer leaves stale lag definers in the tt_map, so they cannot be destroyed twice. For more detail, refer to the upstream kernel commit that resolved the issue, titled "net/mlx5: Clear port select structure when fail to create".

In conclusion, CVE-2025-21675 was a double-destroy bug in the net/mlx5 subsystem of the Linux kernel that could crash the system, and it has now been resolved by clearing the port select structure on error. Users are encouraged to update to a kernel release that includes the fix.

Timeline

Published on: 01/31/2025 12:15:28 UTC
Last modified on: 02/04/2025 15:30:22 UTC