Codex: OAuth Token Refresh Fails In Rmcp Streamable HTTP
Introduction
This article examines an issue with the rmcp streamable HTTP transport when it is used with OAuth in the Codex environment: OAuth tokens are not refreshed before they expire, so connections fail, particularly long-running SSE (Server-Sent Events) connections. Users are then forced to re-authenticate frequently. The sections below cover the details of the problem, the steps to reproduce it, the expected and actual behavior, and potential solutions.
Understanding the OAuth Token Refresh Problem
When the rmcp streamable HTTP transport is used with OAuth, the cached OAuthTokenResponse retains its original expires_in duration indefinitely. The AuthorizationManager::get_access_token() function refreshes the token only when this duration reaches zero, and because the cached value never counts down, that condition is never met. Tokens therefore still look valid locally and do not trigger a refresh, even after the server has expired or rotated them. As a result, long-running SSE connections, such as those used by the Notion MCP server, fail abruptly with an Auth required error when the provider rotates the access token and refresh token pair. The crux of the problem is that the client cannot proactively refresh tokens in step with the server-side expiry, which disrupts continuous data streams.
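The failure mode described above can be illustrated with a minimal sketch. The CachedToken type and its methods are hypothetical stand-ins written for this article, not rmcp's actual types; the sketch only assumes that the cached duration is never updated after the token is received.

```rust
use std::time::{Duration, Instant};

/// Hypothetical stand-in for the cached token response (not rmcp's real type).
struct CachedToken {
    /// Lifetime reported by the server when the token was issued.
    expires_in: Duration,
    /// When the token was received. The buggy check never looks at this.
    received_at: Instant,
}

impl CachedToken {
    /// Mirrors the behavior described above: refresh only when the stored
    /// duration is zero. Nothing decrements `expires_in`, so this stays
    /// false for any token issued with a non-zero lifetime.
    fn needs_refresh_buggy(&self) -> bool {
        self.expires_in.is_zero()
    }

    /// What the check needs to consider instead: time elapsed since receipt.
    fn is_expired_at(&self, now: Instant) -> bool {
        now.duration_since(self.received_at) >= self.expires_in
    }
}
```

With a one-hour token, needs_refresh_buggy() still returns false two hours after issuance, while is_expired_at() correctly reports the token as expired.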
Steps to Reproduce the Issue
To replicate this issue, follow these steps:
- Initiate the login process using the command:
    codex mcp login notion
- Allow the SSE client to remain idle for approximately one hour. Alternatively, you can run the ignored test:
    cargo test -p codex-rmcp-client notion_tests::notion_refreshes_after_tampered_access_token -- --ignored
- Observe that the subsequent notion.notion-fetch call fails immediately with an Auth required error during the initialization handshake. This failure indicates that the client did not refresh the token before the server invalidated it.
This sequence of steps clearly demonstrates the scenario where the lack of token refresh leads to authentication failures, disrupting the intended seamless operation of the application.
Expected vs. Actual Behavior
Expected Behavior:
The client should proactively refresh the OAuth tokens using the stored refresh_token as soon as the server-side expiry (or a small skew window) is reached. This ensures that the session continues seamlessly without interruption. The refresh process should be transparent to the user, maintaining a continuous and authenticated connection.
Actual Behavior:
In reality, the AuthorizationManager continues to report the original expires_in value, preventing the rmcp client from initiating a token refresh. As a result, when the access token expires on the server-side, every request, including the handshake, fails with an Auth required error. This forces the user to manually re-run the codex mcp login notion command to re-authenticate and re-establish the connection. This behavior is far from ideal, as it disrupts the user experience and introduces unnecessary friction.
Environmental Context
The issue occurs under the following environment:
- codex-rmcp-client @ 004264457a46d0424b58e2200fe3e472c0e39623
- rmcp 0.8.5
- MCP server: https://mcp.notion.com/mcp
These versions and the specified MCP server configuration provide the context in which the OAuth token refresh problem manifests. Understanding the environment is crucial for diagnosing and resolving the issue effectively.
Proposed Solution
A potential solution involves tracking the expires_at timestamp, refreshing the token when the remaining lifetime falls below a defined skew (e.g., 30 seconds), and immediately persisting the refreshed credential set. This approach ensures that the client proactively refreshes the token before it expires on the server, preventing authentication failures. Implementing this solution would significantly improve the reliability and user experience of applications relying on rmcp streamable HTTP transport with OAuth.
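The three parts of this proposal (an absolute expires_at, a skew check, and immediate persistence) can be sketched as follows. This is an illustration under stated assumptions, not the rmcp or Codex implementation: Credentials, CredentialStore, MemoryStore, and ensure_fresh are hypothetical names, the refresh call is passed in as a closure, and std::time is used so the example has no external dependencies.

```rust
use std::time::{Duration, SystemTime};

/// Hypothetical credential set; field names are illustrative, not rmcp's.
#[derive(Clone, Debug, PartialEq)]
struct Credentials {
    access_token: String,
    refresh_token: Option<String>,
    /// Absolute deadline, computed once when the token response arrives.
    expires_at: SystemTime,
}

impl Credentials {
    fn from_response(
        access_token: &str,
        refresh_token: Option<&str>,
        expires_in_secs: u64,
        now: SystemTime,
    ) -> Self {
        Self {
            access_token: access_token.to_owned(),
            refresh_token: refresh_token.map(str::to_owned),
            expires_at: now + Duration::from_secs(expires_in_secs),
        }
    }

    /// True once the remaining lifetime is at or below `skew`.
    fn needs_refresh(&self, now: SystemTime, skew: Duration) -> bool {
        now + skew >= self.expires_at
    }
}

/// Hypothetical persistence hook (keyring, file, and so on).
trait CredentialStore {
    fn save(&mut self, creds: &Credentials);
}

/// In-memory store, used here only to make the sketch self-contained.
#[derive(Default)]
struct MemoryStore {
    saved: Vec<Credentials>,
}

impl CredentialStore for MemoryStore {
    fn save(&mut self, creds: &Credentials) {
        self.saved.push(creds.clone());
    }
}

/// Returns credentials that are safe to use at `now`, refreshing and
/// persisting them first when they are inside the skew window.
fn ensure_fresh<S, F>(
    current: Credentials,
    now: SystemTime,
    skew: Duration,
    store: &mut S,
    refresh: F,
) -> Result<Credentials, String>
where
    S: CredentialStore,
    F: FnOnce(&Credentials) -> Result<Credentials, String>,
{
    if !current.needs_refresh(now, skew) {
        return Ok(current);
    }
    let fresh = refresh(&current)?;
    // Persist before handing the new token to any caller.
    store.save(&fresh);
    Ok(fresh)
}
```

Passing the current time in as a parameter, rather than reading the clock inside needs_refresh, is a design choice that keeps the expiry logic easy to test without waiting for real time to pass.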
Detailed Analysis and Potential Causes
This section looks more closely at the likely underlying causes of the OAuth token refresh failure and at its impact.
Root Causes
- Incorrect expires_in Handling: The primary cause appears to be the way the AuthorizationManager handles the expires_in value. Instead of converting it into an expiry time measured against the current time, it retains the original duration from the initial OAuth token response. As time passes, the client therefore keeps treating the token as valid with its full original lifetime remaining, regardless of what is happening on the server side.
- Lack of Server-Side Expiry Awareness: The client does not appear to have any mechanism for checking the token's validity against the server or for accounting for server-side token rotation. OAuth deployments often provide a token introspection endpoint (RFC 7662) or other ways for the server to signal that a token has been invalidated. Without such a mechanism, the client has no view of the token's true validity.
- Insufficient Skew Window: Even if the client did refresh the token based on the expires_in value, a small skew window is needed to account for clock drift between the client and the server. Without one, the client might use the token right up until the moment it believes the token expires, creating a race in which the server has already invalidated it.
Impact Assessment
The consequences of this OAuth token refresh failure can be quite significant, especially for applications that rely on long-lived SSE connections:
- Service Disruption: As demonstrated in the reproduction steps, long-running connections are prone to abrupt termination due to authentication failures. This can lead to data loss, incomplete transactions, or a degraded user experience.
- Increased User Friction: Requiring users to manually re-authenticate every time the token expires is inconvenient and undermines the seamless authentication that refresh tokens are meant to provide. Users may become frustrated with the application and seek alternatives.
- Security Implications: The failure is not itself a security vulnerability, but it works against good token hygiene. Short-lived access tokens limit how long a compromised token can be exploited for unauthorized access, and that protection depends on refresh working reliably.
Proposed Solution in Detail
The proposed solution aims to address the identified root causes and mitigate the impact of the OAuth token refresh failure. Here's a more detailed breakdown of the key components:
- Tracking expires_at: Instead of relying solely on the expires_in duration, the client should calculate and store the absolute expiry timestamp (expires_at) from the current time and the expires_in value received from the OAuth server. This gives a more accurate representation of when the token is expected to expire. For example, using chrono:
    use chrono::{Duration, Utc};
    let expires_at = Utc::now() + Duration::seconds(oauth_token.expires_in as i64);
- Proactive Refresh with Skew: Before making an API request, the client should check whether the current time is approaching the expires_at timestamp. A skew window (e.g., 30 seconds) should be used so that the token is refreshed before it actually expires on the server.
    let now = Utc::now();
    let skew = Duration::seconds(30);
    if now + skew >= expires_at {
        // Refresh the token
    }
- Immediate Persistence: After successfully refreshing the token, the client should immediately persist the new access token, the refresh token (if applicable), and the updated expires_at timestamp to a secure storage location. This ensures that the client has the latest credentials available even if the application restarts.
- Error Handling and Retry: The refresh process should handle failures such as network connectivity problems or invalid refresh tokens. Transient failures should be retried, possibly with exponential backoff, before the client gives up and prompts the user to re-authenticate. A rejected refresh token cannot be fixed by retrying and should lead directly to re-authentication.
- Consider Token Introspection: For more robust token validation, consider OAuth 2.0 token introspection (RFC 7662) where the authorization server supports it. The access token is sent to a dedicated introspection endpoint on the authorization server, which returns whether the token is active along with its associated metadata. This allows the token's status to be verified in real time.
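The retry behavior in the list above could look roughly like the sketch below. RefreshError and refresh_with_backoff are hypothetical names introduced for illustration; the split between transient and rejected failures is an assumption about how a client might classify errors (for example, treating an invalid_grant response as a rejection). The sleep function is injected so that the backoff schedule can be exercised without real delays.

```rust
use std::time::Duration;

/// Hypothetical classification of a failed refresh attempt.
#[derive(Debug, PartialEq)]
enum RefreshError {
    /// Network problem or temporary server error: worth retrying.
    Transient(String),
    /// The refresh token was rejected (e.g. `invalid_grant`): retrying
    /// cannot help, so the user has to log in again.
    Rejected(String),
}

/// Calls `attempt` up to `max_attempts` times, doubling the delay after
/// each transient failure. Stops immediately on a rejection.
fn refresh_with_backoff<T, A, S>(
    max_attempts: u32,
    base_delay: Duration,
    mut attempt: A,
    mut sleep: S,
) -> Result<T, RefreshError>
where
    A: FnMut() -> Result<T, RefreshError>,
    S: FnMut(Duration),
{
    let mut delay = base_delay;
    let mut last_err = RefreshError::Transient("no attempts made".into());
    for n in 1..=max_attempts {
        match attempt() {
            Ok(value) => return Ok(value),
            Err(RefreshError::Rejected(msg)) => return Err(RefreshError::Rejected(msg)),
            Err(err) => {
                last_err = err;
                // Only wait if another attempt will follow.
                if n < max_attempts {
                    sleep(delay);
                    delay = delay.saturating_mul(2);
                }
            }
        }
    }
    Err(last_err)
}
```

In a blocking client, std::thread::sleep can be passed as the sleep argument; an async client would need an async variant of the same loop.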
Implementation Considerations
When implementing the proposed solution, keep the following considerations in mind:
- Secure Storage: Ensure that the access token, refresh token, and expires_at timestamp are stored securely using appropriate encryption and access control mechanisms.
- Thread Safety: If the token refresh process is performed in a multithreaded environment, ensure that it is properly synchronized to prevent race conditions and data corruption.
- Test Coverage: Thoroughly test the implementation to verify that the token refresh process works correctly under various scenarios, including token expiry, network failures, and server-side token revocation.
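For the thread-safety point, one possible approach is to guard the token state with a mutex and hold the lock across both the expiry check and the refresh, so that concurrent callers cannot each spend the same refresh token. The sketch below is a synchronous illustration only: SharedTokens and TokenState are hypothetical names, the refresh call is a closure supplied by the caller, and an async client would likely need an async-aware lock rather than std::sync::Mutex.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hypothetical token state protected by the lock.
struct TokenState {
    access_token: String,
    expires_at: Instant,
}

/// Hypothetical shared handle; wrap it in an `Arc` to share across threads.
struct SharedTokens {
    state: Mutex<TokenState>,
    skew: Duration,
}

impl SharedTokens {
    fn new(access_token: &str, expires_at: Instant, skew: Duration) -> Self {
        Self {
            state: Mutex::new(TokenState {
                access_token: access_token.to_owned(),
                expires_at,
            }),
            skew,
        }
    }

    /// Returns a usable access token. `refresh` yields the new token and
    /// its lifetime, and runs at most once per expiry because the lock is
    /// held from the check through the update.
    fn access_token<F>(&self, now: Instant, refresh: F) -> String
    where
        F: FnOnce() -> (String, Duration),
    {
        let mut state = self.state.lock().unwrap();
        if now + self.skew >= state.expires_at {
            let (token, lifetime) = refresh();
            state.access_token = token;
            state.expires_at = now + lifetime;
        }
        state.access_token.clone()
    }
}
```

Because the current time is a parameter, a test can place several threads inside the skew window at once and check that only one refresh takes place.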
Conclusion
The OAuth token refresh issue in the rmcp streamable HTTP transport poses a significant challenge to maintaining seamless and reliable SSE connections in Codex. By understanding the root causes, impact, and implementing the proposed solution with careful attention to detail, it is possible to mitigate the problem and provide a much better user experience. The key lies in proactively managing token expiry, securely storing credentials, and handling potential errors gracefully.
For more information on OAuth 2.0 and token management, refer to the official OAuth 2.0 specification (RFC 6749).