Simple Prompting Tips for OpenAI's New Open-Weight Models

2025-08-07

OpenAI's latest open-weight gpt-oss models support a simple prompting technique: adding "Reasoning: high" to the system prompt switches the model into a deeper reasoning mode, "Reasoning: low" gives faster responses with less thorough analysis, and "Reasoning: medium" is the balanced default. Here's how this works in LM Studio.
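For example, here is a minimal sketch of setting the reasoning level through LM Studio's OpenAI-compatible local server. The endpoint URL and the model identifier "openai/gpt-oss-20b" are assumptions; adjust them to match your local setup.

```python
# Minimal sketch: setting the gpt-oss reasoning level via LM Studio's
# OpenAI-compatible local server. The base URL and model identifier are
# assumptions; change them to whatever your local setup uses.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:1234/v1",  # LM Studio's default local endpoint
    api_key="lm-studio",                  # any placeholder string works locally
)

response = client.chat.completions.create(
    model="openai/gpt-oss-20b",
    messages=[
        # The reasoning level is set with a plain line in the system prompt.
        {"role": "system", "content": "Reasoning: high"},
        {"role": "user", "content": "How many r's are in 'strawberry'?"},
    ],
)

print(response.choices[0].message.content)
```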

These gpt-oss models split their output into separate channels: the "analysis" channel carries the raw chain of thought, while the "final" channel carries the polished answer. This lets you watch the model work through a problem step by step, which is most visible when reasoning is set to high.
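As a rough illustration, the sketch below splits a raw completion into its channels. The special tokens come from OpenAI's harmony format, covered in the next section; the sample string and regex are simplified assumptions, and runtimes such as LM Studio or Ollama do this separation for you.

```python
import re

# Rough sketch: pulling per-channel messages out of raw harmony-formatted
# output. The sample string below is illustrative, not real model output.
raw = (
    "<|start|>assistant<|channel|>analysis<|message|>"
    "The user wants a count, so I should check each letter...<|end|>"
    "<|start|>assistant<|channel|>final<|message|>"
    "Here is the answer.<|return|>"
)

# Capture the channel name and the message text that follows it,
# stopping at the end-of-message token.
pattern = re.compile(
    r"<\|channel\|>(\w+)<\|message\|>(.*?)(?:<\|end\|>|<\|return\|>)",
    re.DOTALL,
)

for channel, text in pattern.findall(raw):
    print(f"[{channel}] {text.strip()}")
```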

Developer-Focused Insights

Hugging Face has published an official gpt-oss implementation guide. To structure prompts correctly, developers need to use the harmony response format, and OpenAI's reference documentation illustrates its structure.
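In rough outline, a harmony-formatted conversation looks like the following. This is a condensed sketch based on OpenAI's harmony documentation; the real system-message boilerplate (knowledge cutoff, current date, the full channel declaration) is abbreviated here.

```
<|start|>system<|message|>You are ChatGPT, a large language model trained by OpenAI.
Reasoning: high
# Valid channels: analysis, commentary, final.<|end|>
<|start|>developer<|message|># Instructions
Answer concisely.<|end|>
<|start|>user<|message|>What is 17 * 24?<|end|>
<|start|>assistant<|channel|>analysis<|message|>17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408.<|end|>
<|start|>assistant<|channel|>final<|message|>408<|return|>
```

Note that the reasoning level and your instructions live in different messages: the system message carries model-level settings such as "Reasoning: high", while the developer message carries your actual instructions.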

OpenAI designed harmony specifically so its open-weight models can produce multi-channel output: chains of thought, tool invocations, and standard responses. The company has open-sourced a harmony renderer for this purpose, and its official documentation explains how to apply the format manually; in practice you only need to do that when you are not going through an API provider or a platform like Ollama or LM Studio, which handle the formatting for you.
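Below is a sketch of rendering a conversation with that open-sourced renderer, published as the openai-harmony Python package. The class and function names follow my reading of the project's README and may differ in the version you install, so treat them as assumptions and check the repository before relying on them.

```python
# Sketch: rendering a prompt with OpenAI's open-sourced harmony renderer
# (the openai-harmony package). Names below are assumptions drawn from the
# project's README; verify against the installed version.
from openai_harmony import (
    Conversation,
    HarmonyEncodingName,
    Message,
    Role,
    load_harmony_encoding,
)

encoding = load_harmony_encoding(HarmonyEncodingName.HARMONY_GPT_OSS)

conversation = Conversation.from_messages([
    Message.from_role_and_content(Role.USER, "What is 17 * 24?"),
])

# Token IDs ready to feed to a gpt-oss model you are running yourself;
# stop sampling at the encoding's stop tokens.
tokens = encoding.render_conversation_for_completion(conversation, Role.ASSISTANT)
print(len(tokens), "prompt tokens")
```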