OpenAI's latest open-weight gpt-oss models support a notably simple prompt technique: adding "Reasoning: high" to the system prompt activates deep reasoning mode, while "Reasoning: low" produces faster responses without extensive analysis ("Reasoning: medium" is the balanced default). Here's how this works when running the models in LM Studio.
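As a minimal sketch, the reasoning level can be set through the system prompt when talking to a gpt-oss model behind an OpenAI-compatible endpoint such as LM Studio's local server. The model identifier and endpoint details here are assumptions; use whatever your local setup reports.

```python
import json

def build_request(question: str, reasoning: str = "high") -> dict:
    """Build a chat-completions payload with the reasoning level set."""
    return {
        "model": "openai/gpt-oss-20b",  # assumed model identifier
        "messages": [
            # A "Reasoning: high|medium|low" line in the system prompt
            # selects the model's reasoning effort.
            {"role": "system", "content": f"Reasoning: {reasoning}"},
            {"role": "user", "content": question},
        ],
    }

payload = build_request("How many primes are there below 100?")
print(json.dumps(payload, indent=2))
```

POSTing this payload to the local chat-completions endpoint is all it takes to switch between fast answers and deep reasoning, with no model reload required.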
The gpt-oss models split their output into multiple channels: "analysis" carries the raw chain of thought, while "final" contains the polished answer. This architecture lets users watch the model's step-by-step problem solving when high reasoning mode is active.
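A rough sketch of how the channel split can be recovered from raw model output. This assumes harmony-style segments of the form `<|channel|>NAME<|message|>TEXT` terminated by `<|end|>` (the token names follow OpenAI's published harmony format; chat frontends like LM Studio normally perform this separation for you):

```python
import re

# Match one channel segment: its name and the text up to the next
# special token or end of string.
CHANNEL_RE = re.compile(
    r"<\|channel\|>(\w+)<\|message\|>(.*?)(?=<\|end\|>|<\|channel\|>|$)",
    re.DOTALL,
)

def split_channels(raw: str) -> dict:
    """Return {channel_name: text} for each channel segment found."""
    return {name: text.strip() for name, text in CHANNEL_RE.findall(raw)}

# Illustrative raw output (hypothetical content):
raw = (
    "<|channel|>analysis<|message|>25 primes below 100... check 97.<|end|>"
    "<|channel|>final<|message|>There are 25 primes below 100.<|end|>"
)
parts = split_channels(raw)
print(parts["final"])  # show only the polished answer
```

Separating the channels this way lets an application log or hide the chain of thought while showing users only the final answer.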
Developer-Focused Insights
Hugging Face has published an official gpt-oss implementation guide. To structure prompts correctly, developers must use OpenAI's harmony response format, whose layout OpenAI documents in its reference materials.
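The following sketch assembles a harmony-formatted prompt by hand to illustrate the layout. The special tokens and system-message fields follow OpenAI's published harmony format; in practice the official openai-harmony renderer should be used instead of string assembly, and the knowledge-cutoff date and question here are placeholders.

```python
def render_harmony_prompt(question: str, reasoning: str = "high") -> str:
    """Assemble a harmony-style prompt string for a gpt-oss model."""
    system = (
        "You are ChatGPT, a large language model trained by OpenAI.\n"
        "Knowledge cutoff: 2024-06\n"
        f"Reasoning: {reasoning}\n\n"
        "# Valid channels: analysis, commentary, final."
    )
    return (
        f"<|start|>system<|message|>{system}<|end|>"
        f"<|start|>user<|message|>{question}<|end|>"
        "<|start|>assistant"  # the model continues generating from here
    )

print(render_harmony_prompt("Explain big-O notation."))
```

Note how the reasoning level and the list of valid channels both live in the system message, which is why a one-line "Reasoning: high" in a chat UI's system prompt is enough to change the model's behavior.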
OpenAI designed harmony specifically so open-weight models can produce multi-channel output covering chains of thought, tool invocations, and standard responses. The harmony renderer is open source, and OpenAI's official documentation explains how to apply the format manually; if you use an API provider or a platform like Ollama or LM Studio, the formatting is handled for you.