Key Highlights
- Three proprietary AI models from Microsoft—MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2—are now accessible via Microsoft Foundry.
- The transcription model demonstrates superior accuracy across 25 languages, surpassing both OpenAI’s Whisper and Google Gemini Flash in benchmark testing.
- A late 2025 contract revision with OpenAI granted Microsoft authorization to develop frontier AI models independently.
- Development teams consisted of under 10 engineers per model, utilizing approximately 50% fewer GPU resources than rival solutions.
- Microsoft AI CEO Mustafa Suleiman announced intentions to create a frontier large language model, pursuing complete AI autonomy.
Microsoft advanced its artificial intelligence strategy significantly on Wednesday by unveiling three proprietary models, positioning itself as a direct competitor to OpenAI, Google, and emerging AI ventures.
The newly released models—MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2—are currently accessible through Microsoft Foundry alongside a new MAI Playground interface. These tools encompass speech recognition, voice synthesis, and visual content generation capabilities. Mustafa Suleiman, CEO of Microsoft AI, characterized this release as the inaugural deployment from his “superintelligence team,” established merely six months prior.
MSFT shares concluded their weakest quarter since 2008, declining approximately 17% throughout the year. This model introduction marks Suleiman’s initial public response to shareholder demands for measurable returns on substantial AI investments.
MAI-Transcribe-1 stands as the flagship offering. The system achieves the lowest average Word Error Rate on the FLEURS benchmark spanning the top 25 languages by Microsoft product usage, registering an average rate of 3.8%. According to Microsoft’s testing, the model surpasses OpenAI’s Whisper-large-v3 across all 25 languages and exceeds Google’s Gemini 3.1 Flash performance on 22 of the 25 languages evaluated. The system handles MP3, WAV, and FLAC files reaching 200MB, delivering batch processing speeds 2.5 times faster than Azure’s current solution. Internal testing is underway within Teams and Copilot Voice.
MAI-Voice-1 produces 60 seconds of natural audio output within one second and enables custom voice generation from minimal sample audio. The service carries a price of $22 per million characters. MAI-Image-2 achieved top-three placement on the Arena.ai leaderboard and is being integrated into Bing and PowerPoint, with pricing set at $5 per million input tokens and $33 per million image output tokens. WPP represents one of the initial enterprise clients implementing the technology at scale.
Contract Revision Enabled Independent Development
This launch would have been impossible twelve months earlier. Microsoft faced contractual restrictions preventing independent artificial general intelligence development under its original 2019 agreement with OpenAI until October 2025.
When OpenAI pursued additional compute infrastructure beyond Microsoft—establishing agreements with SoftBank and other partners—Microsoft initiated renegotiations. The amended terms granted Microsoft permission to develop frontier models independently while maintaining licensing rights to all OpenAI creations through 2032.
Suleiman explained to VentureBeat: “Back in September of last year, we renegotiated the contract with OpenAI, and that enabled us to independently pursue our own superintelligence.” He emphasized the OpenAI partnership continues through at least 2032.
Lean Development Teams Deliver Major Results
Among the more remarkable aspects of this launch: development teams for each model numbered under 10 engineers. Suleiman revealed the audio model was created by 10 people, with performance improvements stemming from architectural design and data strategy rather than workforce size.
“Our image team, equally, is less than 10 people,” he stated. This methodology contrasts sharply with prevailing industry practices, where organizations like Meta have reportedly extended individual researcher compensation packages valued between $100 million and $200 million.
Microsoft indicates pricing was strategically set to undercut competitors Amazon and Google. Suleiman characterized the rates as “the cheapest of any of the hyperscalers.” The company has already begun planning frontier-scale GPU cluster deployments spanning the next 12 to 18 months.
Suleiman verified a large language model appears on the development timeline, stating Microsoft aims to achieve “completely independent” capabilities while delivering “state of the art models across all modalities.”

