The Agentic Enterprise Arrives

Two years ago everyone was excited about chatbots that could write emails and pass bar exams. In late 2025 that feels almost quaint. The center of gravity has shifted from "chat with a model" to "run your business on agents."

The short version: generative AI was the warm-up. The main event is agentic systems that reason, act, and plug directly into your core workflows. At the same time, the physical world is starting to push back: power, data centers, and infrastructure economics are now just as strategic as model choice.

If you have 10 minutes, read this; it won't disappoint.

From Generative Era to Agentic Enterprise

From 2023 through 2024, enterprises mostly played with standalone large language models. They lived at the edge of the business: copilots, chat interfaces, content helpers.

Late 2025 looks very different.

Frontier models now separate "thinking" from "retrieving" and treat reasoning as a resource that can be dialed up and down. Architecturally, this is a big deal. Instead of a single static pass over a prompt, systems can choose when to reflect, when to call tools, and when to run extended multi-step plans before they ever answer a user.

At the same time, companies like Intuit have moved from "AI features" to AI as an operating system. Their GenOS platform is not a chatbot sitting on top of QuickBooks and TurboTax. It is the execution layer for financial logic, embedded end-to-end in tax, payroll, and cash flow.

The catch: all of this collides with a $1.4 trillion infrastructure buildout that is running into power, grid, and data center constraints. Software is racing ahead; physics and finance are tapping the brakes. Your AI strategy now has to live in that tension.

Adaptive Reasoning: Metered Intelligence Becomes Normal

The key technical breakthrough of this moment is adaptive compute. Models no longer spend the same effort on "write a cute out of office reply" as they do on "restructure my global supply chain contract."

You see this most clearly in two stacks.

Gemini 3 and deep thinking as a service

Google's Gemini 3 introduces what they call Deep Think. Instead of marching token by token in a straight line, the model can spin up internal planning cycles, check its own work, and simulate multi-step workflows before you see an answer.

In practice, that means things like:

Long-horizon planning. Benchmarks that look like simulated businesses rather than trivia quizzes, for example tests where the model runs a vending machine business, manages inventory, and optimizes prices over time. That maps directly to real use cases such as supply chain, pricing, and operations.

Tool-based execution. Gemini 3 is strong at using terminals, writing and running code, navigating file systems, and debugging. It is not just writing scripts and handing them to a human; it is increasingly capable of playing the DevOps engineer who runs them.

Wrapped around this is Antigravity, Google's new agentic development environment. When a developer uses Antigravity they are really working with a small swarm of agents powered by Gemini 3, each with different roles: reasoning, planning, tool calling, self repair. The system leans heavily on shared context protocols and "circuit breakers" so agents can learn from failures without bringing production down.

The promise for enterprises is autonomous software maintenance rather than simple assisted coding.

GPT 5.1 and the economics of thought

OpenAI has taken a slightly different angle with GPT 5.1. They turned reasoning into a product knob.

There are two main runtimes:

GPT 5.1 Instant, which aims for very low latency, roughly twice as fast as GPT 5 on routine queries. Think of this as the workhorse for customer chat, lightweight knowledge retrieval, and anything where drop-off rises with each additional second of wait time.

GPT 5.1 Thinking, which slows down deliberately for hard problems, runs internal chains of thought and tool calls, and prioritizes accuracy on complex tasks such as legal analysis, financial modeling, and multi-step coding.

On top of that sits a router that classifies the complexity of each query and sends it to the cheap path or the expensive path. For enterprises, this is exactly what finance teams wanted: intelligence that is metered. A simple FAQ answer is cheap, a multi-day legacy refactor is not, and the bill maps more closely to business value.
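The routing idea can be sketched in a few lines. Everything here is illustrative: the keyword heuristic, the threshold, and the tier names are stand-ins for OpenAI's actual (and unpublished) classifier, not a description of it.

```python
# Toy complexity router: cheap queries go to a fast tier, hard ones to a
# slower reasoning tier. The heuristic below is an invented stand-in for
# a learned classifier.

def estimate_complexity(query: str) -> float:
    """Crude proxy: long queries with planning-style keywords score higher."""
    keywords = ("refactor", "migrate", "restructure", "analyze", "optimize")
    score = min(len(query) / 500, 1.0)
    score += 0.3 * sum(1 for k in keywords if k in query.lower())
    return min(score, 1.0)

def route(query: str, threshold: float = 0.5) -> str:
    """Return which runtime should handle the query."""
    return "thinking" if estimate_complexity(query) >= threshold else "instant"
```

A real router would itself be a trained model, but the shape is the same: a cheap decision sitting in front of an expensive one.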

A specialized branch, often described as a Codex tier, focuses on legacy migration. It can keep state over long coding sessions, reason inside Windows and PowerShell environments, and iteratively break big monoliths into microservices while running tests in a sandbox. That is the first serious attempt to treat technical debt at Global 2000 scale as an AI native problem.

Vibe Coding and the New Software Supply Chain

Now that models can reason and use tools, the bottleneck in software shifts. It used to be writing code. Now it is describing what you want precisely enough that the machine can run with it.

Google has been loudest about this with the idea of "vibe coding." You describe intent and feel, the system chooses stack, layout, and implementation details.

Example:

You type "Build a retro styled dashboard for a vinyl record store, nostalgic but fast to use."

Gemini 3 and Antigravity will:

• Parse the intent and business logic.
• Choose a modern web stack that matches the request.
• Interpret "nostalgic" as specific colors, fonts, and spacing using its visual understanding.
• Generate the code, spin up a local environment, render the UI, critique its own result, and iteratively patch until it matches the requested vibe.
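That final generate-critique-patch cycle is the heart of vibe coding. A minimal sketch of the loop, where the stub functions (`generate_ui`, `render`, `critique`, `patch`) fake the model calls and a spec is modeled as a set of requirements purely for illustration:

```python
# Toy generate-critique-patch loop. The stubs fake the model: a spec is a
# set of requirements, a draft is the subset satisfied so far, and the
# critic reports what is still missing. Only the loop structure is the point.

def generate_ui(spec):
    """Stub first draft: covers only part of the spec."""
    return set(list(spec)[:1])

def render(draft):
    """Stand-in for actually rendering and screenshotting the UI."""
    return draft

def critique(spec, rendered):
    """Stub critic: report requirements the draft still misses."""
    return spec - rendered

def patch(draft, issues):
    """Stub repair: fix one reported issue per round."""
    return draft | {next(iter(issues))}

def refine(spec, max_rounds=5):
    draft = generate_ui(spec)
    for _ in range(max_rounds):
        issues = critique(spec, render(draft))
        if not issues:
            break
        draft = patch(draft, issues)
    return draft
```

The loop either converges on the spec or stops at the round budget, which is also where a human reviewer would step in.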

For product teams this is a superpower. High-fidelity prototypes that used to take weeks can appear in an afternoon. Engineers become curators and reviewers more than typists.

The risk is shadow IT on steroids. When anyone can summon a working app by describing it, you very quickly get unvetted tools touching customer data and production systems.

The emerging answer in mature organizations is explicit governance for agent built software. Think controlled sandboxes, mandatory human review before deployment, and automated security scanning wired into every AI generated pull request. Many companies are now formalizing "Green Zones" where AI is allowed to act, and "Red Zones" where it can propose but not execute. That line is becoming as important as the old production versus staging split.
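A Green Zone/Red Zone split ultimately reduces to a policy gate in front of every agent action. A minimal sketch, with invented action names and zone contents and a fail-closed default:

```python
# Zone policy gate for agent actions: Green Zone actions execute, Red Zone
# actions become proposals for human review, everything else fails closed.
# The action names and zone contents are illustrative, not a standard.

GREEN_ZONE = {"run_tests", "open_pr", "generate_prototype"}
RED_ZONE = {"deploy_prod", "modify_customer_data"}

def gate(action: str) -> str:
    if action in GREEN_ZONE:
        return "execute"
    if action in RED_ZONE:
        return "propose"   # queued for mandatory human review
    return "block"         # unknown actions fail closed
```

The fail-closed default matters: agent-built apps will attempt actions nobody anticipated, and those should land in review, not production.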

Emotional Intelligence Becomes a Strategic Differentiator

As reasoning quality converges across models, the next frontier for customer facing systems is emotional intelligence.

xAI's Grok 4.1 is the flagship in this category. It scores very high on EQ-focused benchmarks and, more importantly, has been tuned to be both emotionally aware and factually strict.

The interesting thing is how it got there. Traditional reinforcement learning from human feedback depended on armies of human labelers ranking model responses. That is expensive and slow. Grok 4.1 uses other advanced models as critics that grade for empathy, tone, and nuance at massive scale.

Two things drop out of that:

Much richer feedback on conversational style. You can optimize for warmth, humor, directness, or restraint with a level of control that was not possible with sparse human labels.

A reduction in the usual trade-off between empathy and truth. Sycophancy, where a model agrees with the user even when they are wrong, is a known failure mode. Grok 4.1 manages to hold the line on facts while still handling upset or anxious users in a human way.
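The model-as-critic setup can be sketched as a reward function over graded dimensions. The `critic_score` stub here fakes the grader with keyword checks; in a real pipeline that is a full LLM call, and the dimensions and weights are assumptions for illustration:

```python
# Model-as-critic feedback: a grader scores each candidate response on
# empathy and factuality, and the reward combines them so that sycophancy
# (agreeing with a wrong user) is penalized without punishing warmth.

def critic_score(response: str) -> dict:
    """Stub grader: a production system would call a critic model here."""
    text = response.lower()
    return {
        "empathy": 0.9 if "sorry" in text else 0.4,
        "factuality": 0.2 if "you are right" in text else 0.9,
    }

def reward(response: str, w_empathy: float = 0.5, w_facts: float = 0.5) -> float:
    s = critic_score(response)
    return w_empathy * s["empathy"] + w_facts * s["factuality"]
```

Under this rubric an empathetic-but-firm reply ("I'm sorry this is frustrating, but the fee is correct") outscores a sycophantic one ("You are right, the fee is wrong"), which is exactly the behavior being trained for.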

Strategic use cases: crisis management bots for angry customers, branded assistants that keep a consistent tone at scale, and creative teams that want a distinct voice rather than generic AI copy. If your brand relies on relationship quality, you will almost certainly end up with an EQ optimized engine somewhere in your stack.

The Visual and Multimodal Enterprise

Text is no longer the main event. Image, video, and mixed media are now first class citizens in AI stacks.

Nano Banana Pro and visual edge computing

Google's image stack, nicknamed Nano Banana Pro, is a good example of where things are going. It runs partially on device through a dedicated vision engine and partially in the cloud for heavy work such as 4K assets.

The headline features for enterprises:

A visual context window. You can upload a full brand style guide as a set of reference images and have the model maintain character consistency, colors, and logo treatment across campaigns.

Accurate text inside images. The long-running problem of gibberish text in generated visuals is largely solved, which opens the door to automatically generated infographics, posters, and product shots that do not have to be rebuilt by humans.

This is already wired into tools like Google Slides. Commands like "beautify this slide" now generate new layouts, icons, and charts aligned to your corporate template. Marketing teams can create a single master asset and then authorize the model to localize for dozens of languages, adjusting layout and typography for each script, while staying on brand.

Critically, a lot of the drafting phase can run locally. That matters for privacy-conscious organizations working with unreleased products or sensitive visual content. The final polish can go to the cloud, but the early iterations never have to leave your firewall.

ERNIE 5.0 and surveillance native multimodality

In China, Baidu's ERNIE 5.0 takes a different path. It is designed from the ground up to process text, images, audio, and video in a single model rather than stitched modules.

To make that feasible it uses a mixture of experts design, where the giant total parameter count is mostly dormant and only a small slice activates for each task. The result is a system that can reason deeply about charts, dashboards, camera feeds, and instructions all at once.
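The sparse-activation idea behind mixture of experts can be shown with a toy forward pass. The dimensions, gating, and expert functions below are all illustrative; real MoE layers route per token inside a transformer, at vastly larger scale:

```python
import numpy as np

# Toy mixture-of-experts forward pass: a gate scores every expert, but only
# the top-k actually run, so most parameters stay dormant for any one input.

def moe_forward(x, experts, gate_w, k=2):
    scores = x @ gate_w                       # one gating score per expert
    top = np.argsort(scores)[-k:]             # indices of the k active experts
    weights = np.exp(scores[top])
    weights /= weights.sum()                  # softmax over active experts only
    return sum(w * experts[i](x) for w, i in zip(weights, top))
```

With four experts and k=2, half the experts never run for a given input; at frontier scale the active fraction is far smaller, which is what makes the giant total parameter count affordable.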

It is being pointed at exactly the use cases you would expect: smart surveillance, safety and quality monitoring in factories, and industrial automation where instructions come as a mix of diagrams and text.

For global companies this is not just a technical story. Baidu is open sourcing earlier generations and building a parallel ecosystem across parts of Asia and the Global South. That means your tooling and integrations may vary significantly by geography, and "model choice" becomes entangled with geopolitics.

MMCTAgent and unlocking dark video data

Microsoft's MMCTAgent tackles the biggest unlabeled dataset in most enterprises: video. Town halls, training, compliance footage, customer calls, plant cameras, all of it mostly inert today.

MMCTAgent uses a planner-critic structure. One agent decomposes your query and hunts through semantically chunked videos, transcripts, and key frames. Another agent quality-checks that what it found actually answers the question with the right timing and context.

That turns terabytes of past meetings and recordings into a searchable knowledge base. Instead of "go scrub that two-hour all-hands for what the chief executive said about Asia in Q3," you ask the system the question and jump straight to the relevant sentence in the playback. For compliance and investigations, you can search for specific behaviors or policy violations across surveillance feeds in a much more targeted way.
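The planner-critic pattern reduces to two passes over an indexed archive: one that retrieves candidates, one that verifies them. The chunk format and word-overlap scoring here are invented for illustration; the real system works over semantic embeddings of transcripts and key frames:

```python
# Planner-critic sketch over indexed video transcripts. The planner casts a
# wide net; the critic keeps only hits it judges responsive to the question.

def planner(query, chunks):
    """Retrieve every chunk sharing at least one word with the query."""
    terms = set(query.lower().split())
    return [c for c in chunks if terms & set(c["text"].lower().split())]

def critic(query, candidates):
    """Rank candidates by overlap and drop weakly matching ones."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(c["text"].lower().split())), c) for c in candidates]
    return [c for score, c in sorted(scored, key=lambda sc: -sc[0]) if score >= 2]

chunks = [
    {"video": "all_hands_q3", "sec": 3720, "text": "Our Asia revenue grew strongly in Q3"},
    {"video": "training_01", "sec": 120, "text": "How to file an expense report"},
]
hits = critic("asia revenue in q3", planner("asia revenue in q3", chunks))
```

The top hit carries a timestamp, so the answer is a jump to second 3720 of the all-hands recording rather than a two-hour scrub.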

Economics Get Real: Mega Deals and Green Zones

The commercial model for AI is growing up fast. The famous example is the roughly $100 million per year deal between Intuit and OpenAI. The important detail is not the number, it is the structure.

Intuit is buying a joint operating system. Its products are becoming agents inside ChatGPT, and OpenAI's models are becoming agents inside Intuit's ecosystem. The goal is to move from "we help you with your taxes" to "we execute your taxes, run your payroll, manage your working capital, and explain it as we go."

To make that acceptable to regulators and boards, a new reference pattern is emerging: the Green Zone architecture. Common elements include:

Strong data isolation guarantees. Enterprise data is used to power your instance and fine-tuned behaviors, but not to train the general model.

Domain grounding. Models are constrained by validated internal logic and datasets, which reduces hallucination risk on high stakes decisions such as taxes, health care, or payroll.

Indemnification. Cloud providers are now routinely offering IP and safety indemnities at this tier, which is becoming table stakes for nine-figure agreements.

Expect more of these vertically deep, structurally tight partnerships anchored by shared economics and shared roadmaps, not just metered API calls.

The Infrastructure Crunch: Physics Enters The Chat

While all this software magic is happening, the hardware and energy story is becoming uncomfortable.

Cloud and AI leaders have committed an estimated $1.4 trillion to infrastructure. Projects like massive hyperscale builds for OpenAI and huge capex commitments from Meta are the visible tips of that iceberg.

The friction is no longer just chip supply. It is power and buildings with enough cooling and grid connectivity to host those chips. Leaders like Satya Nadella have been blunt that the constraint is "warm shells" ready to plug racks into.

You get a few nasty dynamics out of this:

Stranded assets, where you can buy accelerators but cannot deploy them at scale fast enough.

Rising regional inequality, as places with power, water, and favorable regulation suck in data center builds and everyone else waits.

Valuation risk, because a lot of the projected revenue growth for AI assumes that the underlying capacity arrives on time. Regulators and central banks are already flagging this as a systemic exposure.

The hard question for enterprises is whether agentic systems will deliver enough real productivity by 2026 to justify the underlying infrastructure bill. If deployment lags, or if power limits slow rollout, the market could reprice AI names sharply.

What To Do In 2025 If You Lead An Enterprise

Given all of that, here is a practical agenda for the next year.

Decide your agentic stack by domain. You will not pick a single model for everything. Gemini 3 plus Antigravity might own the frontend and rapid app work. GPT 5.1 Codex tiers may be ideal for Windows and legacy modernization. Grok 4.1 could anchor your support and brand voice. Nano Banana-style systems might sit with marketing. Start mapping model families to problem domains explicitly.

Design your Green Zone now. Treat AI access like network segmentation. Define which data can be touched by which agents, what they are allowed to execute, and where human review is mandatory. Bake in logging, security scanning, and approval workflows before usage goes from dozens of users to thousands.

Align with infrastructure reality. Talk to your cloud and data center providers about actual capacity, not just roadmaps. Understand where warm shells are, what regional constraints exist, and how edge or hybrid patterns can offload some workloads to local hardware to reduce exposure to grid bottlenecks.

Turn dark data into an asset. Inventory your video, audio, and document archives. Pick a beachhead, such as "make all past town halls and training content queryable," and pilot tools like MMCTAgent. The organizations that win this phase will not just generate new content, they will mine their history more effectively than competitors.

Reskill for product thinking, not prompting. As vibe coding and agent orchestration mature, the scarce skill is not typing better prompts, it is specifying systems. Train product managers, domain experts, and architects to write rigorous requirements, design safe workflows, and think in terms of life cycle, observability, and failure modes for agents.

Looking Ahead To 2026

The myth of a single universal "god model" running everything is fading. The future looks more like a portfolio. Specialized reasoning engines for code, contracts, conversations, images, and video, each embedded deeply into the workflows where they shine.

The leaders in 2026 will not just have the cleverest copilots or the flashiest demos. They will be the ones who secured their Green Zones early, negotiated serious platform partnerships, locked in access to literal power, and built the organizational muscle to design, govern, and continuously improve fleets of agents that do real work.

This is the moment when AI stops being a lab toy and becomes infrastructure. The choices you make in the next twelve months set the constraints for the next decade.
