Benedict Evans may be right after all; frontier models look more and more like telecom companies in the 90s. Billions and billions of investment in infrastructure while others further up the stack captured all the value.
There will be frontier models that are non-commoditized, but they'll be kept guarded and hidden away, and you'll only get the final result, so that they can't be distilled and their harness can't be reverse engineered. They'll be billed like employees, rather than like a tool.
In spite of their deeper pockets, massive datacenters, colossal amounts of user data, and hundreds of thousands of top developers, even Amazon, Meta, Microsoft, and Google are well behind.
I think Evans is completely wrong. There are only 2 truly frontier models. (at least for now). And Anthropic seems to be leaving OpenAI behind so there might be only 1 in the near future. (which is scary/dangerous)
I use both Claude and Codex and don’t see any meaningful difference between the two. My use case is modeling semi-complex physical processes (energy and manufacturing) in code for simulations. I also have to do a fair amount of automation via scripting in Python or PowerShell for manipulating data, as well as legacy code analysis (C, Fortran, COBOL). Provided I give the models the information and documentation they need, both perform very similarly. I recently did a full codebase review (for design patterns and vulnerabilities) and both Codex and Fable agreed 100% about the most critical findings. I do very little front end development, although some of my automation scripts have TUIs, and again I have no problem with either Claude or Codex generating them for me. At this point I go with the less expensive one, which seems to be Codex. With the $100 plan I rarely hit the limits. With Claude I max out my plan in about 4-6 hours of work.
I wish there were a case where I found Evans to be wrong. As far as my memory serves me, I have failed to record a single one.
I disagree that Amazon, Meta, Microsoft, and Google are "well" behind. If anything, the frontier model advantage seems to be at best 6 - 9 months. And the Chinese models are all doing well.
One of Steve Jobs's lines: "It is a feature, not a product." Even if Apple were a generation or a year behind the frontier models, the advantage of being the default is enough to hold a lot of its users.
To put it simply, even if OpenAI or Anthropic were better, there is zero chance they would topple Apple in hardware sales, users, or ecosystem. On the other hand, even if Apple's AI were 6 - 9 months or a generation behind, most users would settle for it, and that would damage OpenAI / Anthropic.
Maybe I’m alone in thinking this but I think the long term victor will be the one that works out pricing best.
Fable might well be a better model but it’s too expensive for everyday AI use. Definitely if we’re talking about the kind of stuff you’re going to want to do on your phone. Even for coding, I’m not going to reach for Fable (well, when I can…) for 95% of the work I do.
I don’t believe a mature AI industry is going to have a one size fits all, single winner.
> I think Evans is completely wrong. There are only 2 truly frontier models. (at least for now). And Anthropic seems to be leaving OpenAI behind so there might be only 1 in the near future. (which is scary/dangerous)
Truly fascinating ecosystem and community in general, as experiences differ so wildly. Anthropic's models seem far behind OpenAI to me, especially when you get into "Pro" territory, and there doesn't seem to be any worthy competition to Pro Mode available at all.
And this is said by someone who uses both platforms and spends a lot of the day interacting with agents and LLMs in various ways. The interesting part is that you probably do too, and what you share probably lines up with what you experience! Yet we come away with basically opposite takeaways :) I don't think either of us is wrong either, somehow.
I agree with what you're saying.
I have a Claude plan for work and I prefer using Claude more than any other LLM I've tried.
Having recently tried the Codex 100€ plan with GPT-5.5 in high/xhigh, I don't think it's worse than the Opus models, just different.
I've noticed that depending on how you talk to it, you get wildly different outputs. This seems to happen less with Opus: it mostly understands what I want. GPT is often a bit too literal.
> I've noticed that depending on how you talk to it, you get wildly different outputs. This seems to happen less with Opus: it mostly understands what I want. GPT is often a bit too literal.
Yeah, exact prompting matters a lot, seemingly more than people think. There are definitely tradeoffs in how literally the models take prompts. On one hand, it's useful for a model to ignore its own instincts when you know better, so it doesn't go off on a wild goose chase. On the other hand, it's sometimes useful when it self-directs, for example when you misworded something and it's obvious from context that you meant something different. They're basically good at different things.
Really agree every model isn't equal and they aren't as interchangeable without adjusting how you prompt them as people seem to think.
You mean the model that was available for a whole three days? No, I played around with it a tiny bit, but not much more than that. I guess time will tell if it gets close.
The play here seems pretty evident, if I may assume. Apple creates an interface that is generalized enough that you can easily swap models, and while Claude is preferred by Apple today, it may be any provider or even local models in the future, and the APIs the developers use remain the same, so "migration" becomes easier.
The betas of the next OS's include a Siri AI chatbot, and the AI features are built into various parts of the OS. A user has no idea what model is powering any of it - Apple controls the UX.
The article is about (from the eyes of a user) white-labeled usage of Claude models on Apple devices, this subthread is about white-labeled usage of LLMs on Apple devices, how is it not relevant?
> The users won't know if you used Foundation Models API or integrated with OpenAI/Anthropic/Gemini SDK directly.
That's the point! That's the whole "white-labeling" part, and what the earlier commenter is talking about. You're very close to understanding the context here!
I think you're taking the written words a bit too literally here. Read it with a more lax filter and less literal word-meaning, and I think the original comment will become a bit clearer.
You know what, I've been a bit too snipe-y in my previous comments, and it led to the discussion devolving in unproductive ways.
I'd genuinely like to understand where you're coming from more.
I think we're all in agreement that this framework is very much about letting developers swap the models easily, and treat them as commodities. That seems pretty obvious.
I still don't see, however, how this has anything to do with controlling the UX (or the new Siri, for that matter! The new Siri doesn't use Anthropic models, and there are no extension points for it to do so; that's pretty much the whole reason why it won't be available in the EU).
I think that's what they are trying to avoid. If you need on-device intelligence, their pitch was "The model the device already has is best", and if you need something more specific an adapter (aka, a fine-tune/lora) is best.
They were wrong when their on-device model was way behind. They still might be right in the long term.
While multiple apps I use might need Gemma 4 E4B, I use dozens of apps and app devs can choose from hundreds of models. A shared cache might reduce size a little when there's overlap, but the core problem still exists. If each app chooses its own model, disk usage and memory swapping explode.
It's probably better for device manufacturers to bake in a default. I'm not proposing they limit you from using others, but one shared default might be the best developer/user experience for 99% of apps.
- Being warm in memory is the single biggest perf speedup you can get, and a default is much more likely to be warm.
- "Best model" is usually "best model for this device" given both RAM and compute. A developer can't test every device but Apple can/will.
- Each model needs to be optimized for the hardware (what's running on ANE, what's running on Metal, what's running on CPU). The default gets optimized.
- If you need a custom model, a LoRA is probably best (30MB, benefits from all of the above)
You could say the default should be swappable, but that's more a Linux ideal than an Apple one, so I doubt we'll ever see that. Plus there are real downsides: intentional or not, prompts end up optimized for the model they are developed against, so swapping the default system model would degrade every app.
I see an id-based ability suggesting `modelId`, but in the current docs I cannot find any context for it. The other limit is that it suggests Swift Packages, but I'm not seeing any model management hints similar to Docker/Ollama/etc. where:
- Application can ask for a specific model: if available, use it; if not, ask to download it (or try some fallback / alternative)
- User can manage models. So as a user I can clean unused models (and for non-techie have something similar to offloading apps when unused for some period of time).
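To make the two bullets above concrete, here is a rough sketch of what such a shared model store could look like. This is not anything Apple documents; `SharedModelStore`, its methods, and the model IDs used in examples are all hypothetical names made up for illustration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class SharedModelStore:
    """Hypothetical system-wide store: one copy of each model, shared by all apps."""

    # Maps model_id -> timestamp of the last time any app used it.
    installed: dict = field(default_factory=dict)

    def request(self, model_id, fallbacks=(), now=None):
        """Return the first installed model among model_id and its fallbacks.

        Returns None when nothing suitable is installed, which is the
        caller's cue to ask the user whether to download model_id.
        """
        now = time.time() if now is None else now
        for candidate in (model_id, *fallbacks):
            if candidate in self.installed:
                self.installed[candidate] = now  # mark as recently used
                return candidate
        return None

    def download(self, model_id, now=None):
        """Record a model as installed (a real store would fetch the weights)."""
        self.installed[model_id] = time.time() if now is None else now

    def offload_unused(self, max_idle_seconds, now=None):
        """Remove models nobody has used recently, like offloading unused apps."""
        now = time.time() if now is None else now
        stale = [m for m, used in self.installed.items()
                 if now - used > max_idle_seconds]
        for model_id in stale:
            del self.installed[model_id]
        return stale
```

An app would call `request("preferred-model", fallbacks=["smaller-model"])` and only prompt for a download when it gets `None` back, while the OS (or the user) could periodically call `offload_unused(...)` to reclaim space.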
I have a Mac with 4TB of storage but it’s still annoying when every new AI app I try installs its own virtual environment with a fresh copy of Python, PyTorch, other duplicate libraries, and then models on top of that.
As an occasional python user I'm always amazed and frustrated that it seems that the only way to be able to use/build anything is to create a whole separate environment.
And now given everybody now does this I guess the incentive to stop breaking stuff reduces even further.
The meme phrase “it’s fractally wrong” applies to the entire python ecosystem, IMHO. Virtual environments are just another layer of this fractal wrongness in the layer cake of ecosystem awfulness.
I have a couple small apps that have a (non-LLM) model, and originally the models and code were in PyTorch, built by Python devs.
The original plan was to ship Python. However I found out I can migrate them to CoreML, and now it's a model file + Swift code. I got some massive performance improvements as well.
Of course, this doesn't work at all for non-Mac environments, but it was nice to be able to do it. (Also doesn't solve the duplicate large models problem)
The apps can use the system provided on-device model using the same framework and APIs; but there's no affordances to deduplicate custom models between apps.
I think Apple has a fairly good plan for supplying a common API and default on device models.
What confuses me about this article is: The code examples (Python, Ruby, etc.) look to me like the original Anthropic APIs, not Apple’s abstraction. Did I miss something?
Is this Apple encouraging developers to go through their api abstraction layer to use LLMs so that when they launch their own (which I think we’ve heard they’ve been spending lots of money on training and might be somehow involved with Siri or current Apple AI?) that they can easily help devs make a seamless transition? Or is it just a developer nicety or something else?
Apple has some clever mechanics to protect user data. I had to work with App tracking stuff lately and their approach to keeping user details private with anonymized cohorts (SKAN, Differential Privacy) before reporting tracking events to third party platforms was surprisingly well thought out. There is value in having them in your loop if you care about privacy.
This is support for a new framework that ships with reality/mac/iPad/watch/tv/iOS 27 (and that they've promised to open-source later in the year, so presumably you'll also be able to lean on this if you ship Swift on your backend).
The framework's whole deal is that it lets you use the same API to target either the device built-in models, the Apple-hosted online models (Private Cloud Compute), or write your own shims to call out to arbitrarily hosted online models.
You can then dynamically route your calls to a different kind of model/provider, using system APIs, without having to write your own abstraction layer over "I want to use local model for this, but I want to use Claude for that", or having to integrate your own API integration with Anthropic/OpenAI APIs.
It abstracts things like tool calling in one place; and has a bunch of other niceties/oddities (it keeps the same "transcript" going, even if you dynamically switch providers/models during a session) and some other things.
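For anyone who finds the "same transcript, different provider" idea abstract, here is a minimal language-agnostic sketch of it in Python. It is not the Foundation Models API; `Session`, `on_device`, and `hosted` are hypothetical stand-ins, and the two "providers" are dummies that just return strings.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A provider is any callable that takes the transcript so far and returns a reply.
Provider = Callable[[List[Tuple[str, str]]], str]


@dataclass
class Session:
    """Hypothetical session: one transcript, many interchangeable providers."""

    providers: Dict[str, Provider]
    transcript: List[Tuple[str, str]] = field(default_factory=list)

    def respond(self, prompt: str, provider: str) -> str:
        """Send the prompt to the named provider, keeping a single transcript."""
        self.transcript.append(("user", prompt))
        reply = self.providers[provider](self.transcript)
        self.transcript.append((provider, reply))
        return reply


def on_device(transcript):
    # Stand-in for a small local model: just echoes the latest prompt.
    return f"local: {transcript[-1][1]}"


def hosted(transcript):
    # Stand-in for a remote model: reports how much context it received.
    return f"hosted: saw {len(transcript)} transcript entries"
```

The point of the design is that the call site only changes the `provider` argument; the history travels with the session, so switching from the local model to a hosted one mid-conversation does not lose context.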
A dark, but not totally unfair take: It makes it easier for Apple to take payment for the models others provide, and even allows Apple, if they want to, to use the data to build a dataset for training their own models based on how users use third party models. It's only on Apple devices this API is used, so they split up the market by not letting developers use the same system if they want things to work on iOS, locking users even more in.
The cynic (or realist?) in me thinks this abstraction layer is Apple's way of making sure that users give their own Apple Intelligence credit for the underlying LLM functionality, even if another company is actually providing the LLM.
Yeah, Apple just designs and writes the SoC, CPU, graphics unit, neural unit, compiler (Swift), OS, graphics layer, 3D API, core libs from graphics to persistence, filesystem, broadband chip, and a few more things besides...
Maybe they plan to have the providers pay for being the default model? So basically, what Google is doing right now for search engines. The difference however is that Google is making money with additional search requests while AIs are (as of now) losing money with additional requests. I don't see the business case for them yet though.
I think this is just Apple planning for their on-device models getting better, which makes sense given they have access to Gemini now. If developers use this for all their code calling an external LLM, then as Apple's model becomes more capable and covers more use cases it'll be easy to switch to it at individual call sites. That'll give apps better UX and save developers money on a bill that Apple doesn't get a cut of.
> That'll give apps better UX and save developers money on a bill that Apple doesn't get a cut of.
In other words, it's unlikely to happen as there is no money in it. Better for Apple to create some new subscription "AI" and "AI-lite" plans people can subscribe to, and since Apple is a company and we all know what those care about, it's unlikely to become a utopia of local models running on your phone.
How can you practically use this in software if you're to deploy this to users? Asking a user to create and enter their own API key is a bar too high for good UX.
Ugh. It really is. I have allihat.com, which is the only Safari extension (I think still) that talks to Claude. And it's well sought after. But you as a user have to enter a friggin Claude API key. :( And I still don't grok their TOS around this. Like you can still type: ```setup-token Set up a long-lived authentication token (requires Claude subscription)``` but this seems like a trap? :) Who's using this? Doesn't this like insta break their TOS if you use that anywhere?
Right now for allihat.com I just let people use the Apple model locally if you don't feel like using the claude key. And my conversions to paying user shot up like 3x! But it really isn't a replacement obviously to claude. I was hoping Apple would make proxying to Claude some kind of thing they do for me so I also don't have to proxy to my own server just to try and manage API to Claude usage.
A coding agent is itself an imposed layer. Now they are adding one more layer? Many times I think of a coding agent as the vendor supervisor from the body shops of the 90s, who promises the customer everything under the sky and thrashes the poor contractor to deliver. Coding agents consume 10x more tokens, much like the gap between what body shops charged their customers and what they paid the contractors. As a simple test, the same task that makes the model run out of context length when used via a coding agent runs fine when prompted directly.
Layers are luxury and remove control and transparency.
From app developer standpoint why would anyone ship claude keys like that ... or am I missing something? From consumer standpoint - I guess they can use their own keys but it is not something that is very user friendly as you can imagine.
For production, route requests through your own back end with .proxied. The relay at baseURL adds the Claude API credential server-side, so the app ships no key. The headers you provide are sent on every request so your proxy can authorize the caller.
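A rough sketch of what the server side of that relay has to do, under stated assumptions: the app authenticates with a custom header (here the made-up name `x-app-token`), header names arrive lowercased, and the upstream API expects the credential in an `x-api-key` header. `build_upstream_headers` is a hypothetical helper, not part of any SDK, and it deliberately leaves out the actual HTTP forwarding.

```python
import hmac


def build_upstream_headers(client_headers, api_key, expected_app_token):
    """Authorize the caller, then attach the server-held credential.

    Returns the headers to forward upstream, or None if the caller is rejected.
    Assumes client_headers uses lowercase header names.
    """
    supplied = client_headers.get("x-app-token", "")  # hypothetical header name
    # Constant-time comparison avoids leaking the token through timing.
    if not hmac.compare_digest(supplied.encode(), expected_app_token.encode()):
        return None

    # Drop anything the client sent that could override the credential.
    blocked = {"x-api-key", "authorization", "x-app-token"}
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in blocked
    }
    forwarded["x-api-key"] = api_key  # added server-side; never shipped in the app
    return forwarded
```

In a real deployment the app token would more likely be a per-user session or an attestation result than a single shared string, since a shared string bundled in the binary is just as extractable as the API key it replaces.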
I’m surprised to see the model names hardcoded as an enum (e.g. `.sonnet4_6`), instead of a string with model discovery so that the user can select their preferred model without having to get a new app version through the App Store to support newer models.
>Model identifiers are values of ClaudeModel. Use a compiled-in constant, or construct one with explicit capabilities for an ID that isn't compiled in yet (see Capabilities):
Special emphasis on the "isn't compiled in yet" and "or construct one" bit.
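Here is the shape of that pattern as I read the quoted docs, sketched in Python rather than Swift. All names (`ModelRef`, `KNOWN_MODELS`, `resolve_model`) and the capability fields are hypothetical; the real `ClaudeModel` type may describe capabilities quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRef:
    """Hypothetical stand-in for a typed model identifier."""

    model_id: str
    supports_tools: bool
    max_output_tokens: int


# Compiled-in constants: known when the app was built (values are illustrative).
KNOWN_MODELS = {
    "model-v1": ModelRef("model-v1", supports_tools=True, max_output_tokens=8192),
}


def resolve_model(model_id, *, supports_tools=None, max_output_tokens=None):
    """Use a compiled-in constant if one exists, else build one explicitly."""
    if model_id in KNOWN_MODELS:
        return KNOWN_MODELS[model_id]
    if supports_tools is None or max_output_tokens is None:
        raise ValueError(f"{model_id!r} is not compiled in; capabilities are required")
    return ModelRef(model_id, supports_tools, max_output_tokens)
```

So an app could read a model ID from remote config and still use a model released after the app shipped, as long as it also supplies the capabilities the compiled-in constants would otherwise carry.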
This was expected.
Apple will carefully choose what & how people can use AI in their ecosystem and will make sure of it. I hope the "Apple Foundation Models" ecosystem grows with support from major model providers.
This seems smart. Apple, despite not really leading in AI themselves, are right on the hot path of where developers are going to yolo slop into the ecosystem. Makes a tonne of sense to define a nice clean API that places like Anthropic can build on top of and expose to developers.
It's also smart for them to make sure the billing is going direct from Anthropic to the developer. The initial thought is "That means Apple's not taking a cut", but from the other side of it, developers who use this API are going to have to expose that cost to customers somehow, and that translates to subscription/InAppPurchase etc. on top of which Apple will get its 30%.
> A key bundled into an app is extractable from the shipping binary, and anyone who extracts it can make requests billed to your account. Use .apiKey for development only, and switch to a proxy before release.
I don't like this model. Then all the user data is visible to the proxy.
Far better would be some kind of micro payment architecture where a wallet is on the user's device and coins are attached to each request.
We just need to live in the alternate universe where micro payments succeeded.
Apple's Foundation Models framework (shipping in iOS 27 / macOS 27 this fall) is the standard Swift API for on-device AI — the same API Apple uses for their own small model. This package makes Claude plug into that same API as a drop-in swap.
// Apple's on-device model
let onDeviceSession = LanguageModelSession(model: SystemLanguageModel.default)
// Claude: same API, just a different model constructor
let claudeSession = LanguageModelSession(model: ClaudeLanguageModel(name: .sonnet4_6, auth: auth))
One API, two tiers. You write your app once against the Foundation Models protocol. On-device model handles fast/free/private tasks; Claude handles heavy reasoning, long context, or capability gaps — you swap the model, not your code.
You don't call the Anthropic API directly. Apple's framework handles streaming, tool calling, and structured output (@Generable) — you just get Claude's capability through it.
What I'm curious about is whether this is actually on-device. Apple's framework caps local models around 3B params last I looked, and Claude is way bigger than that. So either there's some hybrid setup I haven't seen documented, or this is mostly a Claude SDK in FM clothing. Anyone tried it on a plane?
They are a hardware company and will keep selling the best machine for AI use. Well done.
Just my two cents.
From a user’s perspective, it doesn’t matter.
That API has no user-facing components, and has no influence over UX of what the end-users are interacting with.
The users won't know if you used Foundation Models API or integrated with OpenAI/Anthropic/Gemini SDK directly.
Help me see your point of view!
I'd love to use Gemma 4 as an example, but think of a user: if 10 apps each use the same model and each downloads it, the phone will be bloated.
I still don't understand whether Apple provides a way for multiple apps to use the same on-device model (without tricky namespaces and permissions).
I didn't see anything suggesting that's the case.
Might as well have static binaries.
It’s a nice language though.
Lol bro this is literally it this is the model they've been training (was Apple Foundation model not a big enough hint?)
Apple is offering developers with less than 2 million downloads free AI models via their servers https://techcrunch.com/2026/06/08/apple-bets-cheaper-ai-will...
I know this is from a developer perspective. But as a consumer this is just funny.
Proxy (production)
For production, route requests through your own back end with .proxied. The relay at baseURL adds the Claude API credential server-side, so the app ships no key. The headers you provide are sent on every request so your proxy can authorize the caller.
https://platform.claude.com/docs/en/cli-sdks-libraries/libra...
They are.
While expected, it’s still a bummer.