The 2026 AI landscape with sources under every claim

Jason Voorhees

I have been following this for years, and here's my list of Obsidian notes with research papers going back to 2022.

[Image: list of Obsidian notes]


Nothing I say will be my own words. I've read a lot of these papers, and every objective statement I make in this thread will come with a link to a study that validates it. At the end I'll give you my take, which you can agree or disagree with. I've condensed all the information into an easy-to-read format: no math formulas, no deep research or technical things, and I have simplified all the graphs to make them less academic.

First, some terms so this is understandable for someone non-technical:

LLM (Large Language Model): a program trained on a huge pile of text that predicts the next word over and over. That's how it writes sentences.

Parameters: you can think of them as the internal adjustable dials that the model tunes during training. More dials = a bigger model, and generally a better one. When you see 7B written next to a model name, that means 7 billion of these dials. Frontier Claude/GPT models are widely believed to have hundreds of billions of these parameters or more (the companies don't publish exact numbers for recent models).

Big models vs small models (SLMs): I'll use the famous analogy. A big model is like a giant general hospital: it knows everything, but it's expensive and slow to get to. A small model is like a local clinic that's fast, cheap, and handles 90% of what you actually walk in with. Small Language Models are roughly 1–14B dials; big models are above that.

Cloud vs local / on-device: Cloud means the AI runs on a company's servers. Local / on-device means it runs on the chip inside your phone or laptop, so your data never leaves your hands and it can run without internet.

Tokens: a token is a chunk of text, roughly a word or a piece of a word.

NPU (Neural Processing Unit): a dedicated AI chip built into phones and laptops these days.

Quantization: compressing a model so it fits on a small device, like how people compress big-ass photos into smaller pictures to send. Some quality is lost, but it's the same model.
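For anyone who wants to see the idea in code, here is a minimal sketch of quantization. It assumes the simplest possible scheme (symmetric 8-bit rounding with one shared scale factor); the function names and example weights are made up for illustration, and real tools use more elaborate schemes.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Turn the stored integers back into approximate float weights."""
    return [v * scale for v in quantized]


# Made-up example weights, not taken from any real model.
weights = [0.12, -0.5, 0.33, 0.9, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# The "some quality lost" part: each weight is off by at most half a step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(max_error, 5))
```

Each weight now needs one byte instead of four (for 32-bit floats), which is where the size saving comes from.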

Distillation: training a small "student" model to imitate a big "teacher" model. This is how tiny models get surprisingly smart. Remember this word. I've bolded and colored this and the next term because this is where a lot of the recent innovation has happened.
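If you're curious what "imitate the teacher" can look like in practice, here is a rough sketch of one classic formulation: the student is penalized when its softened probabilities drift from the teacher's (a KL divergence on temperature-scaled outputs). The logits, the temperature value and the function names are assumptions for illustration; many LLM distillation setups simply train the student on text the teacher generated instead.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature = softer."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's distribution to the student's."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return sum(t * math.log(t / s) for t, s in zip(p_teacher, p_student))


# Made-up scores over three candidate next words.
teacher = [2.0, 1.0, 0.1]
good_student = [1.9, 1.1, 0.2]
bad_student = [0.1, 1.0, 2.0]

print(distillation_loss(teacher, good_student))  # small: student agrees
print(distillation_loss(teacher, bad_student))   # larger: student disagrees
```

Training nudges the student's dials to push this number down, so the student ends up copying not just the teacher's top answer but how sure it was about the alternatives.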



Pretraining vs test-time (inference) compute: pretraining is the studying the model does in advance. Test-time compute is how much "thinking" it's allowed to do in the moment you ask it something.




AGI (Artificial General Intelligence): the hypothetical AI that can do basically any intellectual task a human can. Nobody agrees on the exact definition, which is actually half the problem.

Part 1: Local Models

This is the headline feature of 2026: the shift toward small and local models.

This app I built for a user was only possible because of advancements in local LLMs in 2025–26.

You can freely download very capable open models now. The Gemma and Qwen families handle everyday tasks at a quality comparable to cloud models from a year or two earlier.

A small (7B) model costs roughly 10–30x less to run than a large one, and current phone chips can run 8-billion-dial models at conversational speed (20+ tokens/sec).

Deployment has also matured a lot. I was able to do it in little time because running a model locally isn't a research project taking weeks anymore. You have Ollama, LM Studio, llama.cpp, etc.
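To give a feel for how little code this takes now, here is a minimal sketch of talking to a locally running Ollama server from Python. It assumes Ollama is installed and listening on its default address, and that you have already pulled a model; the model name `gemma3` and the helper names are placeholders, so swap in whatever you actually have.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt, model="gemma3"):
    """Build (but do not send) a POST request for a single non-streamed reply."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt, model="gemma3"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running Ollama server, so it is left commented out):
# print(ask_local_model("Explain quantization in one sentence."))
```

Nothing here touches the internet: the request goes to your own machine, which is the whole point of the local setup.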

Part 2: the scaling debate

One thing people noticed in the early days of AI is that when you started throwing more compute at a model, it gained new abilities and more intelligence on its own.

The predictable part of this is described by the scaling laws, and in 2026 we're now seeing their limits.

The original scaling-laws result (performance improving predictably as you add size, data, and compute):

Kaplan et al. (2020), "Scaling Laws for Neural Language Models"

The "new abilities appear as you scale" observation:

Wei et al. (2022), "Emergent Abilities of Large Language Models"

Pretraining scaling (bigger model + more data) has reached the point of diminishing returns in 2026. GPT-5 being called "dead on arrival" is the headline example people point to.





HEC Paris (2025), "AI Beyond the Scaling Laws."

Test-time compute means letting a model reason longer before answering. This is a separate lever entirely, and it has kept yielding gains. Research shows a smaller model given more thinking time can outperform a much larger model that answers instantly; in some cases a reasoning model can beat one ~14x its size.

Snell, Lee, Xu & Kumar (2024), "Scaling LLM Test-Time Compute Optimally Can Be More Effective Than Scaling Model Parameters."

(the 14x framing) Vardey (2026), "The Frontier: Reasoning Models, Scaling Laws - What's Actually Coming Next."
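Here is a toy calculation of why spending more compute per question can help. It uses majority voting over several independent tries, which is only one simple form of test-time compute and not the exact method from the papers above; the 60% single-try accuracy and the yes/no framing are assumptions picked for illustration.

```python
from math import comb


def majority_vote_accuracy(p_single, n_samples):
    """Chance that more than half of n independent tries are correct.

    Assumes a yes/no style question and an odd number of tries.
    """
    needed = n_samples // 2 + 1
    return sum(
        comb(n_samples, k) * p_single**k * (1 - p_single) ** (n_samples - k)
        for k in range(needed, n_samples + 1)
    )


for n in (1, 5, 25):
    print(n, round(majority_vote_accuracy(0.6, n), 3))
```

With one try the model is right 60% of the time; with five tries and a vote that rises to about 68%, and it keeps climbing with more tries. Same model, same dials, just more tokens spent per question.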



So the approach shifted from "make it bigger" to "make it reason", and that's how frontier models kept gaining ability. Claude's thinking effort control is exactly this.

Wei et al. (2022), "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models."

OpenAI (2024), "Learning to Reason with LLMs" (o1)




A reasoning model comes at a cost: it first generates a hidden chain of thinking tokens, then writes the visible reply. Those thinking tokens are typically billed at the same output rate even though you never see them.
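A quick back-of-the-envelope sketch of what that billing means. The prices ($3 per million input tokens, $15 per million output tokens) and the token counts are hypothetical numbers chosen for the example, not any provider's actual rates.

```python
def request_cost(input_tokens, thinking_tokens, visible_tokens,
                 input_price_per_m, output_price_per_m):
    """Dollar cost of one request; hidden thinking tokens count as output."""
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = (thinking_tokens + visible_tokens) * output_price_per_m / 1_000_000
    return input_cost + output_cost


# Same question, same visible answer length, with and without hidden thinking.
instant = request_cost(1_000, 0, 500, 3.0, 15.0)
reasoning = request_cost(1_000, 4_000, 500, 3.0, 15.0)
print(f"instant: ${instant:.4f}  reasoning: ${reasoning:.4f}")
```

In this made-up case the reply you see is identical in length, but the reasoning version costs almost seven times as much because of the 4,000 thinking tokens you never read.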



The whole market moved this way. The share of tokens served by reasoning models went from a small chunk in early 2025 to over half. Average prompt length has roughly quadrupled since early 2024, and completion length has nearly tripled, largely because of reasoning.




In the pure scaling era, the big cost was upfront: train one giant model once and you're done. In the reasoning era the cost moved to the meter: every hard query spends more tokens. This is why people these days are saying AI is too expensive.


But on the bigger question experts still disagree:

Some (like Yann LeCun) argue current methods alone won't reach AGI and new breakthroughs are needed.

The International AI Safety Report 2026 states that scaling the key inputs (compute, data, algorithms) is technically feasible to around 2030 before hitting a fundamental bottleneck.

Bengio et al. (2026), "International AI Safety Report 2026."

These are genuinely open-ended and contested questions with strong arguments on both sides, but one thing nearly everyone agrees on is that, at the frontier, this is too expensive.

In April 2026, Anthropic released Mythos and then said that Fable 5's lead over its other models grows the longer and more complex the task gets. In other words, it is the long-horizon reasoning/agentic lever from Part 2 paying off, but the cost and token usage are too much. It's priced like premium reasoning, which is a big part of why local models are being preferred, e.g. a fine-tuned 7B handling a document at ~$0.02 vs ~$0.30 for a frontier cloud API call.
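To see how that per-document gap adds up, here is a small sketch using the two figures quoted above. The workload of 10,000 documents a month is a hypothetical number picked for illustration.

```python
LOCAL_COST_PER_DOC = 0.02  # fine-tuned 7B, figure quoted above
CLOUD_COST_PER_DOC = 0.30  # frontier cloud API call, figure quoted above


def monthly_bill(docs_per_month, cost_per_doc):
    """Total monthly spend for a given document volume."""
    return docs_per_month * cost_per_doc


docs = 10_000  # hypothetical workload
local = monthly_bill(docs, LOCAL_COST_PER_DOC)
cloud = monthly_bill(docs, CLOUD_COST_PER_DOC)
print(f"local: ${local:,.0f}  cloud: ${cloud:,.0f}  ratio: {cloud / local:.0f}x")
```

At that volume it's roughly $200 against $3,000 a month, a 15x difference, before counting whatever it costs to own and run the local hardware.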



Part 3: How the small models and the big ones are connected

Remember distillation? The clever little phone-sized model is usually a student trained to imitate a giant teacher model. No frontier teacher, no smart student, so the big models are the factory that makes the small ones good.

On top of that, the field keeps finding cleverer training tricks, so the compute needed to hit a given capability keeps dropping, roughly 3x per year by some measures.

"Compute Requirements for Algorithmic Innovation in Frontier AI Models" (2025).

That's why the common 2026 setup isn't local replacing cloud entirely; it's hybrid. The on-device model handles routine stuff and falls back to a cloud model for the genuinely hard queries and more intensive parts.

AI Magicx (2026), "On-Device AI in 2026."
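The hybrid setup described above can be sketched roughly like this. It is a simplified illustration, not the code from my app: it assumes the local model returns a confidence score along with its answer, and the thresholds, function names and stand-in models are all hypothetical. Real routers often use other signals such as task type or a separate classifier.

```python
from typing import Callable, Tuple


def hybrid_answer(
    query: str,
    local_model: Callable[[str], Tuple[str, float]],
    cloud_model: Callable[[str], str],
    min_confidence: float = 0.7,
    max_local_words: int = 200,
) -> Tuple[str, str]:
    """Return (answer, route): try on-device first, fall back to cloud."""
    if len(query.split()) <= max_local_words:
        answer, confidence = local_model(query)
        if confidence >= min_confidence:
            return answer, "local"
    return cloud_model(query), "cloud"


# Stand-in models for demonstration only.
def fake_local(query):
    confident = "weather" in query.lower()
    return "local answer", 0.9 if confident else 0.3


def fake_cloud(query):
    return "cloud answer"


print(hybrid_answer("What's the weather like?", fake_local, fake_cloud))
print(hybrid_answer("Prove this theorem step by step.", fake_local, fake_cloud))
```

The routine question stays on the device and costs nothing per query; only the hard one pays the cloud meter.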

I actually implemented this with the voice feature of that app I built for the user. This is the pragmatic 2026 approach instead of fighting ghosts.




Part 4: Meek models shall inherit the earth

There's a 2025 paper literally titled "Meek Models Shall Inherit the Earth". Its claim: because returns to raw compute are shrinking, the capability gap between the most expensive models and cheaper ones opens up and then closes, so the little guys catch up. If that holds, frontier-level ability ends up everywhere, in everyone's pocket.



Gundlach, Lynch & Thompson (2025), "Meek Models Shall Inherit the Earth."

The same direction shows up in forecasts of how many models cross each compute threshold over time.

Epoch AI (2025), "How Many AI Models Will Exceed Compute Thresholds?"

TL;DR:
  • Small/local AI is a real, current 2026 trend, driven by efficiency, speed, privacy, and better hardware.
  • Pretraining scaling shows diminishing returns; test-time/reasoning compute is still producing results, at the cost of higher token usage and bigger bills.
  • Frontier capability kept advancing in 2026; whether/when it reaches AGI is contested.
  • Small models are generally derived from big ones (distillation) plus steady efficiency gains.
  • Nearly everyone agrees frontier inference is getting too expensive per query.

My take

From what I've seen, Silicon Valley companies have taken a hands-off, sideline approach to the frontier dream. Not "we gave up" but more like "we are not betting on it happening anytime soon". If you talk to a lot of the founders and CEOs, you mostly get a shrug. AGI might happen, might not, but either way the bills are brutal right now. Because of this, most people are moving towards pocket AI that is cheap per query. That's the strongest trend and the one I'd put the most money on.

"Replaced with AI" was the headline story in 2024 and 2025, but in 2026 the new headline is the walkback. Not because AI failed, but because once you add up the errors, the cleanup, and rehiring people at higher salaries, a lot of those "AI savings" evaporated.


Where I think people overreach is the "scaling is dead, so AGI is far off" line. It's true for "bigger is not always better", but "think longer" is still alive and well. And as I keep saying, the big models are exactly what make the small ones good. So to me this looks less like the moonshot failed and more like the moonshot's results are going straight into your phone, fast. I've always been an AI optimist and always embraced new technology, but feel free to disagree with me. I've laid out all the facts and you can form an opinion for yourself.
 
It's too early to predict anything. Only in 20 to 30 years will we see the effect of AI. It's like the early days of the internet.
 
@NorwoodAscender @petsmart @FunnyVALENTINE @xaxanibber @Sycophant

 
@Chadeep @LXR @dhusc @chang cypionate
 
@exvh @primal_shitmuncher @Souleth
 
will read after cs game
 
Why did they ban Mythos
 
@Rick_bozo @milkshake_addict @hollywoodngl @Aㅤㅤㅤ
 
nice guide

will read in a min
 
Thank u for the read
 
Regular Show Muscle Man GIF

Gud
 
high iq thread
 
@Centurion Hunter @afroheadluke @callard
 
Interesting
To be honest, we shall see how it moves forward. Not much point in trying to predict when many of the AI companies don't even know themselves how much they could optimize their models.
 

My prediction is that Google will win because it is the only company that is still in contention and has proved that it can last. AI is not its most important business, but it certainly can become that. Until then Google's labs will keep pumping out research and finding use cases most will never bother with. Anthropic and OpenAI simply can't compete in those niches because they have no moat apart from AI.
 
Nice
Anyone know about AI bots and the science behind them? The ones in YouTube comments, for example.
 
@callard @Sayori @
 
It's too early to predict anything, only in 20 to 30 years we will see the effect of AI. It's like the early days of the internet.
It's crazy to think that the person who created the World Wide Web is still alive and well today, meaning it isn't even that old. Historically we are still in the early days of the Internet, and now we have AI on top of it, changing the Internet exponentially.
 
It's crazy to think that the person who created the World Wide Web is still alive and well today, meaning it isn't even that old. Historically we are still in the early days of the Internet, and now we have AI on top of it, changing the Internet exponentially.
In some years we will talk about the people behind the Transformers paper as the originators of the LLM boom
 
@blinkers @Pony
 
Mirin the high IQ threads, always love reading them


Bump
 
@TheGreatDetective
 
@zennn
 
@Mast @buccalfatremoval
 
$20 a month, never hit limits, I make hundreds of photos a day

 
@Gomez @munnabhai
 
@Hess @Luquier
 
