That’s basically model routing, and has existed a while. Open AI’s GPT-5 and llama-swap do that, for example. If the task is simple, it uses a smaller, less intensive model, and only uses the slower, larger one of the task is more complex.
Though most tend to operate with models on the same device/service, rather than a model run elsewhere.
That’s basically model routing, and has existed a while. Open AI’s GPT-5 and llama-swap do that, for example. If the task is simple, it uses a smaller, less intensive model, and only uses the slower, larger one of the task is more complex.
Though most tend to operate with models on the same device/service, rather than a model run elsewhere.