Guess we can always rely on the good old-fashioned ways to make money…

Honestly, I think it’s pretty awful, but I’m not surprised.

  • brucethemoose@lemmy.world · 1 day ago

    TBH most ‘average’ people don’t have GPUs or even know what those are, but they do have smartphones.

    And they can already run these models pretty well too, albeit suboptimally. That’s my point. But the software needs to catch up.

    • tal@lemmy.today · 1 day ago

      But the software needs to catch up.

      Honestly, there is a lot of potential room for substantial improvements.

      • Gaining the ability to identify edges of the model that are not particularly relevant to the current problem and unloading them. That could bring down memory requirements a lot (see the first sketch below).

      • I don’t think — though I haven’t been following the area — that current models are optimized for being clustered. Hell, the software running them isn’t either. There’s a guy, Jeff Geerling, who was working on clustering Framework Desktops a couple of months back, because they’re a relatively inexpensive way to get a ton of VRAM attached to parallel processing capability. You can have multiple instances of the software active on the hardware, and you can offload different layers to different APUs, but currently it basically runs sequentially — no more than one APU is doing compute at any given moment. I’m pretty sure that’s something that can be eliminated (if it hasn’t been already). Then the problem — which he also discusses — is that you need to move a fair bit of data from APU to APU, so you want high-speed interconnects. Okay, that’s true if what you want is to run models designed for very expensive, beefy hardware on a lot of clustered, inexpensive hardware… but you could also train models to optimize for this, like a network of neural nets with extremely sparse interconnections between them and denser connections internal to them, where each APU only runs one neural net (see the second sketch below).

      • I am sure that we are nowhere near being optimal just for the tasks that we’re currently doing, even using the existing models.

      • It’s probably possible to tie non-neural-net code in to produce very large increases in capability. To make up a simple example, LLMs are, as people have pointed out, not very good at answering arithmetic questions. But… it should be perfectly viable to add a “math unit” that some of the nodes in the neural net interface with, and train the model to make use of that math unit (see the third sketch below). And suddenly, because you’ve effectively built a CPU into the thing’s brain, it becomes far better than any human at arithmetic… and potentially at things that make use of that capability. There are lots of things that we have very good software for today. A human can use software for some of those things, through their fingers and eyes — not a very high rate of data interchange, but we can do it. There are people like Musk’s Neuralink crowd trying to build computer-brain interfaces. But we can just build that software directly into the brain of a neural net and have the thing interface with it at the full bandwidth the brain can operate at. If you build in software to do image or audio processing, to help extract information that is likely “more useful” but expensive for a neural net to compute, these models might get a whole lot more efficient.

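      Three quick toy sketches of the above, in rough order. First, the relevance-gated unloading idea: a stand-in “router” scores how relevant each block of weights is to the current activation, and only the top-scoring blocks are pulled into a small in-memory cache, with least-recently-used blocks evicted. Everything here (the scoring function, the on-disk loader, the sizes) is invented for illustration; it is not any real framework’s API.

```python
# Toy sketch: keep only the weight blocks that look relevant to the current
# activation resident in memory, fetching the rest lazily. All names invented.
import numpy as np

RNG = np.random.default_rng(0)

def load_block_from_disk(block_id: int, dim: int = 256) -> np.ndarray:
    """Stand-in for reading one 'expert' block of weights off disk."""
    return RNG.standard_normal((dim, dim)).astype(np.float32)

class LazyExpertCache:
    """Keep at most `budget` weight blocks resident, evicting least-recently-used."""
    def __init__(self, budget: int):
        self.budget = budget
        self.blocks: dict[int, np.ndarray] = {}
        self.order: list[int] = []                # LRU order, oldest first

    def get(self, block_id: int) -> np.ndarray:
        if block_id not in self.blocks:
            if len(self.blocks) >= self.budget:
                evicted = self.order.pop(0)       # drop the least-recently-used block
                del self.blocks[evicted]
            self.blocks[block_id] = load_block_from_disk(block_id)
        else:
            self.order.remove(block_id)
        self.order.append(block_id)
        return self.blocks[block_id]

def score_relevance(hidden: np.ndarray, n_blocks: int) -> np.ndarray:
    """Stand-in router: how relevant is each block to the current activation?"""
    gates = RNG.standard_normal((n_blocks, hidden.shape[0])).astype(np.float32)
    return gates @ hidden

def forward(hidden: np.ndarray, cache: LazyExpertCache,
            n_blocks: int = 32, top_k: int = 2) -> np.ndarray:
    scores = score_relevance(hidden, n_blocks)
    for block_id in np.argsort(scores)[-top_k:]:  # only the most relevant blocks run
        hidden = np.tanh(cache.get(int(block_id)) @ hidden)
    return hidden

cache = LazyExpertCache(budget=4)                 # memory budget: 4 blocks resident
out = forward(RNG.standard_normal(256).astype(np.float32), cache)
print("blocks currently in memory:", cache.order)
```
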
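      Second, the sparse-interconnect idea: each cluster node runs a dense sub-network locally, and only a narrow bottleneck vector ever crosses the link to the next node, so per-hop traffic stays tiny compared with the weights that never move. The sizes and the purely sequential hand-off are arbitrary choices for the example, not a description of any existing system.

```python
# Toy numpy sketch of the "sparse interconnect" idea: dense compute stays on each
# node; only a LINK-sized vector crosses between nodes.
import numpy as np

RNG = np.random.default_rng(1)
HIDDEN = 2048   # width of each node's private (dense) hidden state
LINK = 64       # width of the vector that actually crosses the interconnect

class NodeSubnet:
    """One cluster node: dense weights stay local; only a LINK-sized vector leaves."""
    def __init__(self):
        self.w_in = RNG.standard_normal((HIDDEN, LINK)).astype(np.float32) * 0.02
        self.w_dense = RNG.standard_normal((HIDDEN, HIDDEN)).astype(np.float32) * 0.02
        self.w_out = RNG.standard_normal((LINK, HIDDEN)).astype(np.float32) * 0.02

    def forward(self, link_in: np.ndarray) -> np.ndarray:
        h = np.tanh(self.w_in @ link_in)   # expand to the node's private width
        h = np.tanh(self.w_dense @ h)      # the heavy compute never leaves this node
        return self.w_out @ h              # compress before handing off downstream

nodes = [NodeSubnet() for _ in range(4)]
x = RNG.standard_normal(LINK).astype(np.float32)
for node in nodes:                         # only LINK floats cross per hop
    x = node.forward(x)

bytes_per_hop = LINK * 4
bytes_local = HIDDEN * HIDDEN * 4
print(f"interconnect per hop: {bytes_per_hop} B; "
      f"dense weights kept local per node: {bytes_local / 1e6:.1f} MB")
```
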
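      Third, the “math unit” idea as a toy: the model emits a tool call rather than doing the arithmetic itself, and ordinary deterministic code computes the exact answer. The CALC(…) convention and the fake_model stub are invented for the example; real tool-calling schemes differ in the details.

```python
# Toy sketch of a "math unit": the net emits CALC(...) spans, and a safe
# arithmetic evaluator (no eval()) fills in the exact results.
import ast
import operator
import re

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv, ast.USub: operator.neg}

def safe_eval(expr: str):
    """Evaluate a plain arithmetic expression: numbers and + - * / only."""
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.operand))
        raise ValueError(f"disallowed expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval").body)

def fake_model(prompt: str) -> str:
    # Stand-in for the neural net: it "knows" to route arithmetic to the tool.
    return "That works out to CALC(1234 * 5678) units in total."

def answer(prompt: str) -> str:
    draft = fake_model(prompt)
    # Replace every CALC(...) span with the exact result from the math unit.
    return re.sub(r"CALC\(([^)]*)\)", lambda m: str(safe_eval(m.group(1))), draft)

print(answer("What is 1234 times 5678?"))
```
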
      • brucethemoose@lemmy.world · 1 day ago

        Whoa, Nellie.

        I don’t care to get into that; really, all I meant was ‘they need some low-level work’.

        Popular models and tools/augmentations need to be quantized better and ported from CUDA to MLX/CoreML… that’s it.

        That’s all, really.

        They’d run many times faster and fit in RAM then, as opposed to the ‘hacked-in’ PyTorch frameworks meant for research that they run on now. And all Apple needs to do is sic a few engineers on it.
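
        For a sense of what that looks like in practice, here is roughly how a pre-quantized model gets run through the mlx-lm package on an Apple Silicon Mac today. The model repo name and the generate() keyword arguments vary between versions, so treat this as illustrative rather than a guaranteed recipe.

```python
# Rough sketch: running a 4-bit community conversion with mlx-lm on Apple Silicon.
# The repo name and keyword arguments are examples; check the mlx-lm docs for the
# exact API of the version you install (pip install mlx-lm).
from mlx_lm import load, generate

# Weights load straight into unified memory; no CUDA involved.
model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")

text = generate(
    model,
    tokenizer,
    prompt="Why does 4-bit quantization shrink a model's memory footprint?",
    max_tokens=128,
)
print(text)
```

        Quantizing a stock Hugging Face checkpoint yourself is typically a one-liner along the lines of `python -m mlx_lm.convert --hf-path <repo> -q`, though the exact flags depend on the release.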

        I dunno about Android. That situation is much more complicated, and I’m not sure what the ‘best’ Vulkan runtime to port to is these days.