• Riskable@programming.dev
    4 hours ago

    stole all that licensed code.

    Stealing is when the owner of a thing doesn’t have it anymore because it was stolen.

    LLMs aren’t “stealing” anything… yet! Soon we’ll have them hooked up to robots, and then they’ll be stealing¹ 👍

    1. Because a user instructed it to do so.
    • mesa@piefed.social
      4 hours ago

      I think I get what you’re saying. LOL, LLM bots stealing all the things.

      You may note I’m not arguing the ethical concerns of LLMs, just the way the data was pulled. That’s why open source models that pull data and give others full access to said data could be argued to be more ethical. For practical purposes, it means we can just pull them off Hugging Face, use them on our home setups, and reproduce them with the “correct” datasets. As always, garbage in, garbage out. I wish my work would let me put all of our SQL from a 30(?) year period into a custom LLM just for our proprietary BS. That’s something I would have NO ethical concerns about at all.

      • Riskable@programming.dev
        3 hours ago

        For reference, every AI image model uses ImageNet (as far as I know), which is just a big database of publicly accessible image URLs and metadata (classification info like “bird”, <coordinates in the image>).
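        To make the “URLs plus metadata” idea concrete, here’s a rough Python sketch of what one entry in that kind of dataset might look like. The `BoundingBox` and `ImageRecord` types, their field names, and the `records_with_label` helper are all made up for illustration; this is not ImageNet’s actual file format, just the general shape of “a link, a label, and maybe some coordinates.”

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BoundingBox:
    # Hypothetical layout: pixel coordinates of the labeled object in the image.
    x_min: int
    y_min: int
    x_max: int
    y_max: int


@dataclass(frozen=True)
class ImageRecord:
    # The dataset stores a link to the image, not the image itself.
    url: str
    # Classification info, e.g. "bird".
    label: str
    # Optional location of the labeled object inside the image.
    box: Optional[BoundingBox] = None


def records_with_label(records: List[ImageRecord], label: str) -> List[ImageRecord]:
    """Return only the records tagged with the given label (hypothetical helper)."""
    return [r for r in records if r.label == label]


if __name__ == "__main__":
    # Placeholder URLs; a real dataset would point at actual public images.
    dataset = [
        ImageRecord("https://example.com/img/001.jpg", "bird", BoundingBox(34, 20, 210, 180)),
        ImageRecord("https://example.com/img/002.jpg", "cat"),
    ]
    for record in records_with_label(dataset, "bird"):
        print(record.url, record.box)
```

        Note that each record only points at where an image lives; whoever uses the dataset still has to fetch the pixels from those URLs.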

        The “big AI” companies like Meta, Google, and OpenAI/Microsoft have access to additional image data sets that are 100% proprietary. But what’s interesting is that the image models built from just ImageNet (and other open sources) are better! They’re superior in just about every way!

        Compare what you get from, say, ChatGPT (DALL-E 3) with a FLUX model you can download from civit.ai… the FLUX results are so much better it’s like night and day! Not only that, but you have an enormous selection of LoRAs to choose from to get exactly the type of image you want.

        What we’re missing is the same sort of open data sets for LLMs. Universities have access to some data, but even that is licensed.