• tal@lemmy.today
    27 upvotes, 1 downvote · 2 months ago

    Well, archive.org has a timestamped copy of much of the Web as it existed before latent-diffusion models came along. That may not give you access to newer information, but it’s a pretty whopping big chunk of data to work with.

    • palordrolap@kbin.run
      21 upvotes · 2 months ago

      Hopefully archive.org have measures in place to stop people from yanking all their data too quickly. At least not without a hefty donation or something. As a user it can chug a bit, and I’m hoping that’s the rate-limiting I’m talking about and not that they’re swamped.

      • Grimy@lemmy.world
        9 upvotes, 2 downvotes · edited · 2 months ago

        That would go against the principle of the archive imo, but regardless, if you take away all means of acquiring data freely, you are just giving an insane advantage to companies like OpenAI and Google, who already have copies of it.

        AI isn’t going away. We need to make sure we have free access to it so as not to hand our whole economy to a handful of companies.