OpenAI moves to allow “mature apps” on its platforms

1984@lemmy.today · 24 days ago

OpenAI moves to allow “mature apps” on its platforms

FartMaster69@lemmy.dbzer0.com · 24 days ago

Except these AI models need data to train on, they cannot improve without an industry to leach off of.

Halcyon@discuss.tchncs.de · 24 days ago

As if we didn’t already have more than enough pornographic material on all the hard drives worldwide for training. There’s nothing new to come in the image material from this industry, porn is infinite repetitions.

brucethemoose@lemmy.world · edit-2 24 days ago

Lol, this too.

It’s honestly too much already.

tal@lemmy.today · edit-2 24 days ago

While I don’t disagree with your overall point, I would point out that a lot of that material has been lossily-compressed to a degree that significantly-degrades quality. That doesn’t make it unusable for training, but it does introduce a real complication, since your first task has to be being able to deal with compression artifacts in the content. Not to mention any post-processing, editing, and so forth.

One thing I’ve mentioned here — it was half tongue-in-cheek — is that it might be less-costly than trying to work only from that training corpus, to hire actors specifically to generate video to train an AI for any weak points you need. That lets you get raw, uncompressed data using high-fidelity instruments in an environment with controlled lighting, and you can do stuff like use LIDAR or multiple cameras to make reducing the scene to a 3D model simpler and more-reliable. The existing image and video generation models that people are running around with have a “2D mental model” of the world. Trying to bridge the gap towards having a 3D model is going to be another jump that will have to come to solve a lot of problems. The less hassle there is with having to deal with compression artifacts and such in getting to 3D models, probably the better.

Halcyon@discuss.tchncs.de · 24 days ago

There’s loads of hi-res ultra HD 4k porn available. If someone professional wants to train on that it’s not hard to find. If someone wants to play a leading role in the field of AI training, then of course they invest the necessary money and don’t use the shady material from the peer-to-peer network.

tal@lemmy.today · 24 days ago

There’s loads of hi-res ultra HD 4k porn available.

It’s still gonna have compression artifacts. Like, the point of lossy compression having psychoacoustic and psychovisual models is to degrade the stuff as far as you can without it being noticeable. That doesn’t impact you if you’re viewing the content without transformation, but it does become a factor if you don’t. Like, you’re viewing something in a reduced colorspace with blocks and color shifts and stuff.

I can go dig up a couple of diffusion models finetuned off SDXL that generate images with visible JPEG artifacts, because they were trained on a corpus that included a lot of said material and didn’t have some kind of preprocessing to deal with it.

I’m not saying that it’s technically-impossible to build something that can learn to process and compensate for all that. I (unsuccessfully) spent some time, about 20 years back, on a personal project to add neural net postprocessing to reduce visibility of lossy compression artifacts, which is one part of how one might mitigate that. Just that it adds complexity to the problem to be solved.

brucethemoose@lemmy.world · edit-2 24 days ago

It’s easy to get rid of that with prefiltering/culling and some preprocessing. I like BM3D+deblock, but you could even run them though light GAN or diffusion passes.

A lot of the amateur lora makers aren’t careful about that, but I’d hope someone shelling out for a major fine tune would.

brucethemoose@lemmy.world · 24 days ago

Also “minor” compression from high quality material isn’t so bad, especially if starting with a pre trained model. A light denoising step will mix it into nothing.

brucethemoose@lemmy.world · edit-2 24 days ago

Except these AI models need data to train on, they cannot improve without an industry to leach off of.

Not anymore.

The new trend in ML is training on synthetic data, alongside more refined sets of curated data.

And, honeslty the open base models we have now are ‘good enough’ with some finetuning, and maybe QAT.

FartMaster69@lemmy.dbzer0.com · 24 days ago

Ah sweet model collapse.

brucethemoose@lemmy.world · edit-2 24 days ago

That’s certainly something I’ve observed myself training GANs on their own output. It’s definitely a problem for the stupid (like Tech Bros).

But it doesn’t happen like you think, as long as the augmentations are clever, and their scope is narrow. Hence the success of several recent distillations and ‘augmented’ LLMs, and the failure of huge dataset trains like Llama4.

…And synthetic data generation/augmentation is getting clever, and is already being used in newer trains. See this, or newer papers if your search for them on arixv: https://github.com/qychen2001/Awesome-Synthetic-Data

Or Nvidia’s HUGE focus on this, combining it with their work in computer graphics: https://www.nvidia.com/en-us/use-cases/synthetic-data-physical-ai/

OpenAI moves to allow “mature apps” on its platforms

OpenAI moves to allow “mature apps” on its platforms

Sam Altman Pushes Back on OpenAI's Foray Into Smut