A software developer and Linux nerd, living in Germany. I’m usually a chill dude, but my online persona doesn’t always reflect my true personality. Take what I say with a grain of salt; I usually try to be nice and give good advice, though.

I’m into Free Software, selfhosting, microcontrollers and electronics, freedom, privacy and the usual stuff. And a few select other random things as well.

  • 4 Posts
  • 1.07K Comments
Joined 5 years ago
Cake day: August 21st, 2021


  • It’s a broad topic. Every time I see some new AI-coded project linked in the selfhosted community, it’s kinda shit… I’ve had hallucinated installation instructions. Wildly exaggerated claims about what it’s supposed to do… Sometimes it looks okay, but some buttons don’t do anything, and then I look at the code and everything is more of a stub. Some projects have ridiculous security issues, like someone finding a master key buried in the code, and of course none of the “developers” ever noticed, because no one ever had a look at the code…

    You’re somewhere in the same territory. Maybe you’re the one who applies it properly. But once I notice the tell-tale signs of vibe-coding, I’m going to start looking at a project with the prejudice shaped by my prior experience. And I tend to be right most of the time.

    But with that said, I don’t think it’s healthy to have a war over it, ban people and yell at each other. The most I want is transparency. I think all software projects should just disclose if and how they use AI, and to what extent. Then the users can make up their own minds.

    And with cryptography code… Isn’t that a bit dangerous? From my own experience, AI models learn a lot from example code, the standard documentation of libraries, Wikipedia articles and such… And then generate responses closer to that than to completely new thoughts. But(!) all these examples, tutorials and boilerplate snippets take a lot of shortcuts to explain things in simpler terms. Shortcuts that weaken security. And I wouldn’t be surprised if your AI then goes ahead and reproduces that, and casually forgets the steps to prepare the numbers, or the follow-up steps, if those were never in the Wikipedia example code.

    And I’ve seen a lot of wrong advice on StackOverflow and Reddit, so you’d better hope it didn’t internalize that as well. There are some fairly common myths about security and cryptography details out there. And I never know whether your average Claude learned more from Reddit discussions or from computer science literature… And you probably used Claude to skip reading the computer science books (and having a really close look at the code), or you would have just typed it all down yourself. So I’d expect your software to be roughly as sound as newbie code, up to the average of the projects out there on GitHub which your AI has learned from. Not any better than that.
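
    To make the “shortcuts” point concrete, here’s the kind of thing I mean (a made-up but typical example using openssl’s own CLI; the file names and password are just placeholders). Tutorials love the short version of the command, and modern openssl even prints a warning about it:

        # the tutorial shortcut: legacy, effectively single-round key derivation
        openssl enc -aes-256-cbc -k 'hunter2' -in notes.txt -out notes.enc

        # the longer version: salted PBKDF2 with a high iteration count
        openssl enc -aes-256-cbc -pbkdf2 -iter 600000 -salt -k 'hunter2' -in notes.txt -out notes.enc

    Both commands “work” and produce an encrypted file, but only one of them meaningfully slows down offline brute-forcing of the password. That’s exactly the kind of difference that never shows up in boilerplate code.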



  • I think there are pros and cons to everything. That way would have been less of a dickhead move towards the Forgejo developers. But it would have been a big letdown to admins, as they wouldn’t know what’s up with the software they’re running on their servers. The way the author chose gives some new intelligence to admins, and they can now act on it, since it’s public knowledge. But it’s annoying to the devs.

    I guess, as a Forgejo user, I’m kinda grateful they did it this way. Now I got to learn the story, and I can allocate 2h on the weekend to see if my personal Forgejo container is isolated enough and whether the backups still work.

    (But that’s just my opinion after reading one side of the story. Maybe there’s more to the story and they’re being a dick nonetheless…)

    Edit: And regarding just dropping the security team an informal mail… I don’t know if that’s clever. You’d normally either follow their security policy or not engage at all. Sending them other kinds of mails, which violate that policy, might not be the best choice.





  • Yes. I’ve been somewhat lucky as well. I upgraded my homeserver to 48GB of RAM to run a few virtual machines, and maxed out my old laptop, well before prices skyrocketed. I’ve still got to check whether I’m paying the ~8€ a month for my netcup VPS, or whether they increased the price for existing customers as well…



  • Did you read the Wiki? You either need to pass the compress_extension option when mounting it (the Arch Wiki lists how to enable compression on all text files; I gave you the version with a ‘*’, which enables compression for all files), or you do a chattr -R +c ... on specific files or directories to compress them. Maybe you missed that, and that’s why it doesn’t compress?!
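
    Roughly like this (a sketch from memory, assuming f2fs with zstd, since compress_extension is an f2fs mount option; the device and paths are placeholders):

        # compress everything at mount time
        mount -o 'compress_algorithm=zstd,compress_extension=*' /dev/sdX1 /mnt/data

        # ...or opt in per directory/file instead
        chattr -R +c /mnt/data/documents
        lsattr /mnt/data/documents    # compressed files show the 'c' flag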

    There’s probably also a way to debug it and figure out what it actually does, and how many files/sectors got compressed on the filesystem. Linux usually buries that kind of information somewhere in /sys or /proc, or there are special commands to dig it out. But I’m not really an expert on it.
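
    If I remember correctly, f2fs exposes a few compression counters in sysfs, something along these lines (names from memory, so double-check against your kernel):

        # blocks written after compression vs. blocks saved by compression, since mount
        cat /sys/fs/f2fs/sdX1/compr_written_block
        cat /sys/fs/f2fs/sdX1/compr_saved_block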

    And there are also files which just cannot be compressed any further, because they’re already compressed. Most images, for example. Or music, or ZIP archives. If you try to compress those, they’ll usually stay about the same size.
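
    That’s easy to demonstrate with any compressor; random data stands in for an already-compressed file here:

        head -c 1M /dev/urandom > random.bin   # incompressible, like a JPEG or ZIP
        head -c 1M /dev/zero > zeros.bin       # maximally compressible
        gzip -k random.bin zeros.bin
        ls -l random.bin.gz zeros.bin.gz       # random stays ~1M, zeros shrinks to ~1K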









  • The issue with the tools I’ve seen is that they either don’t factor in how language models are actually trained and how datasets are prepared in reality, or they’re based on outdated information. I’ve never seen a specific tool backed by science, or even with a plausible way of working against current data-gathering processes… So for all intents and purposes, they’re a bit more akin to homeopathy or alternative medicine. Sure, you’re perfectly fine taking sugar pills, there’s nothing wrong with that. But don’t confuse it with actual science-backed medicine.

    And I mean the poisoning goes even further than that. It’s not just people trying to make an LLM output gibberish. There are also lots of people with a vested (commercial) interest in sneaking in false information or their political agenda, or even a tire company that wants ChatGPT to say “Company XY” is the most trustworthy shop for new tires for your car. Judging by the public information out there, we’re already way past simple attacks, and the AI companies are aware of it. It’s an ongoing cat-and-mouse game. And besides all the labeling sweatshops, they’ll also use other AI (natural language processing) to sift through the data. From what I remember, they have secret watermarking in place in a lot of commercial chatbots and image generators… So unless people come up with very clever mechanisms, a “poisoning” attempt will probably be detected by some very basic (fully automated) plausibility checks, and they’ll just discard your data without wasting a lot of resources on it.



  • I think a few people have already mentioned some good solutions. I just wanted to add: a port forward in the firewall of your router is basically the same thing as a port forward in your Linux computer’s firewall. You could just set up any VPN, SSH tunnel or whatever, and then use your firewall (nftables, iptables) to forward the VPS’s external port to the internal port on the VPN. It’s the same thing you do on your router, just without a graphical interface to configure it.
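
    A rough nftables sketch of that forwarding (addresses and ports are made up; it assumes the VPN gives the target machine the address 10.0.0.2):

        # let the kernel route packets between interfaces
        sysctl -w net.ipv4.ip_forward=1

        nft add table ip nat
        nft add chain ip nat prerouting '{ type nat hook prerouting priority -100; }'
        nft add chain ip nat postrouting '{ type nat hook postrouting priority 100; }'

        # forward the VPS's public port 443 to the machine behind the VPN
        nft add rule ip nat prerouting tcp dport 443 dnat to 10.0.0.2:443
        # rewrite the source address so replies come back through the VPS
        nft add rule ip nat postrouting ip daddr 10.0.0.2 masquerade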