• 1 Post
  • 259 Comments
Joined 3 years ago
Cake day: July 2nd, 2023

  • Firstly, I wish you the best of luck in your community’s journey away from Discord. This may be a good time to assess what your community needs from a new platform, since Discord targeted various use-cases that no single platform can hope to cover in full. Identifying exactly what your group needs and doesn’t need will steer you in the right direction.

    As for Element, bear in mind that their community and paid versions do not exactly target a hobbyist self-hosting clientele. Instead, Element is apparently geared more for enterprise on-premises deployment (like Slack, Atlassian JIRA, Asterisk PBX) and that’s probably why the community version is also based on Kubernetes. This doesn’t mean you can’t use it, but their assumptions about deployments are that you have an on-premises cloud.

    Fortunately, there are other Matrix homeservers available, including one written in Rust that has both bare metal and Docker deployment instructions. Note that I’m not endorsing this implementation, but only know of it through this FOSDEM talk describing how they dealt with malicious actors.

    As an aside, I have briefly considered Matrix before as a group communications platform, but was put off by their poor E2EE decisions, for both the main client implementation and in the protocol itself. Odd as it sounds, poor encryption is worse than no encryption, because of the false assurance it gives. If I did use Matrix, I would not enable E2EE because it doesn’t offer me many privacy guarantees, compared to say, Signal.


  • My Ecobee thermostat – which is reasonably usable without an Internet connection – has one horrific flaw: the built in clock seems to drift by a minute per month, leading to my programmed schedules shifting ever so slightly.

    I could have it connected to a dedicated IoT SSID and live in a VLAN jail so that it only has access to my NTP server… or I just change the time manually every six months as part of DST.


  • I don’t currently have any sort of notebook. Instead, for general notes, I prefer A3-sized loose sheets of paper, since I don’t really want to use double the table surface to have both verso and recto in front of me, I don’t like writing on spiral or perfect bound notebooks, and I already catalog my papers into 3-ring binders.

    if I’m debugging something, and I’m putting silly print statements to quickly troubleshoot, should I document that?

    My read of the linked post is that each discrete action need not be recorded, but rather the thought process that leads to a series of actions. Rather than “added a printf() in constructor”, the overall thrust of that line of investigation might be “checking the constructor for signs of malformed input parameters”.

    I don’t disagree with the practice of “printf debugging”, but unless you’re adding a printf between every single operative line in a library, there’s always going to be some internal thought that goes into where a print statement is placed, based on certain assumptions and along a specific line of inquiry. Having a record of your thoughts is, I think, the point that the author is making.
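    As a rough sketch of the difference, with entirely hypothetical names (Widget, widget_init), the printf below exists to test one specific hypothesis, and the notebook entry would record that hypothesis rather than the printf itself:

    ```c
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical constructor-like initializer under investigation.
     * The hypothesis being tested: a malformed (empty) name parameter
     * is reaching this function. The printf placement follows that
     * line of inquiry, not a shotgun approach. */
    typedef struct { char name[32]; int id; } Widget;

    void widget_init(Widget *w, const char *name, int id) {
        /* "Checking the constructor for signs of malformed input parameters" */
        if (name == NULL || name[0] == '\0')
            fprintf(stderr, "widget_init: suspicious empty name (id=%d)\n", id);
        snprintf(w->name, sizeof w->name, "%s", name ? name : "");
        w->id = id;
    }
    ```

    The notebook (or commit message) records why the probe sits in widget_init, so the next session starts from the hypothesis rather than the stray print statement.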

    That said, in lieu of a formal notebook, I do make frequent Git commits and fill in the commit message with my thoughts, at every important juncture (eg before compiling, right before logging off or going to lunch).





  • Admittedly, I haven’t finished reflashing my formerly-Meshtastic LoRa radios with MeshCore, so I haven’t been able to play around with it yet. Although both meshes have a decent-sized presence near me, I was swayed to MeshCore after I started looking into how the mesh algorithm works for each. No extra license needed, since MeshCore supports roughly the same hardware as Meshtastic.

    And what I learned – esp from following the #meshtastic and #meshcore hashtags on Mastodon – is that Meshtastic has some awful flooding behavior when sending messages. Having worked in computer networks, I can say that’s a recipe for limiting the maximum size and performance of the mesh. MeshCore, by contrast, has a more sensible routing protocol for passing messages along.

    My opinion is that mesh networking’s most important use-case should be reliability, since when everything else (eg fibre, cellular, landlines) stops working, people should be able to self organize and build a working communications system. This includes scenarios where people are sparsely spaced (eg hurricane disaster with people on rooftops awaiting rescue) but also extremely dense scenarios (eg a protest where the authorities intentionally shut off phone towers, or a Taylor Swift concert where data networks are completely congested). Meshtastic’s flooding would struggle in the latter scenario, to send a distress message away from the immediate vicinity. Whereas MeshCore would at least try to intelligently route through nodes that didn’t already receive the initial message.


  • I personally started learning microcontrollers using an Arduino dev kit, and then progressed to compiling the code myself using GCC and loading it directly onto the ATmega328P (the microcontroller from the original Arduino dev kits).

    But nowadays, I would recommend the MSP430 dev kit (which has excellent documentation for its peripherals) or an STM32 dev kit (because it uses a 32-bit ARM architecture, which is very popular in the embedded hardware industry, so it would look good on your resume).

    Regarding userspace drivers: because these live outside the kernel, such drivers are not kept in the kernel’s repositories. You won’t find any userspace drivers in the Linux or FreeBSD repos. Instead, such drivers are kept in their own repos, maintained separately, and often do unusual things that the kernel folks don’t want to maintain until there is enough interest. For example, if you’ve developed an unproven VPN tunnel similar to WireGuard, you might face resistance getting it into the Linux kernel. But you could write a userspace driver that implements your VPN tunnel, and others can use that driver without changing their kernel. If it gets popular enough, other developers might put in the effort to reimplement it as a mainline kernel driver.

    For userspace driver development, a VM running the specific OS is fine. For kernel driver development, I prefer to run the OS within QEMU, since that allows me to attach a debugger to the VM’s “hardware”, letting me do things like adding breakpoints within my kernel driver.


  • Very interesting! I’m no longer pursuing Meshtastic – I’m changing my hardware over to run MeshCore now – but this is quite a neat thing you’ve done here.

    As an aside, if you later want full networking connectivity using the same style of encoding data as messages, PPP is the link-layer (Layer 2) protocol that could do that. Transported over Meshtastic, PPP could give you a standard IP network, and on top of that, you could use SSH to securely access your remote machine.

    It would probably be very slow, but PPP was also used for dial-up so it’s very accommodating. The limiting factor would be whether the Meshtastic local mesh would be jammed up from so many messages.


  • This answer is going to go in multiple directions.

    If you’re looking for practice using C to talk to devices and peripherals, the other commenter’s suggestion to start with an SBC (eg Raspberry Pi, Orange Pi) or with a microcontroller dev kit (eg Arduino, MSP430, STM32) is spot-on. That gives you a bunch of attached peripherals and a datasheet that documents the register behavior, so you can write your own C functions that fill in and read those registers. In actual projects, you would probably use the provided libraries that already do this, but there is educational value in trying it yourself.
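    As a sketch of what that register-level exercise looks like – with a made-up register layout, since the real addresses and bit positions come from your chip’s datasheet – the “registers” here are plain variables so the code runs on a host:

    ```c
    #include <stdint.h>

    /* Sketch of register-level UART output, with a hypothetical register
     * layout (TXDATA / STATUS with a TX-ready bit). On real hardware these
     * would be fixed memory-mapped addresses from the datasheet; here they
     * are plain volatile variables so the sketch runs on a host. */
    static volatile uint32_t UART0_STATUS = 0x1;   /* bit 0: transmitter ready */
    static volatile uint32_t UART0_TXDATA = 0;

    #define TX_READY 0x1u

    void put_char_uart0(char c) {
        while (!(UART0_STATUS & TX_READY))  /* spin until transmitter is free */
            ;
        UART0_TXDATA = (uint32_t)(uint8_t)c; /* writing the register sends the byte */
    }
    ```

    On a real part, the two variables would instead be something like `#define UART0_TXDATA (*(volatile uint32_t *)0x4000C000)` with the address taken from the datasheet.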

    However, just because you write a C function named “put_char_uart0()”, that isn’t enough preparation for writing full-fledged drivers, such as those in the Linux and FreeBSD kernels. The next step is more about software design, where you structure your C code so that, rather than being very hardware-specific (eg for the exact UART peripheral in your microcontroller), you have code which works for a more generic UART (abstracting away the general details) and is common to all the UARTs made by the same manufacturer. This is about creating reusable code, abstraction layers, and extensible code. Not all code can be reusable, not every abstraction layer is desirable, and you don’t necessarily want to make your code super extensible if it starts to impact your core requirements. Good driver design means you don’t ever paint yourself into a corner, and the best way to learn how to avoid this is through sheer experience.
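    To illustrate the kind of abstraction layer I mean – all names here are illustrative, not from any real SDK – the common code talks through a function-pointer table, and each vendor-specific backend fills in that table:

    ```c
    #include <stddef.h>

    /* Generic UART operations: common code only sees this interface. */
    struct uart_ops {
        void (*putc)(void *hw, char c);
        int  (*getc)(void *hw);
    };

    struct uart {
        const struct uart_ops *ops;  /* common code calls through here */
        void *hw;                    /* vendor-specific state/registers */
    };

    /* Generic helper: works for any backend that fills in the ops table. */
    void uart_puts(struct uart *u, const char *s) {
        while (*s)
            u->ops->putc(u->hw, *s++);
    }

    /* A fake "vendor" backend that records output into a buffer,
     * standing in for real register writes. */
    struct fake_hw { char buf[64]; size_t len; };

    static void fake_putc(void *hw, char c) {
        struct fake_hw *f = hw;
        if (f->len < sizeof f->buf - 1)
            f->buf[f->len++] = c;
    }
    static int fake_getc(void *hw) { (void)hw; return -1; }

    static const struct uart_ops fake_ops = { fake_putc, fake_getc };
    ```

    The payoff is that `uart_puts()` never changes when a new chip family arrives; only a new ops table does.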

    For when you do want to write a full-and-proper driver for any particular peripheral – maybe one day you’ll create one such device, such as by using an FPGA attached via PCIe to a desktop computer – then you’ll need to work within an existing driver framework. Linux and FreeBSD drivers use a framework so that all drivers have access to what they need (system memory, I/O, helper functions, threads, etc), and then it’s up to the driver author to implement the specific behavior (known in software engineering as “business logic”). It is a learned skill – also through experience – to work within the Linux or FreeBSD kernels. So much so that both kernels have gone to great lengths to enable userspace drivers, meaning the business logic runs as a normal program on the computer, saving the developer from having to learn the strange ways of kernel development.

    And it’s not like user space drivers are “cheating” in any way: they’re simply another framework to write a device driver, and it’s incumbent on the software engineer to learn when a kernel or user space driver is more appropriate for a given situation. I have seen kernel drivers used for sheer computational performance, but have also seen userspace drivers that were developed because nobody on that team was comfortable with kernel debugging. Those are entirely valid reasons, and software engineering is very much about selecting the right tool from a large toolbox.



  • I’ll take a stab at the question. But I’ll need to lay some foundational background information.

    When an adversarial network is blocking connections to the Signal servers, the Signal app will not function. Outbound messages will still be encrypted, but they can’t be delivered to their intended destination. The remedy is to use a proxy, which is a server that isn’t blocked by the adversarial network and which will act as a relay, forwarding all packets to the Signal servers. The proxy cannot decrypt any of the messages, and a malicious proxy is no worse than blocking access to the Signal servers directly. A Signal proxy specifically forwards only to/from the Signal servers; this is not an open proxy.

    The Signal TLS Proxy repo contains a Docker Compose file, which will launch Nginx as a reverse proxy. When a Signal app connects to the proxy at port 80 or 443, the proxy will – in the background – open a connection to the Signal servers. That’s basically all it does. They ostensibly wrote the proxy as a Docker Compose file, because that’s fairly easy to set up for most people.

    But now, in your situation, you already have a reverse proxy for your selfhosting stack. While you could run Signal’s reverse proxy in the background and then have your main reverse proxy forward to that one, it would make more sense to configure your main reverse proxy to directly do what the Signal reverse proxy would do.

    That is, when your main proxy sees one of the dozen subdomains for the Signal server, it should perform reverse proxying to those subdomains. Normally, for the rest of your self hosting arrangement, the reverse proxy would target some container that is running on your LAN. But in this specific case, the target is actually out on the public Internet. So the original connection comes in from the Internet, and the target is somewhere out there too. Your reverse proxy is simply a relay station.

    There is nothing particularly special about Signal choosing to use Nginx in reverse proxy mode, in that repo. But it happens to be that you are already using Nginx Proxy Manager. So it’s reasonable to try porting Signal’s configuration file so that it runs natively with your Nginx Proxy Manager.

    What happens if Signal updates that repo to include a new subdomain? Well, you wouldn’t receive that update unless you specifically checked for it and then updated your proxy configuration. So that’s one downside.

    But seeing as the Signal app demands port 80 and 443, and you already use those ports for your reverse proxy, there is no way to avoid programming your reverse proxy to know the dozen subdomains. Your main reverse proxy cannot send the packets to the Signal reverse proxy if your main proxy cannot even identify that traffic.
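    To sketch the dispatch decision the main reverse proxy has to make: given the hostname from SNI or the Host header, decide whether to relay out to Signal’s servers. The hostname list below is illustrative and incomplete – the authoritative list lives in the Signal TLS Proxy repo’s nginx configuration and can change, which is exactly the update problem mentioned above.

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* Illustrative subset of the Signal service hostnames a proxy must
     * recognize; the real, complete list comes from Signal's repo. */
    static const char *signal_hosts[] = {
        "chat.signal.org",
        "storage.signal.org",
        "cdn.signal.org",
        "cdn2.signal.org",
        "updates2.signal.org",
    };

    /* Decide whether this connection is Signal traffic to be relayed
     * out to the Internet, rather than to a LAN container. */
    bool is_signal_host(const char *host) {
        for (size_t i = 0; i < sizeof signal_hosts / sizeof *signal_hosts; i++)
            if (strcmp(host, signal_hosts[i]) == 0)
                return true;
        return false;
    }
    ```

    This is precisely the matching your reverse proxy performs with its own configuration syntax; the point is that the main proxy must know every hostname, or the traffic never gets classified at all.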



  • There can be, although some parts may still need to be written in assembly (which is imperative, because that’s ultimately what most CPUs execute), for parts like a kernel’s context-switching logic. But C has similar restrictions, like how it is impossible to start running a C function without first initializing the stack. Exception: some CPUs (eg Cortex-M) have a specialized hardware mechanism that initializes the stack pointer at reset.

    As for why C: it’s a low-level language that maps well to most CPUs’ native assembly languages. If instead we had stack-based CPUs – eg Lisp Machines or a real Java Machine – then we’d probably be using other languages to write an OS for those systems.


  • The other commenters correctly opined that encryption at rest should mean you could avoid encryption in memory.

    But I wanted to expand on this:

    I really don’t see a way around this, to make the string searchable the hashing needs to be predictable.

    I mean, there are probabilistic data structures, where something like a Bloom filter will produce one of two answers: definitely not in the set, or possibly in the set. In the context of search tokens, if you had a Bloom filter, you could quickly assess whether a message definitely does not contain a search keyword, or whether it might contain it.

    A suitably sized Bloom filter – possibly different lengths based on the associated message size – would provide search coverage for that message, at least until you have to actually access and decrypt the message to fully search it. But it’s certainly a valid technique to get a quick, cursory result.
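    A minimal sketch of the Bloom filter idea, with toy sizes and a cheap FNV-style hash (a real deployment would tune the bit count and number of hashes to the expected keyword set and acceptable false-positive rate):

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* k hash functions set k bits per inserted keyword; a lookup that
     * finds any of those bits clear means "definitely not present",
     * otherwise "possibly present". */
    #define BLOOM_BITS 1024
    #define BLOOM_K    3

    typedef struct { uint8_t bits[BLOOM_BITS / 8]; } bloom_t;

    /* FNV-1a with a per-hash seed: a common, cheap choice. */
    static uint32_t bloom_hash(const char *s, uint32_t seed) {
        uint32_t h = 2166136261u ^ seed;
        for (; *s; s++) { h ^= (uint8_t)*s; h *= 16777619u; }
        return h % BLOOM_BITS;
    }

    void bloom_add(bloom_t *b, const char *word) {
        for (uint32_t i = 0; i < BLOOM_K; i++) {
            uint32_t bit = bloom_hash(word, i);
            b->bits[bit / 8] |= (uint8_t)(1u << (bit % 8));
        }
    }

    bool bloom_maybe_contains(const bloom_t *b, const char *word) {
        for (uint32_t i = 0; i < BLOOM_K; i++) {
            uint32_t bit = bloom_hash(word, i);
            if (!(b->bits[bit / 8] & (1u << (bit % 8))))
                return false;        /* definitely not in the set */
        }
        return true;                 /* possibly in the set */
    }
    ```

    Stored alongside each encrypted message, a filter like this lets search skip most messages without decrypting them; only “possibly present” messages need the full decrypt-and-scan.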

    Though I think perhaps just having the messages in memory unencrypted would be easier, so long as that’s not part of the attack surface.


  • I have a Ubiquiti EdgeRouter (old, and I’m looking into replacing it with a FreeBSD box) and I have a similar issue where the router – or maybe the ISP? – misses a DHCP renewal, resulting in the wholesale loss of connectivity. It’s even more annoying because the ISP simultaneously rejects follow-up DHCP requests, on the theory that if the renewal was missed, the device cannot possibly exist anymore, at least for a few minutes.

    Since this router takes 12 minutes to manually reboot, that’s usually enough time for the ISP to clear their cache and everything comes back up properly. But it’s terribly annoying, hence why I’m looking to finally replace this router.


  • Upvoting because the FAQ genuinely is worthwhile to read, and answers the question I had in mind:

    7.9 Why not just use a subset of HTTP and HTML?

    I don’t agree with their answer, though. If the rough, overall Gemini experience:

    is roughly equivalent to HTTP where the only request method is “GET”, the only request header is “Host” and the only response header is “Content-type”, plus HTML where the only tags are <p>, <pre>, <a>, <h1> through <h3>, <ul> and <li> and <blockquote>

    then it stands to reason – per https://xkcd.com/927/ – to do exactly that, rather than devise new protocol, client, and server software. Some of their points have few or no legs to stand on.

    The problem is that deciding upon a strictly limited subset of HTTP and HTML, slapping a label on it and calling it a day would do almost nothing to create a clearly demarcated space where people can go to consume only that kind of content in only that kind of way.

    Initially, my reply was going to make a comparison to the impossibility of judging a book by its cover, since that’s what users already do when faced with visiting a sketchy looking URL. But I actually think their assertion is a strawman, because no one has suggested that we should immediately stop right after such a protocol has been decided. Very clearly, the Gemini project also has client software, to go with their protocol.

    But the challenge of identifying a space is, quite frankly, still a problem with no general solution. Yes, sure, here on the Fediverse, we also have the ActivityPub protocol which necessarily constrains what interactions can exist, in the same way that ATProto also constrains what can exist. But even the most set-in-stone protocol (eg DICT) can be used in new and interesting ways, so I find it deeply flawed that they believe they have categorically enumerated all possible ways to use the Gemini protocol. The implication is that users will never be surprised in future about what the protocol enables, and that just sounds ahistoric.

    It’s very tedious to verify that a website claiming to use only the subset actually does, as many of the features we want to avoid are invisible (but not harmless!) to the user.

    I’m failing to see how this pans out. Seeing as the web is predominantly client-side (barring server-side tracking of IP addresses, etc), it should be fairly obvious when a non-subset website is doing something that the subset protocol does not allow. Even if it’s a lie-in-wait function, why would subset-compliant client software honor it?

    When it becomes obvious that a website is not compliant with the subset, a well-behaved client should stop interacting with the website, because it has violated the protocol and cannot be trusted going forward. Add it to an internal list of do-not-connect and inform the user.

    It’s difficult or even impossible to deactivate support for all the unwanted features in mainstream browsers, so if somebody breaks the rules you’ll pay the consequences.

    And yet, Firefox forks are spawning left and right due to Mozilla’s AI ambitions.

    Ok, that’s a bit blithe, but I do recognize that the web engines within browsers are now incredibly complex. Even still though, the idea that we cannot extricate the unneeded sections of a rendering engine and leave behind the functionality needed to display a subset of HTML via HTTP, I just can’t accept that until someone shows why that is the case.

    Complexity begets complexity, whereas this would be an exercise in removing complexity. It should be easier than writing new code for a new protocol.

    Writing a dumbed down web browser which gracefully ignores all the unwanted features is much harder than writing a Gemini client from scratch.

    Once again, don’t do that! If a subset browser finds even one violation of the subset protocol, it should halt. That server is being malicious. Why would any client try to continue?

    The error handling of a privacy-respecting protocol that is a subset of HTML and HTTP would – in almost all cases – assume the server is malicious, and to disconnect. It is a betrayal of the highest order. There is no such thing as a “graceful” betrayal, so we don’t try to handle that situation.
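    A sketch of that halt-on-first-violation policy, using the tag subset quoted from the FAQ above (checking tag names only; a real client would validate far more, but the principle of refusing the whole document on the first violation is the same):

    ```c
    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    /* The HTML subset quoted from the Gemini FAQ. */
    static const char *allowed_tags[] = {
        "p", "pre", "a", "h1", "h2", "h3", "ul", "li", "blockquote",
    };

    static bool tag_allowed(const char *tag) {
        for (size_t i = 0; i < sizeof allowed_tags / sizeof *allowed_tags; i++)
            if (strcmp(tag, allowed_tags[i]) == 0)
                return true;
        return false;
    }

    /* Returns true only if every tag in the document is in the subset.
     * A strict client would disconnect and blocklist the server on
     * false, rather than degrade gracefully. */
    bool document_acceptable(const char *tags[], size_t n) {
        for (size_t i = 0; i < n; i++)
            if (!tag_allowed(tags[i]))
                return false;   /* halt: server violated the subset */
        return true;
    }
    ```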

    Even if you did it, you’d have a very difficult time discovering the minuscule fraction of websites it could render.

    Is this about using the subset browser to look at regular port-80 web servers? Or is this about content discovery? Only the latter has a semblance of logic behind it, but that too is an unsolved problem to this day.

    Famously, YouTube and Spotify are drivers of content discovery, based in part on algorithms that optimize for keeping users on those platforms. Whereas the Fediverse eschews centralized algorithms and instead just doesn’t have one. And in spite of that, people find communities. They find people, hashtags, images, and media. Is it probably slower than if an algorithm could find these for the user’s convenience? Yes, very likely.

    But that’s the rub: no one knows what they don’t know. They cannot discover what they don’t even imagine could exist. That remains the case, whether the Gemini protocol is there or not. So I’m still not seeing why this is a disadvantage against an HTTP/HTML subset.

    Alternative, simple-by-design protocols like Gopher and Gemini create alternative, simple-by-design spaces with obvious boundaries and hard restrictions.

    ActivityPub does the same, but is constructed atop HTTP, while being extensible enough to replace like-for-like any existing social media platform that exists today – and some we haven’t even thought of yet – while also creating hard and obvious boundaries which foment a unique community unlike any other social media platform.

    The assertion that only simple protocols can foster community spaces is belied by ActivityPub’s success; ActivityPub is not exactly a simple protocol either. And this does not address why stripping down HTML/HTTP wouldn’t also do the same.

    You can do all this with a client you wrote yourself, so you know you can trust it.

    I sure as heck do not trust the TFTP client I wrote at uni, and that didn’t even have an encryption layer. The idea that every user will write their own encryption layer to implement the mandatory encryption for Gemini protocol is farcical.

    It’s a very different, much more liberating and much more empowering experience than trying to carve out a tiny, invisible sub-sub-sub-sub-space of the web.

    So too would browsing a subset of HTML/HTTP using a browser that only implements that subset. We know this because if you’re reading this right now, you’re either viewing this comment through a web browser frontend for Lemmy, or using an ActivityPub client of some description. And it is liberating! Here we all are, in this sub-sub-sub-sub space of the Internet, hanging out and commenting about protocols and design.

    But that doesn’t mean we can’t adapt already-proven, well-defined protocols into a subset that matches an earlier vision of the internet, while achieving the same.


  • I’m going off what I remember from a decade ago when working on embedded CPUs that have an Ethernet interface. IIRC, the activity LED – whether a separate LED than the link LED, or combined as a single LED – is typically wired to the PHY (the chip which converts analog signals on the wire/fibre into logical bits), as part of its transceiver functions. But some transceivers use a mechanism separate from the typical interface (eg SGMII) to the MAC (the chip which understands Ethernet frames; may be integrated into the PHY, or integrated into the CPU SoC). That auxiliary interface would allow the MAC to dictate what the LED should indicate.

    In either case, there isn’t really a prescribed algorithm for what level of activity should warrant faster blinking, and certainly no de facto standard between switch and NIC manufacturers. But generally, there will be something like four different “speeds” of blinking, based on whatever criteria the designers chose to use.
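    To illustrate – and this mapping is purely made up, since as noted there’s no standard – one plausible scheme buckets the frame count from the last sampling window into four blink rates:

    ```c
    #include <stdint.h>

    /* Hypothetical blink tiers: the thresholds below are arbitrary,
     * chosen by whoever designs the switch/NIC firmware. */
    typedef enum { BLINK_SLOW, BLINK_MEDIUM, BLINK_FAST, BLINK_SOLID } blink_t;

    blink_t blink_rate(uint32_t frames_in_window) {
        if (frames_in_window == 0)    return BLINK_SLOW;   /* idle: link indication only */
        if (frames_in_window < 100)   return BLINK_MEDIUM; /* light traffic */
        if (frames_in_window < 10000) return BLINK_FAST;   /* busy */
        return BLINK_SOLID;  /* saturated: LED effectively stays on */
    }
    ```

    Whether this logic lives in the PHY’s transceiver block or is driven by the MAC over an auxiliary interface is exactly the hardware split described above.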


  • The full-blown solution would be to run your own recursive DNS server on your local network, block or redirect any other DNS server to your own, and possibly block all known DoH servers.

    This would solve the DNS leakage issue, since your recursive server would learn the authoritative NS for your domain, and so would contact that NS directly when processing any queries for any of your subdomains. This cuts out the possibility of any espionage by your ISP/Google/Quad9’s DNS servers, because they’re now uninvolved. That said, your ISP could still spy on the raw traffic to the authoritative NS, but from your experiment, they don’t seem to be doing that.

    Is a recursive DNS server at home a tad extreme? I used to think so, but nowadays people run Pi-hole and similar software, which can operate in recursive mode when paired with Unbound (a recursive DNS resolver).

    <minor nitpick>

    “It was DNS” typically means that name resolution failed or did not propagate per its specification. Whereas I’m of the opinion that if DNS is working as expected, then it’s hard to pin the blame on DNS. For example, forgetting to renew a domain is not a DNS problem. And setting a bad TTL or a bad record is not a DNS problem (but may be a problem with your DNS software). And so too do I think that DNS leakage is not a DNS problem, because the protocol itself is functioning as documented.

    It’s just that the operators of the upstream servers see dollar signs from selling their users’ data. Not DNS, but rather a capitalism problem, IMO.

    </minor nitpick>


  • I loaded True Nas onto the internal SSD and swapped out the HDD drive that came with it for a 10tb drive.

    Do I understand that you currently have a SATA SSD and a 10TB SATA HDD plugged into this machine?

    If so, it seems like a SATA power splitter that divides the power to the SSD would suffice, in spite of the computer store’s admonition. The reason for splitting power from the SSD is because an SSD draws much less power than spinning rust.

    Can it still go wrong? Yes, but that’s the inherent risk when pushing beyond the design criteria of what this machine was originally built for. That said, “going wrong” typically means “won’t turn on”, not “halt and catch fire”.