It would be a shame to lose the wealth of knowledge, and the easy-ish search, that subreddits like datahoarder provide if the subreddit is taken down or stays locked forever. Sure, it is currently accessible, but will it stay that way?
I know it is being archived, but the accessibility part is the problem.
There’s actually already a Reddit-to-Lemmy importer that lets you bring threads over, including comments: https://github.com/rileynull/RedditLemmyImporter
Yeah, that appears to be what I had in mind. Good find!
I have wondered: is there an easy way to search the Wayback Machine for archived Reddit data?
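One partial answer, offered as a sketch: the Wayback Machine exposes a CDX API (https://web.archive.org/cdx/search/cdx) that lists captured URLs matching a prefix. It does not do full-text search of page contents, so it only helps you find which subreddit or thread URLs were captured and when. The helper names below are hypothetical, and the code only builds the query URL and parses a JSON response; fetching it (with a browser, curl, or `urllib.request`) is left to you.

```python
import json
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url_prefix: str, limit: int = 50) -> str:
    """Build a CDX query listing captures whose URL starts with url_prefix."""
    params = {
        "url": url_prefix,
        "matchType": "prefix",  # match every URL under the prefix
        "output": "json",       # rows as JSON arrays; the first row is the header
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def parse_cdx_json(body: str) -> list:
    """Turn a CDX JSON response (header row + data rows) into a list of dicts."""
    if not body.strip():
        return []
    rows = json.loads(body)
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def snapshot_url(capture: dict) -> str:
    """Wayback replay URL for one capture: /web/<timestamp>/<original>."""
    return f"https://web.archive.org/web/{capture['timestamp']}/{capture['original']}"


if __name__ == "__main__":
    # Paste this URL into a browser or curl to list captured DataHoarder pages.
    print(cdx_query_url("reddit.com/r/DataHoarder/"))
```

Each parsed capture can then be opened with `snapshot_url(capture)` to view the archived page.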
And for comments that people back up to CSV with tools like Power Delete Suite, is there a nice way to display them other than Excel?
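As a minimal sketch: a few lines of standard-library Python can turn any CSV export into a single HTML page you can open in a browser. It makes no assumption about the exporter's column names; it just uses whatever header row the file has. Long, multi-line comments are kept readable with `white-space: pre-wrap`, and every cell is HTML-escaped so comment text cannot break the page. The function name is hypothetical.

```python
import csv
import html
import io


def csv_to_html(csv_text: str, title: str = "Comment backup") -> str:
    """Render a CSV (header row + data rows) as a standalone HTML table."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    head = (
        f"<html><head><meta charset='utf-8'><title>{html.escape(title)}</title>"
        "<style>td{white-space:pre-wrap;vertical-align:top;border:1px solid #ccc;padding:4px}"
        "table{border-collapse:collapse}</style></head><body>"
    )
    if not rows:
        return head + "<p>No data</p></body></html>"
    header, *body = rows
    parts = [head, "<table><tr>"]
    # Header cells come straight from the CSV's first row.
    parts += [f"<th>{html.escape(h)}</th>" for h in header]
    parts.append("</tr>")
    for row in body:
        # Escape every cell so comment text is shown literally.
        cells = "".join(f"<td>{html.escape(cell)}</td>" for cell in row)
        parts.append(f"<tr>{cells}</tr>")
    parts.append("</table></body></html>")
    return "".join(parts)
```

Usage, with a hypothetical file name: read `comments.csv` with `open("comments.csv", newline="", encoding="utf-8")`, pass its contents to `csv_to_html`, and write the result to `comments.html`.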
It could be done, but in my opinion that isn’t the best possible solution. What I was thinking of was having a bot migrate all the comments and posts here (or to another instance). The bot would post everything under its own name (instead of trying to create new users on Lemmy) and put the original username in each comment, like “Bread commented:” followed by their comment. That way we still know who said what.
If the bot maker had control of the instance, we might be able to put everything in chronological order by timestamp, so it would look as if the comments were all originally made here. The only indicator otherwise would be the bot’s name as the username. Search would then work on it just like it did on Reddit.
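The attribution and ordering steps of that bot idea could look roughly like this. It is a sketch under assumptions: the `ArchivedComment` fields are hypothetical (real dumps name them differently), and actually posting to Lemmy or backdating the timestamps in the instance's database is not shown. The body uses Markdown, which Lemmy renders.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ArchivedComment:
    # Hypothetical fields; map them from whatever the archive dump provides.
    author: str       # original Reddit username
    body: str         # original comment text (Markdown)
    created_utc: int  # Unix timestamp in seconds


def format_for_bot(c: ArchivedComment) -> str:
    """Body the bot would post: an attribution line, then the original text."""
    when = datetime.fromtimestamp(c.created_utc, tz=timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    return f"**{c.author}** commented ({when}):\n\n{c.body}"


def migration_order(comments: list) -> list:
    """Oldest first, so the re-posted thread reads in its original order."""
    return sorted(comments, key=lambda c: c.created_utc)
```

Posting oldest-first also has a practical upside: a reply can never predate its parent, so by the time the bot reaches a reply, the parent has already been re-posted and its new ID is known.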
I believe the best way to archive a forum-style website is on a forum where things have one-to-one equivalents.
As for moving DataHoarder to a new instance, that sure would make backups a lot nicer if a datahoarder ran it. I am surprised it isn’t on its own instance already, considering the topic. Same with self-hosted.
I love this idea. It raises some issues to think about, too. Like, who “owns” that data? Would Reddit file a lawsuit against the Lemmy instance arguing that the data belongs to Reddit? Does the data belong to the users who posted? What TOS do we agree to when signing up for a Reddit account? Are we giving them ownership of all content we post?
If you’re just interested in searching:
http://redarc.basedbin.org/search
/r/datahoarder is indexed and searchable
Good to know that exists, thanks! Although I still personally believe that having it on a forum like Lemmy is the best way to preserve it in its original format.
The-eye has a nice archive of Reddit: https://the-eye.eu/redarcs/