Ensuring perpetuation of non scene.org files
category: general [glöplog]
Not to be that guy, but I am that guy, so I might as well own up to it.
I am worried about non-mirrored demoscene items in pouet's links. You know, the ones where instead of being downloadable from a scene.org mirror, it's at supergroup.kl/~barhop/demoz/001.ZIP - I see the system notes when they're not accessible, but often there's not a mirror.
I offer to make at least a simple mirror hosted at Internet Archive for these files, if someone wants to work with me to host the collection. Probably only a tiny bit is lost, but knowing which files are, and then reaching out to find copies before they're scattered to the winds, would be a nice project.
I'm here or jscott@archive.org, either works.
this is a good msg as I have ensured ModArchive's future beyond my rather soonish EOF. You do great work Mr Scott, and my upcoming demise/tragedy is a precise example of a big scene site hypothetically suddenly going dark with thousands of links from pouet.
SceneSat is taking care of the current and (hopefully) future hosting as well as the organization and implementation of a new ModArchive version TMA Version 5, when that is complete, there could always be an extra backup...
Just for the heck of it, I mirrored Modarchive at textfiles.com: http://modarchive.textfiles.com/
hey jscott, please have a closer look at
https://pouet-mirror.sesse.net/
and
https://cardboard.pouet.net/broken_links.php?
with these tools in hand, and a bunch of googlin' and pre-emptive backups over several years, we managed to get deadlinks under 1% and keep that number there ever since. so of course extra backups are always appreciated, especially if they're publicly accessible, so any willing user can contribute to this ongoing conservation effort.
i'm not sure about working on hosting a collection at IA, first priority should imho be on working on hosting the collection on this very site. if we do that, and IA would be willing to host a mirror not just of the files, but also the metadata- that'd be awesome :)
First, that's a great endeavor here, and what I was hoping was being done.
A simple first step might be to make a .zip file of the contents of pouet-mirror, and I can put that on IA as a big drop-dead backup being held by the non-profit, with links to the current endeavors. Obviously it goes out of date from the second it goes up, but that's fine, it's a start.
With the deadlinks, hopefully someone will take the tool as a way to begin the hunt. I will do so as well. We have a tool for searching many of the CD-ROMs on archive.org, at https://discmaster.textfiles.com, which can help here and there for finding "lost" material.
for that zip with all files from the mirror, i would recommend you to speak to sesse. his address is on the mirror site i linked to, he answers emails, and is generally supportive of other people and entities who are serious about preservation- so i'm pretty d. sure something can be worked out :)
about deadlink hunting, once upon a time the number was close to 30%. after we got it down from there to under 1%, progress has been slow. every few months i used to go in and weed out the recently lost files that sesse's mirror had caught. i'll get back to that when i'm physically more fit, in a while i hope. until then, some unsung heroes who know who they are, are taking care of business in their own ways. but more is always better of course, different perspectives and levels of experience in the preservation field have almost always reinforced the overall effort.
discmaster- i tried to master it before but never really managed to do that. maybe i should give it another swing, again most likely this will happen after a bunch more recovery has taken place. i did find a couple dozen "lost" prods by strategic browsing of IA, like for example the gba/gbc files i grabbed from some brazilian gaming coverdisks :)
last, not sure if you were maybe already aware- we mentioned IA and even you personally being on both sides of the fence in our application document for demoscene to be added to the list of officially recognized cultural heritage in the netherlands. the researchers who helped us write the application obviously knew IA and liked that mention, which i thought was cool. our application was granted, and hopefully soon will be followed up by another application, this time at international level, together with the teams from other countries where demoscene is now recognized as cultural heritage. ultimately the goal for this combined effort would be to get our mutual hobby recognized by unesco. i read that the usa not so long ago decided to rejoin that organisation, so i urge you to please have a look at the AoC effort, and perhaps even join the discord where we discuss this mutual effort. the site for AoC is here: https://demoscene-the-art-of-coding.net/
FWIW, pouet-mirror.sesse.net is available by rsync, so it should be easy to mirror. (I know that at least one person already takes external backups of it.)
The hard part is, of course, crawling. It's easy to think you've mirrored something and then it was a 404 page served with 200, or a video player instead of the actual video.
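To illustrate the "404 page served with 200" problem, here is a minimal sketch of a sanity check a crawler could run on each response before counting a file as mirrored. The function name and the heuristics are assumptions for illustration, not how pouet-mirror actually works, and it only covers ZIP archives.

```python
# Hypothetical heuristic; the names and rules here are illustrative assumptions.
ZIP_MAGIC = (b"PK\x03\x04", b"PK\x05\x06")  # regular and empty ZIP archives

HTML_TYPES = ("text/html", "application/xhtml+xml")


def looks_like_real_zip(status: int, content_type: str, body: bytes) -> bool:
    """Return True only if a response plausibly contains an actual ZIP file."""
    if status != 200:
        return False
    # An HTML content type on a .zip URL usually means an error or landing page.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type in HTML_TYPES:
        return False
    # Check the file signature instead of trusting the status code alone.
    return body.startswith(ZIP_MAGIC)
```

A check like this would wrongly reject self-extracting archives and other formats, so in practice it would need one signature list per file type and probably a manual review queue for anything it rejects.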
Hey m0d. This is off topic, but it sounds like you've had a bad diagnosis or something? I was not aware... I'm hoping for a miracle, but in any event I'm hoping for the best possible outcome for you 🫂
@jscott: Just wanted to say thank you for bringing the DiscMaster search back, one year ago it was absolutely invaluable in finding old ModPlug Tracker / Player versions that have been lost until now... the earliest archived MPT release is now MPT 1.0 alpha 4, just one week newer than the still-missing alpha 1 which I'm still looking for.
I'm delighted to see the ease of RSYNC for mirroring these. I've gone ahead and started doing that for Gathering and the Pouet-Mirror.
I've decided not to take the 18tb of Chess Tables at this time. :)
Note that ftp.gathering.org contains a mirror of scene.org, so if you already have that, you may want to ignore the scene.org directory :-)
I'll refine but err on the side of both for now - but thanks for the tip.
OK, I just re-started my scene.org mirror (I try to dry-run and not automatically do deletes) and it's catching up, and you're correct, I already have a mirror going, so that might as well not be duped. Thanks for the tips!
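The "dry-run first, no automatic deletes" habit mentioned above can be sketched as a small helper that builds the rsync command line. The helper name and its defaults are assumptions for illustration; `-a`, `-v`, `--dry-run` and `--delete` are standard rsync options. The source and destination are left as parameters because the actual rsync paths are not given in this thread.

```python
def build_rsync_command(source: str, dest: str,
                        dry_run: bool = True,
                        delete: bool = False) -> list[str]:
    """Build an rsync invocation that is safe by default.

    Defaults to a dry run and never deletes local files unless asked,
    so a broken or emptied upstream cannot wipe the local mirror.
    """
    cmd = ["rsync", "-av"]
    if dry_run:
        cmd.append("--dry-run")  # only report what would be transferred
    if delete:
        cmd.append("--delete")   # opt-in: remove files that vanished upstream
    cmd += [source, dest]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the dry-run output has been reviewed and `dry_run=False` is set.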
What set me down this path this week was my adding a new range of emulated-in-browser console demos here, by the way:
https://archive.org/details/consoledemos
Yeah, ftp.gathering.org is 2.7TB, but the scene.org mirror is 2.4TB of it :-)
I really wish I started mirroring Pouët twenty years earlier. I largely had the means, I just didn't think of the problem. We could have saved so much material that is now irrevocably lost. (For that matter, Pouët could also just have taken an internal copy of whatever URL was added, even if it didn't serve it. But that also obviously didn't happen.)
to be fair though, having spent more time than any other individual on searching for missing files afaik, what has gone missing and hasn't been recovered yet at this moment is largely bottom of the barrel crap. i'm already working on a list of prods that i believe can and/or should be deleted. in some cases it's questionable if the prod is really a prod at all, or just an image grabbed from the internet with a link that has literally never worked to download anything of substance, in others it's "zero bytes hahaha" junk "prods", and there's a few more categories on that list which i personally wouldn't feel any remorse about when they'd be promoted to a proverbial eternal dustbin.
so that'd leave us with ~0.3-0.5% of "real lost prods". from what i understood from conversations with professionals from the archiving universe, that's ridiculously low. so let's not feel too bad about it ;)
Well, also include all the stuff that was never uploaded to Pouët to begin with :-) For instance, I know for a fact that a lot of the TG stuff has been lost (entire compos basically were never even uploaded some years), and while of course this was not AAA stuff, it was how a typical compo was at the time. If you care about watching good demos, of course not much has been lost, but if you want to keep the history alive, the story is a bit different.
Also, at some point, I did a re-crawl of pouet-mirror, and a surprising amount of files had changed contents (people tend to just update their archives silently). Which I also find interesting, although I don't really know what to do about it.
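As a rough sketch of how silently updated archives could be spotted between two crawls, one could keep a manifest of content hashes per URL and diff the manifests. The manifest layout (URL mapped to SHA-256 hex digest) and the function names are assumptions for illustration, not a description of the actual mirror.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hash file contents so that two crawls can be compared cheaply."""
    return hashlib.sha256(data).hexdigest()


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {url: digest} manifests from successive crawls."""
    return {
        # Same URL, different bytes: the archive was replaced in place.
        "changed": sorted(u for u in old.keys() & new.keys() if old[u] != new[u]),
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
    }
```

One possible use would be to store a changed file next to the earlier copy under its digest rather than overwriting it, which keeps both versions without having to decide up front which one matters.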
i appreciate your argument but personally i can't afford to care about non-uploaded stuff, if people think their stuff is too good, not good enough or otherwise doesn't belong here, that's their own problem and beyond the scope of my caring ability
as for changed files, i see it like partyversion vs finals, generally speaking i'll first check out the final if it exists, and if that works OK the partyversion is mostly irrelevant imho. but again, that's just my personal opinion, if others want to delve deeper, that's A-OK obviously
I'm not sure if that's really what's happened; most of this was probably never consciously not uploaded to Pouët, it just wasn't something people did on autopilot. Pouët hasn't always had the same status as it has today, and I can't fault people who entered a compo in 2005 (or even 1995, before Pouët even existed!) for nobody having uploaded their stuff if even the compo organizers didn't manage to keep track of it at the time. And given that a lot of demoscene productions have been made by people who were only tangentially related to the demoscene (this especially goes for music, graphics and wild/video entries from before the demoscene decided to go all-cocoon and only care about demoscene-only parties), it's not surprising that it never occurred to them to upload it themselves much later. If they even still had the prod at the time.
mind you, i wasn't trying to explain why any particular (group of) prod(s) hasn't been added to this site, i was just giving my personal take on stuff that isn't here. in my mind, the matters you're speaking about are more scene.org than pouet.net related. (which is not to say your argument is wrong or doesn't belong in this thread, it just falls outside the scope of my caring, is all i'm trying to say.)
Being someone who loves fixing links of broken prods and finding them if they are lost, like a paleontologist (the mod team can testify about that), over the years, I have some remarks about all this.
*First, I agree (on yet another rare occasion) with havoc that the broken links are a very low percentage compared to the entire number of prods that exist on pouet.
*Second, on any database there will be lost prods; archiving them offline would help us immensely. For instance, every time I download a prod, I then drop the .zip/.rar, etc. to a folder named "Archive" on my hard drive for future reference if I ever have to reupload it. Now, imagine if every demoscener did just that, how quickly all the missing prods would be found.
*Third, if or when a site is about to close it is better to move everything somewhere safe before taking everything down. I used to have an archive with lost demoscene prods, which I had to take down for a personal reason. I moved everything to scene.org beforehand so that it won't become lost.
*Fourth, you should never ever take any online resource for granted. Everything can be lost or taken down, so back up everything when you can / while you can.
That's all I can think of on the subject right now, hope I shed some light on prod finding/link fixing/archiving. :)
Quote:
tangentially related to the demoscene (this especially goes for music
Quote:
the matters you're speaking about are more scene.org
I don't think even scene.org cares much about tangentially related works. Lots of netlabel stuff that used to be there is missing.
absence, i can't speak for scene.org at all but have heard rumours through the grapevine that "missing" netlabel stuff for the largest part is/was caused by authors of those materials later on claiming copyrights on those works. so it's not really missing, the authors (or promoters) preferred to not have their works on scene.org. which is fair enough. but at the same time also undermines your validity when you claim stuff to be missing (no offense meant!).
Quote:
Lots of netlabel stuff that used to be there is missing.
what exactly are you referring to?
i have noticed one or other occasional release having been removed for copyright or whatever motive. but i wouldn't qualify it as "lots". but maybe i haven't been paying that much attention to the archive at scene.org lately
Quote:
what exactly are you referring to?
Upon reviewing, I'm happy to say that the situation is less bleak than I remembered (perhaps some of the missing data has been uploaded again). Still, the entire Noise catalogue is missing, as well as parts of the Tokyo Dawn Records catalogue.
Quote:
caused by authors of those materials later on claiming copyrights on those works
That's an interesting can of worms. One could argue that once something is released with a licence like this, sites are free to continue distribution regardless of what the authors think:
Quote:
NOISE songs are not freeware. You are allowed to copy the music module files without restrictions for non-commercial use, provided that all files contained in the original ZIP archives are included without modifications.
But depending on how insistent said authors (or authors' record companies' lawyers) are, one might have to make that argument in court, which is understandably something to avoid. There's also something to be said for adhering to an author's wishes, and I don't know the story behind the removals.
Quote:
undermines your validity when you claim stuff to be missing (no offense meant!).
Fair point, I didn't consider the possibility of legal headaches.