this post was submitted on 19 Oct 2023
86 points (100.0% liked)

Technology


Related:

Major cyber attack could cost the world $3.5 trillion - Power Grid, Internet Outage

The one database/file/zip to save humanity, what is it?

Show Lemmy the downloadable URL of a database or AI you know of, so we can keep local backup copies that improve the resilience and availability of human knowledge.

Given how corporatized AI has become, I think we could definitely use links to whatever comes closest to a fully usable, open-source, fully self-contained downloadable AI.

Starter Pack:

★ Lemmy List

Databases

AI

top 11 comments
[–] sir_reginald@lemmy.world 25 points 1 year ago* (last edited 1 year ago) (2 children)

This is too much catastrophism for my taste, but if I wanted to start archiving, I'd start by downloading Wikipedia, Library Genesis, and Project Gutenberg.

Videos are too heavy to archive with ease, and they probably hold much less actual knowledge.

[–] fubo@lemmy.world 7 points 1 year ago (2 children)

Humanity has been using writing for millennia. It's a proven technology. Photographs and video don't tend to last longer than the one institution or family that cares about them.

[–] fiat_lux@kbin.social 2 points 1 year ago

Mostly due to previous physical constraints, I would argue. Thankfully there are fewer chances your hard drive is going to decompose into vinegar while sitting in your cupboard, and even if it does, it's likely not the only copy.

They're also more limited for current data because they're harder to parse and convert into other usable formats, but thankfully that will get better over time too.

I still prefer text-first data for various reasons, but let's not dismiss the leagues of potential video has for communication and archival value, both intentional and unintentional.

[–] Taleya@aussie.zone 2 points 1 year ago

Plus writing dgaf if you get hit with a Carrington Event.

[–] fiat_lux@kbin.social 5 points 1 year ago

Perhaps think of it more as knowledge decentralization: a form of resiliency against unplanned network outages. Sometimes the Library of Alexandria just happens to catch fire, and it might be nobody's fault at all.

Besides, plenty of people grew up in families with a basic encyclopaedia or dictionary or a repair manual. This is essentially the same thing, just with less paper.

[–] elias_griffin@lemmy.world 6 points 1 year ago

I'm particularly looking for anyone who already has a collection of arXiv and Sci-Hub papers. Please curate your collection and make it available here!

We also need a brief, catchy hashtag/topic/keyword for this project that we can also use for GitHub searches, etc. Anyone?

[–] nix@merv.news 3 points 1 year ago (2 children)

Is it possible to download an archive of Sci-Hub?

[–] PeachMan@lemmy.world 9 points 1 year ago (2 children)

Sci-Hub is ENORMOUS, about 100TB. If you want to help preserve it, you can torrent and seed one of their many 100GB chunks.
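For a rough sense of scale, here is a minimal sizing sketch. It assumes the approximate figures above (about 100 TB total, 100 GB per chunk) and decimal units; the function names are made up for illustration, and the real archive layout may differ.

```python
# Rough sizing only: the 100 TB and 100 GB figures are the approximate
# numbers quoted in the comment above, and decimal units are assumed.
TB = 10**12
GB = 10**9


def chunk_count(archive_bytes: int, chunk_bytes: int) -> int:
    """Number of chunks needed to cover the archive (ceiling division)."""
    return -(-archive_bytes // chunk_bytes)


def chunks_for_budget(spare_bytes: int, chunk_bytes: int) -> int:
    """Whole chunks that fit in the disk space a volunteer can spare."""
    return spare_bytes // chunk_bytes


total_chunks = chunk_count(100 * TB, 100 * GB)

# Show what a volunteer with 1 TB or 2 TB of spare disk could seed.
for spare_tb in (1, 2):
    n = chunks_for_budget(spare_tb * TB, 100 * GB)
    print(f"{spare_tb} TB spare -> {n} chunks, {n / total_chunks:.0%} of the archive")
```

Under those assumptions the archive splits into roughly a thousand chunks, so each terabyte a volunteer can spare covers about one percent of it.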

[–] BolexForSoup@kbin.social 1 points 1 year ago

Super cool, never knew about this. I've got probably 1-2 TB I can spare for the effort.

[–] elias_griffin@lemmy.world 1 points 1 year ago

What a fantastic resource, this is exactly what is needed. I also found out about The Standard Template Construct Library:

"Learn about how to access large corpus of high-quality scholarly texts using Python and use them in AI apps"

Does anyone know if an LLM has been trained on something like Sci-Hub?