this post was submitted on 18 Nov 2023
Data Hoarder

What's the archive type with the best compression (smallest size after archiving) that also supports partial extraction (extracting specific files)?

top 8 comments
[–] CorvusRidiculissimus@alien.top 1 points 10 months ago (1 children)

Depends on your data, but there are two major contenders for that title: 7z (with solid mode off) and zpaq. You will probably get slightly better compression on zpaq, but it's not widely known.

[–] gasterblastsky@alien.top 1 points 10 months ago

I tried zpaq, but it told me that the archive type did not support partial extraction.

[–] ghjones@beehaw.org 1 points 10 months ago

It's been a while since I've looked, but you might consider pixz:

https://github.com/vasi/pixz

[–] MemeLordAscendant@alien.top 1 points 10 months ago

It depends on the dataset. I would suggest 7z and simply unchecking "solid archive". There is info here on running a test to find the best compression: Link
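The linked test isn't reproduced here, but a rough sketch of that kind of test could look like the following. It only covers the codecs in Python's standard library (zlib, bz2, lzma), not 7z or zpaq themselves, and the `compare_codecs` helper and the sample data are made up for illustration.

```python
import bz2
import lzma
import zlib


def compare_codecs(data: bytes) -> dict[str, int]:
    """Return the compressed size in bytes for each stdlib codec."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


# Hypothetical sample; replace it with bytes read from your own files.
sample = b"The quick brown fox jumps over the lazy dog.\n" * 2000
sizes = compare_codecs(sample)

# Print the codecs from smallest to largest output.
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name}: {size} bytes ({size / len(sample):.2%} of original)")
```

Running it on a representative slice of your own data is the point; the ranking on this toy sample says little about photos or video.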

You may want to look into filesystem compression, as it will be much easier to implement and may suit your needs.

[–] dr100@alien.top 1 points 10 months ago (1 children)

That is kind of inconsequential as you can always compress the files individually if you wish and then make a tar with all of them together.
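A minimal sketch of that idea, assuming xz (via Python's `lzma` module) as the per-file compressor; any other codec would work the same way. The helper names `pack_individually` and `extract_one` and the temporary sample files are hypothetical.

```python
import io
import lzma
import tarfile
import tempfile
from pathlib import Path


def pack_individually(paths: list[Path], archive: Path) -> None:
    """Compress each file on its own, then store the results in a plain tar."""
    with tarfile.open(archive, "w") as tar:  # "w" = no archive-level compression
        for path in paths:
            blob = lzma.compress(path.read_bytes())
            # Only the base name is kept, so directory structure is dropped here.
            info = tarfile.TarInfo(name=path.name + ".xz")
            info.size = len(blob)
            tar.addfile(info, io.BytesIO(blob))


def extract_one(archive: Path, member: str) -> bytes:
    """Pull out and decompress a single member without unpacking the others."""
    with tarfile.open(archive, "r") as tar:
        handle = tar.extractfile(member)
        if handle is None:  # member exists but is not a regular file
            raise KeyError(member)
        return lzma.decompress(handle.read())


# Small made-up files in a temporary directory, just to exercise the helpers.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_bytes(b"alpha " * 100)
    (root / "b.txt").write_bytes(b"beta " * 100)
    archive_path = root / "bundle.tar"
    pack_individually([root / "a.txt", root / "b.txt"], archive_path)
    restored = extract_one(archive_path, "b.txt.xz")
```

Because each member is compressed independently, getting one file back only needs that member's bytes.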

The question is what files you have; based on that, various algorithms will do better or worse. And of course, not using solid archives adds a penalty with most algorithms if the files are similar to each other.
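The solid vs. non-solid penalty can be illustrated with a small simulation. This is not 7z itself: "solid" is approximated here by concatenating the files and compressing them as one xz stream, and the data set (ten files sharing one random 20 kB block) is invented to make the similarity obvious.

```python
import lzma
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical data set: ten "files" that share one 20 kB block plus a short unique tail.
shared = rng.getrandbits(8 * 20_000).to_bytes(20_000, "big")
files = [shared + f"file {i}".encode() for i in range(10)]

# Non-solid: every file is compressed on its own, so the shared block is stored ten times.
separate_total = sum(len(lzma.compress(f)) for f in files)

# Solid (simulated): one stream, so repeats across files can be matched.
solid_total = len(lzma.compress(b"".join(files)))

print(f"separate: {separate_total} bytes, solid: {solid_total} bytes")
```

With files that have nothing in common, the two totals would end up much closer together.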

[–] gasterblastsky@alien.top 1 points 10 months ago (2 children)

Images and videos, mostly jpg, png, mp4, and webm.

[–] ghjones@beehaw.org 1 points 10 months ago

These formats are generally not very compressible by general purpose compression algorithms, as they are already compressed formats themselves, each with a compression algorithm specially tailored to their content type.

[–] Carnildo@alien.top 1 points 10 months ago

Tar.

Mostly not joking here -- the image and video formats you list are already heavily compressed. You'll be lucky to get even 1% compression from any format, so you might as well just package them up in an uncompressed archive format.
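A rough way to check this yourself, sketched with Python's `tarfile` and `lzma` modules. Seeded random bytes stand in for already-compressed media here, which is an assumption; real jpg or mp4 files will behave a bit differently. The file names and sizes are made up.

```python
import io
import lzma
import random
import tarfile

rng = random.Random(0)  # fixed seed so the run is repeatable


def fake_media(size: int) -> bytes:
    """Random bytes as a stand-in for already-compressed media (an assumption)."""
    return rng.getrandbits(8 * size).to_bytes(size, "big")


# Hypothetical file names and sizes, for illustration only.
payloads = {"photo.jpg": fake_media(200_000), "clip.mp4": fake_media(200_000)}

buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:  # "w" = uncompressed tar
    for name, data in payloads.items():
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

tar_bytes = buffer.getvalue()
xz_bytes = lzma.compress(tar_bytes)  # same archive with xz applied on top
saving = 1 - len(xz_bytes) / len(tar_bytes)
print(f"tar: {len(tar_bytes)} bytes, tar.xz: {len(xz_bytes)} bytes, saving: {saving:.1%}")

# Partial extraction from the plain tar: read back a single member by name.
with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
    clip = tar.extractfile("clip.mp4").read()
```

In this sketch, whatever small saving shows up comes largely from tar's own zero padding rather than from the payload, and the plain tar still lets you pull out one file by name.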