• sga@lemmings.world
    2 days ago

    What do you really mean by documents? Do you mean something like a Word document (docx or odt)? Those are just zip archives holding xml files with your text, plus your images.
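
    To make the "zip with xml inside" point concrete, here is a minimal sketch using only Python's standard `zipfile` module. The helper names and the toy archive are made up for illustration; the toy archive is not a valid docx, it only mimics the layout so the example runs without a real file.

```python
import io
import zipfile


def list_container_parts(data: bytes) -> list[str]:
    """Return the member names of a zip-based document (docx, odt, epub)."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.namelist()


def read_part(data: bytes, name: str) -> str:
    """Read one member (for example the main XML part) as UTF-8 text."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return archive.read(name).decode("utf-8")


def make_toy_container() -> bytes:
    """Build a tiny stand-in archive. NOT a valid docx, just the same idea."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("word/document.xml", "<doc><p>hello</p></doc>")
        archive.writestr("word/media/image1.png", b"\x89PNG placeholder")
    return buffer.getvalue()


if __name__ == "__main__":
    toy = make_toy_container()
    print(list_container_parts(toy))
    print(read_part(toy, "word/document.xml"))
```

    For a real file you would pass `open(path, "rb").read()` instead of the toy bytes, and any unzip tool should show you the same members.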

    Or pdf, which is kinda like PostScript with raw data embedded, where the streams are usually compressed with zlib? Or epub, which is html with images in a zip?

    Or something in a scanned format, which is just images (or djvu)?

    Is it plaintext? Then txt, tex (latex or something), md, typ (typst), or even html.
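
    If you are not sure which of these families a file belongs to, the first few bytes usually tell you. A rough sketch, assuming the common signatures (`%PDF-` for pdf, `PK\x03\x04` for zip-based containers, `AT&TFORM` for djvu); `sniff_format` is a hypothetical helper, and the plain-text check is only a heuristic.

```python
def sniff_format(data: bytes) -> str:
    """Guess the document family from leading bytes (heuristic, not exhaustive)."""
    if data.startswith(b"%PDF-"):
        return "pdf"
    if data.startswith(b"PK\x03\x04"):
        # docx, odt and epub all look the same at this level: a zip archive.
        return "zip"
    if data.startswith(b"AT&TFORM"):
        return "djvu"
    try:
        # Plain text heuristic: decodes as UTF-8 and contains no NUL bytes.
        # Pass the whole file; a truncated buffer may split a multi-byte character.
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    return "text" if "\x00" not in text else "unknown"


if __name__ == "__main__":
    print(sniff_format(b"%PDF-1.7\n"))
    print(sniff_format("# my notes\n".encode("utf-8")))
```

    It cannot tell docx from epub (both are zips), and it will happily call html or tex "text", which is kind of the point.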

    And what do you mean by long term storage? If by long term you mean that some opener still exists for the format after 50 years, arguably all of these work, but plain text requires the smallest software stack. At the end of the day they are all effectively the same thing, text and images. With plain text you can't keep the images and the rest in one file, and that is about it.

    Do you mean something which is bit rot resistant? Then basically all of these are bad, but plain text is the least bad: a flipped bit in plain text damages one character, while if you compress and bit rot happens, the likelihood of recovery is lower. That changes if your archive format has recovery goals (something like pdfa, and I think even docx has something like this). You can add bit rot resistance to plain text too: just create an archive and use something like par2 to generate recovery data for it.
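
    The compressed vs plain text point is easy to try yourself. This is a small sketch with Python's standard `zlib` and `hashlib`; the helper names are made up for illustration. It flips one bit in plain text and one bit in a zlib stream, then checks what survives. The sha256 digest at the end only detects damage, it does not repair anything the way par2 recovery data can.

```python
import hashlib
import zlib


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted."""
    damaged = bytearray(data)
    damaged[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(damaged)


def count_differing_bytes(a: bytes, b: bytes) -> int:
    """Count positions where two equal-length byte strings differ."""
    return sum(x != y for x, y in zip(a, b))


def survives_decompression(stream: bytes, original: bytes) -> bool:
    """True only if the zlib stream still decompresses to the original bytes."""
    try:
        return zlib.decompress(stream) == original
    except zlib.error:
        return False


def digest(data: bytes) -> str:
    """SHA-256 hex digest, useful for detecting (not repairing) bit rot."""
    return hashlib.sha256(data).hexdigest()


if __name__ == "__main__":
    text = "\n".join(f"line {i}: some notes worth keeping" for i in range(200)).encode("utf-8")
    packed = zlib.compress(text)

    damaged_text = flip_bit(text, 1000)
    damaged_packed = flip_bit(packed, len(packed) * 4)  # a bit near the middle

    print("plain text bytes damaged:", count_differing_bytes(text, damaged_text))
    print("compressed copy still readable:", survives_decompression(damaged_packed, text))
    print("digest changed:", digest(text) != digest(damaged_text))
```

    In the plain text copy exactly one byte is wrong and everything else is still readable. For the compressed copy, most single-bit flips make `zlib.decompress` fail outright or fail its checksum, though which positions do what depends on the data.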