cross-posted from: https://slrpnk.net/post/21031468
SSDs can only tolerate a certain number of writes to each block, and the number is low. I have a 64 GB SSD that went into a permanent read-only mode. 64 GB is still a very useful capacity today, so its usefulness was cut short by hardware design deficiencies.
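You can watch the wear accumulate via the drive’s own SMART counters before this happens. A rough illustration only — the device name is an example and the attribute names vary by vendor:
$ smartctl -A /dev/sda | grep -iE 'wear|percent|life'   # wear-leveling / lifetime-used attributes (names differ per vendor)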
Contrast that with magnetic hard drives, which often live beyond the usefulness of their capacity. That is, people toss out working 80 MB mechanical drives now because they’re too small to justify the physical space they occupy, not because of premature failure ending the device’s useful life.
Nannying
When an SSD crosses a line beyond which the manufacturer considers it unreliable, it goes into a read-only mode which (I believe) is locked with a password that is not disclosed to consumers. The read-only mode is reasonable as it protects users from data loss. But the problem is the nannying that denies “owners” ultimate control over their own property.
When I try to dd if=/dev/zero of=/dev/mydrive, dd is lied to and will write zeros all day and report success, but dd’s instructions are merely ignored and have no effect. The best fix in that scenario would generally be to tell the drive to erase itself using a special ATA command, like this:
$ hdparm --security-erase $'\0' /dev/sdb
security_password: ""

/dev/sdb:
 Issuing SECURITY_ERASE command, password="", user=user
SG_IO: bad/missing sense data, sb[]:  70 00 01 00 00 00 00 0a 00 00 00 00 00 1d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
SG_IO: bad/missing sense data, sb[]:  70 00 0b 00 00 00 00 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Not sure why my null char got converted to a yen symbol, but as you can see the ATA instruction is blocked.
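For reference, you can at least inspect the drive’s reported ATA security state before and after attempting this. /dev/sdb is just my example device and the output layout varies by drive:
$ hdparm -I /dev/sdb | grep -A 8 'Security:'   # shows the supported/enabled/locked/frozen flags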
Here is a take from someone who endorses the nannying. The problem is that there is a presumption about how the drive will be used. Give me a special switch like:
$ hdparm --security-erase $'\0' --I-know-what-I-am-doing-please-let-me-shoot-myself-in-the-foot /dev/sdb
and this is what I would do:
$ dd if=KNOPPIX_V8.2-2018-05-10-EN.iso of=/dev/foo
$ hdparm --make-read-only /dev/foo
When the drive crosses whatever arbitrary line of reliability, it’s of course perfectly reasonable to do one last write operation to control what content is used in read-only mode.
5 years later when a different live distro is needed, it would of course be reasonable to repeat the process. One write every ~5 years would at least keep the hardware somewhat useful in the long term.
I don’t think this is right to repair tbh. You can’t repair the SSD in this state? You’re not going to go in and repair any of the NAND modules. What you refer to as “nannying” is to stop what an overwhelming majority of end users would do. Which is to click “ignore” after not reading the message informing them the SSD is pre-failure and then losing all their data when the drive fails. A good majority of users aren’t anywhere near tech-savvy enough to understand what’s going on. The drive effectively being unusable prompts them to replace it while they still have the ability to remove data off of it.
I don’t think this is right to repair tbh. You can’t repair the SSD in this state?
“Repair” does not necessarily mean returning to a factory state. If a machine/appliance/device breaks and the OEM parts are no longer available, and you hack it to serve your purpose without restoring the original mint state, that’s still a repair. My bicycle is loaded with after-market parts and hacks in order to keep it in service. The fact that the parts function differently does not mean they cease to repair the bike.
In the case at hand, the drive is crippled. To uncripple it and recover some of its original functionality is to repair it.
Which is to click “ignore” after not reading the message informing them
That’s not how it works. Though it would be feasible for an OS creator to implement such a click-through hack, that’s on them. ATM it does not exist. It’s unlikely that OS suppliers would want that liability.
A good majority of users aren’t anywhere near tech-savvy enough to understand what’s going on.
Nannying those who do know what they’re doing is not a justified proposition when low-tech users can still be nannied nonetheless. Anyone who implements a one-click automatic dialog as you suggest would be at fault for low-tech users getting stung. Publishing an ATA password for hdparm users gives a sufficient hurdle for the tech-illiterate without nannying advanced users.
“Repair” does not necessarily mean returning to a factory state.
I didn’t claim as such and replacing a faulty or damaged module wouldn’t return it to factory condition. I wouldn’t consider “hacking” a drive to continue using it when you shouldn’t a repair. As far as I’m aware it’s to comply with JEDEC standards.
There’s now ambiguity between bits which, if this cell were allowed to remain active in an SSD, would mean that when you go to read a file on your drive there’s a chance that you won’t actually get the data you’re requesting. A good SSD should mark these bits bad at this point.
There’s a JEDEC spec that defines what should happen to the NAND once its cells get to this point. For consumer applications, the NAND should remain in a read-only state that can guarantee data availability for 12 months at 30C with the drive powered off.
I just don’t see how using a drive into the period where it’s likely to fail and lose data, against specification, is a good idea. Let alone a right to repair issue.
Source: https://www.anandtech.com/show/4902/intel-ssd-710-200gb-review/2
I didn’t claim as such
Luckily I quoted you, which shows that you have defined “repair” so narrowly as to exclude taking actions to restore a product and put it back into service.
and replacing a faulty or damaged module wouldn’t return it to factory condition.
I never said it would. But more importantly, this is a red herring. I don’t accept your claim that it wouldn’t, but it’s a moot point because this is not the sort of repair I would do and it’s not likely worthwhile. The anti-repair tactic that I condemn is the one that blocks owners from hacks that make the device more useful than the read-only state.
I wouldn’t consider “hacking” a drive to continue using it when you shouldn’t a repair.
(emphasis mine) This is the nannying I am calling out. If someone can make a degraded product useful again, it’s neither your place nor the manufacturer’s place to tell advanced users/repairers not to – to dictate what is appropriate.
As far as I’m aware it’s to comply with JEDEC standards.
It’s over-compliant. Also, we don’t give a shit about JEDEC standards after the drive is garbage. The standards are only useful during the useful life of the product. From your own source:
In the consumer space you need that time to presumably transfer your data over.
I need a couple weeks tops to transfer my data. It’s good that we get a year. Then what? The drive is as useful as a brick. And needlessly so.
I just don’t see how using a drive into the period where it’s likely to fail and lose data,
That’s because you’re not making the distinction between reading and writing, and understanding that it’s writing that fails. The fitness to write to a NAND declines gradually with each cycle. Every transistor is different. A transistor might last 11,943 cycles and it sits next to a transistor that lasts 10,392 cycles. They drew a line and said “10k writes is safe for this tech, so draw a line there and go into read-only mode when an arbitrary number of transistors have likely undergone 10k writes”.
The telemetry on the device is not sophisticated enough to track exactly when a transistor’s state becomes ambiguous. So the best they could do is keep an average cycle count which factors in a large safety margin for error. So of course it would be an insignificant risk to do 1 (or 5) more write cycles. Even if the straw that breaks the camel’s back is on the 1 additional write operation on a particular sector, we have software that is sophisticated enough to correct it. Have a look at par2.
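Roughly how that looks in practice, as a sketch (the file names here are just placeholders):
$ par2 create -r10 recovery.par2 backup.tar   # generate ~10% redundancy/recovery blocks
$ par2 verify recovery.par2                   # detect any corrupted blocks
$ par2 repair recovery.par2                   # reconstruct damaged data from the recovery blocks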
against specification,
It’s not “against” the spec because the spec does not specify how we may use the drive. Rightfully so. The spec says the drive must remain readable for 1 year after crossing a threshold (which BTW is determined by write cycle counts, not actual ability to store electrons).
is a good idea.
Bricking by design is a bad idea because preventable e-waste and consumerism are harmful to the environment. I write this post from a 2008 laptop that novice consumers would have declared useless 10 years ago.
Let alone a right to repair issue.
Of course it’s a right to repair issue because it’s a nannying anti-repair tactic that has prematurely forced a functional product into uselessness. I am being artificially blocked from returning the product into useful service.
Me providing an example of a repair is not me claiming it is the only method of repair.
If someone can make a degraded product useful again, it’s neither your place nor the manufacturer’s place to tell advanced users/repairers not to – to dictate what is appropriate.
Except, again, you aren’t making it useful again, you’re attempting to bypass a fail safe put in place by engineers. You aren’t repairing anything to make it useful again, you aren’t fixing any part of the SSD. You’re merely attempting to bypass a “lockout”. You aren’t arguing to repair the drive; you’re arguing to keep using it after this point (which is fine, even if I disagree with it).
That’s because you’re not making the distinction between reading and writing, and understanding that it’s writing that fails. The fitness to write to a NAND declines gradually with each cycle. Every transistor is different. A transistor might last 11,943 cycle…
The first paragraph quoted (and the article as a whole) covers reads, differences between drives (including different specs for enterprise vs consumer), and how the values are drawn. 10k is for Intel 50nm MLC NAND specifically. Other values are presented in the article. It isn’t arbitrary, despite your attempt to hand-wave it as such. I suggest you read it in its entirety. It doesn’t matter how sophisticated the software is; the oxide on the drive will eventually wear down, and that is a physical problem.
I am being artificially blocked from returning the product into useful service
Except it isn’t useful service. I would have a hard time buying that a pre-fail drive, even second hand, is useful for service. I get what you’re going for/saying but again it doesn’t pass for right to repair imo. It’s risking data loss to wring an extra 12 months (or likely, less) from a dying drive. For every 1 person like you that it’s an annoyance for, it saves multitudes more who are less savvy from pointlessly risking data loss.
Me providing an example of a repair is not me claiming it is the only method of repair.
Luckily I quoted you, which shows that you have defined “repair” so narrowly as to exclude taking actions to restore a product and put it back into service.
Except, again, you aren’t making it useful again,
Of course it’s useful again. To claim writing to a drive is not useful is to misunderstand how storage devices are useful.
you’re attempting to bypass a fail safe put in place by engineers.
No I’m not. The fail safe should remain. That much was well done by engineers and I would be outraged if it were not in place. I WANT my drive to go into read-only mode when it crosses a reliability threshold. The contention is what happens after the fail safe – after recovering the data. No one here believes the drive should not fail safe.
The first paragraph quoted (and the article as a whole) covers reads, differences between drives (including different specs for enterprise vs consumer), and how the values are drawn.
Yes I read that. And? It’s immaterial to the discussion whether it’s enterprise or consumer grade. Enterprise hardware still lands in the hands of consumers via second-hand markets.
10k is for Intel 50nm MLC NAND specifically. Other values are presented in the article.
And? Why do you think this is relevant to the nannying anti-repair discussion? It doesn’t invalidate anything I have said. It’s just a red herring.
It isn’t arbitrary, despite your attempt to hand-wave it as such.
Yes it is. Read your own source. They are counting write cycles to get an approximation of wear, not counting electrons that stick.
It doesn’t matter how sophisticated the software is; the oxide on the drive will eventually wear down, and that is a physical problem.
This supports what I have said. Extreme precision is not needed when we have software that gives redundancy to a user-specified extent and precisely detects errors.
it doesn’t pass for right to repair imo.
Denying owners control over their own property such that they cannot put it back into service is an assault on repair. To oppose the nannying is to advocate for a right to repair.
It’s risking data loss to wring an extra 12 months (or likely, less) from a dying drive.
You’re not grasping how the tech works. The 12 months refers to how long data remains readable while the drive is powered off. Again, you’re missing the separate roles of reading and writing here. I’m not going to explain it again. Read your own source again.
For every 1 person like you that it’s an annoyance for, it saves multitudes more who are less savvy from pointlessly risking data loss.
This is a false dichotomy. It’s possible to protect low-tech novices without preventing experts from retaining control over their own product. This false dichotomy stems from your erroneous belief that the fail safe precludes an ability to reverse the safety switch after it triggers.
Luckily I quoted you, which shows that you have defined “repair” so narrowly as to exclude taking actions to restore a product and put it back into service.
Yes, that would be a compelling point had I not, twice, told you that your interpretation of my quote is incorrect and gone on to clarify that it was an example. I think this makes your intentions clear enough that it isn’t worth continuing to waste time on. All I’ll say is I’m glad you have nothing to do with making the specifications for this sort of hardware and that it’s left to competent and educated engineers. Assault on repair, good lord lol.
your interpretation of my quote is incorrect
Your words, quoted here again as proof that you have defined “repair” so narrowly as to exclude taking actions to restore a product and put it back into service:
I wouldn’t consider “hacking” a drive to continue using it when you shouldn’t a repair.
What is your mother tongue that is so far from English?
All I’ll say is I’m glad you have nothing to do with making the specifications for this sort of hardware and that it’s left to competent and educated engineers.
You are really lost here. We actually agreed on the engineering decision (which was the decision to have a fail safe trigger). Again, the point of contention is the management decision to block property owners from control over their own property after they recover their data – the management decision that forces useful hardware to be needlessly committed to e-waste after the data has been migrated. It is because you think the profit-driven management decision of a private enterprise is “engineering” that you are profoundly incompetent for involvement in engineering specs. But you might be able to do marketing or management at a company like Microsoft. Shareholders would at least love your corporate boot-licking posture and your propaganda rhetoric in framing management decisions as “engineering”.
But plz, stay away from specs. Proper specs favor the consumers/users and community. They are not optimized to exploit consumers to enrich corporate suppliers and generate landfill.
Better to get SSD+HDD (an SSD boot drive for the OS, an HDD for everything else) when setting up your PC rather than a single SSD or multiple SSDs; it’s much cheaper, and your data doesn’t get lost when the boot drive dies.
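A minimal sketch of what that split could look like in /etc/fstab (device names and filesystems here are placeholders, not a prescription):
# /etc/fstab
/dev/nvme0n1p2  /      ext4  defaults  0  1   # small SSD: OS and applications
/dev/sda1       /home  ext4  defaults  0  2   # large HDD: user data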
That indeed makes sense from a purely practical PoV, if you neglect right to repair. But the drive maker (Apacer) is effectively denying users their right to repair through the nannying. Your approach is good for repair avoidance but still supports anti-repair suppliers in the end.
They would have to move away from SSD brands whose products are sourced from that particular maker.