Re: [NTLK] Experiencing Serious ATA 1.0b8-D Data Corruption (I can reproduce it).

From: Adam Warner (lists_at_consulting.net.nz)
Date: Sat Oct 20 2001 - 23:24:16 EDT


On Sun, 2001-10-21 at 05:07, Paul Guyot wrote:

>(just as you do with your package, which, although
> it's 1.3 MB, takes 2.2 MB in sectors if you add it on an empty store
> which is more than 2 MB).

OK, I understand the arithmetic behind the apparent space loss.
However, I have understood from you that the free space left in those
sectors can still be used by other packages (even large ones), right?

> But, when you want to delete it, the original sector is used, and
> freeing it means getting a free sector, copying to it and then
> freeing the original object. Copy sector is marked as non data to be
> not used (and is of course no longer free), but first sector isn't
> marked as free (or whatever the new used size of it is) until the
> transaction is over.
>
> So at this point, we needed something like 2 x 4400 free sectors
> while we only had 3600 left. And of course it failed.

OK. Understood. Deleting the package creates a copy before deleting it.
Even though the package is 1.3MB it uses up 2.2MB of free sectors. So
you'd need 4.4MB of total space to delete it. But the store is only 4MB.
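
To make the arithmetic concrete, here is a small Python sketch using the
sector counts quoted above. The 512-byte sector size and the function name
are my assumptions; neither is stated anywhere in this thread, and the model
ignores any overhead the journal itself may add.

```python
SECTOR_BYTES = 512  # assumption: typical ATA sector size, not stated in the thread


def delete_shortfall(package_sectors: int, free_sectors: int) -> int:
    """Return how many more free sectors a copy-before-free delete needs.

    Each sector of the package is copied to a free sector, and the original
    is only released when the transaction is over, so every copy has to fit
    in the sectors that are free before the delete starts.
    """
    return max(0, package_sectors - free_sectors)


package_sectors = 4400  # figure quoted in the thread
free_sectors = 3600     # figure quoted in the thread

on_card_mb = package_sectors * SECTOR_BYTES / 1e6      # roughly 2.2 MB
touched_mb = 2 * package_sectors * SECTOR_BYTES / 1e6  # roughly 4.4 to 4.5 MB
shortfall = delete_shortfall(package_sectors, free_sectors)

print(on_card_mb, touched_mb, shortfall)
```

Under these assumptions the delete is 800 sectors short, which matches the
failure described above.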
 
> Of course, I have to figure out why the transaction doesn't properly
> revert to the previous state

Indeed. The message about restoring the store to a clean state never
occurs.

> , but I have logged entirely the
> deletion of the object (it took more than 4 hours and the log is
> about 11 MB long, larger than the previous largest log I made, still
> when debugging ATA Support, 5.8 MB).

OUCH! Sorry I've been such a showstopper. Thanks for all the effort you
are putting in to fix this.

> Nevertheless, I also need to find a way to prevent this from happening.
> I don't know if I really can change the written size of sectors when
> doing transactions, it might lead to big problems. Let's say that I
> do, it's possible nevertheless that you won't be able to delete an
> object you succeeded in storing. I can also have plenty of problems if
> you do revert a transaction on a single object, which the system does
> in some particular cases, and so on.

If I understand these two conditions correctly:

* The storage method only causes fragmentation. Other programs are able
to use the free space in the sectors

* As the card becomes fragmented the journal can still store in the free
space...

Then my recommendation would be:

Intercept deletion on ATA stores. If free space remaining is less than
that required by the journal then stop the deletion from occurring.
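
A minimal sketch of that check, in Python for illustration only. The names
guarded_delete, DeletionRefused and the three parameters are hypothetical;
the real driver would have to supply its own free-sector count and its own
estimate of how many sectors the journal needs.

```python
class DeletionRefused(Exception):
    """Raised when a delete is refused because the journal would not fit."""


def guarded_delete(free_sectors, journal_sectors_needed, perform_delete):
    """Run perform_delete() only if the journal fits in the free sectors.

    All three parameters are hypothetical stand-ins for values and
    operations that the real store implementation would provide.
    """
    if free_sectors < journal_sectors_needed:
        raise DeletionRefused(
            f"need {journal_sectors_needed} free sectors, "
            f"only {free_sectors} available"
        )
    return perform_delete()


# Figures from this thread: about 4400 sectors to journal, 3600 free.
try:
    guarded_delete(3600, 4400, lambda: "deleted")
    outcome = "deleted"
except DeletionRefused as err:
    outcome = f"refused: {err}"
print(outcome)
```

The point of refusing up front is that nothing has been written yet, so
there is no half-finished transaction to revert.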

Anything more can be ignored for the ATA 1.0 release. [For a release
after ATA 1.0 you could investigate providing the option of disabling
the journal for the duration of the transaction].

I understand why the package cannot be deleted. The workarounds are:

* Use large flash memory stores. For example, use the full 32MB of a
32MB card. 32MB is now the low end of the Compact Flash market.

* Even with such large stores it is still theoretically possible to
upload a package that cannot be deleted. Such a package can still be
removed by reformatting the partition. Annoying, but not the end of the
world.

So long as you stop impossible deletion operations you will have
succeeded in fixing the data corruption.

> I also thought this morning before I found out the cause of your
> problem that I should reserve a pool of sectors for transaction when
> the store becomes full. But I surely can't reserve 4400 of them. I
> don't know how many to reserve, though, probably something depending
> on the size of the store.

OK the problem is that the card can be filled up and then it may be
impossible to delete anything, no matter how small the packages are.

This is a possible solution:

Let the user set a per-partition number of reserved sectors. Make the
default 100 sectors (allow a 4-5 digit selection box). Don't allow
packages to be stored on the reserved sectors but do allow the reserved
sectors to be used for deletion operations (and if this is not possible
then add a tick box to allow the reserved sectors to be disabled in the
preferences).

Cautious users may want to reserve a large number of sectors. Others may
want to pack as much on the card as possible.

So long as there are enough small packages stored on the card it should
be possible to delete enough of them in order to be able to delete the
larger packages.
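
Here is a toy Python model of the reserved-sector idea. Everything in it
(the Store class, a 10000-sector store, 100-sector small packages) is made
up for illustration, and it ignores fragmentation and journal overhead. It
only assumes what was described above: a delete needs as many free sectors
as the package occupies until the transaction is over.

```python
class Store:
    """Toy store: deleting a package needs a free copy of all its sectors."""

    def __init__(self, total_sectors, reserved_sectors=100):
        self.total = total_sectors
        self.reserved = reserved_sectors
        self.packages = {}  # package name -> sectors occupied

    @property
    def free(self):
        return self.total - sum(self.packages.values())

    def add(self, name, sectors):
        # New packages may not use the reserved sectors.
        if sectors > self.free - self.reserved:
            return False
        self.packages[name] = sectors
        return True

    def delete(self, name):
        # Deletion may use the reserve, but the copies must fit in free space.
        if self.packages[name] > self.free:
            return False
        del self.packages[name]
        return True


def fill(store):
    """Add one big package, then small ones until the store refuses."""
    store.add("big", 4400)
    count = 0
    while store.add(f"small{count}", 100):
        count += 1
    return count


with_reserve = Store(total_sectors=10000, reserved_sectors=100)
fill(with_reserve)
deleted_small = 0
while not with_reserve.delete("big"):
    with_reserve.delete(f"small{deleted_small}")
    deleted_small += 1

no_reserve = Store(total_sectors=10000, reserved_sectors=0)
fill(no_reserve)
stuck = not no_reserve.delete("small0")
print(deleted_small, stuck)
```

In this model the store with a reserve can free the big package after
deleting enough small ones, while the store without a reserve fills
completely and cannot delete anything. Note that if the big package took
more than half the store, no number of small deletions would help.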

> In the end, I start to wonder if (a) linear stores do transaction on
> commands i.e. store a list of operations and execute them when you
> commit and (b) if they only do transactions on meta data (i.e. where
> objects are).
>
> (a) and (b) are incompatible with my wish of doing the maximum for
> the safety of data. Indeed, if a sector cannot be written for any
> reason, the whole transaction will be cancelled with the current
> implementation. If I do store the journal of commands to execute, I
> can't do that.

I think (b) is fine. If a sector cannot be written for any reason and
the transaction is cancelled won't the original data still be available?

For example: You go to copy some data. Copy operation fails. State is
restored to before the copy operation. Original data intact.

Moving a file: Move operation fails. Original filename is restored? Or
would you lose the data?

You go to delete some data: Delete fails. Data cannot be restored and
the file is deleted. But who cares? You wanted to delete the file
anyway.

I don't understand why (b) is not acceptable to you. Journalling
filesystems such as ReiserFS, which I am using now, typically implement
(b). Otherwise there is a huge performance hit if all data is being
duplicated.
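
To illustrate what I mean by (b), here is a toy Python sketch of a store
that journals only the metadata (which name points at which block) and
never duplicates the data itself. The class and its methods are invented
for this example; it is not how ATA Support or ReiserFS is implemented.

```python
import contextlib


class MetadataJournalStore:
    """Toy store that journals the directory but not the data blocks."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes; written in place, not journalled
        self.directory = {}  # file name -> block id; this is what gets journalled
        self._next_id = 0

    @contextlib.contextmanager
    def transaction(self):
        saved = dict(self.directory)  # snapshot of the metadata only
        try:
            yield
        except OSError:
            self.directory = saved    # roll back names; data blocks are untouched
            raise

    def write(self, name, data):
        with self.transaction():
            self.blocks[self._next_id] = data
            self.directory[name] = self._next_id
            self._next_id += 1

    def copy(self, src, dst, fail=False):
        with self.transaction():
            self.directory[dst] = self._next_id
            if fail:  # simulate a sector that cannot be written
                raise OSError("sector could not be written")
            self.blocks[self._next_id] = self.blocks[self.directory[src]]
            self._next_id += 1

    def move(self, src, dst, fail=False):
        with self.transaction():
            self.directory[dst] = self.directory.pop(src)
            if fail:
                raise OSError("sector could not be written")


store = MetadataJournalStore()
store.write("a", b"payload")
for operation in (store.copy, store.move):
    try:
        operation("a", "b", fail=True)
    except OSError:
        pass
print(store.directory, store.blocks)
```

After both simulated failures the directory still holds only the original
name and the original data block is intact, which is the behaviour I
describe for the copy and move cases above.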

Regards,
Adam



