On Wed, Aug 19, 1998 at 05:32:00PM -0600, Duane Wessels wrote:
> Sounds nice. How does the application decide the disk has failed?
> 10 write errors? 50% per 60 seconds?
When you get ENOSPC after a write, which should happen infrequently (or
else you have other problems to worry about), check which file the
descriptor is associated with; from that you can work out which
store the problem resides in.
You can then shrink that store (because one store might fill up
before the others) and adjust the total size appropriately, and flag
the error with a timestamp or something.
If another (say) 50 errors occur writing to that store within a
predetermined amount of time, take the store out of circulation (for
writes anyhow, might be harder for reads?).
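A rough sketch of that ENOSPC bookkeeping, to make the idea concrete. Everything named here (store_t, store_note_enospc, ERR_LIMIT, ERR_WINDOW) is made up for illustration and is not taken from the Squid source; the 50-error limit and 60-second window are placeholder values, and the sketch counts the first error towards the limit for simplicity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define ERR_LIMIT  50   /* assumed: errors tolerated per window */
#define ERR_WINDOW 60   /* assumed: window length in seconds */

/* Hypothetical per-store state; not Squid's real structure. */
typedef struct {
    size_t max_size;      /* configured capacity in bytes */
    size_t cur_size;      /* bytes currently in use */
    int    err_count;     /* ENOSPC errors seen in the current window */
    time_t window_start;  /* timestamp of the first error in the window */
    bool   writable;      /* false once the store is pulled for writes */
} store_t;

/*
 * Call when a write to store s fails with ENOSPC.
 * Shrinks the store to its current usage, reduces the overall total by
 * the same amount, and counts the error. Returns whether the store is
 * still eligible for writes.
 */
bool store_note_enospc(store_t *s, size_t *total_size, time_t now)
{
    /* The disk filled up before we reached max_size: shrink to fit. */
    if (s->cur_size < s->max_size) {
        size_t shrunk = s->max_size - s->cur_size;
        s->max_size = s->cur_size;
        *total_size -= shrunk;
    }

    /* Start a fresh window on the first error or after the old one expires. */
    if (s->err_count == 0 || difftime(now, s->window_start) > ERR_WINDOW) {
        s->window_start = now;
        s->err_count = 0;
    }

    s->err_count++;
    if (s->err_count >= ERR_LIMIT)
        s->writable = false;   /* too many errors: stop writing here */

    return s->writable;
}
```

The caller would map the failing descriptor back to its store first, then pass in the current time (for instance from time(NULL)) and a pointer to the running total across all stores.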
If writing to a store gives you EIO, immediately mark that store RO
and adjust as above. If reading gives an EIO, then we have to take it
out of circulation for reads too.
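The EIO rule can be sketched in a few lines. Again, the type and function names (store_state_t, io_dir_t, store_note_io_error) are invented for this example and are not Squid's API; only EIO itself comes from errno.h.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical direction tag for the failed operation. */
typedef enum { IO_READ, IO_WRITE } io_dir_t;

/* Hypothetical availability flags for one store. */
typedef struct {
    bool readable;
    bool writable;
} store_state_t;

/*
 * Apply the EIO policy: a failed write makes the store read-only;
 * a failed read takes it out of circulation for reads as well.
 * Errors other than EIO are left to other handlers.
 */
void store_note_io_error(store_state_t *s, io_dir_t dir, int err)
{
    if (err != EIO)
        return;

    s->writable = false;        /* any EIO: stop writing immediately */
    if (dir == IO_READ)
        s->readable = false;    /* EIO on read: stop reading too */
}
```

This leaves open the hard part, namely what to do with objects already indexed in a store that can no longer be read.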
None of this sounds hard, except maybe the case of reads?
-cw
Received on Tue Jul 29 2003 - 13:15:51 MDT