From: sct@dcs.ed.ac.uk (Stephen Tweedie)
Subject: Re: FDFLUSH [and fsync too]
Date: 18 Jan 1993 15:40:25 GMT

> joel@wam.umd.edu (Joel M. Hoffman) writes:
>> In article <ug8DXB5w165w@kf8nh.wariat.org> kf8nh@kf8nh.wariat.org (Brandon S.
>> >sct@dcs.ed.ac.uk (Stephen Tweedie) writes:
>> >> [re volume labels for floppies not used as file systems]
>> >> In particular, there are problems when you try to access the block
>> >> device itself, rather than going through the file system - for
>> >> example, when creating or checking a file system, or (especially for
>> >> floppies) tarring to/from the device. There is no way that a volume
>> >> label could be compatible with these different uses of the removable
>> >> media.
>> >
>> >Incorrect. The driver "fakes it", so offset 0 is actually just *past* the
>> >volume label. Since this is done at the driver level it is invisible to
>> >both mount and tar. (If Linux had *character* devices for the floppy then
>> >there would be potential problems.)
>>
>> But then you'll have problems with, e.g., rawritten disks. One of the
>> nice things about tar, dd, etc., is that any OS can write data, and
>> any OS can read it.
> Only if the first sector looks like a volume label (it of course would
> contain a CRC so it could be detected as a label), in which case the driver
> invisibly skips the label. Or you can follow my other suggestion and have
> /dev/fd0 be raw, while /dev/lfd0 requires a label. I *did* try to make
> clear that the labeled-disk driver was not a complete replacement for
> unlabeled disks, but an enhancement that would be invoked automatically if a
> labeled disk is in the drive.

Fine - my point was exactly this: if you are transferring data
between different machines then volume labels WILL cost you
compatibility. If tar disks created on my Linux box have volume
labels then they won't work on the Suns at work unless I have
additional user-mode programs working for me.

Your comments about multi-volume device drivers for /dev/fd0 are
interesting; I had thought about implementing something like this
myself. The reason I eventually decided to implement just FDFLUSH
was that such a device driver would be a MAJOR undertaking.
Given the ability to manage disk changes through system calls, most of
what you want can be done at the user level by, for example, piping
tar into an archive splitter. This sort of solution will only work
for stream data, however; there is no way for an lseek on a pipe to be
passed to the program at the other end of the pipe!
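
As a rough illustration of the user-level approach, here is a minimal
sketch in C of the core of an archive splitter. The name copy_volume
and its interface are hypothetical, not taken from any existing tool.
It copies at most one disk's worth of a stream (for example, tar's
standard output) and reports how much it moved; a driver loop would
call it once per disk, prompting for a disk change and issuing the
FDFLUSH ioctl on the floppy descriptor between calls. The exact
interface of the patch is not shown in this post, so that step is left
to the caller.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Copy at most `capacity` bytes from in_fd to out_fd.
 * Returns the number of bytes copied (0 once the input is exhausted),
 * or -1 on a read or write error.
 * Hypothetical helper: one call fills one volume.
 */
long copy_volume(int in_fd, int out_fd, size_t capacity)
{
    char buf[4096];
    size_t done = 0;

    while (done < capacity) {
        size_t want = capacity - done;
        if (want > sizeof buf)
            want = sizeof buf;

        ssize_t got = read(in_fd, buf, want);
        if (got < 0) {
            if (errno == EINTR)
                continue;           /* interrupted: retry the read */
            return -1;
        }
        if (got == 0)
            break;                  /* end of the input stream */

        /* write() may accept fewer bytes than asked for, so loop. */
        ssize_t off = 0;
        while (off < got) {
            ssize_t put = write(out_fd, buf + off, (size_t)(got - off));
            if (put < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += put;
        }
        done += (size_t)got;
    }
    return (long)done;
}
```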

In the meantime, is FDFLUSH worth inclusion in the kernel? I find it
extraordinarily useful for a ten-line patch.

Oh yes, and you were also surprised at Linux's lack of fsync(). Yup,
it's true. I have been looking at implementing this for a while now,
and it would almost certainly have to be done as an extra vfs entry
point. Given the existing file systems, fsync() could be added fairly
easily by walking through a file's data blocks looking for dirty
blocks, but this would be inefficient for large files because you
would have to load every indirect block (dirty or not) during the
search for dirty data.
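
The cost of the naive walk can be seen in a toy user-space model. This
is only a sketch: the structures, sizes and names below (toy_inode,
toy_fsync_walk and so on) are invented for illustration and are not
kernel code. The point it shows is that every indirect block is
visited, and so would have to be loaded, even when only one data block
at the end of the file is dirty.

```c
#include <stdbool.h>
#include <stddef.h>

#define NDIRECT      7   /* direct zones per inode (toy value) */
#define NINDIRECT    3   /* indirect blocks per inode (toy value) */
#define PER_INDIRECT 4   /* zones per indirect block (toy value) */

struct toy_block    { bool dirty; };
struct toy_indirect { struct toy_block *zone[PER_INDIRECT]; };

struct toy_inode {
    struct toy_block    *direct[NDIRECT];
    struct toy_indirect *indirect[NINDIRECT];
};

struct walk_stats {
    size_t indirect_loads;  /* indirect blocks read just to look inside */
    size_t flushed;         /* dirty data blocks written out */
};

static void flush_if_dirty(struct toy_block *b, struct walk_stats *st)
{
    if (b != NULL && b->dirty) {
        b->dirty = false;   /* stands in for writing the block to disk */
        st->flushed++;
    }
}

/* Naive fsync: visit every zone of the file, dirty or not. */
struct walk_stats toy_fsync_walk(struct toy_inode *ino)
{
    struct walk_stats st = {0, 0};

    for (size_t i = 0; i < NDIRECT; i++)
        flush_if_dirty(ino->direct[i], &st);

    for (size_t i = 0; i < NINDIRECT; i++) {
        struct toy_indirect *ind = ino->indirect[i];
        if (ind == NULL)
            continue;
        st.indirect_loads++;    /* paid even when every child is clean */
        for (size_t j = 0; j < PER_INDIRECT; j++)
            flush_if_dirty(ind->zone[j], &st);
    }
    return st;
}
```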

Alternative solutions I can think of would need extra complexity in
the kernel. For example, the buffer-head struct could be augmented
with an inode reference number (the device number is already stored
there); this would require passing the inode number to getblk()
whenever new blocks are requested, and would also require a way to
unmark the blocks on truncate() or unlink().
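
A toy model of the tagged buffer-head idea might look like the
following. Everything here is an assumption made for illustration: the
struct layout, the fixed-size cache, and the names toy_getblk,
toy_fsync_tagged and toy_untag_inode are hypothetical and do not match
the real kernel's getblk(). The sketch assumes that fsync would scan
the buffer cache for dirty buffers carrying the right inode number,
and that truncate() or unlink() would clear the tag.

```c
#include <stdbool.h>
#include <stddef.h>

#define NBUF     8      /* size of the toy buffer cache */
#define NO_INODE 0UL    /* tag value meaning "belongs to no inode" */

struct toy_buffer_head {
    bool          in_use;
    bool          dirty;
    unsigned      dev;
    unsigned long block;
    unsigned long ino;  /* the proposed extra field: owning inode */
};

static struct toy_buffer_head cache[NBUF];

/* Find (or claim) the buffer for dev/block and tag it with its inode. */
struct toy_buffer_head *toy_getblk(unsigned dev, unsigned long block,
                                   unsigned long ino)
{
    struct toy_buffer_head *free_slot = NULL;

    for (size_t i = 0; i < NBUF; i++) {
        struct toy_buffer_head *bh = &cache[i];
        if (bh->in_use && bh->dev == dev && bh->block == block) {
            bh->ino = ino;
            return bh;
        }
        if (!bh->in_use && free_slot == NULL)
            free_slot = bh;
    }
    if (free_slot == NULL)
        return NULL;    /* a real cache would evict; the toy gives up */

    *free_slot = (struct toy_buffer_head){ true, false, dev, block, ino };
    return free_slot;
}

/* fsync: flush only the dirty buffers tagged with this inode. */
size_t toy_fsync_tagged(unsigned dev, unsigned long ino)
{
    size_t flushed = 0;

    for (size_t i = 0; i < NBUF; i++) {
        struct toy_buffer_head *bh = &cache[i];
        if (bh->in_use && bh->dirty && bh->dev == dev && bh->ino == ino) {
            bh->dirty = false;  /* stands in for the disk write */
            flushed++;
        }
    }
    return flushed;
}

/* truncate()/unlink(): the blocks no longer belong to the inode. */
void toy_untag_inode(unsigned dev, unsigned long ino)
{
    for (size_t i = 0; i < NBUF; i++) {
        struct toy_buffer_head *bh = &cache[i];
        if (bh->in_use && bh->dev == dev && bh->ino == ino)
            bh->ino = NO_INODE;
    }
}
```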

Another option (more complex, but less memory-hungry than marking every
buffer-head) would be to maintain a cache (at the filing-system level)
of all indirect blocks which have dirty children. This makes fsyncing
an append to a large file much more efficient. The cache can be
flushed on sync(), and if the cache overflows, individual entries can
be retired by syncing the children of individual blocks. The cost is
that we need to be much more careful about race conditions involving
sync(), and acquiring/releasing blocks is at least as complex as in
the buffer-head solution.
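
The cache of indirect blocks with dirty children can also be sketched
as a toy model. The sizes, the single-file simplification, the
oldest-first overflow policy and all names (toy_mark_dirty,
toy_fsync_cached) are assumptions for illustration only, and the
sketch ignores the race conditions mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>

#define NIND         16  /* indirect blocks in the toy file */
#define PER_INDIRECT 4   /* data blocks under each indirect block */
#define CACHE_SLOTS  2   /* deliberately tiny so overflow is easy to hit */

static bool   child_dirty[NIND][PER_INDIRECT];
static size_t cached[CACHE_SLOTS];  /* indirect blocks with dirty children */
static size_t ncached;
static size_t blocks_written;       /* counts simulated disk writes */

/* Write out the dirty children of one indirect block. */
static void flush_children(size_t ind)
{
    for (size_t j = 0; j < PER_INDIRECT; j++) {
        if (child_dirty[ind][j]) {
            child_dirty[ind][j] = false;
            blocks_written++;
        }
    }
}

/* Called whenever a data block under indirect block `ind` is dirtied.
 * Assumes ind < NIND and child < PER_INDIRECT. */
void toy_mark_dirty(size_t ind, size_t child)
{
    child_dirty[ind][child] = true;

    for (size_t i = 0; i < ncached; i++)
        if (cached[i] == ind)
            return;                 /* already tracked */

    if (ncached == CACHE_SLOTS) {   /* overflow: retire the oldest entry */
        flush_children(cached[0]);
        for (size_t i = 1; i < ncached; i++)
            cached[i - 1] = cached[i];
        ncached--;
    }
    cached[ncached++] = ind;
}

/* fsync (or sync): visit only the tracked indirect blocks.
 * Returns how many indirect blocks had to be looked at. */
size_t toy_fsync_cached(void)
{
    size_t visited = ncached;

    for (size_t i = 0; i < ncached; i++)
        flush_children(cached[i]);
    ncached = 0;
    return visited;
}
```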

I think that the best way would be to make an initial implementation
using the naive zone-walk solution, and to see whether any more
complex solution is worth the extra effort. I may get around to it
after the next stable defragmenter release.

By the way, what should the behaviour of fsync() be on character
devices? I'm tempted just to return ENOSYS, but it might be suitable
to block until the device's output buffer is empty.
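
For comparison, the "block until the output buffer is empty" behaviour
already exists at user level for terminals in the form of POSIX
tcdrain(). The sketch below is a user-space analogue, not a proposal
for the kernel entry point; the helper name flush_fd is hypothetical,
and treating terminals as the only drainable character devices is an
assumption.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for the POSIX declarations */
#include <sys/stat.h>
#include <termios.h>
#include <unistd.h>

/*
 * Flush one descriptor.
 * Returns 0 on success, -1 with errno set on failure.
 */
int flush_fd(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;

    /* Character device that is a terminal: wait for output to drain. */
    if (S_ISCHR(st.st_mode) && isatty(fd))
        return tcdrain(fd);

    /* Everything else: let fsync() succeed or report its own error. */
    return fsync(fd);
}
```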