The Linux fsync System Call Explained
Networking, 2023-02-09 13:14, www.1681989.com
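Description
Synchronizes all modified in-memory data of a file to the storage device.
Usage
#include <unistd.h>
int fsync(int fd);
Parameters
fd: the file descriptor.
Return value
Returns 0 on success. On failure, returns -1 and sets errno to one of the following values:
EBADF: fd is not a valid file descriptor
EIO: an error occurred during reading or writing
EROFS, EINVAL: the file system containing the file does not support synchronization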
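The usage above can be sketched as a small helper that writes a buffer and then calls fsync(), checking the return value and errno as described. This is a minimal sketch, not a definitive implementation: the function name write_durably and the file mode 0644 are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path and force them to storage.
 * Returns 0 on success, -1 on failure with errno set. */
int write_durably(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted: retry the write */
            int saved = errno;
            close(fd);
            errno = saved;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* write() only copied the data into the kernel cache;
     * fsync() returns once the data has been handed to the device. */
    if (fsync(fd) < 0) {
        int saved = errno;              /* e.g. EIO, EROFS, EINVAL */
        close(fd);
        errno = saved;
        return -1;
    }
    return close(fd);
}
```

A caller would typically treat any -1 result as "the data may not be on storage" and report errno.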
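Forcing the system cache out to disk: the sync and fsync functions, and how fflush and fsync relate and differ (2010-05-10 11:25)
Traditional UNIX implementations keep a buffer cache in the kernel, and most disk I/O passes through it. When data is written to a file, the kernel normally copies it into a buffer first. If the buffer is not yet full, it is not placed on the output queue; the kernel waits until the buffer fills up, or until it needs to reuse the buffer for another disk block, and only then queues it for output. The actual I/O takes place when the buffer reaches the head of the queue. This style of output is called delayed write (Bach [1986], Chapter 3, discusses delayed write in detail). Delayed write reduces the number of disk reads and writes, but it also slows down the updating of file contents: data written to a file may not reach the disk for some time. If the system fails, this delay can cause file updates to be lost. To keep the file system on disk consistent with the contents of the buffer cache, UNIX provides the two system calls sync and fsync.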
#include <unistd.h>
void sync(void);
int fsync(int filedes);
fsync returns 0 on success and -1 on error.
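sync merely queues all modified block buffers for writing and then returns; it does not wait for the actual I/O to finish. A system daemon (usually called update) typically calls sync every 30 seconds, which guarantees that the kernel's block buffers are flushed regularly. The sync(1) command also calls the sync function.
fsync refers to a single file only (the one specified by the file descriptor filedes) and waits for the I/O to complete before returning. It is intended for applications such as databases that need to be sure modified blocks have been written to disk. Compare fsync with the O_SYNC flag (see Section 3.13): fsync updates the file's contents when it is called, whereas with O_SYNC the file's contents are updated on every write to the file.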
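The difference between O_SYNC and fsync can be sketched with two small functions that append the same records. This is an illustrative sketch only: the function names, the append-only usage and the choice to treat a short write as an error are assumptions, not part of the original text.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Variant 1: O_SYNC. Every write() returns only after the data has been
 * written out, so count records cost count synchronous writes. */
int append_o_sync(const char *path, const void *rec, size_t len, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND | O_SYNC, 0644);
    if (fd < 0)
        return -1;
    for (int i = 0; i < count; i++) {
        if (write(fd, rec, len) != (ssize_t)len) {  /* short write = error here */
            close(fd);
            return -1;
        }
    }
    return close(fd);
}

/* Variant 2: ordinary cached writes, followed by a single fsync() that
 * waits for all of them at once. */
int append_fsync_once(const char *path, const void *rec, size_t len, int count)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0)
        return -1;
    for (int i = 0; i < count; i++) {
        if (write(fd, rec, len) != (ssize_t)len) {
            close(fd);
            return -1;
        }
    }
    if (fsync(fd) < 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```

Both variants leave the same bytes in the file; they differ in when the caller waits for the storage device.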
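How fflush and fsync relate and differ
[Reposted from] http://blog.chaunix./u2/73874/showart_1421917.html
1. Provider: fflush is a function provided by the C library (libc.a); fsync is a system call provided by the kernel.
2. Prototype: fflush takes a FILE * argument, fflush(FILE *); fsync takes an int file descriptor, fsync(int fd).
3. Function: fflush calls write to push the C library's buffer "to disk" (in fact, into the kernel's buffer); fsync flushes the kernel's buffer to disk.
C library buffer --fflush--> kernel buffer --fsync--> disk
An English article on the same topic follows.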
Write-back support
UBIFS supports write-back, which means that file changes do not go to the flash media straight away, but they are cached and go to the flash later, when it is absolutely necessary. This helps to greatly reduce the amount of I/O, which results in better performance. Write-back caching is a standard technique which is used by most file systems like ext3 or XFS.
In contrast, JFFS2 does not have write-back support and all the JFFS2 file system changes go to the flash synchronously. Well, this is not completely true and JFFS2 does have a small buffer of a NAND page size (if the underlying flash is NAND). This buffer contains the last written data and is flushed once it is full. However, because the amount of cached data is very small, JFFS2 is very close to a synchronous file system.
Write-back support requires the application programmers to take extra care about synchronizing important files in time. Otherwise the files may become corrupted or disappear in case of power cuts, which happen very often in many embedded devices. Let's take a glimpse at the Linux manual pages:
$ man 2 write
....
NOTES
A successful return from write() does not make any guarantee that data
has been committed to disk. In fact, on some buggy implementations, it
does not even guarantee that space has successfully been reserved for
the data. The only way to be sure is to call fsync(2) after you are
done writing all your data.
...
This is true for UBIFS (except for the "some buggy implementations" part, because UBIFS does reserve space for cached dirty data). This is also true for JFFS2, as well as for any other Linux file system.
However, some (perhaps not very good) user-space programmers do not take write-back into account. They do not read manual pages carefully. When such applications are used in embedded systems which run JFFS2, they work fine, because JFFS2 is almost synchronous. Of course, the applications are buggy, but they appear to work well enough with JFFS2. But the bugs show up when UBIFS is used instead. Please be careful and check/test your applications with respect to power-cut tolerance if you switch from JFFS2 to UBIFS. The following is a list of useful hints and advice.
If you want to switch to synchronous mode, use the -o sync option when mounting UBIFS; however, the file system performance will drop, so be careful. Also remember that UBIFS mounted in synchronous mode provides fewer guarantees than JFFS2; see the section "UBIFS synchronous mode vs JFFS2" for details.
Always keep in mind the above statement from the manual pages and run fsync() for all important files you change. Of course, there is no need to synchronize "throw-away" temporary files; just think about how important the file data is and decide. Do not use fsync() unnecessarily, because this will hurt performance.
If you want to be more accurate, you may use fdatasync(), in which case only data changes will be flushed, but not inode meta-data changes (e.g., "mtime" or permissions). This might be more efficient than using fsync() if the synchronization is done often, e.g., in a loop; otherwise just stick with fsync().
In shell, the sync command may be used, but it synchronizes the whole file system, which might not be optimal; there is a similar libc sync() function.
You may use the O_SYNC flag of the open() call; this will make sure all the data (but not meta-data) changes go to the media before the write() operation returns. In general, though, it is better to use fsync(), because O_SYNC makes each write synchronous, while fsync() allows you to accumulate many writes and synchronize them at once.
It is possible to make certain inodes synchronous by default by setting the "sync" inode flag; in a shell, the chattr +S command may be used; in C programs, use the FS_IOC_SETFLAGS ioctl command. Note that the mkfs.ubifs tool checks for the "sync" flag in the original FS tree, so the synchronous files in the original FS tree will be synchronous in the resulting UBIFS image.
Let us stress that the above items are true for any Linux file system, including JFFS2.
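The fdatasync() hint in the list above can be illustrated with a value that is overwritten in place many times. This is only a sketch under stated assumptions: the names store_counter and count_to and the 8-byte on-disk layout are hypothetical. Because the record is overwritten at a fixed offset, the later iterations change only file data and timestamps, which is the situation where fdatasync() may do less work than fsync().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Overwrite an 8-byte counter at offset 0 and flush the file data.
 * Meta-data such as mtime is not necessarily flushed by fdatasync(). */
int store_counter(int fd, uint64_t value)
{
    if (pwrite(fd, &value, sizeof value, 0) != (ssize_t)sizeof value)
        return -1;
    return fdatasync(fd);
}

/* Persist the values 1..n one after another, synchronizing in the loop.
 * Returns the value read back from the file, or 0 on any error. */
uint64_t count_to(const char *path, uint64_t n)
{
    int fd = open(path, O_RDWR | O_CREAT, 0644);
    if (fd < 0)
        return 0;

    for (uint64_t i = 1; i <= n; i++) {
        if (store_counter(fd, i) < 0) {
            close(fd);
            return 0;
        }
    }

    uint64_t last = 0;
    if (pread(fd, &last, sizeof last, 0) != (ssize_t)sizeof last)
        last = 0;
    close(fd);
    return last;
}
```

The very first store extends the file, so that call also has to make the new size durable; only the subsequent overwrites are pure data changes.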
fsync() may be called for directories - it synchronizes the directory inode meta-data. The "sync" flag may also be set for directories to make the directory inode synchronous. But the flag is inherited, which means all new children of this directory will also have this flag. New files and sub-directories of this directory will also be synchronous, and their children, and so forth. This feature is very useful if one needs to create a whole sub-tree of synchronous files and directories, or to make all new children of some directory synchronous by default (e.g., /etc).
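Setting the "sync" inode flag from C might look like the sketch below, which reads the current flags with FS_IOC_GETFLAGS and writes them back with FS_SYNC_FL added. The function name set_sync_flag is hypothetical, and whether the ioctl succeeds depends on the file system and on the caller's permissions, so the result should always be checked.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: set the "sync" inode flag on a file or directory,
 * similar in effect to "chattr +S path".
 * Returns 0 on success, -1 on failure with errno set. */
int set_sync_flag(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    int flags = 0;
    int rc = ioctl(fd, FS_IOC_GETFLAGS, &flags);   /* read current flags */
    if (rc == 0) {
        flags |= FS_SYNC_FL;                       /* add the sync flag */
        rc = ioctl(fd, FS_IOC_SETFLAGS, &flags);   /* write them back */
    }

    int saved = errno;      /* keep the ioctl error across close() */
    close(fd);
    errno = saved;
    return rc;
}
```

Applied to a directory, the flag is then inherited by new children as described above; file systems that do not implement these ioctls report an error instead.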
The fdatasync() call for directories is a "no-op" in UBIFS, and all UBIFS operations which change directory entries are synchronous. However, you should not assume this for portability (e.g., this is not true for ext2). Similarly, the "dirsync" inode flag has no effect in UBIFS.
The functions mentioned above work on file descriptors, not on streams (FILE *). To synchronize a stream, you should first get its file descriptor using the fileno() libc function, then flush the stream using fflush(), and then synchronize the file using fsync() or fdatasync(). You may use other synchronization methods, but remember to flush the stream before synchronizing the file. The fflush() function flushes the libc-level buffers, while sync(), fsync(), etc. flush kernel-level buffers.
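The stream procedure described above can be sketched as follows. The helper name sync_stream is an assumption; the calls themselves (fflush, fileno, fsync) are the ones named in the text.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Push a stdio stream all the way to storage.
 * Returns 0 on success, -1 on failure with errno set. */
int sync_stream(FILE *fp)
{
    int fd = fileno(fp);        /* descriptor behind the stream */
    if (fd < 0)
        return -1;
    if (fflush(fp) != 0)        /* libc buffer -> kernel buffers */
        return -1;
    return fsync(fd);           /* kernel buffers -> storage */
}
```

Calling fsync() without the preceding fflush() would leave any data still sitting in the libc buffer unsynchronized.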
Please refer to this FAQ entry for information about how to atomically update the contents of a file. Theodore Ts'o's article is also good reading.
Write-back knobs in Linux
Linux has several knobs in "/proc/sys/vm" which you may use to tune write-back. The knobs are global, so they affect all file systems. Please refer to the "Documentation/sysctl/vm.txt" file for more information. The file may be found in the Linux kernel source tree. Below are the interesting knobs, described in the UBIFS context and in a simplified form.
dirty_writeback_centisecs - how often the Linux periodic write-back thread wakes up and writes out dirty data. This is a mechanism which makes sure all dirty data hits the media at some point.
dirty_expire_centisecs - dirty data expire period. This is the maximum time data may stay dirty. After this period of time it will be written back by the Linux periodic write-back thread. IOW, the periodic write-back thread wakes up every "dirty_writeback_centisecs" centiseconds and synchronizes data which was dirtied "dirty_expire_centisecs" centiseconds ago.
dirty_background_ratio - maximum amount of dirty data in percent of total memory. When the amount of dirty data becomes larger, the periodic write-back thread starts synchronizing it until it becomes smaller. Even non-expired data will be synchronized. This may be used to set a "soft" limit for the amount of dirty data in the system.
dirty_ratio - maximum amount of dirty data at which writers will first synchronize the existing dirty data before adding more. IOW, this is a "hard" limit on the amount of dirty data in the system.
Note, UBIFS additionally has small write-buffers which are synchronized every 3-5 seconds. This means that most of the dirty data are delayed by dirty_expire_centisecs centiseconds, but the last few KiB are additionally delayed by 3-5 seconds.
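The knobs can be read from a C program as ordinary text files. The sketch below reads one knob and adds up a rough upper estimate of how long freshly written data may stay unsynchronized. Both function names are hypothetical, and the estimate (expire period plus one wake-up interval plus the write-buffer timer) is an assumption derived from the simplified description above, not a guarantee given by the kernel.

```c
#include <stddef.h>
#include <stdio.h>

/* Read an integer knob such as "dirty_expire_centisecs" from /proc/sys/vm.
 * Returns 0 on success, -1 if the knob cannot be read. */
int read_vm_knob(const char *name, long *value)
{
    char path[256];
    int n = snprintf(path, sizeof path, "/proc/sys/vm/%s", name);
    if (n < 0 || (size_t)n >= sizeof path)
        return -1;

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    int ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}

/* Rough estimate, in centiseconds: data expires after expire_cs, is picked
 * up at the next wake-up (at most writeback_cs later), and the tail may sit
 * in a UBIFS write-buffer for wbuf_timer_cs more (assumed 3-5 seconds). */
long rough_max_delay_cs(long expire_cs, long writeback_cs, long wbuf_timer_cs)
{
    return expire_cs + writeback_cs + wbuf_timer_cs;
}
```

For example, with an expire period of 3000 centiseconds, a wake-up interval of 500 and a 5-second write-buffer timer, the estimate is 4000 centiseconds, i.e. 40 seconds.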
UBIFS write-buffer
UBIFS is an asynchronous file system (see the "Write-back support" section for more information). Like other Linux file systems, it utilizes the page cache. The page cache is a generic Linux memory-management mechanism. It may be very large and cache a lot of data. When you write to a file, the data are written to the page cache, marked as dirty, and the write returns (unless the file is synchronous). Later the data are written back.
The write-buffer is an additional UBIFS buffer, which is implemented inside UBIFS, and it sits between the page cache and the flash. This means that write-back actually writes to the write-buffer, not directly to the flash.
The write-buffer is designed to speed up UBIFS on NAND flashes. NAND flashes consist of NAND pages, which are usually 512 bytes, 2KiB or 4KiB in size. A NAND page is the minimal read/write unit of NAND flash.
The write-buffer size is equal to the NAND page size (so it is tiny compared to the page cache). Its purpose is to accumulate small writes and to write full NAND pages instead of partially filled ones. Indeed, imagine we have to write four 512-byte nodes at half-second intervals, and the NAND page size is 2KiB. Without the write-buffer we would have to write 4 NAND pages and waste 6KiB of flash space, while the write-buffer allows us to write only once and waste nothing. This means we write less, we create less dirty space so the UBIFS garbage collector has less work to do, and we save power.
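The arithmetic of this example can be checked with a few lines of C. This is a simplified model, not UBIFS code: the struct and function names are hypothetical, and the buffered case assumes all nodes arrive before the write-buffer timer fires.

```c
#include <stddef.h>

struct flash_cost {
    size_t pages_written;   /* NAND pages programmed */
    size_t bytes_wasted;    /* padding inside those pages */
};

/* Without a write-buffer: every node is written on its own, so each one
 * occupies whole NAND pages and the rest of the last page is padding. */
struct flash_cost cost_unbuffered(size_t nodes, size_t node_size, size_t page_size)
{
    size_t pages_per_node = (node_size + page_size - 1) / page_size;
    struct flash_cost c;
    c.pages_written = nodes * pages_per_node;
    c.bytes_wasted = c.pages_written * page_size - nodes * node_size;
    return c;
}

/* With a write-buffer: nodes are accumulated and written as full pages;
 * only the final, partially filled page contains padding. */
struct flash_cost cost_buffered(size_t nodes, size_t node_size, size_t page_size)
{
    size_t total = nodes * node_size;
    struct flash_cost c;
    c.pages_written = (total + page_size - 1) / page_size;
    c.bytes_wasted = c.pages_written * page_size - total;
    return c;
}
```

For four 512-byte nodes and a 2048-byte page, the unbuffered case gives 4 pages and 4 * 2048 - 4 * 512 = 6144 bytes (6KiB) of waste, while the buffered case gives 1 page and no waste, matching the figures in the text.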
Well, the example shows an ideal situation, and even with the write-buffer we may waste space, for example in case of synchronous I/O, or if the data arrives at long time intervals. This is because the write-buffer has an associated timer, which flushes it every 3-5 seconds, even if it isn't full. We do this for data integrity reasons.
Of course, when UBIFS has to write a lot of data, it does not use the write-buffer. Only the last part of the data, which is smaller than the NAND page, ends up in the write-buffer and waits for more data, until it is flushed by the timer.
The write-buffer implementation is a little more complex, and we actually have several of them - one for each journal head. But this does not change the basic idea behind the write-buffer.
A few notes with regard to synchronization:
"sync()" also synchronizes all write-buffers;
"fsync(fd)" also synchronizes all write-buffers which contain pieces of "fd";
synchronous files, as well as files opened with "O_SYNC", bypass write-buffers, so the I/O is indeed synchronous for these files;
write-buffers are also bypassed if the file system is mounted with the "-o sync" mount option.
Take into account that write-buffers delay the data synchronization timeout defined by "dirty_expire_centisecs" (see the write-back knobs section above) by 3-5 seconds. However, since write-buffers are small, only a little data is delayed.
UBIFS synchronous mode vs JFFS2
When UBIFS is mounted in synchronous mode (-o sync mount option), all file system operations become synchronous. This means that all data are written to flash before the file system operations return.
For example, if you write 10MiB of data to a file f.dat using the write() call, and UBIFS is in synchronous mode, then UBIFS guarantees that all 10MiB of data and the meta-data (file size and date changes) will reach the flash media before write() returns. And if a power cut happens after the write() call returns, the file will contain the written data.
The same is true for situations when f.dat was opened with O_SYNC or has the sync flag (see man 1 chattr).
It is well known that the JFFS2 file system is synchronous (except for a small write-buffer). However, UBIFS in synchronous mode is not the same as JFFS2 and provides somewhat fewer guarantees than JFFS2 does with respect to sudden power cuts.
In JFFS2 all the meta-data (like inode atime/mtime/ctime, inode size, UID/GID, etc.) are stored in the data node headers. Data nodes carry 4KiB of (compressed) data. This means that the meta-data information is duplicated in many places, but it also means that every time JFFS2 writes a data node to the flash media, it updates the inode size as well. So when JFFS2 mounts, it scans the flash media, finds the latest data node, and fetches the inode size from there.
In practice this means that JFFS2 will write these 10MiB of data sequentially, from the beginning to the end. And if you have a power cut, you will just lose some amount of data at the end of the inode. For example, if JFFS2 starts writing those 10MiB of data, writes 5MiB, and a power cut happens, you will end up with a 5MiB f.dat file. You lose only the last 5MiB.
Things are a little bit more complex in case of UBIFS, where data are stored in data nodes and meta-data are stored in (separate) inode nodes. The meta-data are not duplicated in each data node, as in JFFS2. UBIFS never writes data nodes beyond the on-flash inode size. If it has to write a data node and the data node is beyond the on-flash inode size (the in-memory inode has an up-to-date size, but it is dirty and has not been flushed yet), then UBIFS first writes the inode to the media, and then it starts writing the data. And if you have an interruption, you lose data nodes and you have holes (or old data nodes, if you are overwriting). Let's consider an example.
The user creates an empty file f.dat. The file is synchronous, or UBIFS is mounted in synchronous mode. The user calls the write() function with a 10MiB buffer.
The kernel first copies all 10MiB of the data to the page cache. The inode size is changed to 10MiB as well and the inode is marked as dirty. Nothing has been written to the flash media so far. If a power cut happens at this point, the user will end up with an empty f.dat file.
UBIFS sees that the I/O has to be synchronous, and starts synchronizing the inode. First of all, it writes the inode node to the flash media. If a power cut happens at this moment, the user will end up with a 10MiB file which contains no data (a hole), and reading this file returns 10MiB of zeroes.
UBIFS starts writing the data. If a power cut happens at this point, the user will end up with a 10MiB file containing a hole at the end.
Note, if the I/O was not synchronous, UBIFS would skip the last step and would just return. The actual write-back would then happen in the background. But power cuts during write-back could still lead to files with holes at the end.
Thus, synchronous I/O in UBIFS provides fewer guarantees than JFFS2 I/O - UBIFS has the effect of holes at the end of files. In an ideal world applications should not assume anything about the contents of files which were not synchronized before a power cut happened. And "mainstream" file systems like ext3 do not provide JFFS2-like guarantees.
However, UBIFS is sometimes used as a JFFS2 replacement and people may want it to behave the same way as JFFS2 if it is mounted synchronously. This is doable, but needs some non-trivial development, so this has not been implemented so far. On the other hand, there was no strong demand. You may implement this as an exercise, or you may try to convince the UBIFS authors to do this.
Synchronization exceptions for buggy applications
As the "Write-back support" section describes, UBIFS is an asynchronous file system, and applications should synchronize their files whenever it is required. The same applies to most Linux file systems, e.g. XFS.
However, many applications ignore this and do not synchronize files properly. And there was a huge flame war between user-space and kernel developers related to the ext4 delayed allocation feature. Please see Theodore Ts'o's blog post. More information may be found in this LWN article.
In short, the flame war was about 2 cases. The first case was about the atomic rename, where many user-space programs did not synchronize the copy before renaming it. The second case was about applications which truncate files, then change them. There was no final agreement, but the "we cannot ignore the real world" argument found the ext4 developers' understanding, and there were 2 ext4 changes which help with both problems.
Roughly speaking, the first change made ext4 synchronize files on close if they were previously truncated. This was a hack from the file system point of view, but it "fixed" applications which truncate files, write new contents, and close the files without synchronizing them.
The second change made ext4 synchronize the renamed file.
Well, this is not an exactly correct description, because ext4 does not write the files synchronously, but actually initiates asynchronous write-out of the files, so the performance hit is not very high. For the truncation case this means that the file is synchronized soon after it is closed. For the rename case this means that ext4 writes data before it writes the rename meta-data.
However, application writers should never rely on these things, because this is not portable. Instead, they should properly synchronize files. The ext4 fixes were made because there were many broken user-space applications in the wild already.
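One common way for an application to "properly synchronize" in the atomic rename case is sketched below: write the new contents to a temporary file, fsync() it, rename it over the target, and then fsync() the containing directory. This is a minimal sketch under stated assumptions, not the method prescribed by the UBIFS or ext4 authors: the function name replace_file, the ".name.tmp" naming scheme and the fixed path buffer size are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define GR_PATH_CAP 4096   /* assumed upper bound for path length */

/* Close fd (if open), remove the temporary file, keep errno, return -1. */
static int fail_cleanup(int fd, const char *tmp)
{
    int saved = errno;
    if (fd >= 0)
        close(fd);
    unlink(tmp);
    errno = saved;
    return -1;
}

/* Replace dir/name with len bytes from buf so that, after a power cut,
 * readers see either the complete old or the complete new contents. */
int replace_file(const char *dir, const char *name, const void *buf, size_t len)
{
    char tmp[GR_PATH_CAP], dst[GR_PATH_CAP];
    int n1 = snprintf(tmp, sizeof tmp, "%s/.%s.tmp", dir, name);
    int n2 = snprintf(dst, sizeof dst, "%s/%s", dir, name);
    if (n1 < 0 || n2 < 0 || (size_t)n1 >= sizeof tmp || (size_t)n2 >= sizeof dst) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    const char *p = buf;
    while (len > 0) {                       /* 1. write the new contents */
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return fail_cleanup(fd, tmp);
        }
        p += n;
        len -= (size_t)n;
    }
    if (fsync(fd) < 0)                      /* 2. synchronize the copy */
        return fail_cleanup(fd, tmp);
    if (close(fd) < 0)
        return fail_cleanup(-1, tmp);

    if (rename(tmp, dst) < 0)               /* 3. atomically switch names */
        return fail_cleanup(-1, tmp);

    int dfd = open(dir, O_RDONLY | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    int rc = fsync(dfd);                    /* 4. synchronize the directory */
    int saved = errno;
    close(dfd);
    errno = saved;
    return rc;
}
```

The fsync() before rename() is the step the broken applications in the flame war left out; the final directory fsync() is there so that the rename itself is not left only in the cache.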
We have plans to implement these features in UBIFS, but this has not been done yet. The problem is that UBI/MTD are fully synchronous and we cannot initiate asynchronous write-out, so we'd have to synchronously write files on close/rename, which is slow. So implementing these features would require implementing asynchronous I/O in UBI, which is a big job. But feel free to do this :-).