- Jun 15, 2009
-
-
Tao Ma authored
In ocfs2_add_branch, we use the rightmost rec of the leaf extent block to generate the e_cpos for the newly added branch. In the most case, it is OK but if the parent extent block's rightmost rec covers more clusters than the leaf does, it will cause kernel panic if we insert some clusters in it. The message is something like: (7445,1):ocfs2_insert_at_leaf:3775 ERROR: bug expression: le16_to_cpu(el->l_next_free_rec) >= le16_to_cpu(el->l_count) (7445,1):ocfs2_insert_at_leaf:3775 ERROR: inode 66053, depth 0, count 28, next free 28, rec.cpos 270, rec.clusters 1, insert.cpos 275, insert.clusters 1 [<fa7ad565>] ? ocfs2_do_insert_extent+0xb58/0xda0 [ocfs2] [<fa7b08f2>] ? ocfs2_insert_extent+0x5bd/0x6ba [ocfs2] [<fa7b1b8b>] ? ocfs2_add_clusters_in_btree+0x37f/0x564 [ocfs2] ... The panic can be easily reproduced by the following small test case (with bs=512, cs=4K, and I remove all the error handling so that it looks clear enough for reading). int main(int argc, char **argv) { int fd, i; char buf[5] = "test"; fd = open(argv[1], O_RDWR|O_CREAT); for (i = 0; i < 30; i++) { lseek(fd, 40960 * i, SEEK_SET); write(fd, buf, 5); } ftruncate(fd, 1146880); lseek(fd, 1126400, SEEK_SET); write(fd, buf, 5); close(fd); return 0; } The reason of the panic is that: the 30 writes and the ftruncate makes the file's extent list looks like: Tree Depth: 1 Count: 19 Next Free Rec: 1 ## Offset Clusters Block# 0 0 280 86183 SubAlloc Bit: 7 SubAlloc Slot: 0 Blknum: 86183 Next Leaf: 0 CRC32: 00000000 ECC: 0000 Tree Depth: 0 Count: 28 Next Free Rec: 28 ## Offset Clusters Block# Flags 0 0 1 143368 0x0 1 10 1 143376 0x0 ... 26 260 1 143576 0x0 27 270 1 143584 0x0 Now another write at 1126400(275 cluster) whiich will write at the gap between 271 and 280 will trigger ocfs2_add_branch, but the result after the function looks like: Tree Depth: 1 Count: 19 Next Free Rec: 2 ## Offset Clusters Block# 0 0 280 86183 1 271 0 143592 So the extent record is intersected and make the following operation bug out. This patch just try to remove the gap before we add the new branch, so that the root(branch) rightmost rec will cover the same right position. So in the above case, before adding branch the tree will be changed to Tree Depth: 1 Count: 19 Next Free Rec: 1 ## Offset Clusters Block# 0 0 271 86183 SubAlloc Bit: 7 SubAlloc Slot: 0 Blknum: 86183 Next Leaf: 0 CRC32: 00000000 ECC: 0000 Tree Depth: 0 Count: 28 Next Free Rec: 28 ## Offset Clusters Block# Flags 0 0 1 143368 0x0 1 10 1 143376 0x0 ... 26 260 1 143576 0x0 27 270 1 143584 0x0 And after branch add, the tree looks like Tree Depth: 1 Count: 19 Next Free Rec: 2 ## Offset Clusters Block# 0 0 271 86183 1 271 0 143592 Signed-off-by:
Tao Ma <tao.ma@oracle.com> Acked-by:
Mark Fasheh <mfasheh@suse.com> Signed-off-by:
Joel Becker <joel.becker@oracle.com>
-
- Apr 03, 2009
-
-
Mark Fasheh authored
This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value, and pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant time name lookups. Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier. Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Joel Becker <joel.becker@oracle.com>
-
- Mar 13, 2009
-
-
Tao Ma authored
We need to use le32_to_cpu to test rec->e_cpos in ocfs2_dinode_insert_check. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
- Feb 26, 2009
-
-
Tao Ma authored
In __ocfs2_mark_extent_written, when we meet with the situation of c_split_covers_rec, the old solution just replace the extent record and forget to access and dirty the buffer_head. This will cause a problem when the unwritten extent is in an extent block. So access and dirty it. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
- Feb 02, 2009
-
-
Mark Fasheh authored
We weren't reclaiming the clusters which get free'd from this function, so any user punching holes in a file would still have those bytes accounted against him/her. Add the call to vfs_dq_free_space_nodirty() to fix this. Interestingly enough, the journal credits calculation already took this into account. Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Jan Kara <jack@suse.cz>
-
- Jan 08, 2009
-
-
Fernando Carrijo authored
Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Acked-by:
Theodore Ts'o <tytso@mit.edu> Acked-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
David S. Miller <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Acked-by:
Casey Schaufler <casey@schaufler-ca.com> Acked-by:
Takashi Iwai <tiwai@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 05, 2009
-
-
Tao Ma authored
In commit "ocfs2: Use metadata-specific ocfs2_journal_access_*() functions", the wrong buffer_head is accessed. So change it to the right buffer_head. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
When an ocfs2 extended attribute is large enough to require its own allocation tree, we root it with an ocfs2_xattr_value_root. However, these roots can be a part of inodes, xattr blocks, or xattr buckets. Thus, they need a different journal access function for each container. We wrap the bh, its journal access function, and the value root (xv) in a structure called ocfs2_xattr_valu_buf. This is a package that can be passed around. In this first pass, we simply pass it to the extent tree code. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2 commit triggers and allow us to compute metadata ecc right before the buffers are written out. This commit provides ecc for inodes, extent blocks, group descriptors, and quota blocks. It is not safe to use extened attributes and metaecc at the same time yet. The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide the type of block at their root. Before, it didn't matter, but now the root block must use the appropriate ocfs2_journal_access_*() function. To keep this abstract, the structures now have a pointer to the matching journal_access function and a wrapper call to call it. A few places use naked ocfs2_write_block() calls instead of adding the blocks to the journal. We make sure to calculate their checksum and ecc before the write. Since we pass around the journal_access functions. Let's typedef them in ocfs2.h. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
The majority of ocfs2_new_path() calls are: ocfs2_new_path(path_root_bh(otherpath), path_root_el(otherpath)); Let's call that ocfs2_new_path_from_path(). The rest do similar things from struct ocfs2_extent_tree. Let's call those ocfs2_new_path_from_et(). This will make the next change easier. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
Add block check calls to the read_block validate functions. This is the almost all of the read-side checking of metaecc. xattr buckets are not checked yet. Writes are also unchecked, and so a read-write mount will quickly fail. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Jan Kara authored
Add quota calls for allocation and freeing of inodes and space, also update estimates on number of needed credits for a transaction. Move out inode allocation from ocfs2_mknod_locked() because vfs_dq_init() must be called outside of a transaction. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Mark Fasheh authored
JBD2 is fully backwards compatible with JBD and it's been tested enough with Ocfs2 that we can clean this code up now. Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
Add an optional validation hook to ocfs2_read_blocks(). Now the validation function is only called when a block was actually read off of disk. It is not called when the buffer was in cache. We add a buffer state bit BH_NeedsValidate to flag these buffers. It must always be one higher than the last JBD2 buffer state bit. The dinode, dirblock, extent_block, and xattr_block validators are lifted to this scheme directly. The group_descriptor validator needs to be split into two pieces. The first part only needs the gd buffer and is passed to ocfs2_read_block(). The second part requires the dinode as well, and is called every time. It's only 3 compares, so it's tiny. This also allows us to clean up the non-fatal gd check used by resize.c. It now has no magic argument. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
We weren't consistently checking extent blocks after we read them. Most places checked the signature, but none checked h_blkno or h_fs_signature. Create a toplevel ocfs2_read_extent_block() that does the read and the validation. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
Random places in the code would check a dinode bh to see if it was valid. Not only did they do different levels of validation, they handled errors in different ways. The previous commit unified inode block reads, validating all block reads in the same place. Thus, these haphazard checks are no longer necessary. Rather than eliminate them, however, we change them to BUG_ON() checks. This ensures the assumptions remain true. All of the code paths to these checks have been audited to ensure they come from a validated inode read. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
The ocfs2 code currently reads inodes off disk with a simple ocfs2_read_block() call. Each place that does this has a different set of sanity checks it performs. Some check only the signature. A couple validate the block number (the block read vs di->i_blkno). A couple others check for VALID_FL. Only one place validates i_fs_generation. A couple check nothing. Even when an error is found, they don't all do the same thing. We wrap inode reading into ocfs2_read_inode_block(). This will validate all the above fields, going readonly if they are invalid (they never should be). ocfs2_read_inode_block_full() is provided for the places that want to pass read_block flags. Every caller is passing a struct inode with a valid ip_blkno, so we don't need a separate blkno argument either. We will remove the validation checks from the rest of the code in a later commit, as they are no longer necessary. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Mark Fasheh authored
This patch genericizes the high level handling of extent removal. ocfs2_remove_btree_range() is nearly identical to __ocfs2_remove_inode_range(), except that extent tree operations have been used where necessary. We update ocfs2_remove_inode_range() to use the generic helper. Now extent tree based structures have an easy way to truncate ranges. Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Joel Becker <joel.becker@oracle.com>
-
Tao Ma authored
Now in ocfs2 xattr set, the whole process are divided into many small parts and they are wrapped into diffrent transactions and it make the set doesn't look like a real transaction. So we want to integrate it into a real one. In some cases we will allocate some clusters and free some in just one transaction. e.g, one xattr is larger than inline size, so it and its value root is stored within the inode while the value is outside in a cluster. Then we try to update it with a smaller value(larger than the size of root but smaller than inline size), we may need to free the outside cluster while allocate a new bucket(one cluster) since now the inode may be full. The old solution will lock the global_bitmap(if the local alloc failed in stress test) and then the truncate log. This will cause a ABBA lock with truncate log flush. This patch add the clusters free in dealloc_ctxt, so that we can record the free clusters during the transaction and then free it after we release the global_bitmap in xattr set. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
- Oct 14, 2008
-
-
Joel Becker authored
More than 30 callers of ocfs2_read_block() pass exactly OCFS2_BH_CACHED. Only six pass a different flag set. Rather than have every caller care, let's make ocfs2_read_block() take no flags and always do a cached read. The remaining six places can call ocfs2_read_blocks() directly. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
Now that synchronous readers are using ocfs2_read_blocks_sync(), all callers of ocfs2_read_blocks() are passing an inode. Use it unconditionally. Since it's there, we don't need to pass the ocfs2_super either. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Mark Fasheh authored
This is pointless as brelse() already does the check. Signed-off-by: Mark Fasheh
-
Joel Becker authored
ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is limiting our maximum filesystem size. It's a pretty trivial change. Most functions are just renamed. The only functional change is moving to Jan's inode-based ordered data mode. It's better, too. Because JBD2 reads and writes JBD journals, this is compatible with any existing filesystem. It can even interact with JBD-based ocfs2 as long as the journal is formated for JBD. We provide a compatibility option so that paranoid people can still use JBD for the time being. This will go away shortly. [ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to ocfs2_truncate_for_delete(). --Mark ] Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
The original get/put_extent_tree() functions held a reference on et_root_bh. However, every single caller already has a safe reference, making the get/put cycle irrelevant. We change ocfs2_get_*_extent_tree() to ocfs2_init_*_extent_tree(). It no longer gets a reference on et_root_bh. ocfs2_put_extent_tree() is removed. Callers now have a simpler init+use pattern. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
struct ocfs2_extent_tree_operations provides methods for the different on-disk btrees in ocfs2. Describing what those methods do is probably a good idea. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
We now have three different kinds of extent trees in ocfs2: inode data (dinode), extended attributes (xattr_tree), and extended attribute values (xattr_value). There is a nice abstraction for them, ocfs2_extent_tree, but it is hidden in alloc.c. All the calling functions have to pick amongst a varied API and pass in type bits and often extraneous pointers. A better way is to make ocfs2_extent_tree a first-class object. Everyone converts their object to an ocfs2_extent_tree() via the ocfs2_get_*_extent_tree() calls, then uses the ocfs2_extent_tree for all tree calls to alloc.c. This simplifies a lot of callers, making for readability. It also provides an easy way to add additional extent tree types, as they only need to be defined in alloc.c with a ocfs2_get_<new>_extent_tree() function. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
A couple places check an extent_tree for a valid inode. We move that out to add an eo_insert_check() operation. It can be called from ocfs2_insert_extent() and elsewhere. We also have the wrapper calls ocfs2_et_insert_check() and ocfs2_et_sanity_check() ignore NULL ops. That way we don't have to provide useless operations for xattr types. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
A caller knows what kind of extent tree they have. There's no reason they have to call ocfs2_get_extent_tree() with a NULL when they could just as easily call a specific function to their type of extent tree. Introduce ocfs2_dinode_get_extent_tree(), ocfs2_xattr_tree_get_extent_tree(), and ocfs2_xattr_value_get_extent_tree(). They only take the necessary arguments, calling into the underlying __ocfs2_get_extent_tree() to do the real work. __ocfs2_get_extent_tree() is the old ocfs2_get_extent_tree(), but without needing any switch-by-type logic. ocfs2_get_extent_tree() is now a wrapper around the specific calls. It exists because a couple alloc.c functions can take et_type. This will go later. Another benefit is that ocfs2_xattr_value_get_extent_tree() can take a struct ocfs2_xattr_value_root* instead of void*. This gives us typechecking where we didn't have it before. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
Provide an optional extent_tree_operation to specify the max_leaf_clusters of an ocfs2_extent_tree. If not provided, the value is 0 (unlimited). Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
ocfs2_num_free_extents() re-implements the logic of ocfs2_get_extent_tree(). Now that ocfs2_get_extent_tree() does not allocate, let's use it in ocfs2_num_free_extents() to simplify the code. The inode validation code in ocfs2_num_free_extents() is not needed. All callers are passing in pre-validated inodes. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
The root_el of an ocfs2_extent_tree needs to be calculated from et->et_object. Make it an operation on et->et_ops. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
The 'private' pointer was a way to store off xattr values, which don't live at a set place in the bh. But the concept of "the object containing the extent tree" is much more generic. For an inode it's the struct ocfs2_dinode, for an xattr value its the value. Let's save off the 'object' at all times. If NULL is passed to ocfs2_get_extent_tree(), 'object' is set to bh->b_data; Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
Rather than allocating a struct ocfs2_extent_tree, just put it on the stack. Fill it with ocfs2_get_extent_tree() and drop it with ocfs2_put_extent_tree(). Now the callers don't have to ENOMEM, yet still safely ref the root_bh. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
The members of the ocfs2_extent_tree structure gain a prefix of 'et_'. All users are updated. Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Joel Becker authored
The ocfs2_extent_tree_operations structure gains a field prefix on its members. The ->eo_sanity_check() operation gains a wrapper function for completeness. All of the extent tree operation wrappers gain a consistent name (ocfs2_et_*()). Signed-off-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
In xattr bucket, we want to limit the maximum size of a btree leaf, otherwise we'll lose the benefits of hashing because we'll have to search large leaves. So add a new field in ocfs2_extent_tree which indicates the maximum leaf cluster size we want so that we can prevent ocfs2_insert_extent() from merging the leaf record even if it is contiguous with an adjacent record. Other btree types are not affected by this change. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
When necessary, an ocfs2_xattr_block will embed an ocfs2_extent_list to store large numbers of EAs. This patch adds a new type in ocfs2_extent_tree_type and adds the implementation so that we can re-use the b-tree code to handle the storage of many EAs. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tiger Yang authored
Add the structures and helper functions we want for handling inline extended attributes. We also update the inline-data handlers so that they properly function in the event that we have both inline data and inline attributes sharing an inode block. Signed-off-by:
Tiger Yang <tiger.yang@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
Add some thin wrappers around ocfs2_insert_extent() for each of the 3 different btree types, ocfs2_inode_insert_extent(), ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The last is for the xattr index btree, which will be used in a followup patch. All the old callers in file.c etc will call ocfs2_dinode_insert_extent(), while the other two handle the xattr issue. And the init of extent tree are handled by these functions. When storing xattr value which is too large, we will allocate some clusters for it and here ocfs2_extent_list and ocfs2_extent_rec will also be used. In order to re-use the b-tree operation code, a new parameter named "private" is added into ocfs2_extent_tree and it is used to indicate the root of ocfs2_exent_list. The reason is that we can't deduce the root from the buffer_head now. It may be in an inode, an ocfs2_xattr_block or even worse, in any place in an ocfs2_xattr_bucket. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
- Oct 13, 2008
-
-
Tao Ma authored
Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation calls ocfs2_do_cluster_allocation() now, but the latter can be used for other btree types as well. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-