- Apr 03, 2009
-
-
Sunil Mushran authored
This patch adds code to create and destroy the dlm->master_hash. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Sunil Mushran authored
This patch refactors dlm_clean_master_list() so as to make it easier to convert the mle list to a hash. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Sunil Mushran authored
For master mle, the name is stored in the attached lockres in struct qstr. For block and migration mle, the name is stored inline in struct dlm_lock_name. This patch attempts to make struct dlm_lock_name look like a struct qstr. While we could use struct qstr, we don't because we want to avoid having to malloc and free the lockname string as the mle's lifetime is fairly short. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
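As a rough illustration of the idea above, here is a minimal user-space sketch of a qstr-like name that keeps its bytes inline, so no separate allocation or free is needed. The names `sketch_lock_name`, `SKETCH_LOCK_NAME_MAX` and the toy hash are assumptions made for this example; they are not taken from the patch.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cap on the name length; the real limit is not stated here. */
#define SKETCH_LOCK_NAME_MAX 32

/* qstr-like layout: hash, length, then the name bytes stored inline. */
struct sketch_lock_name {
	unsigned int hash;
	unsigned int len;
	unsigned char name[SKETCH_LOCK_NAME_MAX];
};

/* Toy hash for illustration only (not the hash the dlm uses). */
static unsigned int sketch_hash(const unsigned char *s, unsigned int len)
{
	unsigned int h = 0;

	for (unsigned int i = 0; i < len; i++)
		h = h * 31 + s[i];
	return h;
}

/* Fill the inline name; nothing is allocated, so nothing must be freed. */
static void sketch_lock_name_init(struct sketch_lock_name *ln,
				  const char *name, unsigned int len)
{
	assert(len <= SKETCH_LOCK_NAME_MAX);
	memcpy(ln->name, name, len);
	ln->len = len;
	ln->hash = sketch_hash(ln->name, len);
}

/* Compare the cheap fields first, then the name bytes. */
static int sketch_lock_name_equal(const struct sketch_lock_name *a,
				  const struct sketch_lock_name *b)
{
	return a->hash == b->hash && a->len == b->len &&
	       memcmp(a->name, b->name, a->len) == 0;
}
```

Because the structure is fixed-size, it can live on the stack or inside a short-lived object and disappear with it.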
-
Sunil Mushran authored
This patch encapsulates adding and removing of the mle from the dlm->master_list. This patch is part of the series of patches that converts the mle list to a mle hash. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
In ocfs2, the block group search looks for the "emptiest" group to allocate from. So if the allocator has many equally (or almost equally) empty groups, new block groups will tend to get spread out amongst them. So we add osb_inode_alloc_group in ocfs2_super to record the last used inode allocation group. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy . I have done some basic tests and the results show a tenfold improvement on some cold-cache stat workloads. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
Inode groups used to be allocated from local alloc file, but since we want all inodes to be contiguous enough, we will try to allocate them directly from global_bitmap. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
In ocfs2, the inode block search looks for the "emptiest" inode group to allocate from. So if an inode alloc file has many equally (or almost equally) empty groups, new inodes will tend to get spread out amongst them, which in turn can put them all over the disk. This is undesirable because directory operations on conceptually "nearby" inodes force a large number of seeks. So we add ip_last_used_group in core directory inodes, which records the last used allocation group. Another field named ip_last_used_slot is also added in case inode stealing happens. When claiming a new inode, we pass in the directory's inode so that the allocation can use this information. For more details, please see http://oss.oracle.com/osswiki/OCFS2/DesignDocs/InodeAllocationStrategy . Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
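The allocation hint described above might look roughly like the following user-space sketch: prefer the group the directory used last, and fall back to the "emptiest" group only when that hint is unusable. All types and names here (`sketch_group`, `sketch_dir`, `sketch_pick_group`) are hypothetical, and the real allocator has more state than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocation group: only the free count matters here. */
struct sketch_group {
	unsigned int free_bits;
};

/* Hypothetical per-directory hint; -1 means "no group used yet". */
struct sketch_dir {
	int last_used_group;
};

/*
 * Pick a group for a new inode. Returns the group index, or -1 if
 * every group is full. The chosen group is remembered in the directory.
 */
static int sketch_pick_group(struct sketch_dir *dir,
			     struct sketch_group *groups, size_t ngroups)
{
	int best = -1;
	int last = dir->last_used_group;

	assert(groups != NULL || ngroups == 0);

	if (last >= 0 && (size_t)last < ngroups && groups[last].free_bits > 0) {
		/* The hint is usable: stay close to earlier allocations. */
		best = last;
	} else {
		/* Fall back to the group with the most free bits. */
		for (size_t i = 0; i < ngroups; i++) {
			if (groups[i].free_bits == 0)
				continue;
			if (best < 0 ||
			    groups[i].free_bits > groups[best].free_bits)
				best = (int)i;
		}
	}

	if (best >= 0) {
		groups[best].free_bits--;
		dir->last_used_group = best;
	}
	return best;
}
```

With two equally empty groups, the "emptiest" policy alone would alternate between them; the hint keeps consecutive allocations from one directory in the same group.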
-
Mark Fasheh authored
ocfs2_dx_dir_rebalance() is passed the block offset of a dx leaf which needs rebalancing. Since we rebalance an entire cluster at a time, however, this function needs to calculate the beginning of that cluster, in blocks. The calculation was wrong, which would result in a read of non-leaf blocks. Fix the calculation by adding ocfs2_block_to_cluster_start(), which is a more straightforward way of determining this. Reported-by:
Tristan Ye <tristan.ye@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
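The calculation described above amounts to rounding a block number down to a cluster boundary. A minimal sketch, assuming block and cluster sizes are powers of two expressed as bit shifts; the function name and parameters are illustrative and do not reproduce the kernel helper's signature:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Round blkno down to the first block of the cluster that contains it.
 * clustersize_bits and blocksize_bits are log2 of the sizes in bytes.
 */
static uint64_t sketch_block_to_cluster_start(uint64_t blkno,
					      unsigned int clustersize_bits,
					      unsigned int blocksize_bits)
{
	unsigned int bits;

	/* A cluster is assumed to hold a whole number of blocks. */
	assert(clustersize_bits >= blocksize_bits);
	bits = clustersize_bits - blocksize_bits;
	assert(bits < 64);

	/* Convert to a cluster number, then back to a block number. */
	return (blkno >> bits) << bits;
}
```

For example, with 4K blocks and 32K clusters there are eight blocks per cluster, so block 21 belongs to the cluster that starts at block 16.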
-
Mark Fasheh authored
ocfs2_empty_dir() is far more expensive than checking link count. Since both need to be checked at the same time, we can improve performance by checking link count first. Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
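The ordering argument above can be sketched as a short-circuit: test the cheap condition first so the expensive scan runs only when it can still change the answer. This sketch assumes the usual convention that a directory without subdirectories has a link count of 2; the structure, the `scans` counter and the function names are made up for illustration.

```c
#include <assert.h>

/* Hypothetical directory state; scans counts the expensive checks run. */
struct sketch_dir_inode {
	unsigned int nlink;
	unsigned int nentries;
	unsigned int scans;
};

/* Stand-in for the expensive full directory scan. */
static int sketch_empty_dir(struct sketch_dir_inode *dir)
{
	dir->scans++;
	return dir->nentries == 0;
}

/* Returns 1 if the directory may be removed, 0 otherwise. */
static int sketch_can_rmdir(struct sketch_dir_inode *dir)
{
	assert(dir != 0);

	/* Cheap check first: extra links mean there are subdirectories. */
	if (dir->nlink != 2)
		return 0;

	/* Only now pay for the scan. */
	return sketch_empty_dir(dir);
}
```

A directory with subdirectories is rejected without scanning at all, while one that contains only files still needs the scan.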
-
Mark Fasheh authored
Since the disk format is finalized, we can set this feature bit in the supported mask. Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Joel Becker <Joel.Becker@oracle.com>
-
Mark Fasheh authored
This little bit of extra accounting speeds up ocfs2_empty_dir() dramatically by allowing us to short-circuit the full directory scan. Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Mark Fasheh authored
Since we've now got a directory format capable of handling a large number of entries, we can increase the maximum link count supported. This only gets increased if the directory indexing feature is turned on. Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Joel Becker <joel.becker@oracle.com>
-
Mark Fasheh authored
The only operation which doesn't get faster with directory indexing is insert, which still has to walk the entire unindexed directory portion to find a free block. This patch provides an improvement in directory insert performance by maintaining a singly linked list of directory leaf blocks which have space for additional dirents. Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Joel Becker <joel.becker@oracle.com>
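One way to picture the free list described above is a chain of leaf blocks linked by block index, with the head kept in the directory root. The following is a simplified in-memory sketch under that assumption; the names and the "unlink when completely full" policy are illustrative, not the on-disk format.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NO_BLOCK (-1)

/* Hypothetical leaf block: free space plus a link to the next free leaf. */
struct sketch_leaf {
	unsigned int free_bytes;
	int next_free;
};

/* Hypothetical directory root holding the head of the free list. */
struct sketch_dir_root {
	int free_head;
};

/*
 * Place a record of rec_len bytes in the first listed leaf with room.
 * Returns the leaf index used, or SKETCH_NO_BLOCK if none fits.
 */
static int sketch_insert(struct sketch_dir_root *root,
			 struct sketch_leaf *leaves, unsigned int rec_len)
{
	int *link = &root->free_head;

	assert(leaves != NULL);

	while (*link != SKETCH_NO_BLOCK) {
		struct sketch_leaf *leaf = &leaves[*link];

		if (leaf->free_bytes >= rec_len) {
			int blk = *link;

			leaf->free_bytes -= rec_len;
			if (leaf->free_bytes == 0) {
				/* Leaf is full: unlink it from the list. */
				*link = leaf->next_free;
				leaf->next_free = SKETCH_NO_BLOCK;
			}
			return blk;
		}
		link = &leaf->next_free;
	}
	return SKETCH_NO_BLOCK;
}
```

Leaves that are already full never appear on the list, so an insert visits only blocks that might have room rather than the whole unindexed directory.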
-
Mark Fasheh authored
Allow us to store a small number of directory index records in the ocfs2_dx_root_block. This saves us a disk read on small to medium sized directories (less than about 250 entries). The inline root is automatically turned into a root block with extents if the directory size increases beyond its capacity. Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Joel Becker <joel.becker@oracle.com>
-
Mark Fasheh authored
This patch makes use of Ocfs2's flexible btree code to add an additional tree to directory inodes. The new tree stores an array of small, fixed-length records in each leaf block. Each record stores a hash value and a pointer to a block in the traditional (unindexed) directory tree where a dirent with the given name hash resides. Lookup exclusively uses this tree to find dirents, thus providing us with constant time name lookups. Some of the hashing code was copied from ext3. Unfortunately, it has lots of unfixed checkpatch errors. I left that as-is so that tracking changes would be easier. Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Joel Becker <joel.becker@oracle.com>
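To make the record layout above concrete, here is a small sketch of scanning one leaf's fixed-length records for a name hash. It covers only the leaf scan, not the btree walk that finds the leaf, and the record fields and function name are assumptions for this example. Because different names can share a hash, the caller is expected to verify the dirent in the referenced block and resume the scan on a mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-length index record: name hash -> dirent block. */
struct sketch_dx_entry {
	uint32_t name_hash;
	uint64_t dirent_blk;
};

/*
 * Return the index of the first record at or after start whose hash
 * matches, or -1 if there is none.
 */
static ptrdiff_t sketch_dx_find(const struct sketch_dx_entry *entries,
				size_t count, uint32_t hash, size_t start)
{
	assert(entries != NULL || count == 0);

	for (size_t i = start; i < count; i++) {
		if (entries[i].name_hash == hash)
			return (ptrdiff_t)i;
	}
	return -1;
}
```

A lookup would call this with `start` set to 0, read the block named by `dirent_blk`, compare the actual name, and call again with `start` one past the previous hit if the name did not match.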
-
Mark Fasheh authored
Many directory manipulation calls pass around a tuple of a dirent and its containing buffer_head. Dir indexing has a bit more state, but instead of adding yet more arguments to functions, we introduce 'struct ocfs2_dir_lookup_result'. In this patch, it simply holds the same tuple, but future patches will add more state. Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Joel Becker <joel.becker@oracle.com>
-
Sunil Mushran authored
This patch removes the debugfs file local_alloc_stats as that information is now included in the fs_state debugfs file. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Sunil Mushran authored
This patch creates a per-mount debugfs file, fs_state, which exposes information such as the cluster stack in use, the states of the downconvert, recovery and commit threads, the number of journal txns, some allocation stats, the list of all slots, etc. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Sunil Mushran authored
Move the definition of struct recovery_map from journal.c to journal.h. This is preparation for the next patch. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Sunil Mushran authored
This patch creates a debugfs file, o2hb/livenodes, which exposes the aggregate list of heartbeating nodes across all heartbeat regions. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
- Apr 01, 2009
-
-
Nicholas Piggin authored
Change the page_mkwrite prototype to take a struct vm_fault, and return VM_FAULT_xxx flags. There should be no functional change. This makes it possible to return much more detailed error information to the VM (and can also provide more information, e.g. virtual_address, to the driver, which might be important in some special cases). This is required for a subsequent fix, and will also make it easier to merge page_mkwrite() with fault() in future. Signed-off-by:
Nick Piggin <npiggin@suse.de> Cc: Chris Mason <chris.mason@oracle.com> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Mark Fasheh <mfasheh@suse.com> Cc: Joel Becker <joel.becker@oracle.com> Cc: Artem Bityutskiy <dedekind@infradead.org> Cc: Felix Blyakher <felixb@sgi.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Al Viro authored
Reading current->fs->umask is what most fs_struct users do. Put that into a helper function. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Mar 27, 2009
-
-
Al Viro authored
Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Mar 13, 2009
-
-
Tao Ma authored
A long time ago, xs->base was allocated as a 4K buffer and all the contents of the bucket were copied into it. Now we use ocfs2_xattr_bucket to abstract the xattr bucket, and xs->base is initialized to the start of bu_bhs[0]. So xs->base + offset will overflow when the value root is stored outside the first block. Then why have we survived the xattr tests until now? It is because we always read the bucket contiguously and the kernel mm happens to allocate contiguous memory for us. We are lucky, but we should fix it. So just get the right value root as other callers do. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
We need to use le32_to_cpu to test rec->e_cpos in ocfs2_dinode_insert_check. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
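The fix above concerns comparing an on-disk little-endian field against a CPU-order value. The sketch below shows why the conversion matters, using a portable user-space stand-in for the kernel's le32_to_cpu(); the record structure and the comparison against a cluster count are hypothetical stand-ins, not the actual check in ocfs2_dinode_insert_check.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for le32_to_cpu(): decode four little-endian bytes. */
static uint32_t sketch_le32_to_cpu(uint32_t le)
{
	unsigned char b[4];

	memcpy(b, &le, sizeof(b));
	return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Hypothetical extent record; e_cpos holds raw little-endian bytes. */
struct sketch_extent_rec {
	uint32_t e_cpos;
};

/* Compare in CPU byte order; a raw compare breaks on big-endian hosts. */
static int sketch_cpos_matches(const struct sketch_extent_rec *rec,
			       uint32_t clusters)
{
	assert(rec != 0);
	return sketch_le32_to_cpu(rec->e_cpos) == clusters;
}
```

On a little-endian machine the raw and converted values coincide, which is how a missing conversion can go unnoticed until the code runs on a big-endian host.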
-
Tiger Yang authored
Replace max_inline_data with max_inline_data_with_xattr to ensure it is correct when xattrs are inlined. Signed-off-by:
Tiger Yang <tiger.yang@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tiger Yang authored
If this is a new directory with inline data, we choose to reserve the entire inline area for directory contents and force an external xattr block. Signed-off-by:
Tiger Yang <tiger.yang@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
- Feb 26, 2009
-
-
wengang wang authored
Check for IO error in ocfs2_get_sector(). Signed-off-by:
Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tiger Yang authored
This patch sets a gap (4 bytes) between the xattr entry and the name/value when the xattr is in a bucket. This gap is used to separate the entry and the name/value when a bucket is full. It was already set when the xattr is in an inode/block. Signed-off-by:
Tiger Yang <tiger.yang@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
For other metadata in ocfs2, metaecc is checked in ocfs2_read_blocks with io_mutex held. For an xattr bucket, however, it is calculated over the whole bucket. So we have to add a spin_lock to prevent multiple processes from calculating metaecc at the same time. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Tested-by:
Tristan Ye <tristan.ye@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
The ctime update for xattrs uses the wrong type of journal access for the inode, so use ocfs2_journal_access_di instead. Reported-and-Tested-by:
Tristan Ye <tristan.ye@oracle.com> Signed-off-by:
Tao Ma <tao.ma@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Sunil Mushran authored
In dlm_assert_master_handler(), if we get an incorrect assert master from a node, we reply with EINVAL asking the asserter to die. The problem is that an assert is sent only after jumping through so many hoops that the node which thinks the asserter is wrong is invariably the one that is actually wrong. So instead of killing the asserter, this patch kills the assertee. This patch papers over a race that is still being addressed. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Sunil Mushran authored
The code was using dlm->spinlock instead of dlm->ast_lock to protect the ast_list. This patch fixes the issue. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Sunil Mushran authored
The dentry lock has a different format than other locks. This patch fixes ocfs2_log_dlm_error() macro to make it print the dentry lock correctly. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Sunil Mushran authored
Mainline commit d4f7e650 attempts to delay the dlm_thread from sending the drop ref message if the lockres is being migrated. The problem is that we make the dlm_thread wait for the migration to complete. This causes a deadlock as dlm_thread also participates in the lockres migration process. A better fix for the original oss bugzilla#1012 is in testing. Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Acked-by:
Joel Becker <joel.becker@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
In __ocfs2_mark_extent_written, when we hit the c_split_covers_rec case, the old code just replaces the extent record and forgets to access and dirty the buffer_head. This causes a problem when the unwritten extent is in an extent block. So access and dirty it. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
- Feb 10, 2009
-
-
Jan Kara authored
If we race with commit code setting i_transaction to NULL, we could possibly dereference it. Proper locking requires the journal pointer (to access journal->j_list_lock), which we don't have. So we have to change the prototype of the function so that the filesystem passes us the journal pointer. Also add a more detailed comment about why the function jbd2_journal_begin_ordered_truncate() does what it does and how it should be used. Thanks to Dan Carpenter <error27@gmail.com> for pointing to the suspicious code. Signed-off-by:
Jan Kara <jack@suse.cz> Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu> Acked-by:
Joel Becker <joel.becker@oracle.com> CC: linux-ext4@vger.kernel.org CC: ocfs2-devel@oss.oracle.com CC: mfasheh@suse.de CC: Dan Carpenter <error27@gmail.com>
-
- Feb 02, 2009
-
-
Mark Fasheh authored
We weren't reclaiming the clusters which get freed from this function, so any user punching holes in a file would still have those bytes accounted against him/her. Add the call to vfs_dq_free_space_nodirty() to fix this. Interestingly enough, the journal credits calculation already took this into account. Signed-off-by:
Mark Fasheh <mfasheh@suse.com> Acked-by:
Jan Kara <jack@suse.cz>
-
Sunil Mushran authored
When two nodes holding PR locks on a resource concurrently attempt to upconvert the locks to EX, the master sends a BAST to one of the nodes. This message tells that node to first cancel convert the upconvert request, followed by downconvert to a NL. Only when this lock is downconverted to NL, can the master upconvert the first node's lock to EX. While the fs was doing the cancel convert, it was forgetting to wake up the dc thread after a successful cancel, leading to a deadlock. Reported-and-Tested-by:
David Teigland <teigland@redhat.com> Signed-off-by:
Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-
Tao Ma authored
In ocfs2_xattr_value_truncate, we may call b-tree code which will extend the journal transaction. This has a potential problem: the already-accessed-but-not-dirtied buffers may be lost. So we'd better access the bucket after we call ocfs2_xattr_value_truncate. As for the root buffer for the xattr value, the b-tree code will access and dirty it, so we don't need to worry about it. Signed-off-by:
Tao Ma <tao.ma@oracle.com> Signed-off-by:
Mark Fasheh <mfasheh@suse.com>
-