Skip to content
Snippets Groups Projects
  1. May 01, 2014
  2. Apr 04, 2014
    • Joe Thornber's avatar
      dm cache: fix a lock-inversion · 0596661f
      Joe Thornber authored
      
      When suspending a cache the policy is walked and the individual policy
      hints written to the metadata via sync_metadata().  This led to this
      lock order:
      
            policy->lock
              cache_metadata->root_lock
      
      When loading the cache target the policy is populated while the metadata
      lock is held:
      
            cache_metadata->root_lock
               policy->lock
      
      Fix this potential lock-inversion (ABBA) deadlock in sync_metadata() by
      ensuring the cache_metadata root_lock is held whilst all the hints are
      written, rather than being repeatedly locked while policy->lock is held
      (as was the case with each callout that policy_walk_mappings() made to
      the old save_hint() method).
      
      Found by turning on the CONFIG_PROVE_LOCKING ("Lock debugging: prove
      locking correctness") build option.  However, it is not clear how the
      LOCKDEP reported paths can lead to a deadlock since the two paths,
      suspending a target and loading a target, never occur at the same time.
      But that doesn't mean the same lock-inversion couldn't have occurred
      elsewhere.
      
      Reported-by: default avatarMarian Csontos <mcsontos@redhat.com>
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      0596661f
  3. Mar 27, 2014
  4. Mar 12, 2014
    • Heinz Mauelshagen's avatar
      dm cache: fix access beyond end of origin device · e893fba9
      Heinz Mauelshagen authored
      
      In order to avoid wasting cache space a partial block at the end of the
      origin device is not cached.  Unfortunately, the check for such a
      partial block at the end of the origin device was flawed.
      
      Fix accesses beyond the end of the origin device that occured due to
      attempted promotion of an undetected partial block by:
      
      - initializing the per bio data struct to allow cache_end_io to work properly
      - recognizing access to the partial block at the end of the origin device
      - avoiding out of bounds access to the discard bitset
      
      Otherwise, users can experience errors like the following:
      
       attempt to access beyond end of device
       dm-5: rw=0, want=20971520, limit=20971456
       ...
       device-mapper: cache: promotion failed; couldn't copy block
      
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      e893fba9
    • Heinz Mauelshagen's avatar
      dm cache: fix truncation bug when copying a block to/from >2TB fast device · 8b9d9666
      Heinz Mauelshagen authored
      
      During demotion or promotion to a cache's >2TB fast device we must not
      truncate the cache block's associated sector to 32bits.  The 32bit
      temporary result of from_cblock() caused a 32bit multiplication when
      calculating the sector of the fast device in issue_copy_real().
      
      Use an intermediate 64bit type to store the 32bit from_cblock() to allow
      for proper 64bit multiplication.
      
      Here is an example of how this bug manifests on an ext4 filesystem:
      
       EXT4-fs error (device dm-0): ext4_mb_generate_buddy:756: group 17136, 32768 clusters in bitmap, 30688 in gd; block bitmap corrupt.
       JBD2: Spotted dirty metadata buffer (dev = dm-0, blocknr = 0). There's a risk of filesystem corruption in case of system crash.
      
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Cc: stable@vger.kernel.org
      8b9d9666
  5. Feb 28, 2014
  6. Feb 17, 2014
    • Mike Snitzer's avatar
      dm cache: do not add migration to completed list before unhooking bio · 80ae49aa
      Mike Snitzer authored
      
      When completing an overwrite bio, in overwrite_endio(), the associated
      migration should not be added to the 'completed_migrations' until the
      bio's fields are restored with dm_unhook_bio().
      
      Otherwise, do_worker() can race to process 'completed_migrations' before
      dm_unhook_bio() -- so the bio's bi_end_io is incorrect.  This is
      unlikely to cause any problems given the current code but should be
      fixed on the basis of correctness.
      
      Also, the cache's spinlock only needs to be held when manipulating the
      'completed_migrations' list -- other changes don't need protection.
      
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      80ae49aa
    • Mike Snitzer's avatar
      dm cache: move hook_info into common portion of per_bio_data structure · c6eda5e8
      Mike Snitzer authored
      
      Commit c9d28d5d ("dm cache: promotion optimisation for writes")
      incorrectly placed the 'hook_info' member in the writethrough-only
      portion of the per_bio_data structure.
      
      Given that the overwrite optimization may be used for writeback the
      'hook_info' member must be placed above the 'cache' member of the
      per_bio_data structure.  Any members above 'cache' are available from
      both writeback and writethrough modes' per_bio_data structure.
      
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      Cc: stable@vger.kernel.org # 3.13+
      c6eda5e8
  7. Jan 16, 2014
    • Mike Snitzer's avatar
      dm cache: add policy name to status output · 2e68c4e6
      Mike Snitzer authored
      
      The cache's policy may have been established using the "default" alias,
      which is currently the "mq" policy but the default policy may change in
      the future.  It is useful to know exactly which policy is being used.
      
      Add a 'real' member to the dm_cache_policy_type structure and have the
      "default" dm_cache_policy_type point to the real "mq"
      dm_cache_policy_type.  Update dm_cache_policy_get_name() to check if
      real is set, if so report the name of the real policy (not the alias).
      
      Requested-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      2e68c4e6
  8. Jan 10, 2014
    • Mike Snitzer's avatar
      dm cache: add block sizes and total cache blocks to status output · 6a388618
      Mike Snitzer authored
      
      Improve cache_status to emit:
      <metadata block size> <#used metadata blocks>/<#total metadata blocks>
      <cache block size> <#used cache blocks>/<#total cache blocks>
      ...
      
      Adding the block sizes allows for easier calculation of the overall size
      of both the metadata and cache devices.  Adding <#total cache blocks>
      provides useful context for how much of the cache is used.
      
      Unfortunately these additions to the status will require updates to
      users' scripts that monitor the cache status.  But these changes help
      provide more comprehensive information about the cache device and will
      simplify tools that are being developed to manage dm-cache devices --
      because they won't need to issue 3 operations to cobble together the
      information that we can easily provide via a single status ioctl.
      
      While updating the status documentation in cache.txt spaces were
      tabify'd.
      
      Requested-by: default avatarJonathan Brassow <jbrassow@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Acked-by: default avatarJoe Thornber <ejt@redhat.com>
      6a388618
  9. Dec 10, 2013
  10. Dec 04, 2013
  11. Nov 24, 2013
    • Kent Overstreet's avatar
      block: Generic bio chaining · 196d38bc
      Kent Overstreet authored
      
      This adds a generic mechanism for chaining bio completions. This is
      going to be used for a bio_split() replacement, and it turns out to be
      very useful in a fair amount of driver code - a fair number of drivers
      were implementing this in their own roundabout ways, often painfully.
      
      Note that this means it's no longer to call bio_endio() more than once
      on the same bio! This can cause problems for drivers that save/restore
      bi_end_io. Arguably they shouldn't be saving/restoring bi_end_io at all
      - in all but the simplest cases they'd be better off just cloning the
      bio, and immutable biovecs is making bio cloning cheaper. But for now,
      we add a bio_endio_nodec() for these cases.
      
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      196d38bc
    • Kent Overstreet's avatar
      block: Abstract out bvec iterator · 4f024f37
      Kent Overstreet authored
      
      Immutable biovecs are going to require an explicit iterator. To
      implement immutable bvecs, a later patch is going to add a bi_bvec_done
      member to this struct; for now, this patch effectively just renames
      things.
      
      Signed-off-by: default avatarKent Overstreet <kmo@daterainc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: "Ed L. Cashin" <ecashin@coraid.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Lars Ellenberg <drbd-dev@lists.linbit.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Geoff Levand <geoff@infradead.org>
      Cc: Yehuda Sadeh <yehuda@inktank.com>
      Cc: Sage Weil <sage@inktank.com>
      Cc: Alex Elder <elder@inktank.com>
      Cc: ceph-devel@vger.kernel.org
      Cc: Joshua Morris <josh.h.morris@us.ibm.com>
      Cc: Philip Kelleher <pjk1939@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: Alasdair Kergon <agk@redhat.com>
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: dm-devel@redhat.com
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: linux390@de.ibm.com
      Cc: Boaz Harrosh <bharrosh@panasas.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Cc: "James E.J. Bottomley" <JBottomley@parallels.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Nicholas A. Bellinger" <nab@linux-iscsi.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Chris Mason <chris.mason@fusionio.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Jaegeuk Kim <jaegeuk.kim@samsung.com>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Dave Kleikamp <shaggy@kernel.org>
      Cc: Joern Engel <joern@logfs.org>
      Cc: Prasad Joshi <prasadjoshi.linux@gmail.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Ben Myers <bpm@sgi.com>
      Cc: xfs@oss.sgi.com
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Herton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Guo Chao <yan@linux.vnet.ibm.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: "Roger Pau Monné" <roger.pau@citrix.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Cc: Ian Campbell <Ian.Campbell@citrix.com>
      Cc: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Jerome Marchand <jmarchand@redhat.com>
      Cc: Joe Perches <joe@perches.com>
      Cc: Peng Tao <tao.peng@emc.com>
      Cc: Andy Adamson <andros@netapp.com>
      Cc: fanchaoting <fanchaoting@cn.fujitsu.com>
      Cc: Jie Liu <jeff.liu@oracle.com>
      Cc: Sunil Mushran <sunil.mushran@gmail.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: Namjae Jeon <namjae.jeon@samsung.com>
      Cc: Pankaj Kumar <pankaj.km@samsung.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Mel Gorman <mgorman@suse.de>6
      4f024f37
  12. Nov 12, 2013
  13. Nov 11, 2013
    • Joe Thornber's avatar
      dm cache: add cache block invalidation support · 65790ff9
      Joe Thornber authored
      
      Cache block invalidation is removing an entry from the cache without
      writing it back.  Cache blocks can be invalidated via the
      'invalidate_cblocks' message, which takes an arbitrary number of cblock
      ranges:
         invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*
      
      E.g.
         dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
      
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      65790ff9
    • Joe Thornber's avatar
      dm cache: add passthrough mode · 2ee57d58
      Joe Thornber authored
      
      "Passthrough" is a dm-cache operating mode (like writethrough or
      writeback) which is intended to be used when the cache contents are not
      known to be coherent with the origin device.  It behaves as follows:
      
      * All reads are served from the origin device (all reads miss the cache)
      * All writes are forwarded to the origin device; additionally, write
        hits cause cache block invalidates
      
      This mode decouples cache coherency checks from cache device creation,
      largely to avoid having to perform coherency checks while booting.  Boot
      scripts can create cache devices in passthrough mode and put them into
      service (mount cached filesystems, for example) without having to worry
      about coherency.  Coherency that exists is maintained, although the
      cache will gradually cool as writes take place.
      
      Later, applications can perform coherency checks, the nature of which
      will depend on the type of the underlying storage.  If coherency can be
      verified, the cache device can be transitioned to writethrough or
      writeback mode while still warm; otherwise, the cache contents can be
      discarded prior to transitioning to the desired operating mode.
      
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarHeinz Mauelshagen <heinzm@redhat.com>
      Signed-off-by: default avatarMorgan Mears <Morgan.Mears@netapp.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      2ee57d58
    • Joe Thornber's avatar
      dm cache: cache shrinking support · f494a9c6
      Joe Thornber authored
      
      Allow a cache to shrink if the blocks being removed from the cache are
      not dirty.
      
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      f494a9c6
  14. Nov 10, 2013
  15. Nov 09, 2013
  16. Aug 23, 2013
  17. Jul 11, 2013
  18. May 10, 2013
  19. Apr 05, 2013
    • Mike Snitzer's avatar
      dm cache: reduce bio front_pad size in writeback mode · 19b0092e
      Mike Snitzer authored
      
      A recent patch to fix the dm cache target's writethrough mode extended
      the bio's front_pad to include a 1056-byte struct dm_bio_details.
      Writeback mode doesn't need this, so this patch reduces the
      per_bio_data_size to 16 bytes in this case instead of 1096.
      
      The dm_bio_details structure was added in "dm cache: fix writes to
      cache device in writethrough mode" which fixed commit e2e74d61 ("dm
      cache: fix race in writethrough implementation").  In writeback mode
      we avoid allocating the writethrough-specific members of the
      per_bio_data structure (the dm_bio_details structure included).
      
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      19b0092e
    • Darrick J. Wong's avatar
      dm cache: fix writes to cache device in writethrough mode · b844fe69
      Darrick J. Wong authored
      
      The dm-cache writethrough strategy introduced by commit e2e74d61
      ("dm cache: fix race in writethrough implementation") issues a bio to
      the origin device, remaps and then issues the bio to the cache device.
      This more conservative in-series approach was selected to favor
      correctness over performance (of the previous parallel writethrough).
      However, this in-series implementation that reuses the same bio to write
      both the origin and cache device didn't take into account that the block
      layer's req_bio_endio() modifies a completing bio's bi_sector and
      bi_size.  So the new writethrough strategy needs to preserve these bio
      fields, and restore them before submission to the cache device,
      otherwise nothing gets written to the cache (because bi_size is 0).
      
      This patch adds a struct dm_bio_details field to struct per_bio_data,
      and uses dm_bio_record() and dm_bio_restore() to ensure the bio is
      restored before reissuing to the cache device.  Adding such a large
      structure to the per_bio_data is not ideal but we can improve this
      later, for now correctness is the important thing.
      
      This problem initially went unnoticed because the dm-cache test-suite
      uses a linear DM device for the dm-cache device's origin device.
      Writethrough worked as expected because DM submits a *clone* of the
      original bio, so the original bio which was reused for the cache was
      never touched.
      
      Signed-off-by: default avatarDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: default avatarJoe Thornber <ejt@redhat.com>
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      b844fe69
  20. Mar 20, 2013
    • Mike Snitzer's avatar
      dm cache: policy ignore hints if generated by different version · ea2dd8c1
      Mike Snitzer authored
      
      When reading the dm cache metadata from disk, ignore the policy hints
      unless they were generated by the same major version number of the same
      policy module.
      
      The hints are considered to be private data belonging to the specific
      module that generated them and there is no requirement for them to make
      sense to different versions of the policy that generated them.
      Policy modules are all required to work fine if no previous hints are
      supplied (or if existing hints are lost).
      
      Signed-off-by: default avatarMike Snitzer <snitzer@redhat.com>
      Signed-off-by: default avatarAlasdair G Kergon <agk@redhat.com>
      ea2dd8c1
Loading