  1. Jan 02, 2014
    • KVM: nVMX: Unconditionally uninit the MMU on nested vmexit · 29bf08f1
      Jan Kiszka authored
      
      Three reasons for doing this:
      
      1. arch.walk_mmu points to arch.mmu anyway when nested EPT is not in
         use.
      2. This aligns VMX with SVM.
      3. Most importantly: nested_cpu_has_ept(vmcs12) queries the VMCS page,
         and if one guest VCPU manipulates the page of another VCPU in L2,
         we may be fooled into skipping nested_ept_uninit_mmu_context,
         leaving the MMU in nested state.  That can crash the host later on
         if nested_ept_get_cr3 is invoked while L1 has already left vmxon
         and nested.current_vmcs12 has therefore become NULL.
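      
      A minimal sketch of what reason 3 implies for the vmexit path; the
      helper body is paraphrased from the description above, not quoted from
      the upstream diff:
      
        /* Tear down the nested MMU context unconditionally; deciding via
         * nested_cpu_has_ept(vmcs12) trusts guest-writable VMCS memory. */
        static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
        {
                /* point walk_mmu back at the ordinary MMU */
                vcpu->arch.walk_mmu = &vcpu->arch.mmu;
        }
      
        /* in nested vmexit handling, previously guarded by the EPT check: */
        nested_ept_uninit_mmu_context(vcpu);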
      
      Cc: stable@kernel.org
      Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
  2. Dec 19, 2013
    • x86 idle: Repair large-server 50-watt idle-power regression · 40e2d7f9
      Len Brown authored
      Linux 3.10 changed the timing of how thread_info->flags is touched:
      
      	x86: Use generic idle loop
      	(7d1a9417)
      
      This caused Intel NHM-EX and WSM-EX servers to experience a large
      number of immediate MONITOR/MWAIT break wakeups, causing cpuidle to
      demote from deep C-states to shallow C-states and significantly
      increasing idle power on these platforms.
      
      Note that this issue was already present before the commit above;
      however, it wasn't seen often enough to be noticed in power
      measurements.
      
      Here we extend an errata workaround from the Core2-EX "Dunnington"
      to NHM-EX and WSM-EX, preventing these immediate returns from MWAIT
      and reducing idle power on these platforms.
      
      While only acpi_idle ran on Dunnington, intel_idle
      may also run on these two newer systems.
      As of today, there are no other models that are known
      to need this tweak.
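      
      As far as I know, the Dunnington workaround referred to here is a
      CLFLUSH of the monitored cache line before MONITOR.  A sketch of how
      such a quirk is typically wired up; the feature flag and the model
      list are illustrative assumptions, not the verbatim upstream patch:
      
        /* CPU setup: mark affected models (model numbers illustrative) */
        if (c->x86 == 6 &&
            (c->x86_model == 29 ||      /* Core2-EX "Dunnington" */
             c->x86_model == 46 ||      /* NHM-EX */
             c->x86_model == 47))       /* WSM-EX */
                set_cpu_cap(c, X86_FEATURE_CLFLUSH_MONITOR);
      
        /* idle entry: flush the monitored line so stale cache state
         * cannot make MWAIT return immediately */
        if (this_cpu_has(X86_FEATURE_CLFLUSH_MONITOR))
                clflush((void *)&current_thread_info()->flags);
        __monitor((void *)&current_thread_info()->flags, 0, 0);
        __mwait(eax, ecx);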
      
      Link: http://lkml.kernel.org/r/CAJvTdK=%2BaNN66mYpCGgbHGCHhYQAKx-vB0kJSWjVpsNb_hOAtQ@mail.gmail.com
      
      
      Signed-off-by: Len Brown <len.brown@intel.com>
      Link: http://lkml.kernel.org/r/baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com
      
      
      Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • mm: fix TLB flush race between migration, and change_protection_range · 20841405
      Rik van Riel authored
      
      There are a few subtle races, between change_protection_range (used by
      mprotect and change_prot_numa) on one side, and NUMA page migration and
      compaction on the other side.
      
      The basic race is that there is a time window between when the PTE gets
      made non-present (PROT_NONE or NUMA), and the TLB is flushed.
      
      During that time, a CPU may continue writing to the page.
      
      This is fine most of the time; however, compaction or the NUMA
      migration code may come in and migrate the page away.
      
      When that happens, the CPU may continue writing, through the cached
      translation, to what is no longer the current memory location of the
      process.
      
      This only affects x86, which has a somewhat optimistic pte_accessible.
      All other architectures appear to be safe, and will either always flush,
      or flush whenever there is a valid mapping, even with no permissions
      (SPARC).
      
      The basic race looks like this:
      
      CPU A			CPU B			CPU C
      
      						load TLB entry
      make entry PTE/PMD_NUMA
      			fault on entry
      						read/write old page
      			start migrating page
      			change PTE/PMD to new page
      						read/write old page [*]
      flush TLB
      						reload TLB from new entry
      						read/write new page
      						lose data
      
      [*] the old page may belong to a new user at this point!
      
      The obvious fix is to flush remote TLB entries, by making sure that
      pte_accessible is aware of the fact that PROT_NONE and PROT_NUMA
      memory may still be accessible if there is a TLB flush pending for
      the mm.
      
      This should fix both NUMA migration and compaction.
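      
      A sketch of what this means for the x86 pte_accessible(), assuming a
      per-mm pending-flush predicate named mm_tlb_flush_pending() and using
      _PAGE_PROTNONE for the PROT_NONE/NUMA case (flag names approximate):
      
        static inline bool pte_accessible(struct mm_struct *mm, pte_t a)
        {
                if (pte_flags(a) & _PAGE_PRESENT)
                        return true;
      
                /* A PROT_NONE/NUMA pte may still be live in a remote TLB
                 * until the pending flush for this mm has completed. */
                if ((pte_flags(a) & _PAGE_PROTNONE) &&
                    mm_tlb_flush_pending(mm))
                        return true;
      
                return false;
        }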
      
      [mgorman@suse.de: fix build]
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: numa: serialise parallel get_user_page against THP migration · 2b4847e7
      Mel Gorman authored
      
      Base pages are unmapped and flushed from cache and TLB during normal
      page migration and replaced with a migration entry that causes any
      parallel NUMA hinting fault or gup to block until migration completes.
      
      THP does not unmap pages, due to a lack of support for migration
      entries at the PMD level.  This allows races with get_user_pages and
      get_user_pages_fast, which commit 3f926ab9 ("mm: Close races between
      THP migration and PMD numa clearing") made worse by introducing a
      pmd_clear_flush().
      
      This patch forces get_user_page (fast and normal) on a pmd_numa page to
      go through the slow get_user_page path where it will serialise against
      THP migration and properly account for the NUMA hinting fault.  On the
      migration side the page table lock is taken for each PTE update.
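      
      A sketch of the fast-gup side of this on x86; treating a NUMA-hinting
      pmd like an empty one forces the fallback (surrounding context
      paraphrased, not the verbatim hunk):
      
        /* in arch/x86/mm/gup.c:gup_pmd_range(), sketch: */
        if (pmd_none(pmd) || pmd_trans_splitting(pmd) || pmd_numa(pmd))
                return 0;       /* 0 makes the caller fall back to the
                                 * slow get_user_pages path, which
                                 * serialises against THP migration */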
      
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Dec 12, 2013
    • KVM: x86: fix guest-initiated crash with x2apic (CVE-2013-6376) · 17d68b76
      Gleb Natapov authored
      
      A guest can cause a BUG_ON() leading to a host kernel crash.
      When the guest writes to the ICR to request an IPI while in x2apic
      mode, the destination is read from ICR2, which is a register that
      the guest can control.
      
      kvm_irq_delivery_to_apic_fast uses the high 16 bits of ICR2 as the
      cluster id.  This triggers a BUG_ON that protects map->logical_map
      against out-of-bounds accesses; the check keeps anything really
      unsafe from happening, at the cost of crashing the host.
      
      The logic in the code is correct from a real-HW point of view.  The
      problem is that KVM supports only one cluster, with ID 0, in
      clustered mode, but the code that has the bug does not take this
      into account.
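      
      A sketch of the shape of the fix: bounds-check the guest-derived
      cluster id and fail fast-path delivery instead of hitting the BUG_ON
      (variable names approximate the KVM code of that era):
      
        u16 cid = irq->dest_id >> 16;   /* high 16 bits come from ICR2 */
      
        if (cid >= ARRAY_SIZE(map->logical_map)) {
                ret = false;    /* punt to the slow path, don't BUG_ON */
                goto out;
        }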
      
      Reported-by: Lars Bull <larsbull@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Convert vapic synchronization to _cached functions (CVE-2013-6368) · fda4e2e8
      Andy Honig authored
      
      In kvm_lapic_sync_from_vapic and kvm_lapic_sync_to_vapic there is the
      potential to corrupt kernel memory if userspace provides an address
      that is at the end of a page.  This patch converts those functions to
      use kvm_write_guest_cached and kvm_read_guest_cached.  It also checks
      the vapic_address specified by userspace during ioctl processing and
      returns an error to userspace if the address is not a valid GPA.
      
      This is generally not guest-triggerable, because the required write
      is done by firmware that runs before the guest.  Also, it only
      affects AMD processors and older Intel processors that lack the
      FlexPriority feature (unless you disable FlexPriority, of course;
      then newer processors are also affected).
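      
      A sketch of the ioctl-time check described above; the vapic_cache
      field is an assumed name here, and kvm_gfn_to_hva_cache_init() fails
      for addresses that are not valid GPAs:
      
        int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
        {
                if (vapic_addr &&
                    kvm_gfn_to_hva_cache_init(vcpu->kvm,
                                              &vcpu->arch.apic->vapic_cache,
                                              vapic_addr, sizeof(u32)))
                        return -EINVAL; /* reject invalid GPAs up front */
      
                vcpu->arch.apic->vapic_addr = vapic_addr;
                return 0;
        }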
      
      Fixes: b93463aa ('KVM: Accelerated apic support')
      
      Reported-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: x86: Fix potential divide by 0 in lapic (CVE-2013-6367) · b963a22e
      Andy Honig authored
      
      Under guest-controllable circumstances apic_get_tmcct will execute a
      divide by zero and cause a crash.  If the guest CPUID supports TSC
      deadline timers and the guest performs the following sequence of
      requests, the host will crash:
      - Set the mode to periodic.
      - Set the TMICT to 0.
      - Set the mode bits to 11 (neither periodic, nor one-shot, nor TSC
        deadline).
      - Set the TMICT to non-zero.
      Then lapic_timer.period will be 0, but the TMICT will not be.  If the
      guest then reads from the TMCCT, the host will perform a divide by 0.
      
      This patch ensures that if lapic_timer.period is 0, the division does
      not occur.
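      
      A sketch of the guard; the surrounding computation, which reduces the
      remaining time modulo the period, is elided:
      
        static u32 apic_get_tmcct(struct kvm_lapic *apic)
        {
                /* The sequence above leaves period == 0 with TMICT != 0;
                 * bail out before the modulo/divide on the period. */
                if (apic->lapic_timer.period == 0)
                        return 0;
                /* ... remaining time %= lapic_timer.period ... */
        }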
      
      Reported-by: Andrew Honig <ahonig@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Andrew Honig <ahonig@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  4. Nov 17, 2013
    • um/vdso: add .gitignore for a couple of targets · b13a9bfc
      Ramkumar Ramachandra authored
      
      Cc: Richard Weinberger <richard@nod.at>
      Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • arch/um: make it work with defconfig and x86_64 · e40f04d0
      Ramkumar Ramachandra authored
      
      arch/um/defconfig only lists one default configuration, and that applies
      only to the i386 architecture.  Replace it with two minimal
      configuration files generated using `make savedefconfig`:
      
        i386_defconfig and x86_64_defconfig
      
      The build scripts now require two updates:
      
      1. um's Kconfig (arch/x86/um/Kconfig) should specify an ARCH_DEFCONFIG
         section explicitly pointing to these files if the required
         variables are set.  Take care to remove the DEFCONFIG_LIST section
         defined in the included file arch/um/Kconfig.common.
      
      2. um's Makefile (arch/um/Makefile) should set KBUILD_DEFCONFIG
         properly for the top-level Makefile to pick up.  Copy the logic in
         arch/x86/Makefile to properly pick the defconfig file depending on
         the actual architecture, except that we're working with $SUBARCH
         here instead of $ARCH; see the sketch after this list.
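      
      A sketch of the KBUILD_DEFCONFIG selection meant in point 2, keyed on
      $(SUBARCH) rather than $(ARCH); this is illustrative, not the verbatim
      hunk:
      
        # arch/um/Makefile (sketch)
        ifeq ($(SUBARCH),i386)
                KBUILD_DEFCONFIG := i386_defconfig
        else
                KBUILD_DEFCONFIG := $(SUBARCH)_defconfig
        endif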
      
      Now, you can do:
      
        $ ARCH=um make defconfig
        $ ARCH=um make
      
      and successfully build User-Mode Linux on an x86_64 box in default
      configuration.
      
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Jeff Dike <jdike@addtoit.com>
      Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com>
      Signed-off-by: Richard Weinberger <richard@nod.at>
    • um: Rewrite show_stack() · 9d1ee8ce
      Richard Weinberger authored
      
      Currently, stack traces on UML are not very reliable, and both x86
      and x86_64 have their own implementations.
      This patch unifies the two and adds support for outlining unreliable
      function calls.
      
      Signed-off-by: Richard Weinberger <richard@nod.at>
  5. Nov 15, 2013
    • x86: Export 'boot_cpu_physical_apicid' to modules · cc08e04c
      David Rientjes authored
      
      Commit 9ebddac7 "ACPI, x86: Fix extended error log driver to depend on
      CONFIG_X86_LOCAL_APIC" fixed a build error when CONFIG_X86_LOCAL_APIC was not
      selected and !CONFIG_SMP.
      
      However, since CONFIG_ACPI_EXTLOG is tristate, there is a second build error:
      
        ERROR: "boot_cpu_physical_apicid" [drivers/acpi/acpi_extlog.ko] undefined!
      
      The symbol needs to be exported for it to be available.
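      
      The fix itself is essentially the one-line export next to the
      definition (location illustrative; per the note below, it went in as
      the _GPL variant):
      
        unsigned int boot_cpu_physical_apicid __read_mostly = -1U;
        EXPORT_SYMBOL_GPL(boot_cpu_physical_apicid);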
      
      Signed-off-by: David Rientjes <rientjes@google.com>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Cc: Chen Gong <gong.chen@linux.intel.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1311141504080.30112@chino.kir.corp.google.com
      
      
      [ Changed it to a _GPL() export. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS · 0a06ff06
      Christoph Hellwig authored
      
      We've switched over every architecture that supports SMP to it, so
      remove the now-useless config variable.
      
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: dynamically allocate page->ptl if it cannot be embedded to struct page · 49076ec2
      Kirill A. Shutemov authored
      
      If the split page table lock is in use, we embed the lock into the
      struct page of the table's page.  We have to disable the split lock
      if spinlock_t is too big to be embedded, as when DEBUG_SPINLOCK or
      DEBUG_LOCK_ALLOC is enabled.
      
      This patch adds support for dynamic allocation of the split page
      table lock when we can't embed it in struct page.
      
      page->ptl is unsigned long now, and we use it as a spinlock_t if
      sizeof(spinlock_t) <= sizeof(long); otherwise it's a pointer to a
      spinlock_t.
      
      The spinlock_t is allocated in pgtable_page_ctor() for PTE tables and
      in pgtable_pmd_page_ctor() for PMD tables.  All other helpers are
      converted to support the dynamically allocated page->ptl.
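      
      A sketch of the accessor this implies; ALLOC_SPLIT_PTLOCKS is an
      assumed name standing in for the "spinlock_t is too big" test, since
      sizeof() cannot appear in an #if directly:
      
        static inline spinlock_t *ptlock_ptr(struct page *page)
        {
        #if ALLOC_SPLIT_PTLOCKS /* sizeof(spinlock_t) > sizeof(long) */
                return (spinlock_t *)page->ptl;  /* ptl holds a pointer */
        #else
                return (spinlock_t *)&page->ptl; /* lock lives in-place */
        #endif
        }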
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: handle pgtable_page_ctor() fail · cecbd1b5
      Kirill A. Shutemov authored
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: add missed pgtable_pmd_page_ctor/dtor calls for preallocated pmds · 09ef4939
      Kirill A. Shutemov authored
      
      In the split page table lock case, we embed spinlock_t into struct
      page.  For obvious reasons, we don't want to increase the size of
      struct page if spinlock_t is too big, as with DEBUG_SPINLOCK or
      DEBUG_LOCK_ALLOC or on an -rt kernel.  So we disable the split page
      table lock if spinlock_t is too big.
      
      This patchset allows the lock to be allocated dynamically if
      spinlock_t is big.  In that case, page->ptl is used to store a
      pointer to the spinlock instead of the spinlock itself.  It costs an
      additional cache line for the indirect access, but fixes page fault
      scalability for multi-threaded applications.
      
      LOCK_STAT depends on DEBUG_SPINLOCK, so on current kernels enabling
      LOCK_STAT to analyse scalability issues breaks scalability.  ;)
      
      The patchset mostly fixes this.  Results for ./thp_memscale -c 80 -b 512M
      on 4-socket machine:
      
      baseline, no CONFIG_LOCK_STAT:	9.115460703 seconds time elapsed
      baseline, CONFIG_LOCK_STAT=y:	53.890567123 seconds time elapsed
      patched, no CONFIG_LOCK_STAT:	8.852250368 seconds time elapsed
      patched, CONFIG_LOCK_STAT=y:	11.069770759 seconds time elapsed
      
      Patch count is scary, but most of them are trivial.  Overview:
      
       Patches 1-4	A few bug fixes.  No dependencies on other patches.
      		Probably should be applied as soon as possible.
      
       Patch 5	Changes the signature of pgtable_page_ctor().  We will
      		use it for dynamic lock allocation, so it can fail.
      
       Patches 6-8	Add missing constructor/destructor calls on a few
      		archs.  This fixes NR_PAGETABLE accounting and prepares
      		for using the split ptl.
      
       Patches 9-33	Add pgtable_page_ctor() fail handling to all archs.
      
       Patch 34	Finally adds support for the dynamically allocated
      		page->ptl.  Also contains documentation for the split
      		page table lock.
      
      This patch (of 34):
      
      I had missed that we preallocate a few pmds in pgd_alloc() if X86_PAE
      is enabled.  Let's add the missed constructor/destructor calls.
      
      I hadn't noticed it during testing since prep_new_page() clears
      page->mapping and therefore page->ptl, which is effectively equal to
      spin_lock_init(&page->ptl).
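      
      A sketch of the missed constructor call for the PAE-preallocated pmds
      (arch/x86/mm/pgtable.c; the error handling is paraphrased, not the
      verbatim hunk):
      
        static bool preallocate_pmds(pmd_t *pmds[])
        {
                int i;
      
                for (i = 0; i < PREALLOCATED_PMDS; i++) {
                        pmd_t *pmd = (pmd_t *)__get_free_page(PGALLOC_GFP);
      
                        /* the missed constructor: accounts the page and,
                         * with split ptl, initialises/allocates the lock */
                        if (pmd && !pgtable_pmd_page_ctor(virt_to_page(pmd))) {
                                free_page((unsigned long)pmd);
                                pmd = NULL;
                        }
                        if (!pmd)
                                return false;   /* caller frees pmds[0..i) */
                        pmds[i] = pmd;
                }
                return true;
        }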
      
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Richard Kuo <rkuo@codeaurora.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86, mm: enable split page table lock for PMD level · 9491846f
      Kirill A. Shutemov authored
      
      Enable PMD split page table lock for X86_64 and PAE.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: Alex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>