  Oct 18, 2013
    • KVM: ARM: Transparent huge page (THP) support · 9b5fdb97
      Christoffer Dall authored
      
      Support transparent huge pages in KVM/ARM and KVM/ARM64.  The
      transparent_hugepage_adjust function is not very pretty, but this is
      also how it's solved on x86 and seems to be simply an artifact of how
      THPs behave.  This should eventually be shared across architectures
      if possible, but that can always be changed down the road.
      
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
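
      To illustrate the idea, a minimal sketch of the adjustment (the
      helper name and calling convention are hypothetical, not the actual
      arch/arm/kvm/mmu.c code; PageTransCompound, pfn_to_page, PMD_MASK
      and PTRS_PER_PMD are real kernel symbols):

          /* Sketch: if the faulting pfn lies inside a THP, realign the
           * IPA and the pfn so a whole 2MB block can be mapped at
           * Stage-2 instead of 512 individual 4kB pages. */
          static bool adjust_for_thp(phys_addr_t *ipap, unsigned long *pfnp)
          {
                  unsigned long pfn = *pfnp;

                  if (PageTransCompound(pfn_to_page(pfn))) {
                          *ipap &= PMD_MASK;                   /* block-align the IPA */
                          *pfnp = pfn & ~(PTRS_PER_PMD - 1UL); /* and the host pfn */
                          return true;   /* caller installs a Stage-2 PMD */
                  }
                  return false;          /* fall back to 4kB pages */
          }
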
    • KVM: ARM: Support hugetlbfs backed huge pages · ad361f09
      Christoffer Dall authored
      
      Support huge pages in KVM/ARM and KVM/ARM64.  The pud_huge check on
      the unmap path may feel a bit silly, as pud_huge is always defined
      to return false there, but the compiler should be smart enough to
      optimize it away.
      
      Note: This deals only with VMAs marked as huge, which are allocated
      by users through hugetlbfs.  Transparent huge pages can only be
      detected by looking at the underlying pages (or the page tables
      themselves), and this patch so far simply maps those page by page
      in the Stage-2 page tables.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
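
      To illustrate, a sketch of the fault-path choice this enables
      (find_vma, is_vm_hugetlb_page and vma_kernel_pagesize are real
      kernel helpers; map_stage2_pmd/map_stage2_pte are hypothetical
      stand-ins for the actual Stage-2 mapping code):

          /* Sketch only: locking and error handling omitted. */
          struct vm_area_struct *vma = find_vma(current->mm, hva);
          bool hugetlb = vma && is_vm_hugetlb_page(vma);

          if (hugetlb && vma_kernel_pagesize(vma) == PMD_SIZE) {
                  /* Back the guest with a single 2MB Stage-2 PMD entry. */
                  map_stage2_pmd(kvm, fault_ipa & PMD_MASK, pfn);
          } else {
                  /* Not a hugetlbfs VMA: map one 4kB page at a time. */
                  map_stage2_pte(kvm, fault_ipa, pfn);
          }
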
    • KVM: ARM: Update comments for kvm_handle_wfi · 86ed81aa
      Christoffer Dall authored
      
      Update the comments to reflect what is really going on and add the
      TWE bit to the comments in kvm_arm.h.

      Also rename the function to kvm_handle_wfx, as is done on arm64,
      for consistency and uber-correctness.
      
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
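
      For reference, the TWE bit (and its WFI sibling TWI) lives in the
      Hyp Configuration Register; when set, the corresponding guest
      instruction traps to Hyp mode.  The ARMv7-A bit positions, in
      kvm_arm.h style:

          #define HCR_TWE (1 << 14)  /* Trap WFE: guest WFE traps to Hyp */
          #define HCR_TWI (1 << 13)  /* Trap WFI: guest WFI traps to Hyp */
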
    • ARM: KVM: Yield CPU when vcpu executes a WFE · 58d5ec8f
      Marc Zyngier authored
      
      On an (even slightly) oversubscribed system, spinlocks quickly
      become a bottleneck, as some vcpus spin waiting for a lock to be
      released while the vcpu holding the lock may not be running at
      all.
      
      This creates contention, and the observed slowdown is 40x for
      hackbench. No, this isn't a typo.
      
      The solution is to trap blocking WFEs and tell KVM that we're
      now spinning.  This ensures that other vcpus will get a scheduling
      boost, allowing the lock to be released more quickly.  Also, using
      CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT slightly improves the
      performance when the VM is severely overcommitted.
      
      Quick test to estimate the performance: hackbench 1 process 1000
      
      2xA15 host (baseline):	1.843s
      
      2xA15 guest w/o patch:	2.083s
      4xA15 guest w/o patch:	80.212s
      8xA15 guest w/o patch:	Could not be bothered to find out
      
      2xA15 guest w/ patch:	2.102s
      4xA15 guest w/ patch:	3.205s
      8xA15 guest w/ patch:	6.887s
      
      So we go from a 40x degradation to 1.5x in the 2x overcommit case,
      which is vaguely more acceptable.
      
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
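
      The handler shape this implies, as a sketch (modeled on the
      kvm_handle_wfx naming from the commit above; the HSR_WFI_IS_WFE
      syndrome decoding is an assumption, and the real handler lives in
      arch/arm/kvm/handle_exit.c):

          static int handle_wfx_sketch(struct kvm_vcpu *vcpu, struct kvm_run *run)
          {
                  if (kvm_vcpu_get_hsr(vcpu) & HSR_WFI_IS_WFE)
                          /* WFE: likely a spinlock loop, so give other
                           * vcpus a scheduling boost instead of blocking. */
                          kvm_vcpu_on_spin(vcpu);
                  else
                          /* WFI: the guest is idle, block until an
                           * interrupt arrives for it. */
                          kvm_vcpu_block(vcpu);

                  return 1;
          }
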
  Aug 08, 2013
    • arm64: KVM: fix 2-level page tables unmapping · 979acd5e
      Marc Zyngier authored
      
      When using 64kB pages, we only have two levels of page tables,
      meaning that the PGD, PUD and PMD are fused.  In this case, trying
      to refcount PUDs and PMDs independently is a complete disaster, as
      they are the same.
      
      We manage to get it right for the allocation (stage2_set_pte uses
      {pmd,pud}_none), but the unmapping path clears both pud and pmd
      refcounts, which fails spectacularly with 2-level page tables.
      
      The fix is to avoid calling clear_pud_entry when both the pmd and
      pud pages are empty.  To do so, instead of introducing another
      pud_empty function, consolidate both pte_empty and pmd_empty into
      page_empty (the code is actually identical) and use that to also
      test the validity of the pud.
      
      Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
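
      A sketch of the consolidated helper described above (virt_to_page
      and page_count are real kernel APIs; the exact body is inferred
      from the description):

          /* A Stage-2 table page is empty when only the allocation
           * reference remains, i.e. no entries point into it. */
          static bool page_empty(void *ptr)
          {
                  struct page *ptr_page = virt_to_page(ptr);

                  return page_count(ptr_page) == 1;
          }
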
    • ARM: KVM: Fix unaligned unmap_range leak · d3840b26
      Christoffer Dall authored
      
      The unmap_range function did not properly cover the case where the
      start address was not aligned to PMD_SIZE or PUD_SIZE and an entire
      pte table or pmd table was cleared, causing us to leak memory when
      incrementing addr.
      
      The fix is to always move onto the next page table entry boundary
      instead of adding the full size of the VA range covered by the
      corresponding table level entry.
      
      Acked-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
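
      A sketch of the corrected stepping, using the generic pmd_addr_end
      helper (the loop shape is an assumption, not the verbatim
      unmap_range code):

          phys_addr_t next;

          do {
                  /* Advance to the next PMD-entry boundary, capped at
                   * end, instead of doing addr += PMD_SIZE, which can
                   * skip past entries when addr is unaligned. */
                  next = pmd_addr_end(addr, end);
                  /* ... unmap and free entries covering [addr, next) ... */
                  addr = next;
          } while (addr < end);
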
  Aug 06, 2013
    • ARM: KVM: Fix 64-bit coprocessor handling · 240e99cb
      Christoffer Dall authored
      
      The PAR was exported as CRn == 7 and CRm == 0, but in fact the
      primary coprocessor register number is determined by CRm for
      64-bit coprocessor registers, as the user space API was modeled
      after the coprocessor access instructions (see the ARM ARM rev. C,
      B3-1445).
      
      However, just changing the CRn to CRm breaks the sorting check when
      booting the kernel, because the internal kernel logic always treats CRn
      as the primary register number, and it makes the table sorting
      impossible to understand for humans.
      
      Alternatively we could change the logic to always have CRn == CRm,
      but that becomes unclear given the number of ways we look up a
      coprocessor register.  We could also have a separate 64-bit table,
      but that feels somewhat over-engineered.  Instead, keep CRn as the
      in-kernel representation of the primary coprocessor register
      number, and always export the primary number as CRm as per the
      existing user space ABI.
      
      Note: The TTBR registers just magically worked because they happened to
      follow the CRn(0) regs and were considered CRn(0) in the in-kernel
      representation.
      
      Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
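
      A sketch of the export-time translation this describes (macro and
      field names follow the arm KVM uapi headers; the helper name is
      hypothetical):

          static u64 encode_cp15_64bit(const struct coproc_reg *reg)
          {
                  return KVM_REG_ARM | KVM_REG_SIZE_U64 |
                         (15 << KVM_REG_ARM_COPROC_SHIFT) |
                         /* CRn is the in-kernel primary number; user
                          * space sees it in CRm, matching MRRC/MCRR. */
                         (reg->CRn << KVM_REG_ARM_CRM_SHIFT) |
                         (reg->Op1 << KVM_REG_ARM_OPC1_SHIFT);
          }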