  1. Aug 05, 2013
    • perf/x86: Fix intel QPI uncore event definitions · c9601247
      Vince Weaver authored
      
      John McCalpin reports that the "drs_data" and "ncb_data" QPI
      uncore events are missing the "extra bit" and always return zero
      values unless the bit is properly set.
      
      More details from him:
      
       According to the Xeon E5-2600 Product Family Uncore Performance
       Monitoring Guide, Table 2-94, about 1/2 of the QPI Link Layer events
       (including the ones that "perf" calls "drs_data" and "ncb_data") require
       that the "extra bit" be set.
      
       This was confusing for a while -- a note at the bottom of page 94 says
       that the "extra bit" is bit 16 of the control register.
       Unfortunately, Table 2-86 clearly says that bit 16 is reserved and must
       be zero.  Looking around a bit, I found that bit 21 appears to be the
       correct "extra bit", and further investigation shows that "perf" actually
       agrees with me:
      	[root@c560-003.stampede]# cat /sys/bus/event_source/devices/uncore_qpi_0/format/event
      	config:0-7,21
      
       So the command
      	# perf -e "uncore_qpi_0/event=drs_data/"
       Is the same as
      	# perf -e "uncore_qpi_0/event=0x02,umask=0x08/"
       While it should be
      	# perf -e "uncore_qpi_0/event=0x102,umask=0x08/"
      
       I confirmed that this last version gives results that agree with the
       amount of data that I expected the STREAM benchmark to move across the QPI
       link in the second (cross-chip) test of the original script.
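
      To make the "extra bit" arithmetic concrete, here is a minimal sketch of
      how a format string such as "config:0-7,21" can be read: the low bits of
      the event value are scattered, in order, into the listed config bit
      ranges. The helper names parse_format() and pack_field() are
      hypothetical and only illustrate the mapping; they are not perf's code.

```python
def parse_format(spec):
    """Parse a sysfs format string such as 'config:0-7,21' into bit ranges."""
    _, _, bits = spec.partition(":")
    ranges = []
    for part in bits.split(","):
        lo, _, hi = part.partition("-")
        ranges.append((int(lo), int(hi or lo)))
    return ranges


def pack_field(value, ranges):
    """Scatter the low bits of value into the listed config bit ranges, in order."""
    config = 0
    for lo, hi in ranges:
        width = hi - lo + 1
        config |= (value & ((1 << width) - 1)) << lo
        value >>= width
    return config


event_ranges = parse_format("config:0-7,21")
print(hex(pack_field(0x02, event_ranges)))   # extra bit clear
print(hex(pack_field(0x102, event_ranges)))  # bit 8 of the value lands in config bit 21
```

      Under this reading, event=0x102 differs from event=0x02 only in config
      bit 21, which matches the bit identified in the report.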
      
      Reported-by: John McCalpin <mccalpin@tacc.utexas.edu>
      Signed-off-by: Vince Weaver <vincent.weaver@maine.edu>
      Cc: zheng.z.yan@intel.com
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1308021037280.26119@vincent-weaver-1.um.maine.edu
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. Jul 31, 2013
  3. Jul 29, 2013
  4. Jul 24, 2013
  5. Jul 19, 2013
  6. Jul 18, 2013
  7. Jul 17, 2013
    • x86: Make sure IDT is page aligned · 4df05f36
      Kees Cook authored
      
      Since the IDT is referenced from a fixmap, make sure it is page aligned.
      Merge with 32-bit one, since it was already aligned to deal with F00F
      bug. Since bss is cleared before IDT setup, it can live there. This also
      moves the other *_idt_table variables into common locations.
      
      This avoids the risk of the IDT ever being moved in the bss and having
      the mapping be offset, resulting in calling incorrect handlers. In the
      current upstream kernel this is not a manifested bug, but heavily patched
      kernels (such as those using the PaX patch series) did encounter this bug.
      
      The tables other than idt_table technically do not need to be page
      aligned, at least not at the current time, but using a common
      declaration avoids mistakes.  On 64 bits the table is exactly one page
      long, anyway.
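
      The alignment concern can be illustrated with a little arithmetic. This
      is only a sketch: the addresses are made up, and entry_skew() is a
      hypothetical helper that shows how far lookups would be shifted if a
      page-granular mapping were treated as the table base while the table
      itself started partway into the page.

```python
PAGE_SIZE = 4096
IDT_ENTRIES = 256
GATE_SIZE_64 = 16  # bytes per gate descriptor in 64-bit mode


def page_offset(addr):
    """Byte offset of addr inside its page."""
    return addr & (PAGE_SIZE - 1)


def entry_skew(table_addr):
    """Number of whole IDT entries by which a misaligned table is shifted."""
    return page_offset(table_addr) // GATE_SIZE_64


aligned = 0xFFFFFFFF81A00000   # made-up, page-aligned address
shifted = aligned + 0x40       # same table moved 64 bytes into the page

print(IDT_ENTRIES * GATE_SIZE_64 == PAGE_SIZE)  # the 64-bit table fills one page
print(entry_skew(aligned), entry_skew(shifted))
```

      With a non-zero skew, a vector would resolve to a neighbouring
      descriptor, which is the "incorrect handlers" failure described above.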
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: http://lkml.kernel.org/r/20130716183441.GA14232@www.outflux.net
      
      
      Reported-by: PaX Team <pageexec@gmail.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  8. Jul 15, 2013
    • x86, suspend: Handle CPUs which fail to #GP on RDMSR · 5ff560fd
      H. Peter Anvin authored
      
      There are CPUs which have errata causing RDMSR of a nonexistent MSR to
      not fault.  We would then try to WRMSR to restore the value of that
      MSR, causing a crash.  Specifically, some Pentium M variants would
      have this problem trying to save and restore the non-existent EFER,
      causing a crash on resume.
      
      Work around this by making sure we can write back the result at
      suspend time.
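
      The workaround can be sketched as a toy simulation. FakeCpu and
      save_msr_for_resume() are hypothetical names, and the model assumes
      that on an affected CPU the read of a missing MSR silently succeeds
      while the write still faults; it is not the kernel's implementation.

```python
class FakeCpu:
    """Toy CPU: with `erratum` set, reading a missing MSR does not fault."""

    def __init__(self, present, erratum=False):
        self.msrs = dict(present)
        self.erratum = erratum

    def rdmsr_safe(self, msr):
        """Return (ok, value); ok is False when the read faults."""
        if msr in self.msrs:
            return True, self.msrs[msr]
        return (True, 0) if self.erratum else (False, 0)

    def wrmsr_safe(self, msr, value):
        """Return True on success, False when the write faults."""
        if msr not in self.msrs:
            return False
        self.msrs[msr] = value
        return True


def save_msr_for_resume(cpu, msr):
    """Save an MSR only if it can be both read and written back at suspend time."""
    ok, value = cpu.rdmsr_safe(msr)
    if ok and cpu.wrmsr_safe(msr, value):
        return value
    return None  # nothing saved, so resume will not attempt the write


MSR_EFER = 0xC0000080
buggy = FakeCpu({}, erratum=True)
healthy = FakeCpu({MSR_EFER: 0x500})
print(save_msr_for_resume(buggy, MSR_EFER), save_msr_for_resume(healthy, MSR_EFER))
```

      Probing the write at suspend time moves the potential fault to a point
      where it can be handled, instead of crashing on resume.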
      
      Huge thanks to Christian Sünkenberg for finding the offending erratum
      that finally deciphered the mystery.
      
      Reported-and-tested-by: Johan Heinrich <onny@project-insanity.org>
      Debugged-by: Christian Sünkenberg <christian.suenkenberg@student.kit.edu>
      Acked-by: Rafael J. Wysocki <rjw@sisk.pl>
      Link: http://lkml.kernel.org/r/51DDC972.3010005@student.kit.edu
      Cc: <stable@vger.kernel.org> # v3.7+
    • x86: delete __cpuinit usage from all x86 files · 148f9bb8
      Paul Gortmaker authored
      The __cpuinit type of throwaway sections might have made sense
      some time ago when RAM was more constrained, but now the savings
      do not offset the cost and complications.  For example, the fix in
      commit 5e427ec2 ("x86: Fix bit corruption at CPU resume time")
      is a good example of the nasty type of bugs that can be created
      with improper use of the various __init prefixes.
      
      After a discussion on LKML[1] it was decided that cpuinit should go
      the way of devinit and be phased out.  Once all the users are gone,
      we can then finally remove the macros themselves from linux/init.h.
      
      Note that some harmless section mismatch warnings may result, since
      notify_cpu_starting() and cpu_up(), which are arch independent
      (kernel/cpu.c), are flagged as __cpuinit -- so if we remove the
      __cpuinit from arch specific callers, we will also get section
      mismatch warnings.
      As an intermediate step, we intend to turn the linux/init.h cpuinit
      content into no-ops as early as possible, since that will get rid
      of these warnings.  In any case, they are temporary and harmless.
      
      This removes all the arch/x86 uses of the __cpuinit macros from
      all C files.  x86 only had the one __CPUINIT used in assembly files,
      and it wasn't paired off with a .previous or a __FINIT, so we can
      delete it directly w/o any corresponding additional change there.
      
      [1] https://lkml.org/lkml/2013/5/20/589
      
      
      
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: x86@kernel.org
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  9. Jul 12, 2013
  10. Jul 11, 2013
  11. Jul 09, 2013
  12. Jul 05, 2013
  13. Jul 04, 2013
    • consolidate per-arch stack overflow debugging options · d1a1dc0b
      Dave Hansen authored
      Original posting:
      
      	http://lkml.kernel.org/r/20121214184202.F54094D9@kernel.stglabs.ibm.com
      
      
      
      Several architectures have similar stack debugging config options.
      They all pretty much do the same thing, some with slightly
      differing help text.
      
      This patch changes the architectures to instead enable a Kconfig
      boolean, and then use that boolean in the generic Kconfig.debug
      to present the actual menu option.  This removes a bunch of
      duplication and adds consistency across arches.
      
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Reviewed-by: H. Peter Anvin <hpa@zytor.com>
      Reviewed-by: James Hogan <james.hogan@imgtec.com>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com> [for tile]
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • KVM: VMX: mark unusable segment as nonpresent · 03617c18
      Gleb Natapov authored
      
      Some userspaces do not preserve the unusable property. Since a usable
      segment has to be present according to the VMX spec, we can use the
      present property to work around this userspace bug by making an
      unusable segment always nonpresent. vmx_segment_access_rights() already
      marks a nonpresent segment as unusable.
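
      A small model of the round trip may help. It assumes the VMX
      access-rights layout with the present flag in bit 7 and the unusable
      flag in bit 16; read_segment() and access_rights() are hypothetical
      stand-ins for the get and set paths, not the kernel functions.

```python
AR_PRESENT = 1 << 7    # assumed position of the present flag
AR_UNUSABLE = 1 << 16  # assumed position of the unusable flag


def read_segment(ar):
    """Decode access rights; an unusable segment is reported as nonpresent."""
    unusable = bool(ar & AR_UNUSABLE)
    return {"type": ar & 0xF, "unusable": unusable, "present": not unusable}


def access_rights(seg):
    """Encode access rights; a nonpresent segment is written back as unusable."""
    if seg["unusable"] or not seg["present"]:
        return AR_UNUSABLE
    return (seg["type"] & 0xF) | AR_PRESENT


# Userspace reads an unusable segment but forgets the unusable property.
seg = read_segment(AR_UNUSABLE | AR_PRESENT | 0x3)
restored = dict(seg, unusable=False)
print(hex(access_rights(restored)))
```

      Because the present flag survives the round trip, the segment is still
      written back as unusable even though userspace dropped that property.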
      
      Cc: stable@vger.kernel.org # 3.9+
      Reported-by: Stefan Pietsch <stefan.pietsch@lsexperts.de>
      Tested-by: Stefan Pietsch <stefan.pietsch@lsexperts.de>
      Signed-off-by: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • rapidio: add modular build option for the subsystem core · fdf90abc
      Alexandre Bounine authored
      
      Add a configuration option to build RapidIO subsystem core code as a
      loadable kernel module.  Currently this option is available only for
      x86-based platforms, with the additional patch for PowerPC planned to be
      provided later.
      
      This patch replaces kernel command line parameter "riohdid=" with its
      module-specific analog "rapidio.hdid=".
      
      Signed-off-by: Alexandre Bounine <alexandre.bounine@idt.com>
      Cc: Matt Porter <mporter@kernel.crashing.org>
      Cc: Li Yang <leoli@freescale.com>
      Cc: Kumar Gala <galak@kernel.crashing.org>
      Cc: Andre van Herk <andre.van.herk@Prodrive.nl>
      Cc: Micha Nelissen <micha.nelissen@Prodrive.nl>
      Cc: Stef van Os <stef.van.os@Prodrive.nl>
      Cc: Jean Delvare <jdelvare@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: kill TIF_DEBUG · 37f07655
      Oleg Nesterov authored
      
      Because it is not used.
      
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/x86: prepare for removing num_physpages and simplify mem_init() · 46a84132
      Jiang Liu authored
      
      Prepare for removing num_physpages and simplify mem_init().
      
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: concentrate modification of totalram_pages into the mm core · 0c988534
      Jiang Liu authored
      
      Concentrate code to modify totalram_pages into the mm core, so the arch
      memory initialization code doesn't need to take care of it.  With these
      changes applied, only the following functions from the mm core modify the
      global variable totalram_pages: free_bootmem_late(), free_all_bootmem(),
      free_all_bootmem_node(), adjust_managed_page_count().
      
      With this patch applied, it will be much easier for us to keep
      totalram_pages and zone->managed_pages consistent.
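
      The benefit of a single update point can be shown with a toy model. The
      classes below are hypothetical; only the name
      adjust_managed_page_count() comes from the commit message, and its
      signature here is an assumption made for illustration.

```python
import threading


class Zone:
    def __init__(self, name):
        self.name = name
        self.managed_pages = 0


class PageAccounting:
    """Single place where both counters change, so they cannot drift apart."""

    def __init__(self):
        self.totalram_pages = 0
        self._lock = threading.Lock()  # stand-in for the kernel's own locking

    def adjust_managed_page_count(self, zone, count):
        with self._lock:
            zone.managed_pages += count
            self.totalram_pages += count


acct = PageAccounting()
normal, highmem = Zone("Normal"), Zone("HighMem")
acct.adjust_managed_page_count(normal, 1000)
acct.adjust_managed_page_count(highmem, 250)
acct.adjust_managed_page_count(normal, -10)
print(acct.totalram_pages, normal.managed_pages + highmem.managed_pages)
```

      When every change goes through one helper, the global total always
      equals the sum of the per-zone counts.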
      
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Acked-by: David Howells <dhowells@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: make __free_pages_bootmem() only available at boot time · 170a5a7e
      Jiang Liu authored
      
      In order to simplify management of totalram_pages and
      zone->managed_pages, make __free_pages_bootmem() only available at boot
      time.  With this change applied, __free_pages_bootmem() will only be
      used by bootmem.c and nobootmem.c at boot time, so mark it as __init.
      Other callers of __free_pages_bootmem() have been converted to use
      free_reserved_page(), which handles totalram_pages and
      zone->managed_pages in a safer way.
      
      This patch also fixes a bug in free_pagetable() for x86_64, which should
      increase zone->managed_pages instead of zone->present_pages when freeing
      reserved pages.
      
      And now we have managed_pages_count_lock to protect totalram_pages and
      zone->managed_pages, so remove the redundant ppb_lock lock in
      put_page_bootmem().  This greatly simplifies the locking rules.
      
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: accurately calculate zone->managed_pages for highmem zones · 7b4b2a0d
      Jiang Liu authored
      
      Commit "mm: introduce new field 'managed_pages' to struct zone" assumes
      that all highmem pages will be freed into the buddy system by function
      mem_init().  But that's not always true: some architectures may reserve
      some highmem pages during boot.  For example PPC may allocate highmem
      pages for gigantic HugeTLB pages, and several architectures have code to
      check PageReserved flag to exclude highmem pages allocated during boot
      when freeing highmem pages into the buddy system.
      
      So treat highmem pages in the same way as normal pages, that is to:
      1) reset zone->managed_pages to zero in mem_init().
      2) recalculate managed_pages when freeing pages into the buddy system.
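
      The two steps can be sketched as follows. The Zone class, the page
      dictionaries and free_highmem_pages() are hypothetical and only model
      the counting rule; they are not the kernel data structures.

```python
class Zone:
    def __init__(self, present_pages):
        self.present_pages = present_pages
        self.managed_pages = present_pages  # initial estimate: everything is managed


def free_highmem_pages(zone, pages):
    """Reset the count, then count only pages actually handed to the allocator."""
    zone.managed_pages = 0              # step 1: reset in mem_init()
    for page in pages:
        if page["reserved"]:
            continue                    # reserved during boot, never freed
        zone.managed_pages += 1         # step 2: recalculate while freeing
    return zone.managed_pages


pages = [{"pfn": pfn, "reserved": pfn in (2, 3)} for pfn in range(5)]
zone = Zone(present_pages=len(pages))
print(free_highmem_pages(zone, pages), zone.present_pages)
```

      Counting at free time keeps managed_pages accurate even when some
      highmem pages were reserved before mem_init() ran.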
      
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/x86: use free_reserved_area() to simplify code · c88442ec
      Jiang Liu authored
      
      Use the common helper function free_reserved_area() to simplify code.
      
      Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: "Michael S. Tsirkin" <mst@redhat.com>
      Cc: <sworddragon2@aol.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: soft-dirty bits for user memory changes tracking · 0f8975ec
      Pavel Emelyanov authored
      
      The soft-dirty is a bit on a PTE which helps to track which pages a task
      writes to.  In order to do this tracking one should
      
        1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs")
        2. Wait some time.
        3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries)
      
      To do this tracking, the writable bit is cleared from PTEs when the
      soft-dirty bit is.  Thus, after this, when the task tries to modify a
      page at some virtual address the #PF occurs and the kernel sets the
      soft-dirty bit on the respective PTE.
      
      Note that although all of the task's address space is marked as r/o after
      the soft-dirty bits are cleared, the #PF-s that occur after that are
      processed fast.  This is because the pages are still mapped to physical
      memory, and thus all the kernel does is find this fact out and put back
      the writable, dirty and soft-dirty bits on the PTE.
      
      Another thing to note is that when mremap moves PTEs they are marked
      with soft-dirty as well, since from the user perspective mremap modifies
      the virtual memory at mremap's new address.
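
      The three tracking steps can be sketched from userspace as follows.
      This is an illustration under assumptions: it reads /proc/PID/pagemap
      (the message above names pagemap2) with one 64-bit entry per page in
      native byte order, and the helper names are hypothetical. Only the
      bit test is pure; the other helpers need a Linux kernel with
      soft-dirty support.

```python
import os
import struct

SOFT_DIRTY_BIT = 55  # bit position named in the commit message
ENTRY_SIZE = 8       # assumed: one 64-bit pagemap entry per virtual page


def is_soft_dirty(entry):
    """Test the soft-dirty bit of a 64-bit pagemap entry."""
    return bool((entry >> SOFT_DIRTY_BIT) & 1)


def clear_soft_dirty(pid="self"):
    """Step 1: clear soft-dirty bits for the task."""
    with open(f"/proc/{pid}/clear_refs", "w") as f:
        f.write("4")


def read_pagemap_entry(vaddr, pid="self"):
    """Step 3: fetch the pagemap entry covering virtual address vaddr."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    with open(f"/proc/{pid}/pagemap", "rb") as f:
        f.seek((vaddr // page_size) * ENTRY_SIZE)
        data = f.read(ENTRY_SIZE)
    return struct.unpack("=Q", data)[0]


print(is_soft_dirty(1 << SOFT_DIRTY_BIT), is_soft_dirty(0))
```

      A tracker would call clear_soft_dirty(), let the task run for a while,
      and then apply is_soft_dirty() to the entry of each page of interest.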
      
      Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
      Cc: Matt Mackall <mpm@selenic.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  14. Jul 02, 2013
    • x86/tracing: Add irq_enter/exit() in smp_trace_reschedule_interrupt() · 4787c368
      Seiji Aguchi authored
      
      Reschedule vector tracepoints may be called in the cpu idle state.
      This causes the lockdep warning shown below.
      
      The tracepoint requires rcu but for accuracy it also
      requires irq_enter() (tracepoints record the irq context), thus,
      the tracepoint interrupt handler should be calling irq_enter()
      and not rcu_irq_enter() (irq_enter() calls rcu_irq_enter()).
      
      So, add irq_enter/exit() to smp_trace_reschedule_interrupt()
      with common pre/post processing functions, smp_entering_irq()
      and exiting_irq() (exiting_irq() calls just irq_exit()
       in arch/x86/include/asm/apic.h),
      because these can be shared among reschedule, call_function,
      and call_function_single vectors.
      
      [   50.720557] Testing event reschedule_exit:
      [   50.721349]
      [   50.721502] ===============================
      [   50.721835] [ INFO: suspicious RCU usage. ]
      [   50.722169] 3.10.0-rc6-00004-gcf910e8 #190 Not tainted
      [   50.722582] -------------------------------
      [   50.722915] /c/kernel-tests/src/linux/arch/x86/include/asm/trace/irq_vectors.h:50 suspicious rcu_dereference_check() usage!
      [   50.723770]
      [   50.723770] other info that might help us debug this:
      [   50.723770]
      [   50.724385]
      [   50.724385] RCU used illegally from idle CPU!
      [   50.724385] rcu_scheduler_active = 1, debug_locks = 0
      [   50.725232] RCU used illegally from extended quiescent state!
      [   50.725690] no locks held by swapper/0/0.
      [   50.726010]
      [   50.726010] stack backtrace:
      [...]
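
      The ordering described above, with irq_enter() before the first
      tracepoint and irq_exit() after the last one, can be modelled with a toy
      CPU. The Cpu class and traced_reschedule_interrupt() are hypothetical;
      the model only assumes that a tracepoint is legal while RCU is watching
      the CPU.

```python
class Cpu:
    def __init__(self):
        self.rcu_watching = False  # idle CPU: extended quiescent state
        self.log = []

    def irq_enter(self):
        self.rcu_watching = True   # stands in for irq_enter() calling rcu_irq_enter()

    def irq_exit(self):
        self.rcu_watching = False

    def trace(self, event):
        if not self.rcu_watching:
            raise RuntimeError("RCU used illegally from idle CPU")
        self.log.append(event)


def traced_reschedule_interrupt(cpu, handler):
    cpu.irq_enter()                 # must come before the first tracepoint
    cpu.trace("reschedule_entry")
    handler()
    cpu.trace("reschedule_exit")
    cpu.irq_exit()                  # only after the last tracepoint


cpu = Cpu()
traced_reschedule_interrupt(cpu, lambda: None)
print(cpu.log)
```

      Calling cpu.trace() on a fresh Cpu without irq_enter() raises, which
      mirrors the warning in the log above.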
      
      Signed-off-by: Seiji Aguchi <seiji.aguchi@hds.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/51CDCFA3.9080101@hds.com
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  15. Jun 29, 2013
  16. Jun 28, 2013
    • x86: xen: Sync the CMOS RTC as well as the Xen wallclock · 47433b8c
      David Vrabel authored
      
      Adjustments to Xen's persistent clock via update_persistent_clock()
      don't actually persist, as the Xen wallclock is a software only clock
      and modifications to it do not modify the underlying CMOS RTC.
      
      The x86_platform.set_wallclock hook is there to keep the hardware RTC
      synchronized. On a guest this is pointless.
      
      On Dom0 we can use the native implementation which actually updates the
      hardware RTC, but we still need to keep the software emulation of RTC
      for the guests up to date. The subscription to the pvclock_notifier
      allows us to emulate this easily. The notifier is called at every tick
      and when the clock was set.
      
      Right now we only use that notifier when the clock was set, but due to
      the fact that it is called periodically from the timekeeping update
      code, we can utilize it to emulate the NTP driven drift compensation
      of update_persistent_clock() for the Xen wall (software) clock.
      
      Add an 11-minute periodic update to the pvclock_gtod notifier callback
      to achieve that. The static variable 'next' which maintains that
      11-minute update cycle is protected by the core code serialization so
      there is no need to add a Xen specific serialization mechanism.
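
      The update rule can be sketched as a small state machine. WallclockSync
      and its notify() method are hypothetical names; the sketch assumes the
      callback receives the current time in seconds and a flag saying
      whether the clock was set.

```python
SYNC_PERIOD = 11 * 60  # seconds between periodic wallclock updates


class WallclockSync:
    def __init__(self):
        self.next_sync = 0   # plays the role of the static variable 'next'
        self.updates = []    # times at which the (expensive) update was issued

    def notify(self, now, was_set):
        """Called on every tick; update only if the clock was set or the period elapsed."""
        if not was_set and now < self.next_sync:
            return False
        self.updates.append(now)
        self.next_sync = now + SYNC_PERIOD
        return True


sync = WallclockSync()
results = [
    sync.notify(100, False),   # first call: period already elapsed
    sync.notify(101, False),   # ordinary tick: skipped
    sync.notify(200, True),    # clock was set: update immediately
    sync.notify(859, False),   # 659 s later: still skipped
    sync.notify(860, False),   # 660 s later: periodic update
]
print(results, sync.updates)
```

      Most ticks return early, so the costly update happens only on a clock
      set or once per period.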
      
      [ tglx: Massaged changelog and added a few comments ]
      
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-6-git-send-email-david.vrabel@citrix.com
      
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • x86: xen: Sync the wallclock when the system time is set · 5584880e
      David Vrabel authored
      
      Currently the Xen wallclock is only updated every 11 minutes if NTP is
      synchronized to its clock source (using the sync_cmos_clock() work).
      If a guest is started before NTP is synchronized it may see an
      incorrect wallclock time.
      
      Use the pvclock_gtod notifier chain to receive a notification when the
      system time has changed and update the wallclock to match.
      
      This chain is called on every timer tick and we want to avoid an extra
      (expensive) hypercall on every tick.  Because dom0 has historically
      never provided a very accurate wallclock and guests do not expect one,
      we can do this simply: the wallclock is only updated if the clock was
      set.
      
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/1372329348-20841-5-git-send-email-david.vrabel@citrix.com
      
      
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
    • xen/time: remove blocked time accounting from xen "clockchip" · 0b0c002c
      Laszlo Ersek authored
      ... because the "clock_event_device framework" already accounts for idle
      time through the "event_handler" function pointer in
      xen_timer_interrupt().
      
      The patch is intended as the completion of [1]. It should fix the double
      idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to
      stolen time accounting (the removed code seems to be isolated).
      
      The approach may be completely misguided.
      
      [1] https://lkml.org/lkml/2011/10/6/10
      [2] http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01068.html
      
      
      
      John took the time to retest this patch on top of v3.10 and reported:
      "idle time is correctly incremented for pv and hvm for the normal
      case, nohz=off and nohz=idle." so let's put this patch in.
      
      CC: stable@vger.kernel.org
      Signed-off-by: Laszlo Ersek <lersek@redhat.com>
      Signed-off-by: John Haxby <john.haxby@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>