- Feb 14, 2014
Matt Fleming authored
Madper reported seeing the following crash:

  BUG: unable to handle kernel paging request at ffffffffff340003
  IP: [<ffffffff81d85ba4>] efi_bgrt_init+0x9d/0x133
  Call Trace:
   [<ffffffff81d8525d>] efi_late_init+0x9/0xb
   [<ffffffff81d68f59>] start_kernel+0x436/0x450
   [<ffffffff81d6892c>] ? repair_env_string+0x5c/0x5c
   [<ffffffff81d68120>] ? early_idt_handlers+0x120/0x120
   [<ffffffff81d685de>] x86_64_start_reservations+0x2a/0x2c
   [<ffffffff81d6871e>] x86_64_start_kernel+0x13e/0x14d

This is caused because the layout of the ACPI BGRT header on this system doesn't match the definition from the ACPI spec, and so we get a bogus physical address when dereferencing ->image_address in efi_bgrt_init(). Luckily the status field in the BGRT header clearly marks it as invalid, so we can check that field and skip BGRT initialisation.

Reported-by: Madper Xie <cxie@redhat.com>
Suggested-by: Toshi Kani <toshi.kani@hp.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
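A minimal sketch of the kind of status check described above; the struct layout and the meaning of the status bit are simplified assumptions, not the kernel's actual ACPI definitions:

#include <stdbool.h>
#include <stdint.h>

/* simplified BGRT layout; the real table also carries a full ACPI
 * header and image offset fields */
struct bgrt_table {
    uint16_t version;        /* must be 1 per the ACPI spec */
    uint8_t  status;         /* bit 0: image valid/displayed (assumption) */
    uint8_t  image_type;
    uint64_t image_address;  /* physical address of the boot graphic */
};

/* only dereference ->image_address if the firmware marked the entry valid */
static bool bgrt_looks_valid(const struct bgrt_table *tab)
{
    if (tab->version != 1)
        return false;
    if (!(tab->status & 0x1))
        return false;
    return tab->image_address != 0;
}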
-
Borislav Petkov authored
We do not enable the new efi memmap on 32-bit and thus we need to run runtime_code_page_mkexec() unconditionally there. Fix that.

Reported-and-tested-by: Lejun Zhu <lejun.zhu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
-
- Feb 06, 2014
Tang Chen authored
The following path will cause an array out-of-bounds access. memblock_add_region() will always set nid in memblock.reserved to MAX_NUMNODES. In numa_register_memblks(), after we set all nids to correct values in memblock.reserved, we called setup_node_data(), and used memblock_alloc_nid() to allocate memory, with nid set to MAX_NUMNODES. The nodemask_t type can be seen as a bit array, and the valid index range is 0 ~ MAX_NUMNODES-1. After that, when we call node_set() in numa_clear_kernel_node_hotplug(), the nodemask_t got an index of value MAX_NUMNODES, which is out of [0 ~ MAX_NUMNODES-1]. See below:

  numa_init()
   |---> numa_register_memblks()
   |      |---> memblock_set_node(memory)    set correct nid in memblock.memory
   |      |---> memblock_set_node(reserved)  set correct nid in memblock.reserved
   |      |......
   |      |---> setup_node_data()
   |             |---> memblock_alloc_nid()  here, nid is set to MAX_NUMNODES (1024)
   |......
   |---> numa_clear_kernel_node_hotplug()
          |---> node_set()                   here, we have an index 1024, and overflowed

This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix this problem.

Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Dave Jones <davej@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
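A standalone illustration (not kernel code) of why the index overflows: a bitmap sized for MAX_NUMNODES bits has valid indices 0 .. MAX_NUMNODES-1, so setting bit MAX_NUMNODES writes past the end of the backing array. The types below are simplified stand-ins for the kernel's nodemask helpers:

#include <limits.h>
#include <stdio.h>

#define MAX_NUMNODES 1024
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct nodemask {
    unsigned long bits[BITS_TO_LONGS(MAX_NUMNODES)]; /* 16 longs on 64-bit */
};

static void node_set(int node, struct nodemask *mask)
{
    /* node == MAX_NUMNODES indexes bits[16], one past the array end */
    mask->bits[node / BITS_PER_LONG] |= 1UL << (node % BITS_PER_LONG);
}

int main(void)
{
    struct nodemask mask = { { 0 } };

    node_set(MAX_NUMNODES - 1, &mask);  /* fine: last valid bit */
    node_set(MAX_NUMNODES, &mask);      /* out of bounds, undefined behaviour */
    printf("backing array has %zu words\n",
           sizeof(mask.bits) / sizeof(mask.bits[0]));
    return 0;
}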
-
Tang Chen authored
On-stack variable numa_kernel_nodes in numa_clear_kernel_node_hotplug() was not initialized. So we need to initialize it.

[akpm@linux-foundation.org: use NODE_MASK_NONE, per David]
Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
Tested-by: Gu Zheng <guz.fnst@cn.fujitsu.com>
Reported-by: Dave Jones <davej@redhat.com>
Reported-by: David Rientjes <rientjes@google.com>
Tested-by: Dave Jones <davej@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Borislav Petkov authored
For additional coverage, BorisO and friends unknowingly swapped AMD microcode with Intel microcode blobs in order to see what happens. What did happen on 32-bit was:

  [ 5.722656] BUG: unable to handle kernel paging request at be3a6008
  [ 5.722693] IP: [<c106d6b4>] load_microcode_amd+0x24/0x3f0
  [ 5.722716] *pdpt = 0000000000000000 *pde = 0000000000000000

because there was a valid initrd there but without valid microcode in it, and the container check happened *after* the relocated ramdisk handling on 32-bit, which was clearly wrong. While at it, take care of the ramdisk relocation on both 32- and 64-bit, as it is done on both. Also, comment what we're doing because this code is a bit tricky.

Reported-and-tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Link: http://lkml.kernel.org/r/1391460104-7261-1-git-send-email-bp@alien8.de
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
Matt Fleming authored
CONFIG_X86_32 doesn't map the boot services regions into the EFI memory map (see commit 70087011 ("x86, efi: Don't map Boot Services on i386")), and so efi_lookup_mapped_addr() will fail to return a valid address. Executing the ioremap() path in efi_bgrt_init() causes the following warning on x86-32 because we're trying to ioremap() RAM:

  WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x2ad/0x2c0()
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-0.rc5.git0.1.2.fc21.i686 #1
  Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
   00000000 00000000 c0c0df08 c09a5196 00000000 c0c0df38 c0448c1e c0b41310
   00000000 00000000 c0b37bc1 00000066 c043bbfd c043bbfd 00e7dfe0 00073eff
   00073eff c0c0df48 c0448ce2 00000009 00000000 c0c0df9c c043bbfd 00078d88
  Call Trace:
   [<c09a5196>] dump_stack+0x41/0x52
   [<c0448c1e>] warn_slowpath_common+0x7e/0xa0
   [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
   [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
   [<c0448ce2>] warn_slowpath_null+0x22/0x30
   [<c043bbfd>] __ioremap_caller+0x2ad/0x2c0
   [<c0718f92>] ? acpi_tb_verify_table+0x1c/0x43
   [<c0719c78>] ? acpi_get_table_with_size+0x63/0xb5
   [<c087cd5e>] ? efi_lookup_mapped_addr+0xe/0xf0
   [<c043bc2b>] ioremap_nocache+0x1b/0x20
   [<c0cb01c8>] ? efi_bgrt_init+0x83/0x10c
   [<c0cb01c8>] efi_bgrt_init+0x83/0x10c
   [<c0cafd82>] efi_late_init+0x8/0xa
   [<c0c9bab2>] start_kernel+0x3ae/0x3c3
   [<c0c9b53b>] ? repair_env_string+0x51/0x51
   [<c0c9b378>] i386_start_kernel+0x12e/0x131

Switch to using early_memremap(), which won't trigger this warning, and has the added benefit of more accurately conveying what we're trying to do - map a chunk of memory. This patch addresses the following bug report:

  https://bugzilla.kernel.org/show_bug.cgi?id=67911

Reported-by: Adam Williamson <awilliam@redhat.com>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
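A rough sketch of the pattern the commit describes, mapping ordinary RAM with early_memremap() instead of ioremap(); the helper name copy_boot_chunk, the header locations and the early_memunmap() counterpart are assumptions that vary by kernel version:

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/io.h>       /* early_memremap() declarations vary by version */
#include <linux/string.h>

/* map a physical range during early boot, copy out of it, then unmap;
 * early_memunmap() is assumed here (older kernels used early_iounmap()) */
static int __init copy_boot_chunk(phys_addr_t phys, void *dst, size_t len)
{
    void *src;

    src = early_memremap(phys, len);    /* plain RAM, so not ioremap() */
    if (!src)
        return -ENOMEM;

    memcpy(dst, src, len);
    early_memunmap(src, len);
    return 0;
}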
-
- Feb 05, 2014
Ingo Molnar authored
It can take some time to validate the image, so make sure {allyes|allmod}config doesn't enable it. I'd say randconfig will cover it often enough, and the failure is also borderline build-coverage related: you cannot really make the decoder test fail via source level changes, only with changes in the build environment, so I agree with Andi that we can disable this one too.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Suggested-and-acked-by: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Feb 03, 2014
Mukesh Rathor authored
During bootup these CR4 flags are set in 'probe_page_size_mask'. But for AP processors they are not set, as we do not use 'secondary_startup_64', which the baremetal kernels use. Instead, do it in this function, which we use in Xen PVH during our startup for AP processors. As such, fix it up to make sure we have those flags set.

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Konrad Rzeszutek Wilk authored
This reverts commit 08ece5bb, as it breaks ARM builds and needs more attention on the ARM side.

Acked-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
- Feb 02, 2014
Petr Tesarik authored
With DISCONTIGMEM, the mapping between a pfn and its owning node is initialized using data provided by the BIOS. However, the initialization may fail if the extents are not aligned to the section boundary (64M). The symptom of this bug is an early boot failure in pfn_to_page(), as it tries to access NODE_DATA(__nid) using an index from an uninitialized element of the physnode_map[] array. While the bug is always present, it is more likely to be hit in kdump kernels on large machines, because:

1. The memory map for a kdump kernel is specified as exactmap, and exactmap is more likely to be unaligned.
2. Large reservations are more likely to span across a 64M boundary.

[ hpa: fixed incorrect use of "pfn" instead of "start" ]

Signed-off-by: Petr Tesarik <ptesarik@suse.cz>
Link: http://lkml.kernel.org/r/20140201133019.32e56f86@hananiah.suse.cz
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
-
- Jan 31, 2014
Dave Jones authored
Passing a freed 'pages' to free_xenballooned_pages will end badly on kernels with slub debug enabled. This looks out of place between the rc assign and the check, but we do want to kfree pages regardless of which path we take.

Signed-off-by: Dave Jones <davej@fedoraproject.org>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
-
Zoltan Kiss authored
The grant mapping API does m2p_override unnecessarily: only gntdev needs it; for blkback and future netback patches it just causes lock contention, as those pages never go to userspace. Therefore this series does the following:

- the original functions were renamed to __gnttab_[un]map_refs, with a new parameter m2p_override
- based on m2p_override they either follow the original behaviour, or just set the private flag and call set_phys_to_machine
- gnttab_[un]map_refs are now wrappers that call __gnttab_[un]map_refs with m2p_override false
- a new function gnttab_[un]map_refs_userspace provides the old behaviour

It also removes a stray space from page.h and changes ret to 0 if XENFEAT_auto_translated_physmap, as that is the only possible return value there.

v2:
- move the storing of the old mfn in page->index to gnttab_map_refs
- move the function header update to a separate patch
v3:
- a new approach to retain the old behaviour where it is needed
- squash the patches into one
v4:
- move out the common bits from m2p* functions, and pass pfn/mfn as parameter
- clear page->private before doing anything with the page, so m2p_find_override won't race with this
v5:
- change return value handling in __gnttab_[un]map_refs
- remove a stray space in page.h
- add detail on why ret = 0 now at some places
v6:
- don't pass pfn to m2p* functions, just get it locally

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Suggested-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
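A simplified sketch of the wrapper split described above; the real gnttab functions take arrays of map operations and pages plus a count, which are reduced to an opaque placeholder type here, so treat the signatures as assumptions:

#include <linux/types.h>

struct gnttab_map_batch;    /* stand-in for the real map_ops/pages/count args */

static int __gnttab_map_refs(struct gnttab_map_batch *batch, bool m2p_override)
{
    /* ... common mapping work done for every caller ... */
    if (m2p_override) {
        /* extra m2p_override bookkeeping, only needed when the mapped
         * pages can be handed to userspace (gntdev) */
    }
    return 0;
}

/* kernel-only users (blkback, netback): skip the m2p_override overhead */
int gnttab_map_refs(struct gnttab_map_batch *batch)
{
    return __gnttab_map_refs(batch, false);
}

/* userspace-facing users (gntdev): keep the old behaviour */
int gnttab_map_refs_userspace(struct gnttab_map_batch *batch)
{
    return __gnttab_map_refs(batch, true);
}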
-
Andrey Vagin authored
The SOFT_DIRTY bit shows that the content of memory was changed after a defined point in the past. mprotect() doesn't change the content of memory, so it must not change the SOFT_DIRTY bit. This bug causes a malfunction: on the first iteration all pages are dumped. On other iterations only pages with the SOFT_DIRTY bit are dumped. So if the SOFT_DIRTY bit is cleared from a page by mistake, the page is not dumped and its content will be restored incorrectly. This patch does nothing with _PAGE_SWP_SOFT_DIRTY, because pte_modify() is called only for present pages. Fixes commit 0f8975ec ("mm: soft-dirty bits for user memory changes tracking").

Signed-off-by: Andrey Vagin <avagin@openvz.org>
Acked-by: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Wen Congyang <wency@cn.fujitsu.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
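A simplified model of how a pte_modify()-style helper preserves selected bits across a protection change; the bit positions and mask name below are made up for illustration, only the preserve-then-merge logic matters. Keeping the soft-dirty bit in the preservation mask is what stops mprotect() from clearing it:

#include <stdint.h>

#define PAGE_PRESENT     (1u << 0)   /* hypothetical bit layout */
#define PAGE_RW          (1u << 1)
#define PAGE_DIRTY       (1u << 6)
#define PAGE_SOFT_DIRTY  (1u << 11)

/* bits carried over from the old PTE when only the protection changes */
#define PAGE_CHG_MASK    (PAGE_PRESENT | PAGE_DIRTY | PAGE_SOFT_DIRTY)

static uint64_t pte_modify(uint64_t pte, uint64_t newprot)
{
    /* keep the preserved bits, take everything else from newprot */
    return (pte & PAGE_CHG_MASK) | (newprot & ~PAGE_CHG_MASK);
}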
-
Prarit Bhargava authored
Further discussion here: http://marc.info/?l=linux-kernel&m=139073901101034&w=2

kbuild, 0day kernel build service, outputs the warning:

  arch/x86/kernel/irq.c:333:1: warning: the frame size of 2056 bytes is larger than 2048 bytes [-Wframe-larger-than=]

because check_irq_vectors_for_cpu_disable() allocates two cpumasks on the stack. Fix this by moving the two cpumasks to a global file context.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: David Rientjes <rientjes@google.com>
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1390915331-27375-1-git-send-email-prarit@redhat.com
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Janet Morgan <janet.morgan@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ruiv Wang <ruiv.wang@gmail.com>
Cc: Gong Chen <gong.chen@linux.intel.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
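A back-of-the-envelope check of where the 2056-byte frame comes from, assuming a distro-style NR_CPUS of 8192 (the variable names are illustrative, not the function's): each struct cpumask is a bitmap of NR_CPUS bits, i.e. 1024 bytes, and two of them on the stack already exceed the default 2048-byte frame warning limit.

#include <stdio.h>

#define NR_CPUS       8192                      /* assumption, config dependent */
#define BITS_PER_LONG (8 * sizeof(unsigned long))

struct cpumask { unsigned long bits[NR_CPUS / BITS_PER_LONG]; };

int main(void)
{
    struct cpumask mask_a, mask_b;              /* what used to live on the stack */

    printf("sizeof(struct cpumask) = %zu bytes\n", sizeof(struct cpumask));
    printf("two on the stack       = %zu bytes\n",
           sizeof(mask_a) + sizeof(mask_b));
    return 0;
}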
-
- Jan 30, 2014
David Woodhouse authored
Both clang 3.5 and GCC 4.9 will support this (as of r199754 and r207196 respectively). Both have been tested to produce booting kernels when the 16-bit code is built with -m16. (Modulo LLVM PR3997, at least.)

[ hpa: folded test for -m16 into M16_CFLAGS ]

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1390997807.20153.133.camel@i7.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
David Woodhouse authored
Commit dd78b973 ("x86, boot: Move CPU flags out of cpucheck") introduced ambiguous inline asm in the has_eflag() function. In 16-bit mode we want the instruction to be 'pushfl', but we just say 'pushf' and hope the compiler does what we wanted. When building with 'clang -m16', it won't, because clang doesn't use the horrid '.code16gcc' hack that even 'gcc -m16' uses internally. Say what we mean and don't make the compiler make assumptions.

[ hpa: ideally we would be able to use the gcc %zN construct here, but that is broken for 64-bit integers in gcc < 4.5. The code with plain "pushf/popf" is fine for 32- or 64-bit mode, but not for 16-bit mode; in 16-bit mode those are 16-bit instructions in .code16 mode, and 32-bit instructions in .code16gcc mode. ]

Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
Link: http://lkml.kernel.org/r/1391079628.26079.82.camel@shinybook.infradead.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
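An illustrative, 32-bit-only sketch of the explicitly sized EFLAGS toggle test; this is not the kernel's actual has_eflag(), it only shows 'pushfl'/'popfl' being spelled out instead of the ambiguous 'pushf':

static int has_eflag32(unsigned long mask)
{
    unsigned long f0, f1;

    asm volatile("pushfl\n\t"           /* save EFLAGS */
                 "pushfl\n\t"           /* get a working copy */
                 "popl %0\n\t"          /* f0 = original EFLAGS */
                 "movl %0, %1\n\t"
                 "xorl %2, %1\n\t"      /* toggle the bit under test */
                 "pushl %1\n\t"
                 "popfl\n\t"            /* try to write it back */
                 "pushfl\n\t"
                 "popl %1\n\t"          /* f1 = EFLAGS after the write */
                 "popfl"                /* restore the original EFLAGS */
                 : "=&r" (f0), "=&r" (f1)
                 : "ri" (mask));

    return !!((f0 ^ f1) & mask);        /* bit toggled => flag is supported */
}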
-
Andi Kleen authored
LTO requires consistent types of symbols over all files. So "nmi" cannot be declared as a char [] here; we need to use the correct function type.

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-8-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Andi Kleen authored
These functions are called from inline assembler stubs, thus need to be global and visible.

Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-7-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Andi Kleen authored
LTO in gcc 4.6/4.7 has trouble with global register variables. They were used to read the stack pointer. Use a simple inline assembler statement with a mov instead. This also helps LLVM/clang, which does not support global register variables.

[ hpa: Ideally this should become a builtin in both gcc and clang. ]

v2: More general asm constraint. Fix description (Jan Beulich)

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-6-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
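A sketch of the two approaches; the exact kernel macro names differ, so treat this as illustrative only (64-bit shown, use %esp on 32-bit):

/* old style: a global register variable, which LTO and clang dislike:
 *   register unsigned long stack_pointer asm("rsp");
 * new style: read the stack pointer with a single mov */
static inline unsigned long read_stack_pointer(void)
{
    unsigned long sp;

    asm("mov %%rsp, %0" : "=g" (sp));
    return sp;
}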
-
Andi Kleen authored
The paravirt thunks use a hack of using a static reference to a static function to reference that function from the top level statement. This assumes that gcc always generates static function names in a specific format, which is not necessarily true. Simply make these functions global and asmlinkage or __visible. This way the static __used variables are not needed and everything works. Functions with arguments are __visible to keep the register calling convention on 32bit. Changed in paravirt and in all users (Xen and vsmp)

v2: Use __visible for functions with arguments

Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Ido Yariv <ido@wizery.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-5-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Andi Kleen authored
The paravirt patching code assumes that it can reference a local assembler label between two different top level assembler statements. This does not work with LTO, where the assembler code may end up in different assembler files. Replace it with extern / global / asm linkage labels. This also removes one redundant copy of the macro.

Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-4-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
Andi Kleen authored
- Make the C code used by the paravirt stubs visible
- Since they have to be global now, give them a more unique name.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1382458079-24450-3-git-send-email-andi@firstfloor.org
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- Jan 29, 2014
Paolo Bonzini authored
When Hyper-V hypervisor leaves are present, KVM must relocate its own leaves at 0x40000100, because Windows does not look for Hyper-V leaves at indices other than 0x40000000. In this case, the KVM features are at 0x40000101, but the old code would always look at 0x40000001. Fix by using kvm_cpuid_base(). This also requires making the function non-inline, since kvm_cpuid_base() is static.

Fixes: 1085ba7f
Cc: stable@vger.kernel.org
Cc: mtosatti@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
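A user-space sketch of the scan the fix relies on: the KVM signature leaf may sit at 0x40000100 (or further along) rather than 0x40000000, and the feature leaf then follows at base + 1. The scan step and upper bound here follow common convention and are assumptions, not the kernel's code:

#include <cpuid.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

static uint32_t kvm_cpuid_base(void)
{
    for (uint32_t base = 0x40000000; base < 0x40010000; base += 0x100) {
        uint32_t eax, signature[3];

        __cpuid(base, eax, signature[0], signature[1], signature[2]);
        (void)eax;  /* eax reports the highest leaf in this block; unused here */
        if (!memcmp(signature, "KVMKVMKVM\0\0\0", 12))
            return base;    /* KVM feature bits live at base + 1 */
    }
    return 0;
}

int main(void)
{
    uint32_t base = kvm_cpuid_base();

    if (base)
        printf("KVM signature at 0x%08x, features at 0x%08x\n", base, base + 1);
    else
        puts("no KVM signature leaf found");
    return 0;
}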
-
Paolo Bonzini authored
It is unnecessary to go through hypervisor_cpuid_base every time a leaf is found (which will be every time a feature is requested after the next patch).

Fixes: 1085ba7f
Cc: stable@vger.kernel.org
Cc: mtosatti@redhat.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Paolo Bonzini authored
Self explanatory.

Reported-by: Radim Krcmar <rkrcmar@redhat.com>
Cc: Vadim Rozenfeld <vrozenfe@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
David Cohen authored
platform_ipc.h and platform_msic.h are wrongly declaring functions as external and with 'weak' attribute. This patch does a cleanup on those header files. It should have no functional change.

Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1390950567-12821-1-git-send-email-david.a.cohen@linux.intel.com
Cc: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
-
- Jan 28, 2014
Yinghai Lu authored
Dave reported that booting a big NUMA system is broken. It turns out that commit 5b6e5295 ("x86: memblock: set current limit to max low memory address") sets the limit too low. max_low_pfn_mapped is different from max_pfn_mapped: max_low_pfn_mapped is always under 4G, so all memblock_alloc_nid() allocations would go under 4G. Revert 5b6e5295 to fix a no-boot regression which was triggered by 457ff1de ("lib/swiotlb.c: use memblock apis for early memory allocations").

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Reported-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Cc: Russell King <linux@arm.linux.org.uk>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 27, 2014
Josh Triplett authored
The kernel already has this information, so other bits of kernel code shouldn't duplicate that. This also eliminates the use of __DATE__, which makes the build non-deterministic. This message was already disabled at build time, with PRINT_MESSAGES undef'd at the top of the file.

Signed-off-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Michal Marek <mmarek@suse.cz>
-
Jan Kiszka authored
Check for invalid state transitions on guest-initiated updates of MSR_IA32_APICBASE. This addresses both enabling of the x2APIC when it is not supported and all invalid transitions as described in SDM section 10.12.5. It also checks that no reserved bit is set in APICBASE by the guest.

Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com>
[Use cpuid_maxphyaddr instead of guest_cpuid_get_phys_bits. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- Jan 25, 2014
David Cohen authored
This patch fixes the following warning:

  warning: (X86_INTEL_MID) selects INTEL_SCU_IPC which has unmet direct dependencies (X86 && X86_PLATFORM_DEVICES && X86_INTEL_MID)

It happens because when selected, X86_INTEL_MID tries to select INTEL_SCU_IPC regardless of whether all its dependencies are met or not. This patch fixes it by adding the missing X86_PLATFORM_DEVICES dependency to X86_INTEL_MID.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David Cohen <david.a.cohen@linux.intel.com>
Link: http://lkml.kernel.org/r/1390329699-20782-1-git-send-email-david.a.cohen@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Toshi Kani authored
When the ACPI SLIT table has an I/O locality (i.e. a locality unique to an I/O device), numa_set_distance() emits this warning message:

  NUMA: Warning: node ids are out of bound, from=-1 to=-1 distance=10

acpi_numa_slit_init() calls numa_set_distance() with pxm_to_node(), which assumes that all localities have been parsed with SRAT previously. SRAT does not list I/O localities, whereas SLIT lists all localities including I/Os. Hence, pxm_to_node() returns NUMA_NO_NODE (-1) for an I/O locality. I/O localities are not supported and are ignored today, but emitting such a warning message leads to unnecessary confusion. Change acpi_numa_slit_init() to avoid calling numa_set_distance() with NUMA_NO_NODE.

Signed-off-by: Toshi Kani <toshi.kani@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Link: http://lkml.kernel.org/n/tip-dSvpjjvp8aMzs1ybkftxohlh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
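A hedged sketch of the described change: skip SLIT localities whose proximity domain maps to NUMA_NO_NODE instead of handing -1 to numa_set_distance(). Field and helper names follow the kernel but are quoted from memory, so treat them as assumptions:

#include <linux/acpi.h>
#include <linux/init.h>
#include <linux/numa.h>

static void __init acpi_numa_slit_init_sketch(struct acpi_table_slit *slit)
{
    int i, j;

    for (i = 0; i < slit->locality_count; i++) {
        int from_node = pxm_to_node(i);

        if (from_node == NUMA_NO_NODE)      /* e.g. an I/O locality */
            continue;

        for (j = 0; j < slit->locality_count; j++) {
            int to_node = pxm_to_node(j);

            if (to_node == NUMA_NO_NODE)
                continue;

            numa_set_distance(from_node, to_node,
                              slit->entry[slit->locality_count * i + j]);
        }
    }
}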
-
Mel Gorman authored
There was a large ebizzy performance regression that was bisected to commit 611ae8e3 (x86/tlb: enable tlb flush range support for x86). The problem was related to the tlb_flushall_shift tuning for IvyBridge which was altered. The problem is that it is not clear if the tuning values for each CPU family are correct, as the methodology used to tune the values is unclear. This patch uses a conservative tlb_flushall_shift value for all CPU families except IvyBridge so the decision can be revisited if any regression is found as a result of this change. IvyBridge is an exception as testing with one methodology determined that the value of 2 is acceptable. Details are in the changelog for the patch "x86: mm: Change tlb_flushall_shift for IvyBridge".

One important aspect of this to watch out for is Xen. The original commit log mentioned large performance gains on Xen. It's possible Xen is more sensitive to this value if it flushes small ranges of pages more frequently than workloads on bare metal typically do.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-dyzMww3fqugnhbhgo6Gxmtkw@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
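For context, a simplified model of what tlb_flushall_shift controls (not the exact kernel code): the flush range is compared against the TLB size scaled down by the shift, so a larger shift makes the full flush kick in for smaller ranges, and a shift of -1 disables range flushing entirely:

#include <stdbool.h>

#define PAGE_SHIFT 12

/* returns true if a full TLB flush should be used for [start, end) */
static bool want_full_flush(unsigned long start, unsigned long end,
                            unsigned long tlb_entries, int tlb_flushall_shift)
{
    unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
    unsigned long act_entries;

    if (tlb_flushall_shift < 0)         /* range flushing disabled */
        return true;

    act_entries = tlb_entries >> tlb_flushall_shift;
    return nr_pages > act_entries;      /* range too big: flush everything */
}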
-
Mel Gorman authored
There was a large performance regression that was bisected to commit 611ae8e3 ("x86/tlb: enable tlb flush range support for x86"). This patch simply changes the default balance point between a local and global flush for IvyBridge.

In the interest of allowing the tests to be reproduced, this patch was tested using mmtests 0.15 with the following configurations:

  configs/config-global-dhp__tlbflush-performance
  configs/config-global-dhp__scheduler-performance
  configs/config-global-dhp__network-performance

Results are from two machines:

  Ivybridge 4 threads: Intel(R) Core(TM) i3-3240 CPU @ 3.40GHz
  Ivybridge 8 threads: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Page fault microbenchmark showed nothing interesting.

Ebizzy was configured to run multiple iterations and threads. Thread counts ranged from 1 to NR_CPUS*2. For each thread count, it ran 100 iterations and each iteration lasted 10 seconds.

Ivybridge 4 threads
                    3.13.0-rc7            3.13.0-rc7
                       vanilla           altshift-v3
Mean   1     6395.44 (  0.00%)     6789.09 (  6.16%)
Mean   2     7012.85 (  0.00%)     8052.16 ( 14.82%)
Mean   3     6403.04 (  0.00%)     6973.74 (  8.91%)
Mean   4     6135.32 (  0.00%)     6582.33 (  7.29%)
Mean   5     6095.69 (  0.00%)     6526.68 (  7.07%)
Mean   6     6114.33 (  0.00%)     6416.64 (  4.94%)
Mean   7     6085.10 (  0.00%)     6448.51 (  5.97%)
Mean   8     6120.62 (  0.00%)     6462.97 (  5.59%)

Ivybridge 8 threads
                    3.13.0-rc7            3.13.0-rc7
                       vanilla           altshift-v3
Mean   1     7336.65 (  0.00%)     7787.02 (  6.14%)
Mean   2     8218.41 (  0.00%)     9484.13 ( 15.40%)
Mean   3     7973.62 (  0.00%)     8922.01 ( 11.89%)
Mean   4     7798.33 (  0.00%)     8567.03 (  9.86%)
Mean   5     7158.72 (  0.00%)     8214.23 ( 14.74%)
Mean   6     6852.27 (  0.00%)     7952.45 ( 16.06%)
Mean   7     6774.65 (  0.00%)     7536.35 ( 11.24%)
Mean   8     6510.50 (  0.00%)     6894.05 (  5.89%)
Mean   12    6182.90 (  0.00%)     6661.29 (  7.74%)
Mean   16    6100.09 (  0.00%)     6608.69 (  8.34%)

Ebizzy hits the worst case scenario for TLB range flushing every time and it shows for these Ivybridge CPUs at least that the default choice is a poor one. The patch addresses the problem.

Next was a tlbflush microbenchmark written by Alex Shi at http://marc.info/?l=linux-kernel&m=133727348217113 . It measures access costs while the TLB is being flushed. The expectation is that if there are always full TLB flushes that the benchmark would suffer, and it benefits from range flushing.

There are 320 iterations of the test per thread count. The number of entries is randomly selected with a min of 1 and max of 512. To ensure a reasonably even spread of entries, the full range is broken up into 8 sections and a random number selected within that section:

  iteration 1, random number between 0-64
  iteration 2, random number between 64-128
  etc

This is still a very weak methodology. When you do not know what the typical ranges are, random is a reasonable choice, but it can be easily argued that the optimisation was for smaller ranges and an even spread is not representative of any workload that matters. To improve this, we'd need to know the probability distribution of TLB flush range sizes for a set of workloads that are considered "common", build a synthetic trace and feed that into this benchmark. Even that is not perfect because it would not account for the time between flushes, but there are limits of what can be reasonably done and still be doing something useful. If a representative synthetic trace is provided then this benchmark could be revisited and the shift values retuned.

Ivybridge 4 threads
                    3.13.0-rc7            3.13.0-rc7
                       vanilla           altshift-v3
Mean   1       10.50 (  0.00%)       10.50 (  0.03%)
Mean   2       17.59 (  0.00%)       17.18 (  2.34%)
Mean   3       22.98 (  0.00%)       21.74 (  5.41%)
Mean   5       47.13 (  0.00%)       46.23 (  1.92%)
Mean   8       43.30 (  0.00%)       42.56 (  1.72%)

Ivybridge 8 threads
                    3.13.0-rc7            3.13.0-rc7
                       vanilla           altshift-v3
Mean   1        9.45 (  0.00%)        9.36 (  0.93%)
Mean   2        9.37 (  0.00%)        9.70 ( -3.54%)
Mean   3        9.36 (  0.00%)        9.29 (  0.70%)
Mean   5       14.49 (  0.00%)       15.04 ( -3.75%)
Mean   8       41.08 (  0.00%)       38.73 (  5.71%)
Mean   13      32.04 (  0.00%)       31.24 (  2.49%)
Mean   16      40.05 (  0.00%)       39.04 (  2.51%)

For both CPUs, average access time is reduced which is good as this is the benchmark that was used to tune the shift values in the first place, albeit it is not known *how* the benchmark was used.

The scheduler benchmarks were somewhat inconclusive. They showed gains and losses and make me reconsider how stable those benchmarks really are or if something else might be interfering with the test results recently.

Network benchmarks were inconclusive. Almost all results were flat except for netperf-udp tests on the 4 thread machine. These results were unstable and showed large variations between reboots. It is unknown if this is a recent problem but I've noticed before that netperf-udp results tend to vary.

Based on these results, changing the default for Ivybridge seems like a logical choice.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Alex Shi <alex.shi@linaro.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-cqnadffh1tiqrshthRj3Esge@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Mel Gorman authored
When choosing between doing an address space or a ranged flush, the x86 implementation of flush_tlb_mm_range takes into account whether there are any large pages in the range. A per-page flush typically requires fewer entries than would be covered by a single large page and the check is redundant. There is one potential exception. THP migration flushes single THP entries and it conceivably would benefit from flushing a single entry instead of the mm. However, this flush is after a THP allocation, copy and page table update potentially with any other threads serialised behind it. In comparison to that, the flush is noise. It makes more sense to optimise balancing to require fewer flushes than to optimise the flush itself. This patch deletes the redundant huge page check.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-sgei1drpOcburujPsfh6ovmo@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Mel Gorman authored
NR_TLB_LOCAL_FLUSH_ALL is not always accounted for correctly and the comparison with total_vm is done before taking tlb_flushall_shift into account. Clean it up.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Alex Shi <alex.shi@linaro.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/n/tip-Iz5gcahrgskIldvukulzi0hh@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Mel Gorman authored
Bisection between 3.11 and 3.12 fingered commit 9824cf97 ("mm: vmstats: tlb flush counters") to cause overhead problems. The counters are undeniably useful but how often do we really need to debug TLB flush related issues? It does not justify taking the penalty everywhere so make it a debugging option.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Tested-by: Davidlohr Bueso <davidlohr@hp.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Alex Shi <alex.shi@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-XzxjntugxuwpxXhcrxqqh53b@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Mike Travis authored
Fix the UV call into kgdb to depend only on whether KGDB is defined and not on both KGDB and KDB. This allows the power nmi command to use the gdb remote connection if enabled. Note that the new action of 'kgdb' needs to be set as well to indicate the user wants to wait for gdb to be connected. If it's set to 'kdb' then an error message is displayed if KDB is not configured. Also note that if both KGDB and KDB are enabled, then the action of 'kgdb' or 'kdb' has no effect on which is used. See the KGDB documentation for further information.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20140114162551.635540667@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Mike Travis authored
Make uv_register_nmi_notifier() and uv_handle_nmi_ping() static to address sparse warnings. Fix problem where uv_nmi_kexec_failed is unused when CONFIG_KEXEC is not defined.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20140114162551.480872353@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Mike Travis authored
Some code added to the debug_core module had KDB dependencies that it shouldn't have. Move the KDB dependent REASON back to the caller to remove the dependency in the debug core code. Update the call from the UV NMI handler to conform to the new interface.

Signed-off-by: Mike Travis <travis@sgi.com>
Reviewed-by: Hedi Berriche <hedi@sgi.com>
Cc: Russ Anderson <rja@sgi.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20140114162551.318251993@asylum.americas.sgi.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-
Dan Carpenter authored
This is under CAP_SYS_ADMIN, but Smatch complains that mask comes from the user and the test for "mask > 0xf" can underflow. The fix is simple: amd_set_subcaches() should hand down not an 'int' but an 'unsigned long', like it was originally intended to do.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Borislav Petkov <bp@suse.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Daniel J Blueman <daniel@numascale-asia.com>
Link: http://lkml.kernel.org/r/20140121072209.GA22095@elgon.mountain
Signed-off-by: Ingo Molnar <mingo@kernel.org>
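A standalone demonstration of the signedness issue (names are illustrative, not the driver's): with a signed int, a negative mask slips past the "mask > 0xf" range check because the comparison is signed, while an unsigned long turns the same bit pattern into a huge value that is correctly rejected.

#include <stdio.h>

static int check_signed(int mask)
{
    return mask > 0xf ? -1 : 0;     /* -1 passes: "-1 > 15" is false */
}

static int check_unsigned(unsigned long mask)
{
    return mask > 0xf ? -1 : 0;     /* ~0UL is rejected as expected */
}

int main(void)
{
    long user_value = -1;           /* hostile or buggy user input */

    printf("signed check:   %s\n",
           check_signed((int)user_value) ? "rejected" : "ACCEPTED");
    printf("unsigned check: %s\n",
           check_unsigned((unsigned long)user_value) ? "rejected" : "ACCEPTED");
    return 0;
}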
-