- Jun 05, 2014
-
-
Igor Mammedov authored
Hang is observed on virtual machines during CPU hotplug, especially in big guests with many CPUs. (It reproducible more often if host is over-committed). It happens because master CPU gives up waiting on secondary CPU and allows it to run wild. As result AP causes locking or crashing system. For example as described here: https://lkml.org/lkml/2014/3/6/257 If master CPU have sent STARTUP IPI successfully, and AP signalled to master CPU that it's ready to start initialization, make master CPU wait indefinitely till AP is onlined. To ensure that AP won't ever run wild, make it wait at early startup till master CPU confirms its intention to wait for AP. If AP doesn't respond in 10 seconds, the master CPU will timeout and cancel AP onlining. Signed-off-by:
Igor Mammedov <imammedo@redhat.com> Acked-by:
Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Igor Mammedov authored
If system is running without debug level logging, it will not log error if do_boot_cpu() failed to wakeup AP. It may lead to silent AP bringup failures at boot time. Change message level to KERN_ERR to make error visible to user as it's done on other architectures. Signed-off-by:
Igor Mammedov <imammedo@redhat.com> Acked-by:
Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Igor Mammedov authored
currently if AP wake up is failed, master CPU marks AP as not present in do_boot_cpu() by calling set_cpu_present(cpu, false). That leads to following list corruption on the next physical CPU hotplug: [ 418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0() [ 418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0). [ 418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee [ 418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387 [ 418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007 [ 418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn [ 418.166433] 0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021 [ 418.176460] ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8 [ 418.177453] ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea [ 418.178445] Call Trace: [ 418.185811] [<ffffffff8159b22d>] dump_stack+0x49/0x5c [ 418.186440] [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0 [ 418.187192] [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50 [ 418.191231] [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7 [ 418.193889] [<ffffffff812f796e>] __list_add+0xbe/0xd0 [ 418.196649] [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200 [ 418.208610] [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60 [ 418.213831] [<ffffffff812e2ef4>] kobject_add+0x44/0x70 [ 418.229961] [<ffffffff813e2c60>] device_add+0xd0/0x550 [ 418.234991] [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0 [ 418.250226] [<ffffffff813e32be>] device_register+0x1e/0x30 [ 418.255296] [<ffffffff813e82a3>] register_cpu+0xe3/0x130 [ 418.266539] [<ffffffff81592be5>] arch_register_cpu+0x65/0x150 [ 418.285845] [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b ... Which is caused by the fact that generic_processor_info() allocates logical CPU id by calling: cpu = cpumask_next_zero(-1, cpu_present_mask); which returns id of previously failed to wake up CPU, since its bit is cleared by do_boot_cpu() and as result register_cpu() tries to register another CPU with the same id as already present but failed to be onlined CPU. Taking in account that AP will not do anything if master CPU failed to wake it up, there is no reason to mark that AP as not present and break next cpu hotplug attempts. As a side effect of not marking AP as not present, user would be allowed to online it again later. Also fix memory corruption in acpi_unmap_lsapic() if during CPU hotplug master CPU failed to wake up AP it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP. However following attempt to unplug that CPU will lead to out of bound write access to __apicid_to_node[] which is 32768 items long on x86_64 kernel. So with above fix of cpu_present_mask make sure that a present CPU has a valid APIC ID by not setting x86_cpu_to_apicid to BAD_APICID in do_boot_cpu() on failure and allow acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU. Signed-off-by:
Igor Mammedov <imammedo@redhat.com> Acked-by:
Toshi Kani <toshi.kani@hp.com> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Jun 04, 2014
-
-
Yinghai Lu authored
check_irq_vectors_for_cpu_disable() can overestimate the number of available interrupt vectors, so the check for cpu down succeeds, but the actual cpu removal fails. It iterates from FIRST_EXTERNAL_VECTOR to NR_VECTORS, which is wrong because the systems vectors are not taken into account. Limit the search to first_system_vector instead of NR_VECTORS. The second indicator for vector availability the used_vectors bitmap is not taken into account at all. So system vectors, e.g. IA32_SYSCALL_VECTOR (0x80) and IRQ_MOVE_CLEANUP_VECTOR (0x20), are accounted as available. Add a check for the used_vectors bitmap and do not account vectors which are marked there. [ tglx: Simplified code. Rewrote changelog and code comments. ] Signed-off-by:
Yinghai Lu <yinghai@kernel.org> Acked-by:
Prarit Bhargava <prarit@redhat.com> Cc: Seiji Aguchi <seiji.aguchi@hds.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Elliott, Robert (Server Storage)" <Elliott@hp.com> Cc: x86@kernel.org Link: http://lkml.kernel.org/r/1400160305-17774-2-git-send-email-prarit@redhat.com Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
- Jun 02, 2014
-
-
Dave Young authored
For ioremapped efi memory aka old_map the virt addresses are not persistant across kexec reboot. kexec-tools will read the runtime maps from sysfs then pass them to 2nd kernel and assuming kexec efi boot is ok. This will cause kexec boot failure. To address this issue do not export runtime maps in case efi old_map so userspace can use no efi boot instead. Signed-off-by:
Dave Young <dyoung@redhat.com> Acked-by:
Borislav Petkov <bp@suse.de> Acked-by:
Vivek Goyal <vgoyal@redhat.com> Signed-off-by:
Matt Fleming <matt.fleming@intel.com>
-
- May 30, 2014
-
-
Minchan Kim authored
While I play inhouse patches with much memory pressure on qemu-kvm, 3.14 kernel was randomly crashed. The reason was kernel stack overflow. When I investigated the problem, the callstack was a little bit deeper by involve with reclaim functions but not direct reclaim path. I tried to diet stack size of some functions related with alloc/reclaim so did a hundred of byte but overflow was't disappeard so that I encounter overflow by another deeper callstack on reclaim/allocator path. Of course, we might sweep every sites we have found for reducing stack usage but I'm not sure how long it saves the world(surely, lots of developer start to add nice features which will use stack agains) and if we consider another more complex feature in I/O layer and/or reclaim path, it might be better to increase stack size( meanwhile, stack usage on 64bit machine was doubled compared to 32bit while it have sticked to 8K. Hmm, it's not a fair to me and arm64 already expaned to 16K. ) So, my stupid idea is just let's expand stack size and keep an eye toward stack consumption on each kernel functions via stacktrace of ftrace. For example, we can have a bar like that each funcion shouldn't exceed 200K and emit the warning when some function consumes more in runtime. Of course, it could make false positive but at least, it could make a chance to think over it. I guess this topic was discussed several time so there might be strong reason not to increase kernel stack size on x86_64, for me not knowing so Ccing x86_64 maintainers, other MM guys and virtio maintainers. Here's an example call trace using up the kernel stack: Depth Size Location (51 entries) ----- ---- -------- 0) 7696 16 lookup_address 1) 7680 16 _lookup_address_cpa.isra.3 2) 7664 24 __change_page_attr_set_clr 3) 7640 392 kernel_map_pages 4) 7248 256 get_page_from_freelist 5) 6992 352 __alloc_pages_nodemask 6) 6640 8 alloc_pages_current 7) 6632 168 new_slab 8) 6464 8 __slab_alloc 9) 6456 80 __kmalloc 10) 6376 376 vring_add_indirect 11) 6000 144 virtqueue_add_sgs 12) 5856 288 __virtblk_add_req 13) 5568 96 virtio_queue_rq 14) 5472 128 __blk_mq_run_hw_queue 15) 5344 16 blk_mq_run_hw_queue 16) 5328 96 blk_mq_insert_requests 17) 5232 112 blk_mq_flush_plug_list 18) 5120 112 blk_flush_plug_list 19) 5008 64 io_schedule_timeout 20) 4944 128 mempool_alloc 21) 4816 96 bio_alloc_bioset 22) 4720 48 get_swap_bio 23) 4672 160 __swap_writepage 24) 4512 32 swap_writepage 25) 4480 320 shrink_page_list 26) 4160 208 shrink_inactive_list 27) 3952 304 shrink_lruvec 28) 3648 80 shrink_zone 29) 3568 128 do_try_to_free_pages 30) 3440 208 try_to_free_pages 31) 3232 352 __alloc_pages_nodemask 32) 2880 8 alloc_pages_current 33) 2872 200 __page_cache_alloc 34) 2672 80 find_or_create_page 35) 2592 80 ext4_mb_load_buddy 36) 2512 176 ext4_mb_regular_allocator 37) 2336 128 ext4_mb_new_blocks 38) 2208 256 ext4_ext_map_blocks 39) 1952 160 ext4_map_blocks 40) 1792 384 ext4_writepages 41) 1408 16 do_writepages 42) 1392 96 __writeback_single_inode 43) 1296 176 writeback_sb_inodes 44) 1120 80 __writeback_inodes_wb 45) 1040 160 wb_writeback 46) 880 208 bdi_writeback_workfn 47) 672 144 process_one_work 48) 528 112 worker_thread 49) 416 240 kthread 50) 176 176 ret_from_fork [ Note: the problem is exacerbated by certain gcc versions that seem to generate much bigger stack frames due to apparently bad coalescing of temporaries and generating too many spills. Rusty saw gcc-4.6.4 using 35% more stack on the virtio path than 4.8.2 does, for example. Minchan not only uses such a bad gcc version (4.6.3 in his case), but some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is what causes that kernel_map_pages() frame, for example). But we're clearly getting too close. The VM code also seems to have excessive stack frames partly for the same compiler reason, triggered by excessive inlining and lots of function arguments. We need to improve on our stack use, but in the meantime let's do this simple stack increase too. Unlike most earlier reports, there is nothing simple that stands out as being really horribly wrong here, apart from the fact that the stack frames are just bigger than they should need to be. - Linus ] Signed-off-by:
Minchan Kim <minchan@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Jones <davej@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Hugh Dickins <hughd@google.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michael S Tsirkin <mst@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- May 22, 2014
-
-
Andy Lutomirski authored
The oops can be triggered in qemu using -no-hpet (but not nohpet) by running a 32-bit program and reading a couple of pages before the vdso. This should send SIGBUS instead of OOPSing. The bug was introduced by: commit 7a59ed41 Author: Stefani Seibold <stefani@seibold.net> Date: Mon Mar 17 23:22:09 2014 +0100 x86, vdso: Add 32 bit VDSO time support for 32 bit kernel which is new in 3.15. Signed-off-by:
Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/e99025d887d6670b6c4d81e6ccfeeb83770b21e9.1400109621.git.luto@amacapital.net Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com>
-
- May 15, 2014
-
-
Linus Torvalds authored
Checkin: b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels disabled 16-bit segments on 64-bit kernels due to an information leak. However, it does seem that people are genuinely using Wine to run old 16-bit Windows programs on Linux. A proper fix for this ("espfix64") is coming in the upcoming merge window, but as a temporary fix, create a sysctl to allow the administrator to re-enable support for 16-bit segments. It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If you hit this issue and care about your old Windows program more than you care about a kernel stack address information leak, you can do echo 1 > /proc/sys/abi/ldt16 as root (add it to your startup scripts), and you should be ok. The sysctl table is only added if you have COMPAT support enabled on x86-64, but I assume anybody who runs old windows binaries very much does that ;) Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com Cc: <stable@vger.kernel.org>
-
- May 14, 2014
-
-
Marcelo Tosatti authored
Updating system_time from the kernel clock once master clock has been enabled can result in time backwards event, in case kernel clock frequency is lower than TSC frequency. Disable master clock in case it is necessary to update it from the resume path. Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Anthony Iliopoulos authored
The invalidation is required in order to maintain proper semantics under CoW conditions. In scenarios where a process clones several threads, a thread operating on a core whose DTLB entry for a particular hugepage has not been invalidated, will be reading from the hugepage that belongs to the forked child process, even after hugetlb_cow(). The thread will not see the updated page as long as the stale DTLB entry remains cached, the thread attempts to write into the page, the child process exits, or the thread gets migrated to a different processor. Signed-off-by:
Anthony Iliopoulos <anthony.iliopoulos@huawei.com> Link: http://lkml.kernel.org/r/20140514092948.GA17391@server-36.huawei.corp Suggested-by:
Shay Goikhman <shay.goikhman@huawei.com> Acked-by:
Dave Hansen <dave.hansen@intel.com> Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com> Cc: <stable@vger.kernel.org> # v2.6.16+ (!)
-
Alexei Starovoitov authored
bpf_alloc_binary() adds 128 bytes of room to JITed program image and rounds it up to the nearest page size. If image size is close to page size (like 4000), it is rounded to two pages: round_up(4000 + 4 + 128) == 8192 then 'hole' is computed as 8192 - (4000 + 4) = 4188 If prandom_u32() % hole selects a number >= PAGE_SIZE - sizeof(*header) then kernel will crash during bpf_jit_free(): kernel BUG at arch/x86/mm/pageattr.c:887! Call Trace: [<ffffffff81037285>] change_page_attr_set_clr+0x135/0x460 [<ffffffff81694cc0>] ? _raw_spin_unlock_irq+0x30/0x50 [<ffffffff810378ff>] set_memory_rw+0x2f/0x40 [<ffffffffa01a0d8d>] bpf_jit_free_deferred+0x2d/0x60 [<ffffffff8106bf98>] process_one_work+0x1d8/0x6a0 [<ffffffff8106bf38>] ? process_one_work+0x178/0x6a0 [<ffffffff8106c90c>] worker_thread+0x11c/0x370 since bpf_jit_free() does: unsigned long addr = (unsigned long)fp->bpf_func & PAGE_MASK; struct bpf_binary_header *header = (void *)addr; to compute start address of 'bpf_binary_header' and header->pages will pass junk to: set_memory_rw(addr, header->pages); Fix it by making sure that &header->image[prandom_u32() % hole] and &header are in the same page Fixes: 314beb9b ("x86: bpf_jit_comp: secure bpf jit against spraying attacks") Signed-off-by:
Alexei Starovoitov <ast@plumgrid.com> Acked-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- May 12, 2014
-
-
H. Peter Anvin authored
One can logically expect that when the user has specified "nordrand", the user doesn't want any use of the CPU random number generator, neither RDRAND nor RDSEED, so disable both. Reported-by:
Stephan Mueller <smueller@chronox.de> Cc: Theodore Ts'o <tytso@mit.edu> Link: http://lkml.kernel.org/r/21542339.0lFnPSyGRS@myon.chronox.de Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com>
-
- May 09, 2014
-
-
Boris Ostrovsky authored
With tk->wall_to_monotonic.tv_nsec being a 32-bit value on 32-bit systems, (tk->wall_to_monotonic.tv_nsec << tk->shift) in update_vsyscall() may lose upper bits or, worse, add them since compiler will do this: (u64)(tk->wall_to_monotonic.tv_nsec << tk->shift) instead of ((u64)tk->wall_to_monotonic.tv_nsec << tk->shift) So if, for example, tv_nsec is 0x800000 and shift is 8 we will end up with 0xffffffff80000000 instead of 0x80000000. And then we are stuck in the subsequent 'while' loop. We need an explicit cast. Signed-off-by:
Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: http://lkml.kernel.org/r/1399648287-15178-1-git-send-email-boris.ostrovsky@oracle.com Acked-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by:
H. Peter Anvin <hpa@zytor.com>
-
Andres Freund authored
The spuriously added semicolon didn't have any effect because the macro isn't currently in use. c0a639ad Signed-off-by:
Andres Freund <andres@anarazel.de> Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de Cc: Borislav Petkov <bp@suse.de> Signed-off-by:
H. Peter Anvin <hpa@zytor.com>
-
Andres Freund authored
Due to a typo the msr accessor function introduced in 22085a66 didn't have any lasting effects because they accidentally wrote the old value back. After c0a639ad this at the very least this causes cpuid limits not to be lifted on some cpus leading to missing capabilities for those. Signed-off-by:
Andres Freund <andres@anarazel.de> Link: http://lkml.kernel.org/r/1399598957-7011-2-git-send-email-andres@anarazel.de Cc: Borislav Petkov <bp@suse.de> Signed-off-by:
H. Peter Anvin <hpa@zytor.com>
-
- May 08, 2014
-
-
Feng Tang authored
HPET on current Baytrail platform has accuracy problem to be used as reliable clocksource/clockevent, so add a early quirk to disable it. Signed-off-by:
Feng Tang <feng.tang@intel.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1398327498-13163-2-git-send-email-feng.tang@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Feng Tang authored
HPET on some platform has accuracy problem. Making "boot_hpet_disable" extern so that we can runtime disable the HPET timer by using quirk to check the platform. Signed-off-by:
Feng Tang <feng.tang@intel.com> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1398327498-13163-1-git-send-email-feng.tang@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- May 07, 2014
-
-
George Spelvin authored
If you are using a 64-bit kernel with 32-bit userland, then scripts/gcc-x86_64-has-stack-protector.sh invokes 32-bit gcc with -mcmodel=kernel, which produces: <stdin>:1:0: error: code model 'kernel' not supported in the 32 bit mode and trips the "broken compiler" test at arch/x86/Makefile:120. There are several places a fix is possible, but the following seems cleanest. (But it's minimal; it would also be possible to factor out a bunch of stuff from the two branches of the if.) Signed-off-by:
George Spelvin <linux@horizon.com> Link: http://lkml.kernel.org/r/20140507210552.7581.qmail@ns.horizon.com Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com>
-
Paolo Bonzini authored
While running a nested guest, we should disable APIC virtualization controls (virtualized APIC register accesses, virtual interrupt delivery and posted interrupts), because we do not expose them to the nested guest. Reported-by:
Hu Yaohui <loki2441@gmail.com> Suggested-by:
Abel Gordon <abel@stratoscale.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Yan, Zheng authored
Event 0x013c is not the same as fixed counter2, remove it from Silvermont's event constraints. Signed-off-by:
Yan, Zheng <zheng.z.yan@intel.com> Signed-off-by:
Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1398755081-12471-1-git-send-email-zheng.z.yan@intel.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Christian Gmeiner authored
Certec BPC600 needs reboot=pci to actually reboot. Signed-off-by:
Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Li Aubrey <aubrey.li@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Dave Jones <davej@redhat.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1399446114-2147-1-git-send-email-christian.gmeiner@gmail.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- May 06, 2014
-
-
Andi Kleen authored
As requested by Linus add explicit __visible to the asmlinkage users. This marks all functions visible to assembler. Tree sweep for arch/x86/* Signed-off-by:
Andi Kleen <ak@linux.intel.com> Link: http://lkml.kernel.org/r/1398984278-29319-3-git-send-email-andi@firstfloor.org Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com>
-
H. Peter Anvin authored
arch/x86/crypto/sha1_avx2_x86_64_asm.S introduced _end as a local symbol, which broke the build under certain circumstances. Although the wisdom of _end as a local symbol can definitely be questioned, the build should not break for that reason. Thus, filter the output of nm to only get global symbols of appropriate type. Reported-by:
Andy Lutomirski <luto@amacapital.net> Cc: Chandramouli Narayanan <mouli@linux.intel.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/n/tip-uxm3j3w3odglcwhafwq5tjqu@git.kernel.org
-
- May 03, 2014
-
-
Dave Young authored
earlyprintk=efi,keep will cause kernel hangs while freeing initmem like below: VFS: Mounted root (ext4 filesystem) readonly on device 254:2. devtmpfs: mounted Freeing unused kernel memory: 880K (ffffffff817d4000 - ffffffff818b0000) It is caused by efi earlyprintk use __init function which will be freed later. Such as early_efi_write is marked as __init, also it will use early_ioremap which is init function as well. To fix this issue, I added early initcall early_efi_map_fb which maps the whole efi fb for later use. OTOH, adding a wrapper function early_efi_map which calls early_ioremap before ioremap is available. With this patch applied efi boot ok with earlyprintk=efi,keep console=efi Signed-off-by:
Dave Young <dyoung@redhat.com> Signed-off-by:
Matt Fleming <matt.fleming@intel.com>
-
- May 02, 2014
-
-
Dave Young authored
earlyprintk=efi,keep will cause kernel hangs while freeing initmem like below: VFS: Mounted root (ext4 filesystem) readonly on device 254:2. devtmpfs: mounted Freeing unused kernel memory: 880K (ffffffff817d4000 - ffffffff818b0000) It is caused by efi earlyprintk use __init function which will be freed later. Such as early_efi_write is marked as __init, also it will use early_ioremap which is init function as well. To fix this issue, I added early initcall early_efi_map_fb which maps the whole efi fb for later use. OTOH, adding a wrapper function early_efi_map which calls early_ioremap before ioremap is available. With this patch applied efi boot ok with earlyprintk=efi,keep console=efi Signed-off-by:
Dave Young <dyoung@redhat.com> Signed-off-by:
Matt Fleming <matt.fleming@intel.com>
-
- Apr 28, 2014
-
-
Thomas Gleixner authored
On x86 the allocation of irq descriptors may allocate interrupts which are in the range of the GSI interrupts. That's wrong as those interrupts are hardwired and we don't have the irq domain translation like PPC. So one of these interrupts can be hooked up later to one of the devices which are hard wired to it and the io_apic init code for that particular interrupt line happily reuses that descriptor with a completely different configuration so hell breaks lose. Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs, except for a few usage sites which have not yet blown up in our face for whatever reason. But for drivers which need an irq range, like the GPIO drivers, we have no limit in place and we don't want to expose such a detail to a driver. To cure this introduce a function which an architecture can implement to impose a lower bound on the dynamic interrupt allocations. Implement it for x86 and set the lower bound to nr_gsi_irqs, which is the end of the hardwired interrupt space, so all dynamic allocations happen above. That not only allows the GPIO driver to work sanely, it also protects the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and htirq code. They need to be cleaned up as well, but that's a separate issue. Reported-by:
Jin Yao <yao.jin@linux.intel.com> Signed-off-by:
Thomas Gleixner <tglx@linutronix.de> Tested-by:
Mika Westerberg <mika.westerberg@linux.intel.com> Cc: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Grant Likely <grant.likely@linaro.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Krogerus Heikki <heikki.krogerus@intel.com> Cc: Linus Walleij <linus.walleij@linaro.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de Signed-off-by:
Thomas Gleixner <tglx@linutronix.de>
-
Bandan Das authored
We track shadow vmcs fields through two static lists, one for read only and another for r/w fields. However, with addition of new vmcs fields, not all fields may be supported on all hosts. If so, copy_vmcs12_to_shadow() trying to vmwrite on unsupported hosts will result in a vmwrite error. For example, commit 36be0b9d introduced GUEST_BNDCFGS, which is not supported by all processors. Filter out host unsupported fields before letting guests use shadow vmcs Signed-off-by:
Bandan Das <bsd@redhat.com> Signed-off-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Oren Twaig authored
Correct IRQ routing in case a vSMP box is detected but the Interrupt Routing Comply (IRC) value is set to "comply", which leads to incorrect IRQ routing. Before the patch: When a vSMP box was detected and IRC was set to "comply", users (and the kernel) couldn't effectively set the destination of the IRQs. This is because the hook inside vsmp_64.c always setup all CPUs as the IRQ destination using cpumask_setall() as the return value for IRQ allocation mask. Later, this "overrided" mask caused the kernel to set the IRQ destination to the lowest online CPU in the mask (CPU0 usually). After the patch: When the IRC is set to "comply", users (and the kernel) can control the destination of the IRQs as we will not be changing the default "apic->vector_allocation_domain". Signed-off-by:
Oren Twaig <oren@scalemp.com> Acked-by:
Shai Fultheim <shai@scalemp.com> Link: http://lkml.kernel.org/r/1398669697-2123-1-git-send-email-oren@scalemp.com [ Minor readability edits. ] Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Apr 24, 2014
-
-
Stephane Eranian authored
This patch fixes a bug introduced by: 24223657 ("perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU") The rdmsrl_safe() function returns 0 on success. The current code was failing to detect the RAPL PMU on real hardware (missing /sys/devices/power) because the return value of rdmsrl_safe() was misinterpreted. Signed-off-by:
Stephane Eranian <eranian@google.com> Acked-by:
Borislav Petkov <bp@suse.de> Acked-by:
Venkatesh Srinivas <venkateshs@google.com> Cc: peterz@infradead.org Cc: zheng.z.yan@intel.com Link: http://lkml.kernel.org/r/20140423170418.GA12767@quad Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Apr 22, 2014
-
-
Behan Webster authored
Wrap -mno-80387 gcc options with cc-option so they don't break clang. Signed-off-by:
Behan Webster <behanw@converseincode.com> Cc: torvalds@linux-foundation.org Cc: dwmw2@infradead.org Cc: pageexec@freemail.hu Link: http://lkml.kernel.org/r/1398145227-25053-1-git-send-email-behanw@converseincode.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Apr 18, 2014
-
-
Venkatesh Srinivas authored
CPUs which should support the RAPL counters according to Family/Model/Stepping may still issue #GP when attempting to access the RAPL MSRs. This may happen when Linux is running under KVM and we are passing-through host F/M/S data, for example. Use rdmsrl_safe to first access the RAPL_POWER_UNIT MSR; if this fails, do not attempt to use this PMU. Signed-off-by:
Venkatesh Srinivas <venkateshs@google.com> Signed-off-by:
Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394739386-22260-1-git-send-email-venkateshs@google.com Cc: zheng.z.yan@intel.com Cc: eranian@google.com Cc: ak@linux.intel.com Cc: linux-kernel@vger.kernel.org [ The patch also silently fixes another bug: rapl_pmu_init() didn't handle the memory alloc failure case previously. ] Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Apr 17, 2014
-
-
Masami Hiramatsu authored
Current kprobes in-kernel page fault handler doesn't expect that its single-stepping can be interrupted by an NMI handler which may cause a page fault(e.g. perf with callback tracing). In that case, the page-fault handled by kprobes and it misunderstands the page-fault has been caused by the single-stepping code and tries to recover IP address to probed address. But the truth is the page-fault has been caused by the NMI handler, and do_page_fault failes to handle real page fault because the IP address is modified and causes Kernel BUGs like below. ---- [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020 [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40 To handle this correctly, I fixed the kprobes fault handler to ensure the faulted ip address is its own single-step buffer instead of checking current kprobe state. Signed-off-by:
Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: fche@redhat.com Cc: systemtap@sourceware.org Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
Ingo Molnar authored
The following commit: 27f6c573 ("x86, CMCI: Add proper detection of end of CMCI storms") Added two preemption bugs: - machine_check_poll() does a get_cpu_var() without a matching put_cpu_var(), which causes preemption imbalance and crashes upon bootup. - it does percpu ops without disabling preemption. Preemption is not disabled due to the mistaken use of a raw spinlock. To fix these bugs fix the imbalance and change cmci_discover_lock to a regular spinlock. Reported-by:
Owen Kibel <qmewlo@gmail.com> Reported-by:
Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by:
Ingo Molnar <mingo@kernel.org> Cc: Chen, Gong <gong.chen@linux.intel.com> Cc: Josh Boyer <jwboyer@fedoraproject.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Alexander Todorov <atodorov@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org Signed-off-by:
Ingo Molnar <mingo@kernel.org> -- arch/x86/kernel/cpu/mcheck/mce.c | 4 +--- arch/x86/kernel/cpu/mcheck/mce_intel.c | 18 +++++++++--------- 2 files changed, 10 insertions(+), 12 deletions(-)
-
- Apr 16, 2014
-
-
Ingo Molnar authored
Steve reported a reboot hang and bisected it back to this commit: a4f1987e x86, reboot: Add EFI and CF9 reboot methods into the default list He heroically tested all reboot methods and found the following: reboot=t # triple fault ok reboot=k # keyboard ctrl FAIL reboot=b # BIOS ok reboot=a # ACPI FAIL reboot=e # EFI FAIL [system has no EFI] reboot=p # PCI 0xcf9 FAIL And I think it's pretty obvious that we should only try PCI 0xcf9 as a last resort - if at all. The other observation is that (on this box) we should never try the PCI reboot method, but close with either the 'triple fault' or the 'BIOS' (terminal!) reboot methods. Thirdly, CF9_COND is a total misnomer - it should be something like CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ... So this patch fixes the worst problems: - it orders the actual reboot logic to follow the reboot ordering pattern - it was in a pretty random order before for no good reason. - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and BOOT_CF9_SAFE flags to make the code more obvious. - it tries the BIOS reboot method before the PCI reboot method. (Since 'BIOS' is a terminal reboot method resulting in a hang if it does not work, this is essentially equivalent to removing the PCI reboot method from the default reboot chain.) - just for the miraculous possibility of terminal (resulting in hang) reboot methods of triple fault or BIOS returning without having done their job, there's an ordering between them as well. Reported-and-bisected-and-tested-by:
Steven Rostedt <rostedt@goodmis.org> Cc: Li Aubrey <aubrey.li@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Apr 15, 2014
-
-
Konrad Rzeszutek Wilk authored
The git commit a945928e ('xen: Do not enable spinlocks before jump_label_init() has executed') was added to deal with the jump machinery. Earlier the code that turned on the jump label was only called by Xen specific functions. But now that it had been moved to the initcall machinery it gets called on Xen, KVM, and baremetal - ouch!. And the detection machinery to only call it on Xen wasn't remembered in the heat of merge window excitement. This means that the slowpath is enabled on baremetal while it should not be. Reported-by:
Waiman Long <waiman.long@hp.com> Acked-by:
Steven Rostedt <rostedt@goodmis.org> CC: stable@vger.kernel.org CC: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by:
Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com>
-
Boris Ostrovsky authored
Commit 198d208d ("x86: Keep thread_info on thread stack in x86_32") made 32-bit kernels use kernel_stack to point to thread_info. That change missed a couple of updates needed by Xen's 32-bit PV guests: 1. kernel_stack needs to be initialized for secondary CPUs 2. GET_THREAD_INFO() now uses %fs register which may not be the kernel's version when executing xen_iret(). With respect to the second issue, we don't need GET_THREAD_INFO() anymore: we used it as an intermediate step to get to per_cpu xen_vcpu and avoid referencing %fs. Now that we are going to use %fs anyway we may as well go directly to xen_vcpu. Signed-off-by:
Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by:
David Vrabel <david.vrabel@citrix.com>
-
- Apr 14, 2014
-
-
Marcelo Tosatti authored
Function and callers can be preempted. https://bugzilla.kernel.org/show_bug.cgi?id=73721 Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Reviewed-by:
Paolo Bonzini <pbonzini@redhat.com>
-
Feng Wu authored
Rename variable smep to cr4_smep, which can better reflect the meaning of the variable. Signed-off-by:
Feng Wu <feng.wu@intel.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Feng Wu authored
This patch exposes SMAP feature to guest Signed-off-by:
Feng Wu <feng.wu@intel.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Feng Wu authored
SMAP is disabled if CPU is in non-paging mode in hardware. However KVM always uses paging mode to emulate guest non-paging mode with TDP. To emulate this behavior, SMAP needs to be manually disabled when guest switches to non-paging mode. Signed-off-by:
Feng Wu <feng.wu@intel.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-