Skip to content
Snippets Groups Projects
  1. Jun 05, 2014
    • Igor Mammedov's avatar
      x86/smpboot: Initialize secondary CPU only if master CPU will wait for it · 3e1a878b
      Igor Mammedov authored
      Hang is observed on virtual machines during CPU hotplug,
      especially in big guests with many CPUs. (It reproducible
      more often if host is over-committed).
      
      It happens because master CPU gives up waiting on
      secondary CPU and allows it to run wild. As result
      AP causes locking or crashing system. For example
      as described here:
      
         https://lkml.org/lkml/2014/3/6/257
      
      
      
      If master CPU have sent STARTUP IPI successfully,
      and AP signalled to master CPU that it's ready
      to start initialization, make master CPU wait
      indefinitely till AP is onlined.
      To ensure that AP won't ever run wild, make it
      wait at early startup till master CPU confirms its
      intention to wait for AP. If AP doesn't respond in 10
      seconds, the master CPU will timeout and cancel
      AP onlining.
      
      Signed-off-by: default avatarIgor Mammedov <imammedo@redhat.com>
      Acked-by: default avatarToshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1401975765-22328-4-git-send-email-imammedo@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      3e1a878b
    • Igor Mammedov's avatar
      x86/smpboot: Log error on secondary CPU wakeup failure at ERR level · feef1e8e
      Igor Mammedov authored
      
      If system is running without debug level logging,
      it will not log error if do_boot_cpu() failed to
      wakeup AP. It may lead to silent AP bringup
      failures at boot time.
      Change message level to KERN_ERR to make error
      visible to user as it's done on other architectures.
      
      Signed-off-by: default avatarIgor Mammedov <imammedo@redhat.com>
      Acked-by: default avatarToshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1401975765-22328-3-git-send-email-imammedo@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      feef1e8e
    • Igor Mammedov's avatar
      x86: Fix list/memory corruption on CPU hotplug · 89f898c1
      Igor Mammedov authored
      
      currently if AP wake up is failed, master CPU marks AP as not
      present in do_boot_cpu() by calling set_cpu_present(cpu, false).
      That leads to following list corruption on the next physical CPU
      hotplug:
      
      [  418.107336] WARNING: CPU: 1 PID: 45 at lib/list_debug.c:33 __list_add+0xbe/0xd0()
      [  418.115268] list_add corruption. prev->next should be next (ffff88003dc57600), but was ffff88003e20c3a0. (prev=ffff88003e20c3a0).
      [  418.123693] Modules linked in: nf_conntrack_netbios_ns nf_conntrack_broadcast ipt_MASQUERADE ip6t_REJECT ipt_REJECT cfg80211 xt_conntrack rfkill ee
      [  418.138979] CPU: 1 PID: 45 Comm: kworker/u10:1 Not tainted 3.14.0-rc6+ #387
      [  418.149989] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      [  418.165750] Workqueue: kacpi_hotplug acpi_hotplug_work_fn
      [  418.166433]  0000000000000021 ffff880038ca7988 ffffffff8159b22d 0000000000000021
      [  418.176460]  ffff880038ca79d8 ffff880038ca79c8 ffffffff8106942c ffff880038ca79e8
      [  418.177453]  ffff88003e20c3a0 ffff88003dc57600 ffff88003e20c3a0 00000000ffffffea
      [  418.178445] Call Trace:
      [  418.185811]  [<ffffffff8159b22d>] dump_stack+0x49/0x5c
      [  418.186440]  [<ffffffff8106942c>] warn_slowpath_common+0x8c/0xc0
      [  418.187192]  [<ffffffff81069516>] warn_slowpath_fmt+0x46/0x50
      [  418.191231]  [<ffffffff8136ef51>] ? acpi_ns_get_node+0xb7/0xc7
      [  418.193889]  [<ffffffff812f796e>] __list_add+0xbe/0xd0
      [  418.196649]  [<ffffffff812e2aa9>] kobject_add_internal+0x79/0x200
      [  418.208610]  [<ffffffff812e2e18>] kobject_add_varg+0x38/0x60
      [  418.213831]  [<ffffffff812e2ef4>] kobject_add+0x44/0x70
      [  418.229961]  [<ffffffff813e2c60>] device_add+0xd0/0x550
      [  418.234991]  [<ffffffff813f0e95>] ? pm_runtime_init+0xe5/0xf0
      [  418.250226]  [<ffffffff813e32be>] device_register+0x1e/0x30
      [  418.255296]  [<ffffffff813e82a3>] register_cpu+0xe3/0x130
      [  418.266539]  [<ffffffff81592be5>] arch_register_cpu+0x65/0x150
      [  418.285845]  [<ffffffff81355c0d>] acpi_processor_hotadd_init+0x5a/0x9b
      ...
      Which is caused by the fact that generic_processor_info() allocates
      logical CPU id by calling:
      
       cpu = cpumask_next_zero(-1, cpu_present_mask);
      
      which returns id of previously failed to wake up CPU, since its
      bit is cleared by do_boot_cpu() and as result register_cpu()
      tries to register another CPU with the same id as already
      present but failed to be onlined CPU.
      
      Taking in account that AP will not do anything if master CPU
      failed to wake it up, there is no reason to mark that AP as not
      present and break next cpu hotplug attempts. As a side effect of
      not marking AP as not present, user would be allowed to online
      it again later.
      
      Also fix memory corruption in acpi_unmap_lsapic()
      
      if during CPU hotplug master CPU failed to wake up AP
      it set percpu x86_cpu_to_apicid to BAD_APICID=0xFFFF for AP.
      
      However following attempt to unplug that CPU will lead to
      out of bound write access to __apicid_to_node[] which is
      32768 items long on x86_64 kernel.
      
      So with above fix of cpu_present_mask make sure that a present
      CPU has a valid APIC ID by not setting x86_cpu_to_apicid
      to BAD_APICID in do_boot_cpu() on failure and allow
      acpi_processor_remove()->acpi_unmap_lsapic() cleanly remove CPU.
      
      Signed-off-by: default avatarIgor Mammedov <imammedo@redhat.com>
      Acked-by: default avatarToshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1401975765-22328-2-git-send-email-imammedo@redhat.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      89f898c1
  2. Jun 04, 2014
    • Yinghai Lu's avatar
      x86: irq: Get correct available vectors for cpu disable · ac2a5539
      Yinghai Lu authored
      
      check_irq_vectors_for_cpu_disable() can overestimate the number of
      available interrupt vectors, so the check for cpu down succeeds, but
      the actual cpu removal fails.
      
      It iterates from FIRST_EXTERNAL_VECTOR to NR_VECTORS, which is wrong
      because the systems vectors are not taken into account.
      
      Limit the search to first_system_vector instead of NR_VECTORS.
      
      The second indicator for vector availability the used_vectors bitmap
      is not taken into account at all. So system vectors,
      e.g. IA32_SYSCALL_VECTOR (0x80) and IRQ_MOVE_CLEANUP_VECTOR (0x20),
      are accounted as available.
      
      Add a check for the used_vectors bitmap and do not account vectors
      which are marked there.
      
      [ tglx: Simplified code. Rewrote changelog and code comments. ]
      
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Acked-by: default avatarPrarit Bhargava <prarit@redhat.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: K. Y. Srinivasan <kys@microsoft.com>
      Cc: Steven Rostedt (Red Hat) <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Elliott, Robert (Server Storage)" <Elliott@hp.com>
      Cc: x86@kernel.org
      Link: http://lkml.kernel.org/r/1400160305-17774-2-git-send-email-prarit@redhat.com
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      ac2a5539
  3. Jun 02, 2014
  4. May 30, 2014
    • Minchan Kim's avatar
      x86_64: expand kernel stack to 16K · 6538b8ea
      Minchan Kim authored
      
      While I play inhouse patches with much memory pressure on qemu-kvm,
      3.14 kernel was randomly crashed. The reason was kernel stack overflow.
      
      When I investigated the problem, the callstack was a little bit deeper
      by involve with reclaim functions but not direct reclaim path.
      
      I tried to diet stack size of some functions related with alloc/reclaim
      so did a hundred of byte but overflow was't disappeard so that I encounter
      overflow by another deeper callstack on reclaim/allocator path.
      
      Of course, we might sweep every sites we have found for reducing
      stack usage but I'm not sure how long it saves the world(surely,
      lots of developer start to add nice features which will use stack
      agains) and if we consider another more complex feature in I/O layer
      and/or reclaim path, it might be better to increase stack size(
      meanwhile, stack usage on 64bit machine was doubled compared to 32bit
      while it have sticked to 8K. Hmm, it's not a fair to me and arm64
      already expaned to 16K. )
      
      So, my stupid idea is just let's expand stack size and keep an eye
      toward stack consumption on each kernel functions via stacktrace of ftrace.
      For example, we can have a bar like that each funcion shouldn't exceed 200K
      and emit the warning when some function consumes more in runtime.
      Of course, it could make false positive but at least, it could make a
      chance to think over it.
      
      I guess this topic was discussed several time so there might be
      strong reason not to increase kernel stack size on x86_64, for me not
      knowing so Ccing x86_64 maintainers, other MM guys and virtio
      maintainers.
      
      Here's an example call trace using up the kernel stack:
      
               Depth    Size   Location    (51 entries)
               -----    ----   --------
         0)     7696      16   lookup_address
         1)     7680      16   _lookup_address_cpa.isra.3
         2)     7664      24   __change_page_attr_set_clr
         3)     7640     392   kernel_map_pages
         4)     7248     256   get_page_from_freelist
         5)     6992     352   __alloc_pages_nodemask
         6)     6640       8   alloc_pages_current
         7)     6632     168   new_slab
         8)     6464       8   __slab_alloc
         9)     6456      80   __kmalloc
        10)     6376     376   vring_add_indirect
        11)     6000     144   virtqueue_add_sgs
        12)     5856     288   __virtblk_add_req
        13)     5568      96   virtio_queue_rq
        14)     5472     128   __blk_mq_run_hw_queue
        15)     5344      16   blk_mq_run_hw_queue
        16)     5328      96   blk_mq_insert_requests
        17)     5232     112   blk_mq_flush_plug_list
        18)     5120     112   blk_flush_plug_list
        19)     5008      64   io_schedule_timeout
        20)     4944     128   mempool_alloc
        21)     4816      96   bio_alloc_bioset
        22)     4720      48   get_swap_bio
        23)     4672     160   __swap_writepage
        24)     4512      32   swap_writepage
        25)     4480     320   shrink_page_list
        26)     4160     208   shrink_inactive_list
        27)     3952     304   shrink_lruvec
        28)     3648      80   shrink_zone
        29)     3568     128   do_try_to_free_pages
        30)     3440     208   try_to_free_pages
        31)     3232     352   __alloc_pages_nodemask
        32)     2880       8   alloc_pages_current
        33)     2872     200   __page_cache_alloc
        34)     2672      80   find_or_create_page
        35)     2592      80   ext4_mb_load_buddy
        36)     2512     176   ext4_mb_regular_allocator
        37)     2336     128   ext4_mb_new_blocks
        38)     2208     256   ext4_ext_map_blocks
        39)     1952     160   ext4_map_blocks
        40)     1792     384   ext4_writepages
        41)     1408      16   do_writepages
        42)     1392      96   __writeback_single_inode
        43)     1296     176   writeback_sb_inodes
        44)     1120      80   __writeback_inodes_wb
        45)     1040     160   wb_writeback
        46)      880     208   bdi_writeback_workfn
        47)      672     144   process_one_work
        48)      528     112   worker_thread
        49)      416     240   kthread
        50)      176     176   ret_from_fork
      
      [ Note: the problem is exacerbated by certain gcc versions that seem to
        generate much bigger stack frames due to apparently bad coalescing of
        temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
        35% more stack on the virtio path than 4.8.2 does, for example.
      
        Minchan not only uses such a bad gcc version (4.6.3 in his case), but
        some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
        what causes that kernel_map_pages() frame, for example). But we're
        clearly getting too close.
      
        The VM code also seems to have excessive stack frames partly for the
        same compiler reason, triggered by excessive inlining and lots of
        function arguments.
      
        We need to improve on our stack use, but in the meantime let's do this
        simple stack increase too.  Unlike most earlier reports, there is
        nothing simple that stands out as being really horribly wrong here,
        apart from the fact that the stack frames are just bigger than they
        should need to be.        - Linus ]
      
      Signed-off-by: default avatarMinchan Kim <minchan@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S Tsirkin <mst@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      6538b8ea
  5. May 22, 2014
  6. May 15, 2014
    • Linus Torvalds's avatar
      x86-64, modify_ldt: Make support for 16-bit segments a runtime option · fa81511b
      Linus Torvalds authored
      
      Checkin:
      
      b3b42ac2 x86-64, modify_ldt: Ban 16-bit segments on 64-bit kernels
      
      disabled 16-bit segments on 64-bit kernels due to an information
      leak.  However, it does seem that people are genuinely using Wine to
      run old 16-bit Windows programs on Linux.
      
      A proper fix for this ("espfix64") is coming in the upcoming merge
      window, but as a temporary fix, create a sysctl to allow the
      administrator to re-enable support for 16-bit segments.
      
      It adds a "/proc/sys/abi/ldt16" sysctl that defaults to zero (off). If
      you hit this issue and care about your old Windows program more than
      you care about a kernel stack address information leak, you can do
      
         echo 1 > /proc/sys/abi/ldt16
      
      as root (add it to your startup scripts), and you should be ok.
      
      The sysctl table is only added if you have COMPAT support enabled on
      x86-64, but I assume anybody who runs old windows binaries very much
      does that ;)
      
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/r/CA%2B55aFw9BPoD10U1LfHbOMpHWZkvJTkMcfCs9s3urPr1YyWBxw@mail.gmail.com
      Cc: <stable@vger.kernel.org>
      fa81511b
  7. May 14, 2014
  8. May 12, 2014
  9. May 09, 2014
  10. May 08, 2014
  11. May 07, 2014
  12. May 06, 2014
  13. May 03, 2014
    • Dave Young's avatar
      x86/efi: earlyprintk=efi,keep fix · 5f35eb0e
      Dave Young authored
      
      earlyprintk=efi,keep will cause kernel hangs while freeing initmem like
      below:
      
        VFS: Mounted root (ext4 filesystem) readonly on device 254:2.
        devtmpfs: mounted
        Freeing unused kernel memory: 880K (ffffffff817d4000 - ffffffff818b0000)
      
      It is caused by efi earlyprintk use __init function which will be freed
      later.  Such as early_efi_write is marked as __init, also it will use
      early_ioremap which is init function as well.
      
      To fix this issue, I added early initcall early_efi_map_fb which maps
      the whole efi fb for later use. OTOH, adding a wrapper function
      early_efi_map which calls early_ioremap before ioremap is available.
      
      With this patch applied efi boot ok with earlyprintk=efi,keep console=efi
      
      Signed-off-by: default avatarDave Young <dyoung@redhat.com>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      5f35eb0e
  14. May 02, 2014
    • Dave Young's avatar
      x86/efi: earlyprintk=efi,keep fix · 34f51147
      Dave Young authored
      
      earlyprintk=efi,keep will cause kernel hangs while freeing initmem like
      below:
      
        VFS: Mounted root (ext4 filesystem) readonly on device 254:2.
        devtmpfs: mounted
        Freeing unused kernel memory: 880K (ffffffff817d4000 - ffffffff818b0000)
      
      It is caused by efi earlyprintk use __init function which will be freed
      later.  Such as early_efi_write is marked as __init, also it will use
      early_ioremap which is init function as well.
      
      To fix this issue, I added early initcall early_efi_map_fb which maps
      the whole efi fb for later use. OTOH, adding a wrapper function
      early_efi_map which calls early_ioremap before ioremap is available.
      
      With this patch applied efi boot ok with earlyprintk=efi,keep console=efi
      
      Signed-off-by: default avatarDave Young <dyoung@redhat.com>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      34f51147
  15. Apr 28, 2014
    • Thomas Gleixner's avatar
      genirq: x86: Ensure that dynamic irq allocation does not conflict · 62a08ae2
      Thomas Gleixner authored
      
      On x86 the allocation of irq descriptors may allocate interrupts which
      are in the range of the GSI interrupts. That's wrong as those
      interrupts are hardwired and we don't have the irq domain translation
      like PPC. So one of these interrupts can be hooked up later to one of
      the devices which are hard wired to it and the io_apic init code for
      that particular interrupt line happily reuses that descriptor with a
      completely different configuration so hell breaks lose.
      
      Inside x86 we allocate dynamic interrupts from above nr_gsi_irqs,
      except for a few usage sites which have not yet blown up in our face
      for whatever reason. But for drivers which need an irq range, like the
      GPIO drivers, we have no limit in place and we don't want to expose
      such a detail to a driver.
      
      To cure this introduce a function which an architecture can implement
      to impose a lower bound on the dynamic interrupt allocations.
      
      Implement it for x86 and set the lower bound to nr_gsi_irqs, which is
      the end of the hardwired interrupt space, so all dynamic allocations
      happen above.
      
      That not only allows the GPIO driver to work sanely, it also protects
      the bogus callsites of create_irq_nr() in hpet, uv, irq_remapping and
      htirq code. They need to be cleaned up as well, but that's a separate
      issue.
      
      Reported-by: default avatarJin Yao <yao.jin@linux.intel.com>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      Tested-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
      Cc: Mathias Nyman <mathias.nyman@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Krogerus Heikki <heikki.krogerus@intel.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1404241617360.28206@ionos.tec.linutronix.de
      
      
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      62a08ae2
    • Bandan Das's avatar
      KVM: x86: Check for host supported fields in shadow vmcs · fe2b201b
      Bandan Das authored
      
      We track shadow vmcs fields through two static lists,
      one for read only and another for r/w fields. However, with
      addition of new vmcs fields, not all fields may be supported on
      all hosts. If so, copy_vmcs12_to_shadow() trying to vmwrite on
      unsupported hosts will result in a vmwrite error. For example, commit
      36be0b9d introduced GUEST_BNDCFGS, which is not supported
      by all processors. Filter out host unsupported fields before
      letting guests use shadow vmcs
      
      Signed-off-by: default avatarBandan Das <bsd@redhat.com>
      Signed-off-by: default avatarPaolo Bonzini <pbonzini@redhat.com>
      fe2b201b
    • Oren Twaig's avatar
      x86/vsmp: Fix irq routing · 39025ba3
      Oren Twaig authored
      
      Correct IRQ routing in case a vSMP box is detected
      but the  Interrupt Routing Comply (IRC) value is set to
      "comply", which leads to incorrect IRQ routing.
      
      Before the patch:
      
      When a vSMP box was detected and IRC was set to "comply",
      users (and the kernel) couldn't effectively set the
      destination of the IRQs. This is because the hook inside
      vsmp_64.c always setup all CPUs as the IRQ destination using
      cpumask_setall() as the return value for IRQ allocation mask.
      Later, this "overrided" mask caused the kernel to set the IRQ
      destination to the lowest online CPU in the mask (CPU0 usually).
      
      After the patch:
      
      When the IRC is set to "comply", users (and the kernel) can control
      the destination of the IRQs as we will not be changing the
      default "apic->vector_allocation_domain".
      
      Signed-off-by: default avatarOren Twaig <oren@scalemp.com>
      Acked-by: default avatarShai Fultheim <shai@scalemp.com>
      Link: http://lkml.kernel.org/r/1398669697-2123-1-git-send-email-oren@scalemp.com
      
      
      [ Minor readability edits. ]
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      39025ba3
  16. Apr 24, 2014
  17. Apr 22, 2014
  18. Apr 18, 2014
  19. Apr 17, 2014
    • Masami Hiramatsu's avatar
      kprobes/x86: Fix page-fault handling logic · 6381c24c
      Masami Hiramatsu authored
      
      Current kprobes in-kernel page fault handler doesn't
      expect that its single-stepping can be interrupted by
      an NMI handler which may cause a page fault(e.g. perf
      with callback tracing).
      
      In that case, the page-fault handled by kprobes and it
      misunderstands the page-fault has been caused by the
      single-stepping code and tries to recover IP address
      to probed address.
      
      But the truth is the page-fault has been caused by the
      NMI handler, and do_page_fault failes to handle real
      page fault because the IP address is modified and
      causes Kernel BUGs like below.
      
       ----
       [ 2264.726905] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
       [ 2264.727190] IP: [<ffffffff813c46e0>] copy_user_generic_string+0x0/0x40
      
      To handle this correctly, I fixed the kprobes fault
      handler to ensure the faulted ip address is its own
      single-step buffer instead of checking current kprobe
      state.
      
      Signed-off-by: default avatarMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
      Cc: Sandeepa Prabhu <sandeepa.prabhu@linaro.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: fche@redhat.com
      Cc: systemtap@sourceware.org
      Link: http://lkml.kernel.org/r/20140417081644.26341.52351.stgit@ltc230.yrl.intra.hitachi.co.jp
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      6381c24c
    • Ingo Molnar's avatar
      x86/mce: Fix CMCI preemption bugs · ea431643
      Ingo Molnar authored
      
      The following commit:
      
        27f6c573 ("x86, CMCI: Add proper detection of end of CMCI storms")
      
      Added two preemption bugs:
      
       - machine_check_poll() does a get_cpu_var() without a matching
         put_cpu_var(), which causes preemption imbalance and crashes upon
         bootup.
      
       - it does percpu ops without disabling preemption. Preemption is not
         disabled due to the mistaken use of a raw spinlock.
      
      To fix these bugs fix the imbalance and change
      cmci_discover_lock to a regular spinlock.
      
      Reported-by: default avatarOwen Kibel <qmewlo@gmail.com>
      Reported-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      Cc: Chen, Gong <gong.chen@linux.intel.com>
      Cc: Josh Boyer <jwboyer@fedoraproject.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Alexander Todorov <atodorov@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Link: http://lkml.kernel.org/n/tip-jtjptvgigpfkpvtQxpEk1at2@git.kernel.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      --
       arch/x86/kernel/cpu/mcheck/mce.c       |    4 +---
       arch/x86/kernel/cpu/mcheck/mce_intel.c |   18 +++++++++---------
       2 files changed, 10 insertions(+), 12 deletions(-)
      ea431643
  20. Apr 16, 2014
    • Ingo Molnar's avatar
      x86: Remove the PCI reboot method from the default chain · 5be44a6f
      Ingo Molnar authored
      
      Steve reported a reboot hang and bisected it back to this commit:
      
        a4f1987e x86, reboot: Add EFI and CF9 reboot methods into the default list
      
      He heroically tested all reboot methods and found the following:
      
        reboot=t       # triple fault                  ok
        reboot=k       # keyboard ctrl                 FAIL
        reboot=b       # BIOS                          ok
        reboot=a       # ACPI                          FAIL
        reboot=e       # EFI                           FAIL   [system has no EFI]
        reboot=p       # PCI 0xcf9                     FAIL
      
      And I think it's pretty obvious that we should only try PCI 0xcf9 as a
      last resort - if at all.
      
      The other observation is that (on this box) we should never try
      the PCI reboot method, but close with either the 'triple fault'
      or the 'BIOS' (terminal!) reboot methods.
      
      Thirdly, CF9_COND is a total misnomer - it should be something like
      CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ...
      
      So this patch fixes the worst problems:
      
       - it orders the actual reboot logic to follow the reboot ordering
         pattern - it was in a pretty random order before for no good
         reason.
      
       - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and
         BOOT_CF9_SAFE flags to make the code more obvious.
      
       - it tries the BIOS reboot method before the PCI reboot method.
         (Since 'BIOS' is a terminal reboot method resulting in a hang
          if it does not work, this is essentially equivalent to removing
          the PCI reboot method from the default reboot chain.)
      
       - just for the miraculous possibility of terminal (resulting
         in hang) reboot methods of triple fault or BIOS returning
         without having done their job, there's an ordering between
         them as well.
      
      Reported-and-bisected-and-tested-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      Cc: Li Aubrey <aubrey.li@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      5be44a6f
  21. Apr 15, 2014
  22. Apr 14, 2014
Loading