Skip to content
Snippets Groups Projects
  1. Feb 13, 2013
    • Mel Gorman's avatar
      x86/mm: Check if PUD is large when validating a kernel address · 0ee364eb
      Mel Gorman authored
      
      A user reported the following oops when a backup process reads
      /proc/kcore:
      
       BUG: unable to handle kernel paging request at ffffbb00ff33b000
       IP: [<ffffffff8103157e>] kern_addr_valid+0xbe/0x110
       [...]
      
       Call Trace:
        [<ffffffff811b8aaa>] read_kcore+0x17a/0x370
        [<ffffffff811ad847>] proc_reg_read+0x77/0xc0
        [<ffffffff81151687>] vfs_read+0xc7/0x130
        [<ffffffff811517f3>] sys_read+0x53/0xa0
        [<ffffffff81449692>] system_call_fastpath+0x16/0x1b
      
      Investigation determined that the bug triggered when reading
      system RAM at the 4G mark. On this system, that was the first
      address using 1G pages for the virt->phys direct mapping so the
      PUD is pointing to a physical address, not a PMD page.
      
      The problem is that the page table walker in kern_addr_valid() is
      not checking pud_large() and treats the physical address as if
      it was a PMD.  If it happens to look like pmd_none then it'll
      silently fail, probably returning zeros instead of real data. If
      the data happens to look like a present PMD though, it will be
      walked resulting in the oops above.
      
      This patch adds the necessary pud_large() check.
      
      Unfortunately the problem was not readily reproducible and now
      they are running the backup program without accessing
      /proc/kcore so the patch has not been validated but I think it
      makes sense.
      
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Reviewed-by: default avatarRik van Riel <riel@redhat.coM>
      Reviewed-by: default avatarMichal Hocko <mhocko@suse.cz>
      Acked-by: default avatarJohannes Weiner <hannes@cmpxchg.org>
      Cc: stable@vger.kernel.org
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20130211145236.GX21389@suse.de
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      0ee364eb
  2. Feb 11, 2013
    • Stoney Wang's avatar
      x86/apic: Work around boot failure on HP ProLiant DL980 G7 Server systems · cb214ede
      Stoney Wang authored
      
      When a HP ProLiant DL980 G7 Server boots a regular kernel,
      there will be intermittent lost interrupts which could
      result in a hang or (in extreme cases) data loss.
      
      The reason is that this system only supports x2apic physical
      mode, while the kernel boots with a logical-cluster default
      setting.
      
      This bug can be worked around by specifying the "x2apic_phys" or
      "nox2apic" boot option, but we want to handle this system
      without requiring manual workarounds.
      
      The BIOS sets ACPI_FADT_APIC_PHYSICAL in FADT table.
      As all apicids are smaller than 255, BIOS need to pass the
      control to the OS with xapic mode, according to x2apic-spec,
      chapter 2.9.
      
      Current code handle x2apic when BIOS pass with xapic mode
      enabled:
      
      When user specifies x2apic_phys, or FADT indicates PHYSICAL:
      
      1. During madt oem check, apic driver is set with xapic logical
         or xapic phys driver at first.
      
      2. enable_IR_x2apic() will enable x2apic_mode.
      
      3. if user specifies x2apic_phys on the boot line, x2apic_phys_probe()
         will install the correct x2apic phys driver and use x2apic phys mode.
         Otherwise it will skip the driver will let x2apic_cluster_probe to
         take over to install x2apic cluster driver (wrong one) even though FADT
         indicates PHYSICAL, because x2apic_phys_probe does not check
         FADT PHYSICAL.
      
      Add checking x2apic_fadt_phys in x2apic_phys_probe() to fix the
      problem.
      
      Signed-off-by: default avatarStoney Wang <song-bo.wang@hp.com>
      [ updated the changelog and simplified the code ]
      Signed-off-by: default avatarYinghai Lu <yinghai@kernel.org>
      Cc: stable@kernel.org
      Link: http://lkml.kernel.org/r/1360263182-16226-1-git-send-email-yinghai@kernel.org
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      cb214ede
  3. Feb 07, 2013
  4. Feb 04, 2013
    • Borislav Petkov's avatar
      x86/intel/cacheinfo: Shut up annoying warning · f76e39c5
      Borislav Petkov authored
      
      I've been getting the following warning when doing randbuilds
      since forever. Now it finally pissed me off just the perfect
      amount so that I can fix it.
      
        arch/x86/kernel/cpu/intel_cacheinfo.c:489:27: warning: ‘cache_disable_0’ defined but not used [-Wunused-variable]
        arch/x86/kernel/cpu/intel_cacheinfo.c:491:27: warning: ‘cache_disable_1’ defined but not used [-Wunused-variable] arch/x86/kernel/cpu/intel_cacheinfo.c:524:27: warning: ‘subcaches’ defined but not used [-Wunused-variable]
      
      It happens because in randconfigs where CONFIG_SYSFS is not set,
      the whole sysfs-interface to L3 cache index disabling is
      remaining unused and gcc correctly warns about it. Make it
      optional, depending on CONFIG_SYSFS too, as is the case with
      other sysfs-related machinery in this file.
      
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Link: http://lkml.kernel.org/r/1359969195-27362-1-git-send-email-bp@alien8.de
      
      
      Signed-off-by: default avatarIngo Molnar <mingo@kernel.org>
      f76e39c5
  5. Jan 31, 2013
  6. Jan 30, 2013
    • Matt Fleming's avatar
      efi: Make 'efi_enabled' a function to query EFI facilities · 83e68189
      Matt Fleming authored
      Originally 'efi_enabled' indicated whether a kernel was booted from
      EFI firmware. Over time its semantics have changed, and it now
      indicates whether or not we are booted on an EFI machine with
      bit-native firmware, e.g. 64-bit kernel with 64-bit firmware.
      
      The immediate motivation for this patch is the bug report at,
      
          https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557
      
      which details how running a platform driver on an EFI machine that is
      designed to run under BIOS can cause the machine to become
      bricked. Also, the following report,
      
          https://bugzilla.kernel.org/show_bug.cgi?id=47121
      
      
      
      details how running said driver can also cause Machine Check
      Exceptions. Drivers need a new means of detecting whether they're
      running on an EFI machine, as sadly the expression,
      
          if (!efi_enabled)
      
      hasn't been a sufficient condition for quite some time.
      
      Users actually want to query 'efi_enabled' for different reasons -
      what they really want access to is the list of available EFI
      facilities.
      
      For instance, the x86 reboot code needs to know whether it can invoke
      the ResetSystem() function provided by the EFI runtime services, while
      the ACPI OSL code wants to know whether the EFI config tables were
      mapped successfully. There are also checks in some of the platform
      driver code to simply see if they're running on an EFI machine (which
      would make it a bad idea to do BIOS-y things).
      
      This patch is a prereq for the samsung-laptop fix patch.
      
      Cc: David Airlie <airlied@linux.ie>
      Cc: Corentin Chary <corentincj@iksaif.net>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Dave Jiang <dave.jiang@intel.com>
      Cc: Olof Johansson <olof@lixom.net>
      Cc: Peter Jones <pjones@redhat.com>
      Cc: Colin Ian King <colin.king@canonical.com>
      Cc: Steve Langasek <steve.langasek@canonical.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Konrad Rzeszutek Wilk <konrad@kernel.org>
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      83e68189
  7. Jan 29, 2013
  8. Jan 28, 2013
  9. Jan 27, 2013
  10. Jan 25, 2013
  11. Jan 24, 2013
  12. Jan 22, 2013
    • Oleg Nesterov's avatar
      ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL · 9899d11f
      Oleg Nesterov authored
      
      putreg() assumes that the tracee is not running and pt_regs_access() can
      safely play with its stack.  However a killed tracee can return from
      ptrace_stop() to the low-level asm code and do RESTORE_REST, this means
      that debugger can actually read/modify the kernel stack until the tracee
      does SAVE_REST again.
      
      set_task_blockstep() can race with SIGKILL too and in some sense this
      race is even worse, the very fact the tracee can be woken up breaks the
      logic.
      
      As Linus suggested we can clear TASK_WAKEKILL around the arch_ptrace()
      call, this ensures that nobody can ever wakeup the tracee while the
      debugger looks at it.  Not only this fixes the mentioned problems, we
      can do some cleanups/simplifications in arch_ptrace() paths.
      
      Probably ptrace_unfreeze_traced() needs more callers, for example it
      makes sense to make the tracee killable for oom-killer before
      access_process_vm().
      
      While at it, add the comment into may_ptrace_stop() to explain why
      ptrace_stop() still can't rely on SIGKILL and signal_pending_state().
      
      Reported-by: default avatarSalman Qazi <sqazi@google.com>
      Reported-by: default avatarSuleiman Souhlal <suleiman@google.com>
      Suggested-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: default avatarOleg Nesterov <oleg@redhat.com>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9899d11f
  13. Jan 19, 2013
    • H. Peter Anvin's avatar
      x86-32: Start out cr0 clean, disable paging before modifying cr3/4 · 021ef050
      H. Peter Anvin authored
      
      Patch
      
        5a5a51db x86-32: Start out eflags and cr4 clean
      
      ... made x86-32 match x86-64 in that we initialize %eflags and %cr4
      from scratch.  This broke OLPC XO-1.5, because the XO enters the
      kernel with paging enabled, which the kernel doesn't expect.
      
      Since we no longer support 386 (the source of most of the variability
      in %cr0 configuration), we can simply match further x86-64 and
      initialize %cr0 to a fixed value -- the one variable part remaining in
      %cr0 is for FPU control, but all that is handled later on in
      initialization; in particular, configuring %cr0 as if the FPU is
      present until proven otherwise is correct and necessary for the probe
      to work.
      
      To deal with the XO case sanely, explicitly disable paging in %cr0
      before we muck with %cr3, %cr4 or EFER -- those operations are
      inherently unsafe with paging enabled.
      
      NOTE: There is still a lot of 386-related junk in head_32.S which we
      can and should get rid of, however, this is intended as a minimal fix
      whereas the cleanup can be deferred to the next merge window.
      
      Reported-by: default avatarAndres Salomon <dilinger@queued.net>
      Tested-by: default avatarDaniel Drake <dsd@laptop.org>
      Link: http://lkml.kernel.org/r/50FA0661.2060400@linux.intel.com
      
      
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      021ef050
  14. Jan 18, 2013
    • Nathan Zimmer's avatar
      efi, x86: Pass a proper identity mapping in efi_call_phys_prelog · b8f2c21d
      Nathan Zimmer authored
      
      Update efi_call_phys_prelog to install an identity mapping of all available
      memory.  This corrects a bug on very large systems with more then 512 GB in
      which bios would not be able to access addresses above not in the mapping.
      
      The result is a crash that looks much like this.
      
      BUG: unable to handle kernel paging request at 000000effd870020
      IP: [<0000000078bce331>] 0x78bce330
      PGD 0
      Oops: 0000 [#1] SMP
      Modules linked in:
      CPU 0
      Pid: 0, comm: swapper/0 Tainted: G        W    3.8.0-rc1-next-20121224-medusa_ntz+ #2 Intel Corp. Stoutland Platform
      RIP: 0010:[<0000000078bce331>]  [<0000000078bce331>] 0x78bce330
      RSP: 0000:ffffffff81601d28  EFLAGS: 00010006
      RAX: 0000000078b80e18 RBX: 0000000000000004 RCX: 0000000000000004
      RDX: 0000000078bcf958 RSI: 0000000000002400 RDI: 8000000000000000
      RBP: 0000000078bcf760 R08: 000000effd870000 R09: 0000000000000000
      R10: 0000000000000000 R11: 00000000000000c3 R12: 0000000000000030
      R13: 000000effd870000 R14: 0000000000000000 R15: ffff88effd870000
      FS:  0000000000000000(0000) GS:ffff88effe400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 000000effd870020 CR3: 000000000160c000 CR4: 00000000000006b0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper/0 (pid: 0, threadinfo ffffffff81600000, task ffffffff81614400)
      Stack:
       0000000078b80d18 0000000000000004 0000000078bced7b ffff880078b81fff
       0000000000000000 0000000000000082 0000000078bce3a8 0000000000002400
       0000000060000202 0000000078b80da0 0000000078bce45d ffffffff8107cb5a
      Call Trace:
       [<ffffffff8107cb5a>] ? on_each_cpu+0x77/0x83
       [<ffffffff8102f4eb>] ? change_page_attr_set_clr+0x32f/0x3ed
       [<ffffffff81035946>] ? efi_call4+0x46/0x80
       [<ffffffff816c5abb>] ? efi_enter_virtual_mode+0x1f5/0x305
       [<ffffffff816aeb24>] ? start_kernel+0x34a/0x3d2
       [<ffffffff816ae5ed>] ? repair_env_string+0x60/0x60
       [<ffffffff816ae2be>] ? x86_64_start_reservations+0xba/0xc1
       [<ffffffff816ae120>] ? early_idt_handlers+0x120/0x120
       [<ffffffff816ae419>] ? x86_64_start_kernel+0x154/0x163
      Code:  Bad RIP value.
      RIP  [<0000000078bce331>] 0x78bce330
       RSP <ffffffff81601d28>
      CR2: 000000effd870020
      ---[ end trace ead828934fef5eab ]---
      
      Cc: stable@vger.kernel.org
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarNathan Zimmer <nzimmer@sgi.com>
      Signed-off-by: default avatarRobin Holt <holt@sgi.com>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      b8f2c21d
  15. Jan 16, 2013
    • Andrew Cooper's avatar
      xen: Fix stack corruption in xen_failsafe_callback for 32bit PVOPS guests. · 9174adbe
      Andrew Cooper authored
      
      This fixes CVE-2013-0190 / XSA-40
      
      There has been an error on the xen_failsafe_callback path for failed
      iret, which causes the stack pointer to be wrong when entering the
      iret_exc error path.  This can result in the kernel crashing.
      
      In the classic kernel case, the relevant code looked a little like:
      
              popl %eax      # Error code from hypervisor
              jz 5f
              addl $16,%esp
              jmp iret_exc   # Hypervisor said iret fault
      5:      addl $16,%esp
                             # Hypervisor said segment selector fault
      
      Here, there are two identical addls on either option of a branch which
      appears to have been optimised by hoisting it above the jz, and
      converting it to an lea, which leaves the flags register unaffected.
      
      In the PVOPS case, the code looks like:
      
              popl_cfi %eax         # Error from the hypervisor
              lea 16(%esp),%esp     # Add $16 before choosing fault path
              CFI_ADJUST_CFA_OFFSET -16
              jz 5f
              addl $16,%esp         # Incorrectly adjust %esp again
              jmp iret_exc
      
      It is possible unprivileged userspace applications to cause this
      behaviour, for example by loading an LDT code selector, then changing
      the code selector to be not-present.  At this point, there is a race
      condition where it is possible for the hypervisor to return back to
      userspace from an interrupt, fault on its own iret, and inject a
      failsafe_callback into the kernel.
      
      This bug has been present since the introduction of Xen PVOPS support
      in commit 5ead97c8 (xen: Core Xen implementation), in 2.6.23.
      
      Signed-off-by: default avatarFrediano Ziglio <frediano.ziglio@citrix.com>
      Signed-off-by: default avatarAndrew Cooper <andrew.cooper3@citrix.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      9174adbe
    • Konrad Rzeszutek Wilk's avatar
      Revert "xen/smp: Fix CPU online/offline bug triggering a BUG: scheduling while atomic." · d55bf532
      Konrad Rzeszutek Wilk authored
      
      This reverts commit 41bd956d.
      
      The fix is incorrect and not appropiate for the latest kernels.
      In fact it _causes_ the BUG: scheduling while atomic while
      doing vCPU hotplug.
      
      Suggested-by: default avatarWei Liu <wei.liu2@citrix.com>
      Signed-off-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      d55bf532
  16. Jan 14, 2013
  17. Jan 11, 2013
    • Jesse Barnes's avatar
      x86/Sandy Bridge: reserve pages when integrated graphics is present · a9acc536
      Jesse Barnes authored
      
      SNB graphics devices have a bug that prevent them from accessing certain
      memory ranges, namely anything below 1M and in the pages listed in the
      table.  So reserve those at boot if set detect a SNB gfx device on the
      CPU to avoid GPU hangs.
      
      Stephane Marchesin had a similar patch to the page allocator awhile
      back, but rather than reserving pages up front, it leaked them at
      allocation time.
      
      [ hpa: made a number of stylistic changes, marked arrays as static
        const, and made less verbose; use "memblock=debug" for full
        verbosity. ]
      
      Signed-off-by: default avatarJesse Barnes <jbarnes@virtuousgeek.org>
      Signed-off-by: default avatarH. Peter Anvin <hpa@linux.intel.com>
      a9acc536
  18. Jan 10, 2013
    • David Ahern's avatar
      perf x86: revert 20b279 - require exclude_guest to use PEBS - kernel side · a706d965
      David Ahern authored
      This patch is brought to you by the letter 'H'.
      
      Commit 20b279 breaks compatiblity with older perf binaries when run with
      precise modifier (:p or :pp) by requiring the exclude_guest attribute to be
      set. Older binaries default exclude_guest to 0 (ie., wanting guest-based
      samples) unless host only profiling is requested (:H modifier). The workaround
      for older binaries is to add H to the modifier list (e.g., -e cycles:ppH -
      toggles exclude_guest to 1). This was deemed unacceptable by Linus:
      
      https://lkml.org/lkml/2012/12/12/570
      
      
      
      Between family in town and the fresh snow in Breckenridge there is no time left
      to be working on the proper fix for this over the holidays. In the New Year I
      have more pressing problems to resolve -- like some memory leaks in perf which
      are proving to be elusive -- although the aforementioned snow is probably why
      they are proving to be elusive. Either way I do not have any spare time to work
      on this and from the time I have managed to spend on it the solution is more
      difficult than just moving to a new exclude_guest flag (does not work) or
      flipping the logic to include_guest (which is not as trivial as one would
      think).
      
      So, two options: silently force exclude_guest on as suggested by Gleb which
      means no impact to older perf binaries or revert the original patch which
      caused the breakage.
      
      This patch does the latter -- reverts the original patch that introduced the
      regression. The problem can be revisited in the future as time allows.
      
      Signed-off-by: default avatarDavid Ahern <dsahern@gmail.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Robert Richter <robert.richter@amd.com>
      Link: http://lkml.kernel.org/r/1356749767-17322-1-git-send-email-dsahern@gmail.com
      
      
      Signed-off-by: default avatarArnaldo Carvalho de Melo <acme@redhat.com>
      a706d965
  19. Jan 09, 2013
    • Borislav Petkov's avatar
      x86, MCE: Retract most UAPI exports · f51bde6f
      Borislav Petkov authored
      
      Retract back most macro definitions which went into the
      user-visible mce.h header. Even though those bits are mostly
      hardware-defined/-architectural, their naming is not. If we export them
      to userspace, any kernel unification/renaming/cleanup cannot be done
      anymore since those are effectively cast in stone. Besides, if userspace
      wants those definitions, they can write their own defines and go crazy.
      
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      f51bde6f
  20. Jan 08, 2013
  21. Jan 04, 2013
    • Greg Kroah-Hartman's avatar
      X86: drivers: remove __dev* attributes. · a18e3690
      Greg Kroah-Hartman authored
      
      CONFIG_HOTPLUG is going away as an option.  As a result, the __dev*
      markings need to be removed.
      
      This change removes the use of __devinit, __devexit_p, __devinitconst,
      and __devexit from these drivers.
      
      Based on patches originally written by Bill Pemberton, but redone by me
      in order to handle some of the coding style issues better, by hand.
      
      Cc: Bill Pemberton <wfp5p@virginia.edu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Daniel Drake <dsd@laptop.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a18e3690
  22. Dec 26, 2012
  23. Dec 20, 2012
Loading