Skip to content
Snippets Groups Projects
  1. Mar 04, 2014
    • Matt Fleming's avatar
      x86/efi: Add mixed runtime services support · 4f9dbcfc
      Matt Fleming authored
      
      Setup the runtime services based on whether we're booting in EFI native
      mode or not. For non-native mode we need to thunk from 64-bit into
      32-bit mode before invoking the EFI runtime services.
      
      Using the runtime services after SetVirtualAddressMap() is slightly more
      complicated because we need to ensure that all the addresses we pass to
      the firmware are below the 4GB boundary so that they can be addressed
      with 32-bit pointers, see efi_setup_page_tables().
      
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      4f9dbcfc
    • Matt Fleming's avatar
      x86/efi: Firmware agnostic handover entry points · b8ff87a6
      Matt Fleming authored
      
      The EFI handover code only works if the "bitness" of the firmware and
      the kernel match, i.e. 64-bit firmware and 64-bit kernel - it is not
      possible to mix the two. This goes against the tradition that a 32-bit
      kernel can be loaded on a 64-bit BIOS platform without having to do
      anything special in the boot loader. Linux distributions, for one thing,
      regularly run only 32-bit kernels on their live media.
      
      Despite having only one 'handover_offset' field in the kernel header,
      EFI boot loaders use two separate entry points to enter the kernel based
      on the architecture the boot loader was compiled for,
      
          (1) 32-bit loader: handover_offset
          (2) 64-bit loader: handover_offset + 512
      
      Since we already have two entry points, we can leverage them to infer
      the bitness of the firmware we're running on, without requiring any boot
      loader modifications, by making (1) and (2) valid entry points for both
      CONFIG_X86_32 and CONFIG_X86_64 kernels.
      
      To be clear, a 32-bit boot loader will always use (1) and a 64-bit boot
      loader will always use (2). It's just that, if a single kernel image
      supports (1) and (2) that image can be used with both 32-bit and 64-bit
      boot loaders, and hence both 32-bit and 64-bit EFI.
      
      (1) and (2) must be 512 bytes apart at all times, but that is already
      part of the boot ABI and we could never change that delta without
      breaking existing boot loaders anyhow.
      
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      b8ff87a6
    • Matt Fleming's avatar
      x86/efi: Split the boot stub into 32/64 code paths · c116e8d6
      Matt Fleming authored
      
      Make the decision which code path to take at runtime based on
      efi_early->is64.
      
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      c116e8d6
    • Matt Fleming's avatar
      x86/efi: Add early thunk code to go from 64-bit to 32-bit · 0154416a
      Matt Fleming authored
      
      Implement the transition code to go from IA32e mode to protected mode in
      the EFI boot stub. This is required to use 32-bit EFI services from a
      64-bit kernel.
      
      Since EFI boot stub is executed in an identity-mapped region, there's
      not much we need to do before invoking the 32-bit EFI boot services.
      However, we do reload the firmware's global descriptor table
      (efi32_boot_gdt) in case things like timer events are still running in
      the firmware.
      
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      0154416a
    • Matt Fleming's avatar
      x86/efi: Build our own EFI services pointer table · 54b52d87
      Matt Fleming authored
      
      It's not possible to dereference the EFI System table directly when
      booting a 64-bit kernel on a 32-bit EFI firmware because the size of
      pointers don't match.
      
      In preparation for supporting the above use case, build a list of
      function pointers on boot so that callers don't have to worry about
      converting pointer sizes through multiple levels of indirection.
      
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      54b52d87
    • Matt Fleming's avatar
      efi: Add separate 32-bit/64-bit definitions · 677703ce
      Matt Fleming authored
      
      The traditional approach of using machine-specific types such as
      'unsigned long' does not allow the kernel to interact with firmware
      running in a different CPU mode, e.g. 64-bit kernel with 32-bit EFI.
      
      Add distinct EFI structure definitions for both 32-bit and 64-bit so
      that we can use them in the 32-bit and 64-bit code paths.
      
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      677703ce
    • Matt Fleming's avatar
      x86/efi: Delete dead code when checking for non-native · 099240ac
      Matt Fleming authored
      
      Both efi_free_boot_services() and efi_enter_virtual_mode() are invoked
      from init/main.c, but only if the EFI runtime services are available.
      This is not the case for non-native boots, e.g. where a 64-bit kernel is
      booted with 32-bit EFI firmware.
      
      Delete the dead code.
      
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      099240ac
    • Matt Fleming's avatar
      x86/mm/pageattr: Always dump the right page table in an oops · 426e34cc
      Matt Fleming authored
      
      Now that we have EFI-specific page tables we need to lookup the pgd when
      dumping those page tables, rather than assuming that swapper_pgdir is
      the current pgdir.
      
      Remove the double underscore prefix, which is usually reserved for
      static functions.
      
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      426e34cc
    • Matt Fleming's avatar
      x86, tools: Consolidate #ifdef code · 993c30a0
      Matt Fleming authored
      
      Instead of littering main() with #ifdef CONFIG_EFI_STUB, move the logic
      into separate functions that do nothing if the config option isn't set.
      This makes main() much easier to read.
      
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      993c30a0
    • Matt Fleming's avatar
      x86/boot: Cleanup header.S by removing some #ifdefs · 86134a1b
      Matt Fleming authored
      
      handover_offset is now filled out by build.c. Don't set a default value
      as it will be overwritten anyway.
      
      Acked-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      86134a1b
  2. Feb 14, 2014
    • Matt Fleming's avatar
      x86/efi: Check status field to validate BGRT header · 09503379
      Matt Fleming authored
      
      Madper reported seeing the following crash,
      
        BUG: unable to handle kernel paging request at ffffffffff340003
        IP: [<ffffffff81d85ba4>] efi_bgrt_init+0x9d/0x133
        Call Trace:
         [<ffffffff81d8525d>] efi_late_init+0x9/0xb
         [<ffffffff81d68f59>] start_kernel+0x436/0x450
         [<ffffffff81d6892c>] ? repair_env_string+0x5c/0x5c
         [<ffffffff81d68120>] ? early_idt_handlers+0x120/0x120
         [<ffffffff81d685de>] x86_64_start_reservations+0x2a/0x2c
         [<ffffffff81d6871e>] x86_64_start_kernel+0x13e/0x14d
      
      This is caused because the layout of the ACPI BGRT header on this system
      doesn't match the definition from the ACPI spec, and so we get a bogus
      physical address when dereferencing ->image_address in efi_bgrt_init().
      
      Luckily the status field in the BGRT header clearly marks it as invalid,
      so we can check that field and skip BGRT initialisation.
      
      Reported-by: default avatarMadper Xie <cxie@redhat.com>
      Suggested-by: default avatarToshi Kani <toshi.kani@hp.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      09503379
    • Borislav Petkov's avatar
      x86/efi: Fix 32-bit fallout · c55d016f
      Borislav Petkov authored
      
      We do not enable the new efi memmap on 32-bit and thus we need to run
      runtime_code_page_mkexec() unconditionally there. Fix that.
      
      Reported-and-tested-by: default avatarLejun Zhu <lejun.zhu@intel.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      c55d016f
  3. Feb 13, 2014
  4. Feb 12, 2014
    • Steven Rostedt (Red Hat)'s avatar
      ftrace/x86: Use breakpoints for converting function graph caller · 87fbb2ac
      Steven Rostedt (Red Hat) authored
      
      When the conversion was made to remove stop machine and use the breakpoint
      logic instead, the modification of the function graph caller is still
      done directly as though it was being done under stop machine.
      
      As it is not converted via stop machine anymore, there is a possibility
      that the code could be layed across cache lines and if another CPU is
      accessing that function graph call when it is being updated, it could
      cause a General Protection Fault.
      
      Convert the update of the function graph caller to use the breakpoint
      method as well.
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org # 3.5+
      Fixes: 08d636b6 "ftrace/x86: Have arch x86_64 use breakpoints instead of stop machine"
      Signed-off-by: default avatarSteven Rostedt <rostedt@goodmis.org>
      87fbb2ac
  5. Feb 11, 2014
    • Mel Gorman's avatar
      xen: properly account for _PAGE_NUMA during xen pte translations · a9c8e4be
      Mel Gorman authored
      
      Steven Noonan forwarded a users report where they had a problem starting
      vsftpd on a Xen paravirtualized guest, with this in dmesg:
      
        BUG: Bad page map in process vsftpd  pte:8000000493b88165 pmd:e9cc01067
        page:ffffea00124ee200 count:0 mapcount:-1 mapping:     (null) index:0x0
        page flags: 0x2ffc0000000014(referenced|dirty)
        addr:00007f97eea74000 vm_flags:00100071 anon_vma:ffff880e98f80380 mapping:          (null) index:7f97eea74
        CPU: 4 PID: 587 Comm: vsftpd Not tainted 3.12.7-1-ec2 #1
        Call Trace:
          dump_stack+0x45/0x56
          print_bad_pte+0x22e/0x250
          unmap_single_vma+0x583/0x890
          unmap_vmas+0x65/0x90
          exit_mmap+0xc5/0x170
          mmput+0x65/0x100
          do_exit+0x393/0x9e0
          do_group_exit+0xcc/0x140
          SyS_exit_group+0x14/0x20
          system_call_fastpath+0x1a/0x1f
        Disabling lock debugging due to kernel taint
        BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:0 val:-1
        BUG: Bad rss-counter state mm:ffff880e9ca60580 idx:1 val:1
      
      The issue could not be reproduced under an HVM instance with the same
      kernel, so it appears to be exclusive to paravirtual Xen guests.  He
      bisected the problem to commit 1667918b ("mm: numa: clear numa
      hinting information on mprotect") that was also included in 3.12-stable.
      
      The problem was related to how xen translates ptes because it was not
      accounting for the _PAGE_NUMA bit.  This patch splits pte_present to add
      a pteval_present helper for use by xen so both bare metal and xen use
      the same code when checking if a PTE is present.
      
      [mgorman@suse.de: wrote changelog, proposed minor modifications]
      [akpm@linux-foundation.org: fix typo in comment]
      Reported-by: default avatarSteven Noonan <steven@uplinklabs.net>
      Tested-by: default avatarSteven Noonan <steven@uplinklabs.net>
      Signed-off-by: default avatarElena Ufimtseva <ufimtseva@gmail.com>
      Signed-off-by: default avatarMel Gorman <mgorman@suse.de>
      Reviewed-by: default avatarDavid Vrabel <david.vrabel@citrix.com>
      Acked-by: default avatarKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@vger.kernel.org>	[3.12+]
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      a9c8e4be
  6. Feb 09, 2014
  7. Feb 06, 2014
    • Tang Chen's avatar
      arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved. · 7bc35fdd
      Tang Chen authored
      
      The following path will cause array out of bound.
      
      memblock_add_region() will always set nid in memblock.reserved to
      MAX_NUMNODES.  In numa_register_memblks(), after we set all nid to
      correct valus in memblock.reserved, we called setup_node_data(), and
      used memblock_alloc_nid() to allocate memory, with nid set to
      MAX_NUMNODES.
      
      The nodemask_t type can be seen as a bit array.  And the index is 0 ~
      MAX_NUMNODES-1.
      
      After that, when we call node_set() in numa_clear_kernel_node_hotplug(),
      the nodemask_t got an index of value MAX_NUMNODES, which is out of [0 ~
      MAX_NUMNODES-1].
      
      See below:
      
      numa_init()
       |---> numa_register_memblks()
       |      |---> memblock_set_node(memory)		set correct nid in memblock.memory
       |      |---> memblock_set_node(reserved)	set correct nid in memblock.reserved
       |      |......
       |      |---> setup_node_data()
       |             |---> memblock_alloc_nid()	here, nid is set to MAX_NUMNODES (1024)
       |......
       |---> numa_clear_kernel_node_hotplug()
              |---> node_set()			here, we have an index 1024, and overflowed
      
      This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix
      this problem.
      
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Tested-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Tested-by: default avatarDave Jones <davej@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7bc35fdd
    • Tang Chen's avatar
      arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug() · 017c217a
      Tang Chen authored
      
      On-stack variable numa_kernel_nodes in numa_clear_kernel_node_hotplug()
      was not initialized.  So we need to initialize it.
      
      [akpm@linux-foundation.org: use NODE_MASK_NONE, per David]
      Signed-off-by: default avatarTang Chen <tangchen@cn.fujitsu.com>
      Tested-by: default avatarGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: default avatarDave Jones <davej@redhat.com>
      Reported-by: default avatarDavid Rientjes <rientjes@google.com>
      Tested-by: default avatarDave Jones <davej@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      017c217a
    • Borislav Petkov's avatar
      x86, microcode, AMD: Unify valid container checks · 75a1ba5b
      Borislav Petkov authored
      
      For additional coverage, BorisO and friends unknowlingly did swap AMD
      microcode with Intel microcode blobs in order to see what happens. What
      did happen on 32-bit was
      
      [    5.722656] BUG: unable to handle kernel paging request at be3a6008
      [    5.722693] IP: [<c106d6b4>] load_microcode_amd+0x24/0x3f0
      [    5.722716] *pdpt = 0000000000000000 *pde = 0000000000000000
      
      because there was a valid initrd there but without valid microcode in it
      and the container check happened *after* the relocated ramdisk handling
      on 32-bit, which was clearly wrong.
      
      While at it, take care of the ramdisk relocation on both 32- and 64-bit
      as it is done on both. Also, comment what we're doing because this code
      is a bit tricky.
      
      Reported-and-tested-by: default avatarBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: default avatarBorislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/1391460104-7261-1-git-send-email-bp@alien8.de
      
      
      Signed-off-by: default avatarH. Peter Anvin <hpa@zytor.com>
      75a1ba5b
    • Matt Fleming's avatar
      x86/efi: Allow mapping BGRT on x86-32 · 081cd62a
      Matt Fleming authored
      CONFIG_X86_32 doesn't map the boot services regions into the EFI memory
      map (see commit 70087011 ("x86, efi: Don't map Boot Services on
      i386")), and so efi_lookup_mapped_addr() will fail to return a valid
      address. Executing the ioremap() path in efi_bgrt_init() causes the
      following warning on x86-32 because we're trying to ioremap() RAM,
      
       WARNING: CPU: 0 PID: 0 at arch/x86/mm/ioremap.c:102 __ioremap_caller+0x2ad/0x2c0()
       Modules linked in:
       CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-0.rc5.git0.1.2.fc21.i686 #1
       Hardware name: DellInc. Venue 8 Pro 5830/09RP78, BIOS A02 10/17/2013
        00000000 00000000 c0c0df08 c09a5196 00000000 c0c0df38 c0448c1e c0b41310
        00000000 00000000 c0b37bc1 00000066 c043bbfd c043bbfd 00e7dfe0 00073eff
        00073eff c0c0df48 c0448ce2 00000009 00000000 c0c0df9c c043bbfd 00078d88
       Call Trace:
        [<c09a5196>] dump_stack+0x41/0x52
        [<c0448c1e>] warn_slowpath_common+0x7e/0xa0
        [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
        [<c043bbfd>] ? __ioremap_caller+0x2ad/0x2c0
        [<c0448ce2>] warn_slowpath_null+0x22/0x30
        [<c043bbfd>] __ioremap_caller+0x2ad/0x2c0
        [<c0718f92>] ? acpi_tb_verify_table+0x1c/0x43
        [<c0719c78>] ? acpi_get_table_with_size+0x63/0xb5
        [<c087cd5e>] ? efi_lookup_mapped_addr+0xe/0xf0
        [<c043bc2b>] ioremap_nocache+0x1b/0x20
        [<c0cb01c8>] ? efi_bgrt_init+0x83/0x10c
        [<c0cb01c8>] efi_bgrt_init+0x83/0x10c
        [<c0cafd82>] efi_late_init+0x8/0xa
        [<c0c9bab2>] start_kernel+0x3ae/0x3c3
        [<c0c9b53b>] ? repair_env_string+0x51/0x51
        [<c0c9b378>] i386_start_kernel+0x12e/0x131
      
      Switch to using early_memremap(), which won't trigger this warning, and
      has the added benefit of more accurately conveying what we're trying to
      do - map a chunk of memory.
      
      This patch addresses the following bug report,
      
        https://bugzilla.kernel.org/show_bug.cgi?id=67911
      
      
      
      Reported-by: default avatarAdam Williamson <awilliam@redhat.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Signed-off-by: default avatarMatt Fleming <matt.fleming@intel.com>
      081cd62a
  8. Feb 05, 2014
  9. Feb 03, 2014
  10. Feb 02, 2014
  11. Jan 31, 2014
  12. Jan 30, 2014
  13. Jan 29, 2014
Loading