Commits · 947166043732b69878123bf31f51933ad0316080 · E-EXK4 - Operating System Group / projects / Linux

Feb 28, 2014

powerpc/powernv: Dump PHB diag-data immediately · 94716604

Gavin Shan authored 11 years ago


The PHB diag-data is important to help locating the root cause for
EEH errors such as frozen PE or fenced PHB. However, the EEH core
enables IO path by clearing part of HW registers before collecting
this data causing it to be corrupted.

This patch fixes this by dumping the PHB diag-data immediately when
frozen/fenced state on PE or PHB is detected for the first time in
eeh_ops::get_state() or next_error() backend.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

94716604

powerpc: Increase stack redzone for 64-bit userspace to 512 bytes · 573ebfa6

Paul Mackerras authored 11 years ago


The new ELFv2 little-endian ABI increases the stack redzone -- the
area below the stack pointer that can be used for storing data --
from 288 bytes to 512 bytes.  This means that we need to allow more
space on the user stack when delivering a signal to a 64-bit process.

To make the code a bit clearer, we define new USER_REDZONE_SIZE and
KERNEL_REDZONE_SIZE symbols in ptrace.h.  For now, we leave the
kernel redzone size at 288 bytes, since increasing it to 512 bytes
would increase the size of interrupt stack frames correspondingly.

Gcc currently only makes use of 288 bytes of redzone even when
compiling for the new little-endian ABI, and the kernel cannot
currently be compiled with the new ABI anyway.

In the future, hopefully gcc will provide an option to control the
amount of redzone used, and then we could reduce it even more.

This also changes the code in arch_compat_alloc_user_space() to
preserve the expanded redzone.  It is not clear why this function would
ever be used on a 64-bit process, though.

Signed-off-by: Paul Mackerras <paulus@samba.org>
CC: <stable@vger.kernel.org> [v3.13]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

573ebfa6

powerpc/ftrace: bugfix for test_24bit_addr · a95fc585

Liu Ping Fan authored 11 years ago

The branch target should be the func addr, not the addr of func_descr_t.
So using ppc_function_entry() to generate the right target addr.

Signed-off-by: Liu Ping Fan <pingfank@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

a95fc585

powerpc/crashdump : Fix page frame number check in copy_oldmem_page · f5295bd8

Laurent Dufour authored 11 years ago


In copy_oldmem_page, the current check using max_pfn and min_low_pfn to
decide if the page is backed or not, is not valid when the memory layout is
not continuous.

This happens when running as a QEMU/KVM guest, where RTAS is mapped higher
in the memory. In that case max_pfn points to the end of RTAS, and a hole
between the end of the kdump kernel and RTAS is not backed by PTEs. As a
consequence, the kdump kernel is crashing in copy_oldmem_page when accessing
in a direct way the pages in that hole.

This fix relies on the memblock's service memblock_is_region_memory to
check if the read page is part or not of the directly accessible memory.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Tested-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
CC: <stable@vger.kernel.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

f5295bd8

powerpc/le: Ensure that the 'stop-self' RTAS token is handled correctly · 41dd03a9

Tony Breeds authored 11 years ago


Currently we're storing a host endian RTAS token in
rtas_stop_self_args.token.  We then pass that directly to rtas.  This is
fine on big endian however on little endian the token is not what we
expect.

This will typically result in hitting:
	panic("Alas, I survived.\n");

To fix this we always use the stop-self token in host order and always
convert it to be32 before passing this to rtas.

Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

41dd03a9

Feb 17, 2014

powerpc/eeh: Disable EEH on reboot · 66f9af83

Gavin Shan authored 11 years ago


We possiblly detect EEH errors during reboot, particularly in kexec
path, but it's impossible for device drivers and EEH core to handle
or recover them properly.

The patch registers one reboot notifier for EEH and disable EEH
subsystem during reboot. That means the EEH errors is going to be
cleared by hardware reset or second kernel during early stage of
PCI probe.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

66f9af83

powerpc/eeh: Cleanup on eeh_subsystem_enabled · 2ec5a0ad

Gavin Shan authored 11 years ago

The patch cleans up variable eeh_subsystem_enabled so that we needn't
refer the variable directly from external. Instead, we will use
function eeh_enabled() and eeh_set_enable() to operate the variable.

Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

2ec5a0ad

powerpc/powernv: Rework EEH reset · 5b2e198e

Gavin Shan authored 11 years ago


When doing reset in order to recover the affected PE, we issue
hot reset on PE primary bus if it's not root bus. Otherwise, we
issue hot or fundamental reset on root port or PHB accordingly.
For the later case, we didn't cover the situation where PE only
includes root port and it potentially causes kernel crash upon
EEH error to the PE.

The patch reworks the logic of EEH reset to improve the code
readability and also avoid the kernel crash.

Cc: stable@vger.kernel.org
Reported-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

5b2e198e

powerpc: Use unstripped VDSO image for more accurate profiling data · 24b659a1

Anton Blanchard authored 11 years ago


We are seeing a lot of hits in the VDSO that are not resolved by perf.
A while(1) gettimeofday() loop shows the issue:

27.64%  [vdso]  [.] 0x000000000000060c
22.57%  [vdso]  [.] 0x0000000000000628
16.88%  [vdso]  [.] 0x0000000000000610
12.39%  [vdso]  [.] __kernel_gettimeofday
 6.09%  [vdso]  [.] 0x00000000000005f8
 3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
 2.94%  [vdso]  [.] __kernel_datapage_offset
 2.90%  test    [.] main

We are using a stripped VDSO image which means only symbols with
relocation info can be resolved. There isn't a lot of point to
stripping the VDSO, the debug info is only about 1kB:

4680 arch/powerpc/kernel/vdso64/vdso64.so
5815 arch/powerpc/kernel/vdso64/vdso64.so.dbg

By using the unstripped image, we can resolve all the symbols in the
VDSO and the perf profile data looks much better:

76.53%  [vdso]  [.] __do_get_tspec
12.20%  [vdso]  [.] __kernel_gettimeofday
 5.05%  [vdso]  [.] __get_datapage
 3.20%  test    [.] main
 2.92%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

24b659a1

powerpc: Link VDSOs at 0x0 · a0a4419e

Anton Blanchard authored 11 years ago


perf is failing to resolve symbols in the VDSO. A while (1)
gettimeofday() loop shows:

93.99%  [vdso]  [.] 0x00000000000005e0
 3.12%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
 2.81%  test    [.] main

The reason for this is that we are linking our VDSO shared libraries
at 1MB, which is a little weird. Even though this is uncommon, Alan
points out that it is valid and we should probably fix perf userspace.

Regardless, I can't see a reason why we are doing this. The code
is all position independent and we never rely on the VDSO ending
up at 1M (and we never place it there on 64bit tasks).

Changing our link address to 0x0 fixes perf VDSO symbol resolution:

73.18%  [vdso]  [.] 0x000000000000060c
12.39%  [vdso]  [.] __kernel_gettimeofday
 3.58%  test    [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18
 2.94%  [vdso]  [.] __kernel_datapage_offset
 2.90%  test    [.] main

We still have some local symbol resolution issues that will be
fixed in a subsequent patch.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

a0a4419e

mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit · 56eecdb9

Aneesh Kumar K.V authored 11 years ago

Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using
a hash table MMU for various reasons (the flush is handled as part of
the PTE modification when necessary).

ppc64 thus doesn't implement flush_tlb_range for hash based MMUs.

Additionally ppc64 require the tlb flushing to be batched within ptl locks.

The reason to do that is to ensure that the hash page table is in sync with
linux page table.

We track the hpte index in linux pte and if we clear them without flushing
hash and drop the ptl lock, we can have another cpu update the pte and can
end up with duplicate entry in the hash table, which is fatal.

We also want to keep set_pte_at simpler by not requiring them to do hash
flush for performance reason. We do that by assuming that set_pte_at() is
never *ever* called on a PTE that is already valid.

This was the case until the NUMA code went in which broke that assumption.

Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a
way similar to ptep/pmdp_set_wrprotect(), with a generic implementation
using set_pte_at() and a powerpc specific one using the appropriate
mechanism needed to keep the hash table in sync.

Acked-by: Mel Gorman <mgorman@suse.de>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

56eecdb9

powerpc/mm: Add new "set" flag argument to pte/pmd update function · 88247e8d

Aneesh Kumar K.V authored 11 years ago


pte_update() is a powerpc-ism used to change the bits of a PTE
when the access permission is being restricted (a flush is
potentially needed).

It uses atomic operations on when needed and handles the hash
synchronization on hash based processors.

It is currently only used to clear PTE bits and so the current
implementation doesn't provide a way to also set PTE bits.

The new _PAGE_NUMA bit, when set, is actually restricting access
so it must use that function too, so this change adds the ability
for pte_update() to also set bits.

We will use this later to set the _PAGE_NUMA bit.

Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

88247e8d

powerpc/pseries: Add Gen3 definitions for PCIE link speed · 49d9684a

Kleber Sacilotto de Souza authored 11 years ago


Rev3 of the PCI Express Base Specification defines a Supported Link
Speeds Vector where the bit definitions within this field are:

Bit 0 - 2.5 GT/s
Bit 1 - 5.0 GT/s
Bit 2 - 8.0 GT/s

This vector definition is used by the platform firmware to export the
maximum and current link speeds of the PCI bus via the
"ibm,pcie-link-speed-stats" device-tree property.

This patch updates pseries_root_bridge_prepare() to detect Gen3
speed buses (defined by 0x04).

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

49d9684a

powerpc/pseries: Fix regression on PCI link speed · b020cc6c

Kleber Sacilotto de Souza authored 11 years ago

Commit 5091f0c9 (powerpc/pseries: Fix PCIE link speed endian issue)
introduced a regression on the PCI link speed detection using the
device-tree property. The ibm,pcie-link-speed-stats property is composed
of two 32-bit integers, the first one being the maxinum link speed and
the second the current link speed. The changes introduced by the
aforementioned commit are considering just the first integer.

Fix this issue by changing how the property is accessed, using the
helper functions to properly access the array of values. The explicit
byte swapping is not needed anymore here, since it's done by the helper
functions.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

b020cc6c

powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack · 1a18a664

Kevin Hao authored 11 years ago


Guenter Roeck has got the following call trace on a p2020 board:
  Kernel stack overflow in process eb3e5a00, r1=eb79df90
  CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
  task: eb3e5a00 ti: c0616000 task.ti: ef440000
  NIP: c003a420 LR: c003a410 CTR: c0017518
  REGS: eb79dee0 TRAP: 0901   Not tainted (3.13.0-rc8-juniper-00146-g19eca00)
  MSR: 00029000 <CE,EE,ME>  CR: 24008444  XER: 00000000
  GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000
  GPR08: 00000000 020b8000 00000000 00000000 44008442
  NIP [c003a420] __do_softirq+0x94/0x1ec
  LR [c003a410] __do_softirq+0x84/0x1ec
  Call Trace:
  [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable)
  [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8
  [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c
  [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8
  [ef441f40] [c000e7f4] ret_from_except+0x0/0x18
  --- Exception: 501 at 0xfcda524
      LR = 0x10024900
  Instruction dump:
  7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9
  5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f
  Kernel panic - not syncing: kernel stack overflow
  CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4
  Call Trace:

The reason is that we have used the wrong register to calculate the
ksp_limit in commit cbc9565e (powerpc: Remove ksp_limit on ppc64).
Just fix it.

As suggested by Benjamin Herrenschmidt, also add the C prototype of the
function in the comment in order to avoid such kind of errors in the
future.

Cc: stable@vger.kernel.org # 3.12
Reported-by: Guenter Roeck <linux@roeck-us.net>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

1a18a664

Feb 11, 2014

powerpc/powernv: Add iommu DMA bypass support for IODA2 · cd15b048

Benjamin Herrenschmidt authored 11 years ago


This patch adds the support for to create a direct iommu "bypass"
window on IODA2 bridges (such as Power8) allowing to bypass iommu
page translation completely for 64-bit DMA capable devices, thus
significantly improving DMA performances.

Additionally, this adds a hook to the struct iommu_table so that
the IOMMU API / VFIO can disable the bypass when external ownership
is requested, since in that case, the device will be used by an
environment such as userspace or a KVM guest which must not be
allowed to bypass translations.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

cd15b048

powerpc: Fix endian issues in kexec and crash dump code · ea961a82

Anton Blanchard authored 11 years ago


We expose a number of OF properties in the kexec and crash dump code
and these need to be big endian.

Cc: stable@vger.kernel.org # v3.13
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

ea961a82

powerpc/ppc32: Fix the bug in the init of non-base exception stack for UP · 04a34113

Kevin Hao authored 11 years ago


We would allocate one specific exception stack for each kind of
non-base exceptions for every CPU. For ppc32 the CPU hard ID is
used as the subscript to get the specific exception stack for
one CPU. But for an UP kernel, there is only one element in the
each kind of exception stack array. We would get stuck if the
CPU hard ID is not equal to '0'. So in this case we should use the
subscript '0' no matter what the CPU hard ID is.

Signed-off-by: Kevin Hao <haokexin@gmail.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

04a34113

powerpc/xmon: Don't signal we've entered until we're finished printing · d2b496e5

Michael Ellerman authored 11 years ago


Currently we set our cpu's bit in cpus_in_xmon, and then we take the
output lock and print the exception information.

This can race with the master cpu entering the command loop and printing
the backtrace. The result is that the backtrace gets garbled with
another cpu's exception print out.

Fix it by delaying the set of cpus_in_xmon until we are finished
printing.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

d2b496e5

powerpc/xmon: Fix timeout loop in get_output_lock() · 15075897

Michael Ellerman authored 11 years ago


As far as I can tell, our 70s era timeout loop in get_output_lock() is
generating no code.

This leads to the hostile takeover happening more or less simultaneously
on all cpus. The result is "interesting", some example output that is
more readable than most:

    cpu 0x1: Vector: 100 (Scypsut e0mx bR:e setV)e catto xc0p:u[ c 00
    c0:0  000t0o0V0erc0td:o5 rfc28050000]0c00 0 0  0 6t(pSrycsV1ppuot
    uxe 1m 2 0Rx21e3:0s0ce000c00000t00)00 60602oV2SerucSayt0y 0p 1sxs

Fix it by using udelay() in the timeout loop. The wait time and check
frequency are arbitrary, but seem to work OK. We already rely on
udelay() working so this is not a new dependency.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

15075897

powerpc/xmon: Don't loop forever in get_output_lock() · 730efb61

Michael Ellerman authored 11 years ago


If we enter with xmon_speaker != 0 we skip the first cmpxchg(), we also
skip the while loop because xmon_speaker != last_speaker (0) - meaning we
skip the second cmpxchg() also.

Following that code path the compiler sees no memory barriers and so is
within its rights to never reload xmon_speaker. The end result is we loop
forever.

This manifests as all cpus being in xmon ('c' command), but they refuse
to take control when you switch to them ('c x' for cpu # x).

I have seen this deadlock in practice and also checked the generated code to
confirm this is what's happening.

The simplest fix is just to always try the cmpxchg().

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

730efb61

powerpc/perf: Configure BHRB filter before enabling PMU interrupts · b4d6c06c

Anshuman Khandual authored 11 years ago


Right now the config_bhrb() PMU specific call happens after
write_mmcr0(), which actually enables the PMU for event counting and
interrupts. So there is a small window of time where the PMU and BHRB
runs without the required HW branch filter (if any) enabled in BHRB.

This can cause some of the branch samples to be collected through BHRB
without any filter applied and hence affects the correctness of
the results. This patch moves the BHRB config function call before
enabling interrupts.

Here are some data points captured via trace prints which depicts how we
could get PMU interrupts with BHRB filter NOT enabled with a standard
perf record command line (asking for branch record information as well).

    $ perf record -j any_call ls

Before the patch:-

    ls-1962  [003] d...  2065.299590: .perf_event_interrupt: MMCRA: 40000000000
    ls-1962  [003] d...  2065.299603: .perf_event_interrupt: MMCRA: 40000000000
    ...

    All the PMU interrupts before this point did not have the requested
    HW branch filter enabled in the MMCRA.

    ls-1962  [003] d...  2065.299647: .perf_event_interrupt: MMCRA: 40040000000
    ls-1962  [003] d...  2065.299662: .perf_event_interrupt: MMCRA: 40040000000

After the patch:-

    ls-1850  [008] d...   190.311828: .perf_event_interrupt: MMCRA: 40040000000
    ls-1850  [008] d...   190.311848: .perf_event_interrupt: MMCRA: 40040000000

    All the PMU interrupts have the requested HW BHRB branch filter
    enabled in MMCRA.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Fixed up whitespace and cleaned up changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

b4d6c06c

powerpc/pseries: Select ARCH_RANDOM on pseries · 8d4887ee

Michael Ellerman authored 11 years ago


We have a driver for the ARCH_RANDOM hook in rng.c, so we should select
ARCH_RANDOM on pseries.

Without this the build breaks if you turn ARCH_RANDOM off.

This hasn't broken the build because pseries_defconfig doesn't specify a
value for PPC_POWERNV, which is default y, and selects ARCH_RANDOM.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

8d4887ee

powerpc/perf: Add Power8 cache & TLB events · 2fdd313f

Michael Ellerman authored 11 years ago


Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

2fdd313f

powerpc/relocate fix relocate processing in LE mode · 3b830c82

Laurent Dufour authored 11 years ago

Relocation's code is not working in little endian mode because the r_info
field, which is a 64 bits value, should be read from the right offset.

The current code is optimized to read the r_info field as a 32 bits value
starting at the middle of the double word (offset 12). When running in LE
mode, the read value is not correct since only the MSB is read.

This patch removes this optimization which consist to deal with a 32 bits
value instead of a 64 bits one. This way it works in big and little endian
mode.

Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

3b830c82

powerpc: Fix kdump hang issue on p8 with relocation on exception enabled. · 429d2e83

Mahesh Salgaonkar authored 11 years ago

On p8 systems, with relocation on exception feature enabled we are seeing
kdump kernel hang at interrupt vector 0xc*4400. The reason is, with this
feature enabled, exception are raised with MMU (IR=DR=1) ON with the
default offset of 0xc*4000. Since exception is raised in virtual mode it
requires the vector region to be executable without which it fails to
fetch and execute instruction at 0xc*4xxx. For default kernel since kernel
is loaded at real 0, the htab mappings sets the entire kernel text region
executable. But for relocatable kernel (e.g. kdump case) we only copy
interrupt vectors down to real 0 and never marked that region as
executable because in p7 and below we always get exception in real mode.

This patch fixes this issue by marking htab mapping range as executable
that overlaps with the interrupt vector region for relocatable kernel.

Thanks to Ben who helped me to debug this issue and find the root cause.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

429d2e83

powerpc/pseries: Disable relocation on exception while going down during crash. · 3ec8b78f

Mahesh Salgaonkar authored 11 years ago

Disable relocation on exception while going down even in kdump case. This
is because we are about clear htab mappings while kexec-ing into kdump
kernel and we may run into issues if we still have AIL ON.

Signed-off-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

3ec8b78f

powerpc/eeh: Drop taken reference to driver on eeh_rmv_device · 8cc6b6cd

Thadeu Lima de Souza Cascardo authored 11 years ago


Commit f5c57710 ("powerpc/eeh: Use
partial hotplug for EEH unaware drivers") introduces eeh_rmv_device,
which may grab a reference to a driver, but not release it.

That prevents a driver from being removed after it has gone through EEH
recovery.

This patch drops the reference if it was taken.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Acked-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

8cc6b6cd

powerpc: Fix build failure in sysdev/mpic.c for MPIC_WEIRD=y · 0215b4aa

Paul Gortmaker authored 11 years ago


Commit 446f6d06 ("powerpc/mpic: Properly
set default triggers") breaks the mpc7447_hpc_defconfig as follows:

  CC      arch/powerpc/sysdev/mpic.o
arch/powerpc/sysdev/mpic.c: In function 'mpic_set_irq_type':
arch/powerpc/sysdev/mpic.c:886:9: error: case label does not reduce to an integer constant
arch/powerpc/sysdev/mpic.c:890:9: error: case label does not reduce to an integer constant
arch/powerpc/sysdev/mpic.c:894:9: error: case label does not reduce to an integer constant
arch/powerpc/sysdev/mpic.c:898:9: error: case label does not reduce to an integer constant

Looking at the cpp output (gcc 4.7.3), I see:

   case mpic->hw_set[MPIC_IDX_VECPRI_SENSE_EDGE] |
        mpic->hw_set[MPIC_IDX_VECPRI_POLARITY_POSITIVE]:

The pointer into an array appears because CONFIG_MPIC_WEIRD=y is set
for this platform, thus enabling the following:

  -------------------
  #ifdef CONFIG_MPIC_WEIRD
  static u32 mpic_infos[][MPIC_IDX_END] = {
        [0] = { /* Original OpenPIC compatible MPIC */

  [...]

  #define MPIC_INFO(name) mpic->hw_set[MPIC_IDX_##name]

  #else /* CONFIG_MPIC_WEIRD */

  #define MPIC_INFO(name) MPIC_##name

  #endif /* CONFIG_MPIC_WEIRD */
  -------------------

Here we convert the case section to if/else if, and also add
the equivalent of a default case to warn about unknown types.
Boot tested on sbc8548, build tested on all defconfigs.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

0215b4aa

Jan 29, 2014

powerpc: Wire up sched_setattr and sched_getattr syscalls · f878f843
Benjamin Herrenschmidt authored 11 years ago
```
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
```
f878f843

powerpc/hugetlb: Replace __get_cpu_var with get_cpu_var · 94b09d75

Tiejun Chen authored 11 years ago


Replace __get_cpu_var safely with get_cpu_var to avoid
the following call trace:

[ 7253.637591] BUG: using smp_processor_id() in preemptible [00000000 00000000]
code: hugemmap01/9048
[ 7253.637601] caller is free_hugepd_range.constprop.25+0x88/0x1a8
[ 7253.637605] CPU: 1 PID: 9048 Comm: hugemmap01 Not tainted 3.10.20-rt14+ #114
[ 7253.637606] Call Trace:
[ 7253.637617] [cb049d80] [c0007ea4] show_stack+0x4c/0x168 (unreliable)
[ 7253.637624] [cb049dc0] [c031c674] debug_smp_processor_id+0x114/0x134
[ 7253.637628] [cb049de0] [c0016d28] free_hugepd_range.constprop.25+0x88/0x1a8
[ 7253.637632] [cb049e00] [c001711c] hugetlb_free_pgd_range+0x6c/0x168
[ 7253.637639] [cb049e40] [c0117408] free_pgtables+0x12c/0x150
[ 7253.637646] [cb049e70] [c011ce38] unmap_region+0xa0/0x11c
[ 7253.637671] [cb049ef0] [c011f03c] do_munmap+0x224/0x3bc
[ 7253.637676] [cb049f20] [c011f2e0] vm_munmap+0x38/0x5c
[ 7253.637682] [cb049f40] [c000ef88] ret_from_syscall+0x0/0x3c
[ 7253.637686] --- Exception: c01 at 0xff16004

Signed-off-by: Tiejun <Chen&lt;tiejun.chen@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

94b09d75

powerpc: Make sure "cache" directory is removed when offlining cpu · 91b973f9

Paul Mackerras authored 11 years ago


The code in remove_cache_dir() is supposed to remove the "cache"
subdirectory from the sysfs directory for a CPU when that CPU is
being offlined.  It tries to do this by calling kobject_put() on
the kobject for the subdirectory.  However, the subdirectory only
gets removed once the last reference goes away, and the reference
being put here may well not be the last reference.  That means
that the "cache" subdirectory may still exist when the offlining
operation has finished.  If the same CPU subsequently gets onlined,
the code tries to add a new "cache" subdirectory.  If the old
subdirectory has not yet been removed, we get a WARN_ON in the
sysfs code, with stack trace, and an error message printed on the
console.  Further, we ultimately end up with an online cpu with no
"cache" subdirectory.

This fixes it by doing an explicit kobject_del() at the point where
we want the subdirectory to go away.  kobject_del() removes the sysfs
directory even though the object still exists in memory.  The object
will get freed at some point in the future.  A subsequent onlining
operation can create a new sysfs directory, even if the old object
still exists in memory, without causing any problems.

Cc: stable@vger.kernel.org # v3.0+
Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

91b973f9

powerpc/mm: Fix mmap errno when MAP_FIXED is set and mapping exceeds the allowed address space · 19751c07

jmarchan@redhat.com authored 11 years ago


According to Posix, if MAP_FIXED is specified mmap shall set ENOMEM if
the requested mapping exceeds the allowed range for address space of
the process. The generic code set it right, but the specific powerpc
slice_get_unmapped_area() function currently returns -EINVAL in that
case.
This patch corrects it.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

19751c07

powerpc/powernv/cpuidle: Back-end cpuidle driver for powernv platform. · 2c2e6ecf

Deepthi Dharwar authored 11 years ago


Following patch ports the cpuidle framework for powernv
platform and also implements a cpuidle back-end powernv
idle driver calling on to power7_nap and snooze idle states.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

2c2e6ecf

powerpc/pseries/cpuidle: smt-snooze-delay cleanup. · 3fa8cad8

Deepthi Dharwar authored 11 years ago


smt-snooze-delay was designed to disable NAP state or delay the entry
to the NAP state prior to adoption of cpuidle framework. This
is per-cpu variable. With the coming of CPUIDLE framework,
states can be disabled on per-cpu basis using the cpuidle/enable
sysfs entry.

Also, with the coming of cpuidle driver each state's target residency
is per-driver unlike earlier which was per-device. Therefore,
the per-cpu sysfs smt-snooze-delay which decides the target residency
of the idle state on a particular cpu causes more confusion to the user
as we cannot have different smt-snooze-delay (target residency)
values for each cpu.

In the current code, smt-snooze-delay functionality is completely broken.
It makes sense to remove smt-snooze-delay from idle driver with the
coming of cpuidle framework.
However, sysfs files are retained as ppc64_util currently
utilises it. Once we fix ppc64_util, propose to clean
up the kernel code.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

3fa8cad8

powerpc/pseries/cpuidle: Move processor_idle.c to drivers/cpuidle. · 962e7bd4

Deepthi Dharwar authored 11 years ago


Move the file from arch specific pseries/processor_idle.c
to drivers/cpuidle/cpuidle-pseries.c
Make the relevant Makefile and Kconfig changes.
Also, introduce Kconfig.powerpc in drivers/cpuidle
for all powerpc cpuidle drivers.

Signed-off-by: Deepthi Dharwar <deepthi@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

962e7bd4

powerpc: Fix 32-bit frames for signals delivered when transactional · d765ff23

Paul Mackerras authored 11 years ago


Commit d31626f7 ("powerpc: Don't corrupt transactional state when
using FP/VMX in kernel") introduced a bug where the uc_link and uc_regs
fields of the ucontext_t that is created to hold the transactional
values of the registers in a 32-bit signal frame didn't get set
correctly.  The reason is that we now clear the MSR_TS bits in the MSR
in save_tm_user_regs(), before the code that sets uc_link and uc_regs.
To fix this, we move the setting of uc_link and uc_regs into the same
if statement that selects whether to call save_tm_user_regs() or
save_user_regs().

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

d765ff23

powerpc/iommu: Fix initialisation of DART iommu table · 67bfa0ee

Alistair Popple authored 11 years ago


Commit d0847757 switched the generic
powerpc iommu backend code to use the it_page_shift field to determine
page size. Commit 3a553170 should have
initiliased this field for all platforms, however the DART iommu table
code was not updated.

This commit initialises the it_page_shift field to 4K for the DART
iommu.

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

67bfa0ee

powerpc/numa: Fix decimal permissions · 316d7188

Joe Perches authored 11 years ago


This should have been octal.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

316d7188

powerpc/mm: Fix compile error of pgtable-ppc64.h · fd120dc2

Li Zhong authored 11 years ago


It seems that forward declaration couldn't work well with typedef, use
struct spinlock directly to avoiding following build errors:

In file included from include/linux/spinlock.h:81,
                 from include/linux/seqlock.h:35,
                 from include/linux/time.h:5,
                 from include/uapi/linux/timex.h:56,
                 from include/linux/timex.h:56,
                 from include/linux/sched.h:17,
                 from arch/powerpc/kernel/asm-offsets.c:17:
include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t'
/root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

fd120dc2