- Feb 28, 2014
-
-
Gavin Shan authored
The PHB diag-data is important to help locating the root cause for EEH errors such as frozen PE or fenced PHB. However, the EEH core enables IO path by clearing part of HW registers before collecting this data causing it to be corrupted. This patch fixes this by dumping the PHB diag-data immediately when frozen/fenced state on PE or PHB is detected for the first time in eeh_ops::get_state() or next_error() backend. Signed-off-by:
Gavin Shan <shangw@linux.vnet.ibm.com> CC: <stable@vger.kernel.org> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Paul Mackerras authored
The new ELFv2 little-endian ABI increases the stack redzone -- the area below the stack pointer that can be used for storing data -- from 288 bytes to 512 bytes. This means that we need to allow more space on the user stack when delivering a signal to a 64-bit process. To make the code a bit clearer, we define new USER_REDZONE_SIZE and KERNEL_REDZONE_SIZE symbols in ptrace.h. For now, we leave the kernel redzone size at 288 bytes, since increasing it to 512 bytes would increase the size of interrupt stack frames correspondingly. Gcc currently only makes use of 288 bytes of redzone even when compiling for the new little-endian ABI, and the kernel cannot currently be compiled with the new ABI anyway. In the future, hopefully gcc will provide an option to control the amount of redzone used, and then we could reduce it even more. This also changes the code in arch_compat_alloc_user_space() to preserve the expanded redzone. It is not clear why this function would ever be used on a 64-bit process, though. Signed-off-by:
Paul Mackerras <paulus@samba.org> CC: <stable@vger.kernel.org> [v3.13] Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Liu Ping Fan authored
The branch target should be the func addr, not the addr of func_descr_t. So using ppc_function_entry() to generate the right target addr. Signed-off-by:
Liu Ping Fan <pingfank@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Laurent Dufour authored
In copy_oldmem_page, the current check using max_pfn and min_low_pfn to decide if the page is backed or not, is not valid when the memory layout is not continuous. This happens when running as a QEMU/KVM guest, where RTAS is mapped higher in the memory. In that case max_pfn points to the end of RTAS, and a hole between the end of the kdump kernel and RTAS is not backed by PTEs. As a consequence, the kdump kernel is crashing in copy_oldmem_page when accessing in a direct way the pages in that hole. This fix relies on the memblock's service memblock_is_region_memory to check if the read page is part or not of the directly accessible memory. Signed-off-by:
Laurent Dufour <ldufour@linux.vnet.ibm.com> Tested-by:
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> CC: <stable@vger.kernel.org> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Tony Breeds authored
Currently we're storing a host endian RTAS token in rtas_stop_self_args.token. We then pass that directly to rtas. This is fine on big endian however on little endian the token is not what we expect. This will typically result in hitting: panic("Alas, I survived.\n"); To fix this we always use the stop-self token in host order and always convert it to be32 before passing this to rtas. Signed-off-by:
Tony Breeds <tony@bakeyournoodle.com> Cc: stable@vger.kernel.org Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- Feb 17, 2014
-
-
Gavin Shan authored
We possiblly detect EEH errors during reboot, particularly in kexec path, but it's impossible for device drivers and EEH core to handle or recover them properly. The patch registers one reboot notifier for EEH and disable EEH subsystem during reboot. That means the EEH errors is going to be cleared by hardware reset or second kernel during early stage of PCI probe. Signed-off-by:
Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
The patch cleans up variable eeh_subsystem_enabled so that we needn't refer the variable directly from external. Instead, we will use function eeh_enabled() and eeh_set_enable() to operate the variable. Signed-off-by:
Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Gavin Shan authored
When doing reset in order to recover the affected PE, we issue hot reset on PE primary bus if it's not root bus. Otherwise, we issue hot or fundamental reset on root port or PHB accordingly. For the later case, we didn't cover the situation where PE only includes root port and it potentially causes kernel crash upon EEH error to the PE. The patch reworks the logic of EEH reset to improve the code readability and also avoid the kernel crash. Cc: stable@vger.kernel.org Reported-by:
Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Signed-off-by:
Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Anton Blanchard authored
We are seeing a lot of hits in the VDSO that are not resolved by perf. A while(1) gettimeofday() loop shows the issue: 27.64% [vdso] [.] 0x000000000000060c 22.57% [vdso] [.] 0x0000000000000628 16.88% [vdso] [.] 0x0000000000000610 12.39% [vdso] [.] __kernel_gettimeofday 6.09% [vdso] [.] 0x00000000000005f8 3.58% test [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18 2.94% [vdso] [.] __kernel_datapage_offset 2.90% test [.] main We are using a stripped VDSO image which means only symbols with relocation info can be resolved. There isn't a lot of point to stripping the VDSO, the debug info is only about 1kB: 4680 arch/powerpc/kernel/vdso64/vdso64.so 5815 arch/powerpc/kernel/vdso64/vdso64.so.dbg By using the unstripped image, we can resolve all the symbols in the VDSO and the perf profile data looks much better: 76.53% [vdso] [.] __do_get_tspec 12.20% [vdso] [.] __kernel_gettimeofday 5.05% [vdso] [.] __get_datapage 3.20% test [.] main 2.92% test [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18 Signed-off-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Anton Blanchard authored
perf is failing to resolve symbols in the VDSO. A while (1) gettimeofday() loop shows: 93.99% [vdso] [.] 0x00000000000005e0 3.12% test [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18 2.81% test [.] main The reason for this is that we are linking our VDSO shared libraries at 1MB, which is a little weird. Even though this is uncommon, Alan points out that it is valid and we should probably fix perf userspace. Regardless, I can't see a reason why we are doing this. The code is all position independent and we never rely on the VDSO ending up at 1M (and we never place it there on 64bit tasks). Changing our link address to 0x0 fixes perf VDSO symbol resolution: 73.18% [vdso] [.] 0x000000000000060c 12.39% [vdso] [.] __kernel_gettimeofday 3.58% test [.] 00000037.plt_call.gettimeofday@@GLIBC_2.18 2.94% [vdso] [.] __kernel_datapage_offset 2.90% test [.] main We still have some local symbol resolution issues that will be fixed in a subsequent patch. Signed-off-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Aneesh Kumar K.V authored
Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using a hash table MMU for various reasons (the flush is handled as part of the PTE modification when necessary). ppc64 thus doesn't implement flush_tlb_range for hash based MMUs. Additionally ppc64 require the tlb flushing to be batched within ptl locks. The reason to do that is to ensure that the hash page table is in sync with linux page table. We track the hpte index in linux pte and if we clear them without flushing hash and drop the ptl lock, we can have another cpu update the pte and can end up with duplicate entry in the hash table, which is fatal. We also want to keep set_pte_at simpler by not requiring them to do hash flush for performance reason. We do that by assuming that set_pte_at() is never *ever* called on a PTE that is already valid. This was the case until the NUMA code went in which broke that assumption. Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a way similar to ptep/pmdp_set_wrprotect(), with a generic implementation using set_pte_at() and a powerpc specific one using the appropriate mechanism needed to keep the hash table in sync. Acked-by:
Mel Gorman <mgorman@suse.de> Reviewed-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Aneesh Kumar K.V authored
pte_update() is a powerpc-ism used to change the bits of a PTE when the access permission is being restricted (a flush is potentially needed). It uses atomic operations on when needed and handles the hash synchronization on hash based processors. It is currently only used to clear PTE bits and so the current implementation doesn't provide a way to also set PTE bits. The new _PAGE_NUMA bit, when set, is actually restricting access so it must use that function too, so this change adds the ability for pte_update() to also set bits. We will use this later to set the _PAGE_NUMA bit. Acked-by:
Mel Gorman <mgorman@suse.de> Acked-by:
Rik van Riel <riel@redhat.com> Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Kleber Sacilotto de Souza authored
Rev3 of the PCI Express Base Specification defines a Supported Link Speeds Vector where the bit definitions within this field are: Bit 0 - 2.5 GT/s Bit 1 - 5.0 GT/s Bit 2 - 8.0 GT/s This vector definition is used by the platform firmware to export the maximum and current link speeds of the PCI bus via the "ibm,pcie-link-speed-stats" device-tree property. This patch updates pseries_root_bridge_prepare() to detect Gen3 speed buses (defined by 0x04). Signed-off-by:
Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Kleber Sacilotto de Souza authored
Commit 5091f0c9 (powerpc/pseries: Fix PCIE link speed endian issue) introduced a regression on the PCI link speed detection using the device-tree property. The ibm,pcie-link-speed-stats property is composed of two 32-bit integers, the first one being the maxinum link speed and the second the current link speed. The changes introduced by the aforementioned commit are considering just the first integer. Fix this issue by changing how the property is accessed, using the helper functions to properly access the array of values. The explicit byte swapping is not needed anymore here, since it's done by the helper functions. Signed-off-by:
Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Kevin Hao authored
Guenter Roeck has got the following call trace on a p2020 board: Kernel stack overflow in process eb3e5a00, r1=eb79df90 CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 task: eb3e5a00 ti: c0616000 task.ti: ef440000 NIP: c003a420 LR: c003a410 CTR: c0017518 REGS: eb79dee0 TRAP: 0901 Not tainted (3.13.0-rc8-juniper-00146-g19eca00) MSR: 00029000 <CE,EE,ME> CR: 24008444 XER: 00000000 GPR00: c003a410 eb79df90 eb3e5a00 00000000 eb05d900 00000001 65d87646 00000000 GPR08: 00000000 020b8000 00000000 00000000 44008442 NIP [c003a420] __do_softirq+0x94/0x1ec LR [c003a410] __do_softirq+0x84/0x1ec Call Trace: [eb79df90] [c003a410] __do_softirq+0x84/0x1ec (unreliable) [eb79dfe0] [c003a970] irq_exit+0xbc/0xc8 [eb79dff0] [c000cc1c] call_do_irq+0x24/0x3c [ef441f20] [c00046a8] do_IRQ+0x8c/0xf8 [ef441f40] [c000e7f4] ret_from_except+0x0/0x18 --- Exception: 501 at 0xfcda524 LR = 0x10024900 Instruction dump: 7c781b78 3b40000a 3a73b040 543c0024 3a800000 3b3913a0 7ef5bb78 48201bf9 5463103a 7d3b182e 7e89b92e 7c008146 <3ba00000> 7e7e9b78 48000014 57fff87f Kernel panic - not syncing: kernel stack overflow CPU: 0 PID: 2838 Comm: ssh Not tainted 3.13.0-rc8-juniper-00146-g19eca00 #4 Call Trace: The reason is that we have used the wrong register to calculate the ksp_limit in commit cbc9565e (powerpc: Remove ksp_limit on ppc64). Just fix it. As suggested by Benjamin Herrenschmidt, also add the C prototype of the function in the comment in order to avoid such kind of errors in the future. Cc: stable@vger.kernel.org # 3.12 Reported-by:
Guenter Roeck <linux@roeck-us.net> Tested-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
Kevin Hao <haokexin@gmail.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- Feb 11, 2014
-
-
Benjamin Herrenschmidt authored
This patch adds the support for to create a direct iommu "bypass" window on IODA2 bridges (such as Power8) allowing to bypass iommu page translation completely for 64-bit DMA capable devices, thus significantly improving DMA performances. Additionally, this adds a hook to the struct iommu_table so that the IOMMU API / VFIO can disable the bypass when external ownership is requested, since in that case, the device will be used by an environment such as userspace or a KVM guest which must not be allowed to bypass translations. Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Anton Blanchard authored
We expose a number of OF properties in the kexec and crash dump code and these need to be big endian. Cc: stable@vger.kernel.org # v3.13 Signed-off-by:
Anton Blanchard <anton@samba.org> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Kevin Hao authored
We would allocate one specific exception stack for each kind of non-base exceptions for every CPU. For ppc32 the CPU hard ID is used as the subscript to get the specific exception stack for one CPU. But for an UP kernel, there is only one element in the each kind of exception stack array. We would get stuck if the CPU hard ID is not equal to '0'. So in this case we should use the subscript '0' no matter what the CPU hard ID is. Signed-off-by:
Kevin Hao <haokexin@gmail.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Michael Ellerman authored
Currently we set our cpu's bit in cpus_in_xmon, and then we take the output lock and print the exception information. This can race with the master cpu entering the command loop and printing the backtrace. The result is that the backtrace gets garbled with another cpu's exception print out. Fix it by delaying the set of cpus_in_xmon until we are finished printing. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Michael Ellerman authored
As far as I can tell, our 70s era timeout loop in get_output_lock() is generating no code. This leads to the hostile takeover happening more or less simultaneously on all cpus. The result is "interesting", some example output that is more readable than most: cpu 0x1: Vector: 100 (Scypsut e0mx bR:e setV)e catto xc0p:u[ c 00 c0:0 000t0o0V0erc0td:o5 rfc28050000]0c00 0 0 0 6t(pSrycsV1ppuot uxe 1m 2 0Rx21e3:0s0ce000c00000t00)00 60602oV2SerucSayt0y 0p 1sxs Fix it by using udelay() in the timeout loop. The wait time and check frequency are arbitrary, but seem to work OK. We already rely on udelay() working so this is not a new dependency. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Michael Ellerman authored
If we enter with xmon_speaker != 0 we skip the first cmpxchg(), we also skip the while loop because xmon_speaker != last_speaker (0) - meaning we skip the second cmpxchg() also. Following that code path the compiler sees no memory barriers and so is within its rights to never reload xmon_speaker. The end result is we loop forever. This manifests as all cpus being in xmon ('c' command), but they refuse to take control when you switch to them ('c x' for cpu # x). I have seen this deadlock in practice and also checked the generated code to confirm this is what's happening. The simplest fix is just to always try the cmpxchg(). Signed-off-by:
Michael Ellerman <michael@ellerman.id.au> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Anshuman Khandual authored
Right now the config_bhrb() PMU specific call happens after write_mmcr0(), which actually enables the PMU for event counting and interrupts. So there is a small window of time where the PMU and BHRB runs without the required HW branch filter (if any) enabled in BHRB. This can cause some of the branch samples to be collected through BHRB without any filter applied and hence affects the correctness of the results. This patch moves the BHRB config function call before enabling interrupts. Here are some data points captured via trace prints which depicts how we could get PMU interrupts with BHRB filter NOT enabled with a standard perf record command line (asking for branch record information as well). $ perf record -j any_call ls Before the patch:- ls-1962 [003] d... 2065.299590: .perf_event_interrupt: MMCRA: 40000000000 ls-1962 [003] d... 2065.299603: .perf_event_interrupt: MMCRA: 40000000000 ... All the PMU interrupts before this point did not have the requested HW branch filter enabled in the MMCRA. ls-1962 [003] d... 2065.299647: .perf_event_interrupt: MMCRA: 40040000000 ls-1962 [003] d... 2065.299662: .perf_event_interrupt: MMCRA: 40040000000 After the patch:- ls-1850 [008] d... 190.311828: .perf_event_interrupt: MMCRA: 40040000000 ls-1850 [008] d... 190.311848: .perf_event_interrupt: MMCRA: 40040000000 All the PMU interrupts have the requested HW BHRB branch filter enabled in MMCRA. Signed-off-by:
Anshuman Khandual <khandual@linux.vnet.ibm.com> [mpe: Fixed up whitespace and cleaned up changelog] Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Michael Ellerman authored
We have a driver for the ARCH_RANDOM hook in rng.c, so we should select ARCH_RANDOM on pseries. Without this the build breaks if you turn ARCH_RANDOM off. This hasn't broken the build because pseries_defconfig doesn't specify a value for PPC_POWERNV, which is default y, and selects ARCH_RANDOM. Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Michael Ellerman authored
Signed-off-by:
Michael Ellerman <mpe@ellerman.id.au> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Laurent Dufour authored
Relocation's code is not working in little endian mode because the r_info field, which is a 64 bits value, should be read from the right offset. The current code is optimized to read the r_info field as a 32 bits value starting at the middle of the double word (offset 12). When running in LE mode, the read value is not correct since only the MSB is read. This patch removes this optimization which consist to deal with a 32 bits value instead of a 64 bits one. This way it works in big and little endian mode. Signed-off-by:
Laurent Dufour <ldufour@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Mahesh Salgaonkar authored
On p8 systems, with relocation on exception feature enabled we are seeing kdump kernel hang at interrupt vector 0xc*4400. The reason is, with this feature enabled, exception are raised with MMU (IR=DR=1) ON with the default offset of 0xc*4000. Since exception is raised in virtual mode it requires the vector region to be executable without which it fails to fetch and execute instruction at 0xc*4xxx. For default kernel since kernel is loaded at real 0, the htab mappings sets the entire kernel text region executable. But for relocatable kernel (e.g. kdump case) we only copy interrupt vectors down to real 0 and never marked that region as executable because in p7 and below we always get exception in real mode. This patch fixes this issue by marking htab mapping range as executable that overlaps with the interrupt vector region for relocatable kernel. Thanks to Ben who helped me to debug this issue and find the root cause. Signed-off-by:
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Mahesh Salgaonkar authored
Disable relocation on exception while going down even in kdump case. This is because we are about clear htab mappings while kexec-ing into kdump kernel and we may run into issues if we still have AIL ON. Signed-off-by:
Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Thadeu Lima de Souza Cascardo authored
Commit f5c57710 ("powerpc/eeh: Use partial hotplug for EEH unaware drivers") introduces eeh_rmv_device, which may grab a reference to a driver, but not release it. That prevents a driver from being removed after it has gone through EEH recovery. This patch drops the reference if it was taken. Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com> Acked-by:
Gavin Shan <shangw@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Paul Gortmaker authored
Commit 446f6d06 ("powerpc/mpic: Properly set default triggers") breaks the mpc7447_hpc_defconfig as follows: CC arch/powerpc/sysdev/mpic.o arch/powerpc/sysdev/mpic.c: In function 'mpic_set_irq_type': arch/powerpc/sysdev/mpic.c:886:9: error: case label does not reduce to an integer constant arch/powerpc/sysdev/mpic.c:890:9: error: case label does not reduce to an integer constant arch/powerpc/sysdev/mpic.c:894:9: error: case label does not reduce to an integer constant arch/powerpc/sysdev/mpic.c:898:9: error: case label does not reduce to an integer constant Looking at the cpp output (gcc 4.7.3), I see: case mpic->hw_set[MPIC_IDX_VECPRI_SENSE_EDGE] | mpic->hw_set[MPIC_IDX_VECPRI_POLARITY_POSITIVE]: The pointer into an array appears because CONFIG_MPIC_WEIRD=y is set for this platform, thus enabling the following: ------------------- #ifdef CONFIG_MPIC_WEIRD static u32 mpic_infos[][MPIC_IDX_END] = { [0] = { /* Original OpenPIC compatible MPIC */ [...] #define MPIC_INFO(name) mpic->hw_set[MPIC_IDX_##name] #else /* CONFIG_MPIC_WEIRD */ #define MPIC_INFO(name) MPIC_##name #endif /* CONFIG_MPIC_WEIRD */ ------------------- Here we convert the case section to if/else if, and also add the equivalent of a default case to warn about unknown types. Boot tested on sbc8548, build tested on all defconfigs. Signed-off-by:
Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
- Jan 29, 2014
-
-
Benjamin Herrenschmidt authored
Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Tiejun Chen authored
Replace __get_cpu_var safely with get_cpu_var to avoid the following call trace: [ 7253.637591] BUG: using smp_processor_id() in preemptible [00000000 00000000] code: hugemmap01/9048 [ 7253.637601] caller is free_hugepd_range.constprop.25+0x88/0x1a8 [ 7253.637605] CPU: 1 PID: 9048 Comm: hugemmap01 Not tainted 3.10.20-rt14+ #114 [ 7253.637606] Call Trace: [ 7253.637617] [cb049d80] [c0007ea4] show_stack+0x4c/0x168 (unreliable) [ 7253.637624] [cb049dc0] [c031c674] debug_smp_processor_id+0x114/0x134 [ 7253.637628] [cb049de0] [c0016d28] free_hugepd_range.constprop.25+0x88/0x1a8 [ 7253.637632] [cb049e00] [c001711c] hugetlb_free_pgd_range+0x6c/0x168 [ 7253.637639] [cb049e40] [c0117408] free_pgtables+0x12c/0x150 [ 7253.637646] [cb049e70] [c011ce38] unmap_region+0xa0/0x11c [ 7253.637671] [cb049ef0] [c011f03c] do_munmap+0x224/0x3bc [ 7253.637676] [cb049f20] [c011f2e0] vm_munmap+0x38/0x5c [ 7253.637682] [cb049f40] [c000ef88] ret_from_syscall+0x0/0x3c [ 7253.637686] --- Exception: c01 at 0xff16004 Signed-off-by:
Tiejun <Chen<tiejun.chen@windriver.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Paul Mackerras authored
The code in remove_cache_dir() is supposed to remove the "cache" subdirectory from the sysfs directory for a CPU when that CPU is being offlined. It tries to do this by calling kobject_put() on the kobject for the subdirectory. However, the subdirectory only gets removed once the last reference goes away, and the reference being put here may well not be the last reference. That means that the "cache" subdirectory may still exist when the offlining operation has finished. If the same CPU subsequently gets onlined, the code tries to add a new "cache" subdirectory. If the old subdirectory has not yet been removed, we get a WARN_ON in the sysfs code, with stack trace, and an error message printed on the console. Further, we ultimately end up with an online cpu with no "cache" subdirectory. This fixes it by doing an explicit kobject_del() at the point where we want the subdirectory to go away. kobject_del() removes the sysfs directory even though the object still exists in memory. The object will get freed at some point in the future. A subsequent onlining operation can create a new sysfs directory, even if the old object still exists in memory, without causing any problems. Cc: stable@vger.kernel.org # v3.0+ Signed-off-by:
Paul Mackerras <paulus@samba.org> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
jmarchan@redhat.com authored
According to Posix, if MAP_FIXED is specified mmap shall set ENOMEM if the requested mapping exceeds the allowed range for address space of the process. The generic code set it right, but the specific powerpc slice_get_unmapped_area() function currently returns -EINVAL in that case. This patch corrects it. Signed-off-by:
Jerome Marchand <jmarchan@redhat.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Deepthi Dharwar authored
Following patch ports the cpuidle framework for powernv platform and also implements a cpuidle back-end powernv idle driver calling on to power7_nap and snooze idle states. Signed-off-by:
Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Deepthi Dharwar authored
smt-snooze-delay was designed to disable NAP state or delay the entry to the NAP state prior to adoption of cpuidle framework. This is per-cpu variable. With the coming of CPUIDLE framework, states can be disabled on per-cpu basis using the cpuidle/enable sysfs entry. Also, with the coming of cpuidle driver each state's target residency is per-driver unlike earlier which was per-device. Therefore, the per-cpu sysfs smt-snooze-delay which decides the target residency of the idle state on a particular cpu causes more confusion to the user as we cannot have different smt-snooze-delay (target residency) values for each cpu. In the current code, smt-snooze-delay functionality is completely broken. It makes sense to remove smt-snooze-delay from idle driver with the coming of cpuidle framework. However, sysfs files are retained as ppc64_util currently utilises it. Once we fix ppc64_util, propose to clean up the kernel code. Signed-off-by:
Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Deepthi Dharwar authored
Move the file from arch specific pseries/processor_idle.c to drivers/cpuidle/cpuidle-pseries.c Make the relevant Makefile and Kconfig changes. Also, introduce Kconfig.powerpc in drivers/cpuidle for all powerpc cpuidle drivers. Signed-off-by:
Deepthi Dharwar <deepthi@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Paul Mackerras authored
Commit d31626f7 ("powerpc: Don't corrupt transactional state when using FP/VMX in kernel") introduced a bug where the uc_link and uc_regs fields of the ucontext_t that is created to hold the transactional values of the registers in a 32-bit signal frame didn't get set correctly. The reason is that we now clear the MSR_TS bits in the MSR in save_tm_user_regs(), before the code that sets uc_link and uc_regs. To fix this, we move the setting of uc_link and uc_regs into the same if statement that selects whether to call save_tm_user_regs() or save_user_regs(). Signed-off-by:
Paul Mackerras <paulus@samba.org> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Alistair Popple authored
Commit d0847757 switched the generic powerpc iommu backend code to use the it_page_shift field to determine page size. Commit 3a553170 should have initiliased this field for all platforms, however the DART iommu table code was not updated. This commit initialises the it_page_shift field to 4K for the DART iommu. Signed-off-by:
Alistair Popple <alistair@popple.id.au> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Joe Perches authored
This should have been octal. Signed-off-by:
Joe Perches <joe@perches.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-
Li Zhong authored
It seems that forward declaration couldn't work well with typedef, use struct spinlock directly to avoiding following build errors: In file included from include/linux/spinlock.h:81, from include/linux/seqlock.h:35, from include/linux/time.h:5, from include/uapi/linux/timex.h:56, from include/linux/timex.h:56, from include/linux/sched.h:17, from arch/powerpc/kernel/asm-offsets.c:17: include/linux/spinlock_types.h:76: error: redefinition of typedef 'spinlock_t' /root/linux-next/arch/powerpc/include/asm/pgtable-ppc64.h:563: note: previous declaration of 'spinlock_t' was here Signed-off-by:
Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by:
Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org>
-