- May 07, 2014
-
-
Juri Lelli authored
yield_task_dl() is broken:
o it forces current to be throttled by setting its runtime to zero;
o it sets current's dl_se->dl_new to one, expecting that dl_task_timer() will queue it back with proper parameters at replenish time.
Unfortunately, dl_task_timer() has this check at the very beginning: if (!dl_task(p) || dl_se->dl_new) goto unlock; So, it just bails out and the task is never replenished. It actually yielded forever. To fix this, introduce a new flag indicating that the task properly yielded the CPU before its current runtime expired. While this is a bit of overkill at the moment, the flag would be useful in the future to discriminate between "good" jobs (whose remaining runtime could be reclaimed, i.e. recycled) and "bad" jobs (for which dl_throttled has been set) that needed to be stopped. Reported-by:
yjay.kim <yjay.kim@lge.com> Signed-off-by:
Juri Lelli <juri.lelli@gmail.com> Signed-off-by:
Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140429103953.e68eba1b2ac3309214e3dc5a@gmail.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
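The failure mode described in this message can be illustrated with a small user-space model. This is only a sketch: the struct, the function names and the dl_yielded field name are illustrative assumptions, not the kernel's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a deadline entity; not the kernel's struct sched_dl_entity. */
struct toy_dl_se {
    long runtime;      /* remaining budget */
    bool dl_new;       /* entity has not been set up with parameters yet */
    bool dl_yielded;   /* assumed flag name: job gave up the CPU early */
};

/* Old behaviour: zero the budget and mark the entity as new. */
void toy_yield_old(struct toy_dl_se *se)
{
    se->runtime = 0;
    se->dl_new = true;
}

/* Fixed behaviour: zero the budget and record the yield in its own flag. */
void toy_yield_fixed(struct toy_dl_se *se)
{
    se->runtime = 0;
    se->dl_yielded = true;
}

/* Mirrors the early check quoted in the message; true if the task is replenished. */
bool toy_timer_fires(struct toy_dl_se *se, long full_budget)
{
    if (se->dl_new)
        return false;          /* bails out: nothing is replenished */
    se->runtime = full_budget;
    se->dl_yielded = false;
    return true;
}
```

With the old yield the timer handler returns early and the budget stays at zero; with a separate flag the replenishment goes through.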
-
- Apr 19, 2014
-
-
Mel Gorman authored
David Vrabel identified a regression when using automatic NUMA balancing under Xen whereby page table entries were getting corrupted due to the use of native PTE operations. Quoting him: "Xen PV guest page tables require that their entries use machine addresses if the present bit (_PAGE_PRESENT) is set, and (for successful migration) non-present PTEs must use pseudo-physical addresses. This is because on migration MFNs in present PTEs are translated to PFNs (canonicalised) so they may be translated back to the new MFN in the destination domain (uncanonicalised). pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma() set and clear the _PAGE_PRESENT bit using pte_set_flags(), pte_clear_flags(), etc. In a Xen PV guest, these functions must translate MFNs to PFNs when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting _PAGE_PRESENT." His suggested fix converted p[te|md]_[set|clear]_flags to using paravirt-friendly ops but this is overkill. He suggested an alternative of using p[te|md]_modify in the NUMA page table operations but this does more work than necessary and would require looking up a VMA for protections. This patch modifies the NUMA page table operations to use paravirt-friendly operations to set/clear the flags of interest. Unfortunately this will take a performance hit when updating the PTEs on CONFIG_PARAVIRT but I do not see a way around it that does not break Xen. Signed-off-by:
Mel Gorman <mgorman@suse.de> Acked-by:
David Vrabel <david.vrabel@citrix.com> Tested-by:
David Vrabel <david.vrabel@citrix.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Peter Anvin <hpa@zytor.com> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Steven Noonan <steven@uplinklabs.net> Cc: Rik van Riel <riel@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Peter Zijlstra authored
Stick in a comment before someone else tries to fix the sparse warning this generates. Suggested-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/n/tip-o2ro6f3vkxklni0bc8f7m68s@git.kernel.org Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Viresh Kumar authored
shiraz.hashim@st.com email-id doesn't exist anymore as he has left the company. Replace ST's id with shiraz.linux.kernel@gmail.com. It also updates .mailmap file to fix address for 'git shortlog'. Signed-off-by:
Viresh Kumar <viresh.kumar@linaro.org> Cc: Shiraz Hashim <shiraz.linux.kernel@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Vlad Yasevich authored
Currently, it is possible to create an SCTP socket, then switch auth_enable via sysctl setting to 1 and crash the system on connect: Oops[#1]: CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1 task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000 [...] Call Trace: [<ffffffff8043c4e8>] sctp_auth_asoc_set_default_hmac+0x68/0x80 [<ffffffff8042b300>] sctp_process_init+0x5e0/0x8a4 [<ffffffff8042188c>] sctp_sf_do_5_1B_init+0x234/0x34c [<ffffffff804228c8>] sctp_do_sm+0xb4/0x1e8 [<ffffffff80425a08>] sctp_endpoint_bh_rcv+0x1c4/0x214 [<ffffffff8043af68>] sctp_rcv+0x588/0x630 [<ffffffff8043e8e8>] sctp6_rcv+0x10/0x24 [<ffffffff803acb50>] ip6_input+0x2c0/0x440 [<ffffffff8030fc00>] __netif_receive_skb_core+0x4a8/0x564 [<ffffffff80310650>] process_backlog+0xb4/0x18c [<ffffffff80313cbc>] net_rx_action+0x12c/0x210 [<ffffffff80034254>] __do_softirq+0x17c/0x2ac [<ffffffff800345e0>] irq_exit+0x54/0xb0 [<ffffffff800075a4>] ret_from_irq+0x0/0x4 [<ffffffff800090ec>] rm7k_wait_irqoff+0x24/0x48 [<ffffffff8005e388>] cpu_startup_entry+0xc0/0x148 [<ffffffff805a88b0>] start_kernel+0x37c/0x398 Code: dd0900b8 000330f8 0126302d <dcc60000> 50c0fff1 0047182a a48306a0 03e00008 00000000 ---[ end trace b530b0551467f2fd ]--- Kernel panic - not syncing: Fatal exception in interrupt What happens while auth_enable=0 in that case is, that ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs() when endpoint is being created. After that point, if an admin switches over to auth_enable=1, the machine can crash due to NULL pointer dereference during reception of an INIT chunk. 
When we enter sctp_process_init() via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk, the INIT verification succeeds and while we walk and process all INIT params via sctp_process_param() we find that net->sctp.auth_enable is set, therefore do not fall through, but invoke sctp_auth_asoc_set_default_hmac() instead, and thus, dereference what we have set to NULL during endpoint initialization phase. The fix is to make auth_enable immutable by caching its value during endpoint initialization, so that its original value is being carried along until destruction. The bug seems to originate from the very first days. Fix in joint work with Daniel Borkmann. Reported-by:
Joshua Kinard <kumba@gentoo.org> Signed-off-by:
Vlad Yasevich <vyasevic@redhat.com> Signed-off-by:
Daniel Borkmann <dborkman@redhat.com> Acked-by:
Neil Horman <nhorman@tuxdriver.com> Tested-by:
Joshua Kinard <kumba@gentoo.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
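The idea of caching auth_enable at endpoint creation can be sketched in user space as follows. All names here are hypothetical stand-ins (the real structures are the per-net SCTP state and the SCTP endpoint); the HMAC ids are placeholder values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the per-net sysctl and the endpoint. */
struct toy_net { bool auth_enable; };          /* may be flipped at any time */
struct toy_endpoint {
    bool auth_enable;                          /* snapshot taken at creation */
    const int *auth_hmacs;                     /* NULL when created with auth off */
};

static const int toy_default_hmacs[] = { 1, 3 };   /* placeholder ids */

void toy_endpoint_init(struct toy_endpoint *ep, const struct toy_net *net)
{
    ep->auth_enable = net->auth_enable;
    ep->auth_hmacs = ep->auth_enable ? toy_default_hmacs : NULL;
}

/* Returns the first HMAC id, or -1 when AUTH is off for this endpoint. */
int toy_process_init(const struct toy_endpoint *ep)
{
    if (!ep->auth_enable)      /* consult the cached value, not the live sysctl */
        return -1;
    return ep->auth_hmacs[0];
}
```

Because the decision is taken from the endpoint's own snapshot, flipping the sysctl after creation can no longer lead to a dereference of the NULL auth_hmacs pointer.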
-
- Apr 18, 2014
-
-
Alexander Shiyan authored
Add an empty version of of_find_node_by_path(). This fixes following build error for asoc tree: sound/soc/fsl/fsl_ssi.c: In function 'fsl_ssi_probe': sound/soc/fsl/fsl_ssi.c:1471:2: error: implicit declaration of function 'of_find_node_by_path' [-Werror=implicit-function-declaration] sprop = of_get_property(of_find_node_by_path("/"), "compatible", NULL); Reported-by:
Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by:
Alexander Shiyan <shc_work@mail.ru> Signed-off-by:
Rob Herring <robh@kernel.org>
-
Daniel Vetter authored
This is leftover stuff from my previous doc round which I kinda wanted to do but didn't yet due to rebase hell. The modeset helpers and the probing helpers are independent and e.g. i915 uses the probing stuff but has its own modeset infrastructure. It hence makes sense to split this up. While at it add a DOC: comment for the probing library. It would be rather neat to pull some of the DocBook documenting these two helpers into in-line DOC: comments. But unfortunately kerneldoc doesn't support markdown or something similar to make nice-looking documentation, so the current state is better. Signed-off-by:
Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by:
Dave Airlie <airlied@redhat.com>
-
- Apr 17, 2014
-
-
Corey Minyard authored
Convert some ints to bools. Signed-off-by:
Corey Minyard <cminyard@mvista.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Corey Minyard authored
The IPMI driver would wake up periodically looking for events and watchdog pretimeouts. If there is nothing waiting for these events, it's really kind of pointless to be checking for them. So modify the driver so the message handler can pass down if it needs the lower layer to be waiting for these. Modify the system interface lower layer to turn off all timer and thread activity if the upper layer doesn't need anything and it is not currently handling messages. And modify the message handler to not restart the timer if its timer is not needed. The timers and kthread will still be enabled if: - the SI interface is handling a message. - a user has enabled watching for events. - the IPMI watchdog timer is in use (since it uses pretimeouts). - the message handler is waiting on a remote response. - a user has registered to receive commands. This mostly affects interfaces without interrupts. Interfaces with interrupts already don't use CPU in the system interface when the interface is idle. Signed-off-by:
Corey Minyard <cminyard@mvista.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Apr 16, 2014
-
-
K. Y. Srinivasan authored
Only ws2012r2 hosts support the ability to reconnect to the host on VMBUS. This functionality is needed by kexec in Linux. To use this functionality we need to negotiate version 3.0 of the VMBUS protocol. Signed-off-by:
K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> [3.9+] Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
K. Y. Srinivasan authored
Return the appropriate error code and handle the case when the target file exists correctly. This fixes a bug. Signed-off-by:
K. Y. Srinivasan <kys@microsoft.com> Cc: <stable@vger.kernel.org> [3.14] Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Guenter Roeck authored
This is for a system with fixed assignments of input and output pins (various variants of Kontron COMe). Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Guenter Roeck authored
Some systems using mdio-gpio may use active-low gpio pins (eg with inverters or FETs connected to all or some of the gpio pins). Signed-off-by:
Guenter Roeck <linux@roeck-us.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Cong Wang authored
As suggested by Julian: "Simply, flowi4_iif must not contain 0, it does not look logical to ignore all ip rules with specified iif." This is because in fib_rule_match() we do: if (rule->iifindex && (rule->iifindex != fl->flowi_iif)) goto out; flowi4_iif should be LOOPBACK_IFINDEX by default. We need to move LOOPBACK_IFINDEX to include/net/flow.h: 1) It is mostly used by flowi_iif 2) It fixes the following compile error if we use it in flow.h in later patches: In file included from include/linux/netfilter.h:277:0, from include/net/netns/netfilter.h:5, from include/net/net_namespace.h:21, from include/linux/netdevice.h:43, from include/linux/icmpv6.h:12, from include/linux/ipv6.h:61, from include/net/ipv6.h:16, from include/linux/sunrpc/clnt.h:27, from include/linux/nfs_fs.h:30, from init/do_mounts.c:32: include/net/flow.h: In function ‘flowi4_init_output’: include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function) Cc: Eric Biederman <ebiederm@xmission.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: David S. Miller <davem@davemloft.net> Signed-off-by:
Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by:
Cong Wang <cwang@twopensource.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
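A small sketch of why a zero input interface defeats rules that name an iif. The structs are simplified, hypothetical stand-ins; only the condition itself follows the check quoted in the message, and the loopback index value is the kernel's.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_LOOPBACK_IFINDEX 1   /* same value as the kernel's LOOPBACK_IFINDEX */

struct toy_rule { int iifindex; };   /* 0 means "no iif specified" */
struct toy_flow { int flowi_iif; };

/* The input-interface part of the rule match. */
bool toy_rule_iif_matches(const struct toy_rule *rule, const struct toy_flow *fl)
{
    if (rule->iifindex && rule->iifindex != fl->flowi_iif)
        return false;
    return true;
}
```

A rule written for "iif lo" never matches a flow whose flowi_iif was left at 0, but it does match once the flow defaults to the loopback index.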
-
Tejun Heo authored
All device_schedule_callback_owner() users are converted to use device_remove_file_self(). Remove now unused {sysfs|device}_schedule_callback_owner(). Signed-off-by:
Tejun Heo <tj@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Thomas Petazzoni authored
This commit adds the necessary definitions for the PHY layer to recognize "qsgmii" as a valid PHY interface. A QSGMII interface, as defined at http://en.wikipedia.org/wiki/Media_Independent_Interface#Quad_Serial_Gigabit_Media_Independent_Interface , "is a method of combining four SGMII lines into a 5Gbit/s interface. QSGMII, like SGMII, uses LVDS signalling for the TX and RX data and a single LVDS clock signal. QSGMII uses significantly fewer signal lines than four SGMII busses." This type of MAC <-> PHY connection might require special handling on the MAC driver side, so it should be possible to express this type of MAC <-> PHY connection, for example in the Device Tree. Signed-off-by:
Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: devicetree@vger.kernel.org Reviewed-by:
Florian Fainelli <f.fainelli@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Thierry Reding authored
The version of the drm_tegra_submit structure that was merged all the way back in 3.10 contains a pad field that was originally intended to properly pad the following __u64 field. Unfortunately it seems like a different field was dropped during review that caused this padding to become unnecessary, but the pad field wasn't removed at that time. One possible side-effect of this is that since the __u64 following the pad is now no longer properly aligned, the compiler may (or may not) introduce padding itself, which results in no predictable ABI. Rectify this by removing the pad field so that all fields are again naturally aligned. Technically this is breaking existing userspace ABI, but given that there aren't any (released) userspace drivers that make use of this yet, the fallout should be minimal. Fixes: d43f81cb ("drm/tegra: Add gr2d device") Cc: <stable@vger.kernel.org> # 3.10 Signed-off-by:
Thierry Reding <treding@nvidia.com>
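The alignment problem can be reproduced with two made-up layouts. The field names below are purely illustrative and are not the real drm_tegra_submit members; the point is only where a 64-bit field lands after an odd number of 32-bit fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts for illustration only. */
struct toy_submit_padded {
    uint32_t context;
    uint32_t num_cmds;
    uint32_t pad;        /* leftover padding: next field would start at offset 12 */
    uint64_t cmds;       /* 12 is not a multiple of 8, so the ABI decides the offset */
};

struct toy_submit_fixed {
    uint32_t context;
    uint32_t num_cmds;
    uint64_t cmds;       /* offset 8: naturally aligned on common ABIs */
};

size_t toy_padded_cmds_offset(void)
{
    return offsetof(struct toy_submit_padded, cmds);
}

size_t toy_fixed_cmds_offset(void)
{
    return offsetof(struct toy_submit_fixed, cmds);
}
```

On ABIs that align 64-bit members to 8 bytes the compiler silently inserts 4 more bytes in the padded layout, while ABIs that align them to 4 bytes do not, which is the unpredictability the message refers to.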
-
Ingo Molnar authored
Steve reported a reboot hang and bisected it back to this commit: a4f1987e x86, reboot: Add EFI and CF9 reboot methods into the default list He heroically tested all reboot methods and found the following: reboot=t # triple fault ok reboot=k # keyboard ctrl FAIL reboot=b # BIOS ok reboot=a # ACPI FAIL reboot=e # EFI FAIL [system has no EFI] reboot=p # PCI 0xcf9 FAIL And I think it's pretty obvious that we should only try PCI 0xcf9 as a last resort - if at all. The other observation is that (on this box) we should never try the PCI reboot method, but close with either the 'triple fault' or the 'BIOS' (terminal!) reboot methods. Thirdly, CF9_COND is a total misnomer - it should be something like CF9_SAFE or CF9_CAREFUL, and 'CF9' should be 'CF9_FORCE' ... So this patch fixes the worst problems: - it orders the actual reboot logic to follow the reboot ordering pattern - it was in a pretty random order before for no good reason. - it fixes the CF9 misnomers and uses BOOT_CF9_FORCE and BOOT_CF9_SAFE flags to make the code more obvious. - it tries the BIOS reboot method before the PCI reboot method. (Since 'BIOS' is a terminal reboot method resulting in a hang if it does not work, this is essentially equivalent to removing the PCI reboot method from the default reboot chain.) - just for the miraculous possibility of terminal (resulting in hang) reboot methods of triple fault or BIOS returning without having done their job, there's an ordering between them as well. Reported-and-bisected-and-tested-by:
Steven Rostedt <rostedt@goodmis.org> Cc: Li Aubrey <aubrey.li@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Link: http://lkml.kernel.org/r/20140404064120.GB11877@gmail.com Signed-off-by:
Ingo Molnar <mingo@kernel.org>
-
- Apr 15, 2014
-
-
Eric Dumazet authored
In the dst->output() path for ipv4, the code assumes the skb it has to transmit is attached to an inet socket, specifically via ip_mc_output() : The sk_mc_loop() test triggers a WARN_ON() when the provider of the packet is an AF_PACKET socket. The dst->output() method gets an additional 'struct sock *sk' parameter. This needs a cascade of changes so that this parameter can be propagated from vxlan to final consumer. Fixes: 8f646c92 ("vxlan: keep original skb ownership") Reported-by:
lucien xin <lucien.xin@gmail.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
ip_queue_xmit() assumes the skb it has to transmit is attached to an inet socket. Commit 31c70d59 ("l2tp: keep original skb ownership") changed l2tp to not change skb ownership and thus broke this assumption. One fix is to add a new 'struct sock *sk' parameter to ip_queue_xmit(), so that we do not assume skb->sk points to the socket used by l2tp tunnel. Fixes: 31c70d59 ("l2tp: keep original skb ownership") Reported-by:
Zhan Jianyu <nasa4836@gmail.com> Tested-by:
Zhan Jianyu <nasa4836@gmail.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Theodore Ts'o authored
In retrospect, this was a bad way to handle things, since it limited testing of these patches. We should just get the VFS level changes merged in first. Signed-off-by:
"Theodore Ts'o" <tytso@mit.edu>
-
- Apr 14, 2014
-
-
Daniel Borkmann authored
This reverts commit ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") as it introduced a serious performance regression on SCTP over IPv4 and IPv6, though a not as dramatic on the latter. Measurements are on 10Gbit/s with ixgbe NICs. Current state: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.241.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 17:56:21 GMT Connecting to host 192.168.241.3, port 5201 Cookie: Lab200slot2.1397238981.812898.548918 [ 4] local 192.168.241.2 port 38616 connected to 192.168.241.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.09 sec 20.8 MBytes 161 Mbits/sec [ 4] 1.09-2.13 sec 10.8 MBytes 86.8 Mbits/sec [ 4] 2.13-3.15 sec 3.57 MBytes 29.5 Mbits/sec [ 4] 3.15-4.16 sec 4.33 MBytes 35.7 Mbits/sec [ 4] 4.16-6.21 sec 10.4 MBytes 42.7 Mbits/sec [ 4] 6.21-6.21 sec 0.00 Bytes 0.00 bits/sec [ 4] 6.21-7.35 sec 34.6 MBytes 253 Mbits/sec [ 4] 7.35-11.45 sec 22.0 MBytes 45.0 Mbits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-11.45 sec 0.00 Bytes 0.00 bits/sec [ 4] 11.45-12.51 sec 16.0 MBytes 126 Mbits/sec [ 4] 12.51-13.59 sec 20.3 MBytes 158 Mbits/sec [ 4] 13.59-14.65 sec 13.4 MBytes 107 Mbits/sec [ 4] 14.65-16.79 sec 33.3 MBytes 130 Mbits/sec [ 4] 16.79-16.79 sec 0.00 Bytes 0.00 bits/sec [ 4] 16.79-17.82 sec 5.94 MBytes 48.7 Mbits/sec (etc) [root@Lab200slot2 ~]# iperf3 --sctp -6 -c 2001:db8:0:f101::1 -V -l 1400 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0 #1 SMP Thu Apr 3 23:18:29 EDT 2014 x86_64 Time: Fri, 11 Apr 2014 19:08:41 GMT Connecting to host 2001:db8:0:f101::1, port 5201 Cookie: Lab200slot2.1397243321.714295.2b3f7c [ 4] local 2001:db8:0:f101::2 port 55804 connected to 2001:db8:0:f101::1 port 5201 Starting 
Test: protocol: SCTP, 1 streams, 1400 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 169 MBytes 1.42 Gbits/sec [ 4] 1.00-2.00 sec 201 MBytes 1.69 Gbits/sec [ 4] 2.00-3.00 sec 188 MBytes 1.58 Gbits/sec [ 4] 3.00-4.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 4.00-5.00 sec 165 MBytes 1.39 Gbits/sec [ 4] 5.00-6.00 sec 199 MBytes 1.67 Gbits/sec [ 4] 6.00-7.00 sec 163 MBytes 1.36 Gbits/sec [ 4] 7.00-8.00 sec 174 MBytes 1.46 Gbits/sec [ 4] 8.00-9.00 sec 193 MBytes 1.62 Gbits/sec [ 4] 9.00-10.00 sec 196 MBytes 1.65 Gbits/sec [ 4] 10.00-11.00 sec 157 MBytes 1.31 Gbits/sec [ 4] 11.00-12.00 sec 175 MBytes 1.47 Gbits/sec [ 4] 12.00-13.00 sec 192 MBytes 1.61 Gbits/sec [ 4] 13.00-14.00 sec 199 MBytes 1.67 Gbits/sec (etc) After patch: [root@Lab200slot2 ~]# iperf3 --sctp -4 -c 192.168.240.3 -V -l 1452 -t 60 iperf version 3.0.1 (10 January 2014) Linux Lab200slot2 3.14.0+ #1 SMP Mon Apr 14 12:06:40 EDT 2014 x86_64 Time: Mon, 14 Apr 2014 16:40:48 GMT Connecting to host 192.168.240.3, port 5201 Cookie: Lab200slot2.1397493648.413274.65e131 [ 4] local 192.168.240.2 port 50548 connected to 192.168.240.3 port 5201 Starting Test: protocol: SCTP, 1 streams, 1452 byte blocks, omitting 0 seconds, 60 second test [ ID] Interval Transfer Bandwidth [ 4] 0.00-1.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 1.00-2.00 sec 239 MBytes 2.01 Gbits/sec [ 4] 2.00-3.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 3.00-4.00 sec 239 MBytes 2.00 Gbits/sec [ 4] 4.00-5.00 sec 245 MBytes 2.05 Gbits/sec [ 4] 5.00-6.00 sec 240 MBytes 2.01 Gbits/sec [ 4] 6.00-7.00 sec 240 MBytes 2.02 Gbits/sec [ 4] 7.00-8.00 sec 239 MBytes 2.01 Gbits/sec With the reverted patch applied, the SCTP/IPv4 performance is back to normal on latest upstream for IPv4 and IPv6 and has same throughput as 3.4.2 test kernel, steady and interval reports are smooth again. Fixes: ef2820a7 ("net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer") Reported-by:
Peter Butler <pbutler@sonusnet.com> Reported-by:
Dongsheng Song <dongsheng.song@gmail.com> Reported-by:
Fengguang Wu <fengguang.wu@intel.com> Tested-by:
Peter Butler <pbutler@sonusnet.com> Signed-off-by:
Daniel Borkmann <dborkman@redhat.com> Cc: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Cc: Alexander Sverdlin <alexander.sverdlin@nsn.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Acked-by:
Vlad Yasevich <vyasevich@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
While reviewing seccomp code, we found that BPF_S_ANC_SECCOMP_LD_W has been wrongly decoded by commit a8fc9277 ("sk-filter: Add ability to get socket filter program (v2)") into the opcode BPF_LD|BPF_B|BPF_ABS although it should have been decoded as BPF_LD|BPF_W|BPF_ABS. In practice, this should not have much side-effect though, as such conversion is/was being done through prctl(2) PR_SET_SECCOMP. Reverse operation PR_GET_SECCOMP will only return the current seccomp mode, but not the filter itself. Since the transition to the new BPF infrastructure, it's also not used anymore, so we can simply remove this as it's unreachable. Fixes: a8fc9277 ("sk-filter: Add ability to get socket filter program (v2)") Signed-off-by:
Daniel Borkmann <dborkman@redhat.com> Signed-off-by:
Alexei Starovoitov <ast@plumgrid.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
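To make the difference between the two encodings concrete, here is a sketch using local copies of the classic BPF opcode field values. The TOY_ prefix marks these as local definitions for illustration; the helper function is not a kernel API.

```c
#include <assert.h>

/* Classic BPF opcode fields, copied by value into local macros. */
#define TOY_BPF_LD   0x00   /* instruction class: load */
#define TOY_BPF_W    0x00   /* size: 32-bit word */
#define TOY_BPF_H    0x08   /* size: 16-bit half word */
#define TOY_BPF_B    0x10   /* size: byte */
#define TOY_BPF_ABS  0x20   /* mode: absolute offset */
#define TOY_BPF_SIZE(code) ((code) & 0x18)

/* Number of bytes a load opcode reads, or 0 for an unknown size field. */
unsigned int toy_bpf_load_width(unsigned int code)
{
    switch (TOY_BPF_SIZE(code)) {
    case TOY_BPF_W: return 4;
    case TOY_BPF_H: return 2;
    case TOY_BPF_B: return 1;
    default:        return 0;
    }
}
```

The wrongly decoded opcode (byte size) and the intended one (word size) differ only in the size field, yet one describes a 1-byte load and the other a 4-byte load.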
-
Eric Dumazet authored
Francois reported that setting big mtu on loopback device could prevent tcp sessions making progress. We do not support (yet ?) IPv6 Jumbograms and cook corrupted packets. We must limit the IPv6 MTU to (65535 + 40) bytes in theory. Tested: ifconfig lo mtu 70000 netperf -H ::1 Before patch : Throughput : 0.05 Mbits After patch : Throughput : 35484 Mbits Reported-by:
Francois WELLENREITER <f.wellenreiter@gmail.com> Signed-off-by:
Eric Dumazet <edumazet@google.com> Acked-by:
YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Acked-by:
Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
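The limit mentioned in the message follows from the 16-bit IPv6 payload length plus the 40-byte fixed header. A minimal sketch of such a clamp, with hypothetical macro and function names:

```c
#include <assert.h>

#define TOY_IPV6_HDR_LEN 40                          /* fixed IPv6 header size */
#define TOY_IP6_MAX_MTU  (65535 + TOY_IPV6_HDR_LEN)  /* largest non-jumbogram packet */

/* Limit the MTU used for IPv6 to what fits without Jumbograms. */
unsigned int toy_clamp_ip6_mtu(unsigned int dev_mtu)
{
    return dev_mtu > TOY_IP6_MAX_MTU ? TOY_IP6_MAX_MTU : dev_mtu;
}
```

With the loopback MTU set to 70000 as in the test above, the IPv6 side would use 65575 in this sketch.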
-
Patrick McHardy authored
nft_cmp_fast is used for equality comparisons of size <= 4. For comparisons of size < 4 bytes a mask is calculated that is applied to both the data from userspace (during initialization) and the register value (during runtime). Both values are stored using (in effect) memcpy to a memory area that is then interpreted as u32 by nft_cmp_fast. This works fine on little endian since smaller types have the same base address, however on big endian this is not true and the smaller types are interpreted as a big number with trailing zero bytes. The mask therefore must not include the lower bytes, but the higher bytes on big endian. Add a helper function that does a cpu_to_le32 to switch the bytes on big endian. Since we're dealing with a mask of just consecutive bits, this works out fine. Signed-off-by:
Patrick McHardy <kaber@trash.net> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
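The masking trick can be tried out in user space. The sketch below uses a portable stand-in for cpu_to_le32() and hypothetical function names; it is not the nftables implementation, only an illustration of a mask that always covers the first bytes in memory regardless of host endianness.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Portable stand-in for cpu_to_le32(): lay the value out least-significant byte first. */
uint32_t toy_cpu_to_le32(uint32_t v)
{
    unsigned char b[4] = {
        (unsigned char)(v & 0xff), (unsigned char)((v >> 8) & 0xff),
        (unsigned char)((v >> 16) & 0xff), (unsigned char)((v >> 24) & 0xff)
    };
    uint32_t out;
    memcpy(&out, b, sizeof(out));
    return out;
}

/* Mask selecting the first len_bits (8, 16, 24 or 32) of a 4-byte area in memory. */
uint32_t toy_cmp_fast_mask(unsigned int len_bits)
{
    return toy_cpu_to_le32(UINT32_C(0xffffffff) >> (32 - len_bits));
}

/* Compare the first len_bytes of a 4-byte register against stored data. */
int toy_cmp_fast_equal(const unsigned char reg[4], const unsigned char *data,
                       unsigned int len_bytes)
{
    uint32_t x, y = 0, mask = toy_cmp_fast_mask(len_bytes * 8);

    memcpy(&x, reg, sizeof(x));      /* register may hold unrelated trailing bytes */
    memcpy(&y, data, len_bytes);     /* data occupies the lowest addresses */
    return (x & mask) == (y & mask);
}
```

On a little-endian host the conversion is a no-op; on a big-endian host it moves the set bits to the high end of the u32, which is where the first bytes in memory live.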
-
- Apr 11, 2014
-
-
David S. Miller authored
Several spots in the kernel perform a sequence like: skb_queue_tail(&sk->sk_receive_queue, skb); sk->sk_data_ready(sk, skb->len); But at the moment we place the SKB onto the socket receive queue it can be consumed and freed up. So this skb->len access is potentially to freed up memory. Furthermore, the skb->len can be modified by the consumer so it is possible that the value isn't accurate. And finally, no actual implementation of this callback actually uses the length argument. And since nobody actually cared about its value, lots of call sites pass arbitrary values in such as '0' and even '1'. So just remove the length argument from the callback, that way there is no confusion whatsoever and all of these use-after-free cases get fixed as a side effect. Based upon a patch by Eric Dumazet and his suggestion to audit this issue tree-wide. Signed-off-by:
David S. Miller <davem@davemloft.net>
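The shape of the change can be sketched with toy types: once the buffer has been queued, the notification only gets the socket and never touches the buffer again. All names are hypothetical; the real types are struct sock and struct sk_buff.

```c
#include <assert.h>
#include <stddef.h>

struct toy_skb { unsigned int len; };

struct toy_sock {
    struct toy_skb *queued;                    /* one-slot stand-in for the receive queue */
    unsigned int wakeups;
    void (*data_ready)(struct toy_sock *sk);   /* callback without a length argument */
};

void toy_data_ready(struct toy_sock *sk)
{
    sk->wakeups++;
}

void toy_deliver(struct toy_sock *sk, struct toy_skb *skb)
{
    sk->queued = skb;       /* from here on a consumer may free skb */
    sk->data_ready(sk);     /* so the notification must not read skb->len */
}
```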
-
Dave Hansen authored
'struct page' has two list_head fields: 'lru' and 'list'. Conveniently, they are unioned together. This means that code can use them interchangeably, which gets horribly confusing like with this nugget from slab.c: > list_del(&page->lru); > if (page->active == cachep->num) > list_add(&page->list, &n->slabs_full); This patch makes the slab and slub code use page->lru universally instead of mixing ->list and ->lru. So, the new rule is: page->lru is what you use if you want to keep your page on a list. Don't like the fact that it's not called ->list? Too bad. Signed-off-by:
Dave Hansen <dave.hansen@linux.intel.com> Acked-by:
Christoph Lameter <cl@linux.com> Acked-by:
David Rientjes <rientjes@google.com> Cc: Pekka Enberg <penberg@kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Pekka Enberg <penberg@kernel.org>
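Why the two names were interchangeable can be seen in a reduced model of the union. This is a toy layout, not the real struct page.

```c
#include <assert.h>

struct toy_list_head { struct toy_list_head *next, *prev; };

/* Reduced model: two list heads sharing the same storage. */
struct toy_page {
    union {
        struct toy_list_head lru;
        struct toy_list_head list;
    };
};

/* Returns nonzero when both names refer to the same bytes. */
int toy_same_storage(const struct toy_page *p)
{
    return (const void *)&p->lru == (const void *)&p->list;
}
```

Since both members alias the same memory, a list_del() through one name followed by a list_add() through the other operates on a single list node, which is what made the quoted slab.c snippet work and also what made it confusing.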
-
Eli Cohen authored
Add support for the block multicast loopback QP creation flag along the proper firmware API for that. Signed-off-by:
Eli Cohen <eli@mellanox.com> Signed-off-by:
Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by:
Roland Dreier <roland@purestorage.com>
-
- Apr 10, 2014
-
-
Chris Metcalf authored
On systems with CONFIG_COMPAT we introduced the new requirement that audit_classify_compat_syscall() exists. This wasn't true for everything (apparently not for "tilegx", which I know less than nothing about.) Instead of wrapping the preprocessor optimization with CONFIG_COMPAT we should have used the new CONFIG_AUDIT_COMPAT_GENERIC. This patch uses that config option to make sure only arches which intend to implement this have the requirement. This works fine for tilegx according to Chris Metcalf Signed-off-by:
Eric Paris <eparis@redhat.com>
-
Keith Busch authored
For commands returned with failed status, queue these for resubmission and continue retrying them until success or for a limited amount of time. The final timeout was arbitrarily chosen so requests can't be retried indefinitely. Since these are requeued on the nvmeq that submitted the command, the callbacks have to take an nvmeq instead of an nvme_dev as a parameter so that we can use the locked queue to append the iod to retry later. The nvme_iod conveniently can be used to track how long we've been trying to successfully complete an iod request. The nvme_iod also provides the nvme prp dma mappings, so I had to move a few things around so we can keep those mappings. Signed-off-by:
Keith Busch <keith.busch@intel.com> [fixed checkpatch issue with long line] Signed-off-by:
Matthew Wilcox <matthew.r.wilcox@intel.com>
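The retry policy described here (resubmit failed commands, but only for a bounded time since the first attempt) might look roughly like this in isolation. The struct, field and function names are hypothetical, and the time unit is an abstract tick.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-request bookkeeping. */
struct toy_iod {
    unsigned long first_submit;   /* tick at which the request was first submitted */
};

/* Decide whether a completed command should be queued for resubmission. */
bool toy_should_requeue(const struct toy_iod *iod, int status,
                        unsigned long now, unsigned long retry_limit)
{
    if (status == 0)
        return false;                                /* success: nothing to retry */
    return now - iod->first_submit < retry_limit;    /* give up after the limit */
}
```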
-
Keith Busch authored
Increase the default timeout to 30 seconds to match SCSI. Signed-off-by:
Keith Busch <keith.busch@intel.com> [use byte instead of ushort] Signed-off-by:
Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Keith Busch authored
Registers with hot cpu notification to rebalance, and potentially allocate additional, io queues. Signed-off-by:
Keith Busch <keith.busch@intel.com> Signed-off-by:
Matthew Wilcox <matthew.r.wilcox@intel.com>
-
Keith Busch authored
The device's IO queues are associated with CPUs, so we can use a per-cpu variable to map a qid to a cpu. This provides a convenient way to optimally assign queues to multiple cpus when the device supports fewer queues than the host has cpus. The previous implementation may have assigned these poorly in these situations. This patch addresses this by sharing queues among cpus that are "close" together and should have a lower lock contention penalty. Signed-off-by:
Keith Busch <keith.busch@intel.com> Signed-off-by:
Matthew Wilcox <matthew.r.wilcox@intel.com>
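As a rough illustration of sharing queues among nearby CPUs, the sketch below groups CPUs by contiguous index. This is a simplifying assumption: it treats adjacent CPU numbers as "close" and does not consult real topology, which the actual patch may well do. I/O queue ids start at 1 because qid 0 is the NVMe admin queue.

```c
#include <assert.h>

/*
 * Map a CPU index to an I/O queue id in 1..nr_queues.
 * Assumes cpu < nr_cpus and nr_queues >= 1.
 */
unsigned int toy_cpu_to_qid(unsigned int cpu, unsigned int nr_cpus,
                            unsigned int nr_queues)
{
    if (nr_queues >= nr_cpus)
        return cpu + 1;                      /* one queue per CPU */
    return cpu * nr_queues / nr_cpus + 1;    /* neighbouring CPUs share a queue */
}
```

With 8 CPUs and 2 queues, CPUs 0 to 3 share queue 1 and CPUs 4 to 7 share queue 2.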
-
Jens Axboe authored
Martin reported that his test system would not boot with current git, it oopsed with this: BUG: unable to handle kernel paging request at ffff88046c6c9e80 IP: [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150 PGD 1ddf067 PUD 1de2067 PMD 47fc7d067 PTE 800000046c6c9060 Oops: 0002 [#1] SMP DEBUG_PAGEALLOC Modules linked in: sd_mod lpfc(+) scsi_transport_fc scsi_tgt oracleasm rpcsec_gss_krb5 ipv6 igb dca i2c_algo_bit i2c_core hwmon CPU: 3 PID: 87 Comm: kworker/u17:1 Not tainted 3.14.0+ #246 Hardware name: Supermicro X9DRX+-F/X9DRX+-F, BIOS 3.00 07/09/2013 Workqueue: events_unbound async_run_entry_fn task: ffff8802743c2150 ti: ffff880273d02000 task.ti: ffff880273d02000 RIP: 0010:[<ffffffff812971e0>] [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150 RSP: 0018:ffff880273d03a58 EFLAGS: 00010092 RAX: ffff88046c6c9e78 RBX: ffff880077208e78 RCX: 00000000fffc8da6 RDX: 00000000fffc186d RSI: 0000000000000009 RDI: 00000000fffc8d9d RBP: ffff880273d03a88 R08: 0000000000000001 R09: ffff8800021c2410 R10: 0000000000000005 R11: 0000000000015b30 R12: ffff88046c5bb8a0 R13: ffff88046c5c0890 R14: 000000000000001e R15: 000000000000001e FS: 0000000000000000(0000) GS:ffff880277b00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff88046c6c9e80 CR3: 00000000018f6000 CR4: 00000000000407e0 Stack: ffff880273d03a98 ffff880474b18800 0000000000000000 ffff880474157000 ffff88046c5c0890 ffff880077208e78 ffff880273d03ae8 ffffffff813b9e62 ffff880200000010 ffff880474b18968 ffff880474b18848 ffff88046c5c0cd8 Call Trace: [<ffffffff813b9e62>] scsi_request_fn+0xf2/0x510 [<ffffffff81293167>] __blk_run_queue+0x37/0x50 [<ffffffff8129ac43>] blk_execute_rq_nowait+0xb3/0x130 [<ffffffff8129ad24>] blk_execute_rq+0x64/0xf0 [<ffffffff8108d2b0>] ? bit_waitqueue+0xd0/0xd0 [<ffffffff813bba35>] scsi_execute+0xe5/0x180 [<ffffffff813bbe4a>] scsi_execute_req_flags+0x9a/0x110 [<ffffffffa01b1304>] sd_spinup_disk+0x94/0x460 [sd_mod] [<ffffffff81160000>] ? 
__unmap_hugepage_range+0x200/0x2f0 [<ffffffffa01b2b9a>] sd_revalidate_disk+0xaa/0x3f0 [sd_mod] [<ffffffffa01b2fb8>] sd_probe_async+0xd8/0x200 [sd_mod] [<ffffffff8107703f>] async_run_entry_fn+0x3f/0x140 [<ffffffff8106a1c5>] process_one_work+0x175/0x410 [<ffffffff8106b373>] worker_thread+0x123/0x400 [<ffffffff8106b250>] ? manage_workers+0x160/0x160 [<ffffffff8107104e>] kthread+0xce/0xf0 [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70 [<ffffffff815f0bac>] ret_from_fork+0x7c/0xb0 [<ffffffff81070f80>] ? kthread_freezable_should_stop+0x70/0x70 Code: 48 0f ab 11 72 db 48 81 4b 40 00 00 10 00 89 83 08 01 00 00 48 89 df 49 8b 04 24 48 89 1c d0 e8 f7 a8 ff ff 49 8b 85 28 05 00 00 <48> 89 58 08 48 89 03 49 8d 85 28 05 00 00 48 89 43 08 49 89 9d RIP [<ffffffff812971e0>] blk_queue_start_tag+0x90/0x150 RSP <ffff880273d03a58> CR2: ffff88046c6c9e80 Martin bisected and found this to be the problem patch; commit 6d113398 Author: Jan Kara <jack@suse.cz> Date: Mon Feb 24 16:39:54 2014 +0100 block: Stop abusing rq->csd.list in blk-softirq and the problem was immediately apparent. The patch states that it is safe to reuse queuelist at completion time, since it is no longer used. However, that is not true if a device is using block enabled tagging. If that is the case, then the queuelist is reused to keep track of busy tags. If a device also ended up using softirq completions, we'd reuse ->queuelist for the IPI handling while block tagging was still using it. Boom. Fix this by adding a new ipi_list list head, and share the memory used with the request hash table. The hash table is never used after the request is moved to the dispatch list, which happens long before any potential completion of the request. Add a new request bit for this, so we don't have cases that check rq->hash while it could potentially have been reused for the IPI completion. Reported-by:
Martin K. Petersen <martin.petersen@oracle.com> Tested-by:
Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by:
Jens Axboe <axboe@fb.com>
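The memory-sharing idea in the fix above can be sketched in plain C. This is a simplified illustration, not the kernel's actual struct request: the types `list_node`, `hash_node` and `request_sketch`, the flag `REQ_HASHED_SKETCH`, and the helper functions are all hypothetical names chosen for this example. It shows a union that overlays the hash linkage with a separate IPI list head, plus a flag that records whether the hash member is still valid, so that `queuelist` stays free for tag tracking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's list and hash node types. */
struct list_node { struct list_node *next, *prev; };
struct hash_node { struct hash_node *next, **pprev; };

/* Hypothetical flag bit: set while the request sits in the hash table. */
#define REQ_HASHED_SKETCH (1u << 0)

struct request_sketch {
    struct list_node queuelist;    /* may still track busy tags at completion */
    union {
        struct hash_node hash;     /* meaningful only while the flag is set */
        struct list_node ipi_list; /* meaningful only for IPI completion */
    };
    unsigned int flags;
};

static bool rq_hashed(const struct request_sketch *rq)
{
    return (rq->flags & REQ_HASHED_SKETCH) != 0;
}

/* Called when the request leaves the hash table (before dispatch). */
static void rq_unhash(struct request_sketch *rq)
{
    rq->flags &= ~REQ_HASHED_SKETCH;
}

/* Append the request to a circular completion list via ipi_list,
 * leaving queuelist untouched. */
static void rq_queue_for_ipi(struct request_sketch *rq, struct list_node *head)
{
    assert(!rq_hashed(rq)); /* the union member must no longer be in use */
    rq->ipi_list.next = head;
    rq->ipi_list.prev = head->prev;
    head->prev->next = &rq->ipi_list;
    head->prev = &rq->ipi_list;
}
```

The flag check is the important part of the design: because `hash` and `ipi_list` occupy the same bytes, any code that reads `hash` has to test the flag first rather than inspect the node itself.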
-
Martin K. Petersen authored
cmd_flags in struct request is now 64 bits wide, but the scsi_execute functions truncated the flags arguments passed to them to int, leading to errors. Make sure the flags parameters are u64. Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Cc: Jens Axboe <axboe@fb.com> CC: Jan Kara <jack@suse.cz> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by:
Jens Axboe <axboe@fb.com>
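The truncation described above is easy to reproduce outside the kernel. The sketch below is an illustration only: `FLAG_LOW`, `FLAG_HIGH`, `execute_narrow` and `execute_wide` are made-up names, not the scsi_execute interfaces. It passes a 64-bit flag word through an int parameter and through a 64-bit parameter. The cast to int is written explicitly here to keep compilers quiet; in the bug it happened implicitly at the call boundary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits: one below bit 31, one above it. */
#define FLAG_LOW  (UINT64_C(1) << 3)
#define FLAG_HIGH (UINT64_C(1) << 40)

/* Parameter is too narrow: bits above 31 cannot survive the call. */
static uint64_t execute_narrow(int flags)
{
    return (uint64_t)(unsigned int)flags;
}

/* Parameter is as wide as the flag word, so nothing is lost. */
static uint64_t execute_wide(uint64_t flags)
{
    return flags;
}

/* The out-of-range conversion to int is implementation-defined in C;
 * common compilers keep the low 32 bits. */
static uint64_t call_narrow(uint64_t flags)
{
    return execute_narrow((int)flags);
}

static uint64_t call_wide(uint64_t flags)
{
    return execute_wide(flags);
}

static void check_truncation(void)
{
    uint64_t flags = FLAG_LOW | FLAG_HIGH;

    assert((call_narrow(flags) & FLAG_HIGH) == 0); /* high flag dropped */
    assert(call_wide(flags) == flags);             /* high flag preserved */
}
```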
-
Mathieu Desnoyers authored
gcc <= 4.5.x has significant limitations with respect to initialization of anonymous unions within structures. The initializers need to be surrounded by braces, _and_ they need to appear in the same order in which the unions appear in the structure declaration. Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676 Link: http://lkml.kernel.org/r/1397077568-3156-1-git-send-email-mathieu.desnoyers@efficios.com Signed-off-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
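The initializer shape that works with older gcc can be sketched as follows. The struct and its field names are invented for this example and are not taken from the tracing code. Each anonymous union gets its own braced initializer, placed positionally in the order the unions are declared.

```c
#include <assert.h>

/* Hypothetical structure with two anonymous unions. */
struct event_field_sketch {
    const char *name;
    union {                 /* first anonymous union */
        int offset;
        const char *expr;
    };
    union {                 /* second anonymous union */
        int size;
        long wide_size;
    };
    int is_signed;
};

/* Each anonymous union is initialized inside its own braces, and the
 * braced groups follow the declaration order: after .name comes the
 * first union, then the second. */
static const struct event_field_sketch example_field = {
    .name = "pid",
    { .offset = 8 },
    { .size = 4 },
    .is_signed = 1,
};

static void check_example_field(void)
{
    assert(example_field.offset == 8);
    assert(example_field.size == 4);
    assert(example_field.is_signed == 1);
}
```

Newer compilers also accept `.offset = 8` written directly at the top level of the initializer; the braced, ordered form is the one the commit message says old gcc requires.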
-
- Apr 09, 2014
-
-
Behan Webster authored
Similar to the fix in 40413dcb, MODULE_DEVICE_TABLE(x86cpu, ...) expects the struct to be called struct x86cpu_device_id, and not struct x86_cpu_id, which is what is used in the rest of the kernel code. Although gcc seems to ignore this error, clang fails without this define to fix the name. Code from drivers/thermal/x86_pkg_temp_thermal.c: static const struct x86_cpu_id __initconst pkg_temp_thermal_ids[] = { ... }; MODULE_DEVICE_TABLE(x86cpu, pkg_temp_thermal_ids); Error from clang: drivers/thermal/x86_pkg_temp_thermal.c:577:1: error: variable has incomplete type 'const struct x86cpu_device_id' MODULE_DEVICE_TABLE(x86cpu, pkg_temp_thermal_ids); ^ include/linux/module.h:145:3: note: expanded from macro 'MODULE_DEVICE_TABLE' MODULE_GENERIC_TABLE(type##_device, name) ^ include/linux/module.h:87:32: note: expanded from macro 'MODULE_GENERIC_TABLE' extern const struct gtype##_id __mod_##gtype##_table \ ^ <scratch space>:143:1: note: expanded from here __mod_x86cpu_device_table ^ drivers/thermal/x86_pkg_temp_thermal.c:577:1: note: forward declaration of 'struct x86cpu_device_id' include/linux/module.h:145:3: note: expanded from macro 'MODULE_DEVICE_TABLE' MODULE_GENERIC_TABLE(type##_device, name) ^ include/linux/module.h:87:21: note: expanded from macro 'MODULE_GENERIC_TABLE' extern const struct gtype##_id __mod_##gtype##_table \ ^ <scratch space>:141:1: note: expanded from here x86cpu_device_id ^ 1 error generated. Signed-off-by:
Behan Webster <behanw@converseincode.com> Signed-off-by:
Jan-Simon Möller <dl9pf@gmx.de> Acked-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
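The name mismatch comes from token pasting, which the following sketch reproduces in miniature. The two `*_SKETCH` macros are simplified stand-ins for the macros quoted in the clang output (they declare a plain pointer rather than an aliased extern symbol), and the table contents are invented. The single `#define` that maps the pasted name back onto the real struct is the kind of fix the commit message refers to.

```c
#include <assert.h>

/* The struct name actually used by the rest of the code. */
struct x86_cpu_id { int vendor; int family; int model; };

/* The pasted name "x86cpu_device_id" is rescanned after pasting,
 * so this define redirects it to the real struct. */
#define x86cpu_device_id x86_cpu_id

/* Simplified stand-ins for the module table macros. */
#define MODULE_GENERIC_TABLE_SKETCH(gtype, name) \
    static const struct gtype##_id *const mod_##gtype##_table = name
#define MODULE_DEVICE_TABLE_SKETCH(type, name) \
    MODULE_GENERIC_TABLE_SKETCH(type##_device, name)

/* Hypothetical ID table, terminated by an all-zero entry. */
static const struct x86_cpu_id pkg_temp_ids_sketch[] = {
    { 0, 6, 42 },
    { 0, 0, 0 },
};

/* Expands to a declaration of type "const struct x86_cpu_id *const"
 * named mod_x86cpu_device_table. */
MODULE_DEVICE_TABLE_SKETCH(x86cpu, pkg_temp_ids_sketch);

static void check_table_alias(void)
{
    assert(mod_x86cpu_device_table == pkg_temp_ids_sketch);
    assert(mod_x86cpu_device_table[0].model == 42);
}
```

Without the `x86cpu_device_id` define, the pasted type names a struct that is never defined, which is the incomplete-type situation clang reports above.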
-
Mark Charlebois authored
Add a compiler-clang.h file to add specific macros needed for compiling the kernel with clang. Initially the only override required is the macro for silencing the compiler for a purposefully uninitialized variable. Author: Mark Charlebois <charlebm@gmail.com> Signed-off-by:
Mark Charlebois <charlebm@gmail.com> Signed-off-by:
Behan Webster <behanw@converseincode.com>
-
Behan Webster authored
Fix uninitialized return code in default case in cmpxchg-local.h. This patch fixes the code to prevent an uninitialized return value that is detected when compiling with clang. The bug produces numerous warnings when compiling the Linux kernel with clang. Signed-off-by:
Behan Webster <behanw@converseincode.com> Signed-off-by:
Mark Charlebois <charlebm@gmail.com> Acked-by:
David Howells <dhowells@redhat.com> Acked-by:
Arnd Bergmann <arnd@arndb.de>
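The class of bug described above is a switch whose default branch leaves the return variable unset. The sketch below is a non-atomic illustration written for this page, not the contents of cmpxchg-local.h: the function name is hypothetical, and returning 0 for an unsupported size is an assumption made here, not necessarily what the patch does.

```c
#include <assert.h>
#include <stdint.h>

/* Compare-and-exchange on a 1-, 2- or 4-byte object.
 * Returns the value found at ptr before the operation.
 * Not atomic: it only illustrates the control flow. */
static unsigned long cmpxchg_local_sketch(volatile void *ptr,
                                          unsigned long old,
                                          unsigned long new_val,
                                          int size)
{
    unsigned long prev;

    switch (size) {
    case 1:
        prev = *(volatile uint8_t *)ptr;
        if (prev == old)
            *(volatile uint8_t *)ptr = (uint8_t)new_val;
        break;
    case 2:
        prev = *(volatile uint16_t *)ptr;
        if (prev == old)
            *(volatile uint16_t *)ptr = (uint16_t)new_val;
        break;
    case 4:
        prev = *(volatile uint32_t *)ptr;
        if (prev == old)
            *(volatile uint32_t *)ptr = (uint32_t)new_val;
        break;
    default:
        /* Without this assignment, prev would be returned
         * uninitialized for an unsupported size. The value 0 is
         * an assumption of this sketch. */
        prev = 0;
        break;
    }
    return prev;
}

static void check_cmpxchg_sketch(void)
{
    uint32_t v = 5;

    assert(cmpxchg_local_sketch(&v, 5, 9, 4) == 5 && v == 9);
    assert(cmpxchg_local_sketch(&v, 9, 1, 3) == 0 && v == 9);
}
```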
-
Mathieu Desnoyers authored
Fix the following sparse warnings: CHECK kernel/tracepoint.c kernel/tracepoint.c:184:18: warning: incorrect type in assignment (different address spaces) kernel/tracepoint.c:184:18: expected struct tracepoint_func *tp_funcs kernel/tracepoint.c:184:18: got struct tracepoint_func [noderef] <asn:4>*funcs kernel/tracepoint.c:216:18: warning: incorrect type in assignment (different address spaces) kernel/tracepoint.c:216:18: expected struct tracepoint_func *tp_funcs kernel/tracepoint.c:216:18: got struct tracepoint_func [noderef] <asn:4>*funcs kernel/tracepoint.c:392:24: error: return expression in void function CC kernel/tracepoint.o kernel/tracepoint.c: In function tracepoint_module_going: kernel/tracepoint.c:491:6: warning: symbol 'syscall_regfunc' was not declared. Should it be static? kernel/tracepoint.c:508:6: warning: symbol 'syscall_unregfunc' was not declared. Should it be static? Link: http://lkml.kernel.org/r/1397049883-28692-1-git-send-email-mathieu.desnoyers@efficios.com Signed-off-by:
Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-