- Oct 24, 2009
-
-
Eric Dumazet authored
IPIP tunnels use one rwlock to protect their hash tables. This locking scheme can be converted to RCU for free, since the netdevice must already wait for an RCU grace period at dismantle time. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
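An editorial sketch (not part of the commit) of the rwlock-to-RCU reader conversion described above; the structure and function names are illustrative, not the actual net/ipv4/ipip.c symbols:

    #include <linux/list.h>
    #include <linux/rcupdate.h>
    #include <linux/rculist.h>
    #include <linux/types.h>

    struct my_tunnel {
        struct list_head list;
        __be32 remote;
    };

    static LIST_HEAD(tunnel_list);

    /* Reader side: no rwlock any more, only an RCU read-side section. */
    static bool tunnel_exists(__be32 remote)
    {
        struct my_tunnel *t;
        bool found = false;

        rcu_read_lock();
        list_for_each_entry_rcu(t, &tunnel_list, list) {
            if (t->remote == remote) {
                found = true;
                break;
            }
        }
        rcu_read_unlock();
        return found;
    }

    /* Writer side: updates stay serialized elsewhere and publish with the
     * _rcu list helpers; freeing can rely on the grace period the netdevice
     * teardown already waits for, so no extra synchronize_rcu() is needed.
     */
    static void tunnel_link(struct my_tunnel *t)
    {
        list_add_rcu(&t->list, &tunnel_list);
    }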
-
- Oct 21, 2009
-
-
Krishna Kumar authored
dst_negative_advice() should check for a changed dst and reset sk_tx_queue_mapping accordingly. Pass the sock to the callers of dst_negative_advice(). (sk_reset_txq is defined just for use by dst_negative_advice; the only way I could find to get around this is to move dst_negative_advice() from dst.h to dst.c, include sock.h in dst.c, etc.) Signed-off-by:
Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Oct 20, 2009
-
-
John Dykstra authored
Use symbols instead of magic constants while checking the PMTU discovery setsockopt. Remove a redundant test in ip_rt_frag_needed() (done by the caller). Signed-off-by:
John Dykstra <john.dykstra1@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
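A hedged userspace illustration (not from the patch) of the setsockopt path being checked, using the symbolic IP_PMTUDISC_* values rather than magic numbers:

    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Request PMTU discovery on this socket: always set DF on outgoing packets. */
    static int enable_pmtu_discovery(int fd)
    {
        int val = IP_PMTUDISC_DO;

        return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &val, sizeof(val));
    }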
-
- Oct 19, 2009
-
-
Steffen Klassert authored
This patch converts ah4 to the new ahash interface. Signed-off-by:
Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
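A rough sketch of the ahash interface that ah4 is converted to here; the transform name, flow, and error handling are illustrative assumptions, not the actual net/ipv4/ah4.c code:

    #include <crypto/hash.h>
    #include <linux/err.h>
    #include <linux/errno.h>
    #include <linux/gfp.h>
    #include <linux/scatterlist.h>
    #include <linux/types.h>

    static int ahash_digest_example(const u8 *key, unsigned int keylen,
                                    struct scatterlist *sg, unsigned int nbytes,
                                    u8 *icv)
    {
        struct crypto_ahash *tfm;
        struct ahash_request *req;
        int err;

        tfm = crypto_alloc_ahash("hmac(sha1)", 0, 0);
        if (IS_ERR(tfm))
            return PTR_ERR(tfm);

        err = crypto_ahash_setkey(tfm, key, keylen);
        if (err)
            goto out_free_tfm;

        req = ahash_request_alloc(tfm, GFP_KERNEL);
        if (!req) {
            err = -ENOMEM;
            goto out_free_tfm;
        }

        /* Synchronous completion only, for brevity; real users register a
         * callback so the hash can also run asynchronously.
         */
        ahash_request_set_callback(req, 0, NULL, NULL);
        ahash_request_set_crypt(req, sg, icv, nbytes);
        err = crypto_ahash_digest(req);

        ahash_request_free(req);
    out_free_tfm:
        crypto_free_ahash(tfm);
        return err;
    }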
-
Eric Dumazet authored
- skb_kill_datagram() can increment sk->sk_drops itself instead of leaving it to callers. - For UDP on IPv4 & IPv6, frames dropped because of a bad checksum or policy checks now increment sk_drops. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
In order to have better cache layouts of struct sock (separate zones for rx/tx paths), we need this preliminary patch. The goal is to transfer fields used at lookup time into the first read-mostly cache line (inside struct sock_common) and to move sk_refcnt to a separate cache line (only written by the rx path). This patch adds an inet_ prefix to the daddr, rcv_saddr, dport, num, saddr, sport and id fields. This allows a future patch to define these fields as macros, like sk_refcnt, without name clashes. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Oct 15, 2009
-
-
Eric Dumazet authored
sock_queue_rcv_skb() can update sk_drops itself, removing the need for callers to take care of it. This is more consistent, since sock_queue_rcv_skb() also reads sk_drops when queueing an skb. This adds sk_drops management to many protocols that did not care about it yet. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Oct 13, 2009
-
-
Eric Dumazet authored
Storing the mask (size - 1) instead of the size allows the fast path to be a bit faster. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
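A tiny editorial illustration of the point: with the mask stored, bucket selection in the fast path is a single AND with no subtraction. The structure and field names are made up for the example:

    #include <linux/list.h>
    #include <linux/types.h>

    struct my_hash {
        unsigned int mask;              /* table size - 1, size a power of two */
        struct hlist_head *buckets;
    };

    static inline struct hlist_head *my_hash_bucket(struct my_hash *h, u32 hash)
    {
        return &h->buckets[hash & h->mask];     /* was: hash & (h->size - 1) */
    }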
-
Eric Dumazet authored
udp_poll() can in some circumstances drop frames with incorrect checksums. The problem is that we now have to lock the socket while dropping frames, or risk sk_forward_alloc corruption. This bug has been present since commit 95766fff ([UDP]: Add memory accounting). While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Willy Tarreau authored
I was trying to use TCP_DEFER_ACCEPT and noticed that if the client does not talk, the connection is never accepted and remains in SYN_RECV state until the retransmits expire, when it finally is deleted. This is bad when a firewall such as netfilter sits between the client and the server, because the firewall sees the connection in ESTABLISHED state while the server will finally silently drop it without sending an RST.

This behaviour contradicts the man page, which says it should wait only for some time:

    TCP_DEFER_ACCEPT (since Linux 2.4)
        Allows a listener to be awakened only when data arrives on the
        socket. Takes an integer value (seconds), this can bound the
        maximum number of attempts TCP will make to complete the
        connection. This option should not be used in code intended
        to be portable.

Also, looking at ipv4/tcp.c, a retransmit counter is correctly computed:

    case TCP_DEFER_ACCEPT:
        icsk->icsk_accept_queue.rskq_defer_accept = 0;
        if (val > 0) {
            /* Translate value in seconds to number of
             * retransmits */
            while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                   val > ((TCP_TIMEOUT_INIT / HZ) <<
                          icsk->icsk_accept_queue.rskq_defer_accept))
                icsk->icsk_accept_queue.rskq_defer_accept++;
            icsk->icsk_accept_queue.rskq_defer_accept++;
        }
        break;

==> rskq_defer_accept is used as a counter of retransmits. But in tcp_minisocks.c, this counter is only checked; in fact, I have found no location which updates it. So I think what was intended was to decrease it in tcp_minisocks whenever it is checked, which the trivial patch below does. Signed-off-by:
Willy Tarreau <w@1wt.eu> Signed-off-by:
David S. Miller <davem@davemloft.net>
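For context, a hedged userspace example (not from the patch) of the option being discussed; the 10-second value is arbitrary and is translated by the kernel into a retransmit count as shown in the quoted code:

    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/socket.h>

    /* Wake accept() only when data arrives, for roughly this many seconds. */
    static int defer_accept(int listen_fd)
    {
        int seconds = 10;

        return setsockopt(listen_fd, IPPROTO_TCP, TCP_DEFER_ACCEPT,
                          &seconds, sizeof(seconds));
    }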
-
- Oct 12, 2009
-
-
Neil Horman authored
Create a new socket level option to report the number of queue overflows. Recently I augmented the AF_PACKET protocol to report the number of frames lost on the socket receive queue between any two enqueued frames. This value was exported via a SOL_PACKET level cmsg. After I completed that work it was requested that this feature be generalized so that any datagram oriented socket could make use of this option. As such I've created this patch. It creates a new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue overflowed between any two given frames. It also augments the AF_PACKET protocol to take advantage of this new feature (as it previously did not touch sk->sk_drops, which this patch uses to record the overflow count). Tested successfully by me.

Notes:
1) Unlike my previous patch, this patch simply records the sk_drops value, which is not a number of drops between packets, but rather a total number of drops. Deltas must be computed in user space.
2) While this patch currently works with datagram oriented protocols, it will also be accepted by non-datagram oriented protocols. I'm not sure if that's agreeable to everyone, but my argument in favor of doing so is that, for those protocols to which this option is not applicable, sk_drops will always be zero, and reporting no drops on a receive queue that isn't used for those non-participating protocols seems reasonable to me. This also saves us having to code in a per-protocol opt-in mechanism.
3) This applies cleanly to net-next assuming that commit 97775007 (my af packet cmsg patch) is reverted. Signed-off-by:
Neil Horman <nhorman@tuxdriver.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
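A hedged userspace sketch (not part of the patch) of how the new option is consumed: enable SO_RXQ_OVFL, then read the cumulative drop counter from the SOL_SOCKET cmsg attached to each datagram; as noted above, deltas must be computed in user space. The fallback option value is an assumption for older headers:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    #ifndef SO_RXQ_OVFL
    #define SO_RXQ_OVFL 40          /* value on most architectures; assumption */
    #endif

    static void recv_with_drop_count(int fd)
    {
        char data[2048], cbuf[CMSG_SPACE(sizeof(uint32_t))];
        struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
        };
        struct cmsghdr *cmsg;
        int one = 1;

        setsockopt(fd, SOL_SOCKET, SO_RXQ_OVFL, &one, sizeof(one));

        if (recvmsg(fd, &msg, 0) < 0)
            return;

        for (cmsg = CMSG_FIRSTHDR(&msg); cmsg; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
            if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SO_RXQ_OVFL) {
                uint32_t drops;

                memcpy(&drops, CMSG_DATA(cmsg), sizeof(drops));
                printf("total receive queue drops so far: %u\n", drops);
            }
        }
    }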
-
- Oct 08, 2009
-
-
Eric Dumazet authored
UDP_HTABLE_SIZE was initially defined as 128, which is a bit small for several setups. 4000 active UDP sockets -> 32 sockets per chain on average. An incoming frame has to look up all sockets to find the best match, so long chains hurt latency. Instead of a fixed size hash table that can't be perfect for every need, let the UDP stack choose its table size at boot time like the TCP and route hashes, using the alloc_large_system_hash() helper. Add an optional boot parameter, uhash_entries=x, so that an admin can force a size between 256 and 65536 if needed, like thash_entries and rhash_entries. dmesg logs two new lines:

    [ 0.647039] UDP hash table entries: 512 (order: 0, 4096 bytes)
    [ 0.647099] UDP Lite hash table entries: 512 (order: 0, 4096 bytes)

The maximal size on 64-bit arches would be 65536 slots, i.e. 1 MB for non-debugging spinlocks. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Oct 07, 2009
-
-
Hagen Paul Pfeifer authored
There is no reason that cipso_v4_delopt() is not defined as a static function. Signed-off-by:
Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Atis Elsts authored
Add support for route lookup using sk_mark on IPv4 listening sockets. Signed-off-by:
Atis Elsts <atis@mikrotik.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
This fixes a bug with arp_notify. If arp_notify is enabled, the kernel will crash if the address is changed and no IP address is assigned. http://bugzilla.kernel.org/show_bug.cgi?id=14330 Reported-by:
Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by:
Stephen Hemminger <shemminger@vyatta.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
All usages of structure net_proto_ops should be declared const. Signed-off-by:
Stephen Hemminger <shemminger@vyatta.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ilia K authored
When a routing daemon wants to enable forwarding of multicast traffic it performs something like:

    struct vifctl vc = {
        .vifc_vifi = 1,
        .vifc_flags = 0,
        .vifc_threshold = 1,
        .vifc_rate_limit = 0,
        .vifc_lcl_addr = ip, /* <--- ip address of physical interface, e.g. eth0 */
        .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
    };
    setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));

This leads (in the kernel) to a vif_add() call, which searches for the (physical) device using the assigned IP address:

    dev = ip_dev_find(net, vifc->vifc_lcl_addr.s_addr);

The current API (struct vifctl) does not allow an interface to be specified any way other than by its IP, and if more than one interface has the specified IP, only the first one will be found. The attached patch (against 2.6.30.4) allows an interface to be specified by its index instead of its IP address:

    struct vifctl vc = {
        .vifc_vifi = 1,
        .vifc_flags = VIFF_USE_IFINDEX, /* NEW */
        .vifc_threshold = 1,
        .vifc_rate_limit = 0,
        .vifc_lcl_ifindex = if_nametoindex("eth0"), /* NEW */
        .vifc_rmt_addr.s_addr = htonl(INADDR_ANY),
    };
    setsockopt(fd, IPPROTO_IP, MRT_ADD_VIF, &vc, sizeof(vc));

Signed-off-by:
Ilia K. <mail4ilia@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Oct 05, 2009
-
-
Eric Dumazet authored
We currently dirty a cache line to update tunnel device stats (tx_packets/tx_bytes). We are better off using the txq->tx_bytes/tx_packets counters that are already present in the CPU cache, in the cache line shared with txq->_xmit_lock. This patch extends the IPTUNNEL_XMIT() macro to use the txq pointer provided by the caller. Also, &tunnel->dev->stats can be replaced by &dev->stats. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
The FIB algorithm for IPv4 is set at compile time, but the kernel goes through the overhead of function call indirection at runtime. Save some cycles by turning the indirect calls into direct calls to either the hash or the trie code. Signed-off-by:
Stephen Hemminger <shemminger@vyatta.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
We can make the ICMP message tx completion callback a little bit faster. Setting the SOCK_USE_WRITE_QUEUE sk flag tells sock_wfree() not to call sk_write_space() on a socket we know no thread can possibly be waiting on for write space (on per-cpu kernel-internal ICMP sockets only). This avoids the sock_def_write_space() call as well as the read_lock(&sk->sk_callback_lock)/read_unlock(&sk->sk_callback_lock) calls. We avoid three atomic ops. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Oct 02, 2009
-
-
Eric Dumazet authored
tcp_splice_read() doesn't take into account the socket's O_NONBLOCK flag. Before this patch:

    splice(socket, 0, pipe, 0, 128*1024, SPLICE_F_MOVE);

causes a random endless block (if the pipe is full), and

    splice(socket, 0, pipe, 0, 128*1024, SPLICE_F_MOVE | SPLICE_F_NONBLOCK);

will return 0 immediately if the TCP buffer is empty. A user application has no way to instruct splice() that the socket should be in blocking mode but the pipe in nonblocking mode. Many projects cannot use splice(tcp -> pipe) because of this flaw. http://git.samba.org/?p=samba.git;a=history;f=source3/lib/recvfile.c;h=ea0159642137390a0f7e57a123684e6e63e47581;hb=HEAD http://lkml.indiana.edu/hypermail/linux/kernel/0807.2/0687.html

Linus introduced SPLICE_F_NONBLOCK in commit 29e35094 (splice: add SPLICE_F_NONBLOCK flag). It doesn't make the splice itself necessarily nonblocking (because the actual file descriptors that are spliced from/to may block unless they have the O_NONBLOCK flag set), but it makes the splice pipe operations nonblocking. Linus's intention was clear: let SPLICE_F_NONBLOCK control the splice pipe mode only.

This patch instructs tcp_splice_read() to use the underlying file's O_NONBLOCK flag, as other socket operations do. Users will then call:

    splice(socket, 0, pipe, 0, 128*1024, SPLICE_F_MOVE | SPLICE_F_NONBLOCK);

to block on data coming from the socket (if the file is in blocking mode), and not block on the pipe output (to avoid deadlock). The first version of this patch was submitted by Octavian Purdila. Reported-by:
Volker Lendecke <vl@samba.org> Reported-by:
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
Octavian Purdila <opurdila@ixiacom.com> Acked-by:
Linus Torvalds <torvalds@linux-foundation.org> Acked-by:
Jens Axboe <jens.axboe@oracle.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Atis Elsts authored
This patch against v2.6.31 adds support for route lookup using sk_mark in some more places. The benefits from this patch are the following. First, the SO_MARK option now has an effect on UDP sockets too. Second, ip_queue_xmit() and inet_sk_rebuild_header() could previously fail to do the routing lookup correctly if TCP sockets with SO_MARK were used. Signed-off-by:
Atis Elsts <atis@mikrotik.com> Acked-by:
Eric Dumazet <eric.dumazet@gmail.com>
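A hedged userspace illustration of what this enables for UDP: setting SO_MARK (which requires CAP_NET_ADMIN) so route lookups honour the mark, typically paired with a policy-routing rule such as "ip rule add fwmark 7 lookup <table>". The mark value 7 and the fallback define are assumptions for the example:

    #include <sys/socket.h>

    #ifndef SO_MARK
    #define SO_MARK 36              /* value on most architectures; assumption */
    #endif

    static int mark_udp_socket(int udp_fd)
    {
        unsigned int mark = 7;

        return setsockopt(udp_fd, SOL_SOCKET, SO_MARK, &mark, sizeof(mark));
    }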
-
Ori Finkelman authored
Acknowledge TCP window scale support by inserting the proper option in SYN/ACK and SYN headers even if our window scale is zero. This fixes the following observed behavior:

1. Client sends a SYN with the TCP window scaling option and a non-zero window scale value to a Linux box.
2. The Linux box notes the large receive window from the client.
3. Linux decides on a zero value of window scale for its part.
4. Due to a compare against the requested window scale size option, Linux does not send the window scale TCP option header on the SYN/ACK at all.

With the following result: the client box thinks TCP window scaling is not supported, since the SYN/ACK had no TCP window scale option, while Linux thinks that TCP window scaling is supported (and the scale might be non-zero), since the SYN had a TCP window scale option; so the client and server have mismatched ideas about window sizes.

It probably also fixes the following bug (not observed in practice):

1. A Linux box opens a TCP connection to some server.
2. Linux decides on a zero value of window scale.
3. Due to a compare against the computed window scale size option, Linux does not set the window scale TCP option header on the SYN.

The expected result is that the server OS does not use the window scale option, due to not receiving such an option in the SYN headers, leading to suboptimal performance. Signed-off-by:
Gilad Ben-Yossef <gilad@codefidence.com> Signed-off-by:
Ori Finkelman <ori@comsleep.com> Acked-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Andrew Morton authored
net/ipv4/tcp.c: In function 'do_tcp_setsockopt': net/ipv4/tcp.c:2050: warning: comparison of distinct pointer types lacks a cast Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Oct 01, 2009
-
-
David S. Miller authored
This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Sep 25, 2009
-
-
Eric Dumazet authored
It seems the recursion field of "struct ip_tunnel" is no longer needed. Recursion prevention is done at the upper level (in dev_queue_xmit()), since we use HARD_TX_LOCK protection for tunnels. This avoids a cache line ping-pong on "struct ip_tunnel": this structure should now be mostly read on the xmit and receive paths. Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Shan Wei authored
According to the man page of setsockopt, if optlen is not valid, the kernel should return -EINVAL. But with a simple test case like the following, errno is 0, which means setsockopt is successful:

    addr.s_addr = inet_addr("192.1.2.3");
    setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF, &addr, 1);
    printf("errno is %d\n", errno);

Xiaotian Feng (dfeng@redhat.com) caught the bug. We fix it by first checking the validity of optlen and then dealing with the logic like the other options do. Reported-by:
Xiaotian Feng <dfeng@redhat.com> Signed-off-by:
Shan Wei <shanwei@cn.fujitsu.com> Acked-by:
Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by:
David S. Miller <davem@davemloft.net>
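For contrast with the buggy call quoted above, a hedged example of an invocation that passes a correctly sized struct in_addr and now either succeeds or fails cleanly with EINVAL for a bad optlen:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>

    static int set_mcast_if(int fd, const char *ifaddr)
    {
        struct in_addr addr;

        addr.s_addr = inet_addr(ifaddr);

        /* Full-sized optlen; an undersized one (e.g. 1) now returns -EINVAL. */
        return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_IF, &addr, sizeof(addr));
    }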
-
- Sep 24, 2009
-
-
Alexey Dobriyan authored
It's unused. It isn't needed -- the read or write flag is already passed and sysctl shouldn't care about the rest. It _was_ used in two places in arch/frv for some reason. Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: James Morris <jmorris@namei.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 22, 2009
-
-
Jan Beulich authored
Sizing of memory allocations shouldn't depend on the number of physical pages found in a system, as that generally includes (perhaps a huge amount of) non-RAM pages. The amount of memory that is actually usable as storage should instead be used as a basis here. Some of the calculations (i.e. those not intending to use high memory) should likely even use (totalram_pages - totalhigh_pages). Signed-off-by:
Jan Beulich <jbeulich@novell.com> Acked-by:
Rusty Russell <rusty@rustcorp.com.au> Acked-by:
Ingo Molnar <mingo@elte.hu> Cc: Dave Airlie <airlied@linux.ie> Cc: Kyle McMartin <kyle@mcmartin.ca> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Pekka Enberg <penberg@cs.helsinki.fi> Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 16, 2009
-
-
Robert Varga authored
I recently came across a preemption imbalance detected by:

    <4>huh, entered ffffffff80644630 with preempt_count 00000102, exited with 00000101?
    <0>------------[ cut here ]------------
    <2>kernel BUG at /usr/src/linux/kernel/timer.c:664!
    <0>invalid opcode: 0000 [1] PREEMPT SMP

with ffffffff80644630 being inet_twdr_hangman(). This appeared after I enabled CONFIG_TCP_MD5SIG and played with it a bit, so I looked at what might have caused it. One thing that struck me as strange is tcp_twsk_destructor(), as it calls tcp_put_md5sig_pool() -- which entails a put_cpu(), causing the detected imbalance. Found on 2.6.23.9, but 2.6.31 is affected as well, as far as I can tell. Signed-off-by:
Robert Varga <nite@hq.alert.sk> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Sep 15, 2009
-
-
Moni Shoua authored
This patch fixes commit e36b9d16. The approach there is to call dev_close()/dev_open() whenever the device type is changed, in order to remap the device IP multicast addresses to HW multicast addresses. This approach suffers from 2 drawbacks:

* It assumes that the device is UP when calling dev_close(); otherwise dev_close() has no effect. It is worth mentioning that initscripts (Red Hat) and sysconfig (SUSE) don't act the same in this matter.
* dev_close() has other side effects, like deleting entries from the routing table, which might be unnecessary.

The fix here is to directly remap the IP multicast addresses to HW multicast addresses for a bonding device that changes its type, and nothing else. Reported-by:
Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by:
Moni Shoua <monis@voltaire.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Ilpo Järvinen authored
It was once upon a time that snd_ssthresh was a 16-bit quantity. ...That has not been true for a long period of time. I ran across some ancient compares which still seem to trust such legacy. Put all that magic into a single place; I hopefully found all of them. Compile tested, though linking allyesconfig is ridiculous nowadays, it seems. Signed-off-by:
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Alexey Dobriyan authored
Remove the long-removed "inet_protocol_base" declaration. Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Sep 09, 2009
-
-
Alexey Dobriyan authored
Signed-off-by:
Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Sep 03, 2009
-
-
Wu Fengguang authored
This fixes a lockdep warning which appeared when doing stress memory tests over NFS:

    inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.

    page reclaim => nfs_writepage => tcp_sendmsg => lock sk_lock

    mount_root => nfs_root_data => tcp_close => lock sk_lock =>
    tcp_send_fin => alloc_skb_fclone => page reclaim

David raised a concern that if the allocation fails in tcp_send_fin(), and it's GFP_ATOMIC, we are going to yield() (which sleeps) and loop endlessly waiting for the allocation to succeed. But the fact is, the original GFP_KERNEL also sleeps. GFP_ATOMIC+yield() looks weird, but it is no worse than the implicit sleep inside GFP_KERNEL. Both could loop endlessly under memory pressure. CC: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> CC: David S. Miller <davem@davemloft.net> CC: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by:
Wu Fengguang <fengguang.wu@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
Christoph Lameter pointed out that packet drops at the qdisc level were not accounted in SNMP counters. Only if the application sets IP_RECVERR are drops reported to the user (-ENOBUFS errors) and SNMP counters updated. IP_RECVERR is used to enable extended reliable error message passing, but these are not needed to update system-wide SNMP stats. This patch changes things a bit to allow SNMP counters to be updated regardless of IP_RECVERR being set or not on the socket. Example after a UDP tx flood:

    # netstat -s
    ...
    IP: 1487048 outgoing packets dropped
    ...
    Udp:
    ...
    SndbufErrors: 1487048

send() syscalls do, however, still return an OK status, so as not to break applications. Note: the send() manual page explicitly says for the -ENOBUFS error: "The output queue for a network interface was full. This generally indicates that the interface has stopped sending, but may be caused by transient congestion. (Normally, this does not occur in Linux. Packets are just silently dropped when a device queue overflows.)" This is not true for IP_RECVERR enabled sockets: a send() syscall that hits a qdisc drop returns an ENOBUFS error. Many thanks to Christoph, David, and last but not least, Alexey! Signed-off-by:
Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
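As a hedged illustration: the SNMP counters above are now bumped either way, but an application only sees the ENOBUFS from a qdisc-level drop on its send() calls if it opts in to extended error reporting:

    #include <netinet/in.h>
    #include <sys/socket.h>

    /* Opt in to extended error reporting; qdisc drops then surface as ENOBUFS. */
    static int enable_ip_recverr(int udp_fd)
    {
        int on = 1;

        return setsockopt(udp_fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
    }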
-
- Sep 02, 2009
-
-
Stephen Hemminger authored
The function block inet_connect_sock_af_ops contains no data; make it constant. Signed-off-by:
Stephen Hemminger <shemminger@vyatta.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
Signed-off-by:
Stephen Hemminger <shemminger@vyatta.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Stephen Hemminger authored
These tables are never modified at runtime. Move to read-only section. Signed-off-by:
Stephen Hemminger <shemminger@vyatta.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Sep 01, 2009
-
-
Damian Lukowski authored
RFC 1122 specifies two threshold values R1 and R2 for connection timeouts, which may represent a number of allowed retransmissions or a timeout value. Currently Linux uses sysctl_tcp_retries{1,2} to specify the thresholds in number of allowed retransmissions. For any desired threshold R2 (by means of time) one can specify tcp_retries2 (by means of number of retransmissions) such that TCP will not time out earlier than R2. This is the case because the RTO schedule follows a fixed pattern, namely exponential backoff. However, the RTO behaviour is not predictable any more if RTO backoffs can be reverted, as is the case in the draft "Make TCP more Robust to Long Connectivity Disruptions" (http://tools.ietf.org/html/draft-zimmermann-tcp-lcd ). In the worst case TCP would time out a connection after 3.2 seconds, if the initial RTO equaled MIN_RTO and each backoff had been reverted. This patch introduces a function retransmits_timed_out(N), which calculates the timeout of a TCP connection, assuming an initial RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions. Whenever timeout decisions are made by comparing the retransmission counter to some value N, this function can be used instead. The meaning of tcp_retries2 will be changed, as many more RTO retransmissions can occur than the value indicates. However, it yields a timeout which is similar to the one of an unpatched, exponentially backing off TCP in the same scenario. As no application could rely on an RTO greater than MIN_RTO, there should be no risk of a regression. Signed-off-by:
Damian Lukowski <damian@tvk.rwth-aachen.de> Acked-by:
Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by:
David S. Miller <davem@davemloft.net>
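A hedged userspace-style sketch of the calculation the text describes, not the kernel's actual retransmits_timed_out(): the hypothetical timeout reached after N retransmissions when the first RTO is MIN_RTO, each backoff doubles it, and the RTO is capped at MAX_RTO. The millisecond constants mirror the kernel's 200 ms / 120 s defaults and are assumptions of the example:

    #define EX_RTO_MIN_MS   200U
    #define EX_RTO_MAX_MS   120000U

    /* Total time spent before giving up after n exponentially backed-off
     * retransmissions, starting from EX_RTO_MIN_MS.
     */
    static unsigned int timeout_after_retransmits(unsigned int n)
    {
        unsigned int rto = EX_RTO_MIN_MS;
        unsigned int total = 0;
        unsigned int i;

        for (i = 0; i < n; i++) {
            total += rto;
            rto = (2 * rto > EX_RTO_MAX_MS) ? EX_RTO_MAX_MS : 2 * rto;
        }
        return total;
    }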
-