- Jul 12, 2012
-
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
This sets things up so that we can have the protocol error handlers call down into the ipv6 route code for redirects just as ipv4 already does. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
This is going to be used internally by the rt6 redirect code. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer needed, as the protocol handlers now all properly propagate the redirect back into the routing code. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
All of the redirect acceptance policy is now contained within. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Pass in the SKB rather than just the IP addresses, so that policy and other aspects can reside in ip_rt_redirect() rather then icmp_redirect(). Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
This introduce TSQ (TCP Small Queues) TSQ goal is to reduce number of TCP packets in xmit queues (qdisc & device queues), to reduce RTT and cwnd bias, part of the bufferbloat problem. sk->sk_wmem_alloc not allowed to grow above a given limit, allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a given time. TSO packets are sized/capped to half the limit, so that we have two TSO packets in flight, allowing better bandwidth use. As a side effect, setting the limit to 40000 automatically reduces the standard gso max limit (65536) to 40000/2 : It can help to reduce latencies of high prio packets, having smaller TSO packets. This means we divert sock_wfree() to a tcp_wfree() handler, to queue/send following frames when skb_orphan() [2] is called for the already queued skbs. Results on my dev machines (tg3/ixgbe nics) are really impressive, using standard pfifo_fast, and with or without TSO/GSO. Without reduction of nominal bandwidth, we have reduction of buffering per bulk sender : < 1ms on Gbit (instead of 50ms with TSO) < 8ms on 100Mbit (instead of 132 ms) I no longer have 4 MBytes backlogged in qdisc by a single netperf session, and both side socket autotuning no longer use 4 Mbytes. As skb destructor cannot restart xmit itself ( as qdisc lock might be taken at this point ), we delegate the work to a tasklet. We use one tasklest per cpu for performance reasons. If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag. This flag is tested in a new protocol method called from release_sock(), to eventually send new segments. [1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable [2] skb_orphan() is usually called at TX completion time, but some drivers call it in their start_xmit() handler. These drivers should at least use BQL, or else a single TCP session can still fill the whole NIC TX ring, since TSQ will have no effect. Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Dave Taht <dave.taht@bufferbloat.net> Cc: Tom Herbert <therbert@google.com> Cc: Matt Mathis <mattmathis@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Nandita Dukkipati <nanditad@google.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 11, 2012
-
-
David S. Miller authored
Fixes build when ipv6 is disabled. Reported-by:
Fengguang Wu <wfg@linux.intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Eric Dumazet authored
On 64 bit arches having efficient unaligned accesses (eg x86_64) we can use long words to reduce number of instructions for free. Joe Perches suggested to change ipv6_masked_addr_cmp() to return a bool instead of 'int', to make sure ipv6_masked_addr_cmp() cannot be used in a sorting function. Signed-off-by:
Eric Dumazet <edumazet@google.com> Cc: Joe Perches <joe@perches.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer used. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Maintaining this in the inetpeer entries was not the right way to do this at all. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Nobody provides non-zero values any longer. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer needed. TCP writes metrics, but now in it's own special cache that does not dirty the route metrics. Therefore there is no longer any reason to pre-cow metrics in this way. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer used. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer used. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
With help from Lin Ming. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer used. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Maintain a local hash table of TCP dynamic metrics blobs. Computed TCP metrics are no longer maintained in the route metrics. The table uses RCU and an extremely simple hash so that it has low latency and low overhead. A simple hash is legitimate because we only make metrics blobs for fully established connections. Some tweaking of the default hash table sizes, metric timeouts, and the hash chain length limit certainly could use some tweaking. But the basic design seems sound. With help from Eric Dumazet and Joe Perches. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Johannes Berg authored
A lot of code has either the memset or an inefficient copy from a static array that contains the all-ones broadcast address. Introduce eth_broadcast_addr() to fill an address with all ones, making the code clearer and allowing us to get rid of some constant arrays. Signed-off-by:
Johannes Berg <johannes.berg@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
All paths assume, when CONFIG_IP_MULTIPLE_TABLES is enabled, that any successful call to fib_lookup() will initialize the fib_result->r value to something. We violated that expectation in the new fib_lookup() fast path. Reported-by:
Or Gerlitz <ogerlitz@mellanox.com> Tested-by:
Eric Dumazet <eric.dumazet@gmail.com> Tested-by:
Greg Rose <gregory.v.rose@intel.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 09, 2012
-
-
Pablo Neira Ayuso authored
Hans reports that he's still hitting: BUG: unable to handle kernel NULL pointer dereference at 000000000000027c IP: [<ffffffff813615db>] netlink_has_listeners+0xb/0x60 PGD 0 Oops: 0000 [#3] PREEMPT SMP CPU 0 It happens when adding a number of containers with do: nfct_query(h, NFCT_Q_CREATE, ct); and most likely one namespace shuts down. this problem was supposed to be fixed by: 70e9942f netfilter: nf_conntrack: make event callback registration per-netns Still, it was missing one rcu_access_pointer to check if the callback is set or not. Reported-by:
Hans Schillstrom <hans@schillstrom.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-
Christian Hohnstaedt authored
If registering of one of them fails, all already registered drivers of this module will be unregistered. Use the new register/unregister functions in all drivers registering more than one driver. amd.c, realtek.c: Simplify: directly return registration result. Tested with broadcom.c All others compile-tested. Signed-off-by:
Christian Hohnstaedt <chohnstaedt@innominate.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 08, 2012
-
-
Hadar Hen Zion authored
The device managed flow steering API has three promiscuous modes: 1. Uplink - captures all the packets that arrive to the port. 2. Allmulti - captures all multicast packets arriving to the port. 3. Function port - for future use, this mode is not implemented yet. Use these modes with the flow_attach and flow_detach firmware commands according to the promiscuous state of the netdevice. Signed-off-by:
Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by:
Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Hadar Hen Zion authored
The driver is modified to support three operation modes. If supported by firmware use the device managed flow steering API, that which we call device managed steering mode. Else, if the firmware supports the B0 steering mode use it, and finally, if none of the above, use the A0 steering mode. When the steering mode is device managed, the code is modified such that L2 based rules set by the mlx4_en driver for Ethernet unicast and multicast, and the IB stack multicast attach calls done through the mlx4_ib driver are all routed to use the device managed API. When attaching rule using device managed flow steering API, the firmware returns a 64 bit registration id, which is to be provided during detach. Currently the firmware is always programmed during HCA initialization to use standard L2 hashing. Future work should be done to allow configuring the flow-steering hash function with common, non proprietary means. Signed-off-by:
Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by:
Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Hadar Hen Zion authored
Add support for firmware commands to attach/detach a new device managed steering mode. Such network steering rules allow the user to provide an L2/L3/L4 flow specification to the firmware and have the device to steer traffic that matches that specification to the provided QP. Signed-off-by:
Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by:
Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
Hadar Hen Zion authored
Instead of checking the firmware supported steering mode in various places in the code, add a dedicated field in the mlx4 device capabilities structure which is written once during the initialization flow and read across the code. This also set the grounds for add new steering modes. Currently two modes are supported, and are named after the ConnectX HW versions A0 and B0. A0 steering uses mac_index, vlan_index and priority to steer traffic into pre-defined range of QPs. B0 steering uses Ethernet L2 hashing rules and is enabled only if the firmware supports both unicast and multicast B0 steering, The current steering modes are relevant for Ethernet traffic only, such that Infiniband steering remains untouched. Signed-off-by:
Hadar Hen Zion <hadarh@mellanox.co.il> Signed-off-by:
Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 06, 2012
-
-
David S. Miller authored
If the user hasn't actually installed any custom rules, or fiddled with the default ones, don't go through the whole FIB rules layer. It's just pure overhead. Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check the individual tables by hand, one by one. Also, move fib_num_tclassid_users into the ipv4 network namespace. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 05, 2012
-
-
Yuval Mintz authored
Most multi-queue networking driver consider the number of online cpus when configuring RSS queues. This patch adds a wrapper to the number of cpus, setting an upper limit on the number of cpus a driver should consider (by default) when allocating resources for his queues. Signed-off-by:
Yuval Mintz <yuvalmin@broadcom.com> Signed-off-by:
Eilon Greenstein <eilong@broadcom.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
No longer used. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
This makes for a simplified conversion away from dst_get_neighbour*(). All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*(). Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
This allows an easy conversion away from dst_get_neighbour*(). Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Causes the handler to use the daddr in the ipv4/ipv6 header when the route gateway is unspecified (local subnet). Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
When a dst_confirm() happens, mark the confirmation as pending in the dst. Then on the next packet out, when we have the neigh in-hand, do the update. This removes the dependency in dst_confirm() of dst's having an attached neigh. While we're here, remove the explicit 'dst' NULL check, all except 2 or 3 call sites ensure it's not NULL. So just fix those cases up. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Do not use the dst cached neigh, we'll be getting rid of that. Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Jul 04, 2012
-
-
Krishna Kumar authored
Allow setting of only supported flag bits in queue->flags. Signed-off-by:
Krishna Kumar <krkumar2@in.ibm.com> Signed-off-by:
Pablo Neira Ayuso <pablo@netfilter.org>
-