Commits · ec18d9a2691d69cd14b48f9b919fddcef28b7f5c · E-EXK4 - Operating System Group / projects / Linux

Jul 12, 2012

ipv6: Add redirect support to all protocol icmp error handlers. · ec18d9a2
David S. Miller authored 12 years ago
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
ec18d9a2

ipv6: Add ip6_redirect() and ip6_sk_redirect() helper functions. · 3a5ad2ee
David S. Miller authored 12 years ago
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
3a5ad2ee

ipv6: Move bulk of redirect handling into rt6_redirect(). · e8599ff4

David S. Miller authored 12 years ago


This sets things up so that we can have the protocol error handlers
call down into the ipv6 route code for redirects just as ipv4 already
does.

Signed-off-by: David S. Miller <davem@davemloft.net>

e8599ff4

ipv6: Export ndisc option parsing from ndisc.c · 30f2a5f3

David S. Miller authored 12 years ago


This is going to be used internally by the rt6 redirect code.

Signed-off-by: David S. Miller <davem@davemloft.net>

30f2a5f3

ipv4: Kill ip_rt_redirect(). · 1f42539d

David S. Miller authored 12 years ago


No longer needed, as the protocol handlers now all properly
propagate the redirect back into the routing code.

Signed-off-by: David S. Miller <davem@davemloft.net>

1f42539d

ipv4: Add ipv4_redirect() and ipv4_sk_redirect() helper functions. · b42597e2
David S. Miller authored 12 years ago
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
b42597e2

ipv4: Generalize ip_do_redirect() and hook into new dst_ops->redirect. · e47a185b
David S. Miller authored 12 years ago
```
All of the redirect acceptance policy is now contained within.

Signed-off-by: David S. Miller <davem@davemloft.net>
```
e47a185b

ipv4: Rearrange arguments to ip_rt_redirect() · 94206125

David S. Miller authored 12 years ago


Pass in the SKB rather than just the IP addresses, so that policy
and other aspects can reside in ip_rt_redirect() rather than
icmp_redirect().

Signed-off-by: David S. Miller <davem@davemloft.net>

94206125

tcp: TCP Small Queues · 46d3ceab

Eric Dumazet authored 12 years ago


This introduces TSQ (TCP Small Queues).

The goal of TSQ is to reduce the number of TCP packets in xmit queues (qdisc &
device queues), to reduce RTT and cwnd bias, part of the bufferbloat
problem.

sk->sk_wmem_alloc is not allowed to grow above a given limit,
allowing no more than ~128KB [1] per tcp socket in qdisc/dev layers at a
given time.

TSO packets are sized/capped to half the limit, so that we have two
TSO packets in flight, allowing better bandwidth use.

As a side effect, setting the limit to 40000 automatically reduces the
standard gso max limit (65536) to 40000/2. Having smaller TSO packets
can help reduce the latency of high-priority packets.

This means we divert sock_wfree() to a tcp_wfree() handler, to
queue/send following frames when skb_orphan() [2] is called for the
already queued skbs.

Results on my dev machines (tg3/ixgbe nics) are really impressive,
using standard pfifo_fast, and with or without TSO/GSO.

Without reduction of nominal bandwidth, we have reduction of buffering
per bulk sender :
< 1ms on Gbit (instead of 50ms with TSO)
< 8ms on 100Mbit (instead of 132 ms)

I no longer have 4 MBytes backlogged in qdisc by a single netperf
session, and socket autotuning on both sides no longer uses 4 MBytes.

As the skb destructor cannot restart xmit itself (as the qdisc lock might be
taken at this point), we delegate the work to a tasklet. We use one
tasklet per cpu for performance reasons.

If tasklet finds a socket owned by the user, it sets TSQ_OWNED flag.
This flag is tested in a new protocol method called from release_sock(),
to eventually send new segments.

[1] New /proc/sys/net/ipv4/tcp_limit_output_bytes tunable
[2] skb_orphan() is usually called at TX completion time,
  but some drivers call it in their start_xmit() handler.
  These drivers should at least use BQL, or else a single TCP
  session can still fill the whole NIC TX ring, since TSQ will
  have no effect.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dave Taht <dave.taht@bufferbloat.net>
Cc: Tom Herbert <therbert@google.com>
Cc: Matt Mathis <mattmathis@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Nandita Dukkipati <nanditad@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

46d3ceab
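
Note on 46d3ceab: the throttling rule described in the TSQ message can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel implementation: `tsq_sock`, `tsq_tso_cap()`, `tsq_try_xmit()` and `tsq_tx_complete()` are hypothetical names, `wmem_alloc` stands in for sk->sk_wmem_alloc, and `limit` stands in for the tcp_limit_output_bytes tunable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of one TCP socket under TSQ. */
struct tsq_sock {
    size_t wmem_alloc; /* bytes currently sitting in qdisc/device queues */
    size_t limit;      /* models tcp_limit_output_bytes */
};

/* TSO packets are capped to half the limit, and never above gso_max. */
static size_t tsq_tso_cap(const struct tsq_sock *sk, size_t gso_max)
{
    size_t half = sk->limit / 2;

    return half < gso_max ? half : gso_max;
}

/* Queue len bytes unless the socket already has "limit" bytes queued.
 * Returns false when the sender is throttled and must wait for a
 * TX completion before sending more. */
static bool tsq_try_xmit(struct tsq_sock *sk, size_t len)
{
    if (sk->wmem_alloc >= sk->limit)
        return false;
    sk->wmem_alloc += len;
    return true;
}

/* Models the skb destructor running at TX completion time. */
static void tsq_tx_complete(struct tsq_sock *sk, size_t len)
{
    sk->wmem_alloc -= len < sk->wmem_alloc ? len : sk->wmem_alloc;
}
```

With a limit of 40000 and the standard gso max of 65536, the model caps TSO packets at 20000 bytes and admits two of them before throttling, which matches the "two TSO packets in flight" description in the message.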

Jul 11, 2012

ipv6: Move ipv6 twsk accessors outside of CONFIG_IPV6 ifdefs. · 48ee3569

David S. Miller authored 12 years ago


Fixes build when ipv6 is disabled.

Reported-by: Fengguang Wu <wfg@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

48ee3569

ipv6: optimize ipv6 addresses compares · 1a203cb3

Eric Dumazet authored 12 years ago


On 64-bit arches with efficient unaligned accesses (e.g. x86_64), we can
use long words to reduce the number of instructions for free.

Joe Perches suggested changing ipv6_masked_addr_cmp() to return a bool
instead of 'int', to make sure ipv6_masked_addr_cmp() cannot be used
in a sorting function.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1a203cb3
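
Note on 1a203cb3: the idea of comparing a 128-bit address as two 64-bit words can be sketched as follows. This is a portable userspace illustration, not the kernel code: `in6_addr_model`, `addr_equal64()` and `masked_addr_differs64()` are hypothetical names, and `memcpy()` is used here in place of direct unaligned loads so that the sketch stays valid on any architecture.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct in6_addr: 16 raw address bytes. */
struct in6_addr_model {
    uint8_t b[16];
};

/* Equality in two 64-bit steps instead of four 32-bit steps. */
static bool addr_equal64(const struct in6_addr_model *a1,
                         const struct in6_addr_model *a2)
{
    uint64_t x[2], y[2];

    memcpy(x, a1->b, sizeof(x));
    memcpy(y, a2->b, sizeof(y));
    return ((x[0] ^ y[0]) | (x[1] ^ y[1])) == 0;
}

/* Returns true when a1 and a2 DIFFER in any bit selected by mask m.
 * A bool result carries no ordering, so it cannot be misused as a
 * sort comparator. */
static bool masked_addr_differs64(const struct in6_addr_model *a1,
                                  const struct in6_addr_model *m,
                                  const struct in6_addr_model *a2)
{
    uint64_t x[2], y[2], k[2];

    memcpy(x, a1->b, sizeof(x));
    memcpy(y, a2->b, sizeof(y));
    memcpy(k, m->b, sizeof(k));
    return (((x[0] ^ y[0]) & k[0]) | ((x[1] ^ y[1]) & k[1])) != 0;
}
```

Two addresses that share a /64 prefix but differ in the last byte compare as unequal in full, yet do not differ under a /64 mask.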

ipv4: Remove inetpeer from routes. · f185071d
David S. Miller authored 12 years ago
```
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
```
f185071d

ipv4: Maintain redirect and PMTU info in struct rtable again. · 5943634f

David S. Miller authored 12 years ago


Maintaining this in the inetpeer entries was not the right way to do
this at all.

Signed-off-by: David S. Miller <davem@davemloft.net>

5943634f

rtnetlink: Remove ts/tsage args to rtnl_put_cacheinfo(). · 87a50699
David S. Miller authored 12 years ago
```
Nobody provides non-zero values any longer.

Signed-off-by: David S. Miller <davem@davemloft.net>
```
87a50699

inet: Kill FLOWI_FLAG_PRECOW_METRICS. · 3e12939a

David S. Miller authored 12 years ago


No longer needed.  TCP writes metrics, but now in its own special
cache that does not dirty the route metrics.  Therefore there is no
longer any reason to pre-cow metrics in this way.

Signed-off-by: David S. Miller <davem@davemloft.net>

3e12939a

inet: Remove ->get_peer() method. · 16d18399
David S. Miller authored 12 years ago
```
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
```
16d18399

tcp: Remove tw->tw_peer · b6242b9b
David S. Miller authored 12 years ago
```
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
```
b6242b9b

tcp: Move timestamps from inetpeer to metrics cache. · 81166dd6
David S. Miller authored 12 years ago
```
With help from Lin Ming.

Signed-off-by: David S. Miller <davem@davemloft.net>
```
81166dd6

net: Kill set_dst_metric_rtt(). · 94334d5e
David S. Miller authored 12 years ago
```
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
```
94334d5e

tcp: Maintain dynamic metrics in local cache. · 51c5d0c4

David S. Miller authored 12 years ago


Maintain a local hash table of TCP dynamic metrics blobs.

Computed TCP metrics are no longer maintained in the route metrics.

The table uses RCU and an extremely simple hash so that it has low
latency and low overhead.  A simple hash is legitimate because we only
make metrics blobs for fully established connections.

The default hash table sizes, metric timeouts, and the hash chain
length limit certainly could use some tweaking.  But the basic design
seems sound.

With help from Eric Dumazet and Joe Perches.

Signed-off-by: David S. Miller <davem@davemloft.net>

51c5d0c4
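
Note on 51c5d0c4: a very reduced model of a metrics table with a simple hash and a bounded chain length might look like the sketch below. Everything here is an assumption made for illustration: the names (`tm_table`, `tm_get()`, `tm_lookup()`), the table size, the chain limit, the hash function, and the "replace the oldest entry" policy. RCU and locking are left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

#define TM_BUCKETS   16 /* assumed table size, power of two */
#define TM_CHAIN_MAX 2  /* assumed per-bucket chain length limit */

struct tm_entry {
    uint32_t addr;       /* peer address used as the key */
    uint32_t rtt_us;     /* example of a cached dynamic metric */
    unsigned long stamp; /* last time the entry was touched */
    int used;
};

struct tm_table {
    struct tm_entry slot[TM_BUCKETS][TM_CHAIN_MAX];
};

/* Deliberately simple hash (an assumption, not the kernel's). */
static unsigned int tm_hash(uint32_t addr)
{
    return (addr ^ (addr >> 16)) & (TM_BUCKETS - 1);
}

static struct tm_entry *tm_lookup(struct tm_table *t, uint32_t addr)
{
    struct tm_entry *b = t->slot[tm_hash(addr)];

    for (int i = 0; i < TM_CHAIN_MAX; i++)
        if (b[i].used && b[i].addr == addr)
            return &b[i];
    return NULL;
}

/* Find or create the entry for addr. When the chain is full, the
 * entry with the oldest stamp is reused for the new peer. */
static struct tm_entry *tm_get(struct tm_table *t, uint32_t addr,
                               unsigned long now)
{
    struct tm_entry *e = tm_lookup(t, addr);
    struct tm_entry *b = t->slot[tm_hash(addr)];
    struct tm_entry *victim = &b[0];

    if (e) {
        e->stamp = now;
        return e;
    }
    for (int i = 0; i < TM_CHAIN_MAX; i++) {
        if (!b[i].used) {
            victim = &b[i];
            break;
        }
        if (b[i].stamp < victim->stamp)
            victim = &b[i];
    }
    victim->addr = addr;
    victim->rtt_us = 0;
    victim->stamp = now;
    victim->used = 1;
    return victim;
}
```

Bounding the chain keeps lookups cheap even with a weak hash, at the cost of occasionally discarding the metrics of the least recently used peer in a bucket.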

tcp: Abstract back handling peer aliveness test into helper function. · ab92bb2f
David S. Miller authored 12 years ago
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
ab92bb2f

tcp: Move dynamnic metrics handling into seperate file. · 4aabd8ef
David S. Miller authored 12 years ago
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
4aabd8ef

etherdevice: introduce eth_broadcast_addr · ad7eee98

Johannes Berg authored 12 years ago


A lot of code has either the memset or an inefficient copy
from a static array that contains the all-ones broadcast
address. Introduce eth_broadcast_addr() to fill an address
with all ones, making the code clearer and allowing us to
get rid of some constant arrays.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ad7eee98
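
Note on ad7eee98: the helper described in this message amounts to filling a six-byte Ethernet address with all ones. A userspace sketch, with `eth_broadcast_addr_model()` and `ETH_ALEN_MODEL` as illustrative names rather than the kernel's own definitions:

```c
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_MODEL 6 /* length of an Ethernet address in bytes */

/* Fill addr with the all-ones broadcast address ff:ff:ff:ff:ff:ff. */
static void eth_broadcast_addr_model(uint8_t *addr)
{
    memset(addr, 0xff, ETH_ALEN_MODEL);
}
```

Callers no longer need a static array holding the broadcast address just to copy from it.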

ipv4: Fix crashes in fib_rules_tclass(). · e044a651

David S. Miller authored 12 years ago


All paths assume, when CONFIG_IP_MULTIPLE_TABLES is enabled, that any
successful call to fib_lookup() will initialize the fib_result->r
value to something.

We violated that expectation in the new fib_lookup() fast path.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Tested-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Greg Rose <gregory.v.rose@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e044a651

Jul 09, 2012

netfilter: nf_ct_ecache: fix crash with multiple containers, one shutting down · 6bd0405b

Pablo Neira Ayuso authored 12 years ago


Hans reports that he's still hitting:

BUG: unable to handle kernel NULL pointer dereference at 000000000000027c
IP: [<ffffffff813615db>] netlink_has_listeners+0xb/0x60
PGD 0
Oops: 0000 [#3] PREEMPT SMP
CPU 0

It happens when adding a number of containers that do:

nfct_query(h, NFCT_Q_CREATE, ct);

and most likely one namespace shuts down.

This problem was supposed to be fixed by:
70e9942f netfilter: nf_conntrack: make event callback registration per-netns

Still, it was missing one rcu_access_pointer to check if the callback
is set or not.

Reported-by: Hans Schillstrom <hans@schillstrom.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

6bd0405b

phylib: Support registering a bunch of drivers · d5bf9071

Christian Hohnstaedt authored 12 years ago


If registering of one of them fails, all already registered drivers
of this module will be unregistered.

Use the new register/unregister functions in all drivers
registering more than one driver.

amd.c, realtek.c: Simplify: directly return registration result.

Tested with broadcom.c
All others compile-tested.

Signed-off-by: Christian Hohnstaedt <chohnstaedt@innominate.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5bf9071
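
Note on d5bf9071: the all-or-nothing registration described in this message can be sketched as a loop with rollback. This is a userspace illustration, not the phylib code: `drv`, `drv_register()`, `drv_unregister()` and `drivers_register()` are hypothetical, and the `fail` field exists only to simulate a registration error.

```c
#include <stddef.h>

/* Hypothetical driver record. */
struct drv {
    const char *name;
    int fail;       /* test hook: non-zero makes registration fail */
    int registered;
};

static int drv_register(struct drv *d)
{
    if (d->fail)
        return -1;
    d->registered = 1;
    return 0;
}

static void drv_unregister(struct drv *d)
{
    d->registered = 0;
}

/* Register n drivers. If one fails, unregister every driver that was
 * already registered by this call and return the error. */
static int drivers_register(struct drv *drvs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int ret = drv_register(&drvs[i]);

        if (ret) {
            while (i-- > 0)
                drv_unregister(&drvs[i]);
            return ret;
        }
    }
    return 0;
}
```

A module that registers several drivers can then return the result of a single call from its init function, with no partial state left behind on failure.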

Jul 08, 2012

net/mlx4: Implement promiscuous mode with device managed flow-steering · 592e49dd

Hadar Hen Zion authored 12 years ago


The device managed flow steering API has three promiscuous modes:

1. Uplink - captures all the packets that arrive at the port.
2. Allmulti - captures all multicast packets arriving at the port.
3. Function port - for future use, this mode is not implemented yet.

Use these modes with the flow_attach and flow_detach firmware commands
according to the promiscuous state of the netdevice.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

592e49dd

{NET, IB}/mlx4: Add device managed flow steering firmware API · 0ff1fb65

Hadar Hen Zion authored 12 years ago


The driver is modified to support three operation modes.

If supported by firmware, use the device managed flow steering
API, which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode, use it, and finally,
if none of the above, use the A0 steering mode.

When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.

When attaching a rule using the device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.

Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should be done to allow
configuring the flow-steering hash function with common, non
proprietary means.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

0ff1fb65
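
Note on 0ff1fb65: the fallback order between the three steering modes can be sketched as a single selection function. The type and field names below (`steering_mode`, `fw_caps`, `choose_steering_mode()`) are hypothetical and do not come from the mlx4 driver. The requirement that B0 needs both unicast and multicast support is taken from the c96d97f4 message in this log.

```c
#include <stdbool.h>

enum steering_mode {
    STEER_A0,             /* mac_index/vlan_index/priority based */
    STEER_B0,             /* Ethernet L2 hashing rules */
    STEER_DEVICE_MANAGED  /* firmware flow steering API */
};

/* Hypothetical summary of what the firmware reports. */
struct fw_caps {
    bool device_managed;
    bool b0_unicast;
    bool b0_multicast;
};

/* Prefer device managed steering, then B0, and fall back to A0. */
static enum steering_mode choose_steering_mode(const struct fw_caps *caps)
{
    if (caps->device_managed)
        return STEER_DEVICE_MANAGED;
    if (caps->b0_unicast && caps->b0_multicast)
        return STEER_B0;
    return STEER_A0;
}
```

Computing the mode once and storing it keeps the rest of the code free of repeated capability checks.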

net/mlx4_core: Add firmware commands to support device managed flow steering · 8fcfb4db

Hadar Hen Zion authored 12 years ago

Add support for firmware commands to attach/detach a new device managed
steering mode. Such network steering rules allow the user to provide an
L2/L3/L4 flow specification to the firmware and have the device steer
traffic that matches that specification to the provided QP.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

8fcfb4db

net/mlx4: Set steering mode according to device capabilities · c96d97f4

Hadar Hen Zion authored 12 years ago


Instead of checking the firmware supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure which is written once during the initialization flow and read
across the code.

This also lays the groundwork for adding new steering modes. Currently two modes
are supported, and are named after the ConnectX HW versions A0 and B0.

A0 steering uses mac_index, vlan_index and priority to steer traffic
into pre-defined range of QPs.

B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering.

The current steering modes are relevant for Ethernet traffic only,
such that Infiniband steering remains untouched.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c96d97f4

Jul 06, 2012

ipv4: Avoid overhead when no custom FIB rules are installed. · f4530fa5

David S. Miller authored 12 years ago


If the user hasn't actually installed any custom rules, or fiddled
with the default ones, don't go through the whole FIB rules layer.

It's just pure overhead.

Instead do what we do with CONFIG_IP_MULTIPLE_TABLES disabled, check
the individual tables by hand, one by one.

Also, move fib_num_tclassid_users into the ipv4 network namespace.

Signed-off-by: David S. Miller <davem@davemloft.net>

f4530fa5

Jul 05, 2012

net-next: Add netif_get_num_default_rss_queues · 16917b87

Yuval Mintz authored 12 years ago


Most multi-queue networking drivers consider the number of online cpus when
configuring RSS queues.
This patch adds a wrapper around the number of cpus, setting an upper limit on the
number of cpus a driver should consider (by default) when allocating resources
for its queues.

Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16917b87
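
Note on 16917b87: the wrapper described in this message is essentially a clamp on the online cpu count. In the sketch below the cap of 8 and the names `DEFAULT_MAX_RSS_QUEUES_MODEL` and `default_rss_queues()` are assumptions for illustration, and the cpu count is passed in as a parameter so that the function can be exercised in userspace.

```c
/* Assumed default cap; the value is illustrative. */
#define DEFAULT_MAX_RSS_QUEUES_MODEL 8

/* Number of RSS queues a driver should allocate by default. */
static int default_rss_queues(int online_cpus)
{
    return online_cpus < DEFAULT_MAX_RSS_QUEUES_MODEL
               ? online_cpus
               : DEFAULT_MAX_RSS_QUEUES_MODEL;
}
```

On a 4-cpu machine a driver would get 4 queues, while on a 64-cpu machine it would stop at the cap instead of allocating resources for 64 queues.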

net: Kill dst->_neighbour, accessors, and final uses. · 36bdbcae
David S. Miller authored 12 years ago
```
No longer used.

Signed-off-by: David S. Miller <davem@davemloft.net>
```
36bdbcae

ipv6: Store route neighbour in rt6_info struct. · 97cac082

David S. Miller authored 12 years ago

This makes for a simplified conversion away from dst_get_neighbour*().

All code outside of ipv6 will use neigh lookups via dst_neigh_lookup*().

Signed-off-by: David S. Miller <davem@davemloft.net>

97cac082

net: Pass neighbours and dest address into NETEVENT_REDIRECT events. · 1d248b1c
David S. Miller authored 12 years ago
```
Signed-off-by: David S. Miller <davem@davemloft.net>
```
1d248b1c

decnet: Use neighbours privately in dn_route struct. · fccd7d5c

David S. Miller authored 12 years ago


This allows an easy conversion away from dst_get_neighbour*().

Signed-off-by: David S. Miller <davem@davemloft.net>

fccd7d5c

net: Add optional SKB arg to dst_ops->neigh_lookup(). · f894cbf8

David S. Miller authored 12 years ago


Causes the handler to use the daddr in the ipv4/ipv6 header when
the route gateway is unspecified (local subnet).

Signed-off-by: David S. Miller <davem@davemloft.net>

f894cbf8
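
Note on f894cbf8: the key selection described in this message can be sketched for the ipv4 case as below. The structures and the function `neigh_key()` are hypothetical models, addresses are plain 32-bit integers, and a gateway of 0 is assumed to mean "unspecified".

```c
#include <stddef.h>
#include <stdint.h>

struct pkt_model {
    uint32_t daddr;   /* destination address from the IP header */
};

struct route_model {
    uint32_t gateway; /* 0 means unspecified (local subnet) */
};

/* Pick the address to resolve at the neighbour layer:
 * the route gateway if set, else the packet's daddr when a packet
 * is supplied, else the caller-provided fallback address. */
static uint32_t neigh_key(const struct route_model *rt,
                          const struct pkt_model *skb,
                          uint32_t fallback_daddr)
{
    if (rt->gateway)
        return rt->gateway;
    if (skb)
        return skb->daddr;
    return fallback_daddr;
}
```

Because the packet argument is optional, callers without a packet in hand can pass NULL and still supply an explicit address.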

net: Do delayed neigh confirmation. · 5110effe

David S. Miller authored 12 years ago


When a dst_confirm() happens, mark the confirmation as pending in the
dst.  Then on the next packet out, when we have the neigh in-hand, do
the update.

This removes the dependency of dst_confirm() on the dst having an
attached neigh.

While we're here, remove the explicit 'dst' NULL check, all except 2
or 3 call sites ensure it's not NULL.  So just fix those cases up.

Signed-off-by: David S. Miller <davem@davemloft.net>

5110effe
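
Note on 5110effe: the deferred confirmation can be sketched with a single pending flag. This is a userspace model with hypothetical names (`dst_model`, `neigh_model`, `dst_confirm_model()`, `output_model()`), and it ignores concurrency.

```c
#include <stdbool.h>

struct neigh_model {
    unsigned long confirmed; /* time of the last confirmation */
};

struct dst_model {
    bool pending_confirm;    /* set now, applied on the next output */
};

/* Confirmation no longer needs a neigh: just remember it. */
static void dst_confirm_model(struct dst_model *dst)
{
    dst->pending_confirm = true;
}

/* On the next packet out the neigh is in hand, so apply the update. */
static void output_model(struct dst_model *dst, struct neigh_model *n,
                         unsigned long now)
{
    if (dst->pending_confirm) {
        dst->pending_confirm = false;
        n->confirmed = now;
    }
    /* ... the packet would be transmitted via n here ... */
}
```

The neighbour timestamp therefore reflects the time of the next transmit rather than the time of the confirmation itself, which is the trade-off of deferring the update.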

ipv4: Make neigh lookups directly in output packet path. · a263b309

David S. Miller authored 12 years ago


Do not use the dst cached neigh, we'll be getting rid of that.

Signed-off-by: David S. Miller <davem@davemloft.net>

a263b309

Jul 04, 2012

netfilter: nfnetlink_queue: do not allow to set unsupported flag bits · 46ba5a25

Krishna Kumar authored 12 years ago


Allow setting of only supported flag bits in queue->flags.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

46ba5a25
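
Note on 46ba5a25: rejecting unsupported flag bits usually comes down to a mask test before the flags are stored. The sketch below is an assumption-laden model, not the nfnetlink_queue code: the flag names and values, the flags/mask pair, and `queue_set_flags()` are all hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits and the set of bits this build supports. */
#define QF_FAIL_OPEN 0x1u
#define QF_CONNTRACK 0x2u
#define QF_SUPPORTED (QF_FAIL_OPEN | QF_CONNTRACK)

/* Update the bits selected by mask to the values given in flags.
 * Any bit outside QF_SUPPORTED is rejected and *qflags is untouched. */
static int queue_set_flags(uint32_t *qflags, uint32_t flags, uint32_t mask)
{
    if ((flags | mask) & ~QF_SUPPORTED)
        return -EINVAL;
    *qflags = (*qflags & ~mask) | (flags & mask);
    return 0;
}
```

Rejecting unknown bits up front means that a bit which gains a meaning in a later version cannot already be set silently by older configurations.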