  1. Mar 10, 2013
  2. Mar 09, 2013
  3. Jan 28, 2013
    • net: fix possible wrong checksum generation · cef401de
      Eric Dumazet authored
      
      Pravin Shelar mentioned that GSO could potentially generate a
      wrong TX checksum if an skb has fragments that are overwritten
      by the user between the checksum computation and transmit.
      
      He suggested linearizing skbs, but this extra copy can be
      avoided for normal tcp skbs cooked by tcp_sendmsg().
      
      This patch introduces a new SKB_GSO_SHARED_FRAG flag, set
      in skb_shinfo(skb)->gso_type if at least one frag can be
      modified by the user.
      
      Typical sources of such possible overwrites are {vm}splice(),
      sendfile(), and macvtap/tun/virtio_net drivers.
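
A minimal userspace sketch of the idea, assuming a simplified skb model: a flag in gso_type records that a fragment may still be written by the user, and only such skbs need a private copy before a software checksum. The `model_` names and bit values are hypothetical and chosen for illustration; they are not the kernel's definitions.

```c
#include <stdbool.h>

/* Illustrative bit values for this sketch only; the kernel's real
 * constants are defined in its own headers. */
enum model_gso_type {
    MODEL_GSO_TCPV4       = 1u << 0,
    MODEL_GSO_SHARED_FRAG = 1u << 1,
};

struct model_skb {
    unsigned int gso_type;  /* stands in for skb_shinfo(skb)->gso_type */
    unsigned int nr_frags;  /* number of paged fragments */
};

/* Mark an skb whose fragments come from a source such as splice() or
 * sendfile(), so user space may still modify the underlying pages. */
static void model_mark_shared_frag(struct model_skb *skb)
{
    skb->gso_type |= MODEL_GSO_SHARED_FRAG;
}

/* A private copy is needed only when the skb has fragments and at
 * least one of them may be modified by the user. */
static bool model_needs_private_copy(const struct model_skb *skb)
{
    return skb->nr_frags > 0 &&
           (skb->gso_type & MODEL_GSO_SHARED_FRAG) != 0;
}
```

In this model an skb built from private pages never pays for the copy, which matches the stated goal of avoiding it for ordinary tcp_sendmsg() traffic.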
      
      Tested:
      
      $ netperf -H 7.7.8.84
      MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to
      7.7.8.84 () port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3959.52
      
      $ netperf -H 7.7.8.84 -t TCP_SENDFILE
      TCP SENDFILE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 7.7.8.84 ()
      port 0 AF_INET
      Recv   Send    Send
      Socket Socket  Message  Elapsed
      Size   Size    Size     Time     Throughput
      bytes  bytes   bytes    secs.    10^6bits/sec
      
       87380  16384  16384    10.00    3216.80
      
      Performance of the TCP_SENDFILE test is reduced by the extra
      allocation and copy, and because it uses order-0 pages, while the
      TCP_STREAM test uses bigger pages.
      
      Reported-by: Pravin Shelar <pshelar@nicira.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. Nov 19, 2012
    • net: Allow userns root to control ipv4 · 52e804c6
      Eric W. Biederman authored
      
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or ns_capable(net->user_ns, CAP_NET_RAW) calls.
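
The difference between a global capability check and a namespace-relative one can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel's implementation: the `model_` types and function are hypothetical, and the rule that the creator of a user namespace holds all capabilities over it is reduced to a uid comparison.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CAP_NET_ADMIN 12
#define MODEL_CAP_NET_RAW   13

struct model_user_ns {
    const struct model_user_ns *parent; /* NULL for the initial namespace */
    unsigned int owner_uid;             /* uid that created this namespace */
};

struct model_cred {
    const struct model_user_ns *user_ns; /* namespace the task lives in */
    unsigned int euid;
    unsigned long long effective;        /* bitmask of effective capabilities */
};

/* Does the task hold `cap` with respect to the `target` user namespace?
 * Walk from the target towards the initial namespace. */
static bool model_ns_capable(const struct model_cred *cred,
                             const struct model_user_ns *target, int cap)
{
    const struct model_user_ns *ns;

    for (ns = target; ns != NULL; ns = ns->parent) {
        /* Same namespace: the task's own effective set decides. */
        if (ns == cred->user_ns)
            return (cred->effective >> cap) & 1ULL;
        /* The creator of a namespace, living in its parent, holds
         * every capability over it and its descendants. */
        if (ns->parent == cred->user_ns && ns->owner_uid == cred->euid)
            return true;
    }
    return false;
}
```

In this model a global check corresponds to passing the initial namespace as `target`, which is why an unprivileged creator of a namespace fails the global check but passes the one relative to net->user_ns.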
      
      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is a hardware NIC
      that has been explicitly moved from the initial network namespace.
      
      In general, policy and network stack state changes are allowed
      while resource control is left unchanged.
      
      Allow creating raw sockets.
      Allow the SIOCSARP ioctl to control the arp cache.
      Allow the SIOCSIFFLAGS ioctl to set network device flags.
      Allow the SIOCSIFADDR ioctl to set a netdevice ipv4 address.
      Allow the SIOCSIFBRDADDR ioctl to set a netdevice ipv4 broadcast address.
      Allow the SIOCSIFDSTADDR ioctl to set a netdevice ipv4 destination address.
      Allow the SIOCSIFNETMASK ioctl to set a netdevice ipv4 netmask.
      Allow the SIOCADDRT and SIOCDELRT ioctls to add and delete ipv4 routes.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting gre tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipip tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipsec virtual tunnel interfaces.
      
      Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
      MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
      sockets.
      
      Allow setting and receiving IPOPT_CIPSO, IPOPT_SEC, IPOPT_SID and
      arbitrary ip options.
      
      Allow setting the IP_IPSEC_POLICY/IP_XFRM_POLICY ipv4 socket options.
      Allow setting the IP_TRANSPARENT ipv4 socket option.
      Allow setting the TCP_REPAIR socket option.
      Allow setting the TCP_CONGESTION socket option.
      
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. Nov 15, 2012
  6. Nov 10, 2012
  7. Nov 09, 2012
  8. Sep 28, 2012
    • tunnel: drop packet if ECN present with not-ECT · eccc1bb8
      stephen hemminger authored
      
      Linux tunnels were written before RFC6040 and therefore never
      implemented the corner case of ECN getting set in the outer header
      and the inner header not being ready for it.
      
      Section 4.2.  Default Tunnel Egress Behaviour.
       o If the inner ECN field is Not-ECT, the decapsulator MUST NOT
            propagate any other ECN codepoint onwards.  This is because the
            inner Not-ECT marking is set by transports that rely on dropped
            packets as an indication of congestion and would not understand or
            respond to any other ECN codepoint [RFC4774].  Specifically:
      
            *  If the inner ECN field is Not-ECT and the outer ECN field is
               CE, the decapsulator MUST drop the packet.
      
            *  If the inner ECN field is Not-ECT and the outer ECN field is
               Not-ECT, ECT(0), or ECT(1), the decapsulator MUST forward the
               outgoing packet with the ECN field cleared to Not-ECT.
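
The two quoted rules can be sketched as a small decision function. This is an illustrative userspace model, not the kernel's helper: the `model_` names are hypothetical, and inner codepoints other than Not-ECT are deliberately reported as not covered because the quoted text does not specify them. The two-bit codepoint values are the standard ECN encoding.

```c
/* Standard two-bit ECN codepoints. */
enum model_ecn {
    MODEL_ECN_NOT_ECT = 0, /* 00 */
    MODEL_ECN_ECT_1   = 1, /* 01 */
    MODEL_ECN_ECT_0   = 2, /* 10 */
    MODEL_ECN_CE      = 3, /* 11 */
};

enum model_verdict {
    MODEL_DROP,        /* the decapsulator must drop the packet */
    MODEL_FORWARD,     /* forward with *out_ecn as the ECN field */
    MODEL_NOT_COVERED, /* inner is not Not-ECT: outside the quoted rules */
};

/* Apply the quoted egress rules for an inner Not-ECT packet.
 * *out_ecn is written only when the verdict is MODEL_FORWARD. */
static enum model_verdict model_ecn_decap(enum model_ecn outer,
                                          enum model_ecn inner,
                                          enum model_ecn *out_ecn)
{
    if (inner != MODEL_ECN_NOT_ECT)
        return MODEL_NOT_COVERED;

    /* Inner Not-ECT with outer CE: drop. */
    if (outer == MODEL_ECN_CE)
        return MODEL_DROP;

    /* Inner Not-ECT with outer Not-ECT, ECT(0) or ECT(1):
     * forward with the ECN field cleared to Not-ECT. */
    *out_ecn = MODEL_ECN_NOT_ECT;
    return MODEL_FORWARD;
}
```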
      
      This patch moves the ECN decap logic out of the individual tunnels
      into a common place.
      
      It also adds logging to allow detecting broken systems that
      set ECN bits incorrectly when tunneling (or an intermediate
      router might be changing the header).
      
      Overloads rx_frame_errors to keep track of ECN-related errors.
      
      Thanks to Chris Wright who caught this while reviewing the new VXLAN
      tunnel.
      
      This code was tested by injecting faulty logic into the GRE code at
      the other end so that it sends incorrectly encapsulated packets.
      
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • xfrm: remove extraneous rcu_read_lock · b0558ef2
      stephen hemminger authored
      
      The handlers for xfrm_tunnel are always invoked with the RCU read
      lock already held.
      
      Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. Jul 20, 2012
  10. Jul 17, 2012
    • net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}() · 6700c270
      David S. Miller authored
      
      This will be used so that we can compose a full flow key.
      
      Even though we have a route in this context, we need more.  In the
      future, routes will no longer be keyed by destination address, source
      address, etc.  One ipv4 route will cover entire subnets, etc.
      
      In this environment we need persistent storage for redirects and
      PMTU information.  This persistent storage will exist in the FIB
      tables, and that's why we'll need to be able to rebuild a full
      lookup flow key here.  With that flow key we will do a fib_lookup()
      and create/update the persistent entry.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
  11. Jul 12, 2012
  12. Jun 15, 2012
    • ipv4: Handle PMTU in all ICMP error handlers. · 36393395
      David S. Miller authored
      
      With ip_rt_frag_needed() removed, we have to explicitly update PMTU
      information in every ICMP error handler.
      
      Create two helper functions to facilitate this.
      
      1) ipv4_sk_update_pmtu()
      
         This updates the PMTU when we have a socket context to
         work with.
      
      2) ipv4_update_pmtu()
      
         Raw version, used when no socket context is available.  For this
         interface, we essentially just pass in explicit arguments for
         the flow identity information we would have extracted from the
         socket.
      
         Note that ipv4_sk_update_pmtu() is simply implemented
         in terms of ipv4_update_pmtu().
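
The relationship between the two helpers can be sketched as a thin wrapper: the socket variant extracts the flow identity from the socket and hands it to the raw variant as explicit arguments. This is only a model of that layering. The `model_` structs, field names, and signatures are hypothetical and do not reflect the kernel's actual prototypes.

```c
#include <stdint.h>

/* Flow identity; field names are illustrative, not the kernel's. */
struct model_flow {
    uint32_t daddr;
    uint32_t saddr;
    int      oif;
    uint8_t  protocol;
};

/* A socket carries the flow identity of its connection. */
struct model_sock {
    struct model_flow flow;
};

/* Stand-in for whatever state records the learned PMTU. */
struct model_pmtu_record {
    struct model_flow flow;
    uint32_t mtu;
};

/* Raw version: the caller passes the flow identity explicitly. */
static void model_update_pmtu(struct model_pmtu_record *rec, uint32_t mtu,
                              uint32_t daddr, uint32_t saddr,
                              int oif, uint8_t protocol)
{
    rec->flow.daddr = daddr;
    rec->flow.saddr = saddr;
    rec->flow.oif = oif;
    rec->flow.protocol = protocol;
    rec->mtu = mtu;
}

/* Socket version: a thin wrapper that extracts the flow identity
 * from the socket and delegates to the raw version. */
static void model_sk_update_pmtu(struct model_pmtu_record *rec,
                                 const struct model_sock *sk, uint32_t mtu)
{
    model_update_pmtu(rec, mtu, sk->flow.daddr, sk->flow.saddr,
                      sk->flow.oif, sk->flow.protocol);
}
```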
      
      Note that __ip_route_output_key() is used, rather than something like
      ip_route_output_flow() or ip_route_output_key().  This is because we
      absolutely do not want to end up with a route that does IPSEC
      encapsulation and the like.  Instead, we only want the route that
      would get us to the node described by the outermost IP header.
      
      Reported-by: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  13. Apr 15, 2012
  14. Apr 14, 2012
  15. Mar 12, 2012
    • net: Convert printks to pr_<level> · 058bd4d2
      Joe Perches authored
      
      Use a more current kernel messaging style.
      
      Convert a printk block to print_hex_dump.
      Coalesce formats, align arguments.
      Use %s, __func__ instead of embedding function names.
      
      Some messages that were prefixed with <foo>_close are
      now prefixed with <foo>_fini.  Some ah4 and esp messages
      are now not prefixed with "ip ".
      
      The intent of this patch is to later add something like
        #define pr_fmt(fmt) "IPv4: " fmt
      to standardize the output messages.
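
How such a pr_fmt definition standardizes messages can be shown with a userspace stand-in. In this sketch `model_pr_info` is a hypothetical macro that formats into a buffer with snprintf(); it is not the kernel's pr_info(), but it applies pr_fmt() to the format string in the same spirit, relying on adjacent string literal concatenation. The example message text is invented.

```c
#include <stdio.h>

/* Prefix applied to every message emitted in this file. */
#define pr_fmt(fmt) "IPv4: " fmt

/* Userspace stand-in for a pr_<level> macro: it writes into a buffer
 * instead of the kernel log. Requires at least one format argument. */
#define model_pr_info(buf, len, fmt, ...) \
    snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)

/* Uses "%s", __func__ rather than embedding the function name. */
static int model_tunnel_fini_msg(char *buf, size_t len)
{
    return model_pr_info(buf, len, "%s: tunnel removed\n", __func__);
}
```

Because the prefix is a string literal placed next to the format literal, the compiler joins them and no formatting work is added at run time.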
      
      Text size is trivially reduced. (x86-32 allyesconfig)
      
      $ size net/ipv4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
       887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old
      
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. Jan 25, 2012
  17. Jan 12, 2012
  18. Dec 13, 2011
    • ipip, sit: copy parms.name after register_netdevice · 72b36015
      Ted Feng authored
      
      Same fix as 731abb9c for ipip and sit tunnels.
      Commit 1c5cae81 removed an explicit call to dev_alloc_name in
      ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice
      will now create a valid name.  However, the tunnel keeps a copy of the
      name in the private parms structure. Fix this by copying the name back
      after register_netdevice has successfully returned.
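
The bug and its fix can be modelled in userspace. In the sketch below, `model_register_netdevice` is a hypothetical stand-in that expands a "%d" name template the way registration is described as doing, and the final strcpy() is the copy-back step. All `model_` names, and the caller-supplied unit number, are assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16

struct model_tun_dev {
    char name[MODEL_IFNAMSIZ];
};

struct model_tunnel {
    struct model_tun_dev *dev;
    struct { char name[MODEL_IFNAMSIZ]; } parms; /* private copy of the name */
};

/* Hypothetical stand-in for registration: expands a "%d" template in
 * dev->name using the given unit number. Returns 0 on success. */
static int model_register_netdevice(struct model_tun_dev *dev, int unit)
{
    char resolved[MODEL_IFNAMSIZ];
    const char *tmpl = strstr(dev->name, "%d");
    int n;

    if (!tmpl)
        return 0; /* name was already concrete */
    n = snprintf(resolved, sizeof(resolved), "%.*s%d%s",
                 (int)(tmpl - dev->name), dev->name, unit, tmpl + 2);
    if (n < 0 || (size_t)n >= sizeof(resolved))
        return -1; /* resolved name does not fit */
    memcpy(dev->name, resolved, (size_t)n + 1);
    return 0;
}

static int model_tunnel_create(struct model_tunnel *t,
                               struct model_tun_dev *dev,
                               const char *name_template, int unit)
{
    snprintf(t->parms.name, sizeof(t->parms.name), "%s", name_template);
    snprintf(dev->name, sizeof(dev->name), "%s", t->parms.name);

    if (model_register_netdevice(dev, unit) < 0)
        return -1;

    /* The fix: copy the resolved name back into the private parms,
     * otherwise parms.name would still hold the "%d" template. */
    strcpy(t->parms.name, dev->name);
    t->dev = dev;
    return 0;
}
```

Without the copy-back, a later lookup by parms.name would use the unexpanded template, which is the symptom visible as `tunl%d` and `sit%d` in the commit's example session.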
      
      This shows up if you do a simple tunnel add, followed by a tunnel show:
      
      $ sudo ip tunnel add mode ipip remote 10.2.20.211
      $ ip tunnel
      tunl0: ip/ip  remote any  local any  ttl inherit  nopmtudisc
      tunl%d: ip/ip  remote 10.2.20.211  local any  ttl inherit
      $ sudo ip tunnel add mode sit remote 10.2.20.212
      $ ip tunnel
      sit0: ipv6/ip  remote any  local any  ttl 64  nopmtudisc 6rd-prefix 2002::/16
      sit%d: ioctl 89f8 failed: No such device
      sit%d: ipv6/ip  remote 10.2.20.212  local any  ttl inherit
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Ted Feng <artisdom@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  19. Nov 08, 2011
  20. Aug 02, 2011
  21. May 05, 2011
  22. May 04, 2011
  23. Apr 22, 2011
  24. Mar 13, 2011
  25. Mar 10, 2011
    • net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules · 8909c9ad
      Vasiliy Kulikov authored
      Since a8f80e8f any process with
      CAP_NET_ADMIN may load any module from /lib/modules/.  This doesn't mean
      that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
      limited to /lib/modules/**.  However, the CAP_NET_ADMIN capability
      shouldn't allow anybody to load a module not related to networking.
      
      This patch restricts the ability to autoload modules to netdev modules
      with explicit aliases.  This fixes CVE-2011-1019.
      
      Arnd Bergmann suggested leaving untouched the old pre-v2.6.32 behavior
      of loading netdev modules by name (without any prefix) for processes
      with CAP_SYS_MODULE, to maintain compatibility with network scripts
      that autoload netdev modules by aliases like "eth0", "wlan0".
      
      Currently there are only three users of the feature in the upstream
      kernel: ipip, ip_gre and sit.
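
The resulting policy can be sketched as a two-step lookup: with CAP_NET_ADMIN only a prefixed netdev alias is requested, and the bare interface name is tried only with CAP_SYS_MODULE. This is a userspace model under stated assumptions. The `model_` functions, the "netdev-" alias format, and the toy module table are hypothetical, with the table chosen to mirror the session that follows in the commit message (sit0 reachable through an alias, xfs only by name).

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Callback standing in for a module request: returns 0 on success. */
typedef int (*model_request_fn)(const char *modname);

/* Toy module table: "sit0" has a netdev alias, "xfs" has only its
 * plain module name. */
static int model_fake_request(const char *modname)
{
    if (strcmp(modname, "netdev-sit0") == 0 || strcmp(modname, "xfs") == 0)
        return 0;
    return -1;
}

/* Try to autoload a module for interface `ifname`.
 * Returns 0 if some request succeeded, non-zero otherwise. */
static int model_dev_load(const char *ifname, bool cap_net_admin,
                          bool cap_sys_module, model_request_fn request)
{
    char alias[64];
    int failed = 1;

    /* CAP_NET_ADMIN may only trigger explicitly aliased netdev modules. */
    if (cap_net_admin) {
        snprintf(alias, sizeof(alias), "netdev-%s", ifname);
        failed = request(alias);
    }

    /* Loading by bare name is kept for CAP_SYS_MODULE holders only. */
    if (failed && cap_sys_module)
        failed = request(ifname);

    return failed;
}
```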
      
          root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
          root@albatros:~# grep Cap /proc/$$/status
          CapInh:	0000000000000000
          CapPrm:	fffffff800001000
          CapEff:	fffffff800001000
          CapBnd:	fffffff800001000
          root@albatros:~# modprobe xfs
          FATAL: Error inserting xfs
          (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
          root@albatros:~# lsmod | grep xfs
          root@albatros:~# ifconfig xfs
          xfs: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep xfs
          root@albatros:~# lsmod | grep sit
          root@albatros:~# ifconfig sit
          sit: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep sit
          root@albatros:~# ifconfig sit0
          sit0      Link encap:IPv6-in-IPv4
      	      NOARP  MTU:1480  Metric:1
      
          root@albatros:~# lsmod | grep sit
          sit                    10457  0
          tunnel4                 2957  1 sit
      
      For CAP_SYS_MODULE module loading is still relaxed:
      
          root@albatros:~# grep Cap /proc/$$/status
          CapInh:	0000000000000000
          CapPrm:	ffffffffffffffff
          CapEff:	ffffffffffffffff
          CapBnd:	ffffffffffffffff
          root@albatros:~# ifconfig xfs
          xfs: error fetching interface information: Device not found
          root@albatros:~# lsmod | grep xfs
          xfs                   745319  0
      
      Reference: https://lkml.org/lkml/2011/2/24/203
      
      
      
      Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
      Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
      Acked-by: David S. Miller <davem@davemloft.net>
      Acked-by: Kees Cook <kees.cook@canonical.com>
      Signed-off-by: James Morris <jmorris@namei.org>
  26. Mar 02, 2011
  27. Dec 01, 2010
  28. Nov 17, 2010
  29. Oct 27, 2010
  30. Oct 05, 2010
    • net: add a core netdev->rx_dropped counter · caf586e5
      Eric Dumazet authored
      
      In various situations, a device provides a packet to our stack and we
      drop it before it enters the protocol stack:
      - softnet backlog full (accounted in /proc/net/softnet_stat)
      - bad vlan tag (not accounted)
      - unknown/unregistered protocol (not accounted)
      
      We can handle a per-device counter of such dropped frames at core level,
      and automatically add it to the device-provided stats (rx_dropped), so
      that standard tools can be used (ifconfig, ip link, cat /proc/net/dev).
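
The accounting scheme can be sketched with C11 atomics: the core increments its own per-device counter when it drops a frame, and the stats getter folds that counter into the rx_dropped value reported by the driver. This is a userspace model only. The `model_` types and functions are hypothetical and are not the kernel's structures.

```c
#include <stdatomic.h>

struct model_stats {
    unsigned long rx_packets;
    unsigned long rx_dropped;
};

struct model_rx_dev {
    struct model_stats driver_stats; /* what the driver itself reports */
    atomic_ulong core_rx_dropped;    /* frames dropped by the core stack */
};

/* Called by the core when it drops a frame before protocol processing,
 * for example on a full backlog or an unknown protocol. */
static void model_core_drop(struct model_rx_dev *dev)
{
    atomic_fetch_add(&dev->core_rx_dropped, 1);
}

/* Return the stats seen by tools: the driver's numbers plus the
 * drops accounted at core level. The driver's copy is not modified. */
static struct model_stats model_get_stats(struct model_rx_dev *dev)
{
    struct model_stats s = dev->driver_stats;

    s.rx_dropped += atomic_load(&dev->core_rx_dropped);
    return s;
}
```

Keeping the core counter separate means drivers need no changes: the sum is computed only when the statistics are read.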
      
      This is a generalization of commit 8990f468 (net: rx_dropped
      accounting), thus reverting it.
      
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  31. Sep 29, 2010
    • ipip: enable lockless xmits · 153f0943
      Eric Dumazet authored
      
      IPIP tunnels can benefit from lockless xmits, using NETIF_F_LLTX.
      
      Bench on a 16-CPU machine (dual E5540 CPUs), 16 threads sending
      10000000 UDP frames via one ipip tunnel (size: 200 bytes per frame)
      
      Before patch:
      real	2m53.321s
      user	0m10.277s
      sys	46m0.597s
      
      After patch:
      real	0m32.063s
      user	0m9.237s
      sys	8m16.255s
      
      The last problem to solve is the contention on dst:
      
      16118.00 28.3% __ip_route_output_key         vmlinux
       6135.00 10.8% dst_release                   vmlinux
       3220.00  5.6% ip_finish_output              vmlinux
       2149.00  3.8% ip_route_output_flow          vmlinux
       1575.00  2.8% ip_append_data                vmlinux
       1481.00  2.6% ip_push_pending_frames        vmlinux
       1349.00  2.4% __xfrm_lookup                 vmlinux
       1216.00  2.1% csum_partial_copy_generic     vmlinux
       1208.00  2.1% udp_sendmsg                   vmlinux
      
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>