  1. Jan 31, 2014
  2. Jan 28, 2014
    • net: gre: use icmp_hdr() to get inner ip header · c0c0c50f
      Duan Jiong authored
      
      When dealing with ICMP messages, skb->data points to the IP header
      that triggered the sending of the ICMP message.
      
      gre_cisco_err() calls parse_gre_header(), which ends by calling
      iptunnel_pull_header() to pull the skb, so skb->data no longer
      points to the inner IP header.
      
      Unfortunately, ipgre_err() still needs the IP addresses in the
      inner IP header to look up the tunnel via ip_tunnel_lookup().
      
      So just use icmp_hdr() to get the inner IP header instead of skb->data.
      
      Signed-off-by: Duan Jiong <duanj.fnst@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
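
The packet layout this fix relies on can be sketched in plain userspace C. This is only an illustration, not the kernel change itself: the helper names `quoted_ip_header()` and `quoted_daddr()` are hypothetical, and the buffer is assumed to start at the ICMP header of an error message, so the quoted IPv4 header follows the fixed 8-byte ICMP header.

```c
#include <stddef.h>
#include <stdint.h>

#define ICMP_HDR_LEN  8u   /* fixed ICMP header: type, code, checksum, 4 more bytes */
#define IPV4_MIN_HLEN 20u  /* IPv4 header without options */

/*
 * Hypothetical helper: given a buffer that starts at the ICMP header of an
 * error message, return the quoted (inner) IPv4 header, or NULL if the
 * buffer is too short or the quoted packet is not IPv4.
 */
static const uint8_t *quoted_ip_header(const uint8_t *icmp, size_t len)
{
    const uint8_t *ip;

    if (len < ICMP_HDR_LEN + IPV4_MIN_HLEN)
        return NULL;
    ip = icmp + ICMP_HDR_LEN;   /* inner header sits right after the ICMP header */
    if ((ip[0] >> 4) != 4)      /* version nibble must be 4 */
        return NULL;
    return ip;
}

/* Read the destination address (header bytes 16..19) in host byte order. */
static uint32_t quoted_daddr(const uint8_t *ip)
{
    return ((uint32_t)ip[16] << 24) | ((uint32_t)ip[17] << 16) |
           ((uint32_t)ip[18] << 8)  |  (uint32_t)ip[19];
}
```

The point of the sketch is that the inner header is located relative to the ICMP header, so it stays reachable even after a data pointer has been advanced past other headers.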
    • net: Fix memory leak if TPROXY used with TCP early demux · a452ce34
      Holger Eitzenberger authored
      
      I see a memory leak when using a transparent HTTP proxy using TPROXY
      together with TCP early demux and Kernel v3.8.13.15 (Ubuntu stable):
      
      unreferenced object 0xffff88008cba4a40 (size 1696):
        comm "softirq", pid 0, jiffies 4294944115 (age 8907.520s)
        hex dump (first 32 bytes):
          0a e0 20 6a 40 04 1b 37 92 be 32 e2 e8 b4 00 00  .. j@..7..2.....
          02 00 07 01 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffff810b710a>] kmem_cache_alloc+0xad/0xb9
          [<ffffffff81270185>] sk_prot_alloc+0x29/0xc5
          [<ffffffff812702cf>] sk_clone_lock+0x14/0x283
          [<ffffffff812aaf3a>] inet_csk_clone_lock+0xf/0x7b
          [<ffffffff8129a893>] netlink_broadcast+0x14/0x16
          [<ffffffff812c1573>] tcp_create_openreq_child+0x1b/0x4c3
          [<ffffffff812c033e>] tcp_v4_syn_recv_sock+0x38/0x25d
          [<ffffffff812c13e4>] tcp_check_req+0x25c/0x3d0
          [<ffffffff812bf87a>] tcp_v4_do_rcv+0x287/0x40e
          [<ffffffff812a08a7>] ip_route_input_noref+0x843/0xa55
          [<ffffffff812bfeca>] tcp_v4_rcv+0x4c9/0x725
          [<ffffffff812a26f4>] ip_local_deliver_finish+0xe9/0x154
          [<ffffffff8127a927>] __netif_receive_skb+0x4b2/0x514
          [<ffffffff8127aa77>] process_backlog+0xee/0x1c5
          [<ffffffff8127c949>] net_rx_action+0xa7/0x200
          [<ffffffff81209d86>] add_interrupt_randomness+0x39/0x157
      
      But there are many more, resulting in the machine going OOM after some
      days.
      
      From looking at the TPROXY code, and with help from Florian, I see
      that the memory leak is introduced in tcp_v4_early_demux():
      
        void tcp_v4_early_demux(struct sk_buff *skb)
        {
          /* ... */
      
          iph = ip_hdr(skb);
          th = tcp_hdr(skb);
      
          if (th->doff < sizeof(struct tcphdr) / 4)
              return;
      
          sk = __inet_lookup_established(dev_net(skb->dev), &tcp_hashinfo,
                             iph->saddr, th->source,
                             iph->daddr, ntohs(th->dest),
                             skb->skb_iif);
          if (sk) {
              skb->sk = sk;
      
      where the socket is assigned unconditionally to skb->sk, also bumping
      the refcnt on it.  This is problematic, because in our case the skb
      already has a socket assigned in the TPROXY target.  This then results
      in the leak I see.
      
      The very same issue seems to exist with IPv6, but I haven't tested it.
      
      Reviewed-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
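
The leak mechanism described above can be modeled with a toy reference counter in userspace C. Everything here (`toy_sock`, `toy_skb`, the `guarded` flag) is hypothetical and only mimics the pattern: a second assignment to the socket pointer takes a new reference and silently drops the pointer to the first socket without releasing it. Skipping early demux when a socket is already attached is shown as one plausible guard; it is an assumption of this sketch, not a quote of the actual patch.

```c
#include <stddef.h>

struct toy_sock { int refcnt; };
struct toy_skb  { struct toy_sock *sk; };

static void toy_sock_hold(struct toy_sock *sk) { sk->refcnt++; }
static void toy_sock_put(struct toy_sock *sk)  { sk->refcnt--; }

/* Stand-in for the TPROXY target attaching its socket to the packet. */
static void toy_tproxy_assign(struct toy_skb *skb, struct toy_sock *sk)
{
    toy_sock_hold(sk);
    skb->sk = sk;
}

/* Stand-in for early demux; 'guarded' selects the fixed behaviour. */
static void toy_early_demux(struct toy_skb *skb, struct toy_sock *found,
                            int guarded)
{
    if (guarded && skb->sk != NULL)
        return;             /* a socket is already attached: leave it alone */
    toy_sock_hold(found);
    skb->sk = found;        /* overwrites any earlier pointer without a put */
}

/* Freeing the packet releases only the socket it currently points to. */
static void toy_free_skb(struct toy_skb *skb)
{
    if (skb->sk != NULL) {
        toy_sock_put(skb->sk);
        skb->sk = NULL;
    }
}

/* Return the number of references still held after the packet is freed. */
static int toy_leaked_refs(int guarded)
{
    struct toy_sock tproxy_sk = { 0 }, established_sk = { 0 };
    struct toy_skb skb = { NULL };

    toy_tproxy_assign(&skb, &tproxy_sk);
    toy_early_demux(&skb, &established_sk, guarded);
    toy_free_skb(&skb);
    return tproxy_sk.refcnt + established_sk.refcnt;
}
```

With `guarded` set to 0 the first socket keeps one reference that nothing will ever drop, which corresponds to the unreferenced object in the report; with `guarded` set to 1 every reference is released.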
  3. Jan 27, 2014
  4. Jan 25, 2014
  5. Jan 24, 2014
  6. Jan 23, 2014
  7. Jan 22, 2014
  8. Jan 20, 2014
    • ipv6: make IPV6_RECVPKTINFO work for ipv4 datagrams · 4b261c75
      Hannes Frederic Sowa authored
      
      We currently don't report IPV6_RECVPKTINFO in the cmsg ancillary data
      for IPv4 datagrams received on IPv6 sockets.
      
      This patch splits the ip6_datagram_recv_ctl into two functions, one
      which handles both protocol families, AF_INET and AF_INET6, while the
      ip6_datagram_recv_specific_ctl only handles IPv6 cmsg data.
      
      ip6_datagram_recv_*_ctl never reported back any errors, so we can make
      them return void. Also provide a helper for protocols which don't offer dual
      personality to further use ip6_datagram_recv_ctl, which is exported to
      modules.
      
      I needed to shuffle the code for ping around a bit to make it easier to
      implement dual personality for ping IPv6 sockets in the future.
      
      Reported-by: Gert Doering <gert@space.net>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. Jan 19, 2014
  10. Jan 18, 2014
  11. Jan 17, 2014
    • net/ipv4: don't use module_init in non-modular gre_offload · cf172283
      Paul Gortmaker authored
      
      Recent commit 438e38fa
      ("gre_offload: statically build GRE offloading support") added
      new module_init/module_exit calls to the gre_offload.c file.
      
      The file is obj-y and can't be anything other than built-in.
      Currently it can never be built modular, so using module_init
      as an alias for __initcall can be somewhat misleading.
      
      Fix this up now, so that we can relocate module_init from
      init.h into module.h in the future.  If we don't do this, we'd
      have to add module.h to obviously non-modular code, and that
      would be a worse thing.  We also make the inclusion explicit.
      
      Note that direct use of __initcall is discouraged, vs. one
      of the priority categorized subgroups.  As __initcall gets
      mapped onto device_initcall, our use of device_initcall
      directly in this change means that the runtime impact is
      zero -- it will remain at level 6 in initcall ordering.
      
      As for the module_exit, rather than replace it with __exitcall,
      we simply remove it, since it appears only UML does anything
      with those, and even for UML, there is no relevant cleanup
      to be done here.
      
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. Jan 15, 2014
  13. Jan 14, 2014
    • inet_diag: fix inet_diag_dump_icsk() to use correct state for timewait sockets · 70315d22
      Neal Cardwell authored
      
      Fix inet_diag_dump_icsk() to reflect the fact that both TCP_TIME_WAIT
      and TCP_FIN_WAIT2 connections are represented by inet_timewait_sock
      (not just TIME_WAIT), and for such sockets the tw_substate field holds
      the real state, which can be either TCP_TIME_WAIT or TCP_FIN_WAIT2.
      
      This brings the inet_diag state-matching code in line with the field
      it uses to populate idiag_state. This is also analogous to the info
      exported in /proc/net/tcp, where get_tcp4_sock() exports sk->sk_state
      and get_timewait4_sock() exports tw->tw_substate.
      
      Before fixing this, (a) neither "ss -nemoi" nor "ss -nemoi state
      fin-wait-2" would return a socket in TCP_FIN_WAIT2; and (b) "ss -nemoi
      state time-wait" would also return sockets in state TCP_FIN_WAIT2.
      
      This is an old bug that predates 05dbc7b5 ("tcp/dccp: remove twchain").
      
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
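
The state-matching rule described above can be sketched as a small userspace model. The struct and function names are hypothetical; only the two state values (5 for FIN_WAIT2, 6 for TIME_WAIT) follow the kernel's TCP state numbering. The sketch assumes, as the commit message says, that a timewait-style socket carries its real state in a substate field.

```c
#include <stdbool.h>

enum {
    TOY_TCP_FIN_WAIT2 = 5,
    TOY_TCP_TIME_WAIT = 6
};

/* Hypothetical, flattened view of a socket as the dump code sees it. */
struct toy_diag_sock {
    int sk_state;     /* TOY_TCP_TIME_WAIT for every timewait-style socket */
    int tw_substate;  /* real state: TOY_TCP_TIME_WAIT or TOY_TCP_FIN_WAIT2 */
};

/* Pick the state that should be both matched against and reported. */
static int toy_diag_state(const struct toy_diag_sock *sk)
{
    if (sk->sk_state == TOY_TCP_TIME_WAIT)
        return sk->tw_substate;
    return sk->sk_state;
}

/* Match against a bitmask of requested states, one bit per state value. */
static bool toy_diag_state_matches(const struct toy_diag_sock *sk,
                                   unsigned int state_mask)
{
    return (state_mask & (1u << toy_diag_state(sk))) != 0;
}
```

Under this model a FIN_WAIT2 connection held in a timewait-style socket matches a "fin-wait-2" filter and no longer matches a "time-wait" filter, which is the behaviour the fix is aiming for.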
  14. Jan 13, 2014
    • gre_offload: simplify GRE header length calculation in gre_gso_segment() · b884b1a4
      Neal Cardwell authored
      
      Simplify the GRE header length calculation in gre_gso_segment().
      Switch to an approach that is simpler, faster, and more general. The
      new approach will continue to be correct even if we add support for
      the optional variable-length routing info that may be present in a GRE
      header.
      
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: H.K. Jerry Chu <hkchu@google.com>
      Cc: Pravin B Shelar <pshelar@nicira.com>
      Acked-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
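
The commit message does not show the new calculation, so the following is only a guess at the general idea, written as userspace C with hypothetical names. It contrasts a flag-driven length computation, which has to know every optional field, with a length derived from where the encapsulated packet starts, which keeps working if more optional fields appear. The flag values follow the bit positions of the GRE header's first 16 bits.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_GRE_CSUM 0x8000u  /* checksum field present */
#define TOY_GRE_KEY  0x2000u  /* key field present */
#define TOY_GRE_SEQ  0x1000u  /* sequence number field present */

/* Flag-driven approach: 4 base bytes plus 4 per optional field it knows about. */
static size_t toy_gre_hlen_from_flags(uint16_t flags)
{
    size_t len = 4;

    if (flags & TOY_GRE_CSUM)
        len += 4;
    if (flags & TOY_GRE_KEY)
        len += 4;
    if (flags & TOY_GRE_SEQ)
        len += 4;
    return len;
}

/*
 * Offset-driven approach: the header is whatever lies between the start of
 * the GRE header and the start of the encapsulated packet. Both offsets are
 * assumed to be already known to the caller.
 */
static size_t toy_gre_hlen_from_offsets(size_t gre_offset, size_t inner_offset)
{
    return inner_offset - gre_offset;
}
```

The second form needs no per-flag logic, which is why it would stay correct for variable-length routing information that the first form does not model.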
    • gre_offload: fix sparse non static symbol warning · d10dbad2
      Wei Yongjun authored
      
      Fixes the following sparse warning:
      
      net/ipv4/gre_offload.c:253:5: warning:
       symbol 'gre_gro_complete' was not declared. Should it be static?
      
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: introduce hardened ip_no_pmtu_disc mode · 8ed1dc44
      Hannes Frederic Sowa authored
      
      This new ip_no_pmtu_disc mode only allows fragmentation-needed errors
      to be honored by protocols which do more stringent validation on the
      ICMP packet's payload. This knob is useful for people who e.g. want to
      run an unmodified DNS server in a namespace where they need to use pmtu
      for TCP connections (as they are used for zone transfers or fallback
      for requests) but don't want to use possibly spoofed UDP pmtu information.
      
      Currently the whitelisted protocols are TCP, SCTP and DCCP as they check
      if the returned packet is in the window or if the association is valid.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: John Heffner <johnwheffner@gmail.com>
      Suggested-by: Florian Weimer <fweimer@redhat.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
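
The whitelist described above can be sketched as a simple predicate. This is a userspace illustration with hypothetical names, not the kernel's implementation; the protocol numbers are the IANA-assigned values, and the function models only the hardened mode, not the other ip_no_pmtu_disc settings.

```c
#include <stdbool.h>

/* IANA protocol numbers, redefined locally to keep the sketch self-contained. */
enum {
    TOY_IPPROTO_TCP  = 6,
    TOY_IPPROTO_UDP  = 17,
    TOY_IPPROTO_DCCP = 33,
    TOY_IPPROTO_SCTP = 132
};

/*
 * In the hardened mode, honor a fragmentation-needed error only for
 * protocols that validate the quoted packet (in-window check or valid
 * association); everything else ignores the reported MTU.
 */
static bool toy_hardened_accepts_pmtu(int protocol)
{
    switch (protocol) {
    case TOY_IPPROTO_TCP:
    case TOY_IPPROTO_DCCP:
    case TOY_IPPROTO_SCTP:
        return true;
    default:
        return false;
    }
}
```

For the DNS server scenario in the message, this means a spoofed error aimed at UDP traffic is ignored while TCP zone transfers can still adapt to the path MTU.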
    • ipv4: introduce ip_dst_mtu_maybe_forward and protect forwarding path against pmtu spoofing · f87c10a8
      Hannes Frederic Sowa authored
      
      While forwarding we should not use the protocol path mtu to calculate
      the mtu for a forwarded packet but instead use the interface mtu.
      
      We mark forwarded skbs in ip_forward with IPSKB_FORWARDED, which was
      introduced for multicast forwarding. But as it does not conflict with
      our usage in unicast code path it is perfect for reuse.
      
      I moved the functions ip_sk_accept_pmtu, ip_sk_use_pmtu and ip_skb_dst_mtu
      along with the new ip_dst_mtu_maybe_forward to net/ip.h to fix circular
      dependencies because of IPSKB_FORWARDED.
      
      Because someone might have written software that probes
      destinations manually and expects the kernel to honour those path mtus,
      I introduced a new per-namespace "ip_forward_use_pmtu" knob so someone
      can disable this new behaviour. We also still use mtus which are locked on a
      route for forwarding.
      
      The reason for this change is that path mtu information can be injected
      into the kernel via e.g. the icmp_err protocol handler without verification
      against local sockets. As such, this could cause the IPv4 forwarding path to
      wrongfully emit fragmentation needed notifications or start to fragment
      packets along a path.
      
      Tunnel and ipsec output paths clear IPCB again, thus IPSKB_FORWARDED
      won't be set and further fragmentation logic will use the path mtu to
      determine the fragmentation size. They also recheck packet size with
      help of path mtu discovery and report appropriate errors.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: David Miller <davem@davemloft.net>
      Cc: John Heffner <johnwheffner@gmail.com>
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
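
The MTU selection rule described above can be summarized in a few lines of userspace C. The struct and function names are hypothetical and the sketch leaves out any upper bound the kernel may apply; it only encodes the three conditions the message names: the "ip_forward_use_pmtu" knob, a locked route MTU, and whether the packet is being forwarded.

```c
#include <stdbool.h>

/* Hypothetical, flattened view of the route data relevant to the decision. */
struct toy_route {
    unsigned int path_mtu;  /* learned path MTU, possibly from an unverified ICMP error */
    unsigned int dev_mtu;   /* MTU of the outgoing interface */
    bool mtu_locked;        /* MTU was locked on the route by the administrator */
};

/*
 * Choose the MTU for a packet. Locally generated traffic, locked routes and
 * namespaces with the knob enabled keep using the path MTU; forwarded
 * traffic otherwise falls back to the interface MTU.
 */
static unsigned int toy_mtu_maybe_forward(const struct toy_route *rt,
                                          bool forwarding,
                                          bool forward_use_pmtu)
{
    if (forward_use_pmtu || rt->mtu_locked || !forwarding)
        return rt->path_mtu;
    return rt->dev_mtu;
}
```

With a spoofed path MTU of 576 on a 1500-byte interface, forwarded packets in this model are sized against 1500 unless the knob is enabled or the route MTU is locked.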
    • sched, net: Clean up preempt_enable_no_resched() abuse · 1774e9f3
      Peter Zijlstra authored
      
      The only valid use of preempt_enable_no_resched() is if the very next
      line is schedule() or if we know preemption cannot actually be enabled
      by that statement because other preempt_count 'refs' are known to be held.
      
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Peter Zijlstra <peterz@infradead.org>
      Cc: rjw@rjwysocki.net
      Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: rui.zhang@intel.com
      Cc: jacob.jun.pan@linux.intel.com
      Cc: Mike Galbraith <bitbucket@online.de>
      Cc: hpa@zytor.com
      Cc: Arjan van de Ven <arjan@linux.intel.com>
      Cc: lenb@kernel.org
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.net
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  15. Jan 10, 2014
  16. Jan 09, 2014