  1. Feb 03, 2008
    • [SOCK] proto: Add hashinfo member to struct proto · ab1e0a13
      Arnaldo Carvalho de Melo authored
      
      This way we can remove TCP and DCCP specific versions of
      
      sk->sk_prot->get_port: both v4 and v6 use inet_csk_get_port
      sk->sk_prot->hash:     inet_hash is directly used; only v6 needs
                             a specific version to deal with mapped sockets
      sk->sk_prot->unhash:   both v4 and v6 use inet_unhash directly
      
      struct inet_connection_sock_af_ops also gets a new member, bind_conflict, so
      that inet_csk_get_port can find the per family routine.
      
      Now only the lookup routines receive a struct inet_hashinfo as a parameter.
      
      With this we further reuse code, reducing the difference among INET transport
      protocols.
      
      Eventually, work has to be done on UDP and SCTP to make them share this
      infrastructure and, as a bonus, gain inet_diag interfaces so that iproute
      can be used with these protocols.
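
      A minimal sketch of what the consolidation described above allows (assumed
      for illustration, not copied from the tree: the exact spelling of the new
      hashinfo member and the &tcp_hashinfo reference are assumptions):

	struct proto tcp_prot_sketch = {
		.name		= "TCP",
		.get_port	= inet_csk_get_port,	/* shared v4/v6 port allocator */
		.hash		= inet_hash,		/* shared hash routine */
		.unhash		= inet_unhash,		/* shared unhash routine */
		.h.hashinfo	= &tcp_hashinfo,	/* the new hashinfo member */
	};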
      
      net-2.6/net/ipv4/inet_hashtables.c:
        struct proto			     |   +8
        struct inet_connection_sock_af_ops |   +8
       2 structs changed
        __inet_hash_nolisten               |  +18
        __inet_hash                        | -210
        inet_put_port                      |   +8
        inet_bind_bucket_create            |   +1
        __inet_hash_connect                |   -8
       5 functions changed, 27 bytes added, 218 bytes removed, diff: -191
      
      net-2.6/net/core/sock.c:
        proto_seq_show                     |   +3
       1 function changed, 3 bytes added, diff: +3
      
      net-2.6/net/ipv4/inet_connection_sock.c:
        inet_csk_get_port                  |  +15
       1 function changed, 15 bytes added, diff: +15
      
      net-2.6/net/ipv4/tcp.c:
        tcp_set_state                      |   -7
       1 function changed, 7 bytes removed, diff: -7
      
      net-2.6/net/ipv4/tcp_ipv4.c:
        tcp_v4_get_port                    |  -31
        tcp_v4_hash                        |  -48
        tcp_v4_destroy_sock                |   -7
        tcp_v4_syn_recv_sock               |   -2
        tcp_unhash                         | -179
       5 functions changed, 267 bytes removed, diff: -267
      
      net-2.6/net/ipv6/inet6_hashtables.c:
        __inet6_hash |   +8
       1 function changed, 8 bytes added, diff: +8
      
      net-2.6/net/ipv4/inet_hashtables.c:
        inet_unhash                        | +190
        inet_hash                          | +242
       2 functions changed, 432 bytes added, diff: +432
      
      vmlinux:
       16 functions changed, 485 bytes added, 492 bytes removed, diff: -7
      
      /home/acme/git/net-2.6/net/ipv6/tcp_ipv6.c:
        tcp_v6_get_port                    |  -31
        tcp_v6_hash                        |   -7
        tcp_v6_syn_recv_sock               |   -9
       3 functions changed, 47 bytes removed, diff: -47
      
      /home/acme/git/net-2.6/net/dccp/proto.c:
        dccp_destroy_sock                  |   -7
        dccp_unhash                        | -179
        dccp_hash                          |  -49
        dccp_set_state                     |   -7
        dccp_done                          |   +1
       5 functions changed, 1 bytes added, 242 bytes removed, diff: -241
      
      /home/acme/git/net-2.6/net/dccp/ipv4.c:
        dccp_v4_get_port                   |  -31
        dccp_v4_request_recv_sock          |   -2
       2 functions changed, 33 bytes removed, diff: -33
      
      /home/acme/git/net-2.6/net/dccp/ipv6.c:
        dccp_v6_get_port                   |  -31
        dccp_v6_hash                       |   -7
        dccp_v6_request_recv_sock          |   +5
       3 functions changed, 5 bytes added, 38 bytes removed, diff: -33
      
      Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. Feb 01, 2008
  3. Jan 29, 2008
    • [TCP]: Do not purge sk_forward_alloc entirely in tcp_delack_timer(). · 9993e7d3
      David S. Miller authored
      
      Otherwise we beat heavily on the global tcp_memory atomics
      when all of the sockets in the system are slowly sending
      periodic packet clumps.
      
      Noticed and suggested by Eric Dumazet.
      
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET]: prot_inuse cleanups and optimizations · 65f76517
      Eric Dumazet authored
      
      1) Cleanups (all functions are prefixed by sock_prot_inuse)
      
      sock_prot_inc_use(prot) -> sock_prot_inuse_add(prot,1)
      sock_prot_dec_use(prot) -> sock_prot_inuse_add(prot,-1)
      sock_prot_inuse()       -> sock_prot_inuse_get()
      
      New functions :
      
      sock_prot_inuse_init() and sock_prot_inuse_free() to abstract pcounter use.
      
      2) If CONFIG_PROC_FS=n, we can zap the 'inuse' member from "struct proto",
      since nothing reads the inuse value.
      
      This saves 1372 bytes on i386/SMP and some cpu cycles.
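
      A hedged usage sketch of the renamed helpers (illustrative callers; the
      signatures are taken from the mapping above, not copied from the tree):

	/* protocol hash/unhash paths bump the per-protocol 'inuse' counter */
	static void example_hash(struct sock *sk)
	{
		/* old: sock_prot_inc_use(sk->sk_prot); */
		sock_prot_inuse_add(sk->sk_prot, 1);
	}

	static void example_unhash(struct sock *sk)
	{
		/* old: sock_prot_dec_use(sk->sk_prot); */
		sock_prot_inuse_add(sk->sk_prot, -1);
	}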
      
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [NET] CORE: Introducing new memory accounting interface. · 3ab224be
      Hideo Aoki authored
      
      This patch introduces new memory accounting functions for each network
      protocol. Most of them are renamed from memory accounting functions
      for stream protocols. At the same time, some stream memory accounting
      functions are removed since other functions do the same thing.
      
      Renaming:
      	sk_stream_free_skb()		->	sk_wmem_free_skb()
      	__sk_stream_mem_reclaim()	->	__sk_mem_reclaim()
      	sk_stream_mem_reclaim()		->	sk_mem_reclaim()
      	sk_stream_mem_schedule()	->	__sk_mem_schedule()
      	sk_stream_pages()      		->	sk_mem_pages()
      	sk_stream_rmem_schedule()	->	sk_rmem_schedule()
      	sk_stream_wmem_schedule()	->	sk_wmem_schedule()
      	sk_charge_skb()			->	sk_mem_charge()
      
      Removing:
      	sk_stream_rfree():	consolidates into sock_rfree()
      	sk_stream_set_owner_r(): consolidates into skb_set_owner_r()
      	sk_stream_mem_schedule()
      
      The following functions are added.
          	sk_has_account(): check if the protocol supports accounting
      	sk_mem_uncharge(): do the opposite of sk_mem_charge()
      
      In addition, to achieve consolidation, updating sk_wmem_queued is
      removed from sk_mem_charge().
      
      Next, to consolidate the memory accounting functions, this patch adds
      memory accounting calls to the network core functions. Moreover, the
      existing memory accounting calls are renamed to the new accounting calls.

      Finally, we replace the existing memory accounting calls with the new
      interface in TCP and SCTP.
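
      A hedged sketch of how the renamed helpers compose on the send side
      (an illustrative caller, not code from this patch; note that, per the
      text above, sk_mem_charge() no longer updates sk_wmem_queued, so the
      caller has to):

	/* returns 0 if the skb was accounted and may be queued, -1 otherwise */
	static int example_charge_for_send(struct sock *sk, struct sk_buff *skb)
	{
		if (!sk_wmem_schedule(sk, skb->truesize))
			return -1;			/* over the memory limit */
		sk_mem_charge(sk, skb->truesize);	/* commit the reserved memory */
		sk->sk_wmem_queued += skb->truesize;	/* no longer done by sk_mem_charge() */
		return 0;
	}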
      
      Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
      Signed-off-by: Hideo Aoki <haoki@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • [SOCK] Avoid divides in sk_stream_pages() and __sk_stream_mem_reclaim() · 21371f76
      Eric Dumazet authored
      
      Since sk_forward_alloc is signed, we need to take care with the divides by
      SK_STREAM_MEM_QUANTUM that we do in sk_stream_pages() and
      __sk_stream_mem_reclaim().
      
      This patch introduces SK_STREAM_MEM_QUANTUM_SHIFT, defined as
      ilog2(SK_STREAM_MEM_QUANTUM), so that right shifts can be used
      instead of plain divides.

      This should help the compiler use cheap shifts instead of expensive
      divides (as seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y on x86).
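
      A small, self-contained illustration of why the shift matters (the
      constants below are stand-ins, not the kernel's definitions): with a
      signed dividend, C division truncates toward zero while an arithmetic
      shift rounds toward negative infinity, so the compiler cannot silently
      turn x / 4096 into x >> 12 and instead emits fix-up instructions or a
      real divide.

	#define QUANTUM		4096	/* stand-in for SK_STREAM_MEM_QUANTUM */
	#define QUANTUM_SHIFT	12	/* ilog2(QUANTUM), as the patch defines */

	static inline int pages_div(int bytes)
	{
		return bytes / QUANTUM;		/* extra fix-ups (or a div with -Os) */
	}

	static inline int pages_shift(int bytes)
	{
		return bytes >> QUANTUM_SHIFT;	/* a single arithmetic shift */
	}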
      
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. Jan 28, 2008
  5. Jan 09, 2008
  6. Nov 19, 2007
    • [TCP]: Fix TCP header misalignment · 21df56c6
      Herbert Xu authored
      
      Indeed my previous change to alloc_pskb has made it possible
      for the TCP header to be misaligned iff the MTU is not a multiple
      of 4 (and less than a page).  So I suspect the optimised IPsec
      MTU calculation is giving you just such an MTU :)
      
      This patch fixes it by changing alloc_pskb to make sure that
      the size is at least 32-bit aligned.  This does not cause the
      problem fixed by the previous patch because max_header is always
      32-bit aligned which means that in the SG/NOTSO case this will
      be a no-op.
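
      A minimal sketch of the rounding described above (an illustrative helper,
      not the exact change to alloc_pskb): rounding the requested size up to a
      4-byte boundary keeps the start of the TCP header 32-bit aligned even
      when the MTU is not a multiple of 4.

	static inline unsigned int align_to_4(unsigned int size)
	{
		return (size + 3) & ~3u;	/* round up to the next multiple of 4 */
	}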
      
      I thought about putting this in the callers but all the current
      callers are from TCP.  If and when we get a non-TCP caller we
      can always create a TCP wrapper for this function and move the
      alignment over there.
      
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  7. Nov 15, 2007
    • [TCP]: Fix size calculation in sk_stream_alloc_pskb · fb93134d
      Herbert Xu authored
      
      We round up the header size in sk_stream_alloc_pskb so that
      TSO packets get zero tail room.  Unfortunately this rounding
      up is not coordinated with the select_size() function used by
      TCP to calculate the second parameter of sk_stream_alloc_pskb.
      
      As a result, we may allocate more than a page of data in the
      non-TSO case when exactly one page is desired.
      
      In fact, rounding up the head room is detrimental in the non-TSO case
      because it turns memory that would otherwise be available to the payload
      into head room.  TSO doesn't need this either; all it wants is the
      guarantee that there is no tail room.
      
      So this patch fixes this by adjusting the skb_reserve call so that
      exactly the requested amount (which all callers have calculated in
      a precise way) is made available as tail room.
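
      A hedged sketch of the reserve arithmetic described above (an illustrative
      helper, not the patched sk_stream_alloc_pskb; MAX_TCP_HEADER stands in for
      the header estimate): whatever rounding the allocation applies,
      skb_reserve() consumes everything except the caller's requested size, so
      exactly that many bytes remain as tail room.

	static struct sk_buff *example_alloc(struct sock *sk, int size, gfp_t gfp)
	{
		struct sk_buff *skb = alloc_skb(size + MAX_TCP_HEADER, gfp);

		if (skb)
			skb_reserve(skb, skb_tailroom(skb) - size); /* leave exactly 'size' */
		return skb;
	}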
      
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. Nov 07, 2007
    • [NET]: Define infrastructure to keep 'inuse' changes in an efficient SMP/NUMA way. · 286ab3d4
      Eric Dumazet authored
      
      "struct proto" currently uses an array stats[NR_CPUS] to track change on
      'inuse' sockets per protocol.
      
      If NR_CPUS is big, this means we use a big memory area for this.
      Moreover, all this memory area is located on a single node on NUMA
      machines, increasing memory pressure on the boot node.
      
      In this patch, I tried to:
      
      - Keep a fast !CONFIG_SMP implementation
      - Keep a fast CONFIG_SMP implementation for often used protocols
      (tcp,udp,raw,...)
      - Introduce a NUMA efficient implementation
      
      Some helper macros are defined in include/net/sock.h.
      These macros take CONFIG_SMP into account.
      
      If a "struct proto" is declared without using DEFINE_PROTO_INUSE /
      REF_PROTO_INUSE
      macros, it will automatically use a default implementation, using a
      dynamically allocated percpu zone.
      This default implementation will be NUMA efficient, but might use 32/64
      bytes per possible cpu
      because of current alloc_percpu() implementation.
      However it still should be better than previous implementation based on
      stats[NR_CPUS] field.
      
      When a "struct proto" is changed to use the new macros, we use a single
      static "int" percpu variable,
      lowering the memory and cpu costs, still preserving NUMA efficiency.
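
      A hedged sketch of how a protocol might opt in to the static per-cpu
      counter (the macro names come from the text above; their exact expansion
      and the surrounding struct proto fields are assumed, not copied):

	DEFINE_PROTO_INUSE(tcp)		/* declares the static per-cpu 'inuse' int for TCP */

	struct proto tcp_prot = {
		.name	= "TCP",
		/* ... other members ... */
		REF_PROTO_INUSE(tcp)	/* points this struct proto at that counter */
	};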
      
      Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. Nov 01, 2007
  10. Oct 18, 2007
  11. Oct 11, 2007
  12. May 31, 2007
  13. May 08, 2007
  14. Apr 26, 2007
  15. Mar 06, 2007
  16. Mar 03, 2007
    • [NET]: Fix bugs in "Whether sock accept queue is full" checking · 8488df89
      Wei Dong authored
      
      While using Linux TCP sockets, I found a bug in the function
      sk_acceptq_is_full().

      When a new SYN arrives, the TCP module first validates it. If it is valid,
      it sends SYN,ACK to the client and adds the sock to the SYN hash table.
      When the valid ACK for that SYN,ACK later arrives from the client, the
      server accepts the connection and increases sk->sk_ack_backlog -- which is
      done in tcp_check_req(). We check whether the accept queue is full in
      tcp_v4_syn_recv_sock().
      
      Consider an example:
      
      After the listen(sockfd, 1) system call, sk->sk_max_ack_backlog is set to
      1. As we know, sk->sk_ack_backlog is initialized to 0. Assume accept() is
      not invoked for now.

      1. 1st connection comes. sk_acceptq_is_full() sees sk->sk_ack_backlog=0,
         sk->sk_max_ack_backlog=1, and returns 0: accept this connection and
         increase sk->sk_ack_backlog.
      2. 2nd connection comes. sk_acceptq_is_full() sees sk->sk_ack_backlog=1,
         sk->sk_max_ack_backlog=1, and returns 0: accept this connection and
         increase sk->sk_ack_backlog.
      3. 3rd connection comes. sk_acceptq_is_full() sees sk->sk_ack_backlog=2,
         sk->sk_max_ack_backlog=1, and returns 1: refuse this connection.
      
      I think this is a bug: after the listen() system call,
      sk->sk_max_ack_backlog=1, yet two connections can be accepted.
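
      A hedged sketch of the corrected check implied by the report above (an
      illustrative inline helper, not a verbatim copy of the patch): with the
      old '>' comparison a backlog of 1 let two connections through, so the
      comparison needs to be '>=' to refuse the second one.

	static inline int sk_acceptq_is_full_fixed(const struct sock *sk)
	{
		return sk->sk_ack_backlog >= sk->sk_max_ack_backlog;
	}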
      
      Signed-off-by: Wei Dong <weid@np.css.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. Feb 28, 2007
    • [NET]: Handle disabled preemption in gfp_any() · 4498121c
      Patrick McHardy authored
      
      ctnetlink uses netlink_unicast from an atomic_notifier_chain
      (which is called within an RCU read side critical section)
      without holding further locks. netlink_unicast calls netlink_trim
      with the result of gfp_any() for the gfp flags, which are passed
      down to pskb_expand_head. gfp_any() only checks for softirq
      context and otherwise returns GFP_KERNEL, resulting in this warning:
      
      BUG: sleeping function called from invalid context at mm/slab.c:3032
      in_atomic():1, irqs_disabled():0
      no locks held by rmmod/7010.
      
      Call Trace:
       [<ffffffff8109467f>] debug_show_held_locks+0x9/0xb
       [<ffffffff8100b0b4>] __might_sleep+0xd9/0xdb
       [<ffffffff810b5082>] __kmalloc+0x68/0x110
       [<ffffffff811ba8f2>] pskb_expand_head+0x4d/0x13b
       [<ffffffff81053147>] netlink_broadcast+0xa5/0x2e0
       [<ffffffff881cd1d7>] :nfnetlink:nfnetlink_send+0x83/0x8a
       [<ffffffff8834f6a6>] :nf_conntrack_netlink:ctnetlink_conntrack_event+0x94c/0x96a
       [<ffffffff810624d6>] notifier_call_chain+0x29/0x3e
       [<ffffffff8106251d>] atomic_notifier_call_chain+0x32/0x60
       [<ffffffff881d266d>] :nf_conntrack:destroy_conntrack+0xa5/0x1d3
       [<ffffffff881d194e>] :nf_conntrack:nf_ct_cleanup+0x8c/0x12c
       [<ffffffff881d4614>] :nf_conntrack:kill_l3proto+0x0/0x13
       [<ffffffff881d482a>] :nf_conntrack:nf_conntrack_l3proto_unregister+0x90/0x94
       [<ffffffff883551b3>] :nf_conntrack_ipv4:nf_conntrack_l3proto_ipv4_fini+0x2b/0x5d
       [<ffffffff8109d44f>] sys_delete_module+0x1b5/0x1e6
       [<ffffffff8105f245>] trace_hardirqs_on_thunk+0x35/0x37
       [<ffffffff8105911e>] system_call+0x7e/0x83
      
      Since netlink_unicast is supposed to be callable from within RCU
      read side critical sections, make gfp_any() check for in_atomic()
      instead of in_softirq().
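
      A minimal sketch of the change described above (not a verbatim diff; the
      GFP_ATOMIC branch for atomic context is implied by the description):

	static inline gfp_t gfp_any(void)
	{
		/* was: in_softirq() ? GFP_ATOMIC : GFP_KERNEL */
		return in_atomic() ? GFP_ATOMIC : GFP_KERNEL;
	}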
      
      Additionally, nfnetlink_send needs to use gfp_any() as well for the
      call to netlink_broadcast().
      
      Signed-off-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>