
epoll

epoll is one of Linux's I/O multiplexing mechanisms. I/O multiplexing means that a single process can monitor many file descriptors through one mechanism; as soon as any descriptor becomes ready (usually readable or writable), the program is notified so it can perform the corresponding read or write. epoll is not the only such mechanism on Linux: select and poll are the others.

The three functions of the epoll interface

Creating an epoll handle

int epoll_create(int size);

epoll_create creates an epoll handle. The size argument tells the kernel roughly how many descriptors will be monitored (since Linux 2.6.8 it is ignored, though it must still be greater than zero). Creating an epoll handle consumes a file descriptor, visible under /proc/<pid>/fd/. When you are done with the epoll instance you must call close() on it, otherwise file descriptors may eventually be exhausted.
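
A minimal sketch of creating and releasing an epoll instance:

#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void) {
  int epfd = epoll_create(1); /* size is only a hint and is ignored since 2.6.8, but must be > 0 */
  if (epfd < 0) {
    perror("epoll_create");
    return 1;
  }
  /* ... epoll_ctl / epoll_wait ... */
  close(epfd); /* release the fd, otherwise it leaks */
  return 0;
}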

Registering epoll events

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

epoll_ctl adds, modifies, or removes events of interest on an epoll instance. It returns 0 on success and -1 on failure, in which case errno indicates the kind of error.

The first argument, epfd, is the epoll handle returned by epoll_create.

The second argument, op, is the operation, expressed by one of three macros:

  • EPOLL_CTL_ADD: register a new fd with epfd;
  • EPOLL_CTL_MOD: modify the events being monitored for an already registered fd;
  • EPOLL_CTL_DEL: remove an fd from epfd;

The third argument is the fd to monitor.

The fourth argument tells the kernel which events to monitor; epoll_event is defined as follows:

typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events; /* Epoll events */
    epoll_data_t data; /* User data variable */
};

epoll_event.events is a bit mask built from the following macros:

  • EPOLLIN: the fd is readable (including an orderly close by the peer socket)
  • EPOLLOUT: the fd is writable
  • EPOLLPRI: the fd has urgent data to read (i.e. out-of-band data has arrived)
  • EPOLLERR: an error occurred on the fd
  • EPOLLHUP: the fd was hung up
  • EPOLLET: monitor the fd in edge-triggered mode, as opposed to the default level-triggered mode
  • EPOLLONESHOT: report the event only once; if you still want to monitor the socket after the event has fired, you must add it to the epoll set again

epoll_event.data.ptr is a void * carrying a user-defined value; when epoll_wait returns, epoll_event.data.ptr comes back unchanged. Go's netpoll is built on exactly this: it points epoll_event.data.ptr at the pollDesc associated with the fd (which records the waiting G), so when epoll_wait returns, the runtime can follow the pointer, find the parked G, and wake it to read or write the fd.
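
For illustration, a server can hang its own per-connection state off data.ptr when registering a connection and get the same pointer back later; the conn_t type and watch_conn helper below are made-up names, not part of the epoll API:

#include <sys/epoll.h>

typedef struct {   /* hypothetical per-connection state */
  int fd;
  void *user_data;
} conn_t;

/* register a connection and attach our state to the event */
int watch_conn(int epfd, conn_t *c) {
  struct epoll_event ev;
  ev.events = EPOLLIN;
  ev.data.ptr = c;   /* returned untouched by epoll_wait */
  return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

When an event fires, events[i].data.ptr is simply cast back to conn_t * to recover the connection, which is essentially what the Go runtime does to recover the pollDesc.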

Waiting for epoll events

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

epoll_wait waits for events to occur. It takes four arguments (a small wait loop is sketched after the list):

  • epfd is the epoll descriptor
  • events is a pre-allocated array of epoll_event structures; epoll copies the events that occurred into this array
  • maxevents is the maximum number of events that may be returned by this call; it must not exceed the size of the pre-allocated events array
  • timeout is the maximum time to wait, in milliseconds, when no event is ready: 0 makes epoll_wait return immediately even if the ready list (rdllist) is empty, without waiting, and -1 means wait indefinitely until an event occurs.
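
A minimal wait loop showing the timeout semantics (1000 ms here; -1 would block indefinitely and 0 would just poll), assuming fds were already registered on epfd with epoll_ctl:

#include <stdio.h>
#include <sys/epoll.h>

#define MAX_EVENTS 64

void event_loop(int epfd) {
  struct epoll_event events[MAX_EVENTS];
  for (;;) {
    int n = epoll_wait(epfd, events, MAX_EVENTS, 1000); /* wait up to 1000 ms */
    if (n < 0) {   /* error, e.g. interrupted by a signal (EINTR) */
      perror("epoll_wait");
      continue;
    }
    if (n == 0) {  /* timed out, no fd became ready */
      continue;
    }
    for (int i = 0; i < n; i++) {
      /* events[i].events says what happened; events[i].data says to whom */
    }
  }
}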

Two trigger modes

epoll offers two trigger modes for file descriptors:

  • LT (Level Triggered)

    LT is epoll's default mode. When epoll_wait detects an event it notifies the application, but the application does not have to handle it immediately: the next epoll_wait will report the event again, and keeps doing so until it is handled. LT works with both blocking and non-blocking sockets.

  • ET (Edge Triggered)

    In ET mode, as soon as epoll_wait detects an event and notifies the application, the application must handle it right away; subsequent epoll_wait calls will not report that event again. ET therefore greatly reduces how many times the same event is delivered, making it more efficient than LT. ET should only be used with non-blocking sockets.

ET is a notification about a state change: you are told when the fd goes from having no data to having data. LT is a notification about the state itself: as long as there is data you keep being notified, and you are not notified when there is none. In ET mode, after receiving a notification you should call read in a loop until it returns EWOULDBLOCK or EAGAIN, so that the internal state goes back from "has data" to "no data" and the next arrival can trigger a new notification; otherwise you will only be woken again when the peer sends more data. A typical drain loop is sketched below.
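
A minimal ET-mode drain loop, assuming sockfd has already been made non-blocking:

#include <errno.h>
#include <unistd.h>

/* read everything currently available on a non-blocking fd (ET mode) */
void drain_read(int sockfd) {
  char buf[4096];
  for (;;) {
    ssize_t n = read(sockfd, buf, sizeof(buf));
    if (n > 0) {
      /* consume the n bytes in buf here */
      continue;
    }
    if (n == 0) {
      /* peer closed the connection */
      break;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
      /* kernel buffer drained: the edge is re-armed for the next arrival */
      break;
    }
    break; /* some other error */
  }
}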

With ET you should also guard against malicious connections: a peer that keeps sending data could monopolize the drain loop and starve other requests.

Handling EPOLLOUT in ET mode

Consider a client requesting a large file from the server. When the client's EPOLLIN event fires we want to write, say, a 1 GB file back to it, but a single write can push at most a send-buffer's worth of data; calling write again immediately returns EAGAIN (the send buffer is full and cannot take more). To avoid wasting CPU by busy-polling write, we can use the EPOLLOUT event. The logic is as follows (a sketch follows the list):

  1. When EPOLLIN fires, call write to send data; as long as the return value is greater than 0 and data remains, keep sending.
  2. If write returns a value less than 0 and errno is EAGAIN, the send buffer is full. Save the remaining data, register EPOLLOUT, and wait until epoll_wait reports EPOLLOUT, which means the send buffer is writable again; then send the saved data. Once write succeeds and everything has been sent, deregister EPOLLOUT.
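
A sketch of that logic; buf, len and sent stand in for whatever pending-output state the application keeps and are not part of the epoll API:

#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* try to flush pending output; register/deregister EPOLLOUT as needed */
void flush_output(int epfd, int fd, const char *buf, size_t len, size_t *sent) {
  struct epoll_event ev;
  ev.data.fd = fd;
  while (*sent < len) {
    ssize_t n = write(fd, buf + *sent, len - *sent);
    if (n > 0) {
      *sent += n;
      continue;
    }
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
      /* send buffer full: ask to be notified when it becomes writable again */
      ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
      epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
      return;
    }
    return; /* real error: the caller should close the connection */
  }
  /* everything sent: stop watching EPOLLOUT */
  ev.events = EPOLLIN | EPOLLET;
  epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}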

Pros and cons of epoll

  • One process can watch a large number of socket descriptors (fds)

    epoll does not have the limit of the select model; the number of fds it supports is bounded only by the maximum number of open files, which is generally far larger than select's limit of 2048. The system-wide limit on open files can be read from /proc/sys/fs/file-max.

  • I/O efficiency does not fall off linearly as the number of fds grows

    Another fatal weakness of traditional select/poll is that when you hold a very large socket set, network latency means only part of the set is "active" at any moment, yet every select/poll call scans the entire set linearly, so efficiency degrades linearly. epoll does not have this problem: it only operates on "active" sockets, because the kernel implementation of epoll hangs a callback on each fd, so only sockets that become active invoke the callback while idle sockets never do.

  • Less copying between kernel and user space

    Whichever of select, poll, or epoll you use, the kernel must tell user space which fds are ready, so avoiding unnecessary memory copies matters. epoll is often described as sharing a region of memory between kernel and user space via mmap, but the real saving lies elsewhere: select/poll must pass the entire set of monitored fds to the kernel on every call (the fd list is copied from user space into the kernel each time, which is inefficient when there are many fds), whereas epoll_wait (which plays the role of a select/poll call) does not need to pass an fd list at all, because the fds to monitor were already handed to the kernel through epoll_ctl, which works incrementally and never re-copies the whole set. After epoll_create the kernel maintains a data structure in kernel space holding the monitored fds, and each epoll_ctl call merely performs a small update on it.

Example epoll server in C

/**
	echo server
*/
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 10
#define MAX_LINE 30
#define LISTEN_BACKLOG 128

int set_non_blocking(int fd) {
  int oldopt;
  if ((oldopt = fcntl(fd, F_GETFL)) < 0) {
    return -1;
  }

  int newopt = oldopt | O_NONBLOCK;
  if (fcntl(fd, F_SETFL, newopt) < 0) {
    return -1;
  }
  return oldopt;
}

int set_reuse_addr(int sockfd, int reuse) {
  return setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(int));
}

void usage(char *progname) { fprintf(stderr, "Usage: %s <port>\n", progname); }

int main(int argc, char *argv[]) {
  int listenfd;
  int epfd;
  int port;

  if (argc != 2) {
    usage(argv[0]);
    exit(1);
  }

  if ((port = atoi(argv[1])) <= 0) {
    fprintf(stderr, "invalid port: %s\n", argv[1]);
    exit(1);
  }

  // socket()
  if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    perror("socket");
    return 1;
  }

  int reuse = 1;
  if (set_reuse_addr(listenfd, reuse) < 0) {
    perror("set_reuse_addr");
    return 1;
  }
  struct sockaddr_in serveraddr;
  memset(&serveraddr, 0, sizeof(struct sockaddr_in));
  serveraddr.sin_family = AF_INET;
  if (inet_aton("127.0.0.1", &(serveraddr.sin_addr)) == 0) {
    perror("inet_aton");
    return 1;
  }
  serveraddr.sin_port = htons(port);
  // bind()
  if (bind(listenfd, (struct sockaddr *)&serveraddr,
           sizeof(struct sockaddr_in)) < 0) {
    perror("bind");
    return 1;
  }
  // listen()
  if (listen(listenfd, LISTEN_BACKLOG) < 0) {
    perror("listen");
    return 1;
  }
  // epoll_create1
  if ((epfd = epoll_create1(0)) < 0) {
    perror("epoll_create1");
    return 1;
  }

  struct epoll_event event, *events;
  int connfd, sockfd, nfds;
  event.data.fd = listenfd;
  event.events = EPOLLIN;  // level-triggered (the default)
  // register the epoll event
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &event) < 0) {
    perror("epoll_ctl");
    return 1;
  }

  events = malloc(MAX_EVENTS * sizeof(struct epoll_event));
  char line[MAX_LINE + 1];
  struct sockaddr_in clientaddr;
  socklen_t clientlen;
  for (;;) {
    // wait for epoll events
    nfds = epoll_wait(epfd, events, MAX_EVENTS, -1);
    if (nfds < 0) {
      perror("epoll_wait");
      return 1;
    }

    // handle all returned events
    for (int i = 0; i < nfds; i++) {
      if (events[i].data.fd == listenfd) {  // a new client connection is pending
        clientlen = sizeof(clientaddr);
        connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
        if (connfd < 0) {
          perror("accept");
          continue;
        }

        printf("accept a connection from %s:%d, fd=%d\n",
               inet_ntoa(clientaddr.sin_addr), ntohs(clientaddr.sin_port),
               connfd);

        // register an event for the new connection
        event.data.fd = connfd;
        event.events = EPOLLIN;  // still level-triggered; OR in EPOLLET for edge-triggered
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, connfd, &event) < 0) {
          perror("epoll_ctl");
        }
        continue;
      }

      if (events[i].events & EPOLLIN) {  // data from a client is readable
        if ((sockfd = events[i].data.fd) < 0) continue;

        ssize_t nread;
        nread = read(sockfd, line, MAX_LINE);
        if (nread < 0) {
          if (errno == ECONNRESET) {  // the client dropped abnormally
            printf("client fd=%d lost connection\n", sockfd);
            if (epoll_ctl(epfd, EPOLL_CTL_DEL, sockfd, NULL) < 0) {  // stop monitoring this fd
              perror("epoll_ctl");
            }
            close(sockfd);
            events[i].data.fd = -1;
          } else {
            perror("read");
            close(sockfd);
            if (epoll_ctl(epfd, EPOLL_CTL_DEL, sockfd, NULL) < 0) {
              perror("epoll_ctl");
            }
            events[i].data.fd = -1;
          }
        } else if (nread == 0) {  // the client closed the connection normally
          printf("client fd=%d normal exit\n", sockfd);
          close(sockfd);
          events[i].data.fd = -1;
        } else {
          line[nread] = '\0';
          printf("received: %s\n", line);

          int nwrite = 0;
          int n;
          while (nwrite < nread) {  // make sure the whole line is written back
            if ((n = write(sockfd, line + nwrite, nread - nwrite)) < 0) {
              fprintf(stderr, "write fd=%d error: %s\n", sockfd,
                      strerror(errno));
              break;
            }
            nwrite += n;
          }
        }
      }
    }
  }
}

Go and epoll

Core logic

Whenever Accept receives a new connection, the socket's fd is registered with epoll, and a pointer to its pollDesc is stored in the data field of the epoll event.

func netpollopen(fd uintptr, pd *pollDesc) int32 {
	var ev epollevent
	ev.events = _EPOLLIN | _EPOLLOUT | _EPOLLRDHUP | _EPOLLET
	*(**pollDesc)(unsafe.Pointer(&ev.data)) = pd // point the epoll event's data field at the pollDesc
	return -epollctl(epfd, _EPOLL_CTL_ADD, int32(fd), &ev)
}

When a goroutine would block reading or writing an fd, the fd's pollDesc records the goroutine in its rg or wg field and the current G is parked:

func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {
    gpp := &pd.rg
	if mode == 'w' {
		gpp = &pd.wg
	}
	...
	if waitio || netpollcheckerr(pd, mode) == 0 {
		gopark(netpollblockcommit, unsafe.Pointer(gpp), waitReasonIOWait, traceEvGoBlockNet, 5)
	}
	...
	return old == pdReady
}

func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceEv byte, traceskip int) {
	if reason != waitReasonSleep {
		checkTimeouts() // timeouts may expire while two goroutines keep the scheduler busy
	}
	mp := acquirem()
	gp := mp.curg
	status := readgstatus(gp)
	if status != _Grunning && status != _Gscanrunning {
		throw("gopark: bad g status")
	}
	mp.waitlock = lock
	mp.waitunlockf = unlockf
	gp.waitreason = reason
	mp.waittraceev = traceEv
	mp.waittraceskip = traceskip
	releasem(mp)
	// can't do anything that might move the G between Ms here.
	mcall(park_m)
}

func park_m(gp *g) {
	_g_ := getg()
	casgstatus(gp, _Grunning, _Gwaiting)
	dropg()

	if fn := _g_.m.waitunlockf; fn != nil {
		ok := fn(gp, _g_.m.waitlock)
		_g_.m.waitunlockf = nil
		_g_.m.waitlock = nil
		if !ok {
			casgstatus(gp, _Gwaiting, _Grunnable)
			execute(gp, true) // Schedule it back, never returns.
		}
	}
	schedule()
}

func dropg() {
	_g_ := getg()

	setMNoWB(&_g_.m.curg.m, nil)
	setGNoWB(&_g_.m.curg, nil)
}

func netpollblockcommit(gp *g, gpp unsafe.Pointer) bool {
	r := atomic.Casuintptr((*uintptr)(gpp), pdWait, uintptr(unsafe.Pointer(gp)))
	if r {
		atomic.Xadd(&netpollWaiters, 1)
	}
	return r
}

netpollblock calls gopark to park the goroutine. gopark calls park_m; park_m first calls dropg() to detach the current goroutine from its M and then runs the netpollblockcommit callback, which stores the goroutine into the pollDesc's rg/wg field. Since the epoll event's data field points at that pollDesc, reading the data field of the events returned by epoll_wait leads straight back to the blocked goroutine, which can then be woken.

The scheduler's findrunnable function and the system monitor sysmon both call netpoll to collect sockets that have become ready, together with the goroutines blocked on them.

// Finds a runnable goroutine to execute.
// Tries to steal from other P's, get g from local or global queue, poll network.
func findrunnable() (gp *g, inheritTime bool) {
	_g_ := getg()
	...
	if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Load64(&sched.lastpoll) != 0 {
		// use netpoll to collect sockets that are already ready
		if list := netpoll(0); !list.empty() { // non-blocking
			gp := list.pop()
			injectglist(&list)
			casgstatus(gp, _Gwaiting, _Grunnable)
			if trace.enabled {
				traceGoUnpark(gp, 0)
			}
			return gp, false
		}
	}
	...
}


func sysmon() {
	...
	for {
		...
		// poll network if not polled for more than 10ms
        // i.e. run netpoll every 10ms
		lastpoll := int64(atomic.Load64(&sched.lastpoll))
		if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
			atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))
			list := netpoll(0) // non-blocking - returns list of goroutines
			if !list.empty() {
				incidlelocked(-1)
				injectglist(&list)
				incidlelocked(1)
			}
		}
		...
	}
}

The implementation of netpoll:

func netpoll(delay int64) gList {
	if epfd == -1 {
		return gList{}
	}
	var waitms int32
	if delay < 0 {
		waitms = -1
	} else if delay == 0 {
		waitms = 0
	} else if delay < 1e6 {
		waitms = 1
	} else if delay < 1e15 {
		waitms = int32(delay / 1e6)
	} else {
		// An arbitrary cap on how long to wait for a timer.
		// 1e9 ms == ~11.5 days.
		waitms = 1e9
	}
	var events [128]epollevent
retry:
	n := epollwait(epfd, &events[0], int32(len(events)), waitms) // the epoll_wait system call
	if n < 0 {
		if n != -_EINTR {
			println("runtime: epollwait on fd", epfd, "failed with", -n)
			throw("runtime: netpoll failed")
		}
		// If a timed sleep was interrupted, just return to
		// recalculate how long we should sleep now.
		if waitms > 0 {
			return gList{}
		}
		goto retry
	}
	var toRun gList
	for i := int32(0); i < n; i++ {
		ev := &events[i]
		if ev.events == 0 {
			continue
		}

		if *(**uintptr)(unsafe.Pointer(&ev.data)) == &netpollBreakRd {
			if ev.events != _EPOLLIN {
				println("runtime: netpoll: break fd ready for", ev.events)
				throw("runtime: netpoll: break fd ready for something unexpected")
			}
			if delay != 0 {
				// netpollBreak could be picked up by a
				// nonblocking poll. Only read the byte
				// if blocking.
				var tmp [16]byte
				read(int32(netpollBreakRd), noescape(unsafe.Pointer(&tmp[0])), int32(len(tmp)))
			}
			continue
		}

		var mode int32
		if ev.events&(_EPOLLIN|_EPOLLRDHUP|_EPOLLHUP|_EPOLLERR) != 0 {
			mode += 'r'
		}
		if ev.events&(_EPOLLOUT|_EPOLLHUP|_EPOLLERR) != 0 {
			mode += 'w'
		}
		if mode != 0 {
			pd := *(**pollDesc)(unsafe.Pointer(&ev.data))
			pd.everr = false
			if ev.events == _EPOLLERR {
				pd.everr = true
			}
			netpollready(&toRun, pd, mode) // make the blocked G(s) runnable
		}
	}
	return toRun
}

The full flow

Server code example

//TcpServer.go
package main

import (
	"fmt"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		panic(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			panic(err)
		}
		// one goroutine per client
		go handleConnection(conn)
	}
}

func handleConnection(conn net.Conn) {
	defer conn.Close()
	var body [4]byte
	addr := conn.RemoteAddr()
	for {
		// read the client's message
		_, err := conn.Read(body[:])
		if err != nil {
			break
		}
		fmt.Printf("收到%s消息: %s\n", addr, string(body[:]))
		// 回包
		_, err = conn.Write(body[:])
		if err != nil {
			break
		}
		fmt.Printf("发送给%s: %s\n", addr, string(body[:]))
	}
	fmt.Printf("与%s断开!\n", addr)
}

Socket creation, bind, listen, and epoll creation

net.Listen:

func Listen(network, address string) (Listener, error) {
	var lc ListenConfig
	return lc.Listen(context.Background(), network, address)
}

ListenConfig.Listen:

type ListenConfig struct {
	// If Control is not nil, it is called after creating the network
	// connection but before binding it to the operating system.
	//
	// Network and address parameters passed to Control method are not
	// necessarily the ones passed to Listen. For example, passing "tcp" to
	// Listen will cause the Control function to be called with "tcp4" or "tcp6".
	Control func(network, address string, c syscall.RawConn) error

	// KeepAlive specifies the keep-alive period for network
	// connections accepted by this listener.
	// If zero, keep-alives are enabled if supported by the protocol
	// and operating system. Network protocols or operating systems
	// that do not support keep-alives ignore this field.
	// If negative, keep-alives are disabled.
	KeepAlive time.Duration
}

func (lc *ListenConfig) Listen(ctx context.Context, network, address string) (Listener, error) {
	addrs, err := DefaultResolver.resolveAddrList(ctx, "listen", network, address, nil)
	if err != nil {
		return nil, &OpError{Op: "listen", Net: network, Source: nil, Addr: nil, Err: err}
	}
	sl := &sysListener{
		ListenConfig: *lc,
		network:      network,
		address:      address,
	}
	var l Listener
	la := addrs.first(isIPv4)
	switch la := la.(type) {
	case *TCPAddr:
		l, err = sl.listenTCP(ctx, la) // calls sysListener.listenTCP
	case *UnixAddr:
		l, err = sl.listenUnix(ctx, la)
	default:
		return nil, &OpError{Op: "listen", Net: sl.network, Source: nil, Addr: la, Err: &AddrError{Err: "unexpected address type", Addr: address}}
	}
	if err != nil {
		return nil, &OpError{Op: "listen", Net: sl.network, Source: nil, Addr: la, Err: err} // l is non-nil interface containing nil pointer
	}
	return l, nil
}

sysListener.listenTCP:

type sysListener struct {
	ListenConfig
	network, address string
}

func (sl *sysListener) listenTCP(ctx context.Context, laddr *TCPAddr) (*TCPListener, error) {
	fd, err := internetSocket(ctx, sl.network, laddr, nil, syscall.SOCK_STREAM, 0, "listen", sl.ListenConfig.Control) // raddr is nil here
    // internetSocket creates both server and client sockets: a server relies on laddr, a client on raddr
	if err != nil {
		return nil, err
	}
	return &TCPListener{fd: fd, lc: sl.ListenConfig}, nil
}

internetSocket creates the socket:

func internetSocket(ctx context.Context, net string, laddr, raddr sockaddr, sotype, proto int, mode string, ctrlFn func(string, string, syscall.RawConn) error) (fd *netFD, err error) {
	if (runtime.GOOS == "aix" || runtime.GOOS == "windows" || runtime.GOOS == "openbsd") && mode == "dial" && raddr.isWildcard() {
		raddr = raddr.toLocal(net)
	}
	family, ipv6only := favoriteAddrFamily(net, laddr, raddr, mode)
	return socket(ctx, net, family, sotype, proto, ipv6only, laddr, raddr, ctrlFn)
}

func socket(ctx context.Context, net string, family, sotype, proto int, ipv6only bool, laddr, raddr sockaddr, ctrlFn func(string, string, syscall.RawConn) error) (fd *netFD, err error) {
	s, err := sysSocket(family, sotype, proto) // issues the socket system call and returns the socket fd
	if err != nil {
		return nil, err
	}
	if err = setDefaultSockopts(s, family, sotype, ipv6only); err != nil {
		poll.CloseFunc(s)
		return nil, err
	}
    // newFD() returns a netFD
    // netFD is a higher-level wrapper around the raw socket fd
	if fd, err = newFD(s, family, sotype, net); err != nil {
		poll.CloseFunc(s)
		return nil, err
	}

	if laddr != nil && raddr == nil { // server-side socket
		switch sotype {
		case syscall.SOCK_STREAM, syscall.SOCK_SEQPACKET: // tcp
            // listenerBacklog() returns the backlog of the accept (fully-established connection) queue
			if err := fd.listenStream(laddr, listenerBacklog(), ctrlFn); err != nil {
				fd.Close()
				return nil, err
			}
			return fd, nil
		case syscall.SOCK_DGRAM: // udp
			if err := fd.listenDatagram(laddr, ctrlFn); err != nil {
				fd.Close()
				return nil, err
			}
			return fd, nil
		}
	}
	if err := fd.dial(ctx, laddr, raddr, ctrlFn); err != nil {
		fd.Close()
		return nil, err
	}
	return fd, nil
}

Creating the socket:

sysSocket performs the actual socket creation:

var (
	testHookDialChannel  = func() {} // for golang.org/issue/5349
	testHookCanceledDial = func() {} // for golang.org/issue/16523

	// Placeholders for socket system calls.
	socketFunc        func(int, int, int) (int, error)  = syscall.Socket
	connectFunc       func(int, syscall.Sockaddr) error = syscall.Connect
	listenFunc        func(int, int) error              = syscall.Listen
	getsockoptIntFunc func(int, int, int) (int, error)  = syscall.GetsockoptInt
)

func sysSocket(family, sotype, proto int) (int, error) {
    // socketFunc is the syscall.Socket system call
	s, err := socketFunc(family, sotype|syscall.SOCK_NONBLOCK|syscall.SOCK_CLOEXEC, proto) // create a non-blocking socket
	// On Linux the SOCK_NONBLOCK and SOCK_CLOEXEC flags were
	// introduced in 2.6.27 kernel and on FreeBSD both flags were
	// introduced in 10 kernel. If we get an EINVAL error on Linux
	// or EPROTONOSUPPORT error on FreeBSD, fall back to using
	// socket without them.
	switch err {
	case nil:
		return s, nil
	default:
		return -1, os.NewSyscallError("socket", err)
	case syscall.EPROTONOSUPPORT, syscall.EINVAL:
	}

	// See ../syscall/exec_unix.go for description of ForkLock.
	syscall.ForkLock.RLock()
	s, err = socketFunc(family, sotype, proto)
	if err == nil {
		syscall.CloseOnExec(s)
	}
	syscall.ForkLock.RUnlock()
	if err != nil {
		return -1, os.NewSyscallError("socket", err)
	}
	if err = syscall.SetNonblock(s, true); err != nil {
		poll.CloseFunc(s)
		return -1, os.NewSyscallError("setnonblock", err)
	}
	return s, nil
}

newFD is defined as follows:

// Network file descriptor.
type netFD struct {
	pfd poll.FD // of type internal/poll.FD

	// immutable until Close
	family      int
	sotype      int
	isConnected bool // handshake completed or use of association with peer
	net         string
	laddr       Addr
	raddr       Addr
}

// internal/poll
type FD struct {
	// Lock sysfd and serialize access to Read and Write methods.
	fdmu fdMutex

	// System file descriptor. Immutable until Close.
	Sysfd int

	// I/O poller.
	pd pollDesc // of type internal/poll.pollDesc

	// Writev cache.
	iovecs *[]syscall.Iovec

	// Semaphore signaled when file is closed.
	csema uint32

	// Non-zero if this file has been set to blocking mode.
	isBlocking uint32

	// Whether this is a streaming descriptor, as opposed to a
	// packet-based descriptor like a UDP socket. Immutable.
	IsStream bool

	// Whether a zero byte read indicates EOF. This is false for a
	// message based socket connection.
	ZeroReadIsEOF bool

	// Whether this is a file rather than a network socket.
	isFile bool
}

// internal/poll
type pollDesc struct {
	runtimeCtx uintptr // points at a runtime.pollDesc
}

func newFD(sysfd, family, sotype int, net string) (*netFD, error) {
	ret := &netFD{
		pfd: poll.FD{
			Sysfd:         sysfd,
			IsStream:      sotype == syscall.SOCK_STREAM,
			ZeroReadIsEOF: sotype != syscall.SOCK_DGRAM && sotype != syscall.SOCK_RAW,
		},
		family: family,
		sotype: sotype,
		net:    net,
	}
	return ret, nil
}

netFD.listenStream is defined as follows:

func (fd *netFD) listenStream(laddr sockaddr, backlog int, ctrlFn func(string, string, syscall.RawConn) error) error {
	var err error
	if err = setDefaultListenerSockopts(fd.pfd.Sysfd); err != nil {
		return err
	}
	var lsa syscall.Sockaddr
	if lsa, err = laddr.sockaddr(fd.family); err != nil {
		return err
	}
	if ctrlFn != nil {
		c, err := newRawConn(fd)
		if err != nil {
			return err
		}
		if err := ctrlFn(fd.ctrlNetwork(), laddr.String(), c); err != nil {
			return err
		}
	}
	if err = syscall.Bind(fd.pfd.Sysfd, lsa); err != nil { // bind
		return os.NewSyscallError("bind", err)
	}
    // listenFunc  func(int, int) error  = syscall.Listen
    // listenFunc is syscall.Listen
	if err = listenFunc(fd.pfd.Sysfd, backlog); err != nil { // listen
		return os.NewSyscallError("listen", err)
	}
	if err = fd.init(); err != nil { // initialize the fd (registers it with epoll)
		return err
	}
	lsa, _ = syscall.Getsockname(fd.pfd.Sysfd)
	fd.setAddr(fd.addrFunc()(lsa), nil)
	return nil
}

func setDefaultListenerSockopts(s int) error {
	// Allow reuse of recently-used addresses.
	return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1))
}

Initialization of the netFD:

func (fd *netFD) init() error {
	return fd.pfd.Init(fd.net, true) // ultimately initializes the poll.FD
}

// initialization of internal/poll.FD
func (fd *FD) Init(net string, pollable bool) error {
	// We don't actually care about the various network types.
	if net == "file" {
		fd.isFile = true
	}
	if !pollable {
		fd.isBlocking = 1
		return nil
	}
	err := fd.pd.init(fd) // ultimately initializes poll.FD's pollDesc
	if err != nil {
		// If we could not initialize the runtime poller,
		// assume we are using blocking mode.
		fd.isBlocking = 1
	}
	return err
}

// initialization of internal/poll.pollDesc
func (pd *pollDesc) init(fd *FD) error {
	serverInit.Do(runtime_pollServerInit) // calls epoll_create once to create the epoll instance
	ctx, errno := runtime_pollOpen(uintptr(fd.Sysfd)) // adds the socket fd to epoll
	if errno != 0 {
		if ctx != 0 {
			runtime_pollUnblock(ctx)
			runtime_pollClose(ctx)
		}
		return errnoErr(syscall.Errno(errno))
	}
	pd.runtimeCtx = ctx
	return nil
}

The initialization of internal/poll.pollDesc does two things:

  1. create the epoll instance
  2. add the socket fd to the epoll instance

Creating the epoll instance

var serverInit sync.Once
func (pd *pollDesc) init(fd *FD) error {
	serverInit.Do(runtime_pollServerInit)
    ...
}

// runtime_pollServerInit is ultimately implemented by runtime.poll_runtime_pollServerInit
//go:linkname poll_runtime_pollServerInit internal/poll.runtime_pollServerInit
func poll_runtime_pollServerInit() {
	netpollGenericInit()
}

func netpollGenericInit() {
	if atomic.Load(&netpollInited) == 0 {
		lock(&netpollInitLock)
		if netpollInited == 0 {
			netpollinit()
			atomic.Store(&netpollInited, 1)
		}
		unlock(&netpollInitLock)
	}
}

var (
	epfd int32 = -1 // epoll descriptor

	netpollBreakRd, netpollBreakWr uintptr // for netpollBreak
)


func netpollinit() {
	epfd = epollcreate1(_EPOLL_CLOEXEC) // the epoll_create1 system call
	if epfd < 0 {
		epfd = epollcreate(1024)
		if epfd < 0 {
			println("runtime: epollcreate failed with", -epfd)
			throw("runtime: netpollinit failed")
		}
		closeonexec(epfd)
	}
	r, w, errno := nonblockingPipe()
	if errno != 0 {
		println("runtime: pipe failed with", -errno)
		throw("runtime: pipe failed")
	}
	ev := epollevent{
		events: _EPOLLIN,
	}
	*(**uintptr)(unsafe.Pointer(&ev.data)) = &netpollBreakRd
	errno = epollctl(epfd, _EPOLL_CTL_ADD, r, &ev)
	if errno != 0 {
		println("runtime: epollctl failed with", -errno)
		throw("runtime: epollctl failed")
	}
	netpollBreakRd = uintptr(r)
	netpollBreakWr = uintptr(w)
}

Adding the socket fd to epoll:

Adding the socket fd to epoll is done by runtime_pollOpen, whose final implementation is runtime.poll_runtime_pollOpen.

//go:linkname poll_runtime_pollOpen internal/poll.runtime_pollOpen
func poll_runtime_pollOpen(fd uintptr) (*pollDesc, int) {
	pd := pollcache.alloc()
	lock(&pd.lock)
	if pd.wg != 0 && pd.wg != pdReady {
		throw("runtime: blocked write on free polldesc")
	}
	if pd.rg != 0 && pd.rg != pdReady {
		throw("runtime: blocked read on free polldesc")
	}
	pd.fd = fd // socket fd
	pd.closing = false
	pd.everr = false
	pd.rseq++
	pd.rg = 0
	pd.rd = 0
	pd.wseq++
	pd.wg = 0
	pd.wd = 0
	unlock(&pd.lock)

	var errno int32
	errno = netpollopen(fd, pd)
	return pd, int(errno)
}

func netpollopen(fd uintptr, pd *pollDesc) int32 {
	var ev epollevent
	ev.events = _EPOLLIN | _EPOLLOUT | _EPOLLRDHUP | _EPOLLET
	*(**pollDesc)(unsafe.Pointer(&ev.data)) = pd
	return -epollctl(epfd, _EPOLL_CTL_ADD, int32(fd), &ev)
}

Finally, let's look at the implementation of listenerBacklog:

var listenerBacklogCache struct {
	sync.Once
	val int
}

// listenerBacklog is a caching wrapper around maxListenerBacklog.
func listenerBacklog() int {
	listenerBacklogCache.Do(func() { listenerBacklogCache.val = maxListenerBacklog() })
	return listenerBacklogCache.val
}

func maxListenerBacklog() int {
	fd, err := open("/proc/sys/net/core/somaxconn")
	if err != nil {
		return syscall.SOMAXCONN
	}
	defer fd.close()
	l, ok := fd.readLine()
	if !ok {
		return syscall.SOMAXCONN
	}
	f := getFields(l)
	n, _, ok := dtoi(f[0])
	if n == 0 || !ok {
		return syscall.SOMAXCONN
	}
	// Linux stores the backlog in a uint16.
	// Truncate number to avoid wrapping.
	// See issue 5030.
	if n > 1<<16-1 {
		n = 1<<16 - 1
	}
	return n
}

The accept operation

The accept operation is provided by TCPListener.

type TCPListener struct {
	fd *netFD
	lc ListenConfig
}
func (l *TCPListener) Accept() (Conn, error) {
	if !l.ok() {
		return nil, syscall.EINVAL
	}
	c, err := l.accept()
	if err != nil {
		return nil, &OpError{Op: "accept", Net: l.fd.net, Source: nil, Addr: l.fd.laddr, Err: err}
	}
	return c, nil
}

func (ln *TCPListener) ok() bool { return ln != nil && ln.fd != nil }

func (ln *TCPListener) accept() (*TCPConn, error) {
	fd, err := ln.fd.accept()
	if err != nil {
		return nil, err
	}
	tc := newTCPConn(fd)
	if ln.lc.KeepAlive >= 0 {
		setKeepAlive(fd, true)
		ka := ln.lc.KeepAlive
		if ln.lc.KeepAlive == 0 {
			ka = defaultTCPKeepAlive // keepalive defaults to 15 minutes
		}
		setKeepAlivePeriod(fd, ka)
	}
	return tc, nil
}

TCPListener.Accept ultimately calls netFD.accept:

func (fd *netFD) accept() (netfd *netFD, err error) {
	d, rsa, errcall, err := fd.pfd.Accept() // d is the fd of the newly accepted socket
	if err != nil {
		if errcall != "" {
			err = wrapSyscallError(errcall, err)
		}
		return nil, err
	}
	if netfd, err = newFD(d, fd.family, fd.sotype, fd.net); err != nil {
		poll.CloseFunc(d)
		return nil, err
	}
	if err = netfd.init(); err != nil { // netfd.init() was analyzed above:
    // it adds the accepted connection's fd to epoll
		netfd.Close()
		return nil, err
	}
	lsa, _ := syscall.Getsockname(netfd.pfd.Sysfd)
	netfd.setAddr(netfd.addrFunc()(lsa), netfd.addrFunc()(rsa))
	return netfd, nil
}

netFD.accept is in turn implemented by internal/poll.FD.Accept:

func (fd *FD) Accept() (int, syscall.Sockaddr, string, error) {
	if err := fd.readLock(); err != nil {
		return -1, nil, "", err
	}
	defer fd.readUnlock()

	if err := fd.pd.prepareRead(fd.isFile); err != nil {
		return -1, nil, "", err
	}
	for {
		// fd.Sysfd is non-blocking, so accept is non-blocking too; the code has to keep handling EAGAIN
		s, rsa, errcall, err := accept(fd.Sysfd) // call the accept system call; if a connection is ready it returns at once,
        // otherwise check the error: EAGAIN just means nothing is ready yet, so wait via internal/poll.FD.waitRead
		if err == nil {
			return s, rsa, "", err
		}
		switch err {
		case syscall.EAGAIN:
			if fd.pd.pollable() {
				if err = fd.pd.waitRead(fd.isFile); err == nil { // block in internal/poll.FD.waitRead
					continue
				}
			}
		case syscall.ECONNABORTED:
			// This means that a socket on the listen
			// queue was closed before we Accept()ed it;
			// it's a silly error, so try again.
			continue
		}
		return -1, nil, errcall, err
	}
}

The implementation of internal/poll.FD.waitRead:

func (pd *pollDesc) waitRead(isFile bool) error {
	return pd.wait('r', isFile)
}

func (pd *pollDesc) wait(mode int, isFile bool) error {
	if pd.runtimeCtx == 0 {
		return errors.New("waiting for unsupported file type")
	}
	res := runtime_pollWait(pd.runtimeCtx, mode)
	return convertErr(res, isFile)
}

runtime_pollWait is ultimately implemented by runtime.poll_runtime_pollWait.

//go:linkname poll_runtime_pollWait internal/poll.runtime_pollWait
func poll_runtime_pollWait(pd *pollDesc, mode int) int {
	err := netpollcheckerr(pd, int32(mode))
	if err != 0 {
		return err
	}
	// As for now only Solaris, illumos, and AIX use level-triggered IO.
	if GOOS == "solaris" || GOOS == "illumos" || GOOS == "aix" {
		netpollarm(pd, mode)
	}
	for !netpollblock(pd, int32(mode), false) { // netpollblock is the core
		err = netpollcheckerr(pd, int32(mode))
		if err != 0 {
			return err
		}
		// Can happen if timeout has fired and unblocked us,
		// but before we had a chance to run, timeout has been reset.
		// Pretend it has not happened and retry.
	}
	return 0
}

Now let's look at the implementation of netpollblock:

// returns true if the socket became readable/writable, false otherwise
func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {
	gpp := &pd.rg
	if mode == 'w' {
		gpp = &pd.wg
	}

	// set the gpp semaphore to WAIT
	for {
		old := *gpp
		if old == pdReady {
			*gpp = 0
			return true
		}
		if old != 0 {
			throw("runtime: double wait")
		}
		if atomic.Casuintptr(gpp, 0, pdWait) {
			break
		}
	}

	// need to recheck error states after setting gpp to WAIT
	// this is necessary because runtime_pollUnblock/runtime_pollSetDeadline/deadlineimpl
	// do the opposite: store to closing/rd/wd, membarrier, load of rg/wg
	if waitio || netpollcheckerr(pd, mode) == 0 {
		gopark(netpollblockcommit, unsafe.Pointer(gpp), waitReasonIOWait, traceEvGoBlockNet, 5) 
	}
	// be careful to not lose concurrent READY notification
	old := atomic.Xchguintptr(gpp, 0)
	if old > pdWait {
		throw("runtime: corrupted polldesc")
	}
	return old == pdReady
}

As the above shows, blocking in Go's Accept is not like blocking in a Linux accept call: Go blocks by parking the G, whereas in C the thread blocks inside the accept system call itself.

The Read operation

Read is provided by TCPConn:

type TCPConn struct {
	conn
}

type conn struct {
	fd *netFD
}
func newTCPConn(fd *netFD) *TCPConn {
	c := &TCPConn{conn{fd}}
	setNoDelay(c.fd, true)
	return c
}

func (c *conn) ok() bool { return c != nil && c.fd != nil }

// Read implements the Conn Read method.
func (c *conn) Read(b []byte) (int, error) {
	if !c.ok() {
		return 0, syscall.EINVAL
	}
	n, err := c.fd.Read(b)
	if err != nil && err != io.EOF {
		err = &OpError{Op: "read", Net: c.fd.net, Source: c.fd.laddr, Addr: c.fd.raddr, Err: err}
	}
	return n, err
}

TCPConn's Read ultimately calls netFD.Read:

func (fd *netFD) Read(p []byte) (n int, err error) {
	n, err = fd.pfd.Read(p)
	runtime.KeepAlive(fd)
	return n, wrapSyscallError("read", err)
}

And netFD.Read is implemented by internal/poll.FD:

func (fd *FD) Read(p []byte) (int, error) {
	if err := fd.readLock(); err != nil {
		return 0, err
	}
	defer fd.readUnlock()
	if len(p) == 0 {
		// If the caller wanted a zero byte read, return immediately
		// without trying (but after acquiring the readLock).
		// Otherwise syscall.Read returns 0, nil which looks like
		// io.EOF.
		// TODO(bradfitz): make it wait for readability? (Issue 15735)
		return 0, nil
	}
	if err := fd.pd.prepareRead(fd.isFile); err != nil {
		return 0, err
	}
	if fd.IsStream && len(p) > maxRW {
		p = p[:maxRW]
	}
	for {
		n, err := syscall.Read(fd.Sysfd, p) // call the read system call; if data is available it returns right away; since fd.Sysfd is non-blocking, syscall.Read is non-blocking too
		if err != nil {
			n = 0
			if err == syscall.EAGAIN && fd.pd.pollable() { // nothing to read yet, so wait until the fd is readable
				if err = fd.pd.waitRead(fd.isFile); err == nil {
					continue
				}
			}

			// On MacOS we can see EINTR here if the user
			// pressed ^Z.  See issue #22838.
			if runtime.GOOS == "darwin" && err == syscall.EINTR {
				continue
			}
		}
		err = fd.eofError(n, err)
		return n, err
	}
}

Read works like Accept: when no data is available it ends up in fd.pd.waitRead, and waitRead eventually calls gopark to park the goroutine. The blocking the user perceives in Read is implemented by parking the G rather than by blocking inside a system call, so from the operating system's point of view the I/O is non-blocking.

The Write operation

Analogous to Read; omitted.

keepalive

keepalive is a liveness-probing mechanism, usually used by a server to detect whether a client is still alive; when the client goes away abnormally, the server proactively tears down the long-lived connection it was maintaining, saving resources. By default it is governed by these kernel parameters:

# sudo sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200

  • tcp_keepalive_time

    How long a connection may stay idle before a keepalive probe is sent; the default is 2 hours. Depending on how the peer responds, one of three things happens:

    • The peer replies with an ACK

      The peer is alive, so nothing is done; the next keepalive probe is sent after another tcp_keepalive_time of idleness.

    • The peer replies with an RST

      The peer has restarted or gone offline, and the server closes the connection.

    • The peer does not reply at all

      The probe is retried tcp_keepalive_probes times, tcp_keepalive_intvl seconds apart; if the peer is still unreachable, the application receives an ETIMEDOUT or EHOSTUNREACH error.

  • tcp_keepalive_intvl

    The interval between keepalive probe retries.

  • tcp_keepalive_probes

    The maximum number of keepalive probe retries.

Keepalive probe packets generally take one of two forms:

  1. An empty packet.

  2. A one-byte packet whose payload is a NUL character, i.e. the character 0x00 in the ASCII table.

    There is a trick here. Suppose the last ACK the client sent acknowledged sequence number 101; the keepalive packet the server sends then carries sequence number 100, i.e. the client's ACK number minus 1. When the client receives it, it sees that the byte at sequence 100 has already been received, so it discards the stray NUL without disturbing the normal data stream, and then responds with an ACK.

Note that these kernel parameters do not by themselves decide whether an application uses keepalive; the application has to enable it. Keepalive can be configured per fd with the following system calls (a complete sketch follows):

setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val))   // enable keepalive
setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &val, sizeof(val))  // set tcp_keepalive_time
setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &val, sizeof(val)) // set tcp_keepalive_intvl
setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &val, sizeof(val))   // set tcp_keepalive_probes
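
Wrapped into a helper, under the assumption that fd is an already connected TCP socket (enable_keepalive and its parameter names are made up for this sketch):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* enable TCP keepalive on fd and override the kernel defaults for this socket */
int enable_keepalive(int fd, int idle_sec, int intvl_sec, int cnt) {
  int on = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_sec, sizeof(idle_sec)) < 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_sec, sizeof(intvl_sec)) < 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
    return -1;
  return 0;
}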

keepalive in Go

In Go, keepalive is enabled by default whenever a new connection is accepted:

func (ln *TCPListener) accept() (*TCPConn, error) {
	fd, err := ln.fd.accept()
	if err != nil {
		return nil, err
	}
	tc := newTCPConn(fd)
	if ln.lc.KeepAlive >= 0 {
		setKeepAlive(fd, true) // enable keepalive
		ka := ln.lc.KeepAlive
		if ln.lc.KeepAlive == 0 {
			ka = defaultTCPKeepAlive // tcp_keepalive_time defaults to 15 minutes
		}
		setKeepAlivePeriod(fd, ka) // set tcp_keepalive_time
	}
	return tc, nil
}

setKeepAlive and setKeepAlivePeriod operate on the netFD:

func setKeepAlive(fd *netFD, keepalive bool) error {
	err := fd.pfd.SetsockoptInt(syscall.SOL_SOCKET, syscall.SO_KEEPALIVE, boolint(keepalive))
	runtime.KeepAlive(fd)
	return wrapSyscallError("setsockopt", err)
}

// setKeepAlivePeriod does not set tcp_keepalive_probes,
// but it sets tcp_keepalive_intvl and tcp_keepalive_time to the same value, which is effectively tcp_keepalive_probes = net.ipv4.tcp_keepalive_probes + 1.
func setKeepAlivePeriod(fd *netFD, d time.Duration) error {
	// The kernel expects seconds so round to next highest second.
	secs := int(roundDurationUp(d, time.Second))
	if err := fd.pfd.SetsockoptInt(syscall.IPPROTO_TCP, syscall.TCP_KEEPINTVL, secs); err != nil { // set tcp_keepalive_intvl
		return wrapSyscallError("setsockopt", err)
	}
	err := fd.pfd.SetsockoptInt(syscall.IPPROTO_TCP, syscall.TCP_KEEPIDLE, secs) // set tcp_keepalive_time
	runtime.KeepAlive(fd)
	return wrapSyscallError("setsockopt", err)
}

func (fd *FD) SetsockoptInt(level, name, arg int) error {
	if err := fd.incref(); err != nil {
		return err
	}
	defer fd.decref()
	return syscall.SetsockoptInt(fd.Sysfd, level, name, arg)
}

SetsockoptInt in the syscall package is implemented as follows:

func SetsockoptInt(fd, level, opt int, value int) (err error) {
	var n = int32(value)
	return setsockopt(fd, level, opt, unsafe.Pointer(&n), 4)
}

func setsockopt(s int, level int, name int, val unsafe.Pointer, vallen uintptr) (err error) {
	_, _, e1 := Syscall6(SYS_SETSOCKOPT, uintptr(s), uintptr(level), uintptr(name), uintptr(val), uintptr(vallen), 0)
	if e1 != 0 {
		err = errnoErr(e1)
	}
	return
}

Syscall6 performs the system call in assembly; for the calling convention, see the material on using the dedicated syscall instruction:

// func Syscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr)
TEXT ·Syscall6(SB),NOSPLIT,$0-80
	CALL	runtime·entersyscall(SB)
	MOVQ	a1+8(FP), DI
	MOVQ	a2+16(FP), SI
	MOVQ	a3+24(FP), DX
	MOVQ	a4+32(FP), R10
	MOVQ	a5+40(FP), R8
	MOVQ	a6+48(FP), R9
	MOVQ	trap+0(FP), AX	// syscall entry
	SYSCALL
	CMPQ	AX, $0xfffffffffffff001
	JLS	ok6
	MOVQ	$-1, r1+56(FP)
	MOVQ	$0, r2+64(FP)
	NEGQ	AX
	MOVQ	AX, err+72(FP)
	CALL	runtime·exitsyscall(SB)
	RET
ok6:
	MOVQ	AX, r1+56(FP)
	MOVQ	DX, r2+64(FP)
	MOVQ	$0, err+72(FP)
	CALL	runtime·exitsyscall(SB)
	RET

References