
epoll

epoll is one of Linux's I/O multiplexing mechanisms. I/O multiplexing means that a single process can monitor many file descriptors through one mechanism; as soon as any descriptor becomes ready (usually readable or writable), the program is notified so it can perform the corresponding read or write. epoll is not the only such mechanism on Linux: select and poll are the others.

The three functions of the epoll interface

Creating an epoll handle

int epoll_create(int size);

epoll_create creates an epoll handle. The size argument tells the kernel roughly how many descriptors will be monitored (since Linux 2.6.8 it is ignored, though it must still be greater than zero). Creating an epoll handle consumes a file descriptor, visible under /proc/<pid>/fd/. When you are done with the epoll instance you must call close() on it, otherwise file descriptors may eventually be exhausted.
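
A minimal sketch of creating and releasing an epoll instance:

#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void) {
  int epfd = epoll_create(1); /* size is only a hint and is ignored since 2.6.8, but must be > 0 */
  if (epfd < 0) {
    perror("epoll_create");
    return 1;
  }
  /* ... epoll_ctl / epoll_wait ... */
  close(epfd); /* release the fd, otherwise it leaks */
  return 0;
}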

Registering epoll events

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

epoll_ctl adds, modifies, or removes events of interest on an epoll instance. It returns 0 on success and -1 on failure, in which case errno indicates the kind of error.

The first argument, epfd, is the epoll handle returned by epoll_create.

The second argument, op, is the operation, expressed by one of three macros:

  • EPOLL_CTL_ADD: register a new fd with epfd;
  • EPOLL_CTL_MOD: modify the events being monitored for an already registered fd;
  • EPOLL_CTL_DEL: remove an fd from epfd;

The third argument is the fd to monitor.

The fourth argument tells the kernel which events to monitor; epoll_event is defined as follows:

typedef union epoll_data {
    void *ptr;
    int fd;
    __uint32_t u32;
    __uint64_t u64;
} epoll_data_t;

struct epoll_event {
    __uint32_t events; /* Epoll events */
    epoll_data_t data; /* User data variable */
};

epoll_event.events is a bit mask built from the following macros:

  • EPOLLIN: the fd is readable (including an orderly close by the peer socket)
  • EPOLLOUT: the fd is writable
  • EPOLLPRI: the fd has urgent data to read (i.e. out-of-band data has arrived)
  • EPOLLERR: an error occurred on the fd
  • EPOLLHUP: the fd was hung up
  • EPOLLET: monitor the fd in edge-triggered mode, as opposed to the default level-triggered mode
  • EPOLLONESHOT: report the event only once; if you still want to monitor the socket after the event has fired, you must add it to the epoll set again

epoll_event.data.ptr is a void * carrying a user-defined value; when epoll_wait returns, epoll_event.data.ptr comes back unchanged. Go's netpoll is built on exactly this: it points epoll_event.data.ptr at the pollDesc associated with the fd (which records the waiting G), so when epoll_wait returns, the runtime can follow the pointer, find the parked G, and wake it to read or write the fd.
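
For illustration, a server can hang its own per-connection state off data.ptr when registering a connection and get the same pointer back later; the conn_t type and watch_conn helper below are made-up names, not part of the epoll API:

#include <sys/epoll.h>

typedef struct {   /* hypothetical per-connection state */
  int fd;
  void *user_data;
} conn_t;

/* register a connection and attach our state to the event */
int watch_conn(int epfd, conn_t *c) {
  struct epoll_event ev;
  ev.events = EPOLLIN;
  ev.data.ptr = c;   /* returned untouched by epoll_wait */
  return epoll_ctl(epfd, EPOLL_CTL_ADD, c->fd, &ev);
}

When an event fires, events[i].data.ptr is simply cast back to conn_t * to recover the connection, which is essentially what the Go runtime does to recover the pollDesc.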

Waiting for epoll events

int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);

epoll_wait waits for events to occur. It takes four arguments (a small wait loop is sketched after the list):

  • epfd is the epoll descriptor
  • events is a pre-allocated array of epoll_event structures; epoll copies the events that occurred into this array
  • maxevents is the maximum number of events that may be returned by this call; it must not exceed the size of the pre-allocated events array
  • timeout is the maximum time to wait, in milliseconds, when no event is ready: 0 makes epoll_wait return immediately even if the ready list (rdllist) is empty, without waiting, and -1 means wait indefinitely until an event occurs.
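
A minimal wait loop showing the timeout semantics (1000 ms here; -1 would block indefinitely and 0 would just poll), assuming fds were already registered on epfd with epoll_ctl:

#include <stdio.h>
#include <sys/epoll.h>

#define MAX_EVENTS 64

void event_loop(int epfd) {
  struct epoll_event events[MAX_EVENTS];
  for (;;) {
    int n = epoll_wait(epfd, events, MAX_EVENTS, 1000); /* wait up to 1000 ms */
    if (n < 0) {   /* error, e.g. interrupted by a signal (EINTR) */
      perror("epoll_wait");
      continue;
    }
    if (n == 0) {  /* timed out, no fd became ready */
      continue;
    }
    for (int i = 0; i < n; i++) {
      /* events[i].events says what happened; events[i].data says to whom */
    }
  }
}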

Two trigger modes

epoll offers two trigger modes for file descriptors:

  • LT (Level Triggered)

    LT is epoll's default mode. When epoll_wait detects an event it notifies the application, but the application does not have to handle it immediately: the next epoll_wait will report the event again, and keeps doing so until it is handled. LT works with both blocking and non-blocking sockets.

  • ET (Edge Triggered)

    In ET mode, as soon as epoll_wait detects an event and notifies the application, the application must handle it right away; subsequent epoll_wait calls will not report that event again. ET therefore greatly reduces how many times the same event is delivered, making it more efficient than LT. ET should only be used with non-blocking sockets.

ET is a notification about a state change: you are told when the fd goes from having no data to having data. LT is a notification about the state itself: as long as there is data you keep being notified, and you are not notified when there is none. In ET mode, after receiving a notification you should call read in a loop until it returns EWOULDBLOCK or EAGAIN, so that the internal state goes back from "has data" to "no data" and the next arrival can trigger a new notification; otherwise you will only be woken again when the peer sends more data. A typical drain loop is sketched below.
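
A minimal ET-mode drain loop, assuming sockfd has already been made non-blocking:

#include <errno.h>
#include <unistd.h>

/* read everything currently available on a non-blocking fd (ET mode) */
void drain_read(int sockfd) {
  char buf[4096];
  for (;;) {
    ssize_t n = read(sockfd, buf, sizeof(buf));
    if (n > 0) {
      /* consume the n bytes in buf here */
      continue;
    }
    if (n == 0) {
      /* peer closed the connection */
      break;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
      /* kernel buffer drained: the edge is re-armed for the next arrival */
      break;
    }
    break; /* some other error */
  }
}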

With ET you should also guard against malicious connections: a peer that keeps sending data could monopolize the drain loop and starve other requests.

Handling EPOLLOUT in ET mode

Consider a client requesting a large file from the server. When the client's EPOLLIN event fires we want to write, say, a 1 GB file back to it, but a single write can push at most a send-buffer's worth of data; calling write again immediately returns EAGAIN (the send buffer is full and cannot take more). To avoid wasting CPU by busy-polling write, we can use the EPOLLOUT event. The logic is as follows (a sketch follows the list):

  1. When EPOLLIN fires, call write to send data; as long as the return value is greater than 0 and data remains, keep sending.
  2. If write returns a value less than 0 and errno is EAGAIN, the send buffer is full. Save the remaining data, register EPOLLOUT, and wait until epoll_wait reports EPOLLOUT, which means the send buffer is writable again; then send the saved data. Once write succeeds and everything has been sent, deregister EPOLLOUT.
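
A sketch of that logic; buf, len and sent stand in for whatever pending-output state the application keeps and are not part of the epoll API:

#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* try to flush pending output; register/deregister EPOLLOUT as needed */
void flush_output(int epfd, int fd, const char *buf, size_t len, size_t *sent) {
  struct epoll_event ev;
  ev.data.fd = fd;
  while (*sent < len) {
    ssize_t n = write(fd, buf + *sent, len - *sent);
    if (n > 0) {
      *sent += n;
      continue;
    }
    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
      /* send buffer full: ask to be notified when it becomes writable again */
      ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
      epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
      return;
    }
    return; /* real error: the caller should close the connection */
  }
  /* everything sent: stop watching EPOLLOUT */
  ev.events = EPOLLIN | EPOLLET;
  epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}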

Pros and cons of epoll

  • One process can watch a large number of socket descriptors (fds)

    epoll does not have the limit of the select model; the number of fds it supports is bounded only by the maximum number of open files, which is generally far larger than select's limit of 2048. The system-wide limit on open files can be read from /proc/sys/fs/file-max.

  • I/O efficiency does not fall off linearly as the number of fds grows

    Another fatal weakness of traditional select/poll is that when you hold a very large socket set, network latency means only part of the set is "active" at any moment, yet every select/poll call scans the entire set linearly, so efficiency degrades linearly. epoll does not have this problem: it only operates on "active" sockets, because the kernel implementation of epoll hangs a callback on each fd, so only sockets that become active invoke the callback while idle sockets never do.

  • Less copying between kernel and user space

    Whichever of select, poll, or epoll you use, the kernel must tell user space which fds are ready, so avoiding unnecessary memory copies matters. epoll is often described as sharing a region of memory between kernel and user space via mmap, but the real saving lies elsewhere: select/poll must pass the entire set of monitored fds to the kernel on every call (the fd list is copied from user space into the kernel each time, which is inefficient when there are many fds), whereas epoll_wait (which plays the role of a select/poll call) does not need to pass an fd list at all, because the fds to monitor were already handed to the kernel through epoll_ctl, which works incrementally and never re-copies the whole set. After epoll_create the kernel maintains a data structure in kernel space holding the monitored fds, and each epoll_ctl call merely performs a small update on it.

Example epoll server in C

/**
	echo server
*/
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_EVENTS 10
#define MAX_LINE 30
#define LISTEN_BACKLOG 128

int set_non_blocking(int fd) {
  int oldopt;
  if ((oldopt = fcntl(fd, F_GETFL)) < 0) {
    return -1;
  }

  int newopt = oldopt | O_NONBLOCK;
  if (fcntl(fd, F_SETFL, newopt) < 0) {
    return -1;
  }
  return oldopt;
}

int set_reuse_addr(int sockfd, int reuse) {
  return setsockopt(sockfd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(int));
}

void usage(char *progname) { fprintf(stderr, "Usage: %s <port>\n", progname); }

int main(int argc, char *argv[]) {
  int listenfd;
  int epfd;
  int port;

  if (argc != 2) {
    usage(argv[0]);
    exit(1);
  }

  if ((port = atoi(argv[1])) <= 0) {
    fprintf(stderr, "invalid port: %s\n", argv[1]);
    exit(1);
  }

  // socket()
  if ((listenfd = socket(AF_INET, SOCK_STREAM, 0)) < 0) {
    perror("socket");
    return 1;
  }

  int reuse = 1;
  if (set_reuse_addr(listenfd, reuse) < 0) {
    perror("set_reuse_addr");
    return 1;
  }
  struct sockaddr_in serveraddr;
  memset(&serveraddr, 0, sizeof(struct sockaddr_in));
  serveraddr.sin_family = AF_INET;
  if (inet_aton("127.0.0.1", &(serveraddr.sin_addr)) == 0) {
    perror("inet_aton");
    return 1;
  }
  serveraddr.sin_port = htons(port);
  // bind()
  if (bind(listenfd, (struct sockaddr *)&serveraddr,
           sizeof(struct sockaddr_in)) < 0) {
    perror("bind");
    return 1;
  }
  // listen()
  if (listen(listenfd, LISTEN_BACKLOG) < 0) {
    perror("listen");
    return 1;
  }
  // epoll_create1
  if ((epfd = epoll_create1(0)) < 0) {
    perror("epoll_create1");
    return 1;
  }

  struct epoll_event event, *events;
  int connfd, sockfd, nfds;
  event.data.fd = listenfd;
  event.events = EPOLLIN;  // level-triggered (the default)
  // register the epoll event
  if (epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &event) < 0) {
    perror("epoll_ctl");
    return 1;
  }

  events = malloc(MAX_EVENTS * sizeof(struct epoll_event));
  char line[MAX_LINE + 1];
  struct sockaddr_in clientaddr;
  socklen_t clientlen;
  for (;;) {
    // wait for epoll events
    nfds = epoll_wait(epfd, events, MAX_EVENTS, -1);
    if (nfds < 0) {
      perror("epoll_wait");
      return 1;
    }

    // handle all returned events
    for (int i = 0; i < nfds; i++) {
      if (events[i].data.fd == listenfd) {  // a new client connection is pending
        clientlen = sizeof(clientaddr);
        connfd = accept(listenfd, (struct sockaddr *)&clientaddr, &clientlen);
        if (connfd < 0) {
          perror("accept");
          continue;
        }

        printf("accept a connection from %s:%d, fd=%d\n",
               inet_ntoa(clientaddr.sin_addr), ntohs(clientaddr.sin_port),
               connfd);

        // register an event for the new connection
        event.data.fd = connfd;
        event.events = EPOLLIN;  // still level-triggered; OR in EPOLLET for edge-triggered
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, connfd, &event) < 0) {
          perror("epoll_ctl");
        }
        continue;
      }

      if (events[i].events & EPOLLIN) {  // data from a client is readable
        if ((sockfd = events[i].data.fd) < 0) continue;

        ssize_t nread;
        nread = read(sockfd, line, MAX_LINE);
        if (nread < 0) {
          if (errno == ECONNRESET) {  // the client dropped abnormally
            printf("client fd=%d lost connection\n", sockfd);
            if (epoll_ctl(epfd, EPOLL_CTL_DEL, sockfd, NULL) < 0) {  // stop monitoring this fd
              perror("epoll_ctl");
            }
            close(sockfd);
            events[i].data.fd = -1;
          } else {
            perror("read");
            close(sockfd);
            if (epoll_ctl(epfd, EPOLL_CTL_DEL, sockfd, NULL) < 0) {
              perror("epoll_ctl");
            }
            events[i].data.fd = -1;
          }
        } else if (nread == 0) {  // the client closed the connection normally
          printf("client fd=%d normal exit\n", sockfd);
          close(sockfd);
          events[i].data.fd = -1;
        } else {
          line[nread] = '\0';
          printf("received: %s\n", line);

          int nwrite = 0;
          int n;
          while (nwrite < nread) {  // make sure the whole line is written back
            if ((n = write(sockfd, line + nwrite, nread - nwrite)) < 0) {
              fprintf(stderr, "write fd=%d error: %s\n", sockfd,
                      strerror(errno));
              break;
            }
            nwrite += n;
          }
        }
      }
    }
  }
}

Go and epoll

Core logic

Whenever Accept receives a new connection, the socket's fd is registered with epoll, and a pointer to its pollDesc is stored in the data field of the epoll event.

func netpollopen(fd uintptr, pd *pollDesc) int32 {
	var ev epollevent
	ev.events = _EPOLLIN | _EPOLLOUT | _EPOLLRDHUP | _EPOLLET
	*(**pollDesc)(unsafe.Pointer(&ev.data)) = pd // point the epoll event's data field at the pollDesc
	return -epollctl(epfd, _EPOLL_CTL_ADD, int32(fd), &ev)
}

When a goroutine would block reading or writing an fd, the fd's pollDesc records the goroutine in its rg or wg field and the current G is parked:

func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {
    gpp := &pd.rg
	if mode == 'w' {
		gpp = &pd.wg
	}
	...
	if waitio || netpollcheckerr(pd, mode) == 0 {
		gopark(netpollblockcommit, unsafe.Pointer(gpp), waitReasonIOWait, traceEvGoBlockNet, 5)
	}
	...
	return old == pdReady
}

func gopark(unlockf func(*g, unsafe.Pointer) bool, lock unsafe.Pointer, reason waitReason, traceEv byte, traceskip int) {
	if reason != waitReasonSleep {
		checkTimeouts() // timeouts may expire while two goroutines keep the scheduler busy
	}
	mp := acquirem()
	gp := mp.curg
	status := readgstatus(gp)
	if status != _Grunning && status != _Gscanrunning {
		throw("gopark: bad g status")
	}
	mp.waitlock = lock
	mp.waitunlockf = unlockf
	gp.waitreason = reason
	mp.waittraceev = traceEv
	mp.waittraceskip = traceskip
	releasem(mp)
	// can't do anything that might move the G between Ms here.
	mcall(park_m)
}

func park_m(gp *g) {
	_g_ := getg()
	casgstatus(gp, _Grunning, _Gwaiting)
	dropg()

	if fn := _g_.m.waitunlockf; fn != nil {
		ok := fn(gp, _g_.m.waitlock)
		_g_.m.waitunlockf = nil
		_g_.m.waitlock = nil
		if !ok {
			casgstatus(gp, _Gwaiting, _Grunnable)
			execute(gp, true) // Schedule it back, never returns.
		}
	}
	schedule()
}

func dropg() {
	_g_ := getg()

	setMNoWB(&_g_.m.curg.m, nil)
	setGNoWB(&_g_.m.curg, nil)
}

func netpollblockcommit(gp *g, gpp unsafe.Pointer) bool {
	r := atomic.Casuintptr((*uintptr)(gpp), pdWait, uintptr(unsafe.Pointer(gp)))
	if r {
		atomic.Xadd(&netpollWaiters, 1)
	}
	return r
}

netpollblock calls gopark to park the goroutine. gopark calls park_m; park_m first calls dropg() to detach the current goroutine from its M and then runs the netpollblockcommit callback, which stores the goroutine into the pollDesc's rg/wg field. Since the epoll event's data field points at that pollDesc, reading the data field of the events returned by epoll_wait leads straight back to the blocked goroutine, which can then be woken.

The scheduler's findrunnable function and the system monitor sysmon both call netpoll to collect sockets that have become ready, together with the goroutines blocked on them.

// Finds a runnable goroutine to execute.
// Tries to steal from other P's, get g from local or global queue, poll network.
func findrunnable() (gp *g, inheritTime bool) {
	_g_ := getg()
	...
	if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Load64(&sched.lastpoll) != 0 {
		// use netpoll to collect sockets that are already ready
		if list := netpoll(0); !list.empty() { // non-blocking
			gp := list.pop()
			injectglist(&list)
			casgstatus(gp, _Gwaiting, _Grunnable)
			if trace.enabled {
				traceGoUnpark(gp, 0)
			}
			return gp, false
		}
	}
	...
}


func sysmon() {
	...
	for {
		...
		// poll network if not polled for more than 10ms
        // i.e. run netpoll every 10ms
		lastpoll := int64(atomic.Load64(&sched.lastpoll))
		if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
			atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))
			list := netpoll(0) // non-blocking - returns list of goroutines
			if !list.empty() {
				incidlelocked(-1)
				injectglist(&list)
				incidlelocked(1)
			}
		}
		...
	}
}

The implementation of netpoll:

func netpoll(delay int64) gList {
	if epfd == -1 {
		return gList{}
	}
	var waitms int32
	if delay < 0 {
		waitms = -1
	} else if delay == 0 {
		waitms = 0
	} else if delay < 1e6 {
		waitms = 1
	} else if delay < 1e15 {
		waitms = int32(delay / 1e6)
	} else {
		// An arbitrary cap on how long to wait for a timer.
		// 1e9 ms == ~11.5 days.
		waitms = 1e9
	}
	var events [128]epollevent
retry:
	n := epollwait(epfd, &events[0], int32(len(events)), waitms) // the epoll_wait system call
	if n < 0 {
		if n != -_EINTR {
			println("runtime: epollwait on fd", epfd, "failed with", -n)
			throw("runtime: netpoll failed")
		}
		// If a timed sleep was interrupted, just return to
		// recalculate how long we should sleep now.
		if waitms > 0 {
			return gList{}
		}
		goto retry
	}
	var toRun gList
	for i := int32(0); i < n; i++ {
		ev := &events[i]
		if ev.events == 0 {
			continue
		}

		if *(**uintptr)(unsafe.Pointer(&ev.data)) == &netpollBreakRd {
			if ev.events != _EPOLLIN {
				println("runtime: netpoll: break fd ready for", ev.events)
				throw("runtime: netpoll: break fd ready for something unexpected")
			}
			if delay != 0 {
				// netpollBreak could be picked up by a
				// nonblocking poll. Only read the byte
				// if blocking.
				var tmp [16]byte
				read(int32(netpollBreakRd), noescape(unsafe.Pointer(&tmp[0])), int32(len(tmp)))
			}
			continue
		}

		var mode int32
		if ev.events&(_EPOLLIN|_EPOLLRDHUP|_EPOLLHUP|_EPOLLERR) != 0 {
			mode += 'r'
		}
		if ev.events&(_EPOLLOUT|_EPOLLHUP|_EPOLLERR) != 0 {
			mode += 'w'
		}
		if mode != 0 {
			pd := *(**pollDesc)(unsafe.Pointer(&ev.data))
			pd.everr = false
			if ev.events == _EPOLLERR {
				pd.everr = true
			}
			netpollready(&toRun, pd, mode) // make the blocked G(s) runnable
		}
	}
	return toRun
}

The full flow

Server code example

//TcpServer.go
package main

import (
	"fmt"
	"net"
)

func main() {
	ln, err := net.Listen("tcp", ":8080")
	if err != nil {
		panic(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			panic(err)
		}
		// one goroutine per client
		go handleConnection(conn)
	}
}

func handleConnection(conn net.Conn) {
	defer conn.Close()
	var body [4]byte
	addr := conn.RemoteAddr()
	for {
		// read the client's message
		_, err := conn.Read(body[:])
		if err != nil {
			break
		}
		fmt.Printf("收到%s消息: %s\n", addr, string(body[:]))
		// 回包
		_, err = conn.Write(body[:])
		if err != nil {
			break
		}
		fmt.Printf("发送给%s: %s\n", addr, string(body[:]))
	}
	fmt.Printf("与%s断开!\n", addr)
}

Socket creation, bind, listen, and epoll creation

net.Listen:

func Listen(network, address string) (Listener, error) {
	var lc ListenConfig
	return lc.Listen(context.Background(), network, address)
}

ListenConfig.Listen:

type ListenConfig struct {
	// If Control is not nil, it is called after creating the network
	// connection but before binding it to the operating system.
	//
	// Network and address parameters passed to Control method are not
	// necessarily the ones passed to Listen. For example, passing "tcp" to
	// Listen will cause the Control function to be called with "tcp4" or "tcp6".
	Control func(network, address string, c syscall.RawConn) error

	// KeepAlive specifies the keep-alive period for network
	// connections accepted by this listener.
	// If zero, keep-alives are enabled if supported by the protocol
	// and operating system. Network protocols or operating systems
	// that do not support keep-alives ignore this field.
	// If negative, keep-alives are disabled.
	KeepAlive time.Duration
}

func (lc *ListenConfig) Listen(ctx context.Context, network, address string) (Listener, error) {
	addrs, err := DefaultResolver.resolveAddrList(ctx, "listen", network, address, nil)
	if err != nil {
		return nil, &OpError{Op: "listen", Net: network, Source: nil, Addr: nil, Err: err}
	}
	sl := &sysListener{
		ListenConfig: *lc,
		network:      network,
		address:      address,
	}
	var l Listener
	la := addrs.first(isIPv4)
	switch la := la.(type) {
	case *TCPAddr:
		l, err = sl.listenTCP(ctx, la) // calls sysListener.listenTCP
	case *UnixAddr:
		l, err = sl.listenUnix(ctx, la)
	default:
		return nil, &OpError{Op: "listen", Net: sl.network, Source: nil, Addr: la, Err: &AddrError{Err: "unexpected address type", Addr: address}}
	}
	if err != nil {
		return nil, &OpError{Op: "listen", Net: sl.network, Source: nil, Addr: la, Err: err} // l is non-nil interface containing nil pointer
	}
	return l, nil
}

sysListener.listenTCP:

type sysListener struct {
	ListenConfig
	network, address string
}

func (sl *sysListener) listenTCP(ctx context.Context, laddr *TCPAddr) (*TCPListener, error) {
	fd, err := internetSocket(ctx, sl.network, laddr, nil, syscall.SOCK_STREAM, 0, "listen", sl.ListenConfig.Control) // raddr is nil here
    // internetSocket creates both server and client sockets: a server relies on laddr, a client on raddr
	if err != nil {
		return nil, err
	}
	return &TCPListener{fd: fd, lc: sl.ListenConfig}, nil
}

internetSocket creates the socket:

func internetSocket(ctx context.Context, net string, laddr, raddr sockaddr, sotype, proto int, mode string, ctrlFn func(string, string, syscall.RawConn) error) (fd *netFD, err error) {
	if (runtime.GOOS == "aix" || runtime.GOOS == "windows" || runtime.GOOS == "openbsd") && mode == "dial" && raddr.isWildcard() {
		raddr = raddr.toLocal(net)
	}
	family, ipv6only := favoriteAddrFamily(net, laddr, raddr, mode)
	return socket(ctx, net, family, sotype, proto, ipv6only, laddr, raddr, ctrlFn)
}

func socket(ctx context.Context, net string, family, sotype, proto int, ipv6only bool, laddr, raddr sockaddr, ctrlFn func(string, string, syscall.RawConn) error) (fd *netFD, err error) {
	s, err := sysSocket(family, sotype, proto) // issues the socket system call and returns the socket fd
	if err != nil {
		return nil, err
	}
	if err = setDefaultSockopts(s, family, sotype, ipv6only); err != nil {
		poll.CloseFunc(s)
		return nil, err
	}
    // newFD() returns a netFD
    // netFD is a higher-level wrapper around the raw socket fd
	if fd, err = newFD(s, family, sotype, net); err != nil {
		poll.CloseFunc(s)
		return nil, err
	}

	if laddr != nil && raddr == nil { // server-side socket
		switch sotype {
		case syscall.SOCK_STREAM, syscall.SOCK_SEQPACKET: // tcp
            // listenerBacklog() returns the backlog of the accept (fully-established connection) queue
			if err := fd.listenStream(laddr, listenerBacklog(), ctrlFn); err != nil {
				fd.Close()
				return nil, err
			}
			return fd, nil
		case syscall.SOCK_DGRAM: // udp
			if err := fd.listenDatagram(laddr, ctrlFn); err != nil {
				fd.Close()
				return nil, err
			}
			return fd, nil
		}
	}
	if err := fd.dial(ctx, laddr, raddr, ctrlFn); err != nil {
		fd.Close()
		return nil, err
	}
	return fd, nil
}

Creating the socket:

sysSocket performs the actual socket creation:

var (
	testHookDialChannel  = func() {} // for golang.org/issue/5349
	testHookCanceledDial = func() {} // for golang.org/issue/16523

	// Placeholders for socket system calls.
	socketFunc        func(int, int, int) (int, error)  = syscall.Socket
	connectFunc       func(int, syscall.Sockaddr) error = syscall.Connect
	listenFunc        func(int, int) error              = syscall.Listen
	getsockoptIntFunc func(int, int, int) (int, error)  = syscall.GetsockoptInt
)

func sysSocket(family, sotype, proto int) (int, error) {
    // socketFunc is the syscall.Socket system call
	s, err := socketFunc(family, sotype|syscall.SOCK_NONBLOCK|syscall.SOCK_CLOEXEC, proto) // create a non-blocking socket
	// On Linux the SOCK_NONBLOCK and SOCK_CLOEXEC flags were
	// introduced in 2.6.27 kernel and on FreeBSD both flags were
	// introduced in 10 kernel. If we get an EINVAL error on Linux
	// or EPROTONOSUPPORT error on FreeBSD, fall back to using
	// socket without them.
	switch err {
	case nil:
		return s, nil
	default:
		return -1, os.NewSyscallError("socket", err)
	case syscall.EPROTONOSUPPORT, syscall.EINVAL:
	}

	// See ../syscall/exec_unix.go for description of ForkLock.
	syscall.ForkLock.RLock()
	s, err = socketFunc(family, sotype, proto)
	if err == nil {
		syscall.CloseOnExec(s)
	}
	syscall.ForkLock.RUnlock()
	if err != nil {
		return -1, os.NewSyscallError("socket", err)
	}
	if err = syscall.SetNonblock(s, true); err != nil {
		poll.CloseFunc(s)
		return -1, os.NewSyscallError("setnonblock", err)
	}
	return s, nil
}

newFD is defined as follows:

// Network file descriptor.
type netFD struct {
	pfd poll.FD // of type internal/poll.FD

	// immutable until Close
	family      int
	sotype      int
	isConnected bool // handshake completed or use of association with peer
	net         string
	laddr       Addr
	raddr       Addr
}

// internal/poll
type FD struct {
	// Lock sysfd and serialize access to Read and Write methods.
	fdmu fdMutex

	// System file descriptor. Immutable until Close.
	Sysfd int

	// I/O poller.
	pd pollDesc // of type internal/poll.pollDesc

	// Writev cache.
	iovecs *[]syscall.Iovec

	// Semaphore signaled when file is closed.
	csema uint32

	// Non-zero if this file has been set to blocking mode.
	isBlocking uint32

	// Whether this is a streaming descriptor, as opposed to a
	// packet-based descriptor like a UDP socket. Immutable.
	IsStream bool

	// Whether a zero byte read indicates EOF. This is false for a
	// message based socket connection.
	ZeroReadIsEOF bool

	// Whether this is a file rather than a network socket.
	isFile bool
}

// internal/poll
type pollDesc struct {
	runtimeCtx uintptr // points at a runtime.pollDesc
}

func newFD(sysfd, family, sotype int, net string) (*netFD, error) {
	ret := &netFD{
		pfd: poll.FD{
			Sysfd:         sysfd,
			IsStream:      sotype == syscall.SOCK_STREAM,
			ZeroReadIsEOF: sotype != syscall.SOCK_DGRAM && sotype != syscall.SOCK_RAW,
		},
		family: family,
		sotype: sotype,
		net:    net,
	}
	return ret, nil
}

netFD.listenStream is defined as follows:

func (fd *netFD) listenStream(laddr sockaddr, backlog int, ctrlFn func(string, string, syscall.RawConn) error) error {
	var err error
	if err = setDefaultListenerSockopts(fd.pfd.Sysfd); err != nil {
		return err
	}
	var lsa syscall.Sockaddr
	if lsa, err = laddr.sockaddr(fd.family); err != nil {
		return err
	}
	if ctrlFn != nil {
		c, err := newRawConn(fd)
		if err != nil {
			return err
		}
		if err := ctrlFn(fd.ctrlNetwork(), laddr.String(), c); err != nil {
			return err
		}
	}
	if err = syscall.Bind(fd.pfd.Sysfd, lsa); err != nil { // bind
		return os.NewSyscallError("bind", err)
	}
    // listenFunc  func(int, int) error  = syscall.Listen
    // listenFunc is syscall.Listen
	if err = listenFunc(fd.pfd.Sysfd, backlog); err != nil { // listen
		return os.NewSyscallError("listen", err)
	}
	if err = fd.init(); err != nil { // initialize the fd (registers it with epoll)
		return err
	}
	lsa, _ = syscall.Getsockname(fd.pfd.Sysfd)
	fd.setAddr(fd.addrFunc()(lsa), nil)
	return nil
}

func setDefaultListenerSockopts(s int) error {
	// Allow reuse of recently-used addresses.
	return os.NewSyscallError("setsockopt", syscall.SetsockoptInt(s, syscall.SOL_SOCKET, syscall.SO_REUSEADDR, 1))
}

Initialization of the netFD:

func (fd *netFD) init() error {
	return fd.pfd.Init(fd.net, true) // ultimately initializes the poll.FD
}

// initialization of internal/poll.FD
func (fd *FD) Init(net string, pollable bool) error {
	// We don't actually care about the various network types.
	if net == "file" {
		fd.isFile = true
	}
	if !pollable {
		fd.isBlocking = 1
		return nil
	}
	err := fd.pd.init(fd) // ultimately initializes poll.FD's pollDesc
	if err != nil {
		// If we could not initialize the runtime poller,
		// assume we are using blocking mode.
		fd.isBlocking = 1
	}
	return err
}

// initialization of internal/poll.pollDesc
func (pd *pollDesc) init(fd *FD) error {
	serverInit.Do(runtime_pollServerInit) // calls epoll_create once to create the epoll instance
	ctx, errno := runtime_pollOpen(uintptr(fd.Sysfd)) // adds the socket fd to epoll
	if errno != 0 {
		if ctx != 0 {
			runtime_pollUnblock(ctx)
			runtime_pollClose(ctx)
		}
		return errnoErr(syscall.Errno(errno))
	}
	pd.runtimeCtx = ctx
	return nil
}

The initialization of internal/poll.pollDesc does two things:

  1. create the epoll instance
  2. add the socket fd to the epoll instance

Creating the epoll instance

var serverInit sync.Once
func (pd *pollDesc) init(fd *FD) error {
	serverInit.Do(runtime_pollServerInit)
    ...
}

// runtime_pollServerInit is ultimately implemented by runtime.poll_runtime_pollServerInit
//go:linkname poll_runtime_pollServerInit internal/poll.runtime_pollServerInit
func poll_runtime_pollServerInit() {
	netpollGenericInit()
}

func netpollGenericInit() {
	if atomic.Load(&netpollInited) == 0 {
		lock(&netpollInitLock)
		if netpollInited == 0 {
			netpollinit()
			atomic.Store(&netpollInited, 1)
		}
		unlock(&netpollInitLock)
	}
}

var (
	epfd int32 = -1 // epoll descriptor

	netpollBreakRd, netpollBreakWr uintptr // for netpollBreak
)


func netpollinit() {
	epfd = epollcreate1(_EPOLL_CLOEXEC) // the epoll_create1 system call
	if epfd < 0 {
		epfd = epollcreate(1024)
		if epfd < 0 {
			println("runtime: epollcreate failed with", -epfd)
			throw("runtime: netpollinit failed")
		}
		closeonexec(epfd)
	}
	r, w, errno := nonblockingPipe()
	if errno != 0 {
		println("runtime: pipe failed with", -errno)
		throw("runtime: pipe failed")
	}
	ev := epollevent{
		events: _EPOLLIN,
	}
	*(**uintptr)(unsafe.Pointer(&ev.data)) = &netpollBreakRd
	errno = epollctl(epfd, _EPOLL_CTL_ADD, r, &ev)
	if errno != 0 {
		println("runtime: epollctl failed with", -errno)
		throw("runtime: epollctl failed")
	}
	netpollBreakRd = uintptr(r)
	netpollBreakWr = uintptr(w)
}

Adding the socket fd to epoll:

Adding the socket fd to epoll is done by runtime_pollOpen, whose final implementation is runtime.poll_runtime_pollOpen.

//go:linkname poll_runtime_pollOpen internal/poll.runtime_pollOpen
func poll_runtime_pollOpen(fd uintptr) (*pollDesc, int) {
	pd := pollcache.alloc()
	lock(&pd.lock)
	if pd.wg != 0 && pd.wg != pdReady {
		throw("runtime: blocked write on free polldesc")
	}
	if pd.rg != 0 && pd.rg != pdReady {
		throw("runtime: blocked read on free polldesc")
	}
	pd.fd = fd // socket fd
	pd.closing = false
	pd.everr = false
	pd.rseq++
	pd.rg = 0
	pd.rd = 0
	pd.wseq++
	pd.wg = 0
	pd.wd = 0
	unlock(&pd.lock)

	var errno int32
	errno = netpollopen(fd, pd)
	return pd, int(errno)
}

func netpollopen(fd uintptr, pd *pollDesc) int32 {
	var ev epollevent
	ev.events = _EPOLLIN | _EPOLLOUT | _EPOLLRDHUP | _EPOLLET
	*(**pollDesc)(unsafe.Pointer(&ev.data)) = pd
	return -epollctl(epfd, _EPOLL_CTL_ADD, int32(fd), &ev)
}

Finally, let's look at the implementation of listenerBacklog:

var listenerBacklogCache struct {
	sync.Once
	val int
}

// listenerBacklog is a caching wrapper around maxListenerBacklog.
func listenerBacklog() int {
	listenerBacklogCache.Do(func() { listenerBacklogCache.val = maxListenerBacklog() })
	return listenerBacklogCache.val
}

func maxListenerBacklog() int {
	fd, err := open("/proc/sys/net/core/somaxconn")
	if err != nil {
		return syscall.SOMAXCONN
	}
	defer fd.close()
	l, ok := fd.readLine()
	if !ok {
		return syscall.SOMAXCONN
	}
	f := getFields(l)
	n, _, ok := dtoi(f[0])
	if n == 0 || !ok {
		return syscall.SOMAXCONN
	}
	// Linux stores the backlog in a uint16.
	// Truncate number to avoid wrapping.
	// See issue 5030.
	if n > 1<<16-1 {
		n = 1<<16 - 1
	}
	return n
}

The accept operation

The accept operation is provided by TCPListener.

type TCPListener struct {
	fd *netFD
	lc ListenConfig
}
func (l *TCPListener) Accept() (Conn, error) {
	if !l.ok() {
		return nil, syscall.EINVAL
	}
	c, err := l.accept()
	if err != nil {
		return nil, &OpError{Op: "accept", Net: l.fd.net, Source: nil, Addr: l.fd.laddr, Err: err}
	}
	return c, nil
}

func (ln *TCPListener) ok() bool { return ln != nil && ln.fd != nil }

func (ln *TCPListener) accept() (*TCPConn, error) {
	fd, err := ln.fd.accept()
	if err != nil {
		return nil, err
	}
	tc := newTCPConn(fd)
	if ln.lc.KeepAlive >= 0 {
		setKeepAlive(fd, true)
		ka := ln.lc.KeepAlive
		if ln.lc.KeepAlive == 0 {
			ka = defaultTCPKeepAlive // keepalive defaults to 15 minutes
		}
		setKeepAlivePeriod(fd, ka)
	}
	return tc, nil
}

TCPListener.Accept ultimately calls netFD.accept:

func (fd *netFD) accept() (netfd *netFD, err error) {
	d, rsa, errcall, err := fd.pfd.Accept() // d is the fd of the newly accepted socket
	if err != nil {
		if errcall != "" {
			err = wrapSyscallError(errcall, err)
		}
		return nil, err
	}
	if netfd, err = newFD(d, fd.family, fd.sotype, fd.net); err != nil {
		poll.CloseFunc(d)
		return nil, err
	}
	if err = netfd.init(); err != nil { // netfd.init() was analyzed above:
    // it adds the accepted connection's fd to epoll
		netfd.Close()
		return nil, err
	}
	lsa, _ := syscall.Getsockname(netfd.pfd.Sysfd)
	netfd.setAddr(netfd.addrFunc()(lsa), netfd.addrFunc()(rsa))
	return netfd, nil
}

netFD.accept is in turn implemented by internal/poll.FD.Accept:

func (fd *FD) Accept() (int, syscall.Sockaddr, string, error) {
	if err := fd.readLock(); err != nil {
		return -1, nil, "", err
	}
	defer fd.readUnlock()

	if err := fd.pd.prepareRead(fd.isFile); err != nil {
		return -1, nil, "", err
	}
	for {
		// fd.Sysfd is non-blocking, so accept is non-blocking too; the code has to keep handling EAGAIN
		s, rsa, errcall, err := accept(fd.Sysfd) // call the accept system call; if a connection is ready it returns at once,
        // otherwise check the error: EAGAIN just means nothing is ready yet, so wait via internal/poll.FD.waitRead
		if err == nil {
			return s, rsa, "", err
		}
		switch err {
		case syscall.EAGAIN:
			if fd.pd.pollable() {
				if err = fd.pd.waitRead(fd.isFile); err == nil { // block in internal/poll.FD.waitRead
					continue
				}
			}
		case syscall.ECONNABORTED:
			// This means that a socket on the listen
			// queue was closed before we Accept()ed it;
			// it's a silly error, so try again.
			continue
		}
		return -1, nil, errcall, err
	}
}

The implementation of internal/poll.FD.waitRead:

func (pd *pollDesc) waitRead(isFile bool) error {
	return pd.wait('r', isFile)
}

func (pd *pollDesc) wait(mode int, isFile bool) error {
	if pd.runtimeCtx == 0 {
		return errors.New("waiting for unsupported file type")
	}
	res := runtime_pollWait(pd.runtimeCtx, mode)
	return convertErr(res, isFile)
}

runtime_pollWait is ultimately implemented by runtime.poll_runtime_pollWait.

//go:linkname poll_runtime_pollWait internal/poll.runtime_pollWait
func poll_runtime_pollWait(pd *pollDesc, mode int) int {
	err := netpollcheckerr(pd, int32(mode))
	if err != 0 {
		return err
	}
	// As for now only Solaris, illumos, and AIX use level-triggered IO.
	if GOOS == "solaris" || GOOS == "illumos" || GOOS == "aix" {
		netpollarm(pd, mode)
	}
	for !netpollblock(pd, int32(mode), false) { // netpollblock is the core
		err = netpollcheckerr(pd, int32(mode))
		if err != 0 {
			return err
		}
		// Can happen if timeout has fired and unblocked us,
		// but before we had a chance to run, timeout has been reset.
		// Pretend it has not happened and retry.
	}
	return 0
}

Now let's look at the implementation of netpollblock:

// returns true if the socket became readable/writable, false otherwise
func netpollblock(pd *pollDesc, mode int32, waitio bool) bool {
	gpp := &pd.rg
	if mode == 'w' {
		gpp = &pd.wg
	}

	// set the gpp semaphore to WAIT
	for {
		old := *gpp
		if old == pdReady {
			*gpp = 0
			return true
		}
		if old != 0 {
			throw("runtime: double wait")
		}
		if atomic.Casuintptr(gpp, 0, pdWait) {
			break
		}
	}

	// need to recheck error states after setting gpp to WAIT
	// this is necessary because runtime_pollUnblock/runtime_pollSetDeadline/deadlineimpl
	// do the opposite: store to closing/rd/wd, membarrier, load of rg/wg
	if waitio || netpollcheckerr(pd, mode) == 0 {
		gopark(netpollblockcommit, unsafe.Pointer(gpp), waitReasonIOWait, traceEvGoBlockNet, 5) 
	}
	// be careful to not lose concurrent READY notification
	old := atomic.Xchguintptr(gpp, 0)
	if old > pdWait {
		throw("runtime: corrupted polldesc")
	}
	return old == pdReady
}

As the above shows, blocking in Go's Accept is not like blocking in a Linux accept call: Go blocks by parking the G, whereas in C the thread blocks inside the accept system call itself.

The Read operation

Read is provided by TCPConn:

type TCPConn struct {
	conn
}

type conn struct {
	fd *netFD
}
func newTCPConn(fd *netFD) *TCPConn {
	c := &TCPConn{conn{fd}}
	setNoDelay(c.fd, true)
	return c
}

func (c *conn) ok() bool { return c != nil && c.fd != nil }

// Read implements the Conn Read method.
func (c *conn) Read(b []byte) (int, error) {
	if !c.ok() {
		return 0, syscall.EINVAL
	}
	n, err := c.fd.Read(b)
	if err != nil && err != io.EOF {
		err = &OpError{Op: "read", Net: c.fd.net, Source: c.fd.laddr, Addr: c.fd.raddr, Err: err}
	}
	return n, err
}

TCPConn's Read ultimately calls netFD.Read:

func (fd *netFD) Read(p []byte) (n int, err error) {
	n, err = fd.pfd.Read(p)
	runtime.KeepAlive(fd)
	return n, wrapSyscallError("read", err)
}

And netFD.Read is implemented by internal/poll.FD:

func (fd *FD) Read(p []byte) (int, error) {
	if err := fd.readLock(); err != nil {
		return 0, err
	}
	defer fd.readUnlock()
	if len(p) == 0 {
		// If the caller wanted a zero byte read, return immediately
		// without trying (but after acquiring the readLock).
		// Otherwise syscall.Read returns 0, nil which looks like
		// io.EOF.
		// TODO(bradfitz): make it wait for readability? (Issue 15735)
		return 0, nil
	}
	if err := fd.pd.prepareRead(fd.isFile); err != nil {
		return 0, err
	}
	if fd.IsStream && len(p) > maxRW {
		p = p[:maxRW]
	}
	for {
		n, err := syscall.Read(fd.Sysfd, p) // call the read system call; if data is available it returns right away; since fd.Sysfd is non-blocking, syscall.Read is non-blocking too
		if err != nil {
			n = 0
			if err == syscall.EAGAIN && fd.pd.pollable() { // nothing to read yet, so wait until the fd is readable
				if err = fd.pd.waitRead(fd.isFile); err == nil {
					continue
				}
			}

			// On MacOS we can see EINTR here if the user
			// pressed ^Z.  See issue #22838.
			if runtime.GOOS == "darwin" && err == syscall.EINTR {
				continue
			}
		}
		err = fd.eofError(n, err)
		return n, err
	}
}

Read works like Accept: when no data is available it ends up in fd.pd.waitRead, and waitRead eventually calls gopark to park the goroutine. The blocking the user perceives in Read is implemented by parking the G rather than by blocking inside a system call, so from the operating system's point of view the I/O is non-blocking.

The Write operation

Analogous to Read; omitted.

keepalive

keepalive is a liveness-probing mechanism, usually used by a server to detect whether a client is still alive; when the client goes away abnormally, the server proactively tears down the long-lived connection it was maintaining, saving resources. By default it is governed by these kernel parameters:

# sudo sysctl -a | grep keepalive
net.ipv4.tcp_keepalive_intvl = 75
net.ipv4.tcp_keepalive_probes = 9
net.ipv4.tcp_keepalive_time = 7200

  • tcp_keepalive_time

    How long a connection may stay idle before a keepalive probe is sent; the default is 2 hours. Depending on how the peer responds, one of three things happens:

    • The peer replies with an ACK

      The peer is alive, so nothing is done; the next keepalive probe is sent after another tcp_keepalive_time of idleness.

    • The peer replies with an RST

      The peer has restarted or gone offline, and the server closes the connection.

    • The peer does not reply at all

      The probe is retried tcp_keepalive_probes times, tcp_keepalive_intvl seconds apart; if the peer is still unreachable, the application receives an ETIMEDOUT or EHOSTUNREACH error.

  • tcp_keepalive_intvl

    The interval between keepalive probe retries.

  • tcp_keepalive_probes

    The maximum number of keepalive probe retries.

Keepalive probe packets generally take one of two forms:

  1. An empty packet.

  2. A one-byte packet whose payload is a NUL character, i.e. the character 0x00 in the ASCII table.

    There is a trick here. Suppose the last ACK the client sent acknowledged sequence number 101; the keepalive packet the server sends then carries sequence number 100, i.e. the client's ACK number minus 1. When the client receives it, it sees that the byte at sequence 100 has already been received, so it discards the stray NUL without disturbing the normal data stream, and then responds with an ACK.

Note that these kernel parameters do not by themselves decide whether an application uses keepalive; the application has to enable it. Keepalive can be configured per fd with the following system calls (a complete sketch follows):

setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &val, sizeof(val))   // enable keepalive
setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &val, sizeof(val))  // set tcp_keepalive_time
setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &val, sizeof(val)) // set tcp_keepalive_intvl
setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &val, sizeof(val))   // set tcp_keepalive_probes
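
Wrapped into a helper, under the assumption that fd is an already connected TCP socket (enable_keepalive and its parameter names are made up for this sketch):

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* enable TCP keepalive on fd and override the kernel defaults for this socket */
int enable_keepalive(int fd, int idle_sec, int intvl_sec, int cnt) {
  int on = 1;
  if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_sec, sizeof(idle_sec)) < 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_sec, sizeof(intvl_sec)) < 0)
    return -1;
  if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) < 0)
    return -1;
  return 0;
}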

keepalive in Go

In Go, keepalive is enabled by default whenever a new connection is accepted:

func (ln *TCPListener) accept() (*TCPConn, error) {
	fd, err := ln.fd.accept()
	if err != nil {
		return nil, err
	}
	tc := newTCPConn(fd)
	if ln.lc.KeepAlive >= 0 {
		setKeepAlive(fd, true) // enable keepalive
		ka := ln.lc.KeepAlive
		if ln.lc.KeepAlive == 0 {
			ka = defaultTCPKeepAlive // tcp_keepalive_time defaults to 15 minutes
		}
		setKeepAlivePeriod(fd, ka) // set tcp_keepalive_time
	}
	return tc, nil
}

setKeepAlive and setKeepAlivePeriod operate on the netFD:

func setKeepAlive(fd *netFD, keepalive bool) error {
	err := fd.pfd.SetsockoptInt(syscall.SOL_SOCKET, syscall.SO_KEEPALIVE, boolint(keepalive))
	runtime.KeepAlive(fd)
	return wrapSyscallError("setsockopt", err)
}

// setKeepAlivePeriod does not set tcp_keepalive_probes,
// but it sets tcp_keepalive_intvl and tcp_keepalive_time to the same value, which is effectively tcp_keepalive_probes = net.ipv4.tcp_keepalive_probes + 1.
func setKeepAlivePeriod(fd *netFD, d time.Duration) error {
	// The kernel expects seconds so round to next highest second.
	secs := int(roundDurationUp(d, time.Second))
	if err := fd.pfd.SetsockoptInt(syscall.IPPROTO_TCP, syscall.TCP_KEEPINTVL, secs); err != nil { // set tcp_keepalive_intvl
		return wrapSyscallError("setsockopt", err)
	}
	err := fd.pfd.SetsockoptInt(syscall.IPPROTO_TCP, syscall.TCP_KEEPIDLE, secs) // set tcp_keepalive_time
	runtime.KeepAlive(fd)
	return wrapSyscallError("setsockopt", err)
}

func (fd *FD) SetsockoptInt(level, name, arg int) error {
	if err := fd.incref(); err != nil {
		return err
	}
	defer fd.decref()
	return syscall.SetsockoptInt(fd.Sysfd, level, name, arg)
}

SetsockoptInt in the syscall package is implemented as follows:

func SetsockoptInt(fd, level, opt int, value int) (err error) {
	var n = int32(value)
	return setsockopt(fd, level, opt, unsafe.Pointer(&n), 4)
}

func setsockopt(s int, level int, name int, val unsafe.Pointer, vallen uintptr) (err error) {
	_, _, e1 := Syscall6(SYS_SETSOCKOPT, uintptr(s), uintptr(level), uintptr(name), uintptr(val), uintptr(vallen), 0)
	if e1 != 0 {
		err = errnoErr(e1)
	}
	return
}

Syscall6 performs the system call in assembly; for the calling convention, see the material on using the dedicated syscall instruction:

// func Syscall6(trap, a1, a2, a3, a4, a5, a6 uintptr) (r1, r2, err uintptr)
TEXT ·Syscall6(SB),NOSPLIT,$0-80
	CALL	runtime·entersyscall(SB)
	MOVQ	a1+8(FP), DI
	MOVQ	a2+16(FP), SI
	MOVQ	a3+24(FP), DX
	MOVQ	a4+32(FP), R10
	MOVQ	a5+40(FP), R8
	MOVQ	a6+48(FP), R9
	MOVQ	trap+0(FP), AX	// syscall entry
	SYSCALL
	CMPQ	AX, $0xfffffffffffff001
	JLS	ok6
	MOVQ	$-1, r1+56(FP)
	MOVQ	$0, r2+64(FP)
	NEGQ	AX
	MOVQ	AX, err+72(FP)
	CALL	runtime·exitsyscall(SB)
	RET
ok6:
	MOVQ	AX, r1+56(FP)
	MOVQ	DX, r2+64(FP)
	MOVQ	$0, err+72(FP)
	CALL	runtime·exitsyscall(SB)
	RET

References