Skip to content

Latest commit

 

History

History
827 lines (573 loc) · 33.1 KB

2 应用层.md

File metadata and controls

827 lines (573 loc) · 33.1 KB
title
2 应用层

1 Principles of Network Applications

Processes Communication

A process sends message into, and receives message from, the network through a software interface called a socket(套接字).

A socket is also referred to as the API between the application and the network, since the socket is the programming interface with which network applications are built.

The application developer has control of everything on the application-layer side of the socket but has little control of the transport-layer side of the socket. The only control that the application developer has on the transport-layer side is

  • the choice of transport protocol
  • perhaps the ability to fix a few transport-layer parameters such as maximum buffer and maximum segment sizes

application_processes_sockets_and_underlying_transport_protoco

Transport Services Available to Applications

What are the services that a transport-layer protocol can offer to applications invoking it? We can broadly classify the possible services along four dimensions: reliable data transfer(可靠数据传输), throughput(吞吐量), timing(定时), and security(安全性).

Reliable Data Transfer

Packets can get lost within a computer network(e.g. overflow a buffer in a router, discarded by a host or router after having some of its bits corrupted).

  • If a protocol provides such a guaranteed data delivery service, it is said to provide reliable data transfer(可靠数据传输).
  • Loss-tolerant applications(容忍丢失的应用), most notably multimedia applications such as conversational audio/video, can tolerate some amount of data loss.

Throughput

In the context of a communication session between two processes along a network path, throughput(吞吐量)is the rate at which the sending process can deliver bits to the receiving process.

  • Applications that have throughput requirements are said to be bandwidth-sensitive applications(带宽敏感应用). e.g. multimedia applications.
  • Elastic applications(弹性应用) can make use of as much, or as little, throughput as happens to be available. e.g.electronic mail, file transfer, and web transfer.

Timing

Low delay is appealing to interactive real-time applications, such as Internet telephony, virtual environments, teleconferencing, and multiplayer games, all of which require tight timing constraints on data delivery in order to be effective.

For non-real-time applications, lower delay is always preferable to higher delay, but no tight constraint is placed on the end-to-end delays.

Transport Services Provided by the Internet

The Internet (and, more generally, TCP/IP networks) makes two transport protocols available to applications, UDP and TCP. Each of these protocols offers a different set of services to the invoking applications. The figure below shows the service requirements for some selected applications:

Application Data Loss Throughput Time-Sensitive Application-Layer Protocol Underlying Transport Protocol
File transfer No Loss Elastic No FTP TCP
Email No Loss Elastic No SMTP TCP
Web documents No Loss Elastic(few kbps) No HTTP TCP
Internet telephony Loss-tolerant bandwidth-sensitive Yes SIP/RTP UDP/TCP
Streaming multimedia Loss-tolerant bandwidth-sensitive Yes HTTP TCP

TCP Services:

  • connection-oriented service: TCP has the client and server exchange transport layer control information with each other before the application-level messages begin to flow. 面向连接:客户机/服务器进程间需要建立连接
  • Reliable data transfer service: The communicating processes can rely on TCP to deliver all data sent without error and in the proper order. 可靠的数据传输
  • TCP also includes a congestion-control mechanism: throttles a sending process (client or server) when the network is congested between sender and receiver.拥塞控制

UDP Services:

  • connectionless无连接
  • unreliable data transfer不可靠的数据传输: no guarantee that the message will ever reach the receiving process and may arrive at the receiving process out of order.

2 The Web and HTTP

Overview of HTTP

The HyperText Transfer Protocol (HTTP, 超文本传输协议), the Web’s application-layer protocol, is at the heart of the Web.

A Web page(also called a document) consists of objects. An object is simply a file.

Each URL has two components: the hostname of the server that houses the object and the object’s path name.

Http request response hehavior

HTTP uses TCP as its underlying transport protocol (rather than running on top of UDP).

It is important to note that the server sends requested files to clients without storing any state information about the client. If a particular client asks for the same object twice in a period of a few seconds, the server does not respond by saying that it just served the object to the client; instead, the server resends the object, as it has completely forgotten what it did earlier. Because an HTTP server maintains no information about the clients, HTTP is said to be a stateless protocol(无状态协议).

Non-Persistent/Persistent Connections

Non-persistent connections(非持续连接): each request/response pair be sent over a separate TCP connection. It is the default connection Under HTTP/1.0.

Round-trip time(RTT, 往返时间), is the time it takes for a small packet to travel from client to server and then back to the client.

Now, consider what happens when a user clicks on a hyperlink. It causes the browser to initiate a TCP connection between the browser and the Web server.

This involves a three-way handshake(三次握手):

  • the client sends a small TCP segment to the server,
  • the server acknowledges and responds with a small TCP segment,
  • finally, the client acknowledges back to the server.(combined with the HTTP request message)

Roughly, the total response time is two RTTs plus the transmission time at the server of the HTML file.

non-persistent-connections

Non-persistent connections have some shortcomings:

  • a brand-new connection must be established and maintained for each requested object.
  • each object suffers a delivery delay of two RTTs - one RTT to establish the TCP connection and one RTT to request and receive an object.

Persistent connections(持续连接): all of the requests and their corresponding responses are sent over the same TCP connection. It is the default connection Under HTTP/1.1.

  • 服务器在发送响应后保持该TCP连接打开。在相同的客户与服务器之间的后续请求和响应报文能够通过相同的连接进行传送。
  • 一个完整的Web页面可以用单个TCP连接进行传送。

HTTP Message Format

HTTP Request Message(HTTP请求报文):

HTTP Request message has three sections:

  • a request line(请求行): the method field, the URL field, and the HTTP version field
  • header lines(首部行)
  • entity body(): message

General_format_of_an_HTTP_request_message

GET /somedir/page.html HTTP/1.1  \r\n
# specifies the host on which the object resides
Host: www.someschool.edu \r\n
Connection: close  \r\n
User-agent: Mozilla/5.0  \r\n
Accept-language: fr \r\n
\r\n

!!! note

CR(Carriage Return, `\r`)表示回车, LF(Line Feed, `\n`)表示换行

* Dos和Windows采用回车+换行(CR+LF)表示下一行
* UNIX/Linux采用换行符(LF)表示下一行
* MAC OS系统采用回车符(CR)表示下一行

HTTP Response Message

Response message(响应报文) has three sections:

  • a status line: protocol version field, a status code, a corresponding status message
  • header lines
  • entity body: message

General format of an HTTP response message

# the server is using HTTP/1.1, everything is ok
HTTP/1.1 200 OK \r\n  
# the server tell the client close the tcp connection after sending the message
Connection: close  \r\n  
# the time when the HTTP response was created and sent by the server
Date: Wed, 19 Dec 2017 16:01:28 GMT \r\n
# the message was generated by an Apache Web server
Server: Apache/2.4.6 \r\n
# time when the object last modified
Last-Modified: Mon, 12 Nov 2018 16:25:17 GMT \r\n
ETag: "cf-57a7a257df256" \r\n
Accept-Ranges: bytes \r\n
# number of bytes in the object being sent
Content-Length: 207 \r\n
# the object in the entity body is HTML text
Content-Type: text/html; charset=UTF-8 \r\n
 \r\n
<head>
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta http-equiv="refresh"content="0;url=http://nyu.edu/projects/keithwross/">
<title> Automatic Forwarding </title>
</head>

Idempotence

Idempotence(等幂性) is the property of certain operations that they can be applied multiple times without changing the result.

idempotence

!!! note "PUT v.s. POST"

在HTTP中,PUT被定义为idempotent,POST则不是。PUT是幂等的操作,即重复操作不会产生变化,10次PUT的创建请求与1次PUT 的创建请求相同,只会创建一个资源,其实后面9次的请求只是对已创建资源的更新,且更新内容与原内容相同,所以不会产生变化。POST的重复操作截然不同,10次POST请求将会创建10个资源。

Cookies

Problem: HTTP is stateless. A Web site can't identify users if it wishes to restrict user access or wants to serve content as a function of the user identity.

Cookies allow sites to keep track of users. A cookie technology has four components:

  • a cookie header line in the HTTP response message;
  • a cookie header line in the HTTP request message;
  • a cookie file kept on the user's end system and managed by the user's browser;
  • a back-end database at the Web site.

Cookie_technology

Suppose Susan contacts Amaozon.com for the first time: the Amazon Web server responds to Susan's browser, including in the HTTP response a Set-cookie: header, which contains the identification number.

If Susan returns to Amazon’s site, say, one week later, her browser will continue to put the header line Cookie: 1678 in the request messages.

Web Caching

A Web cache(Web缓存)—also called a proxy server(代理服务器)—is a network entity that satisfies HTTP requests on the behalf of an origin Web server. The Web cache has its own disk storage and keeps copies of recently requested objects in this storage.

Typically a Web cache is purchased and installed by an ISP(e.g. a university, or major residential ISP).

Web caching has seen deployment in the Internet for two reasons:

  • substantially reduce the response time for a client request缩短客户请求的响应时间
  • substantially reduce traffic on an institution's access link to the Internet减少机构/组织的流量, does not have to upgrade bandwidth as quickly, thereby reducing costs.

Clients_requesting_objects_through_a_Web_cache

The Conditional Get

Conditional GET(条件GET) allows a cache to verity that its objects are up to date.

An HTTP request message is a so-called conditional GET message if

  • the request message uses the GET method
  • the request message includes an If-Modified-Since: <date> header line

conditional_get

Authentication

HTTP Authentication Schemes

http_auth_demo

  • 客户端访问一个受HTTP基本认证保护的资源。

  • 服务器返回401状态码,要求客户端提供用户名和密码进行认证。响应头会加上WWW-Authenticate: Basic realm="请求域"。)

    HTTP/1.1 401 Unauthorized\r\n
    Date: Thu, 20 Dec 2018 05:28:27 GMT\r\n
    Server: Apache/2.4.6 (CentOS)\r\n
    WWW-Authenticate: Basic realm="wireshark-students only"\r\n
    Content-Length: 381\r\n
    Keep-Alive: timeout=5, max=100\r\n
    Connection: Keep-Alive\r\n
    Content-Type: text/html; charset=iso-8859-1\r\n
    \r\n
  • 客户端将输入的用户名密码用Base64进行编码后,采用非加密的明文方式传送给服务器。

    GET /wireshark-labs/protected_pages/HTTP-wiresharkfile5.html HTTP/1.1\r\n
    Host: gaia.cs.umass.edu\r\n
    Connection: keep-alive\r\n
    Authorization: Basic d2lyZXNoYXJrLXN0dWRlbnRzOm5ldHdvcms=\r\n
    Upgrade-Insecure-Requests: 1\r\n
    User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_0)
    Accept: text/html,application/xhtml+xml,application/xml;
            q=0.9,image/webp,image/apng,*/*;q=0.8\r\n
    Accept-Encoding: gzip, deflate\r\n
    Accept-Language: zh-CN,zh;q=0.9,en;q=0.8\r\n
    \r\n
  • 服务器将Authorization头中的用户名密码解码并取出,进行验证,如果认证成功,则返回相应的资源。如果认证失败,则仍返回401状态,要求重新进行认证。

HTTP2.0

版本 产生时间 内容 发展现状
HTTP/1.0 1996年 传输内容格式不限制,增加PUT、PATCH、HEAD、 OPTIONS、DELETE命令
HTTP/1.1 1997年 持久连接(长连接)、节约带宽、HOST域、管道机制、分块传输编码
HTTP/2 2015年 多路复用、服务器推送、头信息压缩、二进制协议等

HTTPS

HTTPS(HyperText Transfer Protocol over Secure Socket Layer)是身披SSL外壳的HTTP。HTTPS是一种通过计算机网络进行安全通信的传输协议,经由HTTP进行通信,利用SSL/TLS建立全信道,加密数据包。HTTPS使用的主要目的是提供对网站服务器的身份认证,同时保护交换数据的隐私与完整性。

3 DNS

识别主机有两种方式 -- 通过主机名(hostname)和IP地址。人们喜欢便于记忆的主机名标识方式,而路由器则喜欢IP地址。DNS(域名系统,Domain Name System)的主要任务是进行主机名到IP地址的转换。DNS是

  • 一个由分层的DNS服务器实现的分布式数据库;
  • 一个使得主机能够查询分布式数据库的应用层协议;

除了进行主机名到IP地址的转换外,DNS还可以

  • 主机别名(host aliasing): 主机可以拥有一个或多个别名;
  • 复杂分配(load distribution): 用于在冗余的服务器之间进行负责分配。

Overview of How DNS Works

Suppose that some application running in a user's host needs to translate a hostname to an IP address:

  • The application will invoke the client side of DNS, specifying the hostname that needs to be translated.
  • DNS in the user’s host then takes over, sending a query message into the network. All DNS query and reply messages are sent within UDP datagrams to port 53.
  • After receiving a DNS reply message, the desired mapping is then passed to the invoking application.
A Distributed, Hierarchical Database

分布式、层次数据库

The DNS uses a large number of servers, organized in a hierarchical fashion and distributed around the world. There are three classes of DNS servers

  • root DNS servers 根域名服务器
    • 13 root DNS servers in 2006.
  • top-level domain(TLD) DNS servers 顶级域名服务器
    • responsible for top-level domain, such as com, org, net, edu
  • authoritative DNS servers 权威域名服务器
    • an organization can choose to implement its own authoritative DNS server (e.g. universities and large companies)
    • or pay to some service provider

portion of the hierarchy of DNS servers

There is another important type of DNS server called the local DNS server本地域名服务器.

  • it does not strictly belong to the hierarchy of servers
  • but is nevertheless central to the DNS architecture.
  • each ISP (e.g. university, company, residential ISP) has a local DNS server.

When a host makes a DNS query, the query is sent to the local DNS server, which acts a proxy, forwarding the query into the DNS server hierarchy.

Suppose the host cis.poly.edu desires the IP address of gaia.cs.umass.edu:

  1. the host sends a DNS query to its local DNS server dns.poly.edu
  2. the local DNS server forwards the query message to a root DNS server
  3. the root DNS server takes note of the edu suffix and returns to the local DNS server a list of IP addresses for TLD servers responsible for edu.
  4. the local DNS server then resends the query message to one of these TLD servers.
  5. the TLD server takes note of the umass.edu suffix and responds with the IP address of the authoritative DNS server for the University of Massachusetts, namely, dns.umass.edu.
  6. the local DNS server resends the query message directly to dns.umass.edu
  7. the authoritative DNS sever for dns.umass.edu responds with the IP address of gaia.cs.umass.edu.

Note that in this example, in order to obtain the mapping for one hostname, eight DNS messages were sent: four query messages and four reply messages! We’ll see how DNS caching reduces this query traffic in next section.

Interaction_of_the_various_DNS_servers

In theory, any DNS query can be iterative or recursive. For example, Figure below shows a DNS query chain for which all of the queries are recursive.

In practice, the queries typically follow the pattern in Figure above: The query from the requesting host to the local DNS server is recursive, and the remaining queries are iterative.

recursive_queirs_in_DNS

DNS Caching

DNS缓存

In a query chain, When a DNS server receives a DNS reply, it can cache the mapping in its local memory.

If a hostname/IP address pair is cached in a DNS server and another query arrives to the DNS server for the same hostname, the DNS server can provide the desired IP address, even if it is not authoritative for the hostname.

Because hosts and mappings between hostnames and IP addresses are by no means permanent, DNS servers discard cached information after a period of time (often set to two days).

DNS records and Messages

共同实现DNS分布式数据库的所有DNS服务器存储了资源记录(Resource Record, RR), RR提供了主机名到IP地址的映射。资源记录是一个包含了以下字段的4元组:

(Name, Value, Type, TTL)

TTL是资源记录的生存时间,它决定了资源记录应当从缓存中删除的事件。

  • A记录: Type=A, Name是主机名,Value是该主机名对应的IP地址。提供主机名到IP地址的映射
  • NS记录:Type=NS,Name是个域(e.g. edu.cn),Value是个知道如何获得该域名中主机IP地址的权威域名服务器的主机名。用于表明由哪台服务器对该域名进行解析
  • CNAME记录,Type=CNAME,Name是别名,Value是别名为Name的主机对应的规范主机名。将域名指向另外一个域名。

4 FTP

FTP是在两个相连的计算机之间进行文件传输时使用的协议,它使用两条TCP连接:

  • 控制连接(control connection):
    • 用于在两主机之间传输控制信息,如用户标识、口令、改变远程目录的命令以及PUT和GET文件的命令
    • 贯穿了整个用户会话期间
    • TCP21号端口
    • 通过ASCII码发送请求和接收应答
  • 数据连接(data connection):
    • 当进行GET, PUT, List等操作时,每次都会建立用于数据传输的TCP连接,当数据传送完毕之后,连接也会被断开
    • 会话中的每一次文件传输都需要建立一个新的数据连接
    • TCP20号端口

FTP服务器必须在整个会话期间保留用户的状态(state)。特别是,服务器必须把特定的用户账户与控制连接联系起来,随着用户在远程目录上徘徊,服务器必须追踪用户在远程目录树上的当前位置。

4 Socket Programming

UDP编程

import socket
# 要访问的服务器地址和端口号
serverName = socket.gethostname()
serverPort = 8080
# AF_INET指定了IPv4, SOCK_DGRAM指定了UDP协议
serverSocket = socket.socket(socket.AF_INEF, socket.SOCK_DGRAM)
serverSocket.bind((serveName, serverPort))
print "Ready..."
while True:
    message, clientAddress = serverSocket.recvfrom(2048)
    modifiedMessage = message.upper() # 全部转为大写
    serverSocket.sendto(modifiedMessage, clientAddress)
import socket
# 要访问的服务器地址和端口号
serverName = socket.gethostname()
serverPort = 7000
# AF_INET指定了IPv4, SOCK_DGRAM指定了UDP协议
clientSocket = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
message = input('Input lowercase sentence:')
# 发送数据,使用UDP发送数据的时候,每一次都需要指定IP地址和端口
clientSocket.sendto(message.encode(), (serverName, serverPort))
# modifiedMessage存响应报文,serverAddress存响应源地址, 缓存长度2048
modifiedMessage, serverAddress = clientSocket.recvfrom(2048)
print(modifiedMessage)
# 关闭连接
clientSocket.close()

!!! note "socket.listen"

In `socket.listen([backlog])`, `backlog` specifies the number of unaccepted connections that the system will allow before refusing new connections.也就是说如果backlog = 5, 连接数为20, 那么系统一开始可以允许5个客户端连接(未接受连接数为15),每次与客户端建立连接以后(未接受连接数减1),就可以允许额外的连接。

机理:Linux内核协议栈为一个tcp连接管理使用两个队列,一个是半连接队列(syns queue, 用来保存处于SYN_SENT和SYN_RECV状态的请求),一个是全连接队列(accept queue, accpetd状态)。`backlog`参数即是半连接队列的大小。

![](figures/syns_queue_accept_queue.png)

TCP编程

如果想要完成⼀个使用TCP协议的服务器的功能,需要的流程如下:

  1. socket(): 创建⼀个套接字
  2. bind(): 绑定ip和port
  3. listen(): 将主动套接字转换为监听套接字,该套接字可以接受来自客户端的连接请求
  4. accept(): 等待来自客户端的连接请求
  5. recvfrom(),send(): 接收发送数据

import socket
# 要访问的服务器地址和端口号
serverName = socket.gethostname()
serverPort = 8080
# 创建socket,AF_INET指定了IPv4, SOCK_STREAM指定了TCP协议
serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# 绑定地址和端口
serverSocket.bind((serverName, serverPort)
# 将主动套接字转化为监听套接字,设置可接受的TCP连接数为1
serverSocket.listen(1)
print("Ready...")
while True:
    # 套接字等待连接
    # 如果有新的客户端来连接服务器,那么就产生一个新的套接字connectionSocket
    # 专门为这个客户端服务, serverSocket专门等待其他新客户端的连接
    connectionSocket, addr = serverSocket.accpet()
    # 接收客户端发送的数据,最大接收2048字节
    message = connectionSocket.recvfrom(2048)
    modifiedMessage = message.upper() # 全部转为大写
    connectionSocket.send(modifiedMessage)
    # 关闭套接字
    connectionSocket.close()
# 关闭套接字
serverSocket.close()

客户端程序在连接套接字后,就可以直接发送数据。

import socket
# 要访问的服务器地址和端口号
serverName = socket.gethostname()
serverPort = 8080
# AF_INET指定了IPv4, SOCK_STREAM指定了TCP协议
clientSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# 创建TCP连接
clientSocket.connect((serverName, serverPort))
# 发送数据
message = input('Input lowercase sentence:')
clientSocket.send(message.encode())
# modifiedMessage存响应报文,serverAddress存响应源地址, 缓存长度2048
modifiedMessage = clientSocket.recvfrom(2048)
print(modifiedMessage)
# 关闭连接
clientSocket.close()

6 并发服务器

多进程/多线程服务器

利用multiprocessing.Process()threading.Thread()构建多进程和多线程服务器。

import multiprocessing
import socket
import sys

serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serverSocket.bind(('', 7790))
serverSocket.listen(5)


def deal_with_client(new_socket, addr):
	while True:
		receive_message = newSocket.recv(2048)
		if receive_message:
			print("receive message:{0} from {1}"
			         .format(receive_message, addr))
			new_socket.send(receive_message.upper())
		else:
			print("The Client {0} has closed.".format(addr))
			break
	new_socket.close()


while True:
	try:
		newSocket, addr = serverSocket.accept()
		client_process = multiprocessing.Process(
		      target=deal_with_client, args=(newSocket, addr))
		client_process.start()
		# 因为已经向⼦进程中copy了⼀份(引⽤),并且⽗进程中这个套接字也没有⽤处了
		newSocket.close()
	except KeyboardInterrupt:
		serverSocket.close()
		sys.exit()
import threading
import socket
import sys

serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serverSocket.bind(('', 7790))
serverSocket.listen(5)


def deal_with_client(new_socket, addr):
	while True:
		receive_message = newSocket.recv(2048)
		if receive_message:
			print("receive message:{0} from {1}"
			         .format(receive_message, addr))
			new_socket.send(receive_message.upper())
		else:
			print("The Client {0} has closed.".format(addr))
			break
	new_socket.close()


while True:
	try:
		newSocket, addr = serverSocket.accept()
		client_thread = threading.Thread(
		      target=deal_with_client, args=(newSocket, addr))
		client_thread.start()
		# 因为线程中共享这个套接字,如果关闭了会导致这个套接字不可⽤,
		# 但是此时在线程中这个套接字可能还在收数据,因此不能关闭
		# newSocket.close()
	except KeyboardInterrupt:
		serverSocket.close()
		sys.exit()

单线程服务器-非阻塞模式

通过设置socket.setblocking(False)将套接字设置为非阻塞。

import socket
socket_list = [] # 用来存储所有的连接的socket
serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serverSocket.bind(('', 7792))
serverSocket.listen(5)

# 将套接字设置为⾮堵塞non-blocking
# 设置为⾮堵塞后,如果accept时,恰巧没有客户端connect,
# 那么accept会产⽣⼀个异常,所以需要try来进⾏处理
serverSocket.setblocking(False)

try:
	while True:
		try:
			newSocket, addr = serverSocket.accept()
		except BlockingIOError:
			pass
		else:
			print("new connection:" + str(addr))
			newSocket.setblocking(False)
			socket_list.append((newSocket, addr))

		for socket, addr in socket_list:
			try:
				message = socket.recv(1024)
				if message:
					print("receive:" + str(message))
					socket.send(message.upper())
				else:
					print("Client {0} has closed".format(addr))
					socket.close()
					socket_list.remove((socket, addr))
			except BlockingIOError:
				pass
except KeyboardInterrupt:
	serverSocket.close()
	exit()

单进程服务器-select版

并发服务器除了可以用多线程和多进程实现以外,还可以使用select实现并发。在Python中,可以使用Select包中的select()函数实现:

# select的函数声明
select.select(rlist, wlist, xlist[, timeout])

select.select()是Unix系统调用select的直接接口。它的参数列表:

  • rlist: 等待直到准备读取
  • wlist: 等待直到准备写入
  • xlist: 等待异常

Socket在操作系统中视为文件描述符,这些文件描述符被放在一个数组中,然后select调用的时候遍历这个数组,如果对应的文件描述符可读/写/错误则会返回该文件描述符。当遍历结束之后,如果仍然没有一个可读/写/错误的文件描述符,select会让用户进程睡眠,直到资源可用的时候唤醒,再次遍历数组。

select使用了I/O多路复用:

I/O多路复用的特点是通过一种机制一个进程能同时等待多个文件描述符,而这些文件描述符(套接字描述符)其中的任意一个进入读就绪状态,select()函数就可以返回。

import select
import socket
import sys
import re

serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serverSocket.bind(('', 7788))
serverSocket.listen(5)

inputs = [serverSocket, sys.stdin]

while True:
	# 调用select函数,阻塞等待
	readable, writable, exceptional = select.select(inputs, [], [])

	# 数据抵达,循环
	for sock in readable:
		# 监听到有新的连接
		if sock == serverSocket:     # 服务器收到连接请求
			newSocket, addr = serverSocket.accept()   # 建立连接
			inputs.append(newSocket)        # 将套接字保存

		# 监听到键盘有输入
		elif sock == sys.stdin:
			serverSocket.close()
			exit()

		# 有数据到达
		else:
			# 读取客户端连接发送的数据
			data = sock.recv(2048)
			print("receive:{0}".format(data))
			if data:  # 客户端发送了数据
				sock.send(data.upper())
			else:   # 客户端关闭了连接
				# 移除select监听的socket
				inputs.remove(sock)
				print_info = re.search(r'raddr=\(.+\)', str(sock)).group()
				print("Client {0} has closed.".format(print_info))
				sock.close()

⽬前⼏乎在所有的平台上都⽀持select,良好跨平台⽀持是它的⼀个优点。select的⼀个重大缺点在于单个进程能够监视的⽂件描述符的数量存在最⼤限制,在Linux上⼀般为1024(位)或2048(64位),可以通过修改宏定义甚⾄重新编译内核的⽅式提升这⼀限制,但是这样也会造成效率的降低。

单进程服务器-epoll版

在select实现(1983)以后,基于同样原理的poll也成功实现(1997),只不过poll允许无限个连接。但是还是不够高效,因为必须遍历所有文件描述符,来确定I/O事件的发生。

而epoll很好地克服了select和poll的缺点。当I/O事件发生时,epoll能告诉进程哪个连接有I/O事件,然后进程就可以去处理这个I/O流。

!!! note

只有Linux支持epoll,BSD内核对应的是kqueue,在windows下对应的是iocp。

在Python中,可以用select模块的epoll()函数实现epoll。以下是通过select.kqueue()的实现:

import select
import socket

serverSocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
serverSocket.bind(('', 7792))
serverSocket.listen(5)

# mac只支持kqueue()
kq = select.kqueue()

# 生成kevents列表,监听socket
events = [select.kevent(serverSocket.fileno(),
                        select.KQ_FILTER_READ, select.KQ_EV_ADD)]

# 连接列表
connection_list = {}
connection_index = 0

try:
	while True:
		try:
			# 如果有可执行的kevent,则返回对应的kevent列表
			# 2表示同时允许的事件为2
			active_events = kq.control(events, 2)
		except select.error as e:
			pass

		if active_events:
			for event in active_events:
				# 有客户端连接请求
				# event.ident是文件描述符
				if event.ident == serverSocket.fileno():
					newSocket, addr = serverSocket.accept()
					connection_list[connection_index] = (newSocket, addr)
					events.append(select.kevent(newSocket.fileno(), \
					   select.KQ_FILTER_READ, select.KQ_EV_ADD,\
					   udata=connection_index))
					connection_index += 1
				else:
					# 有客户端发送数据请求
					newSocket, addr = connection_list[event.udata]
					message = newSocket.recv(1024)
					if message:
						print("Recevie message: " + str(message))
						newSocket.send(message.upper())
					else:
						print("The Client {0} has closed".format(addr))
						events.remove(select.kevent(newSocket.fileno(), \
						select.KQ_FILTER_READ, select.KQ_EV_ADD, udata=event.udata))
						del(connection_list[event.udata])
						newSocket.close()
except KeyboardInterrupt:
	serverSocket.close()
	exit()

协程实现

Python中的greenlet模块实现了协程,但是还需要人工切换。使用gevent模块能够自动切换协程,当一个套接字遇到IO操作时,就自动切换到其他的套接字,等到IO操作完成,再在适当的时候切换回来继续执行。

import gevent
from gevent import socket, monkey
# 打猴子补丁, 让你方便的导入非阻塞的模块,不需要特意的去引入。
monkey.patch_all()
def handle_request(new_socket):
	while True:
		message = new_socket.recv(1024)
		if not message:
			new_socket.close()
			break
		print("Receive:", message)
		new_socket.send(message)

def server(port):
	server_socket = socket.socket()
	server_socket.bind(('', port))
	server_socket.listen(5)
	while True:
		new_socket, addr = server_socket.accept()
		gevent.spawn(handle_request, new_socket)

if __name__ == "__main__":
	server(7790)