[{"categories":["Golang"],"content":"前言 目前使用go的工程越来越多,每个工程都有很多重复的逻辑,需要把重复的进行提炼统一起来,就在bitbucket上新建了一个新的仓库frame/goframe.git,后续所有通用的基础组件都会放在goframe库中,由于库是私有的,就需要支持私有的go mod. 下面主要说明为了支持go工程能引用私有的go mod,而需要做的配置. ","date":"2021-11-20","objectID":"/2021/11/20/golang-private-mod/:1:0","tags":["go"],"title":"Golang Private Mod","uri":"/2021/11/20/golang-private-mod/"},{"categories":["Golang"],"content":"工作环境设置(个人) ","date":"2021-11-20","objectID":"/2021/11/20/golang-private-mod/:2:0","tags":["go"],"title":"Golang Private Mod","uri":"/2021/11/20/golang-private-mod/"},{"categories":["Golang"],"content":"DNS设置 内网DNS地址: 172.16.30.243, 内网域名: go.goingint.io windows windows设置内网dnswindows \" windows设置内网dns linux 编辑文件/etc/resolv.conf,新增一行nameserver 172.16.30.243 linux设置内网dnslinux \" linux设置内网dns 测试 可以使用命令ping go.goingint.io来测试dns是否生效,如果能ping通,就表示已经生效了 测试dns是否生效ping \" 测试dns是否生效 ","date":"2021-11-20","objectID":"/2021/11/20/golang-private-mod/:2:1","tags":["go"],"title":"Golang Private Mod","uri":"/2021/11/20/golang-private-mod/"},{"categories":["Golang"],"content":"配置git 生成密钥 使用ssh-keygen工具来生成公私钥,使用命令ssh-keygen -t rsa -C \"[email protected]\" -f private,如图: 生成密钥keygen \" 生成密钥 注意: -C \"[email protected]\"换成自己的邮箱 -f private,private是文件名,生成的文件要放在~/.ssh目录下(windows下目录为C:\\Users\\Admin\\.ssh) 配置多个SSH key 在目录~/.ssh下(windows下目录为C:\\Users\\Admin\\.ssh),生成文件config,注意文件名必须为config. 文件内容如下: config文件内容config \" config文件内容 注意: 如果在执行命令git pull时报错,信息如下: Bad owner or permissions on /home/admin/.ssh/config. 解决方案: 修改~/.ssh/config的权限,运行命令chmod 0600 ~/.ssh/config 最终目录~/.ssh下的文件情况: ssh目录下的文件ssh \" ssh目录下的文件 注意文件id_rsa和id_rsa.pub是之前配置bitbucket用的. 配置bitbucket 查看文件private.pub文件的内容: 公钥内容pub \" 公钥内容 把以上内容粘贴到bitbucket中,如下图: bitbucket中添加公钥bitbucket \" bitbucket中添加公钥 配置git config 使用如下命令: git config --global url.\"[email protected]:\".insteadOf \"http://go.goingint.io/\" ","date":"2021-11-20","objectID":"/2021/11/20/golang-private-mod/:2:2","tags":["go"],"title":"Golang Private Mod","uri":"/2021/11/20/golang-private-mod/"},{"categories":["Golang"],"content":"配置golang环境变量 需要设置私有仓库,如下: go env -w GOINSECURE=\"go.goingint.io\" go env -w GOPRIVATE=\"go.goingint.io\" ","date":"2021-11-20","objectID":"/2021/11/20/golang-private-mod/:2:3","tags":["go"],"title":"Golang Private Mod","uri":"/2021/11/20/golang-private-mod/"},{"categories":["Golang"],"content":"后端相关设置 ","date":"2021-11-20","objectID":"/2021/11/20/golang-private-mod/:3:0","tags":["go"],"title":"Golang Private Mod","uri":"/2021/11/20/golang-private-mod/"},{"categories":["Golang"],"content":"域名配置 由于内网的bitbucket的地址是172.16.30.215:7999,域名go.goingint.io实际指向的nginx,然后通过nginx转发到bitbucket. nginx就需要支持转发ssh,如下配置: stream { upstream ssh { server 172.16.30.215:7999; } server { listen 22; proxy_pass ssh; proxy_connect_timeout 1h; proxy_timeout 1h; } } 在使用命令go mod tidy或go mod download下载依赖时,会调用GET /frame/goframe?go-get=1(可查看源码,src/cmd/go/internal/vcs/vcs.go:urlForImportPath),需要配置http,如下: server { listen 80; server_name go.goingint.io; if ($args ~* \"^go-get=1\") { set $condition goget; } if ($uri ~ ^/([a-zA-Z0-9_-]+)/([a-zA-Z0-9_-]+)$) { set $condition \"${condition}path\"; } if ($condition = gogetpath) { return 200 \"\u003c!DOCTYPE html\u003e\u003chtml\u003e\u003chead\u003e\u003cmeta content='go.goingint.io/$1/$2 git http://go.goingint.io/$1/$2.git' name='go-import'\u003e\u003c/head\u003e\u003c/html\u003e\"; } location / { proxy_pass http://172.16.30.215:7999/; } } 在nginx中对请求/frame/goframe?go-get=1直接进行拦截. 注意: 配置文件中的正则表达式可根据需要调整,还要把go.goingint.io换成自己的域名. 
","date":"2021-11-20","objectID":"/2021/11/20/golang-private-mod/:3:1","tags":["go"],"title":"Golang Private Mod","uri":"/2021/11/20/golang-private-mod/"},{"categories":["Golang"],"content":"goframe库设置 由于库的路径为/frame/goframe,在初始化时需要使用命令: go mod init go.goingint.io/frame/goframe,go.mod内容: module go.goingint.io/frame/goframe go 1.16 ","date":"2021-11-20","objectID":"/2021/11/20/golang-private-mod/:3:2","tags":["go"],"title":"Golang Private Mod","uri":"/2021/11/20/golang-private-mod/"},{"categories":["Golang"],"content":"引用private mod 新建测试工程,go mod init example,然后新建main.go文件: package main import ( \"fmt\" \"go.goingint.io/frame/goframe/core/mq\" ) func main() { bus, err := mq.NewRabbitMQ(\"test\", mq.WithRabbitMQUrl(\"amqp://guest:[email protected]:45672/quote\")) if err != nil { fmt.Printf(\"mq init err: %s\\n\", err.Error()) return } fmt.Print(\"mq init successfully\\n\") defer bus.Close() } 然后使用go mod tidy引入goframe私有包. ","date":"2021-11-20","objectID":"/2021/11/20/golang-private-mod/:4:0","tags":["go"],"title":"Golang Private Mod","uri":"/2021/11/20/golang-private-mod/"},{"categories":["Golang"],"content":"主要是描述Go语言调用易盛行情API的动态库过程中,一些需要注意的地方。 ","date":"2021-07-24","objectID":"/2021/07/24/golang-c-call/:0:0","tags":["go","cgo"],"title":"Golang与C语言相互调用","uri":"/2021/07/24/golang-c-call/"},{"categories":["Golang"],"content":"类型对应关系 C类型 调用方法 Go类型 字节数 char C.char byte 1 signed char C.schar int8 1 unsigned char C.uchar uint8 1 short int C.short int16 2 short unsigned int C.ushort uint16 2 int C.int int 4 unsigned int C.uint uint32 4 long int C.long int32 or int64 4 long unsigned int C.ulong uint32 or uint64 4 long long int C.longlong int64 8 long long unsigned int C.ulonglong uint64 8 float C.float float32 4 double C.double float64 8 wchar_t C.wchar_t 2 void * unsafe.Pointer ","date":"2021-07-24","objectID":"/2021/07/24/golang-c-call/:1:0","tags":["go","cgo"],"title":"Golang与C语言相互调用","uri":"/2021/07/24/golang-c-call/"},{"categories":["Golang"],"content":"导出函数 采用C++风格编译时,函数名会加上修饰符,会导致CGO查找不到对应的函数.所以必须采用C风格编译,这样函数名才会保持不变. 如下,就是采用C风格编译. #ifdef __cplusplus extern \"C\"{ #endif int Login(char* authCode, char* logPath, char* ip, int port, char* userName, char* passwd); ...... #ifdef __cplusplus } #endif 使用命令nm libesunny.so来查看动态库的符号表,可以看到函数Login的符号信息为0000000000001fb0 T Login,与名字是一样. ","date":"2021-07-24","objectID":"/2021/07/24/golang-c-call/:2:0","tags":["go","cgo"],"title":"Golang与C语言相互调用","uri":"/2021/07/24/golang-c-call/"},{"categories":["Golang"],"content":"字符串 在C中用结尾带'\\0’的字符数组来表示字符串的,而在Golang中string类型是原生类型,因此两种语言互操作时是需要进行字符串类型转换的. 通过C.String函数可以将Golang中的string类型转换为C的字符串类型,再传给C函数使用. // Golang中的字符串. s := \"hello cgo\\n\" // 转换为C的字符串. cs := C.String(s) // 传给C函数Print使用. C.Print(cs) 需要注意,转换后的cs并不能由Golang的GC所管理,必须手动释放cs所占用的内存,即必须显示调用C.free. // Golang中的字符串. s := \"hello cgo\\n\" // 转换为C的字符串. cs := C.String(s) // 传给C函数Print使用. C.Print(cs) // 显示释放cs的内存. C.free(cs) ","date":"2021-07-24","objectID":"/2021/07/24/golang-c-call/:3:0","tags":["go","cgo"],"title":"Golang与C语言相互调用","uri":"/2021/07/24/golang-c-call/"},{"categories":["Golang"],"content":"C中struct的定义 第一种方式 定义如下结构体. // 品种编码结构. struct ESunnyCommodity { char* ExchangeNo; ///\u003c 交易所编码 char CommodityType; ///\u003c 品种类型 char* CommodityNo; ///\u003c 品种编号 }; 若在其它结构体或函数中引用了上述ESunnyCommodity结构体,则必须显示申明为struct. // 品种信息. 
struct ESunnyCommodityInfo { struct ESunnyCommodity Commodity; ///\u003c 品种 char* CommodityName; ///\u003c 品种名称,GBK编码格式 char* CommodityEngName; ///\u003c 品种英文名称 double ContractSize; ///\u003c 每手乘数 double CommodityTickSize; ///\u003c 最小变动价位 int CommodityDenominator; ///\u003c 报价分母 char CmbDirect; ///\u003c 组合方向 int CommodityContractLen; ///\u003c 品种合约年限 char IsDST; ///\u003c 是否夏令时,'Y'为是,'N'为否 struct ESunnyCommodity RelateCommodity1; ///\u003c 关联品种1 struct ESunnyCommodity RelateCommodity2; ///\u003c 关联品种2 }; int QryCommodity(struct ESunnyCommodityInfo** info, int* len); 在上述引用的两个例子中,必须在ESunnyCommodityInfo前面显示说明为struct.否则在编译Golang代码时会报错error: unknown type name 'ESunnyCommodityInfo' 第二种方式 结构体采用如下方式定义. // 品种编码结构. typedef struct { char* ExchangeNo; ///\u003c 交易所编码 char CommodityType; ///\u003c 品种类型 char* CommodityNo; ///\u003c 品种编号 }ESunnyCommodity; 其它结构体或函数可以直接引用ESunnyCommodity. // 品种信息. struct ESunnyCommodityInfo { ESunnyCommodity Commodity; ///\u003c 品种 char* CommodityName; ///\u003c 品种名称,GBK编码格式 char* CommodityEngName; ///\u003c 品种英文名称 double ContractSize; ///\u003c 每手乘数 double CommodityTickSize; ///\u003c 最小变动价位 int CommodityDenominator; ///\u003c 报价分母 char CmbDirect; ///\u003c 组合方向 int CommodityContractLen; ///\u003c 品种合约年限 char IsDST; ///\u003c 是否夏令时,'Y'为是,'N'为否 ESunnyCommodity RelateCommodity1; ///\u003c 关联品种1 ESunnyCommodity RelateCommodity2; ///\u003c 关联品种2 }; int QryCommodity(ESunnyCommodityInfo** info, int* len); ","date":"2021-07-24","objectID":"/2021/07/24/golang-c-call/:4:0","tags":["go","cgo"],"title":"Golang与C语言相互调用","uri":"/2021/07/24/golang-c-call/"},{"categories":["Golang"],"content":"Golang中引用C语言 头文件和库的引用 package wrapper /* #cgo CFLAGS: -I../esunny/include #cgo LDFLAGS: -L../esunny/lib -lesunny -lTapQuoteAPI #include \u003cstdlib.h\u003e #include \"export.h\" #include \"tap.h\" */ import \"C\" import ( \"encoding/json\" \"errors\" \"quote/source/tap/config\" \"time\" \"unsafe\" \"github.com/tal-tech/go-zero/core/logx\" \"github.com/tal-tech/go-zero/core/threading\" ) 如上所述,在package下面通过注释的方式引入C. #cgo CFLAGS: -I../esunny/include,表示引用的C头文件的目录. #cgo LDFLAGS: -L../esunny/lib -lesunny -lTapQuoteAPI,表示引用的C动态库的路径及名字. #include \u003cstdlib.h\u003e,引入系统头文件,如果引用了函数C.free,就需要该头文件. #include \"export.h\",表示引用的动态库对应的头文件. import \"C\",必须紧跟在注释后面,中间不能有空行,否则会报错. 调用C函数 func (api *ESApi) SubscribeTick() error { for i := range api.contractDatas { // 把golang的string类型转换为C的char* cExchangeNo := C.CString(api.contractDatas[i].Contract.ExchangeNo) cCommoditNo := C.CString(api.contractDatas[i].Contract.CommodityNo) cContractNo := C.CString(api.contractDatas[i].Contract.ContractNo1) // C.char把golang的byte转换为C的char ret := C.SubscribeTick(cExchangeNo, cCommoditNo, C.char(api.contractDatas[i].Contract.CommodityType), cContractNo) // 调用C.free释放内存. C.free(unsafe.Pointer(cExchangeNo)) C.free(unsafe.Pointer(cCommoditNo)) C.free(unsafe.Pointer(cContractNo)) if ret != 0 { logx.Errorf(\"subscrbe tick failed: %d, exchangeNo: %s, commoditNo: %s, contractNo: %s, commoditType: %s\", ret, api.contractDatas[i].Contract.ExchangeNo, api.contractDatas[i].Contract.CommodityNo, api.contractDatas[i].Contract.ContractNo1, string(api.contractDatas[i].Contract.CommodityType)) continue } logx.Infof(\"subscrbe tick successfully, exchangeNo: %s, commoditNo: %s, contractNo: %s, commoditType: %s\", api.contractDatas[i].Contract.ExchangeNo, api.contractDatas[i].Contract.CommodityNo, api.contractDatas[i].Contract.ContractNo1, string(api.contractDatas[i].Contract.CommodityType)) } return nil } 访问C的结构体数组 C函数原型: /** * 查询品种信息,在外部要释放info对应的内存. 
* @param out info 品种信息数组. * @param out len 数组长度. * * @return 错误码,0-表示成功. */ int QryCommodity(struct ESunnyCommodityInfo** info, int* len); Golang调用代码: func (api *ESApi) QryCommodity() error { // 查询品种. api.commodityDatas = nil // 定义C中结构体ESunnyCommodityInfo的指针变量. var commodityInfo *C.struct_ESunnyCommodityInfo // 定义C中的int变量 var commodityLen C.int // 调用C函数. ret := C.QryCommodity(\u0026commodityInfo, \u0026commodityLen) if ret != 0 { logx.Errorf(\"qry commodity err: %d\", ret) return errors.New(\"query commodity failed\") } // 把C中的int变量强制类型转换为golang中的int. cLen := int(commodityLen) // 强制转换为uintptr,后续需要做指针运算. commodityPointer := uintptr(unsafe.Pointer(commodityInfo)) api.commodityDatas = make([]SourceCommodityInfo, 0, cLen) for i := 0; i \u003c cLen; i++ { // 强制转换为C的结构体对象. info := *(*C.struct_ESunnyCommodityInfo)(unsafe.Pointer(commodityPointer)) // 指针运算,指向下一个结构体对象. commodityPointer += unsafe.Sizeof(info) data := SourceCommodityInfo{ Commodity: SourceCommodity{ // 把C的char*转换为golang的string类型. ExchangeNo: C.GoString(info.Commodity.ExchangeNo), ///\u003c 交易所编码 // 把C的char转换为golang的byte类型. CommodityType: byte(info.Commodity.CommodityType), ///\u003c 品种类型 CommodityNo: C.GoString(info.Commodity.CommodityNo), ///\u003c 品种编号 }, ///\u003c 品种 CommodityName: C.GoString(info.CommodityName), ///\u003c 品种名称,GBK编码格式 CommodityEngName: C.GoString(info.CommodityEngName), ///\u003c 品种英文名称 // 把C的double转换为golang的float64类型. ContractSize: float64(info.ContractSize), ///\u003c 每手乘数 CommodityTickSize: float64(info.CommodityTickSize), ///\u003c 最小变动价位 // 把C的int转换为golang的int类型. CommodityDenominator: int(info.CommodityDenominator), ///\u003c 报价分母 CmbDirect: byte(info.CmbDirect), ///\u003c 组合方向 CommodityContractLen: int(info.CommodityContractLen), ///\u003c 品种合约年限 IsDST: byte(info.IsDST), ///\u003c 是否夏令时,'Y'为是,'N'为否 RelateCommodity1: SourceCommodity{ ExchangeNo: C.GoString(info.RelateCommodity1.ExchangeNo), ///\u003c 交易所编码 CommodityType: byte(info.RelateCommodity1.CommodityType), ///\u003c 品种类型 CommodityNo: C.GoString(info.RelateCommodity1.CommodityNo), ///\u003c 品种编号 }, ///\u003c 关联品种1 RelateCommodity2: SourceCommodity{ ExchangeNo: C.GoString(info.RelateCommodity2.ExchangeNo), ///\u003c 交易所编码 CommodityType: byte(info.","date":"2021-07-24","objectID":"/2021/07/24/golang-c-call/:5:0","tags":["go","cgo"],"title":"Golang与C语言相互调用","uri":"/2021/07/24/golang-c-call/"},{"categories":["Golang"],"content":"C语言中引用Golang 易盛行情API中涉及实时行情推送,需要有对应的回调函数. C中定义回调函数原型: #ifdef __cplusplus extern \"C\" { #endif /** * 外部函数,仅定义原型,接收tick数据回调. * @param in info tick数据. * */ extern void OnTick(struct ESunnyTickWhole* info); /** * 外部函数,仅定义原型,api断开连接回调. * @param in errorCode 错误码. * */ extern void OnDisconnect(int errorCode); #ifdef __cplusplus } #endif Golang中函数实现: //export OnTick func OnTick(info *C.struct_ESunnyTickWhole) { } //export OnDisconnect func OnDisconnect(errorCode C.int) { } 如上,采用//export方式来申明函数是导出的.注意//和export之间不能有空格. ","date":"2021-07-24","objectID":"/2021/07/24/golang-c-call/:6:0","tags":["go","cgo"],"title":"Golang与C语言相互调用","uri":"/2021/07/24/golang-c-call/"},{"categories":["Golang"],"content":"字符编码 计算机世界中只能识别二进制,所有的信息最终都会表示成一个二进制的字符串,每一个二进制位有0和1两种状态. 当在计算机中存储字符’A’时,是如何存储的?在读取时又是如何还原为字符’A’的列?比如可以把’A’存储为0100 0001,然后在读取时把0100 0001还原为’A',0100 0001与’A’的对应关系应该是唯一的,这种唯一映射规则就是编码.通过这种编码规则可以把字符映射到唯一的一种状态(二进制字符串). 而最早出现的编码规则是ASCII码,在ASCII编码规则中,字符’A’对应的是0100 0001. 
","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:1:0","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Golang"],"content":"ASCII码 在上世纪60年代,美国制定了一套字符编码,对英语字符与二进制位之间的关系,做了统一规定,这就是ASCII编码. ASCII码规定了128个字符的编码,比如空格SPACE是32(二进制00100000),字母’A’是65(二进制0100 0001).这128个字符(包括32个不能打印出来的控制符号),只占用一个字节的后7位,最前面一位统一为0. ","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:1:1","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Golang"],"content":"UNICODE 英语用128个符号编码足够了,但世界上还有很多其它语言,128是远远不够的,如在法语中,字母上方有注音符号,它就无法用ASCII码表示.后来ISO组织制定了一些扩展规则,利用字节中闲置的最高位编入新的符号.如法语中é的编码为130(二进制10000010),这样可以表示最多256个符号. 至于亚洲国家的文字,使用的符号就更多了,汉字就多达10万左右.使用一个字节是远远不够的,必须使用多个字节来表示一个符号.比如后来的GB2312、BIG5、GBK编码规则就是定义的多个字节来表述中文符号. 不同的国家/组织采用不同的编码规则,导致世界上存在很多编码规则,同一个二进制串可以被解释成不同的符号.因此,在打开一个文件时就必须知道它的编码方式,否则用错误的编码方式去解读,就会出现乱码.以前的邮件经常出现乱码,就是因为发件人和收件人使用的编码方式不一样. 就需要有一种编码,将世界上所有的符号都纳入其中,每一个符号都给予独一无二的编码,那么乱码就会消失.这就是UNICODE,现在的规模可以容纳100多万个符号.每个符号的编码不一样,具体的符号对应表,可以查询unicode.org或专门的汉字对应表 Unicode是指一张表,里面包含了可能出现的所有字符,每个字符对应一个数字,这个数字称为码点(Code Point),如字符’H’的码点为72(十进制),字符’李’的码点为26446(十进制).Unicode表包含了1114112个码点,即从000000(十六进制) - 10FFFF(十六进制).地球上所有字符都可以在Unicode表中找到对应的唯一码点.点击这里,可查询字符对应的码点. 需要注意,Unicode只是个符号集,它只规定了符号的二进制代码,却没有规定这个二进制代码应该如何存储 比如汉字周的Unicode码点是5468,用二进制表示为101 0100 0110 1000,总共有15位,这个符号至少需要两个字节.表示其它更大的符号,可能需要3个字节或4个字节.这里引申出两个严重的问题: 计算机怎么知道两个字节表示一个符号,而不是分别表示两个符号? 英文字母只需要一个字节就可以表示,如果Unicode统一规定,每个符号用三个字节或四个字节表示,那每个英文字母前面必然有二到三个字节都是0,这对于存储是极大的浪费. ","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:1:2","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Golang"],"content":"UTF-8 互联网的普及,强烈要求出现一种统一的编码方式.UTF-8就是在互联网上使用最广的一种Unicode的实现方式.其它实现方式还包括UTF-16(字符用两个字节或四个字节表示)和UTF-32(字符用四个字节表示),不过在互联网上基本不用. 注意:UTF-8是Unicode的实现方式之一 UTF-8最大的一个特点: 就是它是一种变长的编码方式.它可以使用1~4个字节表示一个符号,根据不同的符号而变化字节长度. UTF-8的编码规则很简单,只有二条: 对于单字节的符号,字节的第一位设为0,后面7位为这个符号的Unicode码.因此对于英语字母,UTF-8编码和ASCII码是相同的. 对于n字节的符号(n \u003e 1),第一个字节的前n位都设为1,第n+1位设为0,后面字节的前两位一律设为10.剩下的没有提及的二进制位,全部为这个符号的Unicode码. 规则如下: Unicode符号范围(十六进制) UTF-8编码方式(二进制) 0000 0000-0000 007F(0-127) 0xxxxxxx 0000 0080-0000 07FF(128-2047) 110xxxxx 10xxxxxx 0000 0800-0000 FFFF(2048-65535) 1110xxxx 10xxxxxx 10xxxxxx 0001 0000-0010 FFFF(65536-111411) 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx 根据上表,解读UTF-8编码非常简单.如果一个字节的第一位是0,则这个字节单独就是一个字符;如果第一位是1,则连续有多少个1,就表示当前字符占用多少个字节. 下面以汉字周为例,Unicode码点为5468(101 0100 0110 1000),根据上表,5468处在第三行的范围内(0000 0800-0000 FFFF),因此需要三个字节,即格式为1110xxxx 10xxxxxx 10xxxxxx,然后,从周的最后一个二进制位开始,依次从后向前填入格式中的x,多出的位补0.这样周的UTF-8编码为1110 0101 10 010001 10 101000,转换为十六进制就是E5 91A8 如下代码,就会打印出e591a8 name := \"周\" fmt.Printf(\"%x\", name) ","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:1:3","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Golang"],"content":"BOM头 对于UTF-16和UTF-32编码方式,是采用多字节编码,计算机就需要知道其顺序,如字符A的码点是65(十进制),十六进制为41,根据UTF-16来编码时会使用两个字节,但计算机是以字节为单位来存储的,那这两个字节应该表示为0x0041还是表示为0x4100?这就引出了字节序的问题,需要依赖BOM(Byte Order Mark)机制来解决. 若为0x0041表示采用了大端序(Big endian),而为0x4100表示采用了小端序(Little endian). 在UCS(Universal Multiple-Octet Coded Character Set,属于ISO组织,与unicode基本保持一致)编码中有一个叫做Zero Width No-Break Space(零宽无间断间隔)的字符,它的编码是FEFF.规范建议在传输字节流之前,先传输该字符,若收到FEFF就表示是Big endian的,若收到FFFE就表示是Little endian.因此该字符又被称为BOM. 注意字节顺序这个概念对UTF-8来说是没有意义的,因为在UTF-8编码中,其自身已经带了控制信息,如1110xxxx 10xxxxxx 10xxxxxx 10xxxxxx,其中1110就起到了控制作用,所以不需要额外的BOM机制. 但windows在保存UTF-8编码的文件时,会在文件开始的地方自动插入BOM,即三个字符0xEF 0xBB 0xBF(字符Zero Width No-Break Space的UTF-8编码是 EF BB BF).windows是利用BOM来标记文本文件的编码方式. 
","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:1:4","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Golang"],"content":"string ","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:2:0","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Golang"],"content":"原理 内建类型 代码在src/builtin/builtin.go // string is the set of all strings of 8-bit bytes, conventionally but not // necessarily representing UTF-8-encoded text. A string may be empty, but // not nil. Values of string type are immutable. type string string string是8字节的集合,通常但并不一定是utf8编码的文本.可以为空,但不能为nil.且不可修改. 底层结构 代码在src/runtime/string.go里面 type stringStruct struct { str unsafe.Pointer len int } str: 指针,指向存储实际字符串的地址. len: 字符串的长度,在代码中可以使用函数len()来获取该字段的值,注意:是指实际的字节数,而不是字符数. 可以采用如下方式来输出字符串底层结构字段的值 type stringStruct struct { str unsafe.Pointer len int } func main() { s := \"hello world\" fmt.Println(*(*stringStruct)(unsafe.Pointer(\u0026s))) } 输出结果为:{0x4bbad0 11} runtime.stringStruct是非导出的,不能直接在外部使用,所有在代码中定义一个stringStruct结构,与runtime.stringStruct的字段保持一样. 构建 先根据字符串构建stringStruct,再转换为string.代码在src/runtime/string.go里面 //go:nosplit func gostringnocopy(str *byte) string { // 根据字符串地址构建string ss := stringStruct{str: unsafe.Pointer(str), len: findnull(str)} // 先构造stringStruct s := *(*string)(unsafe.Pointer(\u0026ss)) // 转换成string return s } ","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:2:1","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Golang"],"content":"索引和遍历 在使用for-range循环来遍历字符串时需要注意,for-range对多字节编码有特殊的支持. func main() { s := \"hi 高盈\" for index, c := range s { fmt.Printf(\"%d %c\\n\", index, c) } } 上面代码输出: 0 h 1 i 2 3 高 6 盈 可以看出:遍历时是按照字符来循环的(如果有多字节字符,就会导致索引不连续). ","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:2:2","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Golang"],"content":"string和[]byte类型转换 标准转换 []byte的底层结构如下: type slice struct { array unsafe.Pointer len int cap int } string -\u003e []byte: 语法为[]byte(str). []byte -\u003e string: 语法为string(bs). string转[]byte的实现在src/runtime/string.go中 // The constant is known to the compiler. // There is no fundamental theory behind this number. const tmpStringBufSize = 32 type tmpBuf [tmpStringBufSize]byte func stringtoslicebyte(buf *tmpBuf, s string) []byte { var b []byte if buf != nil \u0026\u0026 len(s) \u003c= len(buf) { *buf = tmpBuf{} b = buf[:len(s)] } else { b = rawbyteslice(len(s)) // 当长度超过了32,需要新申请一块内存. } copy(b, s) return b } // rawbyteslice allocates a new byte slice. The byte slice is not zeroed. func rawbyteslice(size int) (b []byte) { cap := roundupsize(uintptr(size)) p := mallocgc(cap, nil, false) if cap != uintptr(size) { memclrNoHeapPointers(add(p, uintptr(size)), cap-uintptr(size)) } *(*slice)(unsafe.Pointer(\u0026b)) = slice{p, size, int(cap)} return } 注意: 当s的长度大于32时,需要调用mallocgc分配一块新的内存来存放[]byte.在转换时,如果字符串比较长(字节数超过32),标准转换方式会有一次内存分配的操作. []byte转string的实现也在src/runtime/string.go中 // slicebytetostring converts a byte slice to a string. // It is inserted by the compiler into generated code. // ptr is a pointer to the first element of the slice; // n is the length of the slice. // Buf is a fixed-size buffer for the result, // it is not nil if the result does not escape. func slicebytetostring(buf *tmpBuf, ptr *byte, n int) (str string) { if n == 0 { // Turns out to be a relatively common case. 
// Consider that you want to parse out data between parens in \"foo()bar\", // you find the indices and convert the subslice to string. return \"\" } if raceenabled { racereadrangepc(unsafe.Pointer(ptr), uintptr(n), getcallerpc(), funcPC(slicebytetostring)) } if msanenabled { msanread(unsafe.Pointer(ptr), uintptr(n)) } if n == 1 { p := unsafe.Pointer(\u0026staticuint64s[*ptr]) // 当长度为1时,使用静态数组staticuint64s,其保存了从0x00-\u003e0xFF之间的数字. if sys.BigEndian { p = add(p, 7) } stringStructOf(\u0026str).str = p stringStructOf(\u0026str).len = 1 return } var p unsafe.Pointer if buf != nil \u0026\u0026 n \u003c= len(buf) { p = unsafe.Pointer(buf) } else { p = mallocgc(uintptr(n), nil, false) // 长度超过32时,需要新申请一块内存. } stringStructOf(\u0026str).str = p stringStructOf(\u0026str).len = n memmove(p, unsafe.Pointer(ptr), uintptr(n)) return } 强转换 string和[]byte的底层结构是很类似的,可以通过unsafe和reflect包来强制转换 func String2Bytes(str string) []byte { sh := (*reflect.StringHeader)(unsafe.Pointer(\u0026str)) bh := reflect.SliceHeader{ Data: sh.Data, Len: sh.Len, Cap: sh.Len, } return *(*[]byte)(unsafe.Pointer(\u0026bh)) } func Bytes2String(b []byte) string { return *(*string)(unsafe.Pointer(\u0026b)) } 性能比较 import \"testing\" func Benchmark_NormalBytes2String(b *testing.B) { x := []byte(\"Going International Group\") for i := 0; i \u003c b.N; i++ { _ = string(x) } } func Benchmark_Bytes2String(b *testing.B) { x := []byte(\"Going International Group\") for i := 0; i \u003c b.N; i++ { _ = Bytes2String(x) } } func Benchmark_NormalString2Bytes(b *testing.B) { x := \"Going International Group\" for i := 0; i \u003c b.N; i++ { _ = []byte(x) } } func Benchmark_String2Bytes(b *testing.B) { x := \"Going International Group\" for i := 0; i \u003c b.N; i++ { _ = String2Bytes(x) } } func Benchmark_NormalBytes2String1(b *testing.B) { x := []byte(\"Going International Group, Going International Group, Going International Group, Going International Group, Going International Group\") for i := 0; i \u003c b.N; i++ { _ = string(x) } } func Benchmark_Bytes2String1(b *testing.B) { x := []byte(\"Going International Group, Going International Group, Going International Group, Going International Group, Going International Group\") for i := 0; i \u003c b.N; i++ { _ = Bytes2String(x) } } func Benchmark_NormalString2Bytes1(b *testing.B) { x := \"Going International Group, Going International Group, Going International Group, Going International Group, Going International Group\" for i := 0; i \u003c b.N; i++ { _ = []byte(x) } } func Benchmark_String2Bytes1(b *testing.B) { x := \"Going International Group, Going International Group, Going International Group, Going In","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:2:3","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Golang"],"content":"字符串拼接 +拼接 最常用最简单的就是通过+来拼接字符串 func StringPlus() string { var s string s += \"steven\" + \" zhou\" s += \" works\" + \" at\" s += \" Going International Group\" return s } 使用指令go tool compile -N -l -S main.go来查看具体的实现. 
$ go tool compile -N -l -S main.go \"\".StringPlus STEXT size=269 args=0x10 locals=0x50 funcid=0x0 0x0000 00000 (main.go:7) TEXT \"\".StringPlus(SB), ABIInternal, $80-16 0x0000 00000 (main.go:7) MOVQ (TLS), CX 0x0009 00009 (main.go:7) CMPQ SP, 16(CX) 0x000d 00013 (main.go:7) PCDATA $0, $-2 0x000d 00013 (main.go:7) JLS 259 0x0013 00019 (main.go:7) PCDATA $0, $-1 0x0013 00019 (main.go:7) SUBQ $80, SP 0x0017 00023 (main.go:7) MOVQ BP, 72(SP) 0x001c 00028 (main.go:7) LEAQ 72(SP), BP 0x0021 00033 (main.go:7) FUNCDATA $0, gclocals·2a5305abe05176240e61b8620e19a815(SB) 0x0021 00033 (main.go:7) FUNCDATA $1, gclocals·f207267fbf96a0178e8758c6e3e0ce28(SB) 0x0021 00033 (main.go:7) XORPS X0, X0 0x0024 00036 (main.go:7) MOVUPS X0, \"\".~r0+88(SP) 0x0029 00041 (main.go:8) XORPS X0, X0 0x002c 00044 (main.go:8) MOVUPS X0, \"\".s+56(SP) 0x0031 00049 (main.go:9) XORPS X0, X0 0x0034 00052 (main.go:9) MOVUPS X0, (SP) 0x0038 00056 (main.go:9) MOVQ $0, 16(SP) 0x0041 00065 (main.go:9) LEAQ go.string.\"steven zhou\"(SB), AX 0x0048 00072 (main.go:9) MOVQ AX, 24(SP) 0x004d 00077 (main.go:9) MOVQ $11, 32(SP) 0x0056 00086 (main.go:9) PCDATA $1, $0 0x0056 00086 (main.go:9) CALL runtime.concatstring2(SB) 需要注意\"steven\" + \" zhou\"直接被合并为\"steven zhou\",编译器已经做了优化. +的具体实现是runtime.concatstring2(SB)函数.代码在src/runtime/string.go里 // The constant is known to the compiler. // There is no fundamental theory behind this number. const tmpStringBufSize = 32 type tmpBuf [tmpStringBufSize]byte // concatstrings implements a Go string concatenation x+y+z+... // The operands are passed in the slice a. // If buf != nil, the compiler has determined that the result does not // escape the calling function, so the string data can be stored in buf // if small enough. func concatstrings(buf *tmpBuf, a []string) string { idx := 0 l := 0 count := 0 for i, x := range a { n := len(x) if n == 0 { continue } if l+n \u003c l { throw(\"string concatenation too long\") } l += n count++ idx = i } if count == 0 { return \"\" } // If there is just one string and either it is not on the stack // or our result does not escape the calling frame (buf != nil), // then we can return that string directly. if count == 1 \u0026\u0026 (buf != nil || !stringDataOnStack(a[idx])) { return a[idx] } s, b := rawstringtmp(buf, l) for _, x := range a { copy(b, x) b = b[len(x):] } return s } func concatstring2(buf *tmpBuf, a [2]string) string { return concatstrings(buf, a[:]) } func concatstring3(buf *tmpBuf, a [3]string) string { return concatstrings(buf, a[:]) } func concatstring4(buf *tmpBuf, a [4]string) string { return concatstrings(buf, a[:]) } func concatstring5(buf *tmpBuf, a [5]string) string { return concatstrings(buf, a[:]) } // stringDataOnStack reports whether the string's data is // stored on the current goroutine's stack. func stringDataOnStack(s string) bool { ptr := uintptr(stringStructOf(\u0026s).str) stk := getg().stack return stk.lo \u003c= ptr \u0026\u0026 ptr \u003c stk.hi } func rawstringtmp(buf *tmpBuf, l int) (s string, b []byte) { if buf != nil \u0026\u0026 l \u003c= len(buf) { b = buf[:l] s = slicebytetostringtmp(\u0026b[0], len(b)) } else { s, b = rawstring(l) } return } 内置了concatstring2到concatstring5,但实际上最终都是调用concatstring函数. 先计算所有字符串的累加长度. 如果有效的字符串个数为0,直接返回\"\". 如果只有一个字符串且该字符串不在当前栈空间内或返回结果的字符串没有逃逸,则直接返回该字符串本身. 如果buf不为nil且累计长度没有超过32,直接使用该数组空间.否则就重新分配一块内存. 依次循环拷贝字符串. fmt拼接 使用fmt.Sprint的系列函数来进行拼接,然后返回拼接的字符串. 
func StringFmt() string { return fmt.Sprint(\"steven\", \" zhou\", \" works\", \" at\", \" Going International Group\") } join拼接 使用strings.Join函数进行拼接,接受一个字符串数组,转换为一个拼接好的字符串. func StringJoin() string { s := []string{\"steven\", \" zhou\", \" works\", \" at\", \" Going International Group\"} return strings.Join(s, \"\") } buffer拼接 使用bytes.Buffer来进行拼接,该结构体不止可以拼接字符串,还可以是byte,rune等,且实现了io.Writer接口. func StringBuffer() string { var b bytes.Buffer b.WriteString(\"steven\") b.WriteString(\" zho","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:2:4","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Golang"],"content":"乱码和不可打印字符 如果字符串中出现不合法的utf8编码,打印时对于每个不合法的编码字节都会输出一个特定的符号�: func main() { s := \"高盈\" fmt.Println(s[:5]) } 上面代码输出:高�,由于\"盈\"编码有三个字节,s[:5]只取前两个字节,无法组成一个合法的UTF8字符,故输出该特定符号. 还需要注意不可打印字符,两个字符串打印输出的内容相同,但两个字符串就是不相等: func main() { b1 := []byte{0xEF, 0xBB, 0xBF, 72, 101, 108, 108, 111} b2 := []byte{72, 101, 108, 108, 111} s1 := string(b1) s2 := string(b2) fmt.Println(s1) fmt.Println(s2) fmt.Println(s1 == s2) } 上面代码输出: hello hello false 字符串比较是会比较长度和每个字节的,虽然打印出来是相同的,但b1是带了BOM头的(0xEFBBBF就是BOM头),b2没有带,这种一般会出现在从文件读取数据,一定要注意文件是否带了BOM头. ","date":"2021-07-24","objectID":"/2021/07/24/golang-string/:2:5","tags":["go"],"title":"golang字符串解析","uri":"/2021/07/24/golang-string/"},{"categories":["Microservice"],"content":"简介 一致性算法允许一个集群作为一个整体来工作,并允许集群中的某些节点发生故障,而不影响集群的整体运作.因此一致性算法在构建可靠的大型软件系统中扮演了关键角色. raft是用于管理复制日志的一致性协议,在设计之初就非常注重是否易于理解和易于学习,相比Paxos而言,raft更简单,更易于理解和学习.为此raft分离了一致性协议的关键要素,如leader选举、日志复制、安全性,并强制执行强一致性以减少必须考虑的状态的数量,raft还包括一种新的改变集群成员的机制,它使用重叠的大多数来保证安全. raft相比于其它一致性算法,有如下几个特性: Strong leader: raft最多只有一个leader,日志条目只能从leader流向其它节点,这简化了复制日志的管理. Leader选举: raft使用随机计时器来触发leader选举.在心跳基础上增加少量机制就能实现,同时很简单快速的解决了选举中的冲突问题. 成员变更: raft更改集群中节点的机制使用了一种新的联合一致性方法,其中两种不同配置的大多数在过渡期间会重叠.这允许集群在更改配置期间继续正常运行. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:1:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"复制状态机 复制状态机主要是为了解决分布式系统中的各种容错处理,通常使用复制日志来实现.如下图所示,每个节点存储一个包含一系列命令的日志,其状态机按顺序执行日志中的命令.每个日志中的命令都相同且顺序也一样,因此每个状态机处理相同的命令序列.这样每个节点能得到相同的状态和相同的输出序列. 复制状态机架构复制状态机架构 \" 复制状态机架构 一致性算法的工作就是保证复制日志的一致性.leader节点的一致性模块接收客户端的命令,并将它们添加到日志中.还负责与其它节点的一致性模块通信,以确保每个日志最终以相同的顺序包含相同的命令,即使有些节点下线了.一旦命令被正确复制,每个节点上的状态机按日志顺序处理它们,并将输出返回给客户端.这样就形成高可用的复制状态机. 实际系统中的一致性算法通常具有以下属性: 它们确保在所有非拜占庭条件下(包括网络延迟,分区和数据丢失,重复和乱序)的安全性(不会返回不正确的结果). 只要任何大多数(过半)节点可以运行,且可以相互通信和与客户端通信,一致性算法就可用.如5个节点的典型集群就最多允许2个节点发生故障. 不依赖于时钟来保证日志的一致性:错误的时钟和极端消息延迟可能在最坏的情况下导致性能问题. 在通常情况下,只要集群的大部分(过半节点)已响应了单轮RPC,命令就可以完成,少数(一半以下)慢节点不影响整个系统的性能. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:2:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"算法基本概念 ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:3:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"摘要 如下图是算法的浓缩,可用作参考: raft精简摘要raft精简摘要 \" raft精简摘要 如下图是算法关键的特性: raft关键特性raft关键特性 \" raft关键特性 节点之间是使用RPC进行通信,主要包括: 请求投票(RequestVote)RPC由candidate节点在选举期间发起. 追加条目(AppendEntries)RPC由leader节点发起,用来复制日志和提供一种心跳机制. 为了在节点之间传输快照增加了第三种RPC. 注意:当节点没有及时收到RPC的响应时,会进行重试,且能够并行的发起RPC来获得最佳的性能. 
","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:3:1","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"角色 raft通过选举leader并由leader节点来负责管理日志复制来实现多副本的一致性.raft包含三种角色: leader: 负责接收客户端的请求,并追加到本地日志,然后将日志复制到其它节点,并会告知何时来应用日志是安全的.在状态机应用之后才会有响应. follower: 负责响应来自leader和candidate的请求. cadidate: 选举leader过程中的中间状态. 角色转换如下图: 角色转换图角色转换图 \" 角色转换图 所有节点初始状态都是follower角色. follower在超时时间内没有收到leader的请求则转换为candidate,开始进行选举. candidate在收到大多数阶段的选票后转换为leader. candidate在发现已有leader或收到更高term的请求时转换为follower. leader在收到更高term的请求时转换为follower. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:3:2","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"任期(term) raft把时间切分为一个一个的任期,每个任期都有一个任期ID,采用连续的整数.每个任期都是从选举开始的,若选举成功,选举出来的leader会在这个任期内负责管理集群;若选举失败,即没有成功选出一个leader,会通过超时机制再次开始选举,任期ID也会相应的增加.即一个完整的任期要么选举失败没有leader,要么选举成功只会有一个leader. 任期划分任期划分 \" 任期划分 注意:在不同的节点上观察到的任期转换的次数可能是不一样的,在某些情况下,一个节点可能没有看到leader选举过程或甚至整个任期全程. 任期在raft算法中充当逻辑时钟的作用,这使得节点可以发现一些过去的信息如过时的leader.每个节点都会存储当前的任期ID,该ID随着时间单调递增.节点在通信时会交换任期ID,如果一个节点的当前任期ID比其它节点小,该节点就会把任期ID更新为较大的那个值.如果一个leader节点或candidate节点发现自己的任期ID过期了,就会立即回到follower状态.如果节点接收到一个包含过期的任期ID的请求,会直接拒绝这个请求. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:3:3","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"leader选举 raft主要是使用心跳机制来触发leader选举.服务启动时,都是follower状态,若服务节点能从leader或candidate接收到有效的RPC,就会一直保持follower状态.leader节点会周期性的向所有follower节点发送心跳(不包含日志条目的AppendEntries RPC)来维持自己的地位.若follower节点在选举超时时间内没有收到任何消息,就会认为集群中没有可用的leader,就会增加任期ID并把状态转变为candidate,向集群中的其它节点发送RequestVote RPC,发起新一轮选举. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:4:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"成功当选leader 若candidate获取集群中过半节点针对同一任期的投票,就会赢得选举并成功当选leader.在同一任期内,每个节点只能把票投给一个candidate,按照先来先服务(first-come-first-served)的原则.要求获得过半的投票规则可以确保最多只有一个candidate能赢得此次选举.一旦赢得选举当先leader,就会向其它节点发送心跳消息来确定自己的地位并阻止新的选举. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:4:1","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"其它节点当选为leader 若candidate收到了另一个声称自己是leader的节点发来的AppendEntries RPC: 如果该leader节点的任期ID不小于candidate当前的任期ID,则candidate会承认该leader的合法性并把自身状态变更为follower. 如果该leader节点的任期ID小于candidate当前的任期ID,则candidate会直接拒绝该请求并继续保持candidate状态. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:4:2","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"没有任何获胜者 若本轮选举没有candidate获得了超过半数的投票(有可能是有多个follower同时成为candidate,导致选票被瓜分从而没有candidate赢得过半的投票),每个candidate都会超时,然后通过增加任期ID来触发下一轮新的选举.但如果没有一些其它机制,该情况可能会无限重复,导致脑裂,没法选出leader. raft使用随机选举超时时间的方法来确保很少发生选票被瓜分的情况,选举超时时间是从一个固定的区间(如150-300毫秒)来随机选择.这样可以把节点都分散开以至于大多数情况下只有一个节点会选举超时,然后该节点赢得选举并向其它节点发送心跳.每个candidate在开始一次新的选举时都会重置一个随机的选举超时时间,然后一直等待直到选举超时,这样就减少了在新的选举中再次发生选票被瓜分的情况. 
","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:4:3","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"日志复制 ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:5:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"日志运作方式 leader节点会把客户端的指令作为一个新的条目追加到日志中去,然后并发的发送AppendEntries RPC给其它的节点,让它们复制该日志.当日志被安全的复制后(大多数节点已复制),leader节点将会把该条目应用到状态机中(状态机执行指令),然后把执行的结果返回给客户端.如果follower节点崩溃或运行缓慢,或网络丢包,leader节点会不停的重试AppendEntries RPC(即使已经回复客户端了)直至所有follower节点最终都存储了所有的日志条目. 日志组织方式如下图所示: 日志组织方式日志组织方式 \" 日志组织方式 每个日志条目存储一条状态机指令和leader节点收到该指令时的任期ID.任期ID用来检测多个日志副本之间的不一致情况,每个日志条目都有一个整数索引值来表明它在日志中的位置. 当日志条目被过半的节点复制了,那么该日志条目就会被提交(如上图中的条目7,已被大多数节点复制),这种日志条目被称为已提交的,同时leader节点日志中该日志条目之前的所有日志条目也都会被提交,包括由其它leader节点创建的条目.raft算法保证所有已提交的日志条目都是持久化的且最终会被所有可用的状态机执行.leader节点会追踪已提交的日志条目的最大索引,未来的所有AppendEntries RPC都会包含该索引,这样其它节点才能最终直到哪些日志条目需要被提交.follower节点一旦明确某个日志条目已经被提交,就会把该日志条目应用到本地状态机中(按照日志的顺序). ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:5:1","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"日志的特性 raft维护着以下日志特性: 如果不同节点日志中的两个条目拥有相同的索引和任期ID,那么它们存储了相同的指令. 如果不同节点日志中的两个条目拥有相同的索引和任期ID,那么它们之前的所有日志条目也都相同. leader节点在特定的任期ID内一个日志索引处最多创建一个日志条目,同时日志条目在日志中的位置从来不会改变,这保证了第一条特性.第二条特性是由AppendEntries RPC执行一个简单的一致性检查所保证的.leader节点在发送AppendEntries RPC时会将前一个日志条目的索引位置和任期ID包含在里面,如果follower节点在它的日志中找不到包含相同索引位置和任期ID的条目,那么就会拒绝新的日志条目.leader会通过强制follower复制它的日志来解决不一致的问题,这意味着follower中跟leader冲突的日志条目会被leader的日志条目覆盖. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:5:2","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"日志的一致性机制 如下图展示了在什么情况下follower的日志可能和新的leader的日志不同: 日志不一致情况日志不一致 \" 日志不一致情况 当leader当选成功时,follower可能是(a-f)中的任何情况: follower节点可能缺失了一些日志条目,如(a-b) follower节点可能有一些未提交的日志条目,如(c-d) folloer节点可能缺失也有可能有一些未提交的,如(e-f) 场景f可能是这样发生,f对应的节点在任期2时是leader,追加了一些日志条目到本地日志中,但都还未提交时就崩溃了;该节点很快重启,在任期3重新被选为leader,又追加了一些日志条目到本地日志中,但又未提交.即任期2和任期3里的日志都还没有被提交之前,节点又挂了,且在接下来的几个任期内一直处于宕机状态. leader节点会针对每一个follower节点维护一个nextIndex,表示leader要发送给follower的下一条日志条目的索引.当新leader被选举出来时,会将所有的nextIndex重置为自己最后一个日志条目的index+1(如上图leader会把nextIndex重置为11).如果follower的日志和leader不一样,那么下一次AppendEntries RPC的一致性检查就会失败,在follower拒绝之后,leader就会减小nextIndex值并重试AppendEntries RPC,最终nextIndex会在某个位置使得leader和follower的日志达成一致.这时一致性检查就会成功,将follower中跟leader冲突的日志条目全部删除然后追加leader中的日志条目(如果有追加的话).一旦AppendEntries RPC成功,就表示follower日志和leader的一致,并且在该任期内接下来的时间里保持一致. 通过这种机制,在leader当选后就不需要任何特殊操作来使得日志保持一致性.另外需要注意:leader节点从来不删除或覆盖自己的日志条目,leader具备Append-Only属性. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:5:3","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"安全性 ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:6:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"选举限制 想象下如下场景:集群中一个follower节点可能会进入不可用状态,在此期间leader已提交若干的日志条目.在后续新的选举时,该follower节点当选为leader,会造成什么样的后果? 当follower节点当选为leader时会用自身的日志条目来覆盖其它节点的日志条目,那这样新leader和老leader的状态机就会执行不同的指令序列,在一致性算法中是不能允许出现此种状况的. 
raft在投票时必须保证赢得选举的candidate包含了所有已提交的日志条目.在选举时candidate节点要与集群中过半的节点进行通信,这意味着至少其中一个节点包含了所有已提交的日志条目,若candidate的日志和至少过半的节点的一样新,那么它就一定包含了所有已提交的日志条目.RequestVote RPC中包含了candidate的日志信息,如果其它投票者的日志比candidate的还新,就会拒绝该投票请求. 主要是通过比较最后一条日志条目的索引值和任期ID来定义谁的日志比较新.如果两条日志最后的任期ID不同,则任期ID大的日志更新;如果任期ID相同,则索引值大的那个更新. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:6:1","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"提交老任期内的日志条目 查看如下的情况: leader节点无法判断老的任期内的日志是否已被提交leader节点无法判断老的任期内的日志是否已被提交 \" leader节点无法判断老的任期内的日志是否已被提交 在(a)中,S1是leader,部分复制了索引位置2的日志条目 在(b)中,S1崩溃,S5在新任期内赢得选举成为新leader(来自S3/S4/自身的选票),然后从客户端接收了一条不一样的日志放在索引位置2处. 在(c)中,S5崩溃,S1重新启动在新任期内赢得选举成为新leader,继续复制日志.此时来自任期2的那条日志被复制到集群中的大多数节点上,但还未提交. 在(d)中,S1又崩溃,S5可以被重新选举成功(来自S2/S3/S4的选票),然后覆盖了它们在索引2处的日志. 但在S1又崩溃之前,在S1的新任期内复制了日志条目到大多数节点上,如(e)中,然后这个条目就会被提交(S5后来就不可能选举成功).在这种情况下,之前的所有日志也被提交了. 如图中所描述的,一个已经被存储到过半节点上的老日志条目,仍然可能会被未来的leader覆盖掉.为了消除该问题,raft永远不会通过计算副本数目的方式来提交老任期内的日志条目,只有leader当前任期内的日志条目才通过计算副本数量的方式来提交;一旦当前任期的某个日志条目以这种方式被提交,那么由于日志匹配特性,之前所有的日志条目也都会被间接地提交. raft在提交规则上额外增加了复杂性,当leader复制老任期内的日志条目时,这些日志条目都保留原来的任期号. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:6:2","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"安全性论证 主要讨论leader的完整性特性(Leader Completeness Property),可以先假设leader完整性特性是不满足的,然后推导出矛盾来.假设任期T的leader(leader T)在任期内提交了一个日志条目,但是该日志条目没有被存储在未来某些任期的leader中.假设U是大于T的没有存储该日志条目的最小任期号,如下图所示: S1(任期T的leader)在它的任期内提交了一个新的日志条目,然后S5在之后的任期U里被选举为leader,那么肯定至少会有一个节点,如S3,既接收了来自S1的日志条目,也给S5投票了S1(任期T的leader)在它的任期内提交了一个新的日志条目,然后S5在之后的任期U里被选举为leader,那么肯定至少会有一个节点,如S3,既接收了来自S1的日志条目,也给S5投票了 \" S1(任期T的leader)在它的任期内提交了一个新的日志条目,然后S5在之后的任期U里被选举为leader,那么肯定至少会有一个节点,如S3,既接收了来自S1的日志条目,也给S5投票了 U一定在刚成为leader时就没有那条被提交的日志条目了(leader从不删除或覆盖任何条目). leader T复制该日志条目到集群中的过半节点,同时leader U从集群中的过半节点赢得了选票.因此,至少有一个节点(投票者)同时接受了来自leader T的日志条目和给leader U投票了,该投票者是产生矛盾的关键. 该投票者必须在给leader U投票之前先接受了从leader T发来的已经被提交的日志条目;否则它会拒绝来自leader T的AppendEntries请求(如果先给U投票,此时任期号会比T大) 该投票者在给U投票时依然保有该日志条目,因为任何U、T之间的leader都包含该日志条目(根据上述假设),leader从不删除条目,follower只跟leader有冲突时才会删除条目. 该投票者把自己的选票投给leader U时,leader U的日志必须至少和投票者是一样新的,这就导致以下两个矛盾: 如果该投票者和leader U的最后一个日志的任期号相同,那么leader U的日志至少和投票者的一样长,所以leader U的日志一定包含该投票者日志中的所有日志.这是一个矛盾,在上述假设中,leader U是不包含的. 否则leader U的最后一个日志的任期号就必须比投票者的大.此外该任期号也比T大,因为该投票者的最后一个日志条目的任期号至少和T一样大(它包含了来自任期T的已提交日志).leader U之前的leader一定已经包含了该已被提交的日志条目(根据上述假设,leader U是第一个不包含该日志条目的leader).所以,根据日志匹配特性,leader U一定也包含该已被提交的日志条目,这里产生了矛盾. 因此所有比T大的任期的leader一定都包含了任期T中提交的所有日志条目. 日志匹配特性保证了未来的leader也会包含被间接提交的日志条目. 通过Leader Completeness特性,就能证明状态机的安全特性,即如果某个节点已经将某个给定的索引处的日志条目应用到自己的状态机里了,那么其它的节点就不会在相同的索引处应用一个不同的日志条目.在一个节点应用一个日志条目到自己状态机中时,它的日志和leader的日志从开始到该条日志条目都相同,并且该日志条目必须被提交.某节点在某个任期中某个特定的索引处应用了一个日志条目,日志完整性特性保证拥有更高任期号的leader会存储相同的日志条目,所以之后任期里节点应用该索引处的日志条目也会是相同的值.因此,状态机安全特性是成立的. raft要求节点按照日志索引顺序应用日志条目.再加上状态机安全特性,这意味着所有节点都会按照相同的顺序应用相同的日志条目到自己的状态机中. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:6:3","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"Follower和Candidate崩溃 如果follower或candidate崩溃可,那么后续发送给它们的RequestVote和AppendEntries RPC就会失败.raft通过无限的重试来处理这种失败;若崩溃节点重启了,那么这些RPC就会成功地完成.如一个节点在完成一个RPC,但是还没有响应的时候崩溃了,那么它在重启之后会再次收到相同的请求.raft的RPCs都是幂等的,所以重试不会造成任何问题.例如一个follower如果收到AppendEntries请求但是它的日志中已经包含了这些日志条目,它会直接忽略这个新的请求中的这些日志条目. 
","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:7:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"定时和可用性 raft的要求之一就是安全性不能依赖于定时:整个系统不能因为某些事件运行得比预期快一点或慢一点就产生错误的结果.但是,可用性(系统能及时响应客户端)又不可避免的要依赖于定时.例如:当有节点崩溃时,消息交换的时间就会比正常情况下长,candidate将不会等待太长的时间赢得选举;没有一个稳定的leader,raft将无法工作. leader选举是raft中定时最为关键的方面.只要整个系统满足下面的时间要求,raft就可以选举出并维持一个稳定的leader: 广播时间(broadcastTime) « 选举超时时间(electionTimeout) « 平均故障间隔时间(MTBF) 广播时间是指一个节点并行发送RPCs给集群中所有的其它服务器并接收到响应的平均时间.广播时间必须比选举超时时间小一个量级,这样leader才能可靠的发送心跳消息来阻止follower开始进入选举状态;再加上随机化选举超时时间的方法,这个不等式也使得选票瓜分的情况变得不可能.选举超时时间需要比平局故障间隔时间小上几个数量级,这样整个系统才能稳定地运行.当leader崩溃后,整个系统会有大约选举超时时间不可用. raft的RPCs需要接收方将信息持久化保存到稳定的存储中去,所以广播时间大约是0.5毫秒到20毫秒之间,具体取决于存储的技术.选举超时时间可能需要在10毫秒到500毫秒之间. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:8:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"集群成员变更 ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:9:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"两个独立的大多数 为了使配置变更机制能够安全,在转换的过程中不能够存在任何时间点使得同一任期里可能选出两个leader.任何节点直接从旧配置转换到新配置的方案都是不安全的,一次性转换所有节点是不可能的,所以在转换期间可能划分成两个独立的大多数,如下图所示: 两个独立的大多数两个独立的大多数 \" 两个独立的大多数 在上图中,集群从3个节点变为5个.这样存在一个时间点,同一任期内两个不同的leader会被选举出来,一个获得旧配置里的过半节点的投票,一个获得新配置里的过半节点的投票.Server 1在旧配置中获得自身的和Server 2的选票而当选,Server 5在新配置中获得自身的、Server 4和Server 3的选票而当选. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:9:1","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"联合一致方案 在raft中,集群先切换到一个过渡的配置,称为联合一致(joint consensus),一旦联合一致已经被提交,则系统就切换到新的配置上,联合一致结合了老配置和新配置: 日志条目被复制给集群中新、老配置的所有节点. 新、旧配置的节点都可以成为leader. 达成一致(针对选举和提交)需要分别在两种配置上获得过半的支持. 联合一致允许独立的节点在不妥协安全性的前提下,在不同的时刻进行配置转换过程.此外,联合一致允许集群在配置变更期间依然响应客户端请求. 联合一致配置过程联合一致配置过程 \" 联合一致配置过程 集群配置在复制日志中以特殊的日志条目来存储和通信.如上图所示,当一个leader接收到一个改变配置从C-old到C-new的请求,它就为联合一致将该配置(图中的C-old,new)存储为一个日志条目,并以前面描述的方式复制该日志.一旦某个节点将该新配置日志条目增加到自己的日志中,它就会用该配置来做出未来所有的决策(节点总是使用它日志中最新的配置,无论配置日志是否已经被提交).这意味着leader会使用C-old,new的规则来决定C-old,new的日志条目是什么时候被提交的.如果leader崩溃了,新leader可能是在C-old配置也可能是在C-old,new配置下选出来的,这取决于赢得选举的candidate是否已经收到了C-old,new配置.在任何情况下,C-new在这一时期不能做出单方面的决定. 一旦C-old,new被提交,那么C-old和C-new都不能在没有得到对方认可的情况下做出决定,并且leader完整性特性保证了只有拥有C-old,new日志条目的节点才能被选举为leader.现在leader创建一个描述C-new配置的日志条目并复制到其它节点就是安全的了.此外,新配置被节点收到后就会立即生效.当新的配置在C-new的规则下被提交,旧配置就会变得无关紧要,同时不使用新配置的节点就可以被关闭了.如上图所示,任何时刻C-old和C-new都不能单方面做出决定,这保证了安全性. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:9:2","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"配置变更需要解决的问题 新的节点开始时可能没有存储任何日志条目 当新节点以这种状态加入到集群中,它们需要一段时间来追赶上其它节点,这段它们无法提交新的日志条目.为了避免因此而造成的系统短时间的不可用,raft在配置变更阶段引入一个额外的阶段,在该阶段,新节点以没有投票权身份加入到集群中来(leader也复制日志给它们,但在考虑过半的时候不用考虑它们).一旦新节点追赶上集群中其它节点,配置变更就可以按上述描述的方式进行. 集群的leader可能不是新配置中的一员 在这种情况下,leader一旦提交了C-new日志条目就会退位(回到follower状态).这意味着有一段时间(leader提交C-new期间),leader管理这一个不包括自己的集群;它复制过半日志但不把自己算在过半里.leader转换发生在C-new被提交的时候,因为这是新配置可以独立运转的最早时刻(将总能在C-new配置下选出新的leader).在此之前可能只能从C-old中选出leader. 那些被移除的节点(不在C-new中)可能会扰乱集群 这些节点将不会再接收到心跳,所以当选举超时,它们就会进行新的选举过程.会发送新的任期号的RequestVote RPCs,这样会导致leader回到follower状态.新的leader会被再次选举出来,但被移除的节点会再次超时,然后这个过程会再次重复,导致系统可用性很差. 
为了防止这种问题,当节点认为当前leader存在时,节点会忽略RequestVote RPCs.特别的,当节点在最小选举超时时间内收到一个RequestVote RPC,它不会更新任期号或投票.这不会影响正常的选举,每个节点在开始一次选举之前,最少会等待最小选举超时时间.相反,这有利于避免被移除的节点的扰乱,如果leader能够发送心跳给集群,那它就不会被更大的任期号废黜. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:9:3","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"日志压缩 raft的日志会随着客户端请求不断的增长,在实际系统中,日志不能无限的增长.随着日志越长,占用的空间会越来越多,且需要花费更多的时间来进行回放.如果没有一定的机制来清除日志中积累的过期信息,最终会带来可用性问题. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:10:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"快照技术 整个当前系统的状态都以快照的形式持久化到稳定的存储中,该时间点之前的日志全部丢弃. raft快照技术raft快照技术 \" raft快照技术 如上图所示,一个节点用一个新快照替代了它日志中已提交了的条目(索引1到5),该快照只存储了当前的状态(变量x和y的值).快照的last included index和last included term被保存来定位日志中条目6之前的快照. 每个节点独立的创建快照,快照只包含自己日志中已被提交的条目.主要的工作是状态机将自己的状态写入到快照中.raft快照中也包含了少量的元数据:the last included index指的是最后一个被快照取代的日志条目的索引值(状态机最后应用的日志条目),the last included term指的是该条目的任期号.保留这些元数据是为了支持快照之后第一个条目的AppendEntries一致性检查,需要之前的索引号和任期号.为了支持集群成员变更,快照中也包括日志中最新的配置作为last included index.一旦节点完成写快照,就可以删除last included index之前的所有日志条目,包括之前的快照. leader节点有时必须偶尔发送快照给一些落后的follower.通常发生在leader已经丢弃了需要发送给follower的下一条日志条目的时候.leader使用InstallSnapshot RPC来发送快照给太落后的follower,如下图所示: installsnapshot RPCinstallsnapshot \" installsnapshot RPC 当follower收到带有这种RPC的快照时,它必须决定如何处理已经存在的日志条目.通常该快照会包含接收者日志中没有的信息.在这种情况下,follower丢弃它所有的日志;这些会被该快照所取代,且可能一些没有提交的条目和该快照产生冲突.如果接收到的快照是自己日志的前面部分(由于网络重传或错误),那些被快照包含的条目将会被全部删除,但是快照之后的条目仍然有用并保留. ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:10:1","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"参考资料 raft算法 raft论文翻译 ","date":"2021-01-05","objectID":"/2021/01/05/raft-consensus-algorithm/:11:0","tags":["raft"],"title":"分布式一致性算法:Raft","uri":"/2021/01/05/raft-consensus-algorithm/"},{"categories":["Microservice"],"content":"简介 etcd是强一致性的,分布式KV存储,为分布式系统或集群提供可靠的数据存储.底层基于raft共识算法,可以在网络分区情况下正常进行leader选举,即便是leader节点出现故障.满足CAP中的CP. ","date":"2020-12-23","objectID":"/2020/12/23/etcd-operation-guide/:1:0","tags":["go","etcd"],"title":"Etcd操作指南","uri":"/2020/12/23/etcd-operation-guide/"},{"categories":["Microservice"],"content":"部署 ","date":"2020-12-23","objectID":"/2020/12/23/etcd-operation-guide/:2:0","tags":["go","etcd"],"title":"Etcd操作指南","uri":"/2020/12/23/etcd-operation-guide/"},{"categories":["Microservice"],"content":"基于源码 编译 # 下载源码. git clone [email protected]:etcd-io/etcd.git # 编译,可执行文件放在bin目录下. cd etcd make build 集群方式运行 在本机上起3个etcd实例,放在如下目录: # sky @ sky-HP in ~/data/etcdcluster [11:10:09] $ ll 总用量 23M drwxr-xr-x 3 sky sky 4.0K 12月 20 11:09 etcd1 drwxr-xr-x 3 sky sky 4.0K 12月 20 11:09 etcd2 drwxr-xr-x 3 sky sky 4.0K 12月 20 11:09 etcd3 -rwxr-xr-x 1 sky sky 23M 12月 19 12:51 etcdctl -rwxr-xr-x 1 sky sky 2.0K 12月 20 11:07 start.sh # sky @ sky-HP in ~/data/etcdcluster [11:28:14] $ cd etcd1/ # sky @ sky-HP in ~/data/etcdcluster/etcd1 [11:29:47] $ ll 总用量 29M drwxr-xr-x 3 sky sky 4.0K 12月 20 11:07 data.etcd -rwxr-xr-x 1 sky sky 29M 12月 19 12:51 etcd 用脚本start.sh来启动etcd实例. 
#!/bin/bash TOKEN=token-1 CLUSTER_STATE=new NAME_1=etcd-1 NAME_2=etcd-2 NAME_3=etcd-3 HOST_1=192.168.3.5 HOST_2=192.168.3.5 HOST_3=192.168.3.5 PEER_PORT_1=2380 PEER_PORT_2=3380 PEER_PORT_3=4380 CLIENT_PORT_1=2379 CLIENT_PORT_2=3379 CLIENT_PORT_3=4379 CLUSTER=${NAME_1}=http://${HOST_1}:${PEER_PORT_1},${NAME_2}=http://${HOST_2}:${PEER_PORT_2},${NAME_3}=http://${HOST_3}:${PEER_PORT_3} nohup ./etcd1/etcd --data-dir=./etcd1/data.etcd --name ${NAME_1} \\ --initial-advertise-peer-urls http://${HOST_1}:${PEER_PORT_1} --listen-peer-urls http://${HOST_1}:${PEER_PORT_1} \\ --advertise-client-urls http://${HOST_1}:${CLIENT_PORT_1} --listen-client-urls http://${HOST_1}:${CLIENT_PORT_1} \\ --initial-cluster ${CLUSTER} \\ --initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN} \u003e etcd1.log 2\u003e\u00261 \u0026 nohup ./etcd2/etcd --data-dir=./etcd2/data.etcd --name ${NAME_2} \\ --initial-advertise-peer-urls http://${HOST_2}:${PEER_PORT_2} --listen-peer-urls http://${HOST_2}:${PEER_PORT_2} \\ --advertise-client-urls http://${HOST_2}:${CLIENT_PORT_2} --listen-client-urls http://${HOST_2}:${CLIENT_PORT_2} \\ --initial-cluster ${CLUSTER} \\ --initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN} \u003e etcd2.log 2\u003e\u00261 \u0026 nohup ./etcd3/etcd --data-dir=./etcd3/data.etcd --name ${NAME_3} \\ --initial-advertise-peer-urls http://${HOST_3}:${PEER_PORT_3} --listen-peer-urls http://${HOST_3}:${PEER_PORT_3} \\ --advertise-client-urls http://${HOST_3}:${CLIENT_PORT_3} --listen-client-urls http://${HOST_3}:${CLIENT_PORT_3} \\ --initial-cluster ${CLUSTER} \\ --initial-cluster-state ${CLUSTER_STATE} --initial-cluster-token ${TOKEN} \u003e etcd3.log 2\u003e\u00261 \u0026 查看集群状态 # sky @ sky-HP in ~/data/etcdcluster [11:32:54] $ export ETCDCTL_API=3 # sky @ sky-HP in ~/data/etcdcluster [11:09:51] $ ENDPOINTS=192.168.3.5:2379,192.168.3.5:3379,192.168.3.5:4379 # sky @ sky-HP in ~/data/etcdcluster [11:10:47] $ ./etcdctl --endpoints=$ENDPOINTS --write-out=table member list +------------------+---------+--------+-------------------------+-------------------------+------------+ | ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER | +------------------+---------+--------+-------------------------+-------------------------+------------+ | 1dd6eece19e43a7e | started | etcd-1 | http://192.168.3.5:2380 | http://192.168.3.5:2379 | false | | 2be980070f5fd239 | started | etcd-2 | http://192.168.3.5:3380 | http://192.168.3.5:3379 | false | | e85f82dc23644097 | started | etcd-3 | http://192.168.3.5:4380 | http://192.168.3.5:4379 | false | +------------------+---------+--------+-------------------------+-------------------------+------------+ # sky @ sky-HP in ~/data/etcdcluster [11:10:57] $ ./etcdctl --endpoints=$ENDPOINTS --write-out=table endpoint status +------------------+------------------+-----------+---------+-----------+------------+-----------+------------+--------------------+--------+ | ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS | +------------------+------------------+-----------+---------+-----------+------------+-----------+------------+--------------------+--------+ | 192.168.3.5:2379 | 1dd6eece19e43a7e | 3.5.0-pre | 25 kB | false | false | 9 | 56 | 56 | | | 192.168.3.5:3379 | 2be980070f5fd239 | 3.5.0-pre | 25 kB | true | false | 9 | 56 | 56 | | | 
192.168.3.","date":"2020-12-23","objectID":"/2020/12/23/etcd-operation-guide/:2:1","tags":["go","etcd"],"title":"Etcd操作指南","uri":"/2020/12/23/etcd-operation-guide/"},{"categories":["Microservice"],"content":"基于Docker docker-compose文件 使用docker-compose来统一部署,基于quay.io/coreos/etcd:v3.4.14镜像. version:\"3.5\"services:etcd1:hostname:etcd1image:quay.io/coreos/etcd:v3.4.14container_name:etcd1restart:unless-stoppedports:- \"2379:2379\"- \"2380:2380\"volumes:- \"./etcd1.data:/etcd_data\"environment:- \"ETCD_ADVERTISE_CLIENT_URLS=http://192.168.20.151:2379\"- \"ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379\"- \"ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380\"- \"ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd1:2380\"- \"ALLOW_NONE_AUTHENTICATION=yes\"- \"ETCD_INITIAL_CLUSTER=etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380\"- \"ETCD_NAME=etcd1\"- \"ETCD_DATA_DIR=/etcd_data\"- \"ETCD_INITIAL_CLUSTER_STATE=new\"- \"ETCD_INITIAL_CLUSTER_TOKEN=token-1\"- \"TZ=Asia/Shanghai\"networks:- etcdclusteretcd2:hostname:etcd2image:quay.io/coreos/etcd:v3.4.14container_name:etcd2restart:unless-stoppedports:- \"12379:2379\"- \"12380:2380\"volumes:- \"./etcd2.data:/etcd_data\"environment:- \"ETCD_ADVERTISE_CLIENT_URLS=http://192.168.20.151:12379\"- \"ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379\"- \"ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380\"- \"ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd2:2380\"- \"ALLOW_NONE_AUTHENTICATION=yes\"- \"ETCD_INITIAL_CLUSTER=etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380\"- \"ETCD_NAME=etcd2\"- \"ETCD_DATA_DIR=/etcd_data\"- \"ETCD_INITIAL_CLUSTER_STATE=new\"- \"ETCD_INITIAL_CLUSTER_TOKEN=token-1\"- \"TZ=Asia/Shanghai\"networks:- etcdclusteretcd3:hostname:etcd3image:quay.io/coreos/etcd:v3.4.14container_name:etcd3restart:unless-stoppedports:- \"22379:2379\"- \"22380:2380\"volumes:- \"./etcd3.data:/etcd_data\"environment:- \"ETCD_ADVERTISE_CLIENT_URLS=http://192.168.20.151:22379\"- \"ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379\"- \"ETCD_LISTEN_PEER_URLS=http://0.0.0.0:2380\"- \"ETCD_INITIAL_ADVERTISE_PEER_URLS=http://etcd3:2380\"- \"ALLOW_NONE_AUTHENTICATION=yes\"- \"ETCD_INITIAL_CLUSTER=etcd1=http://etcd1:2380,etcd2=http://etcd2:2380,etcd3=http://etcd3:2380\"- \"ETCD_NAME=etcd3\"- \"ETCD_DATA_DIR=/etcd_data\"- \"ETCD_INITIAL_CLUSTER_STATE=new\"- \"ETCD_INITIAL_CLUSTER_TOKEN=token-1\"- \"TZ=Asia/Shanghai\"networks:- etcdclusternetworks:etcdcluster:name:etcdcluster 需要注意环境变量ETCD_ADVERTISE_CLIENT_URLS,被配置成了真实IP+映射后的端口,在客户端获取MemberList时会返回该变量的值,客户端会通过该值来访问etcd. 客户端会定时去同步参数,代码详见etcd/clientv3/client.go:autoSync,是通过定时器触发,调用MemberList方法来获取成员列表,并取其ClientURLs字段,来初始化客户端连接. 查看集群状态 # 查看容器运行状态. # zhou.yingan @ localhost in ~/docker/etcd [14:55:36] $ docker-compose ps Name Command State Ports -------------------------------------------------------------------------------------- etcd1 /usr/local/bin/etcd Up 0.0.0.0:2379-\u003e2379/tcp, 0.0.0.0:2380-\u003e2380/tcp etcd2 /usr/local/bin/etcd Up 0.0.0.0:12379-\u003e2379/tcp, 0.0.0.0:12380-\u003e2380/tcp etcd3 /usr/local/bin/etcd Up 0.0.0.0:22379-\u003e2379/tcp, 0.0.0.0:22380-\u003e2380/tcp # 查看节点状态. 
$ ./etcdctl --endpoints=$ENDPOINTS --write-out=table endpoint status +----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+------- -+| ENDPOINT | ID | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+------- -+| 192.168.20.151:2379 | 7ff9976e030eb51c | 3.4.14 | 25 kB | false | false | 2 | 43 | 43 | || 192.168.20.151:12379 | c8f871bdf8305be | 3.4.14 | 25 kB | false | false | 2 | 43 | 43 | || 192.168.20.151:22379 | b8dbeeea8e10767d | 3.4.14 | 25 kB | true | false | 2 | 43 | 43 | |+----------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+------- -+ # 查看集群节点列表. $ ./etcdctl --endpoints=$ENDPOINTS --write-out=table member list +------------------+---------+-------+-------------------+-----------------------------+------------+ | ID | STATUS | NAME | PEER ADDRS | CLIENT ADDRS | IS LEARNER | +------------------+---------+-------+-------------------+----------","date":"2020-12-23","objectID":"/2020/12/23/etcd-operation-guide/:2:2","tags":["go","etcd"],"title":"Etcd操作指南","uri":"/2020/12/23/etcd-operation-guide/"},{"categories":["Microservice"],"content":"基于clientv3与etcd交互 需要注意,etcd和grpc版本兼容性可能有问题,当前使用的版本: module etcdexample go 1.14 require ( github.com/gogo/protobuf v1.3.1 // indirect github.com/google/uuid v1.1.2 // indirect go.etcd.io/etcd v3.3.25+incompatible go.uber.org/zap v1.16.0 // indirect google.golang.org/grpc v1.34.0 // indirect ) replace go.etcd.io/etcd v3.3.25+incompatible =\u003e go.etcd.io/etcd v0.0.0-20200402134248-51bdeb39e698 replace google.golang.org/grpc v1.34.0 =\u003e google.golang.org/grpc v1.29.1 ","date":"2020-12-23","objectID":"/2020/12/23/etcd-operation-guide/:3:0","tags":["go","etcd"],"title":"Etcd操作指南","uri":"/2020/12/23/etcd-operation-guide/"},{"categories":["Microservice"],"content":"基本增删查改等操作 func main() { flag.Parse() endpoints := strings.Split(*addr, \",\") // 创建etcd Client对象. cli, err := clientv3.New( clientv3.Config{ Endpoints: endpoints, RejectOldCluster: true, }) if err != nil { fmt.Println(err) return } defer cli.Close() ctx := context.Background() // 设置租约为50秒. lease, err := cli.Grant(ctx, 50) if err != nil { fmt.Println(err) return } // 使用租约来设置key1-value1. _, err = cli.Put(ctx, \"key1\", \"value1\", clientv3.WithLease(lease.ID)) if err != nil { fmt.Println(err) return } // 使用前缀key,来获取数据. resp, err := cli.Get(ctx, \"key\", clientv3.WithPrefix()) if err != nil { fmt.Println(err) return } // 打印etcd返回的key-value. for i := range resp.Kvs { fmt.Println(\"key:\", string(resp.Kvs[i].Key), \" value:\", string(resp.Kvs[i].Value)) } // 根据租约ID来查询租约当前详细信息. tresp, err := cli.TimeToLive(ctx, lease.ID) if err != nil { fmt.Println(err) return } // 打印设置的ttl值和当前的ttl值. fmt.Println(tresp.GrantedTTL, tresp.TTL) // 根据租约ID来刷新当前的ttl,更新为所设置的ttl值. _, err = cli.KeepAliveOnce(ctx, lease.ID) if err != nil { fmt.Println(err) return } // 修改. _, err = cli.Put(ctx, \"key1\", \"value2\", clientv3.WithLease(lease.ID)) if err != nil { fmt.Println(err) return } // 删除key1,是精确匹配,也可采用前缀模式. 
_, err = cli.Delete(ctx, \"key1\") if err != nil { fmt.Println(err) return } } ","date":"2020-12-23","objectID":"/2020/12/23/etcd-operation-guide/:3:1","tags":["go","etcd"],"title":"Etcd操作指南","uri":"/2020/12/23/etcd-operation-guide/"},{"categories":["Microservice"],"content":"Watch操作 主要是对感兴趣的Key进行监听,事件包括增删改. wchan := cli.Watch(ctx, \"key\", clientv3.WithPrefix()) for resp := range wchan { err = resp.Err() if err != nil { fmt.Println(err) break } if resp.IsProgressNotify() { continue } for _, event := range resp.Events { if event.IsCreate() { fmt.Println(\"创建:\", string(event.Kv.Key), string(event.Kv.Value), event.Type.String()) } else if event.IsModify() { if event.PrevKv != nil { fmt.Println(\"修改前:\", string(event.PrevKv.Key), string(event.PrevKv.Value)) } fmt.Println(\"修改后:\", string(event.Kv.Key), string(event.Kv.Value), event.Type.String()) } else { fmt.Println(\"删除:\", string(event.Kv.Key), string(event.Kv.Value), event.Type.String()) } } } 注意:当修改时,Event对象中PrevKv字段的值为nil,而根据字段定义,该字段是保存事件发生之前的KV值,这里实际情况和字段定义不符. ","date":"2020-12-23","objectID":"/2020/12/23/etcd-operation-guide/:3:2","tags":["go","etcd"],"title":"Etcd操作指南","uri":"/2020/12/23/etcd-operation-guide/"},{"categories":["Microservice"],"content":"事务操作 etcd支持事务,提供了在一个事务中对多个key的更新功能,这一组key的操作要么全部成功,要么全部失败.是基于CAS方式来实现的. 主要提供的方法为: // 创建一个事务对象. Txn(ctx context.Context) Txn type Txn interface { // If takes a list of comparison. If all comparisons passed in succeed, // the operations passed into Then() will be executed. Or the operations // passed into Else() will be executed. // 条件. If(cs ...Cmp) Txn // Then takes a list of operations. The Ops list will be executed, if the // comparisons passed in If() succeed. // 条件成功会执行. Then(ops ...Op) Txn // Else takes a list of operations. The Ops list will be executed, if the // comparisons passed in If() fail. // 条件失败会执行. Else(ops ...Op) Txn // Commit tries to commit the transaction. // 提交事务. Commit() (*TxnResponse, error) } 具体调用代码如下: func txnTransfer(cli *clientv3.Client, from, to string, amount int64) { ctx := context.Background() // 在一个事务中同时获取from和to的数据. resp, err := cli.Txn(ctx).Then(clientv3.OpGet(from), clientv3.OpGet(to)).Commit() if err != nil { fmt.Println(err) return } // 获取键为from的数据. fromKV := resp.Responses[0].GetResponseRange().Kvs[0] // 获取对应的值,转换为int64. fromAmount, err := strconv.ParseInt(string(fromKV.Value), 10, 64) if err != nil { fmt.Println(err) return } // 获取键为to的数据. toKV := resp.Responses[1].GetResponseRange().Kvs[0] // 获取对应的值,转换为int64. toAmount, err := strconv.ParseInt(string(toKV.Value), 10, 64) if err != nil { fmt.Println(err) return } // 判断from账户的钱是否充足. if fromAmount \u003c amount { fmt.Println(\"money is not enough\") return } // 开启事务,根据If中的来判断,若满足条件则可以执行Then里的操作,Then里的操作要么全部成功要么全部失败. putresp, err := cli.Txn(ctx). If( // 判断etcd中from的ModRevision是否和之前查询出来的相同. clientv3.Compare(clientv3.ModRevision(from), \"=\", fromKV.ModRevision), // 判断etcd中to的ModRevision是否和之前查询出来的相同. clientv3.Compare(clientv3.ModRevision(to), \"=\", toKV.ModRevision)). Then( // 把from对应的值减去amount. clientv3.OpPut(from, strconv.Itoa(int(fromAmount-amount))), // 把to对应的值加上amount. clientv3.OpPut(to, strconv.Itoa(int(toAmount+amount)))). Commit() if err != nil { fmt.Println(err) return } if putresp.Succeeded { fmt.Println(\"执行成功\") } else { fmt.Println(\"执行失败\") } } ","date":"2020-12-23","objectID":"/2020/12/23/etcd-operation-guide/:3:3","tags":["go","etcd"],"title":"Etcd操作指南","uri":"/2020/12/23/etcd-operation-guide/"},{"categories":["Microservice"],"content":"高级扩展操作 分布式锁 主要提供的方法为: // 初始化sync.Locker对象. 
func NewLocker(s *Session, pfx string) sync.Locker // 申请锁. func (lm *lockerMutex) Lock() // 释放锁. func (lm *lockerMutex) Unlock() 具体调用代码如下: func uselock(cli *clientv3.Client, name string) { // 创建Session对象. sess, err := concurrency.NewSession(cli) if err != nil { fmt.Println(err) return } defer sess.Close() // 根据Session来创建sync.Locker对象. locker := concurrency.NewLocker(sess, name) // 加锁. locker.Lock() // 模拟业务逻辑处理. time.Sleep(30 * time.Second) // 释放锁. locker.Unlock() } 在两个终端上调用该函数时,第一个能顺利申请到锁,第二个会被阻塞直到第一个占用的锁被释放. 关注etcd中的数据. $ ./etcdctl --endpoints=$ENDPOINTS get test-lock --prefix test-lock/351c76848abb3131 test-lock/5be76848abafc1d 在调用concurrency.NewSession时,会设置ttl,默认为60秒,Session对象会持有对应的LeaseID,并会调用KeepAlive来续期,使得锁在Unlock之前一直是有效的,其它想抢占分布式锁的程序只能是等待.假如第一个终端里的程序在调用Unlock之前被提前终止了,所持有的锁实际并不会马上被释放,而是要等到租约到期,之后第二个终端的程序才能正常获取到分布式锁. Mutex 主要提供的方法为: // 初始化Mutex. func NewMutex(s *Session, pfx string) *Mutex // 尝试获取锁,若锁被占用获取失败会立即返回. func (m *Mutex) TryLock(ctx context.Context) error // 申请锁. func (m *Mutex) Lock(ctx context.Context) error // 释放锁. func (m *Mutex) Unlock(ctx context.Context) error // 是否是当前持有的锁. func (m *Mutex) IsOwner() v3.Cmp // 获取锁对应的key. func (m *Mutex) Key() string // 获取从etcd返回的数据的协议头. func (m *Mutex) Header() *pb.ResponseHeader 具体调用代码如下: func useMutex(cli *clientv3.Client, name string) { // 创建Session对象. sess, err := concurrency.NewSession(cli) if err != nil { fmt.Println(err) return } defer sess.Close() // 创建Mutex对象. m := concurrency.NewMutex(sess, name) ctx := context.Background() // 不是标准的sync.Locker接口,需要传入Context对象,在获取锁时可以设置超时时间,或主动取消请求. err = m.Lock(ctx) if err != nil { fmt.Println(err) return } // 获取Mutex对象的key. key := m.Key() fmt.Println(key) // 模拟业务逻辑处理. time.Sleep(30 * time.Second) // 释放锁. err = m.Unlock(ctx) if err != nil { fmt.Println(err) } } Mutex提供的TryLock方法,会尝试获取锁,如果该锁没有被其它Session获取,则加锁成功;如果已经被占用,则立马返回并报错mutex: Locked by another session. 读写锁 主要提供的方法为: // 初始化读写锁对象. func NewRWMutex(s *concurrency.Session, prefix string) *RWMutex // 申请读锁. func (rwm *RWMutex) RLock() error // 申请写锁. func (rwm *RWMutex) Lock() // 释放读锁. func (rwm *RWMutex) RUnlock() error // 释放写锁. func (rwm *RWMutex) Unlock() error 具体调用代码如下: // 加读锁. func useRWMutex(cli *clientv3.Client, name string) { sess, err := concurrency.NewSession(cli) if err != nil { fmt.Println(err) return } defer sess.Close() // 生成RWMutex读写锁对象. rw := recipe.NewRWMutex(sess, name) // 加读锁. err = rw.RLock() if err != nil { fmt.Println(err) return } fmt.Println(\"加读锁成功\") // 模拟业务逻辑处理. time.Sleep(30 * time.Second) // 释放锁. err = rw.RUnlock() if err != nil { fmt.Println(err) } } // 加写锁. func useRWMutexW(cli *clientv3.Client, name string) { time.Sleep(10 * time.Second) sess, err := concurrency.NewSession(cli) if err != nil { fmt.Println(err) return } defer sess.Close() // 生成RWMutex读写锁对象. rw := recipe.NewRWMutex(sess, name) // 加写锁. err = rw.Lock() if err != nil { fmt.Println(err) return } fmt.Println(\"加写锁成功\") // 模拟业务逻辑处理. time.Sleep(30 * time.Second) // 释放锁. err = rw.Unlock() if err != nil { fmt.Println(err) } } 同时启动3个goroutine,2个执行useRWMutex加读锁,1个执行useRWMutexW加写锁(先阻塞10秒,使读锁先加成功). 观察etcd的数据,总共有三个锁,两个读一个写,写被读阻塞.当读锁都被释放后,加写锁成功. $ ./etcdctl --endpoints=$ENDPOINTS get test-lock --prefix test-lock/read/1608689345032840000 test-lock/read/1608689345033840100 test-lock/write/1608689355033412000 分布式读写锁在加锁时是写锁优先还是读锁优先? 
在等待锁的过程中是按照先进先出的原则来依次获取锁的.当前第一个Session获取到了读锁,第二个Session申请获取写锁被阻塞,第三个Session申请获取读锁也会被阻塞,第四个Session申请获取写锁也被阻塞.当第一个Session释放了读锁后,第二个Session获取到写锁,第三个和第四个Session仍然处于被阻塞状态.当第二个Session释放了写锁后,第三个Session获取到读锁,第四个Session仍然处于被阻塞状态.当第三个Session释放读锁后,第四个Session获取到写锁. 分布式队列 分布式队列(支持多读多写) 先进先出,主要提供的方法为: // 初始化队列对象. func NewQueue(client *v3.Client, keyPrefix string) *Queue // 入队. func (q *Queue) Enqueue(val string) error // 出队. func (q *Queue) Dequeue() (string, error) 具体调用代码: // 入队. func useQueueEn(cli *clientv3.Client, name string) { // 创建分布式队列. q := recipe.NewQueue(cli, name) // 入队. err := q.Enqueue(\"key1\") if err != nil { fmt.Println(err) return } //","date":"2020-12-23","objectID":"/2020/12/23/etcd-operation-guide/:3:4","tags":["go","etcd"],"title":"Etcd操作指南","uri":"/2020/12/23/etcd-operation-guide/"},{"categories":["Microservice"],"content":"问题起因 通过docker-compose部署了3个节点的etcd集群,服务在启动时会随机报0个、1个或2个告警信息,信息如下: { \"level\": \"warn\", \"ts\": \"2020-12-21T14:43:33.380+0800\", \"caller\": \"clientv3/retry_interceptor.go:62\", \"msg\": \"retrying of unary invoker failed\", \"target\": \"passthrough:///192.168.20.151:22379\", \"attempt\": 0, \"error\": \"rpc error: code = Canceled desc = context canceled\" } ","date":"2020-12-21","objectID":"/2020/12/21/etcd-application-start-warn/:1:0","tags":["go","etcd","grpc"],"title":"服务启动时连接etcd集群的告警分析","uri":"/2020/12/21/etcd-application-start-warn/"},{"categories":["Microservice"],"content":"问题分析 服务是基于go-zero框架的,etcd配置如下: Etcd:Hosts:- 192.168.20.151:2379- 192.168.20.151:12379- 192.168.20.151:22379Key:xxx.rpc 由于是服务启动时报的,追踪服务的启动过程.在启动时会向etcd注册服务的信息.首先是创建etcd客户端: // [email protected]/core/discov/internal/registry.go func DialClient(endpoints []string) (EtcdClient, error) { return clientv3.New(clientv3.Config{ Endpoints: endpoints, AutoSyncInterval: autoSyncInterval, DialTimeout: DialTimeout, DialKeepAliveTime: dialKeepAliveTime, DialKeepAliveTimeout: DialTimeout, RejectOldCluster: true, }) } 继续追踪clientv3.New方法,最终会调用newClient: // etcd/cleintv3/client.go func newClient(cfg *Config) (*Client, error) { if cfg == nil { cfg = \u0026Config{} } // 中间一大串可以忽略. ...... // 前面该参数是被设置为true的,就会调用到checkVersion. if cfg.RejectOldCluster { if err := client.checkVersion(); err != nil { client.Close() return nil, err } } go client.autoSync() return client, nil } func (c *Client) checkVersion() (err error) { var wg sync.WaitGroup eps := c.Endpoints() errc := make(chan error, len(eps)) ctx, cancel := context.WithCancel(c.ctx) if c.cfg.DialTimeout \u003e 0 { cancel() ctx, cancel = context.WithTimeout(c.ctx, c.cfg.DialTimeout) } wg.Add(len(eps)) // 由于配置了3个etcd节点的地址,这里会起3个goroutine. for _, ep := range eps { // if cluster is current, any endpoint gives a recent version go func(e string) { defer wg.Done() // 查询状态. resp, rerr := c.Status(ctx, e) if rerr != nil { errc \u003c- rerr return } // 解析版本号. vs := strings.Split(resp.Version, \".\") maj, min := 0, 0 if len(vs) \u003e= 2 { var serr error if maj, serr = strconv.Atoi(vs[0]); serr != nil { errc \u003c- serr return } if min, serr = strconv.Atoi(vs[1]); serr != nil { errc \u003c- serr return } } if maj \u003c 3 || (maj == 3 \u0026\u0026 min \u003c 2) { rerr = ErrOldCluster } errc \u003c- rerr }(ep) } // wait for success for range eps { // 只有errc这个channel内有数据且错误是nil,就会跳出循环. if err = \u003c-errc; err == nil { break } } // 跳出后就会调用cancel函数取消其它goroutine的查询状态操作. // 这里就会导致有rpc操作被取消了,对应到本文的错误信息. // 告警信息为什么有时候是0个,有时候又是1个或2个列?主要是看这里3个操作完成的时间,如果同时完成,cancel就没作用,就不会有告警信息了. 
cancel() wg.Wait() return err } ","date":"2020-12-21","objectID":"/2020/12/21/etcd-application-start-warn/:2:0","tags":["go","etcd","grpc"],"title":"服务启动时连接etcd集群的告警分析","uri":"/2020/12/21/etcd-application-start-warn/"},{"categories":["Microservice"],"content":"问题总结 当连接到是etcd集群且不允许连接老版本的集群,则这里的告警信息是正常的. 当碰到问题时多看源码. ","date":"2020-12-21","objectID":"/2020/12/21/etcd-application-start-warn/:3:0","tags":["go","etcd","grpc"],"title":"服务启动时连接etcd集群的告警分析","uri":"/2020/12/21/etcd-application-start-warn/"},{"categories":["Microservice"],"content":"问题起因 在采用开源框架go-zero开发的过程中,服务在启动一段时间后控制台一直在报如下警告信息: { \"level\": \"warn\", \"ts\": \"2020-12-15T16:43:21.709+0800\", \"caller\": \"clientv3/retry_interceptor.go:62\", \"msg\": \"retrying of unary invoker failed\", \"target\": \"endpoint://client-478bc374-adc9-4cec-9dcb-1c8245b9ad36/192.168.20.151:2379\", \"attempt\": 0, \"error\": \"rpc error: code = DeadlineExceeded desc = latest balancer error: all SubConns are in TransientFailure, latest connection error: c onnection error: desc = \\\"transport: Error while dialing dial tcp 0.0.0.0:2379: connectex: No connection could be made because the target machine actively refused it.\\\"\"} 看信息是在连接0.0.0.0:2379的时候失败了,但target明明是192.168.20.151:2379,怎么会变成0.0.0.0的列? ","date":"2020-12-19","objectID":"/2020/12/19/etcd-configure-error/:1:0","tags":["go","etcd","grpc"],"title":"Etcd配置错误后的分析","uri":"/2020/12/19/etcd-configure-error/"},{"categories":["Microservice"],"content":"问题分析 根据报错信息,有定位到是clientv3/retry_interceptor.go:62,那就在该文件打个断点看看到底是怎么回事? 定位错误产生的位置error \" 定位错误产生的位置 可以看到在调用gRPC接口时,Picker接口实际指向了errPicker,导致错误的发生. 回头再去分析调用链上的pickerWrapper.pick函数(来自文件[email protected]/picker_wrapper.go),有如下代码: ch = pw.blockingCh p := pw.picker pw.mu.Unlock() pickResult, err := p.Pick(info) Picker来自于pw.picker字段,而该字段的变更只在如下代码中: // updatePicker is called by UpdateBalancerState. It unblocks all blocked pick. func (pw *pickerWrapper) updatePicker(p balancer.Picker) { pw.updatePickerV2(\u0026v2PickerWrapper{picker: p, connErr: pw.connErr}) } // updatePicker is called by UpdateBalancerState. It unblocks all blocked pick. func (pw *pickerWrapper) updatePickerV2(p balancer.V2Picker) { pw.mu.Lock() if pw.done { pw.mu.Unlock() return } pw.picker = p // pw.blockingCh should never be nil. close(pw.blockingCh) pw.blockingCh = make(chan struct{}) pw.mu.Unlock() } 继续在updatePicker函数断点,看是什么导致的变更? 跟踪updatePickerupdatePicker \" 跟踪updatePicker 调用链源头为ccBalancerWrapper.watcher函数(来自文件[email protected]/balancer_conn_wrappers.go) // watcher balancer functions sequentially, so the balancer can be implemented // lock-free. func (ccb *ccBalancerWrapper) watcher() { for { select { case t := \u003c-ccb.scBuffer.Get(): ccb.scBuffer.Load() if ccb.done.HasFired() { break } ccb.balancerMu.Lock() su := t.(*scStateUpdate) if ub, ok := ccb.balancer.(balancer.V2Balancer); ok { ub.UpdateSubConnState(su.sc, balancer.SubConnState{ConnectivityState: su.state, ConnectionError: su.err}) } else { ccb.balancer.HandleSubConnStateChange(su.sc, su.state) } ccb.balancerMu.Unlock() case \u003c-ccb.done.Done(): } if ccb.done.HasFired() { ccb.balancer.Close() ccb.mu.Lock() scs := ccb.subConns ccb.subConns = nil ccb.mu.Unlock() for acbw := range scs { ccb.cc.removeAddrConn(acbw.getAddrConn(), errConnDrain) } ccb.UpdateState(balancer.State{ConnectivityState: connectivity.Connecting, Picker: nil}) return } } } 变化来自channel对象ccb.scBuffer,搜索该对象只在ccBalancerWrapper.handleSubConnStateChange函数中有Put操作. 
func (ccb *ccBalancerWrapper) handleSubConnStateChange(sc balancer.SubConn, s connectivity.State, err error) { // When updating addresses for a SubConn, if the address in use is not in // the new addresses, the old ac will be tearDown() and a new ac will be // created. tearDown() generates a state change with Shutdown state, we // don't want the balancer to receive this state change. So before // tearDown() on the old ac, ac.acbw (acWrapper) will be set to nil, and // this function will be called with (nil, Shutdown). We don't need to call // balancer method in this case. if sc == nil { return } ccb.scBuffer.Put(\u0026scStateUpdate{ sc: sc, state: s, err: err, }) } 在Put地方打上断点继续追踪 Put消息resetTransport \" Put消息 在调用链入口查看变量: addrs的值addrs \" addrs的值 可以看到地址为0.0.0.0:2379,找到地址来源了,与错误信息中的可以匹配. 继续追踪resetTransport,只会在addrConn.connect函数被调用,在此断点: 追踪connectconnect \" 追踪connect 可以看到在Sync函数(在文件etcd/clientv3/client.go)中获取的地址就是0.0.0.0:2379(来自Members的ClientURLs字段),而c.MemberList最终调用到如下:(来自文件etcd/etcdserver/etcdserverpb/rpc.pb.go,etcd当前最新版3.4.14的文件位置已发生变化) func (c *clusterClient) MemberList(ctx context.Context, in *MemberListRequest, opts ...grpc.CallOption) (*MemberListResponse, error) { out := new(MemberListResponse) err := grpc.Invoke(ctx, \"/etcdserverpb.Cluster/MemberList\", in, out, c.cc, opts...) if err != nil { return nil, err } return out, nil } 调用gRPC接口/etcdserverpb.Cluster/MemberList获取etcd集群的成员列表.而etcd集群的成员列表是由环境变量ETCD_ADVERTISE_CLIENT_URLS或参数--advertise-client-urls来控制的. 查看etcd的部署文件docker-compose.yml,发现配置为- \"ETCD_ADVERTISE_CLIENT_URLS=http://0.0.0.0:2379\",把配置修改为http://192.168.20.151:2379,该问题消失. ","date":"2020-12-19","objectID":"/2020/12/19/etcd-configure-error/:2:0","tags":["go","etcd","grpc"],"title":"Etcd配置错误后的分析","uri":"/2020/12/19/etcd-configure-error/"},{"categories":["Microservice"],"content":"总结 ","date":"2020-12-19","objectID":"/2020/12/19/etcd-configure-error/:3:0","tags":["go","etcd","grpc"],"title":"Etcd配置错误后的分析","uri":"/2020/12/19/etcd-configure-error/"},{"categories":["Microservice"],"content":"配置文档 在官网文档上其实已经提到了关于配置项的: Configuration What is the difference between listen-\u003cclient,peer\u003e-urls, advertise-client-urls or initial-advertise-peer-urls? listen-client-urls and listen-peer-urls specify the local addresses etcd server binds to for accepting incoming connections. To listen on a port for all interfaces, specify 0.0.0.0 as the listen IP address. advertise-client-urls and initial-advertise-peer-urls specify the addresses etcd clients or other etcd members should use to contact the etcd server. The advertise addresses must be reachable from the remote machines. Do not advertise addresses like localhost or 0.0.0.0 for a production setup since these addresses are unreachable from remote machines. 明确提到了不要把advertise-client-urls和initial-advertise-peer-urls设置成localhost或0.0.0.0,这些地址从远程机器是不能访问的,应该配置为具体的IP地址. 
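结合本文的docker-compose.yml,一个最小的修正示例如下(假设宿主机IP为192.168.20.151): listen类地址用于绑定本地网卡,可以使用0.0.0.0;advertise类地址是告知客户端和其它成员的访问地址,必须是远程可达的具体IP. environment: - \"ETCD_LISTEN_CLIENT_URLS=http://0.0.0.0:2379\" - \"ETCD_ADVERTISE_CLIENT_URLS=http://192.168.20.151:2379\" 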
","date":"2020-12-19","objectID":"/2020/12/19/etcd-configure-error/:3:1","tags":["go","etcd","grpc"],"title":"Etcd配置错误后的分析","uri":"/2020/12/19/etcd-configure-error/"},{"categories":["Microservice"],"content":"流程 流程图流程 \" 流程图 ","date":"2020-12-19","objectID":"/2020/12/19/etcd-configure-error/:3:2","tags":["go","etcd","grpc"],"title":"Etcd配置错误后的分析","uri":"/2020/12/19/etcd-configure-error/"},{"categories":["Microservice"],"content":"部署 ","date":"2020-12-17","objectID":"/2020/12/17/opentracing-jaeger-action/:1:0","tags":["OpenTracing"],"title":"Jaeger实战","uri":"/2020/12/17/opentracing-jaeger-action/"},{"categories":["Microservice"],"content":"基于Docker 所有组件基于Docker,后端存储采用Elasticsearch.docker-compose.yml文件如下: version:'2.2'services:elasticsearch:image:elasticsearch:6.8.13volumes:- \"data01:/usr/share/elasticsearch/data\"ports:- \"9200:9200\"networks:- elastic-jaegerenvironment:- TZ=Asia/Shanghai- node.name=es01- cluster.name=es-docker-cluster- discovery.type=single-node- http.host=0.0.0.0- transport.host=127.0.0.1- ES_JAVA_OPTS=-Xms512m -Xmx512m- xpack.security.enabled=falserestart:unless-stoppedjaeger-collector:image:jaegertracing/jaeger-collector:latestports:- 14268:14268- 14250:14250networks:- elastic-jaegerenvironment:- TZ=Asia/Shanghai- SPAN_STORAGE_TYPE=elasticsearchcommand:[\"--es.server-urls=http://elasticsearch:9200\",\"--es.num-shards=1\",\"--es.num-replicas=0\",\"--log-level=debug\"]restart:unless-stoppeddepends_on:- elasticsearchjaeger-agent:image:jaegertracing/jaeger-agent:latestports:- 6831:6831/udp- 6832:6832/udp- 5778:5778networks:- elastic-jaegerenvironment:- TZ=Asia/Shanghai- SPAN_STORAGE_TYPE=elasticsearchcommand:[\"--reporter.grpc.host-port=jaeger-collector:14250\",\"--log-level=debug\"]restart:unless-stoppeddepends_on:- jaeger-collectorjaeger-query:image:jaegertracing/jaeger-query:latestports:- 16686:16686- 16687:16687networks:- elastic-jaegerenvironment:- TZ=Asia/Shanghai- SPAN_STORAGE_TYPE=elasticsearch- no_proxy=localhostcommand:[\"--es.server-urls=http://elasticsearch:9200\",\"--span-storage.type=elasticsearch\",\"--log-level=debug\"]restart:unless-stoppeddepends_on:- jaeger-agentvolumes:data01:driver:localdriver_opts:type:nonedevice:./es/datao:bindnetworks:elastic-jaeger:driver:bridge 注意:Elasticsearch是以单节点方式启动的 执行如下命令: # 先启动Elasticsearch,等服务启动成功,通过docker logs jaeger_elasticsearch_1来观察. $ docker-compose up -d elasticsearch Creating network \"jaeger_elastic-jaeger\" with driver \"bridge\" Creating jaeger_elasticsearch_1 ... done # 再启动其它服务. $ docker-compose up -d jaeger_elasticsearch_1 is up-to-date Creating jaeger_jaeger-collector_1 ... done Creating jaeger_jaeger-agent_1 ... done Creating jaeger_jaeger-query_1 ... done # 查看状态. $ docker-compose ps Name Command State Ports --------------------------------------------------------------------------------------------------------------------------------------------- jaeger_elasticsearch_1 /usr/local/bin/docker-entr ... Up 0.0.0.0:9200-\u003e9200/tcp, 9300/tcp jaeger_jaeger-agent_1 /go/bin/agent-linux --repo ... Up 5775/udp, 0.0.0.0:5778-\u003e5778/tcp, 0.0.0.0:6831-\u003e6831/udp, 0.0.0.0:6832-\u003e6832/udp jaeger_jaeger-collector_1 /go/bin/collector-linux -- ... Up 0.0.0.0:14250-\u003e14250/tcp, 0.0.0.0:14268-\u003e14268/tcp jaeger_jaeger-query_1 /go/bin/query-linux --es.s ... Up 0.0.0.0:16686-\u003e16686/tcp, 0.0.0.0:16687-\u003e16687/tcp 如下地址需要注意: http://elasticsearch:9200,是Elasticsearch暴露的地址,供Collector和Query服务使用,通过参数es.server-urls来设置. http://127.0.0.1:16686,是Query服务对外暴露的地址,用来查看Jaeger UI. 
jaeger-collector:14250,是Collector暴露的gRPC端口,供Agent发送Span数据到Collector,通过参数reporter.grpc.host-port来设置. http://127.0.0.1:14268,是Collector暴露的http端口,供应用程序直接通过HTTP协议发送Span数据到Collector,端点为/api/traces. 127.0.0.1:6831,是Agent对外暴露的UDP端口,供应用程序通过UDP协议发送Span数据到Agent. ","date":"2020-12-17","objectID":"/2020/12/17/opentracing-jaeger-action/:1:1","tags":["OpenTracing"],"title":"Jaeger实战","uri":"/2020/12/17/opentracing-jaeger-action/"},{"categories":["Microservice"],"content":"基于源码 下载源码: # 从github中下载. $ git clone [email protected]:jaegertracing/jaeger.git Cloning into 'jaeger'... remote: Enumerating objects: 34, done. remote: Counting objects: 100% (34/34), done. remote: Compressing objects: 100% (33/33), done. remote: Total 15400 (delta 7), reused 14 (delta 1), pack-reused 15366 Receiving objects: 100% (15400/15400), 19.79 MiB | 16.00 KiB/s, done. Resolving deltas: 100% (10130/10130), done. # 使用v1.21.0tag所对应的版本. $ git checkout -b 1.21.0 v1.21.0 Switched to a new branch '1.21.0' # 加载子模块. $ git submodule update --init --recursive Submodule 'idl' (https://github.com/jaegertracing/jaeger-idl.git) registered for path 'idl' Submodule 'jaeger-ui' (https://github.com/jaegertracing/jaeger-ui.git) registered for path 'jaeger-ui' Cloning into 'D:/project/jaeger/idl'... Cloning into 'D:/project/jaeger/jaeger-ui'... 依赖安装: # 安装yarn(facebook发布的一款取代npm的包管理工具),先要安装node.js. $ npm install -g yarn # 查看yarn版本. $ yarn --version 1.22.10 # 配置淘宝源. $ yarn config set registry https://registry.npm.taobao.org -g yarn config v1.22.10 success Set \"registry\" to \"https://registry.npm.taobao.org\". Done in 0.13s. $ yarn config set sass_binary_site http://cdn.npm.taobao.org/dist/node-sass -g yarn config v1.22.10 success Set \"sass_binary_site\" to \"http://cdn.npm.taobao.org/dist/node-sass\". Done in 0.07s. # 编译ui. $ make build-ui 在Windows下编译各个服务: # 编译agent服务. PS D:\\project\\jaeger\u003e go build .\\cmd\\agent # 编译collector服务. PS D:\\project\\jaeger\u003e go build .\\cmd\\collector # 编译query服务. PS D:\\project\\jaeger\u003e go build -tags=ui .\\cmd\\query 在Windows下启动服务: # 启动Collector. D:\\project\\jaeger\u003ecollector.exe --span-storage.type=elasticsearch --es.server-urls=http://192.168.20.151:9200 --es.num-shards=1 --es.num-replicas=0 --log-level=debug # 启动Agent,指定Collector地址. D:\\project\\jaeger\u003eagent.exe --reporter.grpc.host-port=127.0.0.1:14250 --log-level=debug # 启动Query. D:\\project\\jaeger\u003equery.exe --es.server-urls=http://192.168.20.151:9200 --log-level=debug ","date":"2020-12-17","objectID":"/2020/12/17/opentracing-jaeger-action/:1:2","tags":["OpenTracing"],"title":"Jaeger实战","uri":"/2020/12/17/opentracing-jaeger-action/"},{"categories":["Microservice"],"content":"基于Golang使用Jaeger ","date":"2020-12-17","objectID":"/2020/12/17/opentracing-jaeger-action/:2:0","tags":["OpenTracing"],"title":"Jaeger实战","uri":"/2020/12/17/opentracing-jaeger-action/"},{"categories":["Microservice"],"content":"初始化Jaeger Tracer 可以采用UDP协议连接到Agent上,也可以采用HTTP协议直连到Collector上,两个参数是互斥的. cfg := jaegercfg.Configuration{ ServiceName: \"goods\", // 采样策略,这里使用Const,全部采样. Sampler: \u0026jaegercfg.SamplerConfig{ Type: \"const\", Param: 1.0, }, Reporter: \u0026jaegercfg.ReporterConfig{ BufferFlushInterval: time.Second, LocalAgentHostPort: \"192.168.20.153:6831\", // 采用UDP协议连接Agent. //CollectorEndpoint: \"http://192.168.20.153:14268/api/traces\", // 采用HTTP协议直连Collector }, } // 根据配置生成Tracer对象,启用Span的内存池. tracer, closer, err := cfg.NewTracer(jaegercfg.PoolSpans(true)) if err != nil { fmt.Println(err) return } // 调用Close,释放资源. 
defer closer.Close() // 注册为opentracing里的GlobalTracer对象. opentracing.SetGlobalTracer(tracer) ","date":"2020-12-17","objectID":"/2020/12/17/opentracing-jaeger-action/:2:1","tags":["OpenTracing"],"title":"Jaeger实战","uri":"/2020/12/17/opentracing-jaeger-action/"},{"categories":["Microservice"],"content":"函数追踪 直接利用opentracing.StartSpan来创建Span,其OperationName为zerosql. func tracedSQL() { span := opentracing.StartSpan(\"zerosql\") defer span.Finish() dsn := \"root:123456@tcp(127.0.0.1:3306)/xxx?charset=utf8\u0026parseTime=true\u0026loc=Local\" db := NewZeroMysql(dsn) goodsql := \"select * from goods where goodsid = ?\" var goodsinfo GoodsInfo err := db.QueryRow(\u0026goodsinfo, goodsql, 1) if err != nil { fmt.Println(err) return } fmt.Println(goodsinfo) } ","date":"2020-12-17","objectID":"/2020/12/17/opentracing-jaeger-action/:2:2","tags":["OpenTracing"],"title":"Jaeger实战","uri":"/2020/12/17/opentracing-jaeger-action/"},{"categories":["Microservice"],"content":"http中间件 // 主要记录http的statuscode. type withHTTPCodeResponse struct { writer http.ResponseWriter code int } func (w *withHTTPCodeResponse) Header() http.Header { return w.writer.Header() } func (w *withHTTPCodeResponse) Write(bytes []byte) (int, error) { return w.writer.Write(bytes) } func (w *withHTTPCodeResponse) WriteHeader(code int) { w.writer.WriteHeader(code) w.code = code } // HttpTracing http.Handler. func HttpTracing(next http.HandlerFunc) http.HandlerFunc { return func(w http.ResponseWriter, r *http.Request) { // 获取全局tracer对象. tracer := opentracing.GlobalTracer() // 尝试从http的Header中提取上游的SpanContext. spanCtx, _ := tracer.Extract( opentracing.HTTPHeaders, opentracing.HTTPHeadersCarrier(r.Header)) span := tracer.StartSpan(r.RequestURI, opentracing.ChildOf(spanCtx)) ext.HTTPMethod.Set(span, r.Method) defer span.Finish() cw := \u0026withHTTPCodeResponse{writer: w} rc := opentracing.ContextWithSpan(r.Context(), span) r = r.WithContext(rc) defer func() { // 设置statuscode和error. ext.HTTPStatusCode.Set(span, uint16(cw.code)) if cw.code \u003e= http.StatusBadRequest { ext.Error.Set(span, true) } }() next(cw, r) } } ","date":"2020-12-17","objectID":"/2020/12/17/opentracing-jaeger-action/:2:3","tags":["OpenTracing"],"title":"Jaeger实战","uri":"/2020/12/17/opentracing-jaeger-action/"},{"categories":["Microservice"],"content":"gRPC一元客户端拦截器 // OpenTracingClientInterceptor grpc unary clientinterceptor. func OpenTracingClientInterceptor() grpc.UnaryClientInterceptor { return func(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error { // 从Context中尝试提取Span对象. span := ChildOfSpanFromContext(ctx, method) // 设置tag. ext.Component.Set(span, \"grpc\") ext.SpanKindRPCClient.Set(span) defer span.Finish() carrier := make(opentracing.TextMapCarrier) tracer := opentracing.GlobalTracer() // 把SpanContext注入到Carrier中. err := tracer.Inject(span.Context(), opentracing.TextMap, carrier) if err == nil { var pairs []string _ = carrier.ForeachKey(func(key, val string) error { pairs = append(pairs, key, val) return nil }) // 然后把这些数据放入到gRPC的HEADER中. ctx = metadata.AppendToOutgoingContext(ctx, pairs...) } err = invoker(ctx, method, req, reply, cc, opts...) if err != nil { // 若返回错误,把错误日志记录到Span中. ext.LogError(span, err) } return err } } // ChildOfSpanFromContext 根据context中的span生成ChildOf的span. 
func ChildOfSpanFromContext(ctx context.Context, operationName string) opentracing.Span { return newSubSpanFromContext(ctx, operationName, opentracing.ChildOf) } func newSubSpanFromContext( ctx context.Context, operationName string, op func(opentracing.SpanContext) opentracing.SpanReference) opentracing.Span { tracer := opentracing.GlobalTracer() span := opentracing.SpanFromContext(ctx) if span == nil { span = tracer.StartSpan(operationName) } else { span = tracer.StartSpan(operationName, op(span.Context())) } return span } ","date":"2020-12-17","objectID":"/2020/12/17/opentracing-jaeger-action/:2:4","tags":["OpenTracing"],"title":"Jaeger实战","uri":"/2020/12/17/opentracing-jaeger-action/"},{"categories":["Microservice"],"content":"gRPC一元服务端拦截器 // OpenTracingServerInterceptor grpc unary serverinterceptor. func OpenTracingServerInterceptor() grpc.UnaryServerInterceptor { return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (resp interface{}, err error) { md, ok := metadata.FromIncomingContext(ctx) var spanCtx opentracing.SpanContext tracer := opentracing.GlobalTracer() if ok { // 从gRPC的元数据中提取SpanContext. carrier := make(opentracing.TextMapCarrier) for k, v := range md { carrier.Set(k, v[0]) } // 提取成功,生成子Span对象. spanCtx, _ = tracer.Extract(opentracing.TextMap, carrier) } span := tracer.StartSpan(info.FullMethod, ext.RPCServerOption(spanCtx)) // 设置tag. ext.Component.Set(span, \"grpc\") defer span.Finish() // 把Span对象放入到Context中,来传递到业务代码中. ctx = opentracing.ContextWithSpan(ctx, span) resp, err = handler(ctx, req) if err != nil { ext.LogError(span, err) } return } } ","date":"2020-12-17","objectID":"/2020/12/17/opentracing-jaeger-action/:2:5","tags":["OpenTracing"],"title":"Jaeger实战","uri":"/2020/12/17/opentracing-jaeger-action/"},{"categories":["Microservice"],"content":"参考 tracing库 ","date":"2020-12-17","objectID":"/2020/12/17/opentracing-jaeger-action/:3:0","tags":["OpenTracing"],"title":"Jaeger实战","uri":"/2020/12/17/opentracing-jaeger-action/"},{"categories":["Microservice"],"content":"系统架构 ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:1:0","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"组件 jaeger即可以作为单体应用来部署(所有jaeger后台组件全部运行在一个进程内),也可以作为一个可扩展的分布式系统来部署.如下所述.有两个主要的部署选项: Collectors直接把数据写入到存储中. Collectors把数据写入到kafka中,作为缓冲,再异步写入存储. 第一种情况,直接写入存储: 直接写入存储的架构图架构图1 \" 直接写入存储的架构图 第二种情况,先写入kafka: 写入kafka的架构图架构图2 \" 写入kafka的架构图 ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:1:1","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"客户端库 jaeger客户端是基于OpenTracing API的特定语言实现.它们可以为应用程序提供分布式追踪的能力,并能与已经集成OpenTracing的开源框架(如Flask,Dropwizard,gRPC等等)一起很好的协作. 当服务接收到新请求时会同时收到附加的上下文信息(trace id,span id和baggage),服务会基于上下文创建Span,然后继续往下游发送.在请求中仅仅相关id和baggage会被传播;而其它数据,如operation name,timing,tags和logs都不会被传播.这些数据会在后台被异步发送给jaeger服务. 为了降低负载,jager客户端提供了各种不同的采样策略.当一个trace被采样,其所有的span数据都会被抓取并被发送到jaeger服务.如果trace没有被采样,其所有的数据都不会被收集,并会对OpenTracing API实行熔断以减少开销.默认情况下,客户端的采样率为0.1%,并能从jaeger服务获取到采样策略(见上面架构图的adaptive sampling). 传播传播 \" 传播 ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:1:2","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Agent agent是一个网络代理服务,主要基于UDP协议来接收Span,然后把Span分批的发送给收集器.它是作为基础结构组件部署到所有主机上.该代理抽象的将收集器的路由和发现从客户端分离. 
","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:1:3","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Collector collector从agent接收追踪数据,然后基于数据进行一系列的操作,当前操作有验证数据,建立索引,执行相关转换,最后存储. 存储是基于插件化设计的,当前支持Cassandra, Elasticsearch and Kafka. ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:1:4","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Query 查询服务,从存储中检索trace,然后在UI上展示. ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:1:5","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Ingester 从kafka中读取数据,然后写入其它的存储中(如Cassandra, Elasticsearch) ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:1:6","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Reporting APIs Agent和Collector这两个组件都能接收Span数据.目前它们支持两组不重叠的API. ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:2:0","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Thrift over UDP Agent只能基于UDP协议接收Thrift格式的Span数据.主要的API是处理的UDP包,其中包含jaeger.thrift IDL文件中定义的Thrift编码的Batch结构,该IDL文件位于jaeger-idl代码库中.大多数客户端库使用Thrift紧凑模式编码,但有些客户端库不支持(如Node.js),其会使用Thrift二进制模式编码(发送到不同的UDP端口上).API都定义在agent.thrift IDL文件中. 由于历史原因,Agent还能接收Zipkin格式的Span数据,但只有老版本的客户端库才能发送该格式的数据,目前官方已经正式弃用了. Port Protocol Component Function 5775 UDP agent 接收zipkin.thrift的thrift协议的紧凑格式(已弃用,老版本客户端能使用) 6831 UDP agent 接收jaeger.thrift的thrift协议的紧凑格式 6832 UDP agent 接收jaeger.thrift的thrift协议的二进制格式 5778 HTTP agent 服务配置 ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:2:1","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"gRPC 在典型的jaeger部署中,Agent会从客户端接收Span数据然后将其转发给Collector.从jaeger 1.11版本开始,Agent与Collector之间的官方推荐协议是gRPC与Protobuf,协议定义在collector.proto IDL文件中. ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:2:2","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Thrift over HTTP 在某些情况下,将Agent和应用程序部署在一起是不可行的,比如当应用程序代码作为AWS的Lambda函数运行时.在这些场景下,jaeger客户端可以通过HTTP/HTTPS协议直接把Span数据发送给Collector. 基于相同jaeger.thrift IDL文件定义的数据可以以HTTP Post请求被提交给/api/traces端点,如https://jaeger-collector:14268/api/traces.Batch结构需要使用Thrift二进制格式编码,在HTTP头中要设置特殊的content type,Content-Type: application/vnd.apache.thrift.binary. Port Protocol Component Function 14268 HTTP collector 直接从客户端接收jaeger.thrift ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:2:3","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Zipkin格式 Collector也可以接受来自Zipkin格式的Span数据,基于JSON v1/v2和Thrift.Collector需要配置为启用Zipkin HTTP服务,如在9411端口启用被当做Zipkin收集器.服务启用了两个端点来接受POST请求: /api/v1/spans,用来接受Zipkin的JSON v1或Thrift格式的数据. /api/v2/spans,用来接受Zipkin的JSON v2格式的数据. 
Port Protocol Component Function 9411 HTTP collector 适配Zipkin(可选) ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:2:4","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"采样策略 Jaeger库实现了一致性的前端(或基于头部的)采样.举例,假设我们有一个简单的调用图,其中服务A调用服务B,服务B调用服务C:A-\u003eB-\u003eC.当服务A接收到一个没有包含追踪信息的请求时,会开启一个新的trace,并分配一个随机的trade ID,然后基于当前的采样策略会对该trace做出一个采样决策.这个决策会跟随这个请求传播到服务B和服务C,因此服务B和C不在需要进行采样决策,而是尊重顶级服务A的采样决策.这种方法保证了如果一个trace被采样了,其所有的spans都会被记录到Jaeger服务中.如果每个服务都在做自己的采样决策,那么在Jaeger服务中很少能获取到完整的追踪信息. ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:3:0","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"客户端采样配置 当使用配置对象去实例化一个tracer时,可以根据sampler.type和sampler.param属性来选择采样类型.Jaeger库支持如下采样策略: Constant(sampler.type=const),该采样策略会对所有trace采用相同的决策.如果sampler.param=1将对所有trace都采样,如果sampler.param=0将都不睬采样. Probabilistic(sampler.type=probabilistic),该采样策略会基于sampler.param属性值做出一个概率性的随机决策.例如,sampler.param=0.1时大约在10个trace中会有1个被采样. Rate Limiting(sampler.type=ratelimiting),该采样策略使用一个漏洞限流器来确保trace会按照一个恒定的速率被采样.例如,sampler.param=2.0时会按照每秒钟2个trace的速率来采样. Remote(sampler.type=remote,也是默认策略),会向Jaeger Agent咨询在当前服务中使用的合适的采样策略.这允许从Jaeger服务的中心配置来控制服务的采样策略,甚至是动态的(参见自适应采样). ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:3:1","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"自适应采样 自适应采样策略是一个组合策略,结合了两个功能: 它在每个操作的基础上进行采样决策,即基于span的操作名称.这在API服务中是特别有用的,这些服务的端点可能有非常不同的流量,并且对整个服务使用单一的概率采样策略可能会导致饿死(无法采样)一些低QPS的端点. 它支持最小保证的采样率,如总是允许每秒最多N次trace,然后以一定的概率对超过这个频率的任何数据进行采样(所有操作都是按操作而不是按服务进行的). 每个操作参数都可以静态配置或定期从Jaeger后端拉取(基于Remote采样策略).自适应采样策略是设计与Jaeger后端即将到来的自适应采样功能一起工作的. ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:3:2","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Collector采样配置 收集器可以使用静态采样策略(如果使用Remote采样配置,则将其传播到相应的服务),通过--sampling.strategies-file选项.此选项需要一个json文件的路径,该文件定义了采样策略. 如果没有相应的配置项,收集器会对所有服务返回默认的采样率为0.001(0.1%)的随机采样策略. 举例strategies.json: { \"service_strategies\": [ { \"service\": \"foo\", \"type\": \"probabilistic\", \"param\": 0.8, \"operation_strategies\": [ { \"operation\": \"op1\", \"type\": \"probabilistic\", \"param\": 0.2 }, { \"operation\": \"op2\", \"type\": \"probabilistic\", \"param\": 0.4 } ] }, { \"service\": \"bar\", \"type\": \"ratelimiting\", \"param\": 5 } ], \"default_strategy\": { \"type\": \"probabilistic\", \"param\": 0.5, \"operation_strategies\": [ { \"operation\": \"/health\", \"type\": \"probabilistic\", \"param\": 0.0 }, { \"operation\": \"/metrics\", \"type\": \"probabilistic\", \"param\": 0.0 } ] } } service_strategies元素定义了相关服务的特殊的采样策略,operation_strategies定义了相关操作的特殊的采样策略.这儿用到了两种策略:probabilistic和ratelimiting,可以参见客户端采样配置(注意:ratelimiting不支持operation_strategies).default_strategy定义默认采样策略,适用于所有没有包含在service_strategies里的服务. 在上面的例子中: 服务foo的所有操作会按照0.8的概率被采用,除了操作op1和op2,op1会按照0.2的概率,op2会按照0.4的概率. 服务bar的所有操作会按照每秒5个trace的速率来采样. 其它任何服务都会按照default_strategy中定义的0.5的概率被采样. 
default_strategy还包括每个操作共享的策略.在这个例子中我们通过使用0概率来禁用了所有服务中对包含/health和/metrics端点的追踪.这些每个操作策略将应用于配置中未列出的任何新服务,以及foo和bar服务,除非它们为这两个操作定义了自己的策略。 ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:3:3","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"部署 主要的Jaeger后端组件已经作为镜像发布在Docker Hub上: 组件 仓库地址 jaeger-agent hub.docker.com/r/jaegertracing/jaeger-agent jaeger-collector hub.docker.com/r/jaegertracing/jaeger-collector jaeger-query hub.docker.com/r/jaegertracing/jaeger-query jaeger-ingester hub.docker.com/r/jaegertracing/jaeger-ingester 下面是为了运行Jaeger而编排的模板: Kubernetes: jaeger-kubernetes OpenShift: jaeger-openshit ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:4:0","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"配置选项 Jaeger二进制文件可以采用下面几种方式来配置(优先级递减): 命令行参数. 环境变量. 配置文件(JSON,TOML,YAML,HCL等格式)或Java属性格式. 要查看选项的完整列表,可以通过运行二进制文件的help命令或参考CLI Flags页以获得更多信息.只有在选择存储类型时,才会列出特定于某个存储后端的选项.当在使用Cassandra存储时查看所有可用的选项,使用命令docker run --rm -e SPAN_STORAGE_TYPE=cassandra jaegertracing/jaeger-collector:1.21 help. 为了通过环境变量来提供配置参数,查找相应的命令行选项并将其名称转换为大写格式.如: 命令行选项 环境变量 –cassandra.connections-per-host CASSANDRA_CONNECTIONS_PER_HOST –metrics-backend METRICS_BACKEND ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:4:1","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Agent Jaeger客户端库期望jaeger-agent进程在每个主机上本地运行.agent暴露的端口信息: 端口号 协议 功能 6831 UDP 接收紧凑格式的jaeger.thrift协议,大多数Jaeger客户端使用的 6832 UDP 接收二进制格式的jaeger.thrift协议,主要是Node.js客户端使用 5778 HTTP 服务配置,采样策略 5775 UDP 接收紧凑格式的zipkin.thrift协议(已弃用,老客户端使用) 14271 HTTP 管理端口:健康检查/和指标/metrics 可以在主机上直接运行或采用Docker方式运行: ## make sure to expose only the ports you use in your deployment scenario! docker run \\ --rm \\ -p6831:6831/udp \\ -p6832:6832/udp \\ -p5778:5778/tcp \\ -p5775:5775/udp \\ jaegertracing/jaeger-agent:1.21 服务发现集成 Agents可以点对点连接到单个Collector上,也可以依赖其它基础组件(如NDS)在多个Collector之间负载均衡.Agent也可以配置一个静态的Collector服务地址的列表. 采用Docker启动,命令行如下: docker run \\ --rm \\ -p5775:5775/udp \\ -p6831:6831/udp \\ -p6832:6832/udp \\ -p5778:5778/tcp \\ jaegertracing/jaeger-agent:1.21 \\ --reporter.grpc.host-port=jaeger-collector.jaeger-infra.svc:14250 当使用gRPC时,在负载均衡和名称解析上有几个选项: 单个连接没有负载均衡.这是默认采用的方式host:port.(例如:--reporter.grpc.host-port=jaeger-collector.jaeger-infra.svc:14250) 静态主机名列表和轮训负载均衡.地址之间采用逗号分隔.(例如:--reporter.grpc.host-port=jaeger-collector1:14250,jaeger-collector2:14250,jaeger-collector3:14250) 动态DNS解析和轮训负载均衡.要获得这种能力,需要在地址前加上前缀dns:///,gRPC将尝试使用SRV记录(用于外部负载均衡)、TXT记录(用于服务配置)和A记录来解析主机名.参考gRPC名称解析文档和dns_resolver获取更多信息.(例如:--reporter.grpc.host-port=dns:///jaeger-collector.jaeger-infra.svc:14250) 代理层标签 Jaeger支持代理层标签,这些标签可以添加到所有的通过Agent传输的spans中.通过命令行参数--jaeger.tags=key1=value1,key2=value2,...,keyn=valuen,也可以通过环境变量--jaeger.tags=key=${envFlag:defaultValue},标签值将会被设置为envFlag环境变量的值,如没有设置,则标签值默认设置为defaultValue. ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:4:2","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"收集器 Collector是无状态的且可以并行的运行多个jaeger-collector实例.Collector大部分是不需要配置的,除了使用Cassandra集群时的--cassandra.keyspace和--cassandra.servers选项,或则使用Elasticsearch集群时的--es.server-urls选项,依赖于特定的存储.用如下命令查看所有命令行选项go run ./cmd/collector/main.go -h,或则没有源码时使用如下命令docker run -it --rm jaegertracing/jaeger-collector:1.21 -h. 
Collector默认暴露的端口号: 端口号 协议 功能 14250 gRPC jaeger-agent使用,用来发送model.proto格式的span到Collector 14268 HTTP 可以直接从客户端接受二进制的thrift协议的数据 9411 HTTP 可以接受Zipkin的span数据,支持Thrift,JSON和Proto(默认是禁用) 14269 HTTP 管理端口:在/的健康检查和在/metrics的指标 ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:4:3","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"后端存储 Collector需要一个支持持久化的后端存储,Cassandra和ElasticSearch是主要支持的存储后端,其它后端可参考这里. 存储类型是根据环境变量SPAN_STORAGE_TYPE来的,其值可以为cassandra,elasticsearch,kafka(仅用于缓存),grpc-plugin,badger(仅用于all-in-one)和memory(仅用于all-in-one). 在版本1.6.0之后,通过向环境变量SPAN_STORAGE_TYPE提供以逗号分隔的多个有效类型列表,可以同时使用多个存储类型.存储列表对应的所有存储都会写入,但只有列表第一个类型对应的存储负责提供读和归档. 对于大规模生产部署,Jaeger团队推荐Elasticsearch后端超过Cassandra. Memory in-memory存储不是为了生产环境准备的,主要是为了快速搭建环境而使用,当程序终止时数据将丢失,不具备持久化功能. 默认情况下,在内存中是不会限制trace的数量的,但也可以通过参数--memory.max-traces来设置. Badger - 本地存储 1.9版本之后的实验性功能. Badger是嵌入式本地存储器,只适用于all-in-one发行版.默认情况下,它充当临时存储,使用临时文件系统.这些文件可以通过参数--badger.ephemeral=false来重写. docker run \\ -e SPAN_STORAGE_TYPE=badger \\ -e BADGER_EPHEMERAL=false \\ -e BADGER_DIRECTORY_VALUE=/badger/data \\ -e BADGER_DIRECTORY_KEY=/badger/key \\ -v \u003cstorage_dir_on_host\u003e:/badger \\ -p 16686:16686 \\ jaegertracing/all-in-one:1.21 Cassandra 支持版本为3.4+.部署Cassandra可参考文档Apache Cassandra Docs. 配置 最简化命令:docker run -e SPAN_STORAGE_TYPE=cassandra -e CASSANDRA_SERVERS=\u003c...\u003e jaegertracing/jaeger-collector:1.21 所有选项,通过如下命令可查看所有配置选项:docker run -e SPAN_STORAGE_TYPE=cassandra jaegertracing/jaeger-collector:1.21 --help Schema脚本 提供了一个使用Cassandra的交互式shellcqlsh来初始化Cassandra的keyspace和schema的脚本:MODE=test sh ./plugin/storage/cassandra/schema/create.sh | cqlsh 在生产环境下,传递MODE=prod DATACENTER={datacenter}参数给该脚本,其中{datacenter}是Cassandra配置/网络拓扑中使用的名称. 该脚本允许覆盖TTL,keyspace名称,复制因子等.运行不带参数的脚本,可查看可识别参数的完整列表. TLS支持 Jaeger支持在客户端与所配置的Cassandra集群之间采用TLS连接.在例如cqlsh的验证之后,可使用如下命令配置Collector: docker run \\ -e CASSANDRA_SERVERS=\u003c...\u003e \\ -e CASSANDRA_TLS=true \\ -e CASSANDRA_TLS_SERVER_NAME=\"CN-in-certificate\" \\ -e CASSANDRA_TLS_KEY=\u003cpath to client key file\u003e \\ -e CASSANDRA_TLS_CERT=\u003cpath to client cert file\u003e \\ -e CASSANDRA_TLS_CA=\u003cpath to your CA cert file\u003e \\ jaegertracing/jaeger-collector:1.21 schema工具也支持TLS,可以使用如下自定义的cqlshrc文件: # Creating schema in a cassandra cluster requiring client TLS certificates. # # Create a volume for the schema docker container containing four files: # cqlshrc: this file # ca-cert: the cert authority for your keys # client-key: the keyfile for your client # client-cert: the cert file matching client-key # # if there is any sort of DNS mismatch and you want to ignore server validation # issues, then uncomment validate = false below. # # When running the container, map this volume to /root/.cassandra and set the # environment variable CQLSH_SSL=--ssl [ssl] certfile = ~/.cassandra/ca-cert userkey = ~/.cassandra/client-key usercert = ~/.cassandra/client-cert # validate = false Elasticsearch 从Jaeger0.6版本之后开始支持,支持5.x,6.x,7.x版本的Elasticsearch. Elasticsearch版本会自动从根/ping端点检索.基于这些版本Jaeger使用兼容的索引映射和Elasticsearch REST API.版本可以通过参数--es.version=来指定. 除了安装和运行Elasticsearch外,Elasticsearch不需要初始化.一旦运行之后,会将正确的配置值传递给Jaeger收集器和查询服务. 配置 最简化命令:docker run -e SPAN_STORAGE_TYPE=elasticsearch -e ES_SERVER_URLS=\u003c...\u003e jaegertracing/jaeger-collector:1.21 所有选项,通过如下命令可查看所有配置选项:docker run -e SPAN_STORAGE_TYPE=elasticsearch jaegertracing/jaeger-collector:1.21 --help Elasticsearch索引的分片和副本 分片和副本有些配置值是需要特别关注的,因为这些会决定索引的创建.这篇文章将详细介绍如何选择多少个分片在优化时. 
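下面是一个在启动Collector时显式指定分片和副本数的示例(参数名与前文docker-compose中使用的一致,数值仅作演示,需按集群规模调整): docker run \ -e SPAN_STORAGE_TYPE=elasticsearch \ jaegertracing/jaeger-collector:1.21 \ --es.server-urls=http://elasticsearch:9200 \ --es.num-shards=5 \ --es.num-replicas=1 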
Kafka 从Jaeger1.6.0版本之后开始支持,支持0.9+版本的Kafka. Kafka主要是作为收集器和真实的存储之间的临时缓存.收集器被配置为SPAN_STORAGE_TYPE=kafka,使得收集器上接收到的所有spans会被写入到一个Kafka的topic中.在1.7.0版本新增的一个组件Ingester,用来从Kafka中读取然后存储spans到另一个存储后端(如Elasticsearch或Cassandra). 写入Kafka对于构建后处理数据管道特别有用. 配置 最简化命令:docker run -e SPAN_STORAGE_TYPE=kafka -e KAFKA_PRODUCER_BROKERS=\u003c...\u003e -e KAFKA_TOPIC=\u003c...\u003e jaegertracing/jaeger-collector:1.21 所有选项,通过如下命令可查看所有配置选项:docker run -e SPAN_STORAGE_TYPE=kafka jaegertracing/jaeger-collector:1.21 --help Topic和分区 除非Kafka集群被配置为自动创建topic,否则应该提前创建好topic.可以参考Kafka快速入门文档来了解如何做到这一点. 在官方文档可了解到更多关于topic和分区的信息.这篇文章关于如何选择分区的数量有更多的细节. 存储插件 Jaeger支持基于gRPC协议的存储插件.更多细节参考jaeger/plugin/storage/grpc 当前可用插件: InfluxDB Logz.io,安全,可扩展,可管理,基于云的ELK存储. 使用如下命令: docker run \\ -e SPAN_STORAGE_TYPE=grpc-plugin \\ -e GRPC_STORAGE_PLUGIN_BINARY=\u003c...\u003e \\ -e GRPC_STORAGE_PLUGIN_CONFIGURATION_FILE=\u003c...\u003e \\ jaegertracing/all-in-one:1.21 ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:4:4","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Ingester jaeger-ingester是一个服务用来从Kafka的topic中读取spans数据,然后写入到另一个后端存储中(Elasticsearch或Cassandra). 端口号 协议 功能 14270 HTTP 管理端口:在/端点的健康检查和在/metrics的指标 使用如下命令查看所有配置选项:docker run -e SPAN_STORAGE_TYPE=cassandra jaegertracing/jaeger-ingester:1.21 --help ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:4:5","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"Query Service \u0026 UI jaeger-query服务提供API和一个React/Javascript的UI.服务是无状态的,经常是运行在负载均衡器之后,如NGINX. 有如下端口会暴露: 端口号 协议 功能 16686 HTTP /api/*端点,UI在/ 16686 gRPC Protobuf/gRPC 查询服务 16687 HTTP 管理端口:在/端点的健康检查和在/metrics的指标 最小依赖例子(Elasticsearch为后端存储): docker run -d --rm \\ -p 16686:16686 \\ -p 16687:16687 \\ -e SPAN_STORAGE_TYPE=elasticsearch \\ -e ES_SERVER_URLS=http://\u003cES_SERVER_IP\u003e:\u003cES_SERVER_PORT\u003e \\ jaegertracing/jaeger-query:1.21 时钟偏移调整 Jaeger后端会聚合来自不同主机的应用程序的追踪数据.主机上的硬件时钟经常会经历相对漂移,称为时钟偏移效应.时钟偏移会使得追踪数据的推断变得困难,假如,当一个服务的span有可能比客户端的span更早产生,这种情况是不可能的.查询服务实现了一个时钟偏移调整的算法,使用span之间的因果关系知识来校正时钟偏移.所有被调整的span在UI上会显示一个告警信息,提供应用其时间戳的精确时钟偏移增量. 有时这些调整本身使得追踪难以理解.例如,当在父span的范围内重新定位服务span时,Jaeger不知道请求和响应延迟之间的确切关系,因此假设两者相等,并将子span放在父span的中间(参见issue #961). 查询服务支持一个配置选项--query.max-clock-skew-adjustment,用来控制多少时钟偏移调整是被允许的.如果该参数被设置为zero(0s)则会完全禁用时钟偏移调整.此设置适用于从给定查询服务检索的所有追踪.有一个开放标签#197来支持在UI中直接切换调整的开启和关闭. UI基础Path 针对所有jaeger-query的HTTP路由的基础路径可以设置为一个非root的值,如/jaeger,会导致所有UI URLs会已/jaeger开头.当在反向代理后运行的jaeger-query是非常有用的. 可以通过命令行参数--query.base-path或环境变量QUERY_BASE_PATH来配置基础路径. ","date":"2020-12-14","objectID":"/2020/12/14/opentracing-jaeger-guide/:4:6","tags":["OpenTracing"],"title":"Jaeger指南","uri":"/2020/12/14/opentracing-jaeger-guide/"},{"categories":["Microservice"],"content":"快速启动 可以通过http://your_host:9411去访问zipkin UI. Docker docker run -d -p 9411:9411 openzipkin/zipkin Java 需要Java8或更高版本. curl -sSL https://zipkin.io/quickstart.sh | bash -s java -jar zipkin.jar Source 可以通过源码来安装并运行. 
# get the latest source git clone https://github.com/openzipkin/zipkin cd zipkin # Build the server and also make its dependencies ./mvnw -DskipTests --also-make -pl zipkin-server clean install # Run the server java -jar ./zipkin-server/target/zipkin-server-*exec.jar ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:1:0","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"架构 整体架构如下图所示(来源于官网): 架构图架构 \" 架构图 zipkin已支持的平台和语言列表. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:2:0","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"流程示例 标示符会在服务之间传播,而详细信息会被发送到zipkin.在这两种情况下,追踪库都负责创建有效的追踪并呈现它们.追踪库会确保两种数据之间保持奇偶校验一致性. 下面是http追踪的示例,其中用户代码调用了资源/foo.这是一个单独的Span,在用户代码收到响应后会被异步发送到zipkin中. ┌─────────────┐ ┌───────────────────────┐ ┌─────────────┐ ┌──────────────────┐ │ User Code │ │ Trace Instrumentation │ │ Http Client │ │ Zipkin Collector │ └─────────────┘ └───────────────────────┘ └─────────────┘ └──────────────────┘ │ │ │ │ ┌─────────┐ │ ──┤GET /foo ├─▶ │ ────┐ │ │ └─────────┘ │ record tags │ │ ◀───┘ │ │ ────┐ │ │ │ add trace headers │ │ ◀───┘ │ │ ────┐ │ │ │ record timestamp │ │ ◀───┘ │ │ ┌─────────────────┐ │ │ ──┤GET /foo ├─▶ │ │ │X-B3-TraceId: aa │ ────┐ │ │ │X-B3-SpanId: 6b │ │ │ │ └─────────────────┘ │ invoke │ │ │ │ request │ │ │ │ │ │ │ ┌────────┐ ◀───┘ │ │ ◀─────┤200 OK ├─────── │ │ ────┐ └────────┘ │ │ │ record duration │ │ ┌────────┐ ◀───┘ │ ◀──┤200 OK ├── │ │ │ └────────┘ ┌────────────────────────────────┐ │ │ ──┤ asynchronously report span ├────▶ │ │ │ │{ │ │ \"traceId\": \"aa\", │ │ \"id\": \"6b\", │ │ \"name\": \"get\", │ │ \"timestamp\": 1483945573944000,│ │ \"duration\": 386000, │ │ \"annotations\": [ │ │--snip-- │ └────────────────────────────────┘ 追踪库会异步发送Span,是为了防止与追踪系统有关的延迟或故障导致用户代码的延迟或破坏. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:3:0","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"Transport 追踪库发送Span时,必须要从被追踪系统传输到Zipkin的collectors组件.目前主要有三种方式传输:HTTP,kafka和Scribe. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:4:0","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"组件 主要包含四个组件: collector storage query service web UI ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:5:0","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"Collector 当数据到达zipkin的收集器守护程序后,将对其进行验证、存储及索引,供zipkin收集器进行查找. 
HTTP 默认HTTP方式是可用的,URI为POST /api/v1/spans和POST /api/v2/spans,目前主要使用v2版本.支持如下配置项: 属性 环境变量 描述 zipkin.collector.http.enabled COLLECTOR_HTTP_ENABLED false禁用HTTP方式,默认为true Kafka 当参数KAFKA_BOOTSTRAP_SERVERS设置为v0.10+版本的Kafka时,该收集器就会启用.支持如下配置项: 变量 新Consumer配置 描述 COLLECTOR_KAFKA_ENABLED N/A false禁用Kafka收集器,默认为true KAFKA_BOOTSTRAP_SERVERS bootstrap.servers 以逗号分隔的broker地址,如127.0.0.1:9092.没有默认值 KAFKA_GROUP_ID group.id 此过程代表的消费组.默认为zipkin KAFKA_TOPIC N/A 以逗号分隔的topic列表,默认为zipkin KAFKA_STREAMS N/A 消费topic的线程数,默认为1 启动命令:KAFKA_BOOTSTRAP_SERVERS=127.0.0.1:9092 java -jar zipkin.jar 也可以设置其它Kafka的conusmer属性 举例: # 容器方式启动kafka $ export KAFKA_BOOTSTRAP_SERVERS=$(docker-machine ip `docker-machine active`) # Run Kafka in the background $ docker run -d -p 9092:9092 \\ --env ADVERTISED_HOST=$KAFKA_BOOTSTRAP_SERVERS \\ --env AUTO_CREATE_TOPICS=true \\ spotify/kafka # Start the zipkin server, which reads $KAFKA_BOOTSTRAP_SERVERS $ java -jar zipkin.jar # 设置多个broker地址 $ KAFKA_BOOTSTRAP_SERVERS=broker1.local:9092,broker2.local:9092 java -jar zipkin.jar # 备用topic名称 $ KAFKA_BOOTSTRAP_SERVERS=127.0.0.1:9092 java -Dzipkin.collector.kafka.topic=zapkin,zipken -jar zipkin.jar # 使用系统属性取代环境变量KAFKA_BOOTSTRAP_SERVERS. $ java -Dzipkin.collector.kafka.bootstrap-servers=127.0.0.1:9092 -jar zipkin.jar RabbitMQ 支持如下配置项: 属性 环境变量 描述 zipkin.collector.rabbitmq.concurrency RABBIT_CONCURRENCY 当前消费者数量,默认为1 zipkin.collector.rabbitmq.connection-timeout RABBIT_CONNECTION_TIMEOUT 等待建立连接的超时时间,单位为毫秒,默认为60000(1分钟) zipkin.collector.rabbitmq.queue RABBIT_QUEUE 队列名称,默认为zipkin zipkin.collector.rabbitmq.uri RABBIT_URI rabbitmq完整的uri,如:amqp://user:pass@host:10000/vhost 如果uri被设置了,下面的配置项将会被忽略: 属性 环境变量 描述 :– :– :– zipkin.collector.rabbitmq.addresses RABBIT_ADDRESSES 逗号分隔的rabbitmq地址,如:localhost:5672,localhost:5673 zipkin.collector.rabbitmq.password RABBIT_PASSWORD 连接rabbitmq的密码,默认为guest zipkin.collector.rabbitmq.username RABBIT_USER 连接rabbitmq的用户名,默认为guest zipkin.collector.rabbitmq.virtual-host RABBIT_VIRTUAL_HOST rabbitmq的虚拟主机名,默认为/ zipkin.collector.rabbitmq.use-ssl RABBIT_USE_SSL 设置为true,表示使用ssl连接到rabbitmq 队列会被申明为持久化的,收集器使用单个conn连接到rabbitmq,通过配置的concurrency数量的线程(每个线程一个channel)来消费消息.消费消息时autoAck设置为on了,表示消费者收到消息后rabbitmq就会在队列中自动删除该消息,若消费者出现异常是不能再重复消费该消息的. 启动命令:RABBIT_ADDRESSES=localhost java -jar zipkin.jar ActiveMQ 支持ActiveMQ v5.x版本. 属性 环境变量 描述 zipkin.collector.activemq.enabled COLLECTOR_ACTIVEMQ_ENABLED false表示禁用该收集器,默认为true zipkin.collector.activemq.url ACTIVEMQ_URL ActiveMQ broker地址,如:tcp://localhost:61616或者故障转移:(tcp://localhost:61616,tcp://remotehost:61616) zipkin.collector.activemq.queue ACTIVEMQ_QUEUE 队列名,默认为zipkin zipkin.collector.activemq.client-id-prefix ACTIVEMQ_CLIENT_ID_PREFIX 消费者的客户端ID的前缀,默认为zipkin zipkin.collector.activemq.concurrency ACTIVEMQ_CONCURRENCY 消费者数量,默认为1 zipkin.collector.activemq.username ACTIVEMQ_USERNAME 连接到ActiveMQ时的用户名,可选 zipkin.collector.activemq.password ACTIVEMQ_PASSWORD 连接到ActiveMQ时的密码,可选 启动命令:ACTIVEMQ_URL=tcp://localhost:61616 java -jar zipkin.jar ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:5:1","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"Storage 存储组件是采用插件化的方式实现,目前支持InMemory、Cassandra、ElasticSearch和MySQL,还有其它第三方实现的. InMemory 默认启动的就是In-Memory方式,所有数据全部保存在内存中,没有持久化功能. 启动方式: # 默认启动方式 java -jar zipkin.jar # 指定STORAGE_TYPE为mem STORAGE_TYPE=mem java -jar zipkin.jar 提供了参数MEM_MAX_SPANS来控制Span占用的内存大小.当碰到out-of-memory错误时,可以增大该参数或者调整堆大小(-Xmx). 
`MEM_MAX_SPANS`: Oldest traces (and their spans) will be purged first when this limit is exceeded. Default 500000 # 调整内存大小. MEM_MAX_SPANS=1000000 java -Xmx1G -jar zipkin.jar MySQL 基于MySQL5.7版本,需要先建库建表,脚本文件地址. #installtheschemaandindexes$mysql-uroot-e\"create database if not exists zipkin\"$mysql-uroot-Dzipkin\u003czipkin-storage/mysql-v1/src/main/resources/mysql.sql#根据traceid来查询select*fromzipkin_spanswheretrace_id=x'27960dafb1ea7454' 启动时可设置的相关参数 MYSQL_DB: The database to use. Defaults to “zipkin”. MYSQL_USER and MYSQL_PASS: MySQL authentication, which defaults to empty string. MYSQL_HOST: Defaults to localhost MYSQL_TCP_PORT: Defaults to 3306 MYSQL_MAX_CONNECTIONS: Maximum concurrent connections, defaults to 10 MYSQL_USE_SSL: Requires javax.net.ssl.trustStore and javax.net.ssl.trustStorePassword, defaults to false. 启动命令:STORAGE_TYPE=mysql MYSQL_USER=root java -jar zipkin.jar Cassandra Elasticsearch 启动时可设置的相关参数 ES_HOSTS: A comma separated list of elasticsearch base urls to connect to ex. http://host:9200. Defaults to “http://localhost:9200”. ES_PIPELINE: Indicates the ingest pipeline used before spans are indexed. No default. ES_TIMEOUT: Controls the connect, read and write socket timeouts (in milliseconds) for Elasticsearch API. Defaults to 10000 (10 seconds) ES_INDEX: The index prefix to use when generating daily index names. Defaults to zipkin. ES_DATE_SEPARATOR: The date separator to use when generating daily index names. Defaults to ‘-’. ES_INDEX_SHARDS: The number of shards to split the index into. Each shard and its replicas are assigned to a machine in the cluster. Increasing the number of shards and machines in the cluster will improve read and write performance. Number of shards cannot be changed for existing indices, but new daily indices will pick up changes to the setting. Defaults to 5. ES_INDEX_REPLICAS: The number of replica copies of each shard in the index. Each shard and its replicas are assigned to a machine in the cluster. Increasing the number of replicas and machines in the cluster will improve read performance, but not write performance. Number of replicas can be changed for existing indices. Defaults to 1. It is highly discouraged to set this to 0 as it would mean a machine failure results in data loss. ES_ENSURE_TEMPLATES: Installs Zipkin index templates when missing. Setting this to false can lead to corrupted data when index templates mismatch expectations. If you set this to false, you choose to troubleshoot your own data or migration problems as opposed to relying on the community for this. Defaults to true. ES_USERNAME and ES_PASSWORD: Elasticsearch basic authentication, which defaults to empty string. Use when X-Pack security (formerly Shield) is in place. ES_CREDENTIALS_FILE: The location of a file containing Elasticsearch basic authentication credentials, as properties. The username property is zipkin.storage.elasticsearch.username, password zipkin.storage.elasticsearch.password.This file is reloaded periodically, using ES_CREDENTIALS_REFRESH_INTERVAL as the interval. This parameter takes precedence over ES_USERNAME and ES_PASSWORD when specified. ES_CREDENTIALS_REFRESH_INTERVAL: Credentials refresh interval in seconds, which defaults to 1 second. This is the maximum amount of time spans will drop due to stale credentials. Any errors reading the credentials file occur in logs at this rate. ES_HTTP_LOGGING: When set, controls the volume of HTTP logging of the Elasticsearch API. 
Options are BASIC, HEADERS, BODY ES_SSL_NO_VERIFY: When true, disables the verification of server’s key certificate chain. This is not appropriate for production. Defaults to false. ES_TEMPLATE_PRIORITY: The priority value of the compo","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:5:2","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"Query Service 查询服务提供简单的JSON API来查询和检索追踪信息,主要是供Web UI使用. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:5:3","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"Web UI 以网页的形式来展示追踪数据. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:5:4","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"客户端代码 基于OpenTracing来使用zipkin,以Golang为例. 依赖于zipkin-go、zipkin-go-opentracing和opentracing-go这三个库. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:6:0","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"Reporter接口 接口定义: // Tracer依赖该接口来发送Span数据. type Reporter interface { Send(model.SpanModel) Close() error } 具体实现: noopReporter 具体实现都是空的,不做任何处理. // 创建接口. reporter := zipkin.NewNoopReporter() httpReporter 基于HTTP协议来发送Span数据. 函数原型: // url表示zipkin服务端接收数据的端点,如http://127.0.0.1:9411/api/v2/spans. func NewReporter(url string, opts ...ReporterOption) reporter.Reporter 采用了Option设计方式来对相关参数进行自定义处理. // 设置http的超时时间,默认为5秒. func Timeout(duration time.Duration) ReporterOption // 设置最大的待发送数量,当到达此阈值后就会触发收集操作,默认为100. func BatchSize(n int) ReporterOption // 设置触发收集操作的时间间隔,默认为1秒.可见触发收集操作有两个因素,当间隔时间达到指定时间或者待发送数量超过batchsize,就会立即触发. func BatchInterval(d time.Duration) ReporterOption // 设置最大的缓存数量,当超过此阈值,从队列开头到超出数量的Span将会被丢弃,默认为1000. func MaxBacklog(n int) ReporterOption // 设置回调,在发送span到zipkin之前会被调用,默认为nil,回调定义是type RequestCallbackFn func(*http.Request). func RequestCallback(rc RequestCallbackFn) ReporterOption // 设置日志对象,默认为Stderr func Logger(l *log.Logger) ReporterOption // 设置span的序列化方式,默认实现的是json格式. func Serializer(serializer reporter.SpanSerializer) ReporterOption // 设置发送请求到zipkin收集器的方式,默认为http.Client. func Client(client HTTPDoer) ReporterOption rmqReporter 把Span数据发送到rabbitmq消息总线上. 函数原型: // address表示rabbitmq的地址,如:amqp://guest:guest@localhost:5672/test func NewReporter(address string, opts ...ReporterOption) (reporter.Reporter, error) 采用了Option设计方式来对相关参数进行自定义处理. // 设置日志对象,默认为Stderr func Logger(logger *log.Logger) ReporterOption // 设置rabbitmq中交换器的名字,默认为zipkin. func Exchange(exchange string) ReporterOption // 设置rabbitmq队列的名字,默认为zipkin. func Queue(queue string) ReporterOption // 设置channel通道对象,默认为nil. func Channel(ch *amqp.Channel) ReporterOption // 设置与rabbitmq连接的对象,默认为nil. func Connection(conn *amqp.Connection) ReporterOption rabbitmq的交换器类型默认为direct,交换器和队列默认都是持久化的,非独占的. kafkaReporter 把Span数据发送到kafka中. 函数原型: // address为broker地址列表. func NewReporter(address []string, options ...ReporterOption) (reporter.Reporter, error) { 采用了Option设计方式来对相关参数进行自定义处理. // 设置日志对象,默认为Stderr func Logger(logger *log.Logger) ReporterOption // 设置生产者. func Producer(p sarama.AsyncProducer) ReporterOption // 设置topic名字,默认为zipkin func Topic(t string) ReporterOption // 设置span的序列化方式,默认实现的是json格式. func Serializer(serializer reporter.SpanSerializer) ReporterOption logReporter 把Span数据发送到日志对象中. 
函数原型: func NewReporter(l *log.Logger) reporter.Reporter 只是把Span数据记录到指定的日志对象中,并不会发送到zipkin中. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:6:1","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"NewTracer函数 函数原型: // rep指定Reporter. func NewTracer(rep reporter.Reporter, opts ...TracerOption) (*Tracer, error) 当指定的rep为nil时,会默认创建为noopRepoerter. 采用了Option设计方式来对相关参数进行自定义处理. // 设置被追踪服务的本地endpoint,可调用zipkin.NewEndpoint来生成对应的Endpoint对象,默认为nil. func WithLocalEndpoint(e *model.Endpoint) TracerOption // 设置采样策略,默认为AlwaysSample. func WithSampler(sampler Sampler) TracerOption 针对采样策略,Sampler原型为type Sampler func(id uint64) bool,可按照需求来自定义采样策略.官方默认提供了如下几种方式: AlwaysSample // 所有span都会被发送到zipkin. func AlwaysSample(_ uint64) bool { return true } NewModuloSampler // 当mod小于2时采用AlwaysSmaple.当mod大于2时,若trace id对mod求余为0就会被发送zipkin. func NewModuloSampler(mod uint64) Sampler NewBoundarySampler // BoundarySampler适用于提供随机trace id且仅做出一次采样决定的高流量场景.它可以防止集群中的节点选择完全相同的ID. func NewBoundarySampler(rate float64, salt int64) (Sampler, error) NewCountingSampler // CountingSampler适用于低流量或不提供随机trace id的场景,由于采样决策不是幂等的(根据traceid一致),因此不适用于收集器. func NewCountingSampler(rate float64) (Sampler, error) ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:6:2","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"完整例子 import ( \"io\" \"github.com/opentracing/opentracing-go\" zipkinot \"github.com/openzipkin-contrib/zipkin-go-opentracing\" \"github.com/openzipkin/zipkin-go\" zipkinhttp \"github.com/openzipkin/zipkin-go/reporter/http\" ) type noopZkCloser struct{} func (noopZkCloser) Close() error { return nil } // newTracer 创建基于zipkin的tracer对象. func newTracer(tracingURL, serverName, localEndpoint string) (opentracing.Tracer, io.Closer, error) { // 基于HTTP的Reporter. zipkinReporter := zipkinhttp.NewReporter(tracingURL) // 创建localendpoint对象. endpoint, err := zipkin.NewEndpoint(serverName, localEndpoint) if err != nil { return nil, nil, err } // 创建zipkin原生的tracer对象,采样策略使用默认的AlwaysSample. nativeTracer, err := zipkin.NewTracer(zipkinReporter, zipkin.WithLocalEndpoint(endpoint)) if err != nil { return nil, nil, err } // 把zipkin原生tracer对象包装成OpenTracing的tracer对象. return zipkinot.Wrap(nativeTracer), noopZkCloser{}, nil } ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-zipkin-guide/:6:3","tags":["OpenTracing"],"title":"Zipkin指南","uri":"/2020/12/10/opentracing-zipkin-guide/"},{"categories":["Microservice"],"content":"回顾: OpenTracing的目标是什么? 
OpenTracing是一个位于应用程序/库代码和追踪系统之间的一个标准中间层.结构如下: +-------------+ +---------+ +----------+ +------------+ | Application | | Library | | OSS | | RPC/IPC | | Code | | Code | | Services | | Frameworks | +-------------+ +---------+ +----------+ +------------+ | | | | | | | | v v v v +-----------------------------------------------------+ | · · · · · · · · · · OpenTracing · · · · · · · · · · | +-----------------------------------------------------+ | | | | | | | | v v v v +-----------+ +-------------+ +-------------+ +-----------+ | Tracing | | Logging | | Metrics | | Tracing | | System A | | Framework B | | Framework C | | System D | +-----------+ +-------------+ +-------------+ +-----------+ ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:1:0","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"用例 下面列出一些OpenTracing的用例,并对其进行详细描述: Use case Description 应用程序代码 编写应用程序代码的开发人员可以使用OpenTracing来描述因果关系,划分控制流,并添加细粒度的日志记录信息. 库代码 对于请求进行中间控制的库也可以与OpenTracing集成.例如:Web中间件可以使用OpenTracing为每个请求创建spans,或者ORM库可以使用OpenTracing来描述更高级别的ORM语义并衡量特定SQL查询的执行. OSS服务 除了嵌入式库之后,整个OSS服务都可以采用OpenTracing来与分布式跟踪系统集成,在较大的分布式系统中启动或传播到其它进程.例如:HTTP负载均衡器可以使用OpenTracing包装所有请求,或者在分布式KV存储系统中使用OpenTracing来跟踪读写性能. RPC/IPC框架 任何跨进程边界的任务子系统都可以使用OpenTracing来标准化trace状态,OpenTracing提供了统一的Inject和Extract格式. 所有上述都可以使用OpenTracing来描述和传播分布式跟踪信息,而不用了解分布式追踪系统的底层实现. OpenTracing优先考虑易用性的问题,主要是站在调用者的角度,而不是分布式追踪系统的实现者上. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:2:0","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"举例 ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:3:0","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"追踪函数 func TopLevelFunction(ctx context.Context) { tracer := opentracing.GlobalTracer() span1 := tracer.StartSpan(\"toplevelfunction\") defer span1.Finish() subctx := opentracing.ContextWithSpan(ctx, span) // 业务逻辑. Function2(subctx) } 作为业务逻辑的一部分,调用了Function2方法,也想被追踪.为了让Function2里的追踪和TopLevelFunction里的追踪形成因果关系,必须在Function2里要能访问到span1,通过span1来创建一个子span. func Function2(ctx context.Context) { span1, ok := opentracing.SpanFromContext(ctx) if ok { tracer := opentracing.GlobalTracer() span2 := tracer.StartSpan(\"function2\", opentracing.ChildOf(span1.Context())) defer span2.Finish() } } 通过context.Context来传递span,可以让整个函数调用过程形成一个完整的调用链. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:3:1","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"服务端追踪 当服务端想要去跟踪每个请求的执行过程,一般需要以下几个步骤. 试图中从请求中获取SpanContext(客户端已经开启了trace),如果无法获取就开启一个新的trace. 在上下文中存储最新创建的Span,上下文会通过应用程序代码或RPC框架传播. 最后当处理完请求时需要调用span.Finish()来关闭Span. 从请求中获取SpanContext 假设有个HTTP服务器,SpanContext通过http头从客户端传播到服务端,可通过request.headers来访问. tracer := opentracing.GlobalTracer() carrier := opentracing.HTTPHeadersCarrier(r.Header) spanContext, err := tracer.Extract(opentracing.HTTPHeaders, carrier) 把headers转换成carrier,tracer对象知道需要headers中的哪些字段,用来重建tracer的状态及Baggage. 从请求中获取一个已经存在的trace,或者开启一个新的trace 假设客户端没有发送相应字段的值,在服务端就无法从Header中获取到,上文中的spanContext可能为nil.在这种情况下,服务端需要开启一个新的trace. 
tracer := opentracing.GlobalTracer() carrier := opentracing.HTTPHeadersCarrier(r.Header) spanContext, err := tracer.Extract(opentracing.HTTPHeaders, carrier) var span opentracing.Span if spanContext == nil || err != nil { span = tracer.StartSpan(r.RequestURI) } else { span = tracer.StartSpan(r.RequestURI, opentracing.ChildOf(spanContext)) } ext.HTTPMethod.Set(span, r.Method) ext.HTTPMethod.Set是给span设置一个附加信息,等同于span.SetTag(\"http.method\", r.Method). StartSpan的第一个参数是operationName,用来指定新创建的Span的名字.举例,若HTTP请求是POST类型且URI为/save_user/123,这此时Span的名字会被设置为/save_user/123.OpenTracing规范不会强制要求应用程序如何给Span命名. 进程内请求上下文传播 请求上下文是指:对于一个请求,所有处理这个请求的层都可以访问到同一个context(上下文).可以通过特定值,如用户id、token、请求截止时间等来获取这个context,也可以用这个方式来获取当前正在追踪的Span. OpenTracing规范中并没有规定请求上下文的传输实现方式,但这点是非常重要的,便于我们理解后面的章节.一般有两种常见的基数: 隐式传输,context需要被储存在平台特定的位置,允许应用程序在任何地方都能访问到.常用的RPC框架会利用thread-local或continuation-local来存储,或者是全局变量(在单线程程序中).这种方式的缺点是性能低下,并且有些平台如Go是不支持线程本地存储的,隐式传输就几乎不可能实现了. 显示传输,要求应用程序代码包装和传递一个context对象.这种方式的缺点在于向应用程序暴露了底层的实现,Go blog post这篇文章提供了这种方式的深层次解析. func HandleHttp(w http.ResponseWriter, req *http.Request) { ctx := context.Background() ... BusinessFunction1(ctx, arg1, ...) } func BusinessFunction1(ctx context.Context, arg1...) { ... BusinessFunction2(ctx, arg1, ...) } func BusinessFunction2(ctx context.Context, arg1...) { parentSpan := opentracing.SpanFromContext(ctx) childSpan := opentracing.StartSpan( \"...\", opentracing.ChildOf(parentSpan.Context()), ...) ... } ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:3:2","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"客户端追踪 当一个应用程序扮演RPC客户端的角色时,在调用外部接口前可以开启一个新的Span,在请求期间传播该Span.下面通过一个HTTP请求来展示如何处理. func tracedPost(ctx context.Context, operation, url string, body []byte) error { parent_span := opentracing.SpanFromContext(ctx) tracer := opentracing.GlobalTracer() span := tracer.StartSpan( operation, opentracing.ChildOf(parent_span.Context()), opentracing.Tag{\"http.url\", url}, opentracing.Tag{\"http.method\", http.MethodPost}) defer span.Finish() reader := bytes.NewBuffer(body) req, err := http.NewRequest(http.MethodPost, url, reader) if err != nil { ext.LogError(span, err) return err } req.Header.Set(\"Content-Type\", \"application/json\") err = tracer.Inject(span.Context(), opentracing.HTTPHeaders, req.Header) if err != nil { ext.LogError(span, err) return err } cli := http.Client{Timeout: time.Second * 5} rsp, err := cli.Do(req) if err != nil { ext.LogError(span, err) return err } ext.HTTPStatusCode.Set(span, uint16(rsp.StatusCode)) return nil } 首先从context中获取parent_span,可以跟上游Span构成一个链条. 针对http请求创建一个新的Span,设置相应的Tag,然后调用Inject把需要传播的信息注入到req.Header中,在服务端就可以利用Header来重组Span. 当有错误发生的时候,调用ext.LogError把错误信息关联到Span上. 最后把相应的状态码作为Tag设置到Span上. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:3:3","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"使用Baggage/分布式上下文传输 上面的例子都是通过网络在客户端和服务端之间传递Span/Tracer,包含任意的Baggage.客户端可以利用Baggage来传播一些附加信息到服务端及任何其下游服务. // 客户端. span = span.SetBaggageItem(\"auto_token\", \"token\") // 服务端. 
token := span.BaggageItem(\"auto_token\") ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:3:4","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"Logging事件 在上面客户端追踪的例子中已经使用过Log了.可以记录事件而不需要有payload,而不仅在Span被创建和完成时.举个例子,应用程序在执行过程中可能会需要记录cache miss事件,可以通过在请求上下文中来获取当前Span然后把该事件附加到Span中. span := opentracing.SpanFromContext(ctx) // 不带payload的. span.LogEvent(\"cache_miss\") // 带payload的. span.LogEventWithPayload(\"cache_miss\", 1) Tracer会自动记录该事件的时间戳,与应用于整个Span的tags相反.也可以将外部提供的时间戳与事件想关联,可以查看opentracing-go中Span接口的LogFields方法. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:3:5","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"使用外部时间戳记录Span 因为各种各样的原因,在有些场景下会将OpenTracing兼容的tracer集成到服务中.比如一个用户有一个日志文件,其中包含大量来自黑盒系统(如HAProxy)产生的Span数据,为了把这些数据导入到OpenTracing兼容的系统中,API必须提供一种方法通过外部自定义时间戳来记录Span. span := tracer.StartSpan(\"operationname\", opentracing.StartTime(time.Now())) span.FinishWithOptions(opentracing.FinishOptions{FinishTime: time.Now()}) ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:3:6","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"开启tracer之前设置好采样策略 大多数分布式追踪系统都会通过应用不同的采样策略来减少需要记录和处理的追踪数据的总量.有时开发人员希望有一种方式来确保一个tracer数据会被系统记录(采样),如在HTTP请求中包含一个特殊的参数(debug=true).OpenTracing API标准化了一些有用的tags,其中一个叫sampling.priority(采样优先级):精确的实现是由追踪系统的实现者决定的,但任何大于0(默认)代表一条tracer的高优先级.为了传递这个属性到追踪系统中,需要在追踪前进行预处理,如下: b, err := strconv.ParseBool(req.Header.Get(\"debug\")) if b \u0026\u0026 err == nil { span := tracer.StartSpan( \"operationname\", opentracing.Tag{string(ext.SamplingPriority), 1}, ) } ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:3:7","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"追踪消息总线的方案 有两种类别的消息总线需要处理,包括消息队列和发布/订阅(主题). 从追踪的视角来看,消息总线的类型并不重要,只是要将生产者关联的SpanContext传播到零个或多个消费者中.然后消费者就有责任创建Span来封装都消息的处理,并建立对传播来的SpanContext的FollowsFrom引用. 以RPC客户端为例,生产者在发送消息之前开启了一个新Span,并跟随消息传播该Span的Context.在消息成功发布到消息总线上后这个Span就完成了.下面展示代码是如何实现的: def traced_send(message, operation): # retrieve current span from propagated message context parent_span = get_current_span() # start a new span to represent the message producer span = tracer.start_span( operation_name=operation, child_of=parent_span.context, tags={'message.destination': message.destination} ) # propagate the Span via message headers tracer.inject( span.context, format=opentracing.TEXT_MAP_FORMAT, carrier=message.headers) with span: messaging_client.send(message) except Exception e: ... raise 接下来消费者会判断消息中是否包含了SpanContext,如果有,就会用它来与生产者的Span建立联系. extracted_context = tracer.extract( format=opentracing.TEXT_MAP_FORMAT, carrier=message.headers ) span = tracer.start_span(operation_name=operation, references=follows_from(extracted_context)) span.set_tag('message.destination', message.destination) ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:3:8","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"基于队列的同步请求-响应 尽管使用不多,但有些消息平台/标准(如JMS)支持在消息头中提供ReplyTo目标的功能.消费者收到消息后,它会将结果返回到指定的目的地. 这种模式常用来模拟同步请求/响应,这种情况下消费者和生产者之间是ClildOf的关系. 
但此模式也可以用于委托来指示将结果告知第三方.在这种情况下,它将被视为两个单独的消息交换并具有链接每个阶段的Follows From关系类型(A-\u003eB-\u003eC). 由于很难区分这两种情况,不建议将面向消息中间件用于同步的请求/响应模式,因此从建议跟踪角度忽略请求/响应方案. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:3:9","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"参考 语义约定 Best Practices 最佳实践 opentracing-tutorial ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-best-practices/:4:0","tags":["OpenTracing"],"title":"OpenTracing最佳实践","uri":"/2020/12/10/opentracing-best-practices/"},{"categories":["Microservice"],"content":"概述 ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:1:0","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"简介 虽然微服务是一种强大的系统架构,但也伴随着新的问题,就是当微服务数量众多且调用链条过长时,在复杂的网络环境下是很难调试和观察分布式事务,无法直接在内存或堆栈中来调试或观察. 在这种情况下,分布式追踪系统进入到视野之中,分布式追踪系统对于描述和分析跨进程事务的问题提供了解决方案.大部分的分布式追踪系统的思想都来源于Google’s Dapper paper ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:1:1","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"分布式追踪系统的模型 大多数分布式追踪系统的模型来自于Google’s Dapper paper.OpenTracing也是一样的,采用了相同的名词和动词. tracing模型模型 \" tracing模型 Trace: 用来描述在分布式系统中一个完整的事务(这里的事务不是指数据库中的事务,而是指一个完整的业务流). Span: 可命名的、记录耗时的一个工作流片段,Span上可设置多个key:value的tags,也可以记录某个时间点的结构化的log. SpanContext: 追踪信息会伴随着整个分布式事务,会通过网络或者消息总线来传递到下游服务中.包含了trace id、span id和其它需要传播(分布式追踪系统需要传播到下游的)的数据. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:1:2","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"四个主要的问题 从应用程序层分布式跟踪系统的角度来看,现代软件系统如下图所示: 系统结构系统 \" 系统结构 现代软件系统中的组件可分为三大类: 应用程序和业务逻辑: 自己的代码. 广泛使用的共享库: 别人的代码. 广泛使用的服务: 别人的基础设施. 这三类组件有不同的需求,并驱动着负责监控应用程序的分布式追踪系统的设计.最终有四个非常重要的设计要点: 追踪系统的API: 应用程序如何使用? 传播协议: 在RPC请求中与应用程序一起发送的内容(传递到下游服务中). 数据协议: 异步(带外)发送到分析系统中的内容. 分析系统: 用于处理追踪数据的数据库和交互式UI. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:1:3","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"OpenTracing是如何解决的? OpenTracing API提供了标准的、与厂商无关的工具框架.当开发人员想尝试不同的分布式追踪系统时,只需要简单的更改Tracer的配置,而不用为了适配新的分布式追踪系统而重复开发整个追踪过程. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:1:4","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"什么是分布式追踪? 分布式追踪是一种用来分析和监控应用程序的方法,特别是使用微服务架构的系统.分布式追踪有助于查明发生故障的位置以及导致性能下降的原因. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:2:0","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"分布式追踪的使用场景 IT和DevOps团队可以用分布式追踪来监控整个应用程序.分布式追踪特别适合用来调试和监控现代分布式软件体系结构,如微服务. 开发人员可以利用分布式追踪来帮助调试和优化代码. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:2:1","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"什么是OpenTracing? 首先从什么不是OpenTracing开始可能更容易. OpenTracing不是一个下载文件或程序.分布式追踪系统要求软件开发人员将追踪代码添加到应用程序的代码中,或者应用程序所使用的框架中. OpenTracing不是一个标准,CNCF不是一个标准化组织.OpenTracing API项目正在努力为分布式追踪系统创建更加标准的API和工具. OpenTracing是由API规范,已实现该规范的框架和库以及该项目的文档组成.OpenTracing允许开发人员使用不会将其受限于任何一种特定的产品或供应商的API来将追踪代码添加到应用程序中. 
关于更多已实现OpenTracing规范的信息,可以查看已支持的语言列表和已支持的分布式追踪系统 ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:2:2","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"Spans Span是分布式追踪的主要构建对象,代表分布式系统中已完成的单个工作单元. 分布式系统中的每个组件都会构建一个Span(命名的、记录耗时的一个工作流片段). Spans可以包含对其它Spans的引用,这样就允许多个Span关联到一个已完成的Trace(把一个请求在分布式系统中的生命周期可视化). 根据OpenTracing规范,每个Span会封装以下内容: Operation Name(操作名称). 开始时间和结束时间. Tags,key:value的集合,伴随整个Span. Logs,key:value的集合,记录某个时间点的日志. SpanContext. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:3:0","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"Tags key:value的集合,对Span的自定义标记,可以用来查询、过滤和理解追踪数据. tags是伴随Span的整个生命周期,在文件semantic_conventions.md定义了常见场景中Span的常规tags.如db.instance表示数据库主机地址,http.status_code表示HTTP的响应码,error可以设置为True表示Span所代表的操作失败了. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:3:1","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"Logs key:value的集合,可用于抓取Span的特定的日志信息以及应用程序本身的其它调试信息或输出信息.也常用于记录Span某个特定时刻或事件(和tags应用与Span的整个生命周期不同). ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:3:2","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"SpanContext SpanContext用于跨进程边界时携带数据,主要包含两个方面的数据: 依赖于实现的状态来引用trace中不同的span. Tracer定义的spanID和traceID. 任何Baggage Items. 需要跨进程边界传播的key:value数据对. 其它对整个追踪访问有用的数据. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:3:3","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"举例 t=0 operation name: db_query t=x +-----------------------------------------------------+ | · · · · · · · · · · Span · · · · · · · · · · | +-----------------------------------------------------+ Tags: - db.instance:\"customers\" - db.statement:\"SELECT * FROM mytable WHERE foo='bar'\" - peer.address:\"mysql://127.0.0.1:3306/customers\" Logs: - message:\"Can't connect to mysql server on '127.0.0.1'(10061)\" SpanContext: - trace_id:\"abc123\" - span_id:\"xyz789\" - Baggage Items: - special_id:\"vsid1738\" ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:3:4","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"Tracers ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:4:0","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"简介 OpenTracing提供了一个开放的、与厂商无关的标准API,用来描述分布式事务,尤其是因果关系、语义和时间.它提供了一个通用的分布式上下文传播框架,该框架由以下API原语组成: 在进程间传播元数据上下文. 编码和解码元数据上下文之后,通过网络传输它用来进行进程间通信. 因果关系追踪: 父子关系、分叉和连接. OpenTracing消除了众多分布式追踪系统之间的差异.这意味着无论开发人员使用哪个分布式追踪系统,追踪代码都将保持不变.为了在应用程序中使用OpenTracing规范的追踪代码,必须部署兼容OpenTracing的追踪系统,已支持OpenTracing规范的追踪系统. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:4:1","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"Tracer接口 Tracer接口能创建Spans,还知道如何跨进程边界注入(序列化)和提取(反序列化)元数据,主要包含三个方面的能力: 开启一个新的Span. 将SpanContext注入到carrier中. 从carrier中提取出SpanContext. 
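在看下面完整的Go接口定义之前,先给出一个简化的使用示意(仅为草图,基于opentracing-go,其中的operation名称与HTTP头载体均为示例),把上述三个能力串起来:

import (
	"net/http"

	"github.com/opentracing/opentracing-go"
)

// tracerDemo 依次演示Tracer的三个能力: 开启Span、注入SpanContext、提取SpanContext.
func tracerDemo(req *http.Request) {
	tracer := opentracing.GlobalTracer()

	// 1. 开启一个新的Span.
	span := tracer.StartSpan("demo_operation")
	defer span.Finish()

	// 2. 将SpanContext注入到carrier中(这里以HTTP头为例).
	carrier := opentracing.HTTPHeadersCarrier(req.Header)
	_ = tracer.Inject(span.Context(), opentracing.HTTPHeaders, carrier) // 注入失败一般只记录日志,不影响业务.

	// 3. 从carrier中提取出SpanContext(通常发生在被调用方),并据此创建子Span.
	if sc, err := tracer.Extract(opentracing.HTTPHeaders, carrier); err == nil {
		child := tracer.StartSpan("demo_child", opentracing.ChildOf(sc))
		child.Finish()
	}
}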
以Golang语言为例: // Tracer is a simple, thin interface for Span creation and SpanContext // propagation. type Tracer interface { // Create, start, and return a new Span with the given `operationName` and // incorporate the given StartSpanOption `opts`. (Note that `opts` borrows // from the \"functional options\" pattern, per // http://dave.cheney.net/2014/10/17/functional-options-for-friendly-apis) // // A Span with no SpanReference options (e.g., opentracing.ChildOf() or // opentracing.FollowsFrom()) becomes the root of its own trace. // // Examples: // // var tracer opentracing.Tracer = ... // // // The root-span case: // sp := tracer.StartSpan(\"GetFeed\") // // // The vanilla child span case: // sp := tracer.StartSpan( // \"GetFeed\", // opentracing.ChildOf(parentSpan.Context())) // // // All the bells and whistles: // sp := tracer.StartSpan( // \"GetFeed\", // opentracing.ChildOf(parentSpan.Context()), // opentracing.Tag{\"user_agent\", loggedReq.UserAgent}, // opentracing.StartTime(loggedReq.Timestamp), // ) // StartSpan(operationName string, opts ...StartSpanOption) Span // Inject() takes the `sm` SpanContext instance and injects it for // propagation within `carrier`. The actual type of `carrier` depends on // the value of `format`. // // OpenTracing defines a common set of `format` values (see BuiltinFormat), // and each has an expected carrier type. // // Other packages may declare their own `format` values, much like the keys // used by `context.Context` (see https://godoc.org/context#WithValue). // // Example usage (sans error handling): // // carrier := opentracing.HTTPHeadersCarrier(httpReq.Header) // err := tracer.Inject( // span.Context(), // opentracing.HTTPHeaders, // carrier) // // NOTE: All opentracing.Tracer implementations MUST support all // BuiltinFormats. // // Implementations may return opentracing.ErrUnsupportedFormat if `format` // is not supported by (or not known by) the implementation. // // Implementations may return opentracing.ErrInvalidCarrier or any other // implementation-specific error if the format is supported but injection // fails anyway. // // See Tracer.Extract(). Inject(sm SpanContext, format interface{}, carrier interface{}) error // Extract() returns a SpanContext instance given `format` and `carrier`. // // OpenTracing defines a common set of `format` values (see BuiltinFormat), // and each has an expected carrier type. // // Other packages may declare their own `format` values, much like the keys // used by `context.Context` (see // https://godoc.org/golang.org/x/net/context#WithValue). // // Example usage (with StartSpan): // // // carrier := opentracing.HTTPHeadersCarrier(httpReq.Header) // clientContext, err := tracer.Extract(opentracing.HTTPHeaders, carrier) // // // ... assuming the ultimate goal here is to resume the trace with a // // server-side Span: // var serverSpan opentracing.Span // if err == nil { // span = tracer.StartSpan( // rpcMethodName, ext.RPCServerOption(clientContext)) // } else { // span = tracer.StartSpan(rpcMethodName) // } // // // NOTE: All opentracing.Tracer implementations MUST support all // BuiltinFormats. 
// // Return values: // - A successful Extract returns a SpanContext instance and a nil error // - If there was simply no SpanContext to extract in `carrier`, Extract() // returns (nil, opentracing.ErrSpanContextNotFound) // - If `format` is unsupported or unrecognized, Extract() returns (nil, // opentracing.ErrUnsupportedFormat) // - If there are more fundamental problems with the `carrier` object, // Extract() may return opentracing.ErrInvalidCarrier, // opentracing.ErrSpanContextCorrupted, or implementation-specific // errors. // // See Tracer.Inject(). Extract(format interface{}, carrier interface{}) (SpanContext, error) } ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:4:2","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"设置Tracer 实现了Tracer接口的对象,主要用来记录Spans并发布到某个位置.应用程序如何处理Tracer对象取决于开发人员:可以直接在整个应用程序中使用它,或将其存储在GlobalTracer中. 不同的Tracer实现在初始化时接收参数的方式和接收的参数有所不同,如下: 应用程序的追踪组件名称. 分布式追踪系统的Endpoint. 分布式追踪系统的安全连接. 采样策略. 一旦Tracer对象实例被创建出来,就可以用来手工创建Span,或传递该对象到框架或库中. 为了不强制用户传递Tracer对象,提供了一个全局的GlobalTracer实例来存储Tracer对象,在任何地方都可以通过该全局实例来获取Tracer对象. type registeredTracer struct { tracer Tracer isRegistered bool } var ( globalTracer = registeredTracer{NoopTracer{}, false} ) // SetGlobalTracer sets the [singleton] opentracing.Tracer returned by // GlobalTracer(). Those who use GlobalTracer (rather than directly manage an // opentracing.Tracer instance) should call SetGlobalTracer as early as // possible in main(), prior to calling the `StartSpan` global func below. // Prior to calling `SetGlobalTracer`, any Spans started via the `StartSpan` // (etc) globals are noops. func SetGlobalTracer(tracer Tracer) { globalTracer = registeredTracer{tracer, true} } // GlobalTracer returns the global singleton `Tracer` implementation. // Before `SetGlobalTracer()` is called, the `GlobalTracer()` is a noop // implementation that drops all data handed to it. func GlobalTracer() Tracer { return globalTracer.tracer } ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:4:3","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"开启一个新的Trace 当创建一个新的Span且该Span没有关联到一个父Span时,一个新的trace就开启了.当创建一个新的Span时,需要为其定义一个operation name,主要用来帮助确定Span与代码的关联关系. Span之间的关联关系目前支持ChildOf和FollowsFrom: ChildOf,表示两个Span之间存在父子关系.子Span是在父Span内执行的一个子流程. FollowsFrom,表示两个Span之间是独立的,父Span不依赖新的Span的执行结果,主要用于pipiline. // ChildOfRef refers to a parent Span that caused *and* somehow depends // upon the new child Span. Often (but not always), the parent Span cannot // finish until the child Span does. // // An timing diagram for a ChildOfRef that's blocked on the new Span: // // [-Parent Span---------] // [-Child Span----] // // See http://opentracing.io/spec/ // // See opentracing.ChildOf() ChildOfRef SpanReferenceType = iota // FollowsFromRef refers to a parent Span that does not depend in any way // on the result of the new child Span. For instance, one might use // FollowsFromRefs to describe pipeline stages separated by queues, // or a fire-and-forget cache insert at the tail end of a web request. // // A FollowsFromRef Span is part of the same logical trace as the new Span: // i.e., the new Span is somehow caused by the work of its FollowsFromRef. // // All of the following could be valid timing diagrams for children that // \"FollowFrom\" a parent. 
// // [-Parent Span-] [-Child Span-] // // // [-Parent Span--] // [-Child Span-] // // // [-Parent Span-] // [-Child Span-] // // See http://opentracing.io/spec/ // // See opentracing.FollowsFrom() FollowsFromRef ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:4:4","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"传播追踪信息 为了在分布式系统中跨进程边界进行追踪,服务需要具备继续追踪每个被客户端注入追踪信息的请求.OpenTracing通过提供了Inject和Extract方法来实现此目标,将Span的上下文编码为载体.Inject方法可以将SpanContext传递到carrier中.举例,传递追踪信息到客户端请求中,这样下游服务就能继续进行跟踪了.Extract方法作用是相反的,从carrier中提取出SpanContext. 跨进程边界追踪跨进程追踪 \" 跨进程边界追踪 ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:4:5","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"Inject和Extract 开发人员在添加跨进程边界的追踪代码时必须懂得OpenTracing规范中定义的Tracer.Inject和Tracer.Extract的能力.它们在概念上很强大,允许开发人员编写正确和通用的跨进程传播代码,而不用绑定到某种特定的OpenTracing实现上. 无论特定的OpenTracing语言或具体的实现如何,下面会简要介绍Inject和Extract的设计以及正确使用. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:5:0","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"用于追踪传播的全景图 对于分布式追踪系统来说最困难的部分是分布式.任何追踪系统都需要一种了解许多不同进程中活动之间的因果关系的方式,不论这些进程是通过RPC框架、订阅/发布系统、通用消息队列、HTTP调用、UDP或其它方式连接的. 有些分布式追踪系统(2003年的Project5,或2006年的WAP5或2014年的The Mystery Machine)可以推断出跨进程边界的因果关系. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:5:1","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"OpenTracing传播方案的要求 为了使Inject和Extract方案有效,必须满足以下所有条件: 使用OpenTracing在跨进程传播时必须不能依赖特定分布式追踪系统的代码. 实现OpenTracing规范的系统必须不能为每种已知的进程间通信机制做特殊处理,否则会有太多的工作,甚至定义不明确. 传播机制为了优化可扩展. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:5:2","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"基本元素:Inject,Extract和Carriers trace中的任何SpanContext都可以注入到OpenTracing称为Carriers之中,Carriers可以是接口或者结构,可用于进程间通信(IPC).把trace的状态从一个进程传递到另一个进程.OpenTracing规范包含两种Carriesrs格式,但也可以自定义格式. 类似的,给定一个被注入了trace的Carriers,可以被提取出来从而生成一个SpanContext实例,该实例在语义上与被注入到Carriers中的保持一致. Inject代码 carrier := make(opentracing.TextMapCarrier) err := tracer.Inject(span.Context(), opentracing.TextMap, carrier) Extract代码 carrier := make(opentracing.TextMapCarrier) for k, v := range md { carrier.Set(k, v[0]) } spanctx, err := tracer.Extract(opentracing.TextMap, carrier) span := tracer.StartSpan(info.FullMethod, opentracing.ChildOf(spanctx)) ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:5:3","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["Microservice"],"content":"Inject/Extract格式 支持OpenTracing规范的所有追踪系统都必须支持两个格式:text map格式和binary格式. text map格式是一个string-\u003estring的映射. binary格式是不透明的字节数组(并且可能更紧凑和高效). OpenTracing规范并没有规定怎么去存储这些Carriers,但前提是要找到一种方法对传播的SpanContext的trace状态进行编码(例如,在Dapper中定义了trace_id,span_id,还有采样状态掩码)以及任何key:value的Baggage Items. 不能指望不同的分布式追踪系统(实现了OpenTracing规范的)以兼容的方式注入和提取SpanContext,虽然OpenTracing对于跨整个分布式系统的跟踪的具体实现是不可知的,但对于传播双方的进程都使用相同的实现. 一个端到端的传播例子 客户端进程拥有一个SpanContext实例,准备发起一个基于HTTP协议的RPC请求. 客户端调用Tracer.Inject(...),传递当前的SpanContext实例,采用text map格式,把其作为参数. Inject把text map注入到Carrier中,客户端程序把数据编码写入HTTP协议中(一般是放入headers中). 发起HTTP请求,数据跨进程边界传输. 
在服务端,应用程序从HTTP协议中提取text map数据,并初始化为一个Carrier. 服务端程序调用Tracer.Extract(...),传入text map格式的名称和上面生成的Carrier. 在没有数据损坏或其它错误的情况下,服务端获取了一个SpanContext实例,和客户端的是同一个. ","date":"2020-12-10","objectID":"/2020/12/10/opentracing-distributed/:5:4","tags":["OpenTracing"],"title":"OpenTracing分布式链路追踪","uri":"/2020/12/10/opentracing-distributed/"},{"categories":["C++"],"content":"预定义变量 PROJECT_SOURCE_DIR 工程的根目录 PROJECT_BINARY_DIR 运行cmake命令的目录,通常是${PROJECT_SOURCE_DIR}/build CMAKE_INCLUDE_PATH 环境变量,非CMake变量 CMAKE_LIBRARY_PATH 环境变量 CMAKE_CURRENCT_SOURCE_DIR 当前处理的CMakeLists.txt所在的路径 CMAKE_CURRENT_BINARY_DIR target编译目录 使用ADD_SUBDIRECTORY(src bin)可以更改此变量的值 使用SET(EXECUTABLE_OUTPUT_PATH \u003c新路径\u003e)并不会对此变量有影响,只改变最终目标文件的存储路径 CMAKE_CURRENT_LIST_FILE 输出调用这个变量的CMakeLists.txt的完整路径 CMAKE_CURRENT_LIST_LINE 输出这个变量所在的行 CMAKE_MODULE_PATH 定义自己的cmake模块所在的路径 EXECUTABLE_OUTPUT_PATH 重新定义目标二进制可执行文件的存储路径 LIBRARY_OUTPUT_PATH 重新定义链接库的存储路径 PROJECT_NAME 返回通过PROJECT指令定义的项目名称 CMAKE_ALLOW_LOOSE_LOOP_CONSTRUCTS 用来控制 ","date":"2020-11-28","objectID":"/2020/11/28/cmake-description/:1:0","tags":["cmake"],"title":"CMake语法说明","uri":"/2020/11/28/cmake-description/"},{"categories":["C++"],"content":"系统信息 CMAKE_MAJOR_VERSION cmake的主版本号,如2.8.6中的2 CMAKE_MINOR_VERSION cmake的次版本号,如2.8.6中的8 CMAKE_PATCH_VERSION cmake的补丁等级,如2.8.6中的6 CMAKE_SYSTEM 系统名称,如Linux-2.6.22 CMAKE_SYSTEM_NAME 不包含版本的系统名,如Linux CMAKE_SYSTEM_VERSION 系统版本,如2.6.22 CMAKE_SYSTEM_PROCESSOR 处理器名称,如i386 UNIX 在所有的类UNIX平台为TRUE,包括OS x和cygwin WIN32 在所有的win32平台为TRUE,包括cygwin ","date":"2020-11-28","objectID":"/2020/11/28/cmake-description/:2:0","tags":["cmake"],"title":"CMake语法说明","uri":"/2020/11/28/cmake-description/"},{"categories":["C++"],"content":"常用命令 PROJECT 指定工程名称,PROJECT(projectname) SET 定义变量,SET(VAR [VALUE]),可以定义多个value,空格分隔 MESSAGE 向终端输出用户定义的信息或变量的值,MESSAGE([SEND_ERROR|STAUTS|FATAL_ERROR] “display”) SEND_ERROR 产生错误,生成过程被跳过 STATUS 输出前缀为–的信息 FATAL_ERROR 立即终止所有cmake过程 ADD_EXECUTABLE 生成可执行文件,ADD_EXECUTABLE(bin_file_name SRC_LIST) ADD_LIBRARY 生成动态库或静态库,ADD_LIBRARY(libname [SHARED|STATIC|MODULE] [EXCLUED_FROM_ALL] SRC_LIST) SHARED 动态库 STATIC 静态库 MODULE 在使用dyld的系统有效,否则等同于SHARED EXCLUED_FROM_ALL 表示该库不会被默认构建 SET_TARGET_PROPERTIES 设置输出的名称,设置动态库的版本和API的版本 CMAKE_MINIMUN_REQUIRED 声明CMake的版本要求 ADD_SUBDIRECTORY 添加子目录,ADD_SUBDIRECTORY(dir [binary_dir][EXCLUDE_FROM_ALL]) binary_dir 指定中间二进制和目标二进制文件的存储位置 EXCLUDE_FROM_ALL 将这个目录中编译过程中排除 INCLUDE_DIRECTORIES 向工程添加多个特定的头文件搜索路径,路径之间用空格分隔 LINK_DIRECTORIES 添加非标准的共享库搜索路径 TARGET_LINK_LIBRARIES 为target添加需要链接的共享库 ADD_DEFINITIONS 向C/C++编译器添加-D定义,参数之间用空格分隔 ADD_DEPENDENCIES 定义target依赖的其它target,确保target在构建之前,其依赖的target已构建完毕 AUX_SOURCE_DERICTORY 发现一个目录下的所有源代码文件并将列表存储在一个变量中 EXEC_PROGRAM 用于在指定目录运行某个程序(默认为当前CMakeLists.txt目录) INCLUDE 用来载入CMakeLists.txt或预定义的cmake模块 FIND_FILE 查找文件,FIND_FILE( name path1 path2 …),VAR表示找到的文件全路径,包括文件名 FIND_LIBRARY 查找库 FIND_PATH 查找路径 FIND_FILE IF 语法 IF (expression) 判断条件是否为真 IF (not exp) 与上面相反 IF (var1 and var2) 判断2个条件是否都为真 IF (var1 or var2) 判断2个条件是否至少有1个为真 IF (COMMAND cmd) 判断cmd是否为命令并可调用 IF (EXISTS dir) 判断dir目录是否存在 IF (EXISTS file) 判断file文件是否存在 IF (file1 IS_NEWER_THAN file2) 当file1比file2新,或file1/file2中有一个不存在时为真,使用全路径 IF (IS_DIRECTORY dir) 当dir时路径时为真 IF (DEFINED var) 若var被定义,为真 IF (var MATCHES regex) 当变量var匹配正则表达式regex时,为真 IF (\"hello\" MATCHES \"ell\") MESSAGE(\"true\") ENDIF (\"hello\" MATCHES \"ell\") WHILE FOREACH 列表格式 FOREACH(loop_var arg1 arg2 ...) COMMAND1(ARGS ...) COMMAND2(ARGS ...) ENDFOREACH(loop_var) AUX_SOURCE_DERICTORY(. SRC_LIST) FOREACH(F ${SRC_LIST}) MESSAGE(${F}) ENDFOREACH(F) 范围格式 FOREACH(loop_var RANGE total) COMMAND1(ARGS ...) 
COMMAND2(ARGS ...) ENDFOREACH(loop_var) FOREACH(VAR RANGE 10) MESSAGE(${VAR}) ENDFOREACH(VAR) 范围和步进格式 FOREACH(loop_var RANGE start stop [step]) COMMAND1(ARGS ...) COMMAND2(ARGS ...) ENDFOREACH(loop_var) FOREACH(A RANGE 5 15 3) MESSAGE(${A}) ENDFOREACH(A) ","date":"2020-11-28","objectID":"/2020/11/28/cmake-description/:3:0","tags":["cmake"],"title":"CMake语法说明","uri":"/2020/11/28/cmake-description/"},{"categories":["C++"],"content":"开关选项 BUILD_SHARED_LIBS 控制默认的库编译方式.若未设置,使用ADD_LIBRARY时又没有指定库类型,默认编译生成的都是静态库 CMAKE_C_FLAGS 设置C编译选项 CMAKE_CXX_FLAGS 设置C++编译选项 ","date":"2020-11-28","objectID":"/2020/11/28/cmake-description/:4:0","tags":["cmake"],"title":"CMake语法说明","uri":"/2020/11/28/cmake-description/"},{"categories":["C++"],"content":"添加子文件夹 # 设置查找目录 set(plugins_dir ${CMAKE_CURRENT_LIST_DIR}/plugins/) # 运行脚本查找对应的子目录,并存放到变量dirs中 execute_process( COMMAND sh ${CMAKE_CURRENT_LIST_DIR}/findplugin.sh ${plugins_dir} OUTPUT_VARIABLE dirs) # 把字符串变量转换为列表RPLACE_LIST string(REPLACE \"\\n\" \";\" RPLACE_LIST ${dirs}) # 循环,把每个plugin加入到编译中 foreach (miapi ${RPLACE_LIST}) ADD_SUBDIRECTORY(${plugins_dr}${miapi} ${CMAKE_BINARY_DIR}/${miapi}) endforeach(miapi) ","date":"2020-11-28","objectID":"/2020/11/28/cmake-description/:5:0","tags":["cmake"],"title":"CMake语法说明","uri":"/2020/11/28/cmake-description/"},{"categories":["C++"],"content":"add_custom_command 增加定制化的构建规则到构建系统中,有两种使用方式 ","date":"2020-11-28","objectID":"/2020/11/28/cmake-description/:6:0","tags":["cmake"],"title":"CMake语法说明","uri":"/2020/11/28/cmake-description/"},{"categories":["C++"],"content":"增加一个定制化命令来产生一个输出 语法格式 ADD_CUSTOM_COMMAND(OUTPUT output1 [output2 ...] COMMAND command1 [ARGS] [arg1 ...] [COMMAND command2 [ARGS] [arg2 ...]...] [MAIN_DEPENDENCY depend] [DEPENDS [depends ...]] [IMPLICIT_DEPENDS \u003clang1\u003e depend1 ...] [WORKING_DIRECTORY dir] [COMMENT comment] [VERBATIM] [APPEND]) # 不要同时在多个相互独立的目标中执行上述命令产生相同的文件,主要是为了防止产生冲突. # 如果有多条命令,会按顺序执行. # ARGS是为了向后兼容,使用过程中可以忽略 # MAIN_DEPENDENCY完全是可选的,是针对VS给出的一个建议 # 例子,copy复制文件. ADD_CUSTOM_COMMAND(OUTPUT ${dst_sql_xml} COMMAND ${CMAKE_COMMAND} -E copy ${src_sql_xml} ${dst_sql_xml} COMMENT \"copy ${src_sql_xml} \\nto ${dst_sql_xml}\" DEPENDS ${src_sql_xml}) ADD_CUSTOM_TARGET(syncxml ALL DEPENDS ${dst_sql_xml}) # 例子,copy_directory复制文件夹. ADD_CUSTOM_COMMAND(OUTPUT ${dst_sql_xml} COMMAND ${CMAKE_COMMAND} -E copy_directory ${src_sql_xml} ${dst_sql_xml} COMMENT \"copy_directory ${src_sql_xml} \\nto ${dst_sql_xml}\" DEPENDS ${src_sql_xml}) ADD_CUSTOM_TARGET(syncxml ALL DEPENDS ${dst_sql_xml}) ","date":"2020-11-28","objectID":"/2020/11/28/cmake-description/:6:1","tags":["cmake"],"title":"CMake语法说明","uri":"/2020/11/28/cmake-description/"},{"categories":["C++"],"content":"标记在什么时候执行命令:编译前、编译后、链接前 语法格式 ADD_CUSTOM_COMMAND(TARGET target PRE_BUILD | PRE_LINK | POST_BUILD COMMAND command1 [ARGS] [args1 ...] [COMMAND command2 [ARGS] [args2 ...]...] 
[WORKINGDIRECTORY dir] [COMMENT comment] [VERBATIM]) # PRE_BUILD 命令将会在其它依赖项执行前执行,只被VS7及之后的版本支持,其它会将其等同于PRE_LINK # PRE_LINK 命令将会在其它依赖项执行完后执行 # POST_BUILD 命令将会在目标构建完后执行 # 如果指定了WORKINGDIRECTORY,命令将会在指定目录运行 # 如果指定了COMMENT,命令执行前会把COMMENT的内容当做信息输出 # 如果指定了APPEND,COMMANDS和DEPENDS的值会追加到第一个指定的命令中 # 如果指定了APPEND,COMMENT、WORKINGDIRECTORY和MAIN_DEPENDENCY将会被忽略 # 如果指定了VERBATIM,所传递的命令参数会被适当地转义 # 如果指定命令的输出不是创建一个存储在磁盘上的文件,需使用SET_SOURCE_FILE_PROPERTIES把它标记为SYMBOLIC # 如果COMMAND指定了一个可执行的目标(由ADD_EXECUTABLE创建),则会 # DEPENDS指定了该命令依赖的文件 ","date":"2020-11-28","objectID":"/2020/11/28/cmake-description/:6:2","tags":["cmake"],"title":"CMake语法说明","uri":"/2020/11/28/cmake-description/"},{"categories":["C++"],"content":"add_custom_target 使用该命令增加一个没有输出的目标,使得它总是被构建 ADD_CUSTOM_TARGET(Name [ALL] COMMAND command1 [ARGS] [args1 ...] [DEPENDS depend depend ...] [WORKINGDIRECTORY dir] [COMMENT comment] [VERBATIM] [SOURCES src1 [src2 ...]]) # 该目标没有输出,总是被认为过期的 # 如果指定了ALL,表明目标会被添加到默认的构建目标,使得它每次都会被运行 # 具体可以参见上面的例子 ","date":"2020-11-28","objectID":"/2020/11/28/cmake-description/:6:3","tags":["cmake"],"title":"CMake语法说明","uri":"/2020/11/28/cmake-description/"},{"categories":["C++"],"content":"FAQ 设置条件编译 option(DEBUG_mode \"ON for debug or OFF for release\" ON) IF(DEBUG_mode) add_definitions(-DDEBUG) ENDIF() 根据OS指定编译选项 IF(WIN32) IF(APPLE) IF(UNIX) ","date":"2020-11-28","objectID":"/2020/11/28/cmake-description/:7:0","tags":["cmake"],"title":"CMake语法说明","uri":"/2020/11/28/cmake-description/"},{"categories":["C++"],"content":"CFLAGS 表示用于C编译器的选项. 指定头文件的路径. INCLUDES := -I./ INCLUDES += -I/usr/include INCLUDES += -I/usr/local/include INCLUDES += -I../../../3rd/curl-7.65.0/include INCLUDES += -I../../../3rd/mimetic-0.9.8/include CFLAGS := -m64 -std=c++11 -g -Wall -O3 $(INCLUDES) ","date":"2020-11-28","objectID":"/2020/11/28/makefile-description/:1:0","tags":["makefile"],"title":"makefile语法说明","uri":"/2020/11/28/makefile-description/"},{"categories":["C++"],"content":"CXXFLAGS 表示用于C++编译器的选项,基本同CFLAGS ","date":"2020-11-28","objectID":"/2020/11/28/makefile-description/:2:0","tags":["makefile"],"title":"makefile语法说明","uri":"/2020/11/28/makefile-description/"},{"categories":["C++"],"content":"LDFLAGS 编译器会用到的一些优化参数,也可指定库文件的位置,告诉链接器从哪里寻找库文件. LDFLAGS := -L/usr/lib LDFLAGS += -L/usr/local/lib LDFLAGS += -L/usr/local/ssl/lib LDFLAGS += -L../../../3rd/curl-7.65.0/lib LDFLAGS += -L../../../3rd/mimetic-0.9.8/lib ","date":"2020-11-28","objectID":"/2020/11/28/makefile-description/:3:0","tags":["makefile"],"title":"makefile语法说明","uri":"/2020/11/28/makefile-description/"},{"categories":["C++"],"content":"LIBS 告诉链接器需要链接哪些文件. LIBS = -lmimetic -lcurl -lrt -static-libgcc -static-libstdc++ ","date":"2020-11-28","objectID":"/2020/11/28/makefile-description/:4:0","tags":["makefile"],"title":"makefile语法说明","uri":"/2020/11/28/makefile-description/"},{"categories":["C++"],"content":"include 语法: include ","date":"2020-11-28","objectID":"/2020/11/28/makefile-description/:5:0","tags":["makefile"],"title":"makefile语法说明","uri":"/2020/11/28/makefile-description/"},{"categories":["C++"],"content":"wildcard 隐晦规则 make看到一个.o文件,就会自动把对应的.c文件加在依赖关系中,并且cc -c xx.c也会被推导. 伪目标 .PHONY : clean clean : -rm xx ${objects} 表示clean是个伪目标,在rm前面加个小减号表示当某些文件出现问题时跳过,继续往下执行. 工作方式 读入所有的Makefile. 读入被include的其它Makefile. 初始化文件中的变量. 推导隐晦规则,并分析所有规则. 为所有的目标文件创建依赖关系链. 根据依赖关系,决定哪些目标要重新生成. 执行生成命令. 书写规则 Makefile中只应该有一个最终目标,其它目标都是被连带出来的. 规则语法,targets是目标,prerequisites表示目标所依赖的文件或目标,command表示生成目标文件所需要执行的命令. targets : prerequisites command 在规则中使用通配符,支持三种(* ? 
[…]),*表示任意长度的字符串,转义字符为'/' clean : rm -f *.o // 上面表示删除任意已.o结尾的文件. // 需要注意用在变量中. objects = *.o // objects的值就是\"*.o\",并不会被展开,若想让objects的值是所有.o文件的集合,如下书写 objects := $(wildcard *.o) 文件搜寻 VPATH: Makefile中的特殊变量,指定文件搜寻目录. VPATH = src : ../headers // make会自动去src和../headers目录搜寻依赖文件. vpath: make的关键字,按照某种模式去搜寻目录,多个目录以:分隔. vpath \u003cpattern\u003e \u003cdirectories\u003e // 为符合pattern模式的文件指定搜索目录directories. vpath \u003cpattern\u003e // 清除pattern模式的搜寻目录. vpath // 清除所有已被设置好的搜寻目录. vpath *.h ../headers // 在../headers目录下搜索所有以.h结尾的文件. 自动变量 “$@\",表示目前规则中所有目标的集合,主要用于有多个目标的规则中. bigoutput littleoutput : text.g generate text.g -$(substr output,,$@) \u003e $@ // 上述规则等价于 bigoutput : text.g generate text.g -big \u003e bigoutput littleoutput : text.g generate text.g -little \u003e littleoutput // $@表示目标的集合,就像一个数组(2个元素bigoutput、littleoutput),$@依次取出目标并执行命令. “$\u003c\",表示所有的依赖目标集,是一个一个取出来的. “$%\",仅当目标是函数库文件时(.a或.lib),表示规则中的目标成员名. 如果目标\"foo.a(bar.o)\",那”$%“就是bar.o,\"$@“就是foo.a “$?\",所有比目标新的依赖目标的集合,以空格分隔. “$^\",所有的依赖目标的集合,以空格分隔,会去除重复的. “$*\",表示目标模式中”%“及其之前的部分.如果目标是\"dir/a.foo.b”,并且目标的模式是\"a.%.b”,那么值就是\"dir/a.foo” 静态模式 更容易定义多目标的规则. // 语法. \u003ctargets ...\u003e : \u003ctarget-pattern\u003e : \u003cprereq-patterns ...\u003e \u003ccommand\u003e // targets: 定义了一系列的目标文件,可以有通配符.是目标的一个集合. // target-pattern: 指明了targets的模式,也就是目标集的模式. // prereq-patterns: 目标的依赖模式.对target-pattern形成的模式再进行一次依赖目标的定义. objects = foo.o bar.o all : $(objects) $(objects) : %.o : %.c $(CC) -c $(CFLAGS) $\u003c -o $@ // 目标从objects中获取,%.o表明所有以\".o\"结尾的目标,也就是\"foo.o和bar.o\",这也是变量$ojbects集合的模式 // 而依赖模式\"%.c\"则取模式\"%.o\"的\"%\",即\"foo和bar\",并为其加上.c后缀,则依赖的目标是\"foo.c和bar.c\" // $\u003c为自动化变量,表示所有的依赖目标集,即\"foo.c和bar.c\" // $@为自动化变量,表示目标集. // 上面规则等价于: foo.o : foo.c $(CC) -c $(CFLAGS) foo.c -o foo.o bar.o : bar.c $(CC) -c $(CFLAGS) bar.c -o bar.o // 例子: files = foo.elc bar.o lose.o $(filter %.o,$(files)) : %.o : %.c $(CC) -c $(CFLAGS) $\u003c -o $@ // filter为过滤函数. 变量 操作符”=\",右侧变量的值可以不用提前定义. 操作符”:=\",右侧变量的值必须在这之前定义. 操作符”?=\",如果变量之前没有被定义过,那变量的值就是右侧的值;否则什么也不做. 变量值的替换,$(var:a=b),把变量var中所有以\"a\"子串结尾的\"a\"替换成\"b\"子串,结尾指空格或结束符 操作符”+=\",给变量追加值. 条件表达式 ifeq ifneq ifdef ifndef 函数 $(subst ,,),把字符串text中的\"from\"替换为\"to\" $(patsubst ,,),查找字符串text中是否有符合模式\"pattern\",如果匹配则以\"replacement\"替换. “replacement\"中也可以包含”%\",指\"pattern\"中那个\"%“所代表的子串. $(stip ),去掉字符串string中开头和结尾的空字符. $(findstring ,),在字符串\"in\"中查找\"find”,如果找到返回\"find\",否则返回空字符串. $(filter \u003cpattern…\u003e,),以\"pattern\"模式过滤\"text\"字符串中的单词,保留符合\"pattern\"的单词.可以有多个模式. sources := foo.c bar.c baz.s ugh.h foo : $(sources) cc $(filter %.c %.s,$(sources)) -o foo // filter返回的是foo.c bar.c baz.s,有2个模式%.c和%.s $(filter-out \u003cpattern…\u003e,),反过滤函数. $(sort ),排序函数,给\"list\"中的单词升序的方式排序.会去掉相同的单词. $(word ,),取字符串\"text\"中的第n个单词(从1开始). $(wordlist ,,),取单词串函数,从字符串\"text\"中取从到的单词串,s和e是一个数字. $(words ),单词个数统计函数. $(firstword ),首单词函数. $(dir \u003cnames…\u003e),取目录函数,从文件名序列中取出目录部分. $(notdir \u003cnames…\u003e),取文件函数,从文件名序列中取出非目录部分. $(suffix \u003cnames…\u003e),取后缀函数,从文件名序列中取出各个文件名的后缀. $(basename \u003cnames…\u003e),取前缀函数. $(addsuffix ,\u003cnames…\u003e),加后缀函数. $(addprefix ,\u003cnames…\u003e),加前缀函数. $(join ,),连接函数,把\"list2\"中的单词对应地加到\"list1\"的单词后面. $(join aaa bbb,111 222 333)-\u003e\"aaa111 bbb222 333\" $(foreach ,,),把参数list中的单词逐一取出放到参数var中,然后再执行所包含的表达式. ``` go names := a b c d files := $(foreach n,$(names),$(n).o) // files的值就是a.o b.o c.o d.o $(if ,,),else是可选的. $(call ,,,…),用参数依次取代\"expression\"中的变量. $(origin ),变量\"variable\"是从哪来的. undefined: 表示该变量未定义. default: 表示是默认定义的. environment: 表示是环境变量. 
file: 表示该变量定义在Makefile中 command line: 表示是被命令行定义的 override: 表示是被override指示符重新定义的 automatic: 表示是自动化变量 隐含规则 编译c程序的隐含规则,.o会自动推导为.c,命令是$(CC) -c $(CPPFLAGS) $(CFLAGS) 编译C++的隐含规则,.o会自动推导为.cc,命令是$(CXX) -c $(CPPFLAGS) $(CFLAGS) 链接Object文件的隐含规则,$(CC) $(LDFLAGS) .o $(LOADLIBES) $(LDLIBS) ","date":"2020-11-28","objectID":"/2020/11/28/makefile-description/:6:0","tags":["makefile"],"title":"makefile语法说明","uri":"/2020/11/28/makefile-description/"},{"categories":["C++"],"content":"变量 CC,C语言编译程序,默认命令是\"cc\" CXX,C++语言编译程序,默认命令是\"g++\" RM,删除文件命令,默认命令是\"rm -f\" CFLAGS,C语言编译器参数 CXXFLAGS,C++语言编译器参数 CPPFLAGS,C预处理器参数 LDFLAGS,链接器参数,如\"ld\" 模式规则 使用模式规则定义一个隐含规则. ","date":"2020-11-28","objectID":"/2020/11/28/makefile-description/:7:0","tags":["makefile"],"title":"makefile语法说明","uri":"/2020/11/28/makefile-description/"},{"categories":["C++"],"content":"介绍 模式规则中,至少在规则的目标定义中要包含\"%\",否则就是一般的规则.目标中的\"%“定义表示对文件名的匹配. 例子: %.o : %.c; ","date":"2020-11-28","objectID":"/2020/11/28/makefile-description/:8:0","tags":["makefile"],"title":"makefile语法说明","uri":"/2020/11/28/makefile-description/"},{"categories":["C++"],"content":"示例 %.o : %.c $(CC) -c $(CFLAGS) $(CPPFLAGS) $\u003c -o $@ // $\u003c 表示依赖目标 // $@ 表示目标 一个完整的例子: SOURCE := $(wildcard *.cpp) OBJS := $(patsubst %.cpp,%.o,$(SOURCE)) EXENAME := mail_agent TARGET := $(EXENAME) CC := g++ INCLUDES := -I./ INCLUDES += -I/usr/include INCLUDES += -I/usr/local/include INCLUDES += -I../../../3rd/curl-7.65.0/include INCLUDES += -I../../../3rd/mimetic-0.9.8/include LDFLAGS := -L/usr/lib LDFLAGS += -L/usr/local/lib LDFLAGS += -L/usr/local/ssl/lib LDFLAGS += -L../../../3rd/curl-7.65.0/lib LDFLAGS += -L../../../3rd/mimetic-0.9.8/lib LIBS = -lmimetic -lcurl -lrt -static-libgcc -static-libstdc++ CFLAGS := -m64 -std=c++11 -g -Wall -O3 $(INCLUDES) CXXFLAGS := $(CFLAGS) $(TARGET) : $(OBJS) $(CC) $(CXXFLAGS) -o $@ $(OBJS) $(LDFLAGS) $(LIBS) .PHONY : clean clean: -rm $(OBJS) $(TARGET) ","date":"2020-11-28","objectID":"/2020/11/28/makefile-description/:9:0","tags":["makefile"],"title":"makefile语法说明","uri":"/2020/11/28/makefile-description/"},{"categories":["MySQL"],"content":"数据准备 /*mysql版本*/Serverversion:5.7.27-logMySQLCommunityServer(GPL)/*创建表t*/CREATETABLE`t`(`id`int(11)NOTNULLAUTO_INCREMENT,`city`varchar(16)NOTNULL,`name`varchar(16)NOTNULL,`age`int(11)NOTNULL,`addr`varchar(128)DEFAULTNULL,PRIMARYKEY(`id`),KEY`city`(`city`))ENGINE=InnoDB;/*数据分布*/mysql\u003eselectcity,count(*)fromtgroupbycity;+-----------+----------+ |city|count(*)|+-----------+----------+ |beijing|5590||gongan|5398||guangzhou|5557||hangzhou|5505||ouchi|5402||shanghai|5375||shenzhen|5591||wuhan|5582|+-----------+----------+ 8rowsinset(0.02sec) ","date":"2020-11-24","objectID":"/2020/11/24/mysql-orderby/:1:0","tags":["mysql"],"title":"MySQL的orderby分析","uri":"/2020/11/24/mysql-orderby/"},{"categories":["MySQL"],"content":"带条件orderby过程分析 主要针对sql语句select city, name, age from t where city='hangzhou' order by name limit 1000来分析. 
","date":"2020-11-24","objectID":"/2020/11/24/mysql-orderby/:2:0","tags":["mysql"],"title":"MySQL的orderby分析","uri":"/2020/11/24/mysql-orderby/"},{"categories":["MySQL"],"content":"全字段排序 mysql\u003eexplainselectcity,name,agefromtwherecity='hangzhou'orderbynamelimit1000;+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+---------------------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+---------------------------------------+ |1|SIMPLE|t|NULL|ref|city|city|50|const|5505|100.00|Usingindexcondition;Usingfilesort|+----+-------------+-------+------------+------+---------------+------+---------+-------+------+----------+---------------------------------------+ 1rowinset,1warning(0.00sec) 通过执行计划可以看出,rows扫描行数为5505,符合预期。Using index condition(索引下推,Index Condition Pushdown)使用了索引city来遍历。Using filesort使用了排序. 开启optimizer_trace,来跟踪执行结果 /*打开optimizer_trace,只对本线程有效*/setoptimizer_trace='enabled=on';/*@a保存Innodb_rows_read的初始值*/selectvariable_valueinto@afromperformance_schema.session_statuswherevariable_name='Innodb_rows_read';/*执行sql语句*/selectcity,name,agefromtwherecity='hangzhou'orderbynamelimit1000;/*查看optimizer_trace输出*/select*frominformation_schema.OPTIMIZER_TRACE\\G/*截图部分结果*/\"filesort_execution\":[],\"filesort_summary\":{\"rows\":5505,\"examined_rows\":5505,\"number_of_tmp_files\":9,\"sort_buffer_size\":262000,\"sort_mode\":\"\u003csort_key, packed_additional_fields\u003e\"}/*@b保存Innodb_rows_read的当前值*/selectvariable_valueinto@bfromperformance_schema.session_statuswherevariable_name='Innodb_rows_read';/*计算Innodb_rows_read的差值*/select@b-@a;mysql\u003eselect@b-@a;+-------+ |@b-@a|+-------+ |5506|+-------+ 1rowinset(0.00sec) optimizer_trace结果分析: optimizer_trace结果中的number_of_tmp_files可以看到使用了临时文件来排序.说明排序过程中内存放不下所有待排序数据,需要使用外部排序(一般采用归并排序). Mysql把待排序的数据分成了9份,每一份单独排序存放在临时文件中,最后把这9个有序文件再合并成一个有序的大文件. 若number_of_tmp_files为0,表示排序是在内存中完成的.Mysql通过参数sort_buffer_size来定义排序缓存的大小,若sort_buffer_size越小,需要分成的份数就越多. optimizer_trace结果中的examined_rows表示参与排序的行数. optimizer_trace结果中的sort_mode里的packed_additional_fields表示对字符串进行了紧凑处理,针对varchar字段是按照实际长度来分配空间的. packed_additional_fields也表明排序采用的是全字段排序,即排序时包含所有查询字段(先把city、name和age字段查询出来放入到临时文件中,然后再根据name排序).Mysql通过参数max_length_for_sort_data来控制排序字段的长度,默认是1024. 整个排序过程: 根据参数max_length_for_sort_data来判断,放入sort_buffer的字段(字段为city、name和age). 从索引city中找到满足条件的主键id. 根据主键id到主键索引中取出整行数据,取city、name、age三个字段的值,存入sort_buffer中. 从索引city中获取下一个满足条件的主键id. 循环3、4直到city不满足条件为止. 对sort_buffer中的数据按照字段name做排序. 排序完成sort_buffer内存空间就会被释放. Innodb_rows_read的差值为什么是5506而不是5505? 因为在查询OPTIMIZER_TRACE表时,需要用到临时表,而临时表的存储引擎默认是InnoDB(通过参数internal_tmp_disk_storage_engine来控制的),再把数据从临时表取出来时,会让Innodb_rows_read的值加1 ","date":"2020-11-24","objectID":"/2020/11/24/mysql-orderby/:2:1","tags":["mysql"],"title":"MySQL的orderby分析","uri":"/2020/11/24/mysql-orderby/"},{"categories":["MySQL"],"content":"rowid排序 若查询的字段很多,总长度超过了max_length_for_sort_data所规定的长度,排序的过程是如何的? 修改参数set max_length_for_sort_data=16;,查询的三个字段的总长度为为36,则可触发上面说的情况.然后再来看看排序的过程. 
/*@a保存Innodb_rows_read的初始值*/selectvariable_valueinto@afromperformance_schema.session_statuswherevariable_name='Innodb_rows_read';/*执行sql语句*/selectcity,name,agefromtwherecity='hangzhou'orderbynamelimit1000;/*查看optimizer_trace输出*/select*frominformation_schema.OPTIMIZER_TRACE\\G/*截图部分结果*/\"filesort_execution\":[],\"filesort_summary\":{\"rows\":5505,\"examined_rows\":5505,\"number_of_tmp_files\":9,\"sort_buffer_size\":261760,\"sort_mode\":\"\u003csort_key, rowid\u003e\"}/*@b保存Innodb_rows_read的当前值*/selectvariable_valueinto@bfromperformance_schema.session_statuswherevariable_name='Innodb_rows_read';/*计算Innodb_rows_read的差值*/select@b-@a;mysql\u003eselect@b-@a;+-------+ |@b-@a|+-------+ |6506|+-------+ 1rowinset(0.00sec) optimizer_trace结果分析: 依然会采用外部排序,使用了9个临时文件来排序 sort_mode变更为rowid,表明排序时的列只有要排序的列(name字段)和主键id.根据name排序完后还要根据对应的主键id去获取字段的值. rowid排序比全字段排序会多了回表操作,必定会影响排序的性能. select @b-@a的结果为6506,比之前的多了1000,为什么? 全字段排序时回表是在引擎层内部自动完成的,server层并不感知,server层只是调用了5505次引擎的读接口获取city、name、age的值,然后在server层完成排序,所以是5505次读. rowid排序时server层会先调用5505次引擎的读接口获取name、id的值,然后在server层完成排序,然后取前1000条记录中的id调用引擎的读接口获取对应的city、name、age的值,所以是6505次读. 整个排序过程: 根据参数max_length_for_sort_data来判断,放入sort_buffer的字段(字段为name和主键id). 从索引city中找到满足条件的主键id. 根据主键id到主键索引中取出整行数据,取id和name字段的值,存入sort_buffer中. 从索引city中获取下一个满足条件的主键id. 循环3、4直到city不满足条件为止. 对sort_buffer中的数据按照字段name做排序. 再根据排序结果中的主键id去主键索引中获取city、name和age字段的值. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-orderby/:2:2","tags":["mysql"],"title":"MySQL的orderby分析","uri":"/2020/11/24/mysql-orderby/"},{"categories":["MySQL"],"content":"利用索引有序的特性 order by name需要排序是因为name字段是无序的,如果为有序的,order by又是怎样处理的? /*增加city+name的索引*/mysql\u003ealtertabletaddindexcity_name(city,name);QueryOK,0rowsaffected(0.34sec)Records:0Duplicates:0Warnings:0/*查看执行计划*/mysql\u003eexplainselectcity,name,agefromtwherecity='hangzhou'orderbynamelimit1000;+----+-------------+-------+------------+------+----------------+-----------+---------+-------+------+----------+-----------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+----------------+-----------+---------+-------+------+----------+-----------------------+ |1|SIMPLE|t|NULL|ref|city,city_name|city_name|50|const|5505|100.00|Usingindexcondition|+----+-------------+-------+------------+------+----------------+-----------+---------+-------+------+----------+-----------------------+ 1rowinset,1warning(0.00sec) 可以看到执行计划Extra中只有Using index condition,使用索引city_name而不用再排序了.虽然不用再排序,但仍然需要回表操作来获取city、name和age的值. 如果使用了覆盖索引,该查询语句又会是怎样的列? 
/*增加city+name+age的索引,覆盖所有查询字段*/mysql\u003ealtertabletaddindexcity_name_age(city,name,age);QueryOK,0rowsaffected(0.32sec)Records:0Duplicates:0Warnings:0/*查看执行计划*/mysql\u003eexplainselectcity,name,agefromtwherecity='hangzhou'orderbynamelimit1000;+----+-------------+-------+------------+------+------------------------------+-----------+---------+-------+------+----------+-----------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+------------------------------+-----------+---------+-------+------+----------+-----------------------+ |1|SIMPLE|t|NULL|ref|city,city_name,city_name_age|city_name|50|const|5505|100.00|Usingindexcondition|+----+-------------+-------+------------+------+------------------------------+-----------+---------+-------+------+----------+-----------------------+ 1rowinset,1warning(0.00sec) 从执行计划中可以看出仍然是使用的索引city_name,为什么不使用索引city_name_age来避免回表操作列? 强制使用索引city_name_age来看看具体情况. mysql\u003eexplainselectcity,name,agefromtforceindex(city_name_age)wherecity='hangzhou'orderbynamelimit1000;+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+--------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+--------------------------+ |1|SIMPLE|t|NULL|ref|city_name_age|city_name_age|50|const|9940|100.00|Usingwhere;Usingindex|+----+-------------+-------+------------+------+---------------+---------------+---------+-------+------+----------+--------------------------+ 1rowinset,1warning(0.01sec) 从执行计划中可以看出使用覆盖索引(Extra中的Using index就是覆盖索引).但rows为什么是9940? 再来看看使用order by name, age的情况. mysql\u003eexplainselectcity,name,agefromtwherecity='hangzhou'orderbyname,agelimit1000;+----+-------------+-------+------------+------+------------------------------+---------------+---------+-------+------+----------+--------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+------------------------------+---------------+---------+-------+------+----------+--------------------------+ |1|SIMPLE|t|NULL|ref|city,city_name,city_name_age|city_name_age|50|const|5505|100.00|Usingwhere;Usingindex|+----+-------------+-------+------------+------+------------------------------+---------------+---------+-------+------+----------+--------------------------+ 1rowinset,1warning(0.00sec) 这时和预期是一样的,使用了索引city_name_age(覆盖索引Using index). 再来看看当把索引city_name删除后的情况. /*删除索引city_name*/mysql\u003ealtertabletdropindexcity_name;QueryOK,0rowsaffected(0.02sec)Records:0Duplicates:0Warnings:0mysql\u003eexplainselectcity,name,agefromtwherecity='hangzhou'orderbynamelimit1000;+----+-------------+-------+------------+------+--------------------+---------------+---------+-------+------+-----","date":"2020-11-24","objectID":"/2020/11/24/mysql-orderby/:2:3","tags":["mysql"],"title":"MySQL的orderby分析","uri":"/2020/11/24/mysql-orderby/"},{"categories":["MySQL"],"content":"不带条件的orderby分析 针对语句select * from t order by city limit 1000,会使用索引吗?如果会是哪个索引? 针对语句select * from t order by city limit 10000列? 
mysql\u003eexplainselect*fromtorderbycitylimit1000;+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------+ |1|SIMPLE|t|NULL|index|NULL|city|50|NULL|1000|100.00|NULL|+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-------+ 1rowinset,1warning(0.00sec)mysql\u003eexplainselect*fromtorderbycitylimit10000;+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+ |1|SIMPLE|t|NULL|ALL|NULL|NULL|NULL|NULL|44064|100.00|Usingfilesort|+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+----------------+ 1rowinset,1warning(0.00sec) 从执行计划中可以看出,语句1是有使用索引city且不用排序.而语句2是全表扫描且使用了排序(Extra的Using filesort). 当把limit从1000改成10000时查询为什么造成这种差异? 使用索引city是需要进行回表操作的,当limit数据量大时优化器认为回表操作代价太大,还不如直接全表扫描. 针对回表操作,从索引city中获取到的id不是有序的,回表会造成随机读,这也是优化器认为代价太大的原因(Mysql其实有提供MRR机制来优化这种情况),还不如直接全表扫描,使用顺序读. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-orderby/:3:0","tags":["mysql"],"title":"MySQL的orderby分析","uri":"/2020/11/24/mysql-orderby/"},{"categories":["MySQL"],"content":"总结 可以通过合理的创建索引来避免order by排序,提高查询性能. 当排序不可避免时,有两个系统参数max_length_for_sort_data和sort_buffer_size会对排序过程产生影响. 当排序不可避免时,尽量使用sort_buffer内存+全字段排序,这样性能最好.可以考虑优化上面两个参数. Mysql设计思想之一:当内存足够时,就要多利用内存,尽量减少磁盘访问. Mysql设计思想之一:尽量避免磁盘随机IO. Using filesort表示需要排序. Using where; Using index表示使用覆盖索引,需要的数据都在索引列中,不需要回表. Using index condition表示索引下推来过滤条件,但需要回表查询数据. Using where表示使用索引的情况下,需要回表查询数据. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-orderby/:4:0","tags":["mysql"],"title":"MySQL的orderby分析","uri":"/2020/11/24/mysql-orderby/"},{"categories":["MySQL"],"content":"全局锁 全局锁就是对整个数据库实例加锁.MySQL提供了命令Flush tables with read lock(FTWRL),可使整库处于只读状态,其它线程的数据更新语句(数据的增删改)、数据定义语句(DDL)和更新类事务的提交语句都会被阻塞. 全局锁的典型使用场景是做全库的逻辑备份.使用FTWRL可确保不会有其它线程对数据库做更新,然后再对整个库做备份,这样可以保证数据库的数据逻辑一致性.但让整库只读,是危险操作: 如果在主库上备份,那么在备份期间都不能执行更新,业务基本上停摆. 如果在从库上备份,那么在备份期间不能执行主库同步过来的binlog,会主从延迟. 在针对InnoDB引擎的表做全库备份时,可以采用可重复读隔离级别下来进行备份,不会对其它线程的操作造成堵塞,还可以保证数据的逻辑一致性.是基于一致性视图+MVCC来实现的. 可以使用官方工具mysqldump,使用参数-single-transaction时,会在可重复读隔离级别下启动一个事务,确保拿到一致性视图.但该方法只适用于所有的表都使用事务引擎的库. 使用命令set global readonly=true也可使整库处于只读状态,但这个操作更加的危险. 在有些系统中,readonly的值会被用来做其它逻辑,比如判断一个库是主库还是备库.修改该值影响面更广. 在异常处理机制上有差异.执行FTWRL后客户端异常断开,MySQL会自动释放这个全局锁,使数据库恢复到正常状态.而readonly修改后会一直有效,使库一直处于只读状态,风险更高. ps: 在从库上如果用户有超级权限,readonly是失效的. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:1:0","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"表级锁 MySQL里表级别的锁有两种,表锁和元数据锁(meta data lock,简称MDL锁) ","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:2:0","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"表锁 加锁语法为lock tables ... 
read/write,解锁语句为unlock tables -- session1 mysql\u003elocktablestread,t1write;QueryOK,0rowsaffected(0.00sec)mysql\u003eselect*fromtlimit1;+----+--------+------+-----+------+ |id|city|name|age|addr|+----+--------+------+-----+------+ |1|gongan|??|80|test|+----+--------+------+-----+------+ 1rowinset(0.00sec)mysql\u003eselect*fromt1limit1;+----+------+------+ |id|a|b|+----+------+------+ |1|1|1|+----+------+------+ 1rowinset(0.00sec)mysql\u003einsertintotvalues(null,'gongan','guanguan',1,'test');ERROR1099(HY000):Table't'waslockedwithaREADlockandcan't be updated mysql\u003e insert into t1 values(101, 101, 101); Query OK, 1 row affected (0.00 sec) mysql\u003e select * from t2 limit 1; ERROR 1100 (HY000): Table 't2' was not locked with LOCK TABLES mysql\u003e insert into t2 values(1001, 1001, 1001); ERROR 1100 (HY000): Table 't2' was not locked with LOCK TABLES 针对表t读锁,表t1为写锁.本线程两个表的查询操作都正常,表t1插入正常,但针对表t的插入操作报错. 本线程只能操作表t和表t1,操作其它表都会报错. 本线程只能读表t,写会报错. 本线程能读写表t1. -- session2 mysql\u003eselect*fromtlimit1;+----+--------+------+-----+------+ |id|city|name|age|addr|+----+--------+------+-----+------+ |1|gongan|??|80|test|+----+--------+------+-----+------+ 1rowinset(0.00sec)mysql\u003eselect*fromt2limit1;+----+------+------+ |id|a|b|+----+------+------+ |1|1|1|+----+------+------+ 1rowinset(0.00sec)mysql\u003einsertintot2values(1001,1001,1001);QueryOK,1rowaffected(0.03sec)-- block mysql\u003einsertintotvalues(null,'gongan','guanguan',1,'test'); 其它线程能正常读表t,但写表t时被阻塞 其它线程能正常读写其它表. -- session3(block) mysql\u003eselect*fromt1limit1; 其它线程读表t1时被阻塞. -- session4(block) mysql\u003einsertintot1values(102,102,102); 其它线程写表t1时被阻塞. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:2:1","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"元数据锁 MySQL5.5版本之后的功能,会自动加锁、解锁.主要是为了解决DDL和DML并发的问题.DML时会加MDL读锁,DDL时会加MDL写锁,读读之间不互斥,读写、写写之间互斥. -- session1 mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mysql\u003eselect*fromt1limit1;+----+------+------+ |id|a|b|+----+------+------+ |1|1|1|+----+------+------+ 1rowinset(0.00sec)-- session2 mysql\u003ealtertablet1addfint;-- session3 mysql\u003eselect*fromt1limit1; 在session1提交事务之前,session2和session3都被阻塞了. session1是获取了表t1的MDL读锁,但由于事务未提交,导致该MDL读锁未被释放. session2是给表t1增加列,需要MDL写锁,由于读写互斥,导致该操作被堵塞. session3是读表t1,需要MDL读锁,但之前已经有MDL写锁在等待了,也导致获取不到MDL读锁,从而被堵塞. -- session1 mysql\u003ecommit;QueryOK,0rowsaffected(0.00sec)-- session2 mysql\u003ealtertablet1addfint;QueryOK,0rowsaffected(4min23.27sec)Records:0Duplicates:0Warnings:0-- session3 mysql\u003eselect*fromt1limit1;+----+------+------+ |id|a|b|+----+------+------+ |1|1|1|+----+------+------+ 1rowinset(4min20.25sec) session1提交之后,session2和session3才执行完,但注意session3在session2之前执行,也就是session3先获取到了MDL读锁. 当有多个线程在等待MDL锁时,获取锁的规则是什么?会由哪个线程得到锁?读锁优先还是写锁优先? MDL锁是在需要的时候由MySQL自动加的,但要等待事务被提交后才会被释放. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:2:2","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"行锁 全局锁和表级锁是在server层实现的,而行锁是由引擎层实现的,这里主要关注InnoDB的行锁.行锁是自动加的. 
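补充一个与上文MDL锁相关的小实践(示意,仅为一种常见做法,字段以MySQL 5.7的information_schema.innodb_trx为准): 由于MDL读锁要等事务提交才释放,在执行DDL之前可以先确认实例上没有长时间未提交的事务,否则DDL申请MDL写锁被阻塞后,还会进一步阻塞后续所有对该表的查询.
/* 示意: 查找已运行超过60秒仍未提交的事务 */
mysql\u003e select trx_id, trx_started, trx_mysql_thread_id, trx_query from information_schema.innodb_trx where trx_started \u003c now() - interval 60 second;
确认没有长事务(或kill掉对应连接)之后,再执行alter table等DDL语句会更安全.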
","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:3:0","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"两阶段协议锁 -- session A mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mysql\u003eupdatet2seta=2whereid=1;QueryOK,1rowaffected(0.01sec)Rowsmatched:1Changed:1Warnings:0mysql\u003eupdatet2seta=3whereid=2;QueryOK,1rowaffected(0.00sec)Rowsmatched:1Changed:1Warnings:0-- session B(block) mysql\u003eupdatet2seta=1whereid=1; 可以看到session A未提交,会导致session B阻塞.当session A在commit提交之后,session B才能开始执行. 行锁是在需要的时候由InnoDB自动加上,但直到事务结束时锁才会被释放.这就是两阶段协议锁. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:3:1","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"死锁和死锁检测 在并发系统中当不同的线程出现循环资源依赖,就会导致这些线程都进入无限等待的状态,称为死锁.如下session A和session B就出现了循环资源依赖,导致死锁. session A session B begin;update t2 set a = 2 where id = 1; begin; update t2 set a = 3 where id = 2; update t2 set a = 4 where id = 2; update t2 set a = 5 where id = 1; 在InnoDB中,当处于锁等待状态时,就有可能会触发死锁检测,是参数innodb_deadlock_detect控制的. -- 默认值为on,表示开启死锁检测. mysql\u003eshowvariableslike'innodb_deadlock%';+------------------------+-------+ |Variable_name|Value|+------------------------+-------+ |innodb_deadlock_detect|ON|+------------------------+-------+ 1rowinset(0.00sec) 每个新来的被堵住的线程,都要判断会不会由于自己的加入导致了死锁,这是一个时间复杂度为O(n)的操作.假设有1000个线程要同时更新同一行,那么死锁检测操作就是100万量级的,这期间会消耗大量的CPU资源.注意死锁检测只会检测相关联的线程.比如当前session A在等待session B,而session B在等待session C;session D在等待session E.当session F加入需要等待session A时,只会检测F-\u003eA-\u003eB-\u003eC,D和E是不会检测的. 怎么解决由热点行更新导致的性能问题? 若在事务中需要锁多个行,把最可能造成锁冲突、最可能影响并发度的锁的申请时机尽量往后放. 另外需要注意的是,当等待锁一定时间后,会出现超时现象,是参数innodb_lock_wait_timeout控制的. -- 等待锁超时. mysql\u003eupdatet2seta=1whereid=1;ERROR1205(HY000):Lockwaittimeoutexceeded;tryrestartingtransaction-- 锁等待超时时间,默认为50s. 
mysql\u003eshowvariableslike'innodb_lock_wait_timeout';+--------------------------+-------+ |Variable_name|Value|+--------------------------+-------+ |innodb_lock_wait_timeout|50|+--------------------------+-------+ 1rowinset(0.01sec) 死锁时可以通过show engine innodb status命令查看 mysql\u003eshowengineinnodbstatus\\G***************************1.row***************************Type:InnoDBName:Status:=====================================2020-07-1414:49:160x7fa8507f8700INNODBMONITOROUTPUT=====================================Persecondaveragescalculatedfromthelast26seconds----------------- BACKGROUNDTHREAD----------------- srv_master_threadloops:521srv_active,0srv_shutdown,3098384srv_idlesrv_master_threadlogflushandwrites:3098905---------- SEMAPHORES---------- OSWAITARRAYINFO:reservationcount6262OSWAITARRAYINFO:signalcount28687RW-sharedspins0,rounds27977,OSwaits2270RW-exclspins0,rounds159415,OSwaits2170RW-sxspins4327,rounds34223,OSwaits205Spinroundsperwait:27977.00RW-shared,159415.00RW-excl,7.91RW-sx-- 死锁信息 ------------------------ LATESTDETECTEDDEADLOCK------------------------ 2020-07-1412:00:340x7fa8507f8700***(1)TRANSACTION:TRANSACTION3667,ACTIVE7secstartingindexreadmysqltablesinuse1,locked1LOCKWAIT2lockstruct(s),heapsize1136,1rowlock(s)MySQLthreadid73,OSthreadhandle140361007904512,queryid1776328localhostwebupdatingupdatetsetd=d+1wherec=10***(1)WAITINGFORTHISLOCKTOBEGRANTED:RECORDLOCKSspaceid43pageno4nbits80indexcoftable`web`.`t`trxid3667lock_modeXwaitingRecordlock,heapno4PHYSICALRECORD:n_fields2;compactformat;infobits00:len4;hex8000000a;asc;;1:len4;hex8000000a;asc;;***(2)TRANSACTION:TRANSACTION3668,ACTIVE12secinsertingmysqltablesinuse1,locked15lockstruct(s),heapsize1136,3rowlock(s),undologentries1MySQLthreadid72,OSthreadhandle140360881768192,queryid1776329localhostwebupdateinsertintotvalues(8,8,8)***(2)HOLDSTHELOCK(S):RECORDLOCKSspaceid43pageno4nbits80indexcoftable`web`.`t`trxid3668lockmodeSRecordlock,heapno4PHYSICALRECORD:n_fields2;compactformat;infobits00:len4;hex8000000a;asc;;1:len4;hex8000000a;asc;;***(2)WAITINGFORTHISLOCKTOBEGRANTED:RECORDLOCKSspaceid43pageno4nbits80indexcoftable`web`.`t`trxid3668lock_modeXlocksgapbeforerecinsertintentionwaitingRecordlock,heapno4PHYSICALRECORD:n_fields2;compactformat;infobits00:len4;hex8000000a;asc;;1:len4;hex8000000a;asc;;***WEROLLBACKTRANSACTION(1) ","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:3:2","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"锁等待分析 -- session A,开启事务,在di=1加行锁. mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mysql\u003eupdatet2seta=2whereid=1;QueryOK,1rowaffected(0.00sec)Rowsmatched:1Changed:1Warnings:0-- session B(block),等待session A的锁. mysql\u003eselect*fromt2whereid=1lockinsharemode;-- session C,查看阻塞情况. mysql\u003eshowprocesslist;+----+------+-----------+------+---------+------+------------+--------------------------------------------------+ |Id|User|Host|db|Command|Time|State|Info|+----+------+-----------+------+---------+------+------------+--------------------------------------------------+ |68|web|localhost|web|Sleep|104||NULL||69|web|localhost|web|Query|20|statistics|select*fromt2whereid=1lockinsharemode||70|web|localhost|web|Query|0|starting|showprocesslist||71|web|localhost|web|Sleep|3724||NULL|+----+------+-----------+------+---------+------+------------+--------------------------------------------------+ 4rowsinset(0.00sec)-- 从下面可以看出69在等待68的锁. 
mysql\u003eselect*fromsys.innodb_lock_waitswherelocked_table='`web`.`t2`'\\G***************************1.row***************************wait_started:2020-07-1316:37:31wait_age:00:00:02wait_age_secs:2locked_table:`web`.`t2`locked_index:PRIMARYlocked_type:RECORDwaiting_trx_id:421836271774456waiting_trx_started:2020-07-1316:37:31waiting_trx_age:00:00:02waiting_trx_rows_locked:1waiting_trx_rows_modified:0waiting_pid:69waiting_query:select*fromt2whereid=1lockinsharemodewaiting_lock_id:421836271774456:33:5:2waiting_lock_mode:Sblocking_trx_id:3553blocking_pid:68blocking_query:NULLblocking_lock_id:3553:33:5:2blocking_lock_mode:Xblocking_trx_started:2020-07-1316:36:07blocking_trx_age:00:01:26blocking_trx_rows_locked:1blocking_trx_rows_modified:1sql_kill_blocking_query:KILLQUERY68sql_kill_blocking_connection:KILL681rowinset,3warnings(0.06sec) 在MySQL5.7及之后版本可以通过sys.innodb_lock_waits表获取锁占用情况. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:3:3","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"幻读 什么是幻读? 在一个事务内,前后看到的数据不一致,不一致特指后面看到的数据行数多了,这就是幻读(幻读特指新插入的行). 在读已提交隔离级别下,是允许存在幻读现象的. 在可重复读隔离级别下,MySQL是不存在幻读现象的. 在可重复读隔离级别下,普通的查询都是快照读,是不能看到别的事务插入的数据,因此必须是针对当前读的语句. 幻读有什么问题? 在可重复隔离级别下,如果允许幻读会出现什么现象? -- 隔离级别为可重复读. mysql\u003eselect@@transaction_isolation;+-------------------------+ |@@transaction_isolation|+-------------------------+ |REPEATABLE-READ|+-------------------------+ 1rowinset(0.00sec)-- session A mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mysql\u003eupdatet2setb=a+1wherea=4;QueryOK,1rowaffected(0.00sec)Rowsmatched:1Changed:1Warnings:0-- session B,假定此处不会block mysql\u003einsertintot2values(1002,4,1002);-- session A,提交事务 mysql\u003ecommit;QueryOK,0rowsaffected(0.01sec) 如上所示的流程,假如session A只是对a=4这行加了行锁. session B的事务是先提交,session A的事务是后提交.此时数据库的数据为(4, 4, 5)和(1002, 4, 1002).但在binlog中先是session B的操作,然后是session A的操作,该binlog同步到从库上开始执行,会得到什么结果? session B先执行,插入一行数据(1002, 4, 1002),然后更新a=4的行,从库的数据为(4, 4, 5)和(1002, 4, 5),此时主库和从库的数据不一致了. session A本来是想更新所有a=4的行的,但在这之后session B插入了一行a=4的数据,导致session A的语义被破坏了. 主从数据不一致和语义被破坏,这就是幻读的问题. 实际上InnoDB在可重复读级别下是不会出现幻读的现象的,上面的sql语句,session B的insert操作会被阻塞,直到session A的事务提交后才能执行. 如何解决幻读? 通过上例,只是加行锁,无法阻止幻读的出现.InnoDB是通过加间隙锁(Gap Lock),来锁住a=4的间隙,这样可以阻塞别的线程的插入操作. 间隙锁,锁的就是两个值之间的间隙,以上例session A来说,会锁住(3, 4)和(4, 5)的间隙.这样再插入session B的数据,会落入到(4, 5)间隙,导致被阻塞. 间隙锁一般是针对可重复隔离级别的.读已提交一般情况下只有行锁(说明该隔离级别会出现幻读的现象). 间隙锁的引入会导致锁的范围变大,这样其实会影响并发度的. 需要注意,间隙锁之间是不互斥的,不同的session之间可以锁住同样的间隙 Next-key lock 行锁+间隙锁合称为next-key lock,每个next-key lock都是前开后闭区间. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:3:4","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"InnoDB加锁规则(可重复读隔离级别) 原则一: 加锁的基本单位是next-key lock,是前开后闭区间. 原则二: 查找过程中访问到的对象才会加锁. 优化一: 索引上的等值查询,给唯一索引加锁的时候,next-key lock会退化为行锁. 优化二: 索引上的等值查询,向右遍历时且最后一个值不满足等值条件的时候,next-key lock会退化为间隙锁. bug一: 唯一索引上的范围查询会访问到不满足条件的第一个值为止. 数据准备. -- 创建表t. CREATETABLE`t`(`id`int(11)NOTNULL,`c`int(11)DEFAULTNULL,`d`int(11)DEFAULTNULL,PRIMARYKEY(`id`),KEY`c`(`c`))ENGINE=InnoDB-- 插入数据. mysql\u003einsertintotvalues(0,0,0),(5,5,5),(10,10,10),(15,15,15),(20,20,20),(25,25,25);QueryOK,6rowsaffected(0.00sec)Records:6Duplicates:0Warnings:0 等值查询间隙锁 -- session A. mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mysql\u003eupdatetsetd=d+1whereid=7;QueryOK,0rowsaffected(0.00sec)Rowsmatched:0Changed:0Warnings:0-- session B(block). mysql\u003einsertintotvalues(8,8,8);-- session C. 
mysql\u003eupdatetsetd=d+1whereid=10;QueryOK,1rowaffected(0.00sec)Rowsmatched:1Changed:1Warnings:0 session A在主键索引上加锁,根据规则一加锁为(5, 10],根据优化二退化为间隙锁(5, 10) session B要插入id=8,落在间隙锁(5, 10)之间,被阻塞. session C更新id=10的行,根据规则一加锁为(5, 10],根据优化一退化为行锁(10),锁不冲突,可以更新成功. 非唯一索引等值锁 -- session A mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mysql\u003eselectidfromtwherec=5lockinsharemode;+----+ |id|+----+ |5|+----+ 1rowinset(0.00sec)-- session B mysql\u003eupdatetsetd=d+1whereid=5;QueryOK,1rowaffected(0.02sec)Rowsmatched:1Changed:1Warnings:0-- session C(block) mysql\u003einsertintotvalues(7,7,7); session A在索引c上加锁,根据规则一为(0, 5]和(5, 10],根据优化二退化为(0, 5]和(5, 10).注意该语句是使用了覆盖索引,所以并没有在主键索引上加锁. session B是在主键索引上加锁,根据优化一退化为行锁(5),和session A并不冲突,可以更新成功. session C要再索引c上插入c=7的行,落在了(5, 10)之间,被session A阻塞. 注意:语句中使用的是lock in share mode,是读锁且使用了覆盖索引,并不需要访问主键索引.但如果使用的是for update,又是什么效果列? 主键索引范围锁 --session A mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mysql\u003eselect*fromtwhereid\u003e=10andid\u003c11forupdate;+----+------+------+ |id|c|d|+----+------+------+ |10|10|10|+----+------+------+ 1rowinset(0.01sec)--session B mysql\u003einsertintotvalues(8,8,8);QueryOK,1rowaffected(0.00sec)-- block mysql\u003einsertintotvalues(13,13,13);--session C (block) mysql\u003eupdatetsetd=d+1whereid=15; session A在主键索引上加锁,满足条件的第一行是id=10,则加锁(5, 10],但根据优化一退化为行锁(10);范围查询要继续查找,则加锁(10, 15],由于是范围查询没有适用的优化规则.故加锁范围为[10, 15] session B第一条插入id=8,不在锁的范围,可以插入成功.第二条插入id=13,在锁的范围,被阻塞. session C更新id=15的行,在锁的范围,被阻塞. 非唯一索引范围锁 -- session A mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mysql\u003eselect*fromtwherec\u003e=10andc\u003c11forupdate;+----+------+------+ |id|c|d|+----+------+------+ |10|10|10|+----+------+------+ 1rowinset(0.00sec)-- session B(block) mysql\u003einsertintotvalues(8,8,8);-- session C(block) mysql\u003eupdatetsetd=d+1wherec=15; session A在索引c上加锁,满足条件的第一行是c=10,则加锁(5, 10],注意c是非唯一索引没有优化规则;范围查询要继续查找,则加锁(10, 15],由于是范围查询没有适用的优化规则.故加锁范围为(5, 15]. session B要插入c=8的行,在锁的范围内,被阻塞. session C要更新c=15的行,在锁的范围内,被阻塞. 唯一索引范围bug --session A mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mysql\u003eselect*fromtwhereid\u003e10andid\u003c=15forupdate;+----+------+------+ |id|c|d|+----+------+------+ |15|15|15|+----+------+------+ 1rowinset(0.00sec)--session B(block) mysql\u003eupdatetsetd=d+1whereid=20;--session C(block) mysql\u003einsertintotvalues(16,16,16); session A在主键索引上加锁,按照原则一,加锁(10, 15],但InnoDB会扫描到第一个不满足条件的行为止,这里也就是id=20,由于是范围扫描,所以还会加锁(15, 20].但(15, 20]是完全没必要的,可以认为是bug. session B要更新id=20的行,在锁的范围内,被阻塞. session C要插入id=16的行,在锁的范围内,被阻塞. 非唯一索引上存在相同键 -- 插入新行,c=10的有两行数据. mysql\u003einsertintotvalues(30,10,30);QueryOK,1rowaffected(0.00sec)-- 可以看到索引c的顺序. mysql\u003eselect*fromtorderbyc;+----+------+------+ |id|c|d|+----+------+------+ |0|0|0||5|5|5||10|10|10||30|10|30||15|15|15||20|20|20||25|25|25|+----+------+------+ 7rowsinset(0.01sec)--session A mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mysql\u003edeletefromtwherec=10;QueryOK,2rowsaffected(0.00sec)--session B(block) mysql\u003einsertintotvalues(12,12,12);--session C mysql\u003eupdatetsetd=d+1wherec=15;QueryOK,1rowaffected(0.01sec)Rowsmatched:1Changed:1Warnings:0 session A是在索引c上加锁,满足条件的第一行为(id=10,c=10),则加锁((id=5,c=5),(id=10,c=10)],非唯一索引没有优化规则.第一行为(id=30,c=10),则加锁((id=10,c=10),(id=30,c=10)],继续查找到(id=15,c=15)结束,根据优化规则二,退化为间隙锁((id=30,c=10),(id=15,c=15)) session B插入c=12的行,在锁的范围内,被阻塞. session C更新c=15的行,不再锁的范围内,可正常更新. 
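按照上面的加锁规则,再推演一个非唯一索引等值查询的例子作为自检(示意,假设表t仍是最初插入的6行数据,结论是依据前述规则推导出的预期行为):
-- session A
mysql\u003e begin;
mysql\u003e select * from t where c = 20 for update;
/* 依据原则一,在索引c上加next-key lock (15,20]; 向右遍历到c=25不满足等值条件,依据优化二退化为间隙锁(20,25) */
-- session B,预期被阻塞: 插入c=21落在间隙(20,25)内
mysql\u003e insert into t values(21, 21, 21);
-- session C,预期可正常执行: c=25上的锁与session A持有的间隙锁(20,25)不冲突
mysql\u003e update t set d = d + 1 where c = 25;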
limit语句加锁 --session A mysql\u003ebegin;QueryOK,0rowsaffected(0.00sec)mys","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:3:5","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"InnoDB加锁规则(读已提交隔离级别) 主要是行锁(只有在外键场景下会有间隙锁) 在语句执行过程中加的行锁,在语句执行完成后,就会把不满足条件的行的行锁释放掉,不需要等待事务提交. 在读已提交下锁的范围更小,锁的时间更短. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-lock/:3:6","tags":["mysql"],"title":"MySQL锁机制分析","uri":"/2020/11/24/mysql-lock/"},{"categories":["MySQL"],"content":"问题 使用join时驱动表、被驱动表是如何选择的?影响因素有哪些? 如何优化? ","date":"2020-11-24","objectID":"/2020/11/24/mysql-join/:1:0","tags":["mysql"],"title":"MySQL的join分析","uri":"/2020/11/24/mysql-join/"},{"categories":["MySQL"],"content":"数据准备 /*创建表*/CREATETABLE`t1`(`id`int(11)NOTNULL,`a`int(11)DEFAULTNULL,`b`int(11)DEFAULTNULL,PRIMARYKEY(`id`),KEY`a`(`a`))ENGINE=InnoDB;/*创建存储过程*/delimiter;;createprocedureidata()begindeclareiint;seti=1;while(i\u003c=1000)doinsertintot2values(i,i,i);seti=i+1;endwhile;end;;delimiter;/*执行*/callidata();/*创建表t1*/createtablet1liket2;/*插入数据*/insertintot1(select*fromt2whereid\u003c=100); ","date":"2020-11-24","objectID":"/2020/11/24/mysql-join/:2:0","tags":["mysql"],"title":"MySQL的join分析","uri":"/2020/11/24/mysql-join/"},{"categories":["MySQL"],"content":"Index Nested-Loop Join(NLJ) join时能用上被驱动表的索引,称之为Index Nested-Loop Join,简称为NLJ mysql\u003eexplainselect*fromt1joint2ont1.a=t2.a;+----+-------------+-------+------------+------+---------------+------+---------+----------+------+----------+-------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+---------------+------+---------+----------+------+----------+-------------+ |1|SIMPLE|t1|NULL|ALL|a|NULL|NULL|NULL|100|100.00|Usingwhere||1|SIMPLE|t2|NULL|ref|a|a|5|web.t1.a|1|100.00|NULL|+----+-------------+-------+------------+------+---------------+------+---------+----------+------+----------+-------------+ 2rowsinset,1warning(0.00sec) 从上面的结果可以看出,驱动表是表t1,被驱动表是表t2.驱动表是全表扫描,而被驱动表是用的索引a. 语句执行过程是: 从表t1中读入一行数据R. 从R取出字段a的值,去表t2里查找. 取出表t2中满足条件的行,跟R组成一行,作为结果集的一部分. 重复执行步骤1到3,直到表t1的末尾,循环结束. 假定驱动表有N行,被驱动表有M行,每扫描一行驱动表,使用字段a的值去被驱动表的索引树a上查找,然后再回表到被驱动表的主键索引树,则被驱动表要扫描2*$log_2{M}$.则总的扫描行数为N+N*2*$log_2{M}$.显然N对扫描行数的影响更大,因此在这种情况下应该使用小表为驱动表. 但如果被驱动表没有索引列? ","date":"2020-11-24","objectID":"/2020/11/24/mysql-join/:3:0","tags":["mysql"],"title":"MySQL的join分析","uri":"/2020/11/24/mysql-join/"},{"categories":["MySQL"],"content":"Block Nested-Loop Join(BNL) join时被驱动表没有索引时,称之为Block Nested-Loop Join,简称为BNL /*使用straight_join强行指定驱动表为t1*/mysql\u003eexplainselect*fromt1straight_joint2ont1.a=t2.b;+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+ |1|SIMPLE|t1|NULL|ALL|a|NULL|NULL|NULL|100|100.00|NULL||1|SIMPLE|t2|NULL|ALL|NULL|NULL|NULL|NULL|1000|10.00|Usingwhere;Usingjoinbuffer(BlockNestedLoop)|+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+ 2rowsinset,1warning(0.00sec) 从上面的Extra字段的值Using join buffer (Block Nested Loop)可以看出,采用的正是BNL. 
语句执行过程是: 把表t1的数据读入线程内存join_buffer中 扫描表t2,把每一行取出,跟join_buffer中的数据比对,满足join条件的,作为结果集的一部分. 整个过程表t1和表t2都是全表扫描,扫描行数为1000+100=1100,由于join_buffer中的数据时无序的,对表t2里的每一行都要做100次判断,总判断次数为1000*100=10万次.判读次数是纯内存操作,相比读表会快上不少,整个过程就是扫描1100行+10万次内存操作. 在这种情况下,不论选择哪个表为驱动表其实是没有差异的. join_buffer的大小是由参数join_buffer_size来控制的,默认大小为256k. mysql\u003eshowvariableslike'join%';+------------------+--------+ |Variable_name|Value|+------------------+--------+ |join_buffer_size|262144|+------------------+--------+ 1rowinset(0.26sec) 如果表t1的数据量很大,导致join_buffer放不下了,整个过程又会是怎么样的? 策略很简单,就是分段放入,分次比较.详见官网说明 Basic information about the join buffer cache: The size of each join buffer is determined by the value of the join_buffer_size system variable. This buffer is used only when the join is of type ALL or index (in other words, when no possible keys can be used). A join buffer is never allocated for the first non-const table, even if it would be of type ALL or index. The buffer is allocated when we need to do a full join between two tables, and freed after the query is done. Accepted row combinations of tables before the ALL/index are stored in the cache and are used to compare against each read row in the ALL table. We only store the used columns in the join buffer, not the whole rows. Assume you have the following join: TablenameTypet1ranget2reft3ALL The Join is then done as follows: -Whilerowsint1matchingrange-Readthroughallrowsint2accordingtoreferencekey-Storeusedfieldsfromt1,t2incache-Ifcacheisfull-Readthroughallrowsint3-Comparet3rowagainstallt1,t2combinationsincache-Ifrowsatisfiesjoincondition,sendittoclient-Emptycache-Readthroughallrowsint3-Comparet3rowagainstallstoredt1,t2combinationsincache-Ifrowsatisfiesjoincondition,sendittoclient 假设驱动表的行数是N,需要分K段才能完成算法流程,被驱动表数据行数是M.显然N越大,K就会越大,K=$\\lambda$*N,$\\lambda$取值范围为(0,1).此算法扫描的行数为N+$\\lambda$*N*M,内存判断次数为N*M,在N和M确定的情况下,N小些,扫描行数的算式会更小些,可见此时应小表当驱动表.参数$\\lambda$才是影响扫描行数的关键因素,这个值应该越小越好.在N固定时若join_buffer_size越大,能放入的行数越多,该值就会越小. 若join语句很慢,可尝试把join_buffer_size改大点. 以上两个join算法都应该尽量使用小表作为驱动表,但什么是小表列? ","date":"2020-11-24","objectID":"/2020/11/24/mysql-join/:4:0","tags":["mysql"],"title":"MySQL的join分析","uri":"/2020/11/24/mysql-join/"},{"categories":["MySQL"],"content":"什么是小表? 
先来看看两组sql语句 /*Q1,此时放入join_buffer的数据是100行*/mysql\u003eexplainselect*fromt1straight_joint2ont1.b=t2.bwheret2.id\u003c=50;+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ |1|SIMPLE|t1|NULL|ALL|NULL|NULL|NULL|NULL|100|100.00|NULL||1|SIMPLE|t2|NULL|range|PRIMARY|PRIMARY|4|NULL|50|10.00|Usingwhere;Usingjoinbuffer(BlockNestedLoop)|+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ 2rowsinset,1warning(0.01sec)/*Q2,此时放入join_buffer的数据是50行*/mysql\u003eexplainselect*fromt2straight_joint1ont1.b=t2.bwheret2.id\u003c=50;+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ |1|SIMPLE|t2|NULL|range|PRIMARY|PRIMARY|4|NULL|50|100.00|Usingwhere||1|SIMPLE|t1|NULL|ALL|NULL|NULL|NULL|NULL|100|10.00|Usingwhere;Usingjoinbuffer(BlockNestedLoop)|+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ 2rowsinset,1warning(0.00sec)/*Q3,此时放入join_buffer的数据是100行,但只需包含t1.b这一个字段*/mysql\u003eexplainselectt1.b,t2.*fromt1straight_joint2ont1.b=t2.bwheret2.id\u003c=100;+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ |1|SIMPLE|t1|NULL|ALL|NULL|NULL|NULL|NULL|100|100.00|NULL||1|SIMPLE|t2|NULL|range|PRIMARY|PRIMARY|4|NULL|100|10.00|Usingwhere;Usingjoinbuffer(BlockNestedLoop)|+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ 2rowsinset,1warning(0.00sec)/*Q4,此时放入join_buffer的数据是100行,但需包含表t2的所有字段*/mysql\u003eexplainselectt1.b,t2.*fromt2straight_joint1ont1.b=t2.bwheret2.id\u003c=100;+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ |1|SIMPLE|t2|NULL|range|PRIMARY|PRIMARY|4|NULL|100|100.00|Usingwhere||1|SIMPLE|t1|NULL|ALL|NULL|NULL|NULL|NULL|100|10.00|Usingwhere;Usingjoinbuffer(BlockNestedLoop)|+----+-------------+-------+------------+-------+---------------+---------+---------+------+------+----------+----------------------------------------------------+ 
2rowsinset,1warning(0.00sec) 针对语句Q1和Q2来说,Q2放入join_buffer的数据只有50行,相对来说此时t2是小表,应该为驱动表. 针对语句Q3和Q4来说,放入join_buffer的行数是一样的,但Q3只需要放入表t1的一个字段,相对来说此时t1是小表,应该为驱动表. 在决定哪个表为驱动表时,应该要两个表按照各自的条件过滤,然后计算参与join的各个字段的总数据量,数据量小的那个表就是小表,应该为驱动表. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-join/:5:0","tags":["mysql"],"title":"MySQL的join分析","uri":"/2020/11/24/mysql-join/"},{"categories":["MySQL"],"content":"优化 ","date":"2020-11-24","objectID":"/2020/11/24/mysql-join/:6:0","tags":["mysql"],"title":"MySQL的join分析","uri":"/2020/11/24/mysql-join/"},{"categories":["MySQL"],"content":"Multi-Range Read(MRR)优化 MRR优化的主要目的是使用顺序读盘. 在查询过程中使用非主键索引时需要进行回表操作去主键索引树中获取相关字段的值.在非主键索引树中获取的主键ID不是有序的,当循环回表时会触发主键索引树的磁盘随机读取,而这是最耗时的操作. 而MRR优化过程: 从非主键索引中把符合的主键ID先全部读取出来,放入read_rnd_buffer中 把read_rnd_buffer中的主键ID排序 根据排序后的主键ID去主键索引树中查找记录 read_rnd_buffer是由参数read_rnd_buffer_size控制的,默认为256k mysql\u003eshowvariableslike'read_rnd%';+----------------------+--------+ |Variable_name|Value|+----------------------+--------+ |read_rnd_buffer_size|262144|+----------------------+--------+ 1rowinset(0.01sec) 当read_rnd_buffer满时,就会执行步骤2和3,然后清空read_rnd_buffer,之后继续找非主键索引的下个记录,并继续循环. MRR能提升性能的核心在于,在非主键索引上是一个范围查询,可以有足够的主键ID,这样排序后,再去主键索引查找数据,才能体现出顺序性的优势. 若想稳定地使用MRR优化,需要设置set optimizer_switch=\"mrr_cost_based=off\"(官方文档:优化器在判断消耗时,会更倾向不使用MRR,把mrr_cost_based设置为off,就是固定使用MRR). /*开启MRR*/mysql\u003esetoptimizer_switch=\"mrr_cost_based=off\";QueryOK,0rowsaffected(0.00sec)/*使用了MRR*/mysql\u003eexplainselect*fromt2wherea\u003e=100anda\u003c=200;+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------+ |1|SIMPLE|t2|NULL|range|a|a|5|NULL|101|100.00|Usingindexcondition;UsingMRR|+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+----------------------------------+ 1rowinset,1warning(0.04sec) 需要注意,当未使用MRR优化时,查询返回的记录是按照索引a来排序的,但使用了MRR优化时,返回的记录若read_rnd_buffer未满时是按照主键ID来排序的;若满记录就不是有序的了. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-join/:6:1","tags":["mysql"],"title":"MySQL的join分析","uri":"/2020/11/24/mysql-join/"},{"categories":["MySQL"],"content":"Batched Key Access(BKA) BKA算法是基于MRR对NLJ算法做的优化. 从驱动表中取出数据放入到join_buffer中,然后按照被驱动表的索引键排序,顺序去被驱动表查找相关数据.相比于NLJ的过程,BKA使用了MRR优化的思路,在被驱动表中是顺序读取,避免随机读取. 要启用BKA,需要设置参数. 
/*启动BKA,要先启动MRR*/mysql\u003esetoptimizer_switch='mrr=on,mrr_cost_based=off,batched_key_access=on';QueryOK,0rowsaffected(0.00sec)/*NLJ已被优化为BKA*/mysql\u003eexplainselect*fromt1joint2ont1.a=t2.a;+----+-------------+-------+------------+------+---------------+------+---------+----------+------+----------+----------------------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+---------------+------+---------+----------+------+----------+----------------------------------------+ |1|SIMPLE|t1|NULL|ALL|a|NULL|NULL|NULL|100|100.00|Usingwhere||1|SIMPLE|t2|NULL|ref|a|a|5|web.t1.a|1|100.00|Usingjoinbuffer(BatchedKeyAccess)|+----+-------------+-------+------------+------+---------------+------+---------+----------+------+----------+----------------------------------------+ 2rowsinset,1warning(0.00sec) ","date":"2020-11-24","objectID":"/2020/11/24/mysql-join/:6:2","tags":["mysql"],"title":"MySQL的join分析","uri":"/2020/11/24/mysql-join/"},{"categories":["MySQL"],"content":"BNL算法的性能问题 基于BNL算法时,由于join_buffer容量有限,如果驱动表是大表时需要进行分段处理,这样会导致被驱动表进行多次扫描,如果被驱动表是一个大的冷数据库,除了导致IO压力大外,还有什么其它影响? 从磁盘读取数据后是放入到Buffer Pool里的,而Buffer Pool的容量也是有限的,当空间不够时会淘汰一些老数据页来容纳新的数据页,淘汰算法InnoDB引擎使用的是变种LRU算法. 把Buffer Pool按照3:5的比例划分为old和young区域,从磁盘读取的数据先放入到old区域,如果超过1秒该数据页还在old区域且还有被访问就会被移入到young区域,在old和young区域都是LRU算法来淘汰老数据的. 若被驱动表的数据量小于old区域(即整个表能全部放入到old区域),由于需要多次扫描被驱动表,而时间间隔可能超过1秒,这会导致这部分数据会被移入到young区域. 若被驱动表的数据量超过了old区域,在遍历的过程中会涉及到淘汰old区域的数据来存放该表的数据,这样会导致业务正常访问的数据没有机会放入到young区域(没有超过1秒就被淘汰出old了). 以上两种情况都会对Buffer Pool的正常运作产生影响. BNL算法对系统的影响主要包括: 可能会多次扫描被驱动表,占用磁盘IO. 判断join条件需要执行M*N次对比,如果大表就会占用CPU资源. 可能会导致Buffer Pool的热数据被淘汰,影响内存命中率. 优化策略: 在被驱动表建索引,把BNL转化为BKA. 若被驱动表不适合加索引,可使用临时表(create temporary),把被驱动表满足条件的记录放入临时表,给临时表加索引,仍然转化为BKA. hash join,但目前MySQL不支持,在join_buffer中支持hash,被驱动表的数据可以通过hash查找能快速定位,而不用再去执行M*N次比对了. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-join/:6:3","tags":["mysql"],"title":"MySQL的join分析","uri":"/2020/11/24/mysql-join/"},{"categories":["MySQL"],"content":"B+树 基于N叉树(每个父节点有N个子节点,子节点的值从左到右按照从小到大的顺序排列),非叶子节点只存储索引值,叶子节点储存索引值和数据,所有叶子节点采用链表串起来. InnoDB采用的就是B+树,表的数据都是以索引的形式存放的,称为索引组织表.针对每个InnoDB引擎的表,都必须存在索引,当建表时没有显示声明索引,InnoDB会默认创建,如下表t1. /*mysql版本*/Serverversion:5.7.27-logMySQLCommunityServer(GPL)/*创建表t,没有显示声明索引*/CREATETABLE`t`(`id`int(11)NOTNULL,`city`varchar(16)NOTNULL,`name`varchar(16)NOTNULL,`age`int(11)NOTNULL,`addr`varchar(128)DEFAULTNULL)ENGINE=InnoDB; 由于数据全部存储在叶子节点上,则每次查询都必须到叶子节点,其查找时间复杂度稳定,只和树高有关. 假如树高为4,N为1200,则这棵B+树能存储17亿多的数据量,整棵树的高度能维持在非常小的范围内,同时在内存里缓存前面若干层的节点,则极大的降低访问磁盘的次数,提高查询效率. B+树的树高是由数据页大小和索引列的大小来决定的,而数据页大小是由参数innodb_page_size(默认为16k)的值决定;比如以一个bigint类型的字段为主键,则索引的大小为8字节(bigint大小)+6字节(指针大小,mysql里指针占6字节).则N的大小为16k/14字节,大约为1170.索引字段越小一个数据页能存放的数据就越多,N就会越大. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:1:0","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"索引分类 ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:2:0","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"主键索引 也被称为聚簇索引(clustered index),每个表都必须有且仅有一个主键索引(当没有显示声明主键索引时,InnoDB会默认创建一个以rowid为主键的索引),索引对应的字段的值唯一且不允许有空值. 主键索引的B+树的叶子节点的数据页里存放的是整行数据,且只有在主键索引的B+树中才有完整的数据. 
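可以从数据字典里印证这一点(示意,假设库名为web,查询用的是MySQL 5.7的information_schema表): 对于上面那个没有显式声明主键的表t,InnoDB通常会自动生成一个名为GEN_CLUST_INDEX的聚簇索引.
/* 示意: 查看表t在InnoDB数据字典中的索引信息 */
mysql\u003e select i.name as index_name, i.type from information_schema.innodb_sys_indexes i join information_schema.innodb_sys_tables st on i.table_id = st.table_id where st.name = 'web/t';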
","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:2:1","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"非主键索引 又叫二级索引(secondary index),可以有多个,可分为唯一索引(索引对应的字段的值必须是唯一的,但允许为空值)和普通索引(对字段没什么限制,可重复可为空),对应的B+树的叶子节点的数据页里存放的是对应的主键值. 需要注意:当使用二级索引查询时只能获取到主键值和索引所对应列的值,要获取其它字段的值就只能根据主键值再去主键索引中查找,这个操作称为回表. /*mysql版本*/Serverversion:5.7.27-logMySQLCommunityServer(GPL)/*创建表t*/CREATETABLE`t`(`id`int(11)NOTNULLAUTO_INCREMENT,`city`varchar(16)NOTNULL,`name`varchar(16)NOTNULL,`age`int(11)NOTNULL,`addr`varchar(128),PRIMARYKEY(`id`),KEY`city`(`city`))ENGINE=InnoDB/*根据普通索引city来查询,此时需要进行回表操作,来获取其它字段的值*/mysql\u003eselect*fromt1wherecity='hangzhou';/*根据主键索引来查询,直接在主键索引B+树上查找*/mysql\u003eselect*fromt1whereid=1; ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:2:2","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"索引维护 索引必然是有序的.当做增删改操作时,必须要进行必要的维护操作,来保持索引的有序性.在InnoDB中,读写都是以数据页为单位的(默认为16k),数据都是存放在数据页里的,当插入新值时为了保持索引的有序性,可能要把新值插入到数据页的中间位置,但如果此时数据页已满时会怎么样? 页分裂:当数据页已满时,引擎会申请一个新的数据页,然后挪动原数据页的一部分数据到新的数据页中.在这种情况下,插入性能是会受到影响的.另外数据页的空间利用率也会降低大约50%. 同理在删除数据时,有可能引发页合并(有分裂就有合并,是分裂的逆过程). 注意: 当新数据要插入到某个数据页的首位置,而此时该数据页已满,为了避免页分裂,会优先去找前一个数据页是否还有空余,若有就把新数据插入到前一个数据页的末尾位置. 当新数据要插入到某个数据页的尾位置,而此时该数据页已满,为了避免页分裂,会优先去找后一个数据页是否还有空余,若有就把新数据插入到后一个数据页的首位置. 为什么一般建议采用自增ID为主键? 自增ID自带有序性,每次插入都是追加操作,不涉及到挪动其它记录,也不会触发叶子节点的分裂. 每个二级索引的叶子节点上都是主键的值,而自增ID若是整型(int)为4字节,若是长整型(bigint)为8字节,主键长度越小,二级索引的叶子节点就越小,占用的空间也就越小. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:3:0","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"索引优化 ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:4:0","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"覆盖索引 为什么需要回表,是因为要去主键索引树中获取相应字段的值.但如果想要查询的字段是索引列的一部分列? /*仍以表t为例,先新加个索引,包含city和name字段*/mysql\u003ealtertabletaddindexcity_name(city,name);/*通过city来查询name*/mysql\u003eexplainselectnamefromtwherecity='hangzhou';+----+-------------+-------+------------+------+----------------+-----------+---------+-------+------+----------+-------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+----------------+-----------+---------+-------+------+----------+-------------+ |1|SIMPLE|t|NULL|ref|city,city_name|city_name|50|const|5505|100.00|Usingindex|+----+-------------+-------+------------+------+----------------+-----------+---------+-------+------+----------+-------------+ 1rowinset,1warning(0.00sec)/*通过city来查询name和age*/mysql\u003eexplainselectname,agefromtwherecity='hangzhou';+----+-------------+-------+------------+------+----------------+------+---------+-------+------+----------+-------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+----------------+------+---------+-------+------+----------+-------+ |1|SIMPLE|t|NULL|ref|city,city_name|city|50|const|5505|100.00|NULL|+----+-------------+-------+------------+------+----------------+------+---------+-------+------+----------+-------+ 1rowinset,1warning(0.00sec) 第一个查询采用的是索引city_name,Extra为Using index,表示是覆盖索引,需要查询的字段已包含在索引列中,不需要回表. 第二个查询采用的是索引city,Extra为NULL,则需要进行回表,去主键索引中获取相应字段的值. 覆盖索引能减少回表次数,即减少树的搜索次数,可显著提升性能. 
","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:4:1","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"前缀索引 联合索引的最左N个字段. 字符串索引的最左M个字符. 联合索引的值是按照索引定义里出现的字段顺序来排列的,比如表t的索引city_name是按照字段city和name的顺序排列的. 还是以表t为例,创建由字段city、name和age组成的联合索引. /*创建索引*/mysql\u003ealtertabletaddindexcity_name_age(city,name,age);/*根据字段city和name来查询,使用了索引city_name_age*/mysql\u003eexplainselect*fromtwherecity='hangzhou'andname='zhou';+----+-------------+-------+------------+------+---------------+---------------+---------+-------------+------+----------+-------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+---------------+---------------+---------+-------------+------+----------+-------+ |1|SIMPLE|t|NULL|ref|city_name_age|city_name_age|100|const,const|1|100.00|NULL|+----+-------------+-------+------------+------+---------------+---------------+---------+-------------+------+----------+-------+ 1rowinset,1warning(0.00sec) 联合索引的字段顺序如何安排? 如果通过调整顺序,可以少维护一个索引,则这个顺序是需要优先考虑的. 索引空间的大小.比如表t对name和age字段添加索引,基于空间考虑,应该增加一个(name,age)的联合索引和一个(age)的单索引,这时因为字段name比字段age大. 字符串索引,适用于模糊查询. /*在字符串字段city上新建索引*/mysql\u003ealtertabletaddindexcity(city);/*模糊查询*/mysql\u003eexplainselect*fromtwherecitylike'h%';+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+ |1|SIMPLE|t|NULL|range|city|city|50|NULL|5505|100.00|Usingindexcondition|+----+-------------+-------+------------+-------+---------------+------+---------+------+------+----------+-----------------------+ 1rowinset,1warning(0.00sec) 可以看出like 'h%'模糊查询使用了索引city来快速查找.但如果是针对like '%h%'之类的模糊查询列? /*使用前后都模糊匹配模式,查询所有字段*/mysql\u003eexplainselect*fromtwherecitylike'%h%';+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-------------+ |1|SIMPLE|t|NULL|ALL|NULL|NULL|NULL|NULL|43868|11.11|Usingwhere|+----+-------------+-------+------------+------+---------------+------+---------+------+-------+----------+-------------+ 1rowinset,1warning(0.00sec)/*使用前后都模糊匹配模式,查询id字段*/mysql\u003eexplainselectidfromtwherecitylike'%h%';+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+--------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+--------------------------+ |1|SIMPLE|t|NULL|index|NULL|city|50|NULL|43868|11.11|Usingwhere;Usingindex|+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+--------------------------+ 1rowinset,1warning(0.00sec) 针对查询select * from t where city like '%h%',使用的是全表扫描. 针对查询select id from t where city like '%h%',使用的索引city,看rows字段也是全索引扫描,由于只需要获取字段id,故优化器认为顺序扫描索引city比主键索引的代价小. 针对索引,一种是通过索引来快速查找,而另一种是通过索引来顺序遍历. **注意:**针对字段city和age的联合索引,也是适用于city like 'h%'的模糊查询. 
/*在字段city和age上新建联合索引*/mysql\u003ealtertabletaddindexcity_age(city,age);/*利用联合索引的最左字段city的最左M个字符*/mysql\u003eexplainselect*fromtwherecitylike'hang%';+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+ |1|SIMPLE|t|NULL|range|city_age|city_age|50|NULL|5505|100.00|Usingindexcondition|+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+--------","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:4:2","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"索引下推 针对前缀索引中的联合索引(city和age),适用于city的模糊查询,那如果查询条件再加上age会如何? /*在city字段上模糊匹配,在age上精确匹配*/mysql\u003eexplainselect*fromtwherecitylike'hang%'andage=10;+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+ |1|SIMPLE|t|NULL|range|city_age|city_age|54|NULL|5505|10.00|Usingindexcondition|+----+-------------+-------+------------+-------+---------------+----------+---------+------+------+----------+-----------------------+ 1rowinset,1warning(0.00sec) 可以看到仍然会使用索引city_age,Extra字段里的Using index condition表示使用了索引下推.通过字段city快速定位记录后,再直接利用索引中age字段的值来进行过滤. 这是MySQL5.6版本引入的索引下推优化(index condition pushdown),可以在索引查找过程中,对索引包含的字段先做判断,直接过滤不满足条件的记录,减少回表次数,提高性能. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:4:3","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"索引选择 主要是针对普通索引和唯一索引的选择. 在选择的前提是要能满足业务的需求,比如业务上需要保证唯一性,但应用层并不一定能保证,需要数据库层面来保证,就必须选择唯一索引. 又比如身份证号,在业务应用层面已经保证了唯一性,并不需要数据库层面来保证,就可以选择普通索引,当然也可以选择唯一索引. 在普通索引和唯一索引都能保证业务正确的前提下,如何评估该选择普通索引还是唯一索引列?下面主要从查询和更新两方面来进行评估. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:5:0","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"查询 仍然以表t为例,针对字段name,分别创建唯一索引和普通索引时,其查找过程是如何的?针对查询语句:select * from t where name='guanguan';. 对于普通索引而言,查找到第一条满足条件的记录后,需要继续查找下一条记录,直到碰到第一个不满足条件的记录. 对于唯一索引而言,由于索引定义了唯一性,查找到第一个满足条件的记录后,就会停止继续检索. 上述两个查找过程的性能差异,几乎微乎其微.InnoDB读写都是基于数据页的,当要去磁盘查找某一记录时,是会把该记录所在的数据页整体读入内存的.对于普通索引的继续查找只是一次指针寻找和一次判断,对于现代CPU来说,影响微乎其微. 可见对于查询来说,普通索引和唯一索引之间的差异几乎微乎其微. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:5:1","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"更新 在比较更新的差异之前,先引入概念change buffer,这也是InnoDB对更新操作所作出的优化. 当需要更新一个数据页时,如果该数据页在内存中,就直接更新内存的数据.但如果该数据页不在内存中,在不影响数据一致性的前提下,引擎会把这些更新操作缓存到change buffer中,这样就暂时不需要把磁盘数据读入到内存中,减少了磁盘随机读取,提升了更新性能. 在必要时,会把change buffer中的数据应用到原始数据页中,得到新的数据页,这个过程称为merge. 那change buffer中的数据什么时候应用到原始数据页中列? 当原始数据页被加载到Buffer pool时,会执行merge. 后台有线程定期会执行merge. 当MySQL正常关闭时,也会执行merge. change buffer优化的前提是要不影响数据一致性,而对于唯一索引,由于需要判断是否唯一,就必须先把磁盘数据加载到内存中来进行判断,由于数据页已经在内存中了,直接更新内存就行了,所以该优化对唯一索引是不起作用的.因此change buffer优化只是对普通索引有效. change buffer优化实际上是延迟了更新操作,当一个数据页上对应的更新操作在change buffer中越多,其收益是越大的.但如果针对先更新然后马上就查询的场景,这个优化可能会起到反作用.该场景下随机IO的次数不会少,反而增加了change buffer的维护代价. 
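如果想观察change buffer的实际使用情况,可以借助show engine innodb status(示意): 输出中INSERT BUFFER AND ADAPTIVE HASH INDEX一节会给出Ibuf的size、free list len、seg size以及merges次数等信息,可以用来判断change buffer是否真的在发挥作用.
/* 示意: 重点关注INSERT BUFFER AND ADAPTIVE HASH INDEX段落 */
mysql\u003e show engine innodb status\\G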
change buffer用的是buffer pool里的内存,其大小可以通过参数innodb_change_buffer_max_size来动态设置,表示change buffer的大小最多只能占用buffer pool的比例. 另外是否启用change buffer或对哪些操作启用,是通过参数innodb_change_buffering来控制的,该参数默认是all,有以下几种选择: all: 默认值.开启buffer inserts、delete-marking operations、purges nono: 不开启 inserts: 只对buffer insert操作(对insert和update有效)开启 deletes: 只对delete-marking操作开启 changes: 只对buffer insert和delete-marking操作开启 purges: 只对在后台执行的物理删除操作开启. mysql\u003eshowvariableslike'%innodb_change_buffer%';+-------------------------------+-------+ |Variable_name|Value|+-------------------------------+-------+ |innodb_change_buffer_max_size|25||innodb_change_buffering|all|+-------------------------------+-------+ 2rowsinset(0.00sec) change buffer是会持久化的,保存在系统表空间(ibdata1) 数据库空闲时,后台有线程定时持久化 当数据库缓冲池空间不够时 当数据库正常关闭时 redo log写满时 注意: change buffer也是通过B+树来存储的,键是表空间ID. change buffer的变更也是会记录redo日志的(既然记录到redo中了,为什么还要持久化到系统表空间?). change buffer节省的主要是随机读磁盘的IO消耗. 再来谈谈更新时,普通索引和唯一索引的差异: 当数据页在内存中时,唯一索引会校验下唯一性,通过后再更新内存;而普通索引直接更新内存即可. 当数据页不在内存中时,唯一索引会先通过磁盘随机读把数据加载到内存中,然后再校验唯一性,通过后再更新内存;而普通索引是直接把更新操作记录到change buffer中. 从更新来说,普通索引更具有优势. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:5:2","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"为什么会选错索引? ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:6:0","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"现象分析 当一个表有多个索引,查询究竟走哪个索引?优化器是通过什么因素来决定使用哪个索引的?先看个例子. /*创建表t1*/CREATETABLE`t1`(`id`int(11)NOTNULLAUTO_INCREMENT,`a`int(11)DEFAULTNULL,`b`int(11)DEFAULTNULL,PRIMARYKEY(`id`),KEY`a`(`a`),KEY`b`(`b`))ENGINE=InnoDB;/*创建存储过程插入数据*/delimiter;;createprocedureidata()begindeclareiint;declarejint;seti=1;while(i\u003c=10)dosetj=1;starttransaction;while(j\u003c=10000)doinsertintot1(a,b)values(j+(i-1)*10000,j+(i-1)*10000);setj=j+1;endwhile;commit;seti=i+1;endwhile;end;;delimiter;/*调用存储过程,插入数据到表中*/callidata();/*查询使用索引a,符合预期*/mysql\u003eexplainselect*fromt1whereabetween10000and20000;+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+ |1|SIMPLE|t1|NULL|range|a|a|5|NULL|10001|100.00|Usingindexcondition|+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+ 1rowinset,1warning(0.00sec) 上面例子中,查询使用索引a,符合预期,说明优化器选择了正确的索引. 再来看看另一种场景,事务隔离级别为REPEATABLE-READ. 
session A session B start transaction with consistent snapshot; delete from t1; call idata(); explain select * from t1 where a between 10000 and 20000; commit; session A: mysql\u003eexplainselect*fromt1whereabetween10000and20000;+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+ |1|SIMPLE|t1|NULL|range|a|a|5|NULL|10001|100.00|Usingindexcondition|+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+ 1rowinset,1warning(0.00sec)mysql\u003estarttransactionwithconsistentsnapshot;QueryOK,0rowsaffected(0.00sec) session B: mysql\u003edeletefromt1;QueryOK,100000rowsaffected(0.72sec)mysql\u003ecallidata();QueryOK,0rowsaffected(2.93sec)mysql\u003eexplainselect*fromt1whereabetween10000and20000;+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ |1|SIMPLE|t1|NULL|ALL|a|NULL|NULL|NULL|100015|37.11|Usingwhere|+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1rowinset,1warning(0.00sec) session B此时选择的是全表扫描,开启慢查询日志,再来看看查询的具体信息. mysql\u003esetlong_query_time=0;QueryOK,0rowsaffected(0.00sec)mysql\u003eselect*fromt1whereabetween10000and20000;mysql\u003eselect*fromt1forceindex(a)whereabetween10000and20000;/*慢查询日志,设置慢查询时间为0秒,即打印所有语句*/setlong_query_time=0;#Time:2020-06-16T06:28:51.528132Z#User@Host:web[web]@localhost[]Id:57#Query_time:0.038091Lock_time:0.000289Rows_sent:10001Rows_examined:100000SETtimestamp=1592288931;select*fromt1whereabetween10000and20000;#Time:2020-06-16T06:29:01.998170Z#User@Host:web[web]@localhost[]Id:57#Query_time:0.015590Lock_time:0.000345Rows_sent:10001Rows_examined:10001SETtimestamp=1592288941;select*fromt1forceindex(a)whereabetween10000and20000; 第一个查询语句的Rows_examined为100000行,走的是全表扫描,耗时38毫秒. 第二个查询语句的Rows_examined为10001行,走的是索引a,耗时15毫秒. 很显然,针对第一个查询语句,优化器选错了索引. 那么到底优化器选择索引的逻辑是什么列? 优化器选择索引的目的,是找到一个最优执行方案,用最小的代价去执行语句. 在数据库里面,扫描行数是影响执行代价的因素之一.扫描行数越少,意味这访问磁盘数据的次数越少,消耗CPU资源越少.但扫描行数并不是唯一的判断标准,优化器还会结合是否使用临时表、是否排序等因素进行综合判断. 那扫描行数是怎么判断的? 在执行语句之前,数据库并不能直到满足条件的记录到底有多少条?只能根据统计信息来估算.这个统计信息就是索引的区分度.一个索引上不同的值越多,这个索引的区分度就越好.一个索引上不同的值得个数,称之为基数(cardinality),基数越大,说明索引的区分度越好. /*查看索引基数*/mysql\u003eshowindexfromt1;+-------+------------+----------+--------------+-------------+---------","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:6:1","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"选择异常的处理 在语句中使用force index强制选择一个索引. 考虑修改语句,引导MySQL使用我们期望的索引,把上面例子中的order by b limit 1换成order by b, a limit 1试试,看执行计划是怎样的. 在某些场景下,可以新建一个更适合的索引来供优化器选择,或者删掉误用的索引. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:6:2","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"思考 第一个例子中是通过session A和session B的配合,来复现选错了索引的情况,如果单独执行session B的语句,会出现什么情况? 
mysql\u003edeletefromt1;QueryOK,100000rowsaffected(0.55sec)mysql\u003ecallidata();QueryOK,0rowsaffected(2.41sec)mysql\u003eexplainselect*fromt1whereabetween10000and20000;+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+ |1|SIMPLE|t1|NULL|range|a|a|5|NULL|10001|100.00|Usingindexcondition|+----+-------------+-------+------------+-------+---------------+------+---------+------+-------+----------+-----------------------+ 1rowinset,1warning(0.00sec) 索引选择正确,为什么会出现这种差异?session A是马上开启一个一致性读视图,并没有其它操作,是因为什么造成了统计信息的错误? 当session A开启了一致性视图之后,session B的删除是不能直接把数据删除的.这样每一行数据会存在两个版本(MVCC机制),索引a上的数据其实是有两份的.但对于使用主键索引时,rows是直接按照表的行数来估计的,而表的行数,优化器是直接用show table status中的Rows值. mysql\u003eshowtablestatuslike't1'\\G***************************1.row***************************Name:t1Engine:InnoDBVersion:10Row_format:DynamicRows:100256Avg_row_length:36Data_length:3686400Max_data_length:0Index_length:3178496Data_free:15728640Auto_increment:300001Create_time:2020-06-1614:24:46Update_time:2020-06-1615:10:39Check_time:NULLCollation:utf8_unicode_ciChecksum:NULLCreate_options:Comment:1rowinset(0.00sec) ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:6:3","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"如何给字符串字段加索引? 在字符串字段上加索引,是可以指定只取前几个字节的,如下: /*city全字段加索引*/mysql\u003ealtertabletaddindexindex1(name);QueryOK,0rowsaffected(0.48sec)Records:0Duplicates:0Warnings:0/*city字段前6个字节加索引*/mysql\u003ealtertabletaddindexindex2(name(5));QueryOK,0rowsaffected(0.48sec)Records:0Duplicates:0Warnings:0 针对索引index2,占用的空间更小,这是前缀索引的优势,但带来的损失是可能会增加额外的记录扫描次数. /*先插入数据*/mysql\u003einsertintot(city,name,age)values('hangzhou','zhangyi',20);mysql\u003einsertintot(city,name,age)values('hangzhou','zhanger',20);mysql\u003einsertintot(city,name,age)values('hangzhou','zhangsan',20);mysql\u003einsertintot(city,name,age)values('hangzhou','zhangsi',20);mysql\u003einsertintot(city,name,age)values('hangzhou','zhangwu',20);/*查询*/mysql\u003eselect*fromtwherename='zhangsi';+--------+----------+---------+-----+------+ |id|city|name|age|addr|+--------+----------+---------+-----+------+ |128004|hangzhou|zhangsi|20|NULL|+--------+----------+---------+-----+------+ 1rowinset(0.07sec) 查询如果使用的是索引index1 在索引树上能直接定位到索引值为zhangsi的记录,取得主键ID的值 然后回表查询整行记录 继续取下一个索引值,发现已不满足条件,循环结束. 查询如果使用的是索引index2 在索引树上先定位到zhang的记录,取主键ID的值 然后回表获取字段name的值发现不满足条件,丢弃 继续取下一个索引值,值仍为zhang,取出主键ID的值 重复步骤2,若满足条件就把行放入记录集中 直到索引值不为zhang,循环结束. 使用索引index2,总共会有5次回表,扫描了5次.使用前缀索引,导致查询语句读数据的次数变多了.如果索引index2设置为name(6)列?此时只需要扫描2次了. 使用前缀索引,定义好长度,就可以做到既节省空间,又不用额外增加太多的查询成本. 建立索引时要关注索引的区分度,区分度越高越好.可以通过统计索引上有多少个不同的值来判断要使用多长的前缀.从如下sql可以看出,可以选用name(4)来作为索引. /*统计各个长度的前缀数量*/mysql\u003eselectcount(distinctname)asL,count(distinctleft(name,4))asL4,count(distinctleft(name,5))asL5,count(distinctleft(name,6))asL6,count(distinctleft(name,7))asL7fromt;+-----+-----+-----+-----+-----+ |L|L4|L5|L6|L7|+-----+-----+-----+-----+-----+ |149|145|145|148|149|+-----+-----+-----+-----+-----+ 1rowinset(0.52sec) 当使用前缀索引时,会导致覆盖索引无效(无论前缀使用多少位). 以select id, name from t where name = 'zhangsi'为例,如果使用索引index2(就算使用name(16)为索引),获取主键ID的值后还必须回表,然后判断name的值是否满足条件,这就导致覆盖索引无效了. 
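可以用explain直观地验证前缀索引对覆盖索引的影响(示意,预期使用index2时Extra中不再出现Using index,而使用全字段索引index1时会出现):
/* 示意: 前缀索引必须回表确认name的完整值 */
mysql\u003e explain select id, name from t force index(index2) where name = 'zhangsi';
/* 示意: 全字段索引可以直接覆盖查询字段 */
mysql\u003e explain select id, name from t force index(index1) where name = 'zhangsi';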
","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:7:0","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"索引失效 ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:8:0","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"条件字段函数操作 如果在where条件查询的字段上使用函数为如何? /*创建交易流水表*/CREATETABLE`tradelog`(`id`int(11)NOTNULL,`tradeid`varchar(32)DEFAULTNULL,`operator`int(11)DEFAULTNULL,`t_modified`datetimeDEFAULTNULL,PRIMARYKEY(`id`),KEY`tradeid`(`tradeid`),KEY`t_modified`(`t_modified`))ENGINE=InnoDBDEFAULTCHARSET=utf8mb4/*整个表模拟了10万行数据*/mysql\u003eselectcount(*)fromtradelog;+----------+ |count(*)|+----------+ |100000|+----------+ 1rowinset(0.02sec)/*在字段t_modified使用month函数*/mysql\u003eexplainselectcount(*)fromtradelogwheremonth(t_modified)=7;+----+-------------+----------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+----------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+ |1|SIMPLE|tradelog|NULL|index|NULL|t_modified|6|NULL|100194|100.00|Usingwhere;Usingindex|+----+-------------+----------+------------+-------+---------------+------------+---------+------+--------+----------+--------------------------+ 1rowinset,1warning(0.00sec) 查询语句是统计所有7月份的流水,从Extra可以看出是使用了索引t_modified,从rows看是索引全扫描.使用函数month之后,获取到的值并不是有序的,所以无法利用索引的有序性来快速查找,只能是全索引扫描. 为了能利用索引的快速定位能力,就需要把上面的sql改造成按照字段本身的范围查询 /*按照范围来查询,利用索引特性快速定位*/mysql\u003eexplainselectcount(*)fromtradelogwhere-\u003e(t_modified\u003e='2016-07-01'andt_modified\u003c'2016-08-01')or-\u003e(t_modified\u003e='2017-07-01'andt_modified\u003c'2017-08-01')or-\u003e(t_modified\u003e='2018-07-01'andt_modified\u003c'2018-08-01');+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+ |1|SIMPLE|tradelog|NULL|range|t_modified|t_modified|6|NULL|5605|100.00|Usingwhere;Usingindex|+----+-------------+----------+------------+-------+---------------+------------+---------+------+------+----------+--------------------------+ 1rowinset,1warning(0.00sec) 再来看看一个非常简单的加法操作,也是全表扫描,无法利用主键索引. mysql\u003eexplainselect*fromtradelogwhereid+1=1000;+----+-------------+----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ |1|SIMPLE|tradelog|NULL|ALL|NULL|NULL|NULL|NULL|100194|100.00|Usingwhere|+----+-------------+----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1rowinset,1warning(0.00sec) 在索引字段上使用函数会导致无法利用索引的快速定位能力,不管是什么函数,都会导致优化器认为无法使用索引快速定位. 
","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:8:1","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"隐式类型转换 mysql\u003eexplainselect*fromtradelogwheretradeid=6981747220;+----+-------------+----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ |1|SIMPLE|tradelog|NULL|ALL|tradeid|NULL|NULL|NULL|100194|10.00|Usingwhere|+----+-------------+----------+------------+------+---------------+------+---------+------+--------+----------+-------------+ 1rowinset,3warnings(0.01sec) 字段tradeid上是有索引的,而上面的语句直接走的主键全表扫描,为什么没有使用tradeid索引? 在表中字段tradeid的定义是varchar(32),为字符串类型.而where里等号右边是个整数,当两边类型不一致时,MySQL是如何处理的? 当字符串与数字进行比较时,是把字符串转化为数字还是把数字转换为字符串? mysql\u003eselect\"10\"\u003e9;+----------+ |\"10\"\u003e9|+----------+ |1|+----------+ 1rowinset(0.00sec) 上面select \"10\" \u003e 9返回1,说明是把字符串转换为数字了. 实际上语句select * from tradelog where tradeid = 6981747220相当于被转化为了select * from tradelog where CAST(tradeid AS signed int) = 6981747220,这条语句就触发了上面说的:对索引字段做函数操作,优化器放弃走树搜索功能. ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:8:2","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["MySQL"],"content":"隐式字符编码转换 /*新建交易明细表*/CREATETABLE`trade_detail`(`id`int(11)NOTNULL,`tradeid`varchar(32)DEFAULTNULL,`trade_step`int(11)DEFAULTNULL,`step_info`varchar(32)DEFAULTNULL,PRIMARYKEY(`id`),KEY`tradeid`(`tradeid`))ENGINE=InnoDBDEFAULTCHARSET=utf8/*关联查询*/mysql\u003eexplainselect*fromtradelogl,trade_detaildwhered.tradeid=l.tradeidandl.id=100002;+----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+ |1|SIMPLE|l|NULL|const|PRIMARY,tradeid|PRIMARY|4|const|1|100.00|NULL||1|SIMPLE|d|NULL|ALL|NULL|NULL|NULL|NULL|11|100.00|Usingwhere|+----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+ 2rowsinset,1warning(0.00sec) 驱动表为表tradelog,被驱动表为表trade_detail.查询对表tradelog是走的主键索引,但表trade_detail却是全表扫描,但在其字段tradeid是存在索引的,为何? 对比发现,表tradelog的字符集是utf8mb4,而表trade_detail的字符集是utf8,字符集不一样时查询时如何处理的?字符集utf8mb4是utf8的超集,MySQL会把utf8字符串转换为utf8mb4字符集,然后再做比较.查询语句会转化为如下: mysql\u003eexplainselect*fromtradelogl,trade_detaildwhereconvert(d.tradeidUSINGutf8mb4)=l.tradeidandl.id=100002;+----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+ |1|SIMPLE|l|NULL|const|PRIMARY,tradeid|PRIMARY|4|const|1|100.00|NULL||1|SIMPLE|d|NULL|ALL|NULL|NULL|NULL|NULL|11|100.00|Usingwhere|+----+-------------+-------+------------+-------+-----------------+---------+---------+-------+------+----------+-------------+ 2rowsinset,1warning(0.01sec) 转化后的语句对表trade_detail的索引上的字段做了函数操作,此时优化器是会放弃树搜索功能的,就导致做了全表扫描. 
再来看看如下语句: mysql\u003eexplainselectl.operatorfromtradelogl,trade_detaildwhered.tradeid=l.tradeidandd.id=4;+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ |1|SIMPLE|d|NULL|const|PRIMARY|PRIMARY|4|const|1|100.00|NULL||1|SIMPLE|l|NULL|ref|tradeid|tradeid|131|const|1|100.00|NULL|+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ 2rowsinset,1warning(0.00sec) 驱动表为表trade_detail,被驱动表为表tradelog.这个语句里两个表都有用到索引,但驱动表用的是主键索引,而被驱动表用的是索引tradeid.语句在转化时是针对驱动表的tradeid字段,所以被驱动表可以用上索引tradeid. 针对字符集不一样的情况下的优化: 修改表结构,把字符集设置成一样. 修改sql语句,可以主动把驱动表的字符集修改为被驱动表的字符集,使得可以使用被驱动表的索引. mysql\u003eexplainselect*fromtradelogl,trade_detaildwhered.tradeid=convert(l.tradeidusingutf8)andl.id=100002;+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ |id|select_type|table|partitions|type|possible_keys|key|key_len|ref|rows|filtered|Extra|+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ |1|SIMPLE|l|NULL|const|PRIMARY|PRIMARY|4|const|1|100.00|NULL||1|SIMPLE|d|NULL|ref|tradeid|tradeid|99|const|4|100.00|NULL|+----+-------------+-------+------------+-------+---------------+---------+---------+-------+------+----------+-------+ 2rowsinset,1warning(0.01sec) ","date":"2020-11-24","objectID":"/2020/11/24/mysql-index/:8:3","tags":["mysql"],"title":"MySQL索引(InnoDB引擎)","uri":"/2020/11/24/mysql-index/"},{"categories":["Golang"],"content":"主要包含基本的内建类型(布尔类型、数值类型和字符串类型)和复合类型(array、slice、map、channel、function、struct、interface) ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:0:0","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"基本数据类型 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:1:0","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"布尔类型 类型标记为bool,值为true/false,零值为false,值类型,可定义为常量 // 变量定义的几种方式 var bflag bool bflag = true var bflag1 bool = true var bflag2 = true // 短变量声明,只能用于函数内部 bflag3 := true ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:1:1","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"整数类型 类型标记为int/uint、int8/uint8、int16/uint16、int32/uint32、int64/uint64、byte、rune,零值为0,值类型,可定义为常量 标记符 说明 int/uint 有符号/无符号整数,依赖于CPU平台机器字大小,32或64bit int8/uint8 有符号/无符号整数,8bit int16/uint16 有符号/无符号整数,16bit int32/uint32 有符号/无符号整数,32bit int64/uint64 有符号/无符号整数,64bit byte 等价于uint8,一般用于强调数值是一个原始的数据而不是一个小的整数 rune 等价于int32,表示一个Unicode码点 其中有符号整数采用2的补码形式表示,也就是最高bit位用来表示符号位,一个n-bit的有符号数的值域是从$-2^{n-1}$到$2^{n-1}-1$。无符号整数的所有bit位都用于表示非负数,值域是0到$2^n-1$。例如,int8类型整数的值域是从-128到127,而uint8类型整数的值域是从0到255 rune专门用来存储Unicode编码的单个字符,有5种表示方式: 该rune字面量所对应的字符,比如’a'、'-',这个字符必须是Unicode编码规范所支持的 使用“\\x”为前导后跟2位十六进制数,表示宽度为1字节 使用“\\”为前导后跟3位八进制数,表示的范围与上一个表示法相同 使用“\\u”为前导后跟4位十六进制数,表示宽度为2字节的值 使用“\\U”为前导后跟8位十六进制数,表示宽度为4字节的值 还支持一类特殊的字符序列—-转义符 Go语言中关于算术运算、逻辑运算和比较运算的二元运算符,它们按照优先级递减的顺序排列: * / % \u003c\u003c \u003e\u003e \u0026 \u0026^ + - | ^ == != \u003c \u003c= \u003e \u003e= \u0026\u0026 || 整数的bit位操作符 \u0026 位运算 AND | 位运算 OR ^ 位运算 XOR \u0026^ 位清空 (AND NOT) \u003c\u003c 左移 \u003e\u003e 右移 注意: ++/- 
-只能后置,且是语句不是表达式,不能进行赋值,即i++是合法,++i和j=i++都是非法的 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:1:2","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"浮点数类型 类型标记为float32/float64,零值为0,值类型,可定义为常量 浮点数的范围极限值可以在math包找到。常量math.MaxFloat32表示float32能表示的最大数值,大约是 3.4e38;对应的math.MaxFloat64常量大约是1.8e308。它们分别能表示的最小值近似为1.4e-45和4.9e-324。 一个float32类型的浮点数可以提供大约6个十进制数的精度,而float64则可以提供约15个十进制数的精度;通常应该优先使用float64类型,因为float32类型的累计计算误差很容易扩散,并且float32能精确表示的正整数并不是很大(注意:因为float32的有效bit位只有23个,其它的bit位用于指数和符号;当整数大于23bit能表达的范围时,float32的表示将出现误差)。 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:1:3","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"复数类型 类型标记为complex64/complex128,值类型,可定义为常量 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:1:4","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"字符串类型 类型标记为string,零值为\"\",值类型,可定义为常量 一个字符串是一个不可改变的字节序列 str := \"hello\" str[0] = 'x' // 非法,字符串是只读的 字符串在Go语言内存模型中用一个2字长的数据结构表示。它包含一个指向字符串存储数据的指针和一个长度数据。因为string类型是不可变的,对于多字符串共享同一个存储数据是安全的。切分操作会得到一个新的2字长结构字符串,但是指向同一个字节序列,切分时不涉及内存分配或复制操作。 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:1:5","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"常量特别说明 常量只能是布尔类型、整数类型、浮点数类型、复数类型、字符串 常量生成器itoa,常量声明可以使用iota常量生成器初始化,它用于生成一组以相似规则初始化的常量,但是不用每行都写一遍初始化表达式 无类型常量: Go语言的常量有个不同寻常之处。虽然一个常量可以有任意有一个确定的基础类型,例如int或float64,或者是类似time.Duration这样命名的基础类型,但是许多常量并没有一个明确的基础类型。编译器为这些没有明确的基础类型的数字常量提供比基础类型更高精度的算术运算;你可以认为至少有256bit的运算精度。这里有六种未明确类型的常量类型,分别是无类型的布尔型、无类型的整数、无类型的字符、无类型的浮点数、无类型的复数、无类型的字符串。 通过延迟明确常量的具体类型,无类型的常量不仅可以提供更高的运算精度,而且可以直接用于更多的表达式而不需要显式的类型转换。 // 不需要类型转换 var x float32 = math.Pi var y float64 = math.Pi var z complex128 = math.Pi // 需要类型转换,Pi64定义了具体类型 const Pi64 float64 = math.Pi var x float32 = float32(Pi64) var y float64 = Pi64 var z complex128 = complex128(Pi64) ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:1:6","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"复合数据类型 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:2:0","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"数组(array) 固定长度的特定类型元素组成的序列,值类型 // 定义 var a [3]int b := [...]int{1,2,3} c := [3]int{1,2,3} // 含有100个元素的数组,最后一个元素被初始化为-1,其余的为0 d := [...]int{99: -1} // 支持切片操作 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:2:1","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"切片(slice) 变长的序列,序列中每个元素都有相同的类型,一个slice类型一般写作[]T,其中T代表slice中元素的类型。 零值为nil,引用类型 元素的底层存储结构为数组,一个slice由三个部分构成:指针、长度和容量。指针指向第一个slice元素对应的底层数组元素的地址,要注意的是slice的第一个元素并不一定就是数组的第一个元素。长度指目前slice中已有元素的数目;长度不能超过容量,容量指目前slice最多能存放的元素个数。内置的len和cap函数分别返回slice的长度和容量。 和数组不同的是,slice之间不能比较,因此我们不能使用==操作符来判断两个slice是否含有全部相等元素。不过标准库提供了高度优化的bytes.Equal函数来判断两个字节型slice是否相等([]byte),但是对于其他类型的slice,我们必须自己展开每个元素进行比较。 一个零值的slice等于nil。一个nil值的slice并没有底层数组。如果你需要测试一个slice是否是空的,使用len(s) == 0来判断,而不应该用s == nil来判断 var s []int // len(s) == 0, s == nil s = nil // len(s) == 0, s == nil s = []int(nil) // len(s) == 0, s == nil s = []int{} // len(s) == 0, s != nil 内置的make函数创建一个指定元素类型、长度和容量的slice。容量部分可以省略,在这种情况下,容量将等于长度。 make([]T, len) make([]T, len, cap) 当调用内置的append函数向slice追加元素时,如果元素数量超过容量,会引发扩容操作,此时slice的指针所指向的数组会发生变更 package main import ( \"fmt\" ) func 
main() { // intslice为nil var intslice []int64 // intslice的长度和容量为4 intslice = append(intslice, 11, 22, 33, 44) fmt.Println(len(intslice), cap(intslice)) // output:4 4 // intslice会扩充,长度和容量变为8 intslice = append(intslice, 55, 66, 77, 88) fmt.Println(len(intslice), cap(intslice)) // output:8 8 // 通过切片操作赋值给is1,此时is1和intslice底层指向同一个数组 is1 := intslice[1:3:5] fmt.Println(len(is1), cap(is1)) // output: 2 4 fmt.Println(is1) // output: [22 33] // is1追加一个元素,长度未超过容量,不会引起扩容,此时修改is1中的元素会影响intslice is1 = append(is1, 99) fmt.Println(len(is1), cap(is1)) // output: 3 4 fmt.Println(is1) // output: [22 33 99] // intslice[3]也被修改为99了 fmt.Println(intslice) // output: [11 22 33 99 55 66 77 88] // 继续追加元素,超过了容量,引起扩容,is1和intslice此时底层指向不同的数组,对is1的操作不会影响intslice is1 = append(is1, 990, 991, 992) fmt.Println(len(is1), cap(is1)) // output: 6 8 fmt.Println(is1) // output: [22 33 99 990 991 992] // intslice并未被修改 fmt.Println(intslice) // output: [11 22 33 99 55 66 77 88] } 遍历切片时可以直接采用下标也可以采用for-range的方式 package main import ( \"fmt\" \"sync\" ) func main() { ss := []int32{1, 2, 3, 4, 5} // 针对切片的第一种遍历方式,直接采用下标访问 ssLen := len(ss) for i := 0; i \u003c ssLen; i++ { fmt.Println(ss[i]) } fmt.Println(\"------\") // 针对切片的第二种遍历方式,使用for-range,也是采用下标访问 for i := range ss { fmt.Println(ss[i]) } fmt.Println(\"------\") var wg sync.WaitGroup wg.Add(len(ss)) // 针对切片的第三种遍历方式,也是使用for-range,但采用索引+值的方式,下文中索引使用了_(忽略该参数) // 需要注意value是个局部变量,生命周期归属这个for循环 // 迭代时只会改变value所对应的值,value本身只会被声明一次,即在整个for循环内其地址是不会改变的 for _, value := range ss { /* 注意和下面的go func方式进行比较 此种方式在https://goplay.space/上一直输出的是5(即切片中的最后一个元素) 但在本地windows下用vscode输出的结果是变化的,有时全部输出5,有时输出4和5... 此种方式存在竞态 使用go tool vet 可以进行检测:loop variable value captured by func literal */ go func() { // 此种方式在vscode上会直接显示告警 loop variable value captured by func literal fmt.Println(value) wg.Done() }() } wg.Wait() fmt.Println(\"------\") wg.Add(len(ss)) for _, value := range ss { // 此种方式会把切片中的元素打印一遍 go func(i int32) { fmt.Println(i) wg.Done() }(value) } wg.Wait() fmt.Println(\"done\") } ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:2:2","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"字典(map) map在底层是用哈希表实现的,哈希表是一种巧妙并且实用的数据结构。它是一个无序的key/value对的集合,其中所有的key都是不同的,然后通过给定的key可以在常数时间复杂度内检索、更新或删除对应的value。 零值为nil,引用类型 一个map就是一个哈希表的引用,map类型可以写为map[K]V,其中K和V分别对应key和value。map中所有的key都有相同的类型,所有的value也有着相同的类型,但是key和value之间可以是不同的数据类型。其中K对应的key必须是支持==比较运算符的数据类型,所以map可以通过测试key是否相等来判断是否已经存在。虽然浮点数类型也是支持相等运算符比较的,但是将浮点数用做key类型则是一个坏的想法。 可以通过内置函数make或字面值创建map // 通过内置make来创建 ages := make(map[string]int) // 通过字面值创建,并初始化了2个元素 ages := map[string]int{ \"alice\": 31, \"charlie\": 34, } // 通过key的下标进行访问对应的value ages[\"alice\"] = 32 fmt.Println(ages[\"alice\"]) // \"32\" // 通过key来访问value时,若key不存在,也不会报错,而是会返回value对应的零值 fmt.Println(ages[\"bob\"]) // \"0\",bob并不存在于ages中,返回value的零值(0) age := ages[\"bob\"] // age=0 // 若key存在则ok为true;若key不存在则ok为false。可通过这种方式来判断key是不是存在 // 判断一个元素是否存在必须采用此种方式,不能通过比较零值 age, ok := ages[\"bob\"] // 通过内置delete函数来删除元素 delete(ages, \"alice\") // 空的map ages := map[string]int{} **注意:**向一个nil值的map存入元素将导致一个panic异常 map中的元素并不是一个变量,因此我们不能对map的元素进行取址操作。禁止对map元素取址的原因是map可能随着元素数量的增长而重新分配更大的内存空间,从而可能导致之前的地址无效。 _ = \u0026ages[\"bob\"] // compile error: cannot take address of map element 遍历map中全部的key/value对,可以使用range风格的for循环实现,和之前的slice遍历语法类似 // name和age对应map中的key、value,迭代顺序是不确定的 for name, age := range ages { fmt.Printf(\"%s\\t%d\\n\", name, age) } map的迭代顺序是不确定的,并且不同的哈希函数实现可能导致不同的遍历顺序。在实践中,遍历的顺序是随机的,每一次遍历的顺序都不相同。这是故意的,每次都使用随机的遍历顺序可以强制要求程序不会依赖具体的哈希函数实现。 
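A minimal illustrative sketch of the point above (the map contents and names here are made up, not taken from the post): a plain range over a map may print its keys in a different order on every run, so when stable output is needed the usual pattern is to collect the keys, sort them, and then index the map by the sorted keys.

package main

import (
	"fmt"
	"sort"
)

func main() {
	ages := map[string]int{"alice": 31, "bob": 34, "carol": 27}

	// Iteration order over a map is intentionally randomized; this loop may
	// print the entries in a different order each time the program runs.
	for name, age := range ages {
		fmt.Println(name, age)
	}

	// For deterministic output, sort the keys first and index the map by them.
	names := make([]string, 0, len(ages))
	for name := range ages {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Println(name, ages[name])
	}
}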
hash结构中直接使用的Bucket数组,而不是Bucket*指针的数据,是一段连续的内存空间。 每个bucket中存放最多8个key/value对, 如果多于8个,那么会申请一个新的bucket,并将它与之前的bucket链起来(称为溢出链overflow),溢出链的Bucket的空间是使用mallocgc分配的。 hash结构采用的可扩展哈希的算法。由hash值mod当前hash表大小决定某一个值属于哪个桶,而hash表大小是2的指数(2^B)。每次扩容,会增大到上次大小的两倍。结构体中有一个buckets和一个oldbuckets是用来实现增量扩容的。正常情况下直接使用buckets,而oldbuckets为空。如果当前哈希表正在扩容中,则oldbuckets不为空,并且buckets大小是oldbuckets大小的两倍。 按key的类型采用相应的hash算法得到key的hash值。将hash值的低位当作hmap结构体中buckets数组的index,找到key所在的bucket。将hash的高8位存储在了bucket的tophash中。**注意,这里高8位不是用来当作key/value在bucket内部的offset的,而是作为一个主键,在查找时对tophash数组的每一项进行顺序匹配的。**先比较hash值高位与bucket的tophash[i]是否相等,如果相等则再比较bucket的第i个的key与所给的key是否相等。如果相等,则返回其对应的value,反之,在overflow buckets中按照上述方法继续寻找。 **注意:**Bucket中key/value的放置顺序,是将keys放在一起,values放在一起,为什么不将key和对应的value放在一起呢?如果那么做,存储结构将变成key1/value1/key2/value2… 设想如果是这样的一个map[int64]int8,考虑到字节对齐,会浪费很多存储空间。不得不说通过上述的一个小细节,可以看出Go在设计上的深思熟虑。 Go语言使用的是增量扩容。假设扩容之前容量为X,扩容之后容量为Y,对于某个哈希值hash,一般情况下(hash mod X)不等于(hash mod Y),所以扩容之后要重新计算每一项在哈希表中的新位置。当hash表扩容之后,需要将那些旧的pair重新哈希到新的table上(源代码中称之为evacuate), 这个工作并没有在扩容之后一次性完成,而是逐步的完成(在insert和remove时每次搬移1-2个pair),主要是为了缩短map容器的响应时间,避免扩容时阻塞(本质上还是将总的扩容时间分摊到了每一次哈希操作上面)。 **注意:**如果key或value小于128字节,则它们的值是直接使用的bucket作为存储的。否则bucket中存储的是指向实际key/value数据的指针, ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:2:3","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"通道(channel) 类型标记为chan,它在栈上只是一个指针,实际的数据都是由指针所指向的堆上面。零值为nil,引用类型 一个channel是一个通信机制,它可以让一个goroutine通过它给另一个goroutine发送值信息。每个channel都有一个特殊的类型,也就是channels可发送数据的类型。一个可以发送int类型数据的channel一般写为chan int。 // 通过内置make函数创建chan ch := make(chan int) // 先声明,再创建 var ch chan int ch = make(chan int) // 通过内置close函数关闭chan,此操作不是必须的,当没有被引用时,GC会回收 close(ch) channel创建时默认时双向的,但Go语言也提供了单向的channel,分别表示用于只发送或只接收的channel。类型chan\u003c- int表示一个只发送int的channel,只能发送不能接收。相反,类型**\u003c-chan int**表示一个只接收int的channel,只能接收不能发送。(箭头\u003c-和关键字chan的相对位置表明了channel的方向)这种限制将在编译期检测。 Go语言提供了无缓冲的channels和带缓冲的channels,在使用make创建时看是否提供了第二个参数。 // 无缓冲channel ch := make(chan int) // 带缓冲channel,容量为100 ch := make(chan int, 100) // 读channel时可忽略读取到的值 \u003c-ch 读或写一个nil的channel的操作会永远阻塞 读一个已关闭的channel会立刻返回一个channel元素类型的零值 写一个已关闭的channel会导致panic 无缓冲channel是同步的,若发送者和接受者不是同时存在,则读或写将被阻塞 有缓冲channel是异步的,当容量为满时写将被阻塞,当容量为空时读将被阻塞 select-case 多路复用 package main import ( \"fmt\" \"os\" \"time\" ) func main() { abort := make(chan struct{}) go func() { os.Stdin.Read(make([]byte, 1)) abort \u003c- struct{}{} }() tick := time.NewTicker(1 * time.Second) fmt.Println(\"Commencing countdown. 
Please return to abort.\") for countdown := 10; countdown \u003e 0; countdown-- { fmt.Println(countdown) /* case后必须是channel变量的操作 若不存在default分支且所有channel都未触发时,select将阻塞 若存在default分支且所有channel都未触发时,会立即执行default分支 若只有1个channel触发时,会立即执行对应的case分支代码 若同时多个channel触发时,会随机选择执行其中一个对应的case分支代码 */ select { case \u003c-abort: fmt.Println(\"Lauch aborted!\") return case \u003c-tick.C: } } tick.Stop() close(abort) fmt.Println(\"Lauching.\") } // 读取channel时可返回2个参数,ok表示是否读取成功,当fileSize被关闭时会立即返回false fileSize := make(chan int64) case size, ok := \u003c-fileSize for-range 迭代,可循环获取channel上的数据,若channel上没有数据会被阻塞。当channel被close后for-range结束循环迭代 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:2:4","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"函数(function) 类型标记为func,函数声明包括函数名、形式参数列表、返回值列表(可省略)以及函数体。零值为nil,引用类型 func name(parameter-list) (result-list) { body } // 当涉及多参数返回时,需要注意返回值类型是必须的,返回值命名是可选的(所有返回值要么都有命名,要么都没有) // 命名的返回值可以在函数内部作为变量来使用 func name(parameter-list) (bool, error) { body } func name(parameter-list) (result bool, err error) { body } // 以下2种声明是错误的 func name(parameter-list) (result bool, error) { body } func name(parameter-list) (bool, err error) { body } 函数的类型被称为函数的标识符。如果两个函数形式参数列表和返回值列表中的变量类型一一对应,那么这两个函数被认为有相同的类型和标识符。形参和返回值的变量名不影响函数标识符也不影响它们是否可以以省略参数类型的形式表示。 拥有函数名的函数只能在包级语法块中被声明,通过函数字面量(function literal),我们可绕过这一限制,在任何表达式中表示一个函数值。函数字面量的语法和函数声明相似,区别在于func关键字后没有函数名。函数值字面量是一种表达式,它的值被成为匿名函数(anonymous function)。 // squares返回一个匿名函数。 // 该匿名函数每次被调用时都会返回下一个数的平方。 func squares() func() int { var x int return func() int { x++ return x * x } } func main() { f := squares() fmt.Println(f()) // \"1\" fmt.Println(f()) // \"4\" fmt.Println(f()) // \"9\" fmt.Println(f()) // \"16\" } **注意:**Go语言没有默认参数值,也没有任何方法可以通过参数名指定形参,因此形参和返回值的变量名对于函数调用者而言没有意义 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:2:5","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"方法 在函数声明时,在其名字之前放上一个变量,即是一个方法。这个附加的参数会将该函数附加到这种类型上,即相当于为这种类型定义了一个独占的方法。 import \"math\" // 结构体 type Point struct{ X, Y float64 } // 函数 func Distance(p, q Point) float64 { return math.Hypot(q.X-p.X, q.Y-p.Y) } // Point结构的一个方法, func (p Point) Distance(q Point) float64 { return math.Hypot(q.X-p.X, q.Y-p.Y) } 上面的代码里那个附加的参数p,叫做方法的接收器(receiver),早期的面向对象语言留下的遗产将调用一个方法称为“向一个对象发送消息”。可以任意的选择接收器的名字。 在方法调用过程中,接收器参数一般会在方法名之前出现。 // 定义Point类型的变量 p := Point{1, 2} q := Point{3, 4} // 通过变量调用方法 r := p.Distance(q) 也可以采用指针来声明方法,如下: // 方法的接收器类型是*Point func (p *Point) ScaleBy(factor float64) { p.X *= factor p.Y *= factor } 不管你的method的receiver是指针类型还是非指针类型,都是可以通过指针/非指针类型进行调用的,编译器会帮你做类型转换。 在声明一个method的receiver该是指针还是非指针类型时,你需要考虑两方面的因素,第一方面是这个对象本身是不是特别大,如果声明为非指针变量时,调用会产生一次拷贝;第二方面是如果你用指针类型作为receiver,那么你一定要注意,这种指针类型指向的始终是一块内存地址,就算你对其进行了拷贝。 不能通过一个无法取到地址的接收器来调用指针方法,比如临时变量的内存地址就无法获取得到。Point{1, 2}.ScaleBy(2)是非法的,编译会报错 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:2:6","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"结构体(struct) 类型标记为struct,零值为结构中各成员变量所对应的零值,值类型 结构体是一种聚合的数据类型,是由零个或多个任意类型的值聚合成的实体。每个值称为结构体的成员。 // 定义了结构体Employee type Employee struct { ID int Name string Address string DoB time.Time Position string Salary int ManagerID int } // 声明变量,类型为Employee var dilbert Employee dilbert结构体变量的成员可以通过点操作符访问,比如dilbert.Name和dilbert.DoB。因为dilbert是一个变量,它所有的成员也同样是变量。 // 直接对每个成员赋值 dilbert.Salary -= 5000 // 对成员取地址,然后通过指针访问 position := \u0026dilbert.Position *position = \"Senior \" + *position // 
点操作符也可以和指向结构体的指针一起工作: var employeeOfTheMonth *Employee = \u0026dilbert employeeOfTheMonth.Position += \" (proactive team player)\" **注意:**结构体成员名字是以大写字母开头的,那么该成员就是导出的;这是Go语言导出规则决定的。一个结构体可能同时包含导出和未导出的成员。 结构体值也可以用结构体字面值表示,结构体字面值可以指定每个成员的值 // 严格按照结构体定义的成员顺序 type Point struct{ X, Y int } p := Point{1, 2} // 以成员名字和相应的值来初始化,可以包含部分或全部的成员,成员出现的顺序不重要 p := Point{X: 1, Y: 2} // 初始化全部成员 anim := gif.GIF{LoopCount: nframes} // 初始化其中一个成员,其它成员默认为对应的零值 // 以上2种方式不能混用 如果结构体的全部成员都是可以比较的,那么结构体也是可以比较的,那样的话两个结构体将可以使用= =或!=运算符进行比较。相等比较运算符==将比较两个结构体的每个成员 Go语言提供的不同寻常的结构体嵌入机制,让一个命名的结构体包含另一个结构体类型的匿名成员,这样就可以通过简单的点运算符x.f来访问匿名成员链中嵌套的x.d.e.f成员。 type Point struct { X, Y int } type Circle struct { Point // 匿名嵌套 Radius int } type Wheel struct { Circle // 匿名嵌套 Spokes int } // 基于匿名嵌入的特性,可以直接访问叶子属性而不需要给出完整的路径 var w Wheel w.X = 8 // 等价于w.Circle.Point.X = 8 w.Y = 8 // 等价于w.Circle.Point.Y = 8 w.Radius = 5 // 等价于w.Circle.Radius = 5 w.Spokes = 20 // 结构体字面值并没有简短表示匿名成员的语法, 因此下面的语句都不能编译通过 w = Wheel{8, 8, 5, 20} // compile error: unknown fields w = Wheel{X: 8, Y: 8, Radius: 5, Spokes: 20} // compile error: unknown fields // 正确的做法 w = Wheel{ Circle: Circle{ Point: Point{X: 8, Y: 8}, Radius: 5, }, Spokes: 20, // NOTE: 逗号是必须要的 } 因为匿名成员也有一个隐式的名字,因此不能同时包含两个类型相同的匿名成员,这会导致名字冲突。同时,因为成员的名字是由其类型隐式地决定的,所有匿名成员也有可见性的规则约束。在上面的例子中,Point和Circle匿名成员都是导出的。即使它们不导出(比如改成小写字母开头的point和circle),我们依然可以用简短形式访问匿名成员嵌套的成员。 但是在包外部,因为circle和point没有导出不能访问它们的成员,因此简短的匿名成员访问语法也是禁止的。 目前描述的匿名成员特性只是对访问嵌套成员的点运算符提供了简短的语法糖。匿名成员并不要求是结构体类型,其实任何命名的类型都可以作为结构体的匿名成员。 简短的点运算符语法可以用于选择匿名成员嵌套的成员,也可以用于访问它们的方法。实际上,外层的结构体不仅仅是获得了匿名成员类型的所有成员,而且也获得了该类型导出的全部的方法。这个机制可以用于将一个有简单行为的对象组合成有复杂行为的对象。组合是Go语言中面向对象编程的核心 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:2:7","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["Golang"],"content":"接口(interface) 接口类型是一种抽象的类型,类型标记为interface,零值为nil,引用类型 接口类型具体描述了一系列方法的集合,一个实现了这些方法的具体类型是这个接口类型的实例。 **依赖于接口而不是实现,优先使用组合而不是继承,这是程序抽象的基本原则。**但是长久以来以C++为代表的“面向对象”语言曲解了这些原则,让人们走入了误区。为什么要将方法和数据绑死?为什么要有多重继承这么变态的设计?面向对象中最强调的应该是对象间的消息传递,却为什么被演绎成了封装继承和多态。面向对象是否实现程序程序抽象的合理途径,又或者是因为它存在我们就认为它合理了。历史原因,中间出现了太多的错误。不管怎么样,Go的interface给我们打开了一扇新的窗。 关于C++和面向对象的发展可以参考下孟岩的文章–function/bind的救赎 interface实际上就是一个结构体,包含两个成员。其中一个成员是指向具体数据的指针,另一个成员中包含了类型信息。 **注意:**空接口和带方法的接口底层结构略有不同,带方法的接口除了包含类型信息还需要包含具体类型中已实现的方法 一个不包含任何值的nil接口和一个刚好包含nil指针的接口值是不同的。这个细微区别产生了一个容易绊倒每个Go程序员的陷阱。具体可参考官方文档,国内链接 国外链接 package main import ( \"fmt\" ) type Student struct { Name string } func main() { var s *Student var it interface{} // 此时it为nil.output: nil if it == nil { fmt.Println(\"nil\") } else { fmt.Println(\"non nil\") } // 赋值后it本身不为nil了,其类型字段指向了*Student,其值指向的是nil.output: non nil it = s if it == nil { fmt.Println(\"nil\") } else { fmt.Println(\"non nil\") } } // 引用官方文档的例子,无论条件怎么变化始终会返回一个non-nil的error func returnsError() error { var p *MyError = nil if bad() { p = ErrBad } return p // Will always return a non-nil error. } 空的interface可以被当作任意类型来使用,它使得Go语言拥有了一定的动态性,但却又不损失静态语言在类型安全方面拥有的编译时检查的优势。 ","date":"2020-11-22","objectID":"/2020/11/22/golang-data-type/:2:8","tags":["go"],"title":"golang数据类型","uri":"/2020/11/22/golang-data-type/"},{"categories":["MySQL"],"content":"开启GTID. # 启用gtid模式,每个事务有个唯一的id,全局事务ID,事务提交时分配,基于gtid来复制. gtid_mode=ON # 开启gtid的一些安全限制. enforce_gtid_consistency=ON # gtid生成方式,默认为自动. # gtid_next=AUTOMATIC # 从库启动复制,master_auto_position=1表示开启基于GTID的复制. 
CHANGE MASTER TO MASTER_HOST='xxx', MASTER_PORT=3306, MASTER_USER='user_name', MASTER_PASSWORD='password', MASTER_AUTO_POSITION=1; start slave; ","date":"2020-11-09","objectID":"/2020/11/09/mysql-gtid/:1:0","tags":["mysql"],"title":"MySQL基于GTID复制","uri":"/2020/11/09/mysql-gtid/"},{"categories":["MySQL"],"content":"并行复制参数. # 并行复制类型,默认值为DATABASE,即按库来并行复制,LOGICAL_CLOCK为根据同时进入prepare和commit来并行复制. slave_parallel_type=LOGICAL_CLOCK # 并行复制线程数. slave_parallel_workers=8 # 并行复制策略,默认值为COMMIT_ORDER,即按照上面的prepare和commit来并行; # WRITESET对事务中的每一行计算hash,组合成writeset,如果两个事务没有更新相同行,writeset会没有交集可并行. # WRITESET直接记录在binlog,不需要解析event,对binlog的格式没要求,5.7.22版本的新功能,binlog协议不向上兼容. # WRITESET_SESSION,即在WRITESET基础上多了个约束,主库上同一线程先后执行的事务,在备库也要保证相同的顺序. binlog_transaction_dependency_tracking=WRITESET transaction_write_set_extraction=XXHASH64 # 记录writeset的容量,不需要修改,复制时可以并发的事务数大概为该值的一半. #binlog_transaction_dependency_history_size=25000 # slave把从master接收到的binlog记录到自己的binlog中,主要用于级联复制的场景. log_slave_updates=ON ","date":"2020-11-09","objectID":"/2020/11/09/mysql-gtid/:2:0","tags":["mysql"],"title":"MySQL基于GTID复制","uri":"/2020/11/09/mysql-gtid/"},{"categories":["MySQL"],"content":"GTID的限制. 从复制时报错,error: 1032 select * from performance_schema.replication_applier_status_by_worker可以查询从复制时的错误. error: 1032,主删除数据,但从没有相应的记录.遇到错误主从复制会停止. 解决方案是在从库上跳过主库的这个事务: -- 设置从库上的gtid_next为报错事务的gtid. set gtid_next=\"af299bf7-dc7c-11ea-8417-0242ac170002:22805\"; begin; commit; start slave; create function报错,error: 1418 [Err] 1418 - This function has none of DETERMINISTIC, NO SQL, or READS SQL DATA in its declaration and binary logging is enabled (you might want to use the less safe log_bin_trust_function_creators variable) 解决方案: DELIMITER;;CREATEFUNCTION`xxxx`(user_idINT)RETURNSvarchar(4000)CHARSETutf8COLLATEutf8_unicode_ci-- 添加关键字DETERMINISTIC. DETERMINISTICBEGIN create table报错,error: 1786 [Err] 1786 - Statement violates GTID consistency: CREATE TABLE … SELECT. 解决方案,需要拆分成两部分,create语句和insert语句: CREATE TABLE xxxx LIKE t; INSERT INTO xxxx SELECT * FROM t; create temporary报错,error: 1787 [Err] 1787 - Statement violates GTID consistency: CREATE TEMPORARY TABLE and DROP TEMPORARY TABLE can only be executed outside transactional context. These statements are also not allowed in a function or trigger because functions and triggers are also considered to be multi-statement transactions. 解决方案: 在autocommit=1的情况下可以创建临时表,主库创建临时表时不产生GTID信息,所以不会同步到从库,但在删除临时表时会产生GTID,从在处理时会报错,导致复制中断. ","date":"2020-11-09","objectID":"/2020/11/09/mysql-gtid/:3:0","tags":["mysql"],"title":"MySQL基于GTID复制","uri":"/2020/11/09/mysql-gtid/"},{"categories":["MySQL"],"content":"查看GTID -- 主库上执行,Executed_Gtid_Set表示已执行过的GTID集合. mysql\u003eshowmasterstatus\\G***************************1.row***************************File:bin.000005Position:194Binlog_Do_DB:Binlog_Ignore_DB:Executed_Gtid_Set:af299bf7-dc7c-11ea-8417-0242ac170002:1-228051rowinset(0.00sec)-- 在从库上执行,Executed_Gtid_Set表示从库已经执行过的GTID集合. -- Retrieved_Gtid_Set,从库会扫描最后一个relay log,显示当前扫描所得的GTID集合. 
mysql\u003eshowslavestatus\\G***************************1.row***************************Slave_IO_State:WaitingformastertosendeventMaster_Host:192.168.20.151Master_User:mtp2Master_Port:3406Connect_Retry:60Master_Log_File:bin.000005Read_Master_Log_Pos:194Relay_Log_File:relay_log.000007Relay_Log_Pos:355Relay_Master_Log_File:bin.000005Slave_IO_Running:YesSlave_SQL_Running:YesReplicate_Do_DB:Replicate_Ignore_DB:Replicate_Do_Table:Replicate_Ignore_Table:Replicate_Wild_Do_Table:Replicate_Wild_Ignore_Table:Last_Errno:0Last_Error:Skip_Counter:0Exec_Master_Log_Pos:194Relay_Log_Space:556Until_Condition:NoneUntil_Log_File:Until_Log_Pos:0Master_SSL_Allowed:NoMaster_SSL_CA_File:Master_SSL_CA_Path:Master_SSL_Cert:Master_SSL_Cipher:Master_SSL_Key:Seconds_Behind_Master:0Master_SSL_Verify_Server_Cert:NoLast_IO_Errno:0Last_IO_Error:Last_SQL_Errno:0Last_SQL_Error:Replicate_Ignore_Server_Ids:Master_Server_Id:1Master_UUID:af299bf7-dc7c-11ea-8417-0242ac170002Master_Info_File:mysql.slave_master_infoSQL_Delay:0SQL_Remaining_Delay:NULLSlave_SQL_Running_State:Slavehasreadallrelaylog;waitingformoreupdatesMaster_Retry_Count:86400Master_Bind:Last_IO_Error_Timestamp:Last_SQL_Error_Timestamp:Master_SSL_Crl:Master_SSL_Crlpath:Retrieved_Gtid_Set:Executed_Gtid_Set:60012ca6-dc7d-11ea-8f34-0242ac180002:1-5,af299bf7-dc7c-11ea-8417-0242ac170002:1-22805Auto_Position:1Replicate_Rewrite_DB:Channel_Name:Master_TLS_Version:1rowinset(0.00sec)-- 查看binlog event.在语句前后都会设置GTID_NEXT -- SHOW BINLOG EVENTS [IN 'log_name'] [FROM pos] [LIMIT [offset,] row_count] mysql\u003eshowbinlogeventsin'bin.000003'limit0,5;+------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+ |Log_name|Pos|Event_type|Server_id|End_log_pos|Info|+------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+ |bin.000003|4|Format_desc|1|123|Serverver:5.7.31-log,Binlogver:4||bin.000003|123|Previous_gtids|1|194|af299bf7-dc7c-11ea-8417-0242ac170002:1-5||bin.000003|194|Gtid|1|259|SET@@SESSION.GTID_NEXT='af299bf7-dc7c-11ea-8417-0242ac170002:6'||bin.000003|259|Query|1|398|createdatabasemtp2defaultcharsetutf8collateutf8_unicode_ci||bin.000003|398|Gtid|1|463|SET@@SESSION.GTID_NEXT='af299bf7-dc7c-11ea-8417-0242ac170002:7'|+------------+-----+----------------+-----------+-------------+-------------------------------------------------------------------+ 5rowsinset(0.00sec)-- 解析binlog. 
root@baecf53ab2f8:/var/log/mysql#mysqlbinlog-vvbin.000003--include-gtids='af299bf7-dc7c-11ea-8417-0242ac170002:6' /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;DELIMITER/*!*/;#at4#20081217:17:57serverid1end_log_pos123CRC320x815f99b3Start:binlogv4,serverv5.7.31-logcreated20081217:17:57atstartupROLLBACK/*!*/;BINLOG' xbMzXw8BAAAAdwAAAHsAAAAAAAQANS43LjMxLWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAADFszNfEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA AbOZX4E= '/*!*/;#at123#20081217:17:57serverid1end_log_pos194CRC320x5856cea2Previous-GTIDs#af299bf7-dc7c-11ea-8417-0242ac170002:1-5#at194#20081217:19:17serverid1end_log_pos259CRC320xf56e199cGTIDlast_committed=0sequence_number=1rbr_only=noSET@@SESSION.GTID_NEXT='af299bf7-dc7c-11ea-8417-0242ac170002:6'/*!*/;#at259#20081217:19:17serverid1end_log_pos398CRC320x010ed0fbQuerythread_id=2exec_time=0error_code=0SETTIMESTAMP=1597223957/*!*/;SET@@session.pseudo_thread_id=2/*!*/;SET@@session.foreign_key_checks=1,@@session.sql_auto_is_null=0,@@session.unique_checks=1,@@session.autocommit=1/*!*/;SET@@session.sql_mode=1436549152/*!","date":"2020-11-09","objectID":"/2020/11/09/mysql-gtid/:4:0","tags":["mysql"],"title":"MySQL基于GTID复制","uri":"/2020/11/09/mysql-gtid/"},{"categories":["Golang"],"content":"连接server端失败的处理 ","date":"2020-11-07","objectID":"/2020/11/07/grpc-connection-exception/:1:0","tags":["go","grpc"],"title":"gRPC系列之连接异常机制","uri":"/2020/11/07/grpc-connection-exception/"},{"categories":["Golang"],"content":"重试机制 // 重点关注addrConn.resetTransport方法. func (ac *addrConn) resetTransport() { // 代码逻辑放在一个死循环里的. for i := 0; ; i++ { if i \u003e 0 { ac.cc.resolveNow(resolver.ResolveNowOptions{}) } ac.mu.Lock() // 当连接关闭时直接返回. if ac.state == connectivity.Shutdown { ac.mu.Unlock() return } addrs := ac.addrs // 获取backoff时间,根据重试的次数会计算出不同的时间,算法后面重点关注. backoffFor := ac.dopts.bs.Backoff(ac.backoffIdx) // This will be the duration that dial gets to finish. // 超时时间,默认20秒. dialDuration := minConnectTimeout if ac.dopts.minConnectTimeout != nil { dialDuration = ac.dopts.minConnectTimeout() } if dialDuration \u003c backoffFor { // Give dial more time as we keep failing to connect. dialDuration = backoffFor } // We can potentially spend all the time trying the first address, and // if the server accepts the connection and then hangs, the following // addresses will never be tried. // // The spec doesn't mention what should be done for multiple addresses. // https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md#proposed-backoff-algorithm connectDeadline := time.Now().Add(dialDuration) // 更新状态为连接中. ac.updateConnectivityState(connectivity.Connecting, nil) ac.transport = nil ac.mu.Unlock() // 尝试连接服务端. newTr, addr, reconnect, err := ac.tryAllAddrs(addrs, connectDeadline) if err != nil { // After exhausting all addresses, the addrConn enters // TRANSIENT_FAILURE. ac.mu.Lock() // 如果失败了,且状态为Shutdown直接返回. if ac.state == connectivity.Shutdown { ac.mu.Unlock() return } // 标记状态为失败. ac.updateConnectivityState(connectivity.TransientFailure, err) // Backoff. b := ac.resetBackoff ac.mu.Unlock() // 根据backoff时间创建定时器. timer := time.NewTimer(backoffFor) select { case \u003c-timer.C: // backoff时间到,增加backoff次数,继续循环去尝试连接. ac.mu.Lock() ac.backoffIdx++ ac.mu.Unlock() case \u003c-b: // 外部重置了backoff,马上重新循环去尝试连接. timer.Stop() case \u003c-ac.ctx.Done(): // context取消了或超时了,直接返回. 
timer.Stop() return } continue } ac.mu.Lock() if ac.state == connectivity.Shutdown { ac.mu.Unlock() newTr.Close() return } ac.curAddr = addr ac.transport = newTr ac.backoffIdx = 0 hctx, hcancel := context.WithCancel(ac.ctx) ac.startHealthCheck(hctx) ac.mu.Unlock() // Block until the created transport is down. And when this happens, // we restart from the top of the addr list. \u003c-reconnect.Done() hcancel() // restart connecting - the top of the loop will set state to // CONNECTING. This is against the current connectivity semantics doc, // however it allows for graceful behavior for RPCs not yet dispatched // - unfortunate timing would otherwise lead to the RPC failing even // though the TRANSIENT_FAILURE state (called for by the doc) would be // instantaneous. // // Ideally we should transition to Idle here and block until there is // RPC activity that leads to the balancer requesting a reconnect of // the associated SubConn. } } 总结 当连接失败后会等待一段时间之后再尝试重连,时间间隔的算法依赖于backoff.Strategy接口的Backoff方法. 利用context的超时控制或取消机制,直接结束. ","date":"2020-11-07","objectID":"/2020/11/07/grpc-connection-exception/:1:1","tags":["go","grpc"],"title":"gRPC系列之连接异常机制","uri":"/2020/11/07/grpc-connection-exception/"},{"categories":["Golang"],"content":"Backoff算法 // 在DialContext函数中,当没有设置bs自定义参数时,会默认设置为DefaultExponential. if cc.dopts.bs == nil { cc.dopts.bs = backoff.DefaultExponential } // internal/backoffG/backoff.go var DefaultExponential = Exponential{Config: grpcbackoff.DefaultConfig} // backoff/backoff.go // DefaultConfig is a backoff configuration with the default values specfied // at https://github.com/grpc/grpc/blob/master/doc/connection-backoff.md. // // This should be useful for callers who want to configure backoff with // non-default values only for a subset of the options. var DefaultConfig = Config{ // 第一次失败之后的延迟时间. BaseDelay: 1.0 * time.Second, // 多次失败之后的时间乘数. Multiplier: 1.6, // 随机因子. Jitter: 0.2, // 最大延迟时间. MaxDelay: 120 * time.Second, } // Backoff returns the amount of time to wait before the next retry given the // number of retries. func (bc Exponential) Backoff(retries int) time.Duration { // 当重试次数为0时直接返回BaseDelay,为1秒. if retries == 0 { return bc.Config.BaseDelay } backoff, max := float64(bc.Config.BaseDelay), float64(bc.Config.MaxDelay) for backoff \u003c max \u0026\u0026 retries \u003e 0 { // 当backoff小于max且重试次数大于0时不断的乘以Multiplier. backoff *= bc.Config.Multiplier retries-- } if backoff \u003e max { backoff = max } // 对时间加上一个随机数. // Randomize backoff delays so that if a cluster of requests start at // the same time, they won't operate in lockstep. backoff *= 1 + bc.Config.Jitter*(grpcrand.Float64()*2-1) if backoff \u003c 0 { return 0 } return time.Duration(backoff) } 总结 可以通过grpc.WithConnectParams和grpc.WithBackoff来设置自定义的backoff策略,在自定义策略里可以定义重试的时间间隔. 默认的backoff策略,第一次重试间隔为1秒,第二次为1*1.6+随机数…第N次为1*1.6^N +随机数(其中1*1.6^N最大不能超过120秒). ","date":"2020-11-07","objectID":"/2020/11/07/grpc-connection-exception/:1:2","tags":["go","grpc"],"title":"gRPC系列之连接异常机制","uri":"/2020/11/07/grpc-connection-exception/"},{"categories":["Golang"],"content":"客户端超时的处理 客户端在调用rpc接口时带timeout的context是如何传递给服务端的. // 在调用对应的rpc方法时设置了超时时间为3秒. ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second) defer cancel() reply, err := client.SayHello(ctx, \u0026pb.HelloRequest{Name: \"zhou\"}) // 调用SayHello方法最终会调用到invoke函数(call.go文件中). func invoke(ctx context.Context, method string, req, reply interface{}, cc *ClientConn, opts ...CallOption) error { cs, err := newClientStream(ctx, unaryStreamDesc, cc, method, opts...) 
if err != nil { return err } if err := cs.SendMsg(req); err != nil { return err } return cs.RecvMsg(reply) } // 在newClientStream方法中会创建clientStream对象,ctx会赋值给该对象的ctx字段. func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, method string, opts ...CallOption) (_ ClientStream, err error) { ...... // 前面代码都不关注,这里生成clientStream对象,关注其ctx字段. cs := \u0026clientStream{ callHdr: callHdr, ctx: ctx, methodConfig: \u0026mc, opts: opts, callInfo: c, cc: cc, desc: desc, codec: c.codec, cp: cp, comp: comp, cancel: cancel, beginTime: beginTime, firstAttempt: true, } if !cc.dopts.disableRetry { cs.retryThrottler = cc.retryThrottler.Load().(*retryThrottler) } cs.binlog = binarylog.GetMethodLogger(method) // 在newAttemptLocked中会通过负载均衡算法来选择Ready状态的连接. // Only this initial attempt has stats/tracing. // TODO(dfawley): move to newAttempt when per-attempt stats are implemented. if err := cs.newAttemptLocked(sh, trInfo); err != nil { cs.finish(err) return nil, err } // 在withRetry中最终会调用op函数,该函数会调用到csAttempt.newStream方法. op := func(a *csAttempt) error { return a.newStream() } if err := cs.withRetry(op, func() { cs.bufferForRetryLocked(0, op) }); err != nil { cs.finish(err) return nil, err } ...... // 下面的也不用关注了. return cs, nil } func (a *csAttempt) newStream() error { cs := a.cs cs.callHdr.PreviousAttempts = cs.numRetries // 这里会调用NewStream方法,t指向的是http2Client对象,cs是指向clientStream对象的. s, err := a.t.NewStream(cs.ctx, cs.callHdr) if err != nil { if _, ok := err.(transport.PerformedIOError); ok { // Return without converting to an RPC error so retry code can // inspect. return err } return toRPCErr(err) } cs.attempt.s = s cs.attempt.p = \u0026parser{r: s} return nil } // NewStream creates a stream and registers it into the transport as \"active\" // streams. func (t *http2Client) NewStream(ctx context.Context, callHdr *CallHdr) (_ *Stream, err error) { ctx = peer.NewContext(ctx, t.getPeer()) // 重点关注createHeaderFields,这个方法会处理HEADERS的数据,来传播表头数据. headerFields, err := t.createHeaderFields(ctx, callHdr) ...... // 下面的代码不关注. } func (t *http2Client) createHeaderFields(ctx context.Context, callHdr *CallHdr) ([]hpack.HeaderField, error) { // 主要处理HEADERS头部数据. ...... // 支持的字段. headerFields := make([]hpack.HeaderField, 0, hfLen) headerFields = append(headerFields, hpack.HeaderField{Name: \":method\", Value: \"POST\"}) headerFields = append(headerFields, hpack.HeaderField{Name: \":scheme\", Value: t.scheme}) headerFields = append(headerFields, hpack.HeaderField{Name: \":path\", Value: callHdr.Method}) headerFields = append(headerFields, hpack.HeaderField{Name: \":authority\", Value: callHdr.Host}) headerFields = append(headerFields, hpack.HeaderField{Name: \"content-type\", Value: grpcutil.ContentType(callHdr.ContentSubtype)}) headerFields = append(headerFields, hpack.HeaderField{Name: \"user-agent\", Value: t.userAgent}) headerFields = append(headerFields, hpack.HeaderField{Name: \"te\", Value: \"trailers\"}) if callHdr.PreviousAttempts \u003e 0 { headerFields = append(headerFields, hpack.HeaderField{Name: \"grpc-previous-rpc-attempts\", Value: strconv.Itoa(callHdr.PreviousAttempts)}) } if callHdr.SendCompress != \"\" { headerFields = append(headerFields, hpack.HeaderField{Name: \"grpc-encoding\", Value: callHdr.SendCompress}) headerFields = append(headerFields, hpack.HeaderField{Name: \"grpc-accept-encoding\", Value: callHdr.SendCompress}) } // 重点在这里,获取ctx的deadline时间,然后放入头部中的grpc-timeout字段中,客户端就是利用这个来吧超时时间传递到服务端的. if dl, ok := ctx.Deadline(); ok { // Send out timeout regardless its value. 
The server can detect timeout context by itself. // TODO(mmukhi)","date":"2020-11-07","objectID":"/2020/11/07/grpc-client-server-timeout/:1:0","tags":["go","grpc"],"title":"gRPC系列之client和server的timetou机制","uri":"/2020/11/07/grpc-client-server-timeout/"},{"categories":["Golang"],"content":"服务端超时处理 // HandleStreams receives incoming streams using the given handler. This is // typically run in a separate goroutine. // traceCtx attaches trace to ctx and returns the new context. func (t *http2Server) HandleStreams(handle func(*Stream), traceCtx func(context.Context, string) context.Context) { defer close(t.readerDone) for { // 该函数主要接收客户端发送过来的数据,重点关注处理HEADERS部分的数据. ...... switch frame := frame.(type) { case *http2.MetaHeadersFrame: // 当frame类型为HEADERS时的处理. if t.operateHeaders(frame, handle, traceCtx) { t.Close() break } // 其它暂不关注. ...... } } // operateHeader takes action on the decoded headers. func (t *http2Server) operateHeaders(frame *http2.MetaHeadersFrame, handle func(*Stream), traceCtx func(context.Context, string) context.Context) (fatal bool) { streamID := frame.Header().StreamID state := \u0026decodeState{ serverSide: true, } // 获取HEADERS里的数据. if h2code, err := state.decodeHeader(frame); err != nil { if _, ok := status.FromError(err); ok { t.controlBuf.put(\u0026cleanupStream{ streamID: streamID, rst: true, rstCode: h2code, onWrite: func() {}, }) } return false } buf := newRecvBuffer() s := \u0026Stream{ id: streamID, st: t, buf: buf, fc: \u0026inFlow{limit: uint32(t.initialWindowSize)}, recvCompress: state.data.encoding, method: state.data.method, contentSubtype: state.data.contentSubtype, } if frame.StreamEnded() { // s is just created by the caller. No lock needed. s.state = streamReadDone } // 当有设置超时时,Stream对象的ctx设置为Timeout的. if state.data.timeoutSet { s.ctx, s.cancel = context.WithTimeout(t.ctx, state.data.timeout) } else { s.ctx, s.cancel = context.WithCancel(t.ctx) } // 其它暂不关注. ...... return false } func (d *decodeState) decodeHeader(frame *http2.MetaHeadersFrame) (http2.ErrCode, error) { // frame.Truncated is set to true when framer detects that the current header // list size hits MaxHeaderListSize limit. if frame.Truncated { return http2.ErrCodeFrameSize, status.Error(codes.Internal, \"peer header list size exceeded limit\") } // 处理HEADERS. for _, hf := range frame.Fields { d.processHeaderField(hf) } // 其它暂不关注. ...... return http2.ErrCodeProtocol, status.Error(code, d.constructHTTPErrMsg()) } func (d *decodeState) processHeaderField(f hpack.HeaderField) { switch f.Name { // 其它暂不关注. ...... case \"grpc-timeout\": // 如果有该字段,解析超时时间,在这里就和客户端联系起来了. d.data.timeoutSet = true var err error if d.data.timeout, err = decodeTimeout(f.Value); err != nil { d.data.grpcErr = status.Errorf(codes.Internal, \"transport: malformed time-out: %v\", err) } // 其它暂不关注. ...... } } // 经过上述分析,带超时的context已经赋值给Stream的ctx字段了. // 最后这个ctx会被传递给RPC接口的第一个参数,这样在服务端的接口中就能感知到超时了. 总结 服务端在收到HEADERS之后,会解析所有参数,如果有grpc-timeout,就会设置一个带timeout的context,然后传递到rpc接口中. ","date":"2020-11-07","objectID":"/2020/11/07/grpc-client-server-timeout/:2:0","tags":["go","grpc"],"title":"gRPC系列之client和server的timetou机制","uri":"/2020/11/07/grpc-client-server-timeout/"},{"categories":["Golang"],"content":"抓包 带timeout的HEADERStimeout \" 带timeout的HEADERS 从图中可以看到HEADERS中有参数grpc-timeout,值为2994000u,超时时间为2994毫秒. 
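To tie the two sides together, here is a hedged end-to-end sketch (not code from the post): it assumes pb is the package generated from the hello.proto used elsewhere in this blog, and the import path and listen address are placeholders. The client sets a 3-second deadline with context.WithTimeout; gRPC encodes it into the grpc-timeout header, and the server-side handler observes the same deadline on its ctx.

package main

import (
	"context"
	"log"
	"net"
	"time"

	"google.golang.org/grpc"

	pb "example.com/hello/pb" // assumed import path of the generated code.
)

type helloServer struct {
	pb.UnimplementedHelloServiceServer
}

func (s *helloServer) SayHello(ctx context.Context, req *pb.HelloRequest) (*pb.HelloReply, error) {
	// The deadline here comes from the grpc-timeout header sent by the client.
	if dl, ok := ctx.Deadline(); ok {
		log.Printf("server sees deadline in %v", time.Until(dl)) // roughly the client's 3s.
	}
	select {
	case <-time.After(5 * time.Second): // simulate work slower than the client allows.
		return &pb.HelloReply{Message: "hello " + req.Name}, nil
	case <-ctx.Done(): // the client's deadline fires first; give up early.
		return nil, ctx.Err()
	}
}

func main() {
	lis, err := net.Listen("tcp", "127.0.0.1:50051")
	if err != nil {
		log.Fatal(err)
	}
	s := grpc.NewServer()
	pb.RegisterHelloServiceServer(s, &helloServer{})
	go s.Serve(lis)

	conn, err := grpc.Dial("127.0.0.1:50051", grpc.WithInsecure(), grpc.WithBlock())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
	defer cancel()
	_, err = pb.NewHelloServiceClient(conn).SayHello(ctx, &pb.HelloRequest{Name: "zhou"})
	log.Println("client RPC returned:", err) // expected: rpc error with code DeadlineExceeded.
}

The grpc-timeout value seen in the packet capture above (about 2994 milliseconds) is simply this kind of 3-second deadline minus the small amount of time spent before the HEADERS frame is written.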
","date":"2020-11-07","objectID":"/2020/11/07/grpc-client-server-timeout/:3:0","tags":["go","grpc"],"title":"gRPC系列之client和server的timetou机制","uri":"/2020/11/07/grpc-client-server-timeout/"},{"categories":["Golang"],"content":"Dail/DailContext与服务端建立连接 // 直接调用的DialContext. // Dial creates a client connection to the given target. func Dial(target string, opts ...DialOption) (*ClientConn, error) { return DialContext(context.Background(), target, opts...) } // DialContext默认是非阻塞的,不会等待连接成功,而是在后台处理连接逻辑.可以用grpc.WithBlock()来设置为阻塞的. func DialContext(ctx context.Context, target string, opts ...DialOption) (conn *ClientConn, err error) { // 生成conn. cc := \u0026ClientConn{ target: target, csMgr: \u0026connectivityStateManager{}, conns: make(map[*addrConn]struct{}), dopts: defaultDialOptions(), blockingpicker: newPickerWrapper(), czData: new(channelzData), firstResolveEvent: grpcsync.NewEvent(), } cc.retryThrottler.Store((*retryThrottler)(nil)) cc.ctx, cc.cancel = context.WithCancel(context.Background()) // 处理DiaoOption,来设置自定义参数. for _, opt := range opts { opt.apply(\u0026cc.dopts) } // 处理一元RPC的拦截器. chainUnaryClientInterceptors(cc) // 处理流式RPC的拦截器. chainStreamClientInterceptors(cc) defer func() { if err != nil { cc.Close() } }() if channelz.IsOn() { ...... } if !cc.dopts.insecure { // 校验认证相关. ...... } else { if cc.dopts.copts.TransportCredentials != nil || cc.dopts.copts.CredsBundle != nil { return nil, errCredentialsConflict } // PerRPCCredentials是指设置了自定义认证. for _, cd := range cc.dopts.copts.PerRPCCredentials { // 判断自定义认证是否依赖TLS. if cd.RequireTransportSecurity() { return nil, errTransportCredentialsMissing } } } ...... // 超时控制. if cc.dopts.timeout \u003e 0 { var cancel context.CancelFunc ctx, cancel = context.WithTimeout(ctx, cc.dopts.timeout) defer cancel() } defer func() { select { case \u003c-ctx.Done(): switch { case ctx.Err() == err: conn = nil case err == nil || !cc.dopts.returnLastError: conn, err = nil, ctx.Err() default: conn, err = nil, fmt.Errorf(\"%v: %v\", ctx.Err(), err) } default: } }() ...... // 解析target的格式,识别出对应的Scheme和Endpoint. // Determine the resolver to use. cc.parsedTarget = grpcutil.ParseTarget(cc.target) unixScheme := strings.HasPrefix(cc.target, \"unix:\") channelz.Infof(logger, cc.channelzID, \"parsed scheme: %q\", cc.parsedTarget.Scheme) // 根据Scheme来获取对应的Resolver. // 如gRPC默认支持的dns,就是在dns包的init函数中调用了resolver.Register()来注册dns Resolver. resolverBuilder := cc.getResolver(cc.parsedTarget.Scheme) if resolverBuilder == nil { // 如果获取失败,就采用默认的Resolver来解析,即passthrough. // If resolver builder is still nil, the parsed target's scheme is // not registered. Fallback to default resolver and set Endpoint to // the original target. channelz.Infof(logger, cc.channelzID, \"scheme %q not registered, fallback to default scheme\", cc.parsedTarget.Scheme) cc.parsedTarget = resolver.Target{ Scheme: resolver.GetDefaultScheme(), Endpoint: target, } resolverBuilder = cc.getResolver(cc.parsedTarget.Scheme) if resolverBuilder == nil { return nil, fmt.Errorf(\"could not get resolver for default scheme: %q\", cc.parsedTarget.Scheme) } } ...... // 认证相关. // banlancer的相关Option. cc.balancerBuildOpts = balancer.BuildOptions{ DialCreds: credsClone, CredsBundle: cc.dopts.copts.CredsBundle, Dialer: cc.dopts.copts.Dialer, ChannelzParentID: cc.channelzID, Target: cc.parsedTarget, } // 调用Resolver对象的Build方法,在Build内部会调用cc.UpdateState来更新服务端地址. // Build the resolver. 
rWrapper, err := newCCResolverWrapper(cc, resolverBuilder) if err != nil { return nil, fmt.Errorf(\"failed to build resolver: %v\", err) } cc.mu.Lock() cc.resolverWrapper = rWrapper cc.mu.Unlock() // 当调用grpc.WithBlock()后会阻塞直到连接成功建立. // A blocking dial blocks until the clientConn is ready. if cc.dopts.block { for { // 循环判断连接的状态是否为Ready. s := cc.GetState() if s == connectivity.Ready { break } else if cc.dopts.copts.FailOnNonTempDialError \u0026\u0026 s == connectivity.TransientFailure { if err = cc.connectionError(); err != nil { terr, ok := err.(interface { Temporary() bool }) if ok \u0026\u0026 !terr.Temporary() { return nil, err } } } if !cc.WaitForStateChange(ctx, s) { // ctx got timeout or canceled. if err = cc.connectionError(); err != nil \u0026\u0026 cc.dopts.returnLastError { return nil, err } return nil, ctx.Err() } } } return cc, nil } // newCCResolverWrapper uses the resolve","date":"2020-11-07","objectID":"/2020/11/07/grpc-client-call/:1:0","tags":["go","grpc"],"title":"gRPC系列之client端调用","uri":"/2020/11/07/grpc-client-call/"},{"categories":["Golang"],"content":"调用RPC接口 func (c *helloServiceClient) SayHello(ctx context.Context, in *HelloRequest, opts ...grpc.CallOption) (*HelloReply, error) { out := new(HelloReply) // 这里cc是ClientConn对象,调用其Invoke方法,定义在call.go文件里. err := c.cc.Invoke(ctx, \"/HelloService/SayHello\", in, out, opts...) if err != nil { return nil, err } return out, nil } // Invoke sends the RPC request on the wire and returns after response is // received. This is typically called by generated code. // // All errors returned by Invoke are compatible with the status package. func (cc *ClientConn) Invoke(ctx context.Context, method string, args, reply interface{}, opts ...CallOption) error { // allow interceptor to see all applicable call options, which means those // configured as defaults from dial option as well as per-call options opts = combine(cc.dopts.callOptions, opts) if cc.dopts.unaryInt != nil { // 如果有设置客户端拦截器,最终也会调用到invoke函数. return cc.dopts.unaryInt(ctx, method, args, reply, cc, invoke, opts...) } // 没有就直接调用invoke函数. return invoke(ctx, method, args, reply, cc, opts...) } func invoke(ctx context.Context, method string, req, reply interface{}, cc *ClientConn, opts ...CallOption) error { // 该函数内部会使用负载均衡算法,来选出一个连接. cs, err := newClientStream(ctx, unaryStreamDesc, cc, method, opts...) if err != nil { return err } if err := cs.SendMsg(req); err != nil { return err } return cs.RecvMsg(reply) } func newClientStream(ctx context.Context, desc *StreamDesc, cc *ClientConn, method string, opts ...CallOption) (_ ClientStream, err error) { // 前面代码可以不关注. ...... cs := \u0026clientStream{ callHdr: callHdr, ctx: ctx, methodConfig: \u0026mc, opts: opts, callInfo: c, cc: cc, desc: desc, codec: c.codec, cp: cp, comp: comp, cancel: cancel, beginTime: beginTime, firstAttempt: true, } if !cc.dopts.disableRetry { cs.retryThrottler = cc.retryThrottler.Load().(*retryThrottler) } cs.binlog = binarylog.GetMethodLogger(method) // 重点关注newAttempLocked. // Only this initial attempt has stats/tracing. // TODO(dfawley): move to newAttempt when per-attempt stats are implemented. 
if err := cs.newAttemptLocked(sh, trInfo); err != nil { cs.finish(err) return nil, err } op := func(a *csAttempt) error { return a.newStream() } if err := cs.withRetry(op, func() { cs.bufferForRetryLocked(0, op) }); err != nil { cs.finish(err) return nil, err } if cs.binlog != nil { md, _ := metadata.FromOutgoingContext(ctx) logEntry := \u0026binarylog.ClientHeader{ OnClientSide: true, Header: md, MethodName: method, Authority: cs.cc.authority, } if deadline, ok := ctx.Deadline(); ok { logEntry.Timeout = time.Until(deadline) if logEntry.Timeout \u003c 0 { logEntry.Timeout = 0 } } cs.binlog.Log(logEntry) } if desc != unaryStreamDesc { // Listen on cc and stream contexts to cleanup when the user closes the // ClientConn or cancels the stream context. In all other cases, an error // should already be injected into the recv buffer by the transport, which // the client will eventually receive, and then we will cancel the stream's // context in clientStream.finish. go func() { select { case \u003c-cc.ctx.Done(): cs.finish(ErrClientConnClosing) case \u003c-ctx.Done(): cs.finish(toRPCErr(ctx.Err())) } }() } return cs, nil } // newAttemptLocked creates a new attempt with a transport. // If it succeeds, then it replaces clientStream's attempt with this new attempt. func (cs *clientStream) newAttemptLocked(sh stats.Handler, trInfo *traceInfo) (retErr error) { newAttempt := \u0026csAttempt{ cs: cs, dc: cs.cc.dopts.dc, statsHandler: sh, trInfo: trInfo, } ...... // 重点关注getTransport. t, done, err := cs.cc.getTransport(ctx, cs.callInfo.failFast, cs.callHdr.Method) if err != nil { return err } if trInfo != nil { trInfo.firstLine.SetRemoteAddr(t.RemoteAddr()) } newAttempt.t = t newAttempt.done = done cs.attempt = newAttempt return nil } func (cc *ClientConn) getTransport(ctx context.Context, failfast bool, method string) (transport.ClientTransport, func(balancer.DoneInfo), error) { // 调用pick方法来负载均衡. t, done, err := cc.blockingpicker.pick(ctx, failfast, balancer.PickInfo{ Ctx: ct","date":"2020-11-07","objectID":"/2020/11/07/grpc-client-call/:2:0","tags":["go","grpc"],"title":"gRPC系列之client端调用","uri":"/2020/11/07/grpc-client-call/"},{"categories":["Golang"],"content":"总结 要实现自定义Resolver,需要实现resolver.Builder接口和resolver.Resolver接口.在init函数中调用resolver.Register来注册自定义的Builder对象.在Builder接口的Build方法中要调用UpdateState来更新服务端地址信息.客户端在解析target时会根据scheme来获取对应的Resolver对象. 要实现自定义负载均衡算法,需要实现base.PickerBuilder接口和balancer.Picker接口(注意这两个接口有V2版的),通过base.NewBalancerBuilder包装成balancer.Builder接口,再在init函数中调用balancer.Register来注册该Builder对象.最后grpc.WithBalancerName把Builder包装成grpc.DialOption. 要实现自定义认证,需要实现credentials.PerRPCCredentials接口,然后调用grpc.WithPerRPCCredentials把对象包装成grpc.DialOption. 在客户端调用Dial或DialContext过程中,会先通过Resolver来解析出所有的Endpoint,然后会与每一个Endpoint建立连接(采用异步的方式),同时也会设置好对应的负载均衡Picker对象.注意如果使用了grpc.WithBlock()会同步等待连接建立成功(有一个成功就可以了),否则就会直接返回. 负载均衡算法会在调用具体的RPC函数过程中使用到,在invoke函数(call.go文件中)中会根据负载均衡算法来选定一个连接,然后发起请求. ","date":"2020-11-07","objectID":"/2020/11/07/grpc-client-call/:3:0","tags":["go","grpc"],"title":"gRPC系列之client端调用","uri":"/2020/11/07/grpc-client-call/"},{"categories":["Golang"],"content":"NewServer创建gRPC服务对象 主要是基于grpc-go的1.33.1版本Unary RPC来分析. // NewServer creates a gRPC server which has no service registered and has not // started to accept requests yet. func NewServer(opt ...ServerOption) *Server { // 处理Server对象的一些定制化参数.在Go中推荐采用Option的方式从外部来影响对象内的行为. opts := defaultServerOptions for _, o := range opt { o.apply(\u0026opts) } // 生成Server对象. 
s := \u0026Server{ lis: make(map[net.Listener]bool), opts: opts, conns: make(map[transport.ServerTransport]bool), services: make(map[string]*serviceInfo), quit: grpcsync.NewEvent(), done: grpcsync.NewEvent(), czData: new(channelzData), } // 处理一元服务端拦截器,支持链式多拦截器. chainUnaryServerInterceptors(s) // 处理流式服务端拦截器,支持链式多拦截器. chainStreamServerInterceptors(s) s.cv = sync.NewCond(\u0026s.mu) if EnableTracing { _, file, line, _ := runtime.Caller(1) s.events = trace.NewEventLog(\"grpc.Server\", fmt.Sprintf(\"%s:%d\", file, line)) } // 用来控制处理连接的goroutine的数量,为0时表示不控制. if s.opts.numServerWorkers \u003e 0 { s.initServerWorkers() } if channelz.IsOn() { s.channelzID = channelz.RegisterServer(\u0026channelzServer{s}, \"\") } return s } ","date":"2020-11-07","objectID":"/2020/11/07/grpc-server-call/:1:0","tags":["go","grpc"],"title":"gRPC系列之server端调用","uri":"/2020/11/07/grpc-server-call/"},{"categories":["Golang"],"content":"ServerOption自定义参数 Creds 主要用来设置服务端认证相关的参数. // 用来设置TLS. // Creds returns a ServerOption that sets credentials for server connections. func Creds(c credentials.TransportCredentials) ServerOption { return newFuncServerOption(func(o *serverOptions) { o.creds = c }) } // gRPC中的credentials包,已定义相关的TransportCredentials. // NewServerTLSFromFile constructs TLS credentials from the input certificate file and key // file for server. func NewServerTLSFromFile(certFile, keyFile string) (TransportCredentials, error) { cert, err := tls.LoadX509KeyPair(certFile, keyFile) if err != nil { return nil, err } return NewTLS(\u0026tls.Config{Certificates: []tls.Certificate{cert}}), nil } UnaryInterceptor 主要用来设置服务端的拦截器. // UnaryInterceptor returns a ServerOption that sets the UnaryServerInterceptor for the // server. Only one unary interceptor can be installed. The construction of multiple // interceptors (e.g., chaining) can be implemented at the caller. func UnaryInterceptor(i UnaryServerInterceptor) ServerOption { return newFuncServerOption(func(o *serverOptions) { if o.unaryInt != nil { panic(\"The unary server interceptor was already set and may not be reset.\") } o.unaryInt = i }) } // 具体的自定义拦截器需要实现UnaryServerInterceptor函数原型. // UnaryServerInterceptor provides a hook to intercept the execution of a unary RPC on the server. info // contains all the information of this RPC the interceptor can operate on. And handler is the wrapper // of the service method implementation. It is the responsibility of the interceptor to invoke handler // to complete the RPC. type UnaryServerInterceptor func(ctx context.Context, req interface{}, info *UnaryServerInfo, handler UnaryHandler) (resp interface{}, err error) ChainUnaryInterceptor 主要用来设置服务端的链式拦截器. // 支持同时设置多个拦截器. // ChainUnaryInterceptor returns a ServerOption that specifies the chained interceptor // for unary RPCs. The first interceptor will be the outer most, // while the last interceptor will be the inner most wrapper around the real call. // All unary interceptors added by this method will be chained. func ChainUnaryInterceptor(interceptors ...UnaryServerInterceptor) ServerOption { return newFuncServerOption(func(o *serverOptions) { o.chainUnaryInts = append(o.chainUnaryInts, interceptors...) }) } ","date":"2020-11-07","objectID":"/2020/11/07/grpc-server-call/:2:0","tags":["go","grpc"],"title":"gRPC系列之server端调用","uri":"/2020/11/07/grpc-server-call/"},{"categories":["Golang"],"content":"注册RPC对象到Server中 // 注册HelloServiceServer对象到gRPC对象中. 
func RegisterHelloServiceServer(s *grpc.Server, srv HelloServiceServer) { s.RegisterService(\u0026_HelloService_serviceDesc, srv) } // _HelloService_serviceDesc主要用来描述RPC对象的信息. var _HelloService_serviceDesc = grpc.ServiceDesc{ ServiceName: \"HelloService\", HandlerType: (*HelloServiceServer)(nil), Methods: []grpc.MethodDesc{ { MethodName: \"SayHello\", Handler: _HelloService_SayHello_Handler, }, }, Streams: []grpc.StreamDesc{}, Metadata: \"hello.proto\", } // 注册RPC服务对象. // RegisterService registers a service and its implementation to the gRPC // server. It is called from the IDL generated code. This must be called before // invoking Serve. If ss is non-nil (for legacy code), its type is checked to // ensure it implements sd.HandlerType. func (s *Server) RegisterService(sd *ServiceDesc, ss interface{}) { // 主要用来判断接口类型是否一致. if ss != nil { ht := reflect.TypeOf(sd.HandlerType).Elem() st := reflect.TypeOf(ss) if !st.Implements(ht) { logger.Fatalf(\"grpc: Server.RegisterService found the handler of type %v that does not satisfy %v\", st, ht) } } s.register(sd, ss) } func (s *Server) register(sd *ServiceDesc, ss interface{}) { s.mu.Lock() defer s.mu.Unlock() s.printf(\"RegisterService(%q)\", sd.ServiceName) if s.serve { logger.Fatalf(\"grpc: Server.RegisterService after Server.Serve for %q\", sd.ServiceName) } // 判断服务名是否已经注册过了. if _, ok := s.services[sd.ServiceName]; ok { logger.Fatalf(\"grpc: Server.RegisterService found duplicate service registration for %q\", sd.ServiceName) } // 创建serviceInfo对象. info := \u0026serviceInfo{ // 接口的实现对象. serviceImpl: ss, // 具体的方法描述信息. methods: make(map[string]*MethodDesc), streams: make(map[string]*StreamDesc), // 元数据. mdata: sd.Metadata, } // 保存具体的方法. for i := range sd.Methods { d := \u0026sd.Methods[i] info.methods[d.MethodName] = d } for i := range sd.Streams { d := \u0026sd.Streams[i] info.streams[d.StreamName] = d } s.services[sd.ServiceName] = info } ","date":"2020-11-07","objectID":"/2020/11/07/grpc-server-call/:3:0","tags":["go","grpc"],"title":"gRPC系列之server端调用","uri":"/2020/11/07/grpc-server-call/"},{"categories":["Golang"],"content":"Server启动监听等待连接 // Serve accepts incoming connections on the listener lis, creating a new // ServerTransport and service goroutine for each. The service goroutines // read gRPC requests and then call the registered handlers to reply to them. // Serve returns when lis.Accept fails with fatal errors. lis will be closed when // this method returns. // Serve will return a non-nil error unless Stop or GracefulStop is called. func (s *Server) Serve(lis net.Listener) error { s.mu.Lock() ...... ls := \u0026listenSocket{Listener: lis} s.lis[ls] = true ...... s.mu.Unlock() defer func() { s.mu.Lock() if s.lis != nil \u0026\u0026 s.lis[ls] { ls.Close() delete(s.lis, ls) } s.mu.Unlock() }() var tempDelay time.Duration // how long to sleep on accept failure for { // 等待连接. rawConn, err := lis.Accept() if err != nil { // 当返回错误时,会尝试重新调用Accept,时间间隔从5毫秒开始,每重试一次时间翻倍,直到1秒. if ne, ok := err.(interface { Temporary() bool }); ok \u0026\u0026 ne.Temporary() { // 计算时间间隔的逻辑. 
if tempDelay == 0 { tempDelay = 5 * time.Millisecond } else { tempDelay *= 2 } if max := 1 * time.Second; tempDelay \u003e max { tempDelay = max } s.mu.Lock() s.printf(\"Accept error: %v; retrying in %v\", err, tempDelay) s.mu.Unlock() timer := time.NewTimer(tempDelay) select { case \u003c-timer.C: case \u003c-s.quit.Done(): timer.Stop() return nil } continue } s.mu.Lock() s.printf(\"done serving; Accept = %v\", err) s.mu.Unlock() if s.quit.HasFired() { return nil } return err } // Accept正常后,重置时间为0. tempDelay = 0 // Start a new goroutine to deal with rawConn so we don't stall this Accept // loop goroutine. // // Make sure we account for the goroutine so GracefulStop doesn't nil out // s.conns before this conn can be added. // 主要是为了优雅的关闭,在关闭前所有的连接必须被处理完了. s.serveWG.Add(1) go func() { // 启动一个新的goroutine来处理新的连接. s.handleRawConn(rawConn) s.serveWG.Done() }() } } ","date":"2020-11-07","objectID":"/2020/11/07/grpc-server-call/:4:0","tags":["go","grpc"],"title":"gRPC系列之server端调用","uri":"/2020/11/07/grpc-server-call/"},{"categories":["Golang"],"content":"业务处理逻辑 // handleRawConn forks a goroutine to handle a just-accepted connection that // has not had any I/O performed on it yet. func (s *Server) handleRawConn(rawConn net.Conn) { if s.quit.HasFired() { rawConn.Close() return } // 设置超时时间,默认是120秒. rawConn.SetDeadline(time.Now().Add(s.opts.connectionTimeout)) // 处理TLS认证. conn, authInfo, err := s.useTransportAuthenticator(rawConn) if err != nil { // ErrConnDispatched means that the connection was dispatched away from // gRPC; those connections should be left open. if err != credentials.ErrConnDispatched { s.mu.Lock() s.errorf(\"ServerHandshake(%q) failed: %v\", rawConn.RemoteAddr(), err) s.mu.Unlock() channelz.Warningf(logger, s.channelzID, \"grpc: Server.Serve failed to complete security handshake from %q: %v\", rawConn.RemoteAddr(), err) rawConn.Close() } rawConn.SetDeadline(time.Time{}) return } // 开启HTTP/2协议. // Finish handshaking (HTTP2) st := s.newHTTP2Transport(conn, authInfo) if st == nil { return } rawConn.SetDeadline(time.Time{}) if !s.addConn(st) { return } go func() { // 开启新的goroutine来处理业务数据. s.serveStreams(st) s.removeConn(st) }() } func (s *Server) serveStreams(st transport.ServerTransport) { defer st.Close() var wg sync.WaitGroup var roundRobinCounter uint32 // HandleStreams主要是接收数据,并生成Stream对象,在调用下面的匿名函数来处理具体的业务逻辑. st.HandleStreams(func(stream *transport.Stream) { wg.Add(1) // 判断是否有设置numServerWorkers. if s.opts.numServerWorkers \u003e 0 { data := \u0026serverWorkerData{st: st, wg: \u0026wg, stream: stream} select { // 发送数据到指定的channel中. case s.serverWorkerChannels[atomic.AddUint32(\u0026roundRobinCounter, 1)%s.opts.numServerWorkers] \u003c- data: default: // If all stream workers are busy, fallback to the default code path. go func() { // 若所有workerchannel都在忙,则单独创建goroutine. s.handleStream(st, stream, s.traceInfo(st, stream)) wg.Done() }() } } else { go func() { // 没有限制worker大小,则单独创建goroutine. defer wg.Done() s.handleStream(st, stream, s.traceInfo(st, stream)) }() } }, func(ctx context.Context, method string) context.Context { if !EnableTracing { return ctx } tr := trace.New(\"grpc.Recv.\"+methodFamily(method), method) return trace.NewContext(ctx, tr) }) wg.Wait() } // HandleStreams receives incoming streams using the given handler. This is // typically run in a separate goroutine. // traceCtx attaches trace to ctx and returns the new context. 
func (t *http2Server) HandleStreams(handle func(*Stream), traceCtx func(context.Context, string) context.Context) { defer close(t.readerDone) for { t.controlBuf.throttle() // 读取HTTP/2协议中的frame数据. frame, err := t.framer.fr.ReadFrame() atomic.StoreInt64(\u0026t.lastRead, time.Now().UnixNano()) if err != nil { // 错误相关. ...... t.Close() return } // 针对不同类型的frame的处理,可以和之前的抓包对应起来. switch frame := frame.(type) { case *http2.MetaHeadersFrame: if t.operateHeaders(frame, handle, traceCtx) { t.Close() break } case *http2.DataFrame: t.handleData(frame) case *http2.RSTStreamFrame: t.handleRSTStream(frame) case *http2.SettingsFrame: t.handleSettings(frame) case *http2.PingFrame: t.handlePing(frame) case *http2.WindowUpdateFrame: t.handleWindowUpdate(frame) case *http2.GoAwayFrame: // TODO: Handle GoAway from the client appropriately. default: if logger.V(logLevel) { logger.Errorf(\"transport: http2Server.HandleStreams found unhandled frame type %v.\", frame) } } } } func (s *Server) handleStream(t transport.ServerTransport, stream *transport.Stream, trInfo *traceInfo) { // 解析方法名,如/HelloService/SayHello. sm := stream.Method() if sm != \"\" \u0026\u0026 sm[0] == '/' { sm = sm[1:] } pos := strings.LastIndex(sm, \"/\") if pos == -1 { // 错误处理. ...... return } // service服务名等于HelloService. service := sm[:pos] // 方法名等于SyaHello. method := sm[pos+1:] // 从已注册的service中查找. srv, knownService := s.services[service] if knownService { if md, ok := srv.methods[method]; ok { // 若方法存在,在调用processUnaryRPC.此时md已经指向注册时的Handler了,如SyaHello方法对应的_HelloService_SayHello_Handler. s.processUnaryRPC(t, stream, srv, md, trInfo) return } if sd, ok := srv.streams[method]; ok { // 调用流式处理. s.processStreamingRPC(t, stream, srv, sd, ","date":"2020-11-07","objectID":"/2020/11/07/grpc-server-call/:5:0","tags":["go","grpc"],"title":"gRPC系列之server端调用","uri":"/2020/11/07/grpc-server-call/"},{"categories":["Golang"],"content":"总结 通过ServerOption来设置自定义参数,最主要的包括grpc.Creds(用于设置服务端认证)和grpc.UnaryInterceptor(用于设置服务端拦截器). 在*.pb.go中主要是通过grpc.ServiceDesc来描述rpc接口的信息,最终调用会指向其Handler字段. 在整个处理过程中会涉及到的goroutine. 当Accept接收到一个新连接时就会启用一个goroutine,主要用来处理认证及HTTP/2相关的初始化. 接着会启用一个goroutine用来接收HTTP/2协议的数据. 每接收到一个完整请求包时,会再启用一个goroutine用来处理新的消息包.注意:此处如果设置了numServerWorkers,会优先使用workchannel. ","date":"2020-11-07","objectID":"/2020/11/07/grpc-server-call/:6:0","tags":["go","grpc"],"title":"gRPC系列之server端调用","uri":"/2020/11/07/grpc-server-call/"},{"categories":["Golang"],"content":"RPC RPC指远程过程调用(Remote Procedure Call),让远程服务调用更加简单、透明.服务调用者可以像调用本地接口一样调用远程的服务提供者,而不需要关心底层通信细节和调用过程,RPC框架负责底层的传输方式、序列化方式和通信细节. gRPC是一个高性能、开源和通用的RPC框架,面向服务端和移动端,特点如下: 支持多语言. 基于IDL文件定义服务,通过protoc工具生成指定语言的数据结构、服务端接口和客户端Stub. 通信协议基于HTTP/2设计,支持双向流、消息头压缩、单TCP的多路复用、服务端推送等特性.使得在移动端设备上更加省电和节省网络流量. 序列化支持Protocol Buffer和JSON. gRPC调用gRPC调用示例 \" gRPC调用 ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:1:0","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"protoc工具 先安装相关工具,可以用来直接生成go和grpc的代码. 安装protoc 安装protoc-gen-go 安装protoc-gen-go-grpc ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:2:0","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"IDL文件 生成hello.proto文件,内容如下: syntax = \"proto3\"; option go_package=\".;pb\"; message HelloRequest { string name = 1; } message HelloReply { string message = 1; } service HelloService { rpc SayHello (HelloRequest) returns (HelloReply); } syntax,定义proto的版本,支持proto2和proto3,proto3才支持grpc. 
go_package,定义生成的go文件的包名(package name). message,定义数据结构. service,定义服务,可包含多个rpc函数. ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:3:0","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"生成go语言代码 使用工具protoc来生成对应的go文件,命令protoc -I=./proto --go_out=plugins=grpc:./pb hello.proto. -I=./proto,表示proto文件所在的目录. --go_out,表示生成go语言的代码,且存放go文件的目录,默认是不会生成grpc的代码的,需要显式声明plugins=grpc. hello.proto,表示proto文件的名字. 在目录pb中生成文件hello.pb.go,文件里面包含了grpc相关代码. // Reference imports to suppress errors if they are not otherwise used. var _ context.Context var _ grpc.ClientConnInterface // This is a compile-time assertion to ensure that this generated file // is compatible with the grpc package it is being compiled against. const _ = grpc.SupportPackageIsVersion6 // HelloServiceClient is the client API for HelloService service. // // For semantics around ctx use and closing/ending streaming RPCs, please refer to https://godoc.org/google.golang.org/grpc#ClientConn.NewStream. type HelloServiceClient interface { SayHello(ctx context.Context, in *HelloRequest, opts ...grpc.CallOption) (*HelloReply, error) } type helloServiceClient struct { cc grpc.ClientConnInterface } func NewHelloServiceClient(cc grpc.ClientConnInterface) HelloServiceClient { return \u0026helloServiceClient{cc} } func (c *helloServiceClient) SayHello(ctx context.Context, in *HelloRequest, opts ...grpc.CallOption) (*HelloReply, error) { out := new(HelloReply) err := c.cc.Invoke(ctx, \"/HelloService/SayHello\", in, out, opts...) if err != nil { return nil, err } return out, nil } // HelloServiceServer is the server API for HelloService service. type HelloServiceServer interface { SayHello(context.Context, *HelloRequest) (*HelloReply, error) } // UnimplementedHelloServiceServer can be embedded to have forward compatible implementations. type UnimplementedHelloServiceServer struct { } func (*UnimplementedHelloServiceServer) SayHello(context.Context, *HelloRequest) (*HelloReply, error) { return nil, status.Errorf(codes.Unimplemented, \"method SayHello not implemented\") } func RegisterHelloServiceServer(s *grpc.Server, srv HelloServiceServer) { s.RegisterService(\u0026_HelloService_serviceDesc, srv) } func _HelloService_SayHello_Handler(srv interface{}, ctx context.Context, dec func(interface{}) error, interceptor grpc.UnaryServerInterceptor) (interface{}, error) { in := new(HelloRequest) if err := dec(in); err != nil { return nil, err } if interceptor == nil { return srv.(HelloServiceServer).SayHello(ctx, in) } info := \u0026grpc.UnaryServerInfo{ Server: srv, FullMethod: \"/HelloService/SayHello\", } handler := func(ctx context.Context, req interface{}) (interface{}, error) { return srv.(HelloServiceServer).SayHello(ctx, req.(*HelloRequest)) } return interceptor(ctx, in, info, handler) } var _HelloService_serviceDesc = grpc.ServiceDesc{ ServiceName: \"HelloService\", HandlerType: (*HelloServiceServer)(nil), Methods: []grpc.MethodDesc{ { MethodName: \"SayHello\", Handler: _HelloService_SayHello_Handler, }, }, Streams: []grpc.StreamDesc{}, Metadata: \"hello.proto\", } ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:4:0","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"服务端代码 // 监听tcp端口,用于接受客户端请求. lis, err := net.Listen(\"tcp\", fmt.Sprintf(\":%d\", *port)) if err != nil { fmt.Printf(\"failed to listen: %v\", err) return } // 创建gRPC服务实例对象. 
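// NewServer可接收若干ServerOption做定制:例如用grpc.Creds(...)配置服务端证书、用grpc.UnaryInterceptor(...)注册一元拦截器(见server端调用一文的总结);此处为最小示例,不传任何参数.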
grpcserver := grpc.NewServer() // 把server对象注册到gRPC服务中,server对象实现了HelloServiceServer接口. pb.RegisterHelloServiceServer(grpcserver, \u0026server{}) // 阻塞等待客户端连接,直到进程被终止或Stop函数被调用. err = grpcserver.Serve(lis) if err != nil { fmt.Printf(\"failed to server: %v\", err) } ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:5:0","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"客户端代码 // 创建一个连接与服务端进行通信. conn, err := grpc.Dial(*addr, grpc.WithInsecure()) if err != nil { fmt.Printf(\"failed to dail: %v\", err) return } // 关闭连接. defer conn.Close() // 创建HelloService的Client stub. client := pb.NewHelloServiceClient(conn) // 调用对应的服务方法. reply, err := client.SayHello(context.Background(), \u0026pb.HelloRequest{Name: \"zhou\"}) if err != nil { fmt.Printf(\"failed to sayhello: %v\", err) return } fmt.Println(reply) ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:6:0","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"调用分析 ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:7:0","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"wireshark抓包 服务端监听在9090端口,用wireshark来进行抓包. wireshark抓包抓包 \" wireshark抓包 可以看到前三行是tcp三次握手的报文,后续的全部都解析成了tcp协议,gRPC是基于http/2,需要手工修改Protocol为http/2. wireshark菜单栏–\u003e分析(A)–\u003e解码为(Decode As),在弹出的界面新增一行,然后修改\"当前\"列为HTTP2. 设置协议为http2设置http/2 \" 设置协议为http2 现在能正常解析为HTTP/2协议了,一次gRPC调用总览如下: gRPC调用总览HTTP/2协议 \" gRPC调用总览 从上图大体可以看出,gRPC调用过程分为:Magic(C-\u003eS) --\u003e SETTINGS(S-\u003eC) --\u003e SETTINGS(C-\u003eS) --\u003e SETTINGS(S-\u003eC) --\u003e SETTINGS,HEADERS,DATA(C-\u003eS) --\u003e WINDOW_UPDATE,PING(S-\u003eC) --\u003e HEADERS,DATA,HEADERS(S-\u003eC) --\u003e PING,WINDOW_UPDATE,PING(C-\u003eS) --\u003e PING(S-\u003eC) ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:7:1","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"Magic gRPC-MagicMagic \" gRPC-Magic Magic帧的主要作用是建立HTTP/2请求的前言.在HTTP/2协议中,要求两端都要发送连接前言,来最终确认所使用的协议,并确定HTTP/2连接的初始设置. 而Magic帧是客户端的前言之一,内容为PRI * HTTP/2.0\\r\\n\\r\\nSM\\r\\n\\r\\n,以确定启用HTTP/2连接. ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:7:2","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"SETTINGS gRPC-SETTINGSSETTINGS1 \" gRPC-SETTINGS 由服务端发送给客户端,主要设置Max Frame Size为16384字节,作用域是整个连接而非单一的流.也是服务端的连接前言. gRPC-SETTINGSSETTINGS2 \" gRPC-SETTINGS 由客户端发送给服务端,是客户端的连接前言(和Magic一起). gRPC-SETTINGSSETTINGS3 \" gRPC-SETTINGS 由服务端发送给客户端,在发送完前言后,客户端和服务端还需要有一步互相确认的动作,对应的就是带有ACK: True的帧. gRPC-SETTINGSSETTINGS4 \" gRPC-SETTINGS 由客户端发送给服务端,是带有ACK: True的帧. ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:7:3","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"HEADERS gRPC-HEADERSHEADERS1 \" gRPC-HEADERS 主要是存储和传播HTTP的表头信息. ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:7:4","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"DATA gRPC-DATADATA1 \" gRPC-DATA DATA是数据帧,可以看到请求的protobuf结构只有1个字段,该字段的值为zhou(可以参见客户端代码). 
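为了说明DATA帧里这段字节的由来,下面给出一个最小示意:用protobuf把HelloRequest{Name: \"zhou\"}序列化后打印十六进制(假设pb为上文protoc生成的包,导入路径需按实际module调整;这里使用github.com/golang/protobuf/proto).
package main
import (
    \"encoding/hex\"
    \"fmt\"
    \"github.com/golang/protobuf/proto\"
    pb \"yourmodule/pb\" // 假设:指向上文protoc生成的包,按实际module路径调整.
)
func main() {
    // 序列化HelloRequest{Name: \"zhou\"}.
    b, err := proto.Marshal(\u0026pb.HelloRequest{Name: \"zhou\"})
    if err != nil {
        panic(err)
    }
    // 预期输出0a047a686f75:
    // 0x0a = (字段号1 \u003c\u003c 3) | wire type 2(length-delimited).
    // 0x04 = 长度4,其后是\"zhou\"的ASCII字节7a 68 6f 75.
    fmt.Println(hex.EncodeToString(b))
}
gRPC发送时还会在这6个字节前加5字节的长度前缀(1字节压缩标志+4字节大端长度),因此理论上DATA帧的载荷共11字节,这与抓包中只看到一个name字段、值为zhou是一致的.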
","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:7:5","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"WINDOW_UPDATE gRPC-WINDOW_UPDATEWIN \" gRPC-WINDOW_UPDATE 主要是管理流控制窗口的大小. ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:7:6","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"HEADERS,DATA,HEADERS gRPC-DATARSP1 \" gRPC-DATA gRPC-DATARSP2 \" gRPC-DATA 服务端发送给客户端的响应,HEADERS frame记录的是HTTP响应状态(200 OK)和响应的内容格式(application/grpc). 响应的protobuf结构也是只有1个字段,字段的值是zhou. ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:7:7","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"PING 主要是判断当前连接是否仍然可用,也常用于计算往返时间. ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:7:8","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"流式模式 gRPC支持UnaryRPC(一元PRC)和StreamRPC(流式RPC). UnaryRPC,上面介绍的都是基于UnaryRPC的,该模式是一个请求对应一个响应. StreamRPC流式模式的请求和响应是多对多的,又分为三种类型: Server-side streaming RPC,服务端流式模式,即一个请求对应多个响应. Client-side streaming RPC,客户端流式模式,即多个请求对应一个响应. Bidirectional streaming RPC,双向流式模式,即多个请求对应多个响应. 流式模式需要用到关键字stream,如下proto文件. syntax = \"proto3\"; option go_package=\".;pb\"; message HelloRequest { string name = 1; } message HelloReply { string message = 1; } service HelloService { rpc SayHello (HelloRequest) returns (HelloReply); rpc ServerSayHello (HelloRequest) returns (stream HelloReply); rpc ClientSayHello (stream HelloRequest) returns (HelloReply); rpc BidirSayHello (stream HelloRequest) returns (stream HelloReply); } SayHello是一元RPC模式. ServerSayHello是服务端流式模式. ClientSayHello是客户端流式模式. BidirSayHello是双向流式模式. ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:8:0","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"参考资料 抓包gRPC的细节与分析 Hypertext Transfer Protocol Version 2 从实践到原理,带你参透 gRPC ","date":"2020-11-06","objectID":"/2020/11/06/grpc-first-introduction/:9:0","tags":["go","grpc"],"title":"gRPC系列之初识","uri":"/2020/11/06/grpc-first-introduction/"},{"categories":["Golang"],"content":"性能分析 Go语言项目中的性能分析主要有以下几个方面: CPU profile: CPU使用情况,按照一定频率去采集应用程序在CPU和寄存器上面的数据. Memory profile(Heap profile): 报告程序的内存使用情况. Block Profiling: 报告goroutines不在运行状态的情况,可用来分析和查找死锁等性能瓶颈. Goroutine Profiling: 报告goroutines的使用情况,有哪些goroutines,调用关系是怎么样的? ","date":"2020-11-03","objectID":"/2020/11/03/golang-pprof/:1:0","tags":["go"],"title":"go性能分析","uri":"/2020/11/03/golang-pprof/"},{"categories":["Golang"],"content":"数据采集 Go语言内置了获取程序运行数据的工具,包括两个标准库: runtime/pprof: 采集工具型应用的运行数据进行分析 import \"runtime/pprof\" // 开启CPU性能分析. pprof.StartCPUProfile(w io.Writer) // 关闭CPU性能分析. pprof.StopCPUProfile() // 记录程序堆栈信息. pprof.WriteHeapProfile(w io.Writer) pprof开启后,每隔一段时间(10ms)就会收集下当前的堆栈信息,获取各个函数占用的CPU以及内存资源,最后通过采样数据分析,形成性能分析报告. net/http/pprof: 采集服务型应用的运行数据进行分析 // 在web server端导入pprof库. import _ \"net/http/pprof\" // 如果使用自定义Mux,需要手动注册路由规则. 
r.HandleFunc(\"/debug/pprof\", pprof.Index) r.HandleFunc(\"/debug/pprof/cmdline\", pprof.Cmdline) r.HandleFunc(\"/debug/pprof/profile\", pprof.Profile) r.HandleFunc(\"/debug/pprof/symbol\", pprof.Symbol) r.HandleFunc(\"/debug/pprof/trace\", pprof.Trace) // 如果使用gin框架,推荐使用'github.com/DeanThompson/ginpprof' http服务会多出/debug/pprof的endpoint: /debug/pprof/profile: 访问这个链接会自动进行CPU Profiling,持续30s,并生成文件供下载. /debug/pprof/heap: Memory Profiling. /debug/pprof/block: Block Profiling. /debug/pprof/goroutines: 运行的goroutines列表以及调用关系. profiling数据是动态的,要想获得有效的数据,请保证应用处于较大的负载,否则如果处于空闲状态,得到的结果可能没有任何意义 ","date":"2020-11-03","objectID":"/2020/11/03/golang-pprof/:2:0","tags":["go"],"title":"go性能分析","uri":"/2020/11/03/golang-pprof/"},{"categories":["Golang"],"content":"数据分析 ","date":"2020-11-03","objectID":"/2020/11/03/golang-pprof/:3:0","tags":["go"],"title":"go性能分析","uri":"/2020/11/03/golang-pprof/"},{"categories":["Golang"],"content":"go tool pprof命令 可以通过命令go tool pprof --help查看命令的具体使用方法. $ go tool pprof --help usage: Produce output in the specified format. pprof \u003cformat\u003e [options] [binary] \u003csource\u003e ... Omit the format to get an interactive shell whose commands can be used to generate various views of a profile pprof [options] [binary] \u003csource\u003e ... Omit the format and provide the \"-http\" flag to get an interactive web interface at the specified host:port that can be used to navigate through various views of a profile. pprof -http [host]:[port] [options] [binary] \u003csource\u003e ... Details: Output formats (select at most one): ","date":"2020-11-03","objectID":"/2020/11/03/golang-pprof/:3:1","tags":["go"],"title":"go性能分析","uri":"/2020/11/03/golang-pprof/"},{"categories":["Golang"],"content":"图形化 安装graphviz,windows是还需要把安装目录下的bin文件夹添加到PATH环境变量中. 使用dot -version命令查看graphviz安装是否成功. 安装go-torch,使用go get -v github.com/uber/go-torch命令安装. 当go-torch不带任何参数时,会默认从http://localhost:8080/debug/pprof/profile获取profiling数据. $ go-torch --help Usage: go-torch [options] [binary] \u003cprofile source\u003e pprof Options: -u, --url= Base URL of your Go program (default: http://localhost:8080) --suffix= URL path of pprof profile (default: /debug/pprof/profile) -b, --binaryinput= File path of previously saved binary profile. (binary profile is anything accepted by https://golang.org/cmd/pprof) --binaryname= File path of the binary that the binaryinput is for, used for pprof inputs -t, --seconds= Number of seconds to profile for (default: 30) --pprofArgs= Extra arguments for pprof 安装perl,FlameGraph需要perl支持. 安装FlameGraph,使用git clone https://github.com/brendangregg/FlameGraph.git命令安装. windows平台下,需要把go-torch/render/flamegraph.go文件中的GenerateFlameGraph按如下方式修改,然后在go-torch目录下执行go install命令. // GenerateFlameGraph runs the flamegraph script to generate a flame graph SVG. func GenerateFlameGraph(graphInput []byte, args ...string) ([]byte, error) { flameGraph := findInPath(flameGraphScripts) if flameGraph == \"\" { return nil, errNoPerlScript } if runtime.GOOS == \"windows\" { return runScript(\"perl\", append([]string{flameGraph}, args...), graphInput) } return runScript(flameGraph, args, graphInput) } 安装go-wrk,使用go get -v https://github.com/adjust/go-wrk命令安装. $ go-wrk --help Usage of go-wrk: -CA string A PEM eoncoded CA's certificate file. 
(default \"someCertCAFile\") -H string the http headers sent separated by '\\n' (default \"User-Agent: go-wrk 0.1 benchmark\\nContent-Type: text/html;\") -b string the http request body -c int the max numbers of connections used (default 100) -cert string A PEM eoncoded certificate file. (default \"someCertFile\") -d string dist mode -f string json config file -i TLS checks are disabled -k if keep-alives are disabled (default true) -key string A PEM encoded private key file. (default \"someKeyFile\") -m string the http request method (default \"GET\") -n int the total number of calls processed (default 1000) -p string the http request body data file -r in the case of having stream or file in the response, it reads all response body to calculate the response size -s string if specified, it counts how often the searched string s is contained in the responses -t int the numbers of threads used (default 1) 使用方式 使用go-wrk压测,使用命令go-wrk -n 50000 http://127.0.0.1:8080/*/*在某个接口进行压测 使用go-torch收集数据,使用命令go-torch -u http://127.0.0.1:8080 -t 30,30秒之后终端会出现如下提示: Writing svg to torch.svg,然后使用浏览器打开torch.svg,就能看到火焰图. perf Linux下使用命令perf record -a -g -p pid -- sleep 30,对指定进程采样30秒. 使用命令perf script -i ../perf.data | ./stackcollapse-perf.pl --all | ./flamegraph.pl \u003e app.svg,其中../perf.data为perf record生成的采样数据,然后切换到FlameGraph的目录来执行上述命令,就能得到火焰图了(stackcollapse-perf.pl脚本是合并调用栈信息,flamegraph.pl脚本是生成火焰图). ","date":"2020-11-03","objectID":"/2020/11/03/golang-pprof/:3:2","tags":["go"],"title":"go性能分析","uri":"/2020/11/03/golang-pprof/"},{"categories":["Golang"],"content":"参考资料 Go pprof性能调优. ","date":"2020-11-03","objectID":"/2020/11/03/golang-pprof/:4:0","tags":["go"],"title":"go性能分析","uri":"/2020/11/03/golang-pprof/"},{"categories":["Linux"],"content":"tcp建连接三次握手 客户端发送SYN到服务器发起握手. 服务器收到SYN后回复SYN+ACK给客户端. 客户端收到SYN+ACK后,回复服务器一个ACK表示收到了,此时客户端的端口状态已经是established. tcp握手的详细过程,图片来源 tcp连接的流程tcp建立连接的流程和队列 \" tcp连接的流程 客户端作为主动发起连接方,首先要发送SYN(Synchronize Sequence Nubers,同步序列号)包,若客户端长时间收不到服务端的ACK报文,客户端就会重发SYN包,重传次数是受内核参数/proc/sys/net/ipv4/tcp_syn_retries控制. # 系统为centos7.2.1511,默认为6次. $ cat /proc/sys/net/ipv4/tcp_syn_retries 6 通常第一次超时重传为1秒,第二次超时重传为2秒,第三次超时重传为4秒,第四次为8秒,第五次为16秒,第五次超时重传之后还会再等待32秒,如果服务端仍然没有回应ACK,客户端就会终止三次握手.总耗时为63秒. ","date":"2020-11-03","objectID":"/2020/11/03/tcp-three-way-handshake/:1:0","tags":["linux","tcp"],"title":"tcp连接过程","uri":"/2020/11/03/tcp-three-way-handshake/"},{"categories":["Linux"],"content":"半连接队列 syns queue就是半连接队列,server收到client的syn后会把连接信息放入该队列. 半连接队列的大小为max(64, /proc/sys/net/ipv4/tcp_max_syn_backlog). # 在ubuntu18.04机器上为128. $ cat /proc/sys/net/ipv4/tcp_max_syn_backlog 128 syn floods攻击就是针对半连接队列的,攻击方不停的建立连接,收到server的syn+ack就丢弃什么也不做,导致server的半连接队列满而其它正常连接无法进来. ","date":"2020-11-03","objectID":"/2020/11/03/tcp-three-way-handshake/:2:0","tags":["linux","tcp"],"title":"tcp连接过程","uri":"/2020/11/03/tcp-three-way-handshake/"},{"categories":["Linux"],"content":"全连接队列 accept queue就是全连接队列,server再收到client的ack后会把连接信息放入该队列. 全连接队列的大小为min(backlog, /proc/sys/net/core/somaxconn). backlog是指listen(int sockfd, int backlog)函数中的backlog大小. # 在ubuntu18.04机器上为128. $ cat /proc/sys/net/core/somaxconn 128 ","date":"2020-11-03","objectID":"/2020/11/03/tcp-three-way-handshake/:3:0","tags":["linux","tcp"],"title":"tcp连接过程","uri":"/2020/11/03/tcp-three-way-handshake/"},{"categories":["Linux"],"content":"如何观察队列溢出 netstat -s # 查看半连接队列溢出. sky-HP# netstat -s | egrep \"SYNs to LISTEN\" 667399 SYNs to LISTEN sockets ignored # 上面看到的667399就是半连接队列溢出次数,隔几秒执行下,如果这个数字一直在变大肯定就是半连接队列溢出了. # 查看全连接队列溢出. 
sky-HP# netstat -s | grep \"overflowed\" 667399 times the listen queue of a socket overflowed ss -lntp # l表示处于LISTEN状态 n表示不反向解析 t表示tcp协议 p表示进程信息. sky-HP# ss -lntp State Recv-Q Send-Q Local Address:Port Peer Address:Port LISTEN 0 128 0.0.0.0:8388 0.0.0.0:* users:((\"haproxy\",pid=1465,fd=6)) LISTEN 0 128 127.0.0.53%lo:53 0.0.0.0:* users:((\"systemd-resolve\",pid=841,fd=13)) LISTEN 0 128 0.0.0.0:22 0.0.0.0:* users:((\"sshd\",pid=1476,fd=3)) LISTEN 0 5 127.0.0.1:631 0.0.0.0:* users:((\"cupsd\",pid=5112,fd=7)) LISTEN 0 128 127.0.0.1:1080 0.0.0.0:* users:((\"trojan\",pid=1940,fd=6)) LISTEN 0 128 127.0.0.1:8123 0.0.0.0:* users:((\"polipo\",pid=1461,fd=4)) Send-Q就是表示全连接队列的允许最大长度,Recv-Q表示当前全连接队列的长度.这是套接字处于LISTEN状态时. 需要注意,当套接字处于Established状态时Recv-Q表示套接字缓冲区还没有被应用取走的字节数(接收队列长度),Send-Q表示还没有被远端主机确认的字节数(发送队列长度). ","date":"2020-11-03","objectID":"/2020/11/03/tcp-three-way-handshake/:4:0","tags":["linux","tcp"],"title":"tcp连接过程","uri":"/2020/11/03/tcp-three-way-handshake/"},{"categories":["Linux"],"content":"溢出行为的控制 半连接队列控制 当半连接队列满时,只能丢弃连接? 并不是这样的,Linux提供了syncookies功能,可以在不适用半连接队列的情况下成功建立连接. syncookies原理:服务器会根据当前的状态计算出一个值,放入己方的SYN+ACK报文中发送给客户端,当客户端返回ACK报文时,取出该值验证,如果合法,就认为连接建立成功. syncookies功能由内核参数/proc/sys/net/ipv4/tcp_syncookies来控制. 值为0时,表示关闭该功能. 值为1时,表示仅当半连接队列满时,再启用该功能.1为默认值,默认开启. 值为2时,表示无条件开启该功能. 全连接队列控制 内核参数/proc/sys/net/ipv4/tcp_abort_on_overflow决定当溢出后系统如何处理. 为0时表示server扔掉client发过来的ack.server会认为连接还未建立.server过段时间会继续向client发送syn+ack.重传会经历1、2、4、8、16、32秒(若重传为5次),如果服务端仍没有收到ack,才会关闭连接,总共需63秒. 内核参数/proc/sys/net/ipv4/tcp_synack_retries控制重试次数.如果client超时时间比较短,client就容易异常. 为1时表示server发送一个reset包给client,表示废掉这个握手过程和连接.client会看到connection reset by peer的错误. # 在ubuntu18.04机器上的默认值. sky-HP# cat /proc/sys/net/ipv4/tcp_abort_on_overflow 0 sky-HP# cat /proc/sys/net/ipv4/tcp_synack_retries 5 可以通过sysctl -w来修改这些内核参数,重启之后修改无效. sky-HP# sysctl -w net.ipv4.tcp_synack_retries=2 net.ipv4.tcp_synack_retries = 2 sky-HP# cat /proc/sys/net/ipv4/tcp_synack_retries 2 sky-HP# sysctl -w net.ipv4.tcp_abort_on_overflow=1 net.ipv4.tcp_abort_on_overflow = 1 sky-HP# cat /proc/sys/net/ipv4/tcp_abort_on_overflow 1 修改配置文件/etc/sysctl.conf,然后再sysctl -p来触发,重启之后仍生效. ","date":"2020-11-03","objectID":"/2020/11/03/tcp-three-way-handshake/:5:0","tags":["linux","tcp"],"title":"tcp连接过程","uri":"/2020/11/03/tcp-three-way-handshake/"},{"categories":["Linux"],"content":"绕过三次握手 三次握手建立连接的后果就是,数据请求必须在一个RTT(从客户端到服务器一个往返时间)后才能发送. 在Linux3.7内核版本之后,提供了TCP Fast Open功能,可以减少TCP连接建立的时延. Linux TCP Fast OpenFast Open \" Linux TCP Fast Open 客户端首次建立连接时仍然需要三次握手. 客户端发送SYN报文,报文包含Fast Open选项,且该选项的Cookie为空,表名客户端请求Fast Open Cookie. 支持TCP Fast Open的服务器生成Cookie,并置于SYN-ACK数据包中的Fast Open选项发回给客户端. 客户端收到SYN-ACK后,本地缓存Fast Open选项中的Cookie. 当客户端再次与服务器建立连接时就可以利用Cookie来绕过三次握手过程. 客户端发送SYN报文,该报文包含之前缓存的Cookie及业务数据报文. 支持TCP Fast Open的服务器会对收到的Cookie进行校验. 如果合法,服务器将在SYN-ACK报文中对SYN和业务数据进行确认,服务器随后把业务数据传递给应用程序; 如果不合法,服务器将丢弃SYN报文中包含的业务数据,且在SYN-ACK报文中只确认SYN的序列号. 若服务器接受了SYN报文中的业务数据,即在握手完成之前发送了数据,这就减少了握手带来的一个RTT的时间消耗. 客户端将发送ACK确认服务器发回的SYN及数据.若客户端在初始的SYN报文中的数据未被确认,则客户端会重新发送这些数据. 此后的TCP连接的数据传输过程和非TCP Fast Open的正常情况是一致的. TCP Fast Open功能受内核参数/proc/sys/net/ipv4/tcp_fastopen控制. 值为0时,表示关闭该功能. 值为1时,作为客户端使用Fast Open功能. 值为2时,作为服务端使用Fast Open功能. 值为3时,作为客户端和服务端都可以使用Fast Open功能. ","date":"2020-11-03","objectID":"/2020/11/03/tcp-three-way-handshake/:6:0","tags":["linux","tcp"],"title":"tcp连接过程","uri":"/2020/11/03/tcp-three-way-handshake/"},{"categories":["Linux"],"content":"参考文章 TCP 三次握手原理,你真的理解吗? 关于 Linux 网络,你必须知道这些 看完这篇,再不懂TCP我也没办法了 TCP SOCKET中backlog参数的用途是什么? 
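补充前面\"绕过三次握手\"一节:除了把/proc/sys/net/ipv4/tcp_fastopen设为2或3外,服务端程序一般还要在监听socket上开启TCP_FASTOPEN选项才能真正用上该特性.下面是一个Go监听端的最小示意(仅为示例,借助golang.org/x/sys/unix设置socket选项,端口9090和队列长度256都是随意取的示例值):
package main
import (
    \"context\"
    \"log\"
    \"net\"
    \"syscall\"
    \"golang.org/x/sys/unix\"
)
func main() {
    lc := net.ListenConfig{
        Control: func(network, address string, c syscall.RawConn) error {
            var serr error
            if err := c.Control(func(fd uintptr) {
                // 256为等待TFO握手完成的连接队列长度,仅为示例值.
                serr = unix.SetsockoptInt(int(fd), unix.IPPROTO_TCP, unix.TCP_FASTOPEN, 256)
            }); err != nil {
                return err
            }
            return serr
        },
    }
    ln, err := lc.Listen(context.Background(), \"tcp\", \":9090\")
    if err != nil {
        log.Fatal(err)
    }
    defer ln.Close()
    // 之后照常Accept即可;TFO生效时,随SYN携带的业务数据会在连接建立后直接可读.
}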
","date":"2020-11-03","objectID":"/2020/11/03/tcp-three-way-handshake/:7:0","tags":["linux","tcp"],"title":"tcp连接过程","uri":"/2020/11/03/tcp-three-way-handshake/"},{"categories":["C++"],"content":"问题来源 源于如下代码: int main(int argc, char* argv[]){ int i = 0; int arr[3] = {0}; for(; i\u003c=3; i++){ arr[i] = 0; printf(\"hello world\\n\"); } return 0; } 这段代码的运行结果并非是打印三行“hello word”,而是会无限打印“hello word”,这是为什么? 当看到这段后,脑海中最直观的感受: 函数栈上的地址是从大到小的,局部变量i先入栈,数组arr后入栈。假设arr的地址为0x0,则arr[0]的地址为0x0,arr[1]的地址为0x4,arr[2]的地址为0x8,i的地址为0xc。所以在循环中访问到arr[3]时的地址正好对应到变量i上,把i的值修改为0,导致出现死循环。 需要进行验证下,首先在vs2013上测试,发现和想象的完全不一样。 只打印了四行“hello word”,然后程序崩溃,显示“Run-Time Check Failure #2 - Stack around the variable ‘arr’ was corrupted”。 然后修改下代码把变量i和arr的地址都打印出来,可以看到变量i的地址比arr的地址小,这是怎么回事?怎么参数入栈的顺序和变量声明的顺序不一样了? 接着又在x86-64位centos6上测试,采用“gcc demo.c -o demo”编译,运行发现和想象的是一样的,无限打印“hello world”。 问题: 1.参数入栈的顺序和变量声明的顺序怎么不一致? 2.在linux下表现怎么和windows不一样? ","date":"2020-11-03","objectID":"/2020/11/03/c-variables-function-stack/:1:0","tags":["c++"],"title":"局部变量在函数栈上的顺序分析","uri":"/2020/11/03/c-variables-function-stack/"},{"categories":["C++"],"content":"问题分析 以前从来没关注过局部变量的声明顺序和入栈顺序之前的关系,首先来理理为什么在windows下不一致,而在linux下是一致的。 在linux下使用的编译器是GCC,主要参考的是GCC 中的编译器堆栈保护技术这篇文章,文中提到了GCC中三个与堆栈保护有关的选项。 -fstack-protector,启用堆栈保护,不过只为局部变量中含有 char 数组的函数插入保护代码 -fstack-protector-all,启用堆栈保护,为所有函数插入保护代码 -fno-stack-protector,禁用堆栈保护 在linux下重新编译下,gcc demo.c -fstack-protector -o demo,发现运行时只打印了四次“hello world”,且变量i的地址也是比arr的地址小,linux下的现象已和windows一样了。 编译时如果开启了优化选项(O2或O3),会默认启用堆栈保护。 当启用堆栈保护后,局部变量的顺序被重新组织了。这样做的目的主要是为了防止溢出攻击,具体可以详读上面提到的那篇文章。 局部变量的顺序的重组的规则如下,主要适用于GCC编译器,VS的处理不一样(主要参考C语言局部变量在内存栈中的顺序这篇文章,但结论有所修正): 内存由高到低优先分配给占位8字节、4字节、2字节、1字节的数据类型 同总占位的类型按定义变量的先后顺序内存地址会增加 在规则2前提下,定义数组不会和同总数据类型混占内存 数据类型占位说明(64位机器下): 8字节:double、long long int、long int(该类型在64位windows下位4字节,在linux x86/ppc下都是8字节) 4字节:int、float、unsigned int 2字节:short 、unsigned short 1字节:char 、unsigned char 例如,分别定义下列变量,内存地址中由高到低分别为: double \u003c int \u003c short \u003c char 参考如下代码: int main(int argc, const char *argv[]) { char char_a; short short_a; int int_a; float float_a; double double_a; unsigned int uint_a; long int lint_a; long long int dlint_a; printf(\" \u0026char_a : %p\\n\",\u0026char_a); printf(\" \u0026short_a : %p\\n\",\u0026short_a); printf(\" \u0026int_a : %p\\n\",\u0026int_a); printf(\" \u0026float_a : %p\\n\",\u0026float_a); printf(\"\u0026double_a : %p\\n\",\u0026double_a); printf(\" \u0026unsigned_int_a : %p\\n\",\u0026uint_a); printf(\" \u0026long_int_a : %p\\n\",\u0026lint_a); printf(\"\u0026long_long_int_a : %p\\n\",\u0026dlint_a); int i = 0; printf(\"address-index: %p\\n\", \u0026i); int arr[3] = {0}; for (; i \u003c 3; i++) { arr[i] = 0; printf(\"address-arr[%d]: %p\\n\", i, \u0026(arr[i])); } return 0; } 启用堆栈保护选项,结果如下: \u0026char_a : 0x7fff14d57fa5 --在最低的低地址位 \u0026short_a : 0x7fff14d57fa6 \u0026int_a : 0x7fff14d57fa8 --int_a、float_a、uint_a、i 4个同大小的变量地址在一起(与声明顺序相反) \u0026float_a : 0x7fff14d57fac \u0026double_a : 0x7fff14d57fb8 --double_a、lint_a、llint_a 3个同大小的变量地址在一起(与声明顺序相反) \u0026uint_a : 0x7fff14d57fb0 \u0026lint_a : 0x7fff14d57fc0 \u0026llint_a : 0x7fff14d57fc8 address-i: 0x7fff14d57fb4 address-arr[0]: 0x7fff14d57fd0 --数组地址在高位上 address-arr[1]: 0x7fff14d57fd4 address-arr[2]: 0x7fff14d57fd8 从以上结果可以验证上面的规则。针对规则三,对于int类型的数组地址在高位,非数组变量在低位。double类型的数组也是类似的(注意上面的提到的文章是相反的结论) ","date":"2020-11-03","objectID":"/2020/11/03/c-variables-function-stack/:2:0","tags":["c++"],"title":"局部变量在函数栈上的顺序分析","uri":"/2020/11/03/c-variables-function-stack/"},{"categories":["C++"],"content":"问题总结 
堆栈保护技术,主要是编译器为了防止溢出攻击而发展出来的技术,GCC有相应的编译选项可以开启和关闭。 局部变量的入栈规则,不启用堆栈保护技术时,按照声明的顺序入栈。启用堆栈保护技术时,是按照三个规则来进行重组的。 GCC默认是没有开启堆栈保护选项的,但如果启用了优化选项,堆栈保护选项会自动启用。 ","date":"2020-11-03","objectID":"/2020/11/03/c-variables-function-stack/:3:0","tags":["c++"],"title":"局部变量在函数栈上的顺序分析","uri":"/2020/11/03/c-variables-function-stack/"},{"categories":["C++"],"content":"问题扩展 如果数组arr的长度是4,如下代码,在64位x86上还能无限打印“hello world”吗?(gcc demo.c -o demo编译) int main(int argc, char* argv[]){ int i = 0; int arr[4] = {0}; for(; i\u003c=4; i++){ arr[i] = 0; printf(\"hello world\\n\"); } return 0; } 是不会无限打印的,这涉及到8字节对齐的问题。 数组arr[4]刚好满足8字节对齐,在栈中i和arr是不会连续存放的(暂不清楚缘由),所以越界是不会访问到i的。 数组arr[3]是不满足8字节对齐,把变量i放到一起刚好满足,编译器就会把i和arr存放到一起。 如果数组arr的长度是5或7列,结果又是如何的? ","date":"2020-11-03","objectID":"/2020/11/03/c-variables-function-stack/:4:0","tags":["c++"],"title":"局部变量在函数栈上的顺序分析","uri":"/2020/11/03/c-variables-function-stack/"},{"categories":["Golang"],"content":"起因 公司有个公共的加解密库,供所有后端C++服务调用的,但最近要使用Go来实现个服务需要用到加解密,而Go并没有提供AES-256-ECB的加解密库,所以决定用cgo来调用这个公共的加解密库. ","date":"2020-11-03","objectID":"/2020/11/03/cgo/:1:0","tags":["go","cgo"],"title":"记录cgo调用C实现的加解密静态库中遇到的问题","uri":"/2020/11/03/cgo/"},{"categories":["Golang"],"content":"Window 在window下加解密库是提供的DLL,用Go的syscall.NewLazyDLL可以非常方便的加载DLL,window下基本没有遇到障碍. package main import ( \"fmt\" \"syscall\" \"unsafe\" ) // MICrypt 接口. type MICrypt struct { MIFreeSafeHandle *syscall.LazyProc MIGetDecryptDataLen *syscall.LazyProc MIGetEncryptDataLen *syscall.LazyProc MIGetSafeHandle *syscall.LazyProc MILoad *syscall.LazyProc MITransDecrypt *syscall.LazyProc MITransEncrypt *syscall.LazyProc } func main() { lib := syscall.NewLazyDLL(\"crypto64.dll\") c := \u0026MICrypt{ MIGetSafeHandle: lib.NewProc(\"MIGetSafeHandle\"), MIFreeSafeHandle: lib.NewProc(\"MIFreeSafeHandle\"), MIGetDecryptDataLen: lib.NewProc(\"MIGetDecryptDataLen\"), MIGetEncryptDataLen: lib.NewProc(\"MIGetEncryptDataLen\"), MILoad: lib.NewProc(\"MILoad\"), MITransDecrypt: lib.NewProc(\"MITransDecrypt\"), MITransEncrypt: lib.NewProc(\"MITransEncrypt\"), } // 待加密字符串. msg := []byte(\"I am test trans crypto!\") h, _, _ := c.MIGetSafeHandle.Call() c.MILoad.Call(uintptr(0), 0, h) var iLen int32 var srcLen int = len(msg) c.MIGetEncryptDataLen.Call(uintptr(unsafe.Pointer(\u0026iLen)), uintptr(unsafe.Pointer(\u0026msg[0])), uintptr(srcLen), h) buf := make([]byte, iLen) // 加密. c.MITransEncrypt.Call(uintptr(unsafe.Pointer(\u0026buf[0])), uintptr(iLen), uintptr(unsafe.Pointer(\u0026msg[0])), uintptr(srcLen), h) fmt.Println(buf) dstLen := len(buf) var iDLen int32 c.MIGetDecryptDataLen.Call(uintptr(unsafe.Pointer(\u0026iDLen)), uintptr(unsafe.Pointer(\u0026buf[0])), uintptr(dstLen), h) // 再解密. newSrc := make([]byte, iDLen) c.MITransDecrypt.Call(uintptr(unsafe.Pointer(\u0026newSrc[0])), uintptr(iDLen), uintptr(unsafe.Pointer(\u0026buf[0])), uintptr(dstLen), h) fmt.Println(string(newSrc)) c.MIFreeSafeHandle.Call(h) } ","date":"2020-11-03","objectID":"/2020/11/03/cgo/:2:0","tags":["go","cgo"],"title":"记录cgo调用C实现的加解密静态库中遇到的问题","uri":"/2020/11/03/cgo/"},{"categories":["Golang"],"content":"Linux 在Linux下加解密库提供的是静态库,调用方式完全不同于windows,碰到很多问题. 
","date":"2020-11-03","objectID":"/2020/11/03/cgo/:3:0","tags":["go","cgo"],"title":"记录cgo调用C实现的加解密静态库中遇到的问题","uri":"/2020/11/03/cgo/"},{"categories":["Golang"],"content":"不支持C++中的引用\u0026 # aesecb ./aesecb.go:27:2: could not determine kind of name for C.MIGetDecryptDataLen ./aesecb.go:20:2: could not determine kind of name for C.MIGetEncryptDataLen cgo: gcc errors for preamble: In file included from ./aesecb.go:6:0: /home/sky/code/imp/2nd/crypto/include/ISafeInterface.h:111:50: error: expected ';', ',' or ')' before '\u0026' token _DLL_EXP_API int32_t MIGetEncryptDataLen(int32_t \u0026iRevLen, const char *pData, int32_t iLen, intptr_t pSafeHandle); 解决方案 把引用修改为指针 _DLL_EXP_API int32_t MIGetEncryptDataLen(int32_t* iRevLen, const char *pData, int32_t iLen, intptr_t pSafeHandle); ","date":"2020-11-03","objectID":"/2020/11/03/cgo/:3:1","tags":["go","cgo"],"title":"记录cgo调用C实现的加解密静态库中遇到的问题","uri":"/2020/11/03/cgo/"},{"categories":["Golang"],"content":"undefined reference $ go build -x WORK=/tmp/go-build163613860 mkdir -p $WORK/b001/ cd /home/sky/go/path/src/aesecb CGO_LDFLAGS='\"-g\" \"-O2\" \"-L/home/sky/code/imp/2nd/cryptogo/lib/Linux_x86_64\" \"-lmism\" \"-lstdc++\"' /home/sky/go/go1.14/pkg/tool/linux_amd64/cgo -objdir $WORK/b001/ -importpath aesecb -- -I/h ome/sky/code/imp/2nd/cryptogo/include -I $WORK/b001/ -g -O2 ./aesecb.gocd $WORK gcc -fno-caret-diagnostics -c -x c - -o /dev/null || true gcc -Qunused-arguments -c -x c - -o /dev/null || true gcc -fdebug-prefix-map=a=b -c -x c - -o /dev/null || true gcc -gno-record-gcc-switches -c -x c - -o /dev/null || true cd $WORK/b001 TERM='dumb' gcc -I /home/sky/go/path/src/aesecb -fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=$WORK/b001=/tmp/go-build -gno-record-gcc-switches -I/home/sky/code/imp/2nd/cryptogo /include -I ./ -g -O2 -o ./_x001.o -c _cgo_export.cTERM='dumb' gcc -I /home/sky/go/path/src/aesecb -fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=$WORK/b001=/tmp/go-build -gno-record-gcc-switches -I/home/sky/code/imp/2nd/cryptogo /include -I ./ -g -O2 -o ./_x002.o -c aesecb.cgo2.cTERM='dumb' gcc -I /home/sky/go/path/src/aesecb -fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=$WORK/b001=/tmp/go-build -gno-record-gcc-switches -I/home/sky/code/imp/2nd/cryptogo /include -I ./ -g -O2 -o ./_cgo_main.o -c _cgo_main.ccd /home/sky/go/path/src/aesecb TERM='dumb' gcc -I . 
-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=$WORK/b001=/tmp/go-build -gno-record-gcc-switches -o $WORK/b001/_cgo_.o $WORK/b001/_cgo_main.o $WORK/b001/_x00 1.o $WORK/b001/_x002.o -g -O2 -L/home/sky/code/imp/2nd/cryptogo/lib/Linux_x86_64 -lmism -lstdc++# aesecb /tmp/go-build163613860/b001/_x002.o: In function `_cgo_2c40edecf6fd_Cfunc_MIGetDecryptDataLen': /tmp/go-build/cgo-gcc-prolog:69: undefined reference to `MIGetDecryptDataLen' /tmp/go-build163613860/b001/_x002.o: In function `_cgo_2c40edecf6fd_Cfunc_MIGetEncryptDataLen': /tmp/go-build/cgo-gcc-prolog:92: undefined reference to `MIGetEncryptDataLen' /tmp/go-build163613860/b001/_x002.o: In function `_cgo_2c40edecf6fd_Cfunc_MIGetSafeHandle': /tmp/go-build/cgo-gcc-prolog:109: undefined reference to `MIGetSafeHandle' /tmp/go-build163613860/b001/_x002.o: In function `_cgo_2c40edecf6fd_Cfunc_MILoad': /tmp/go-build/cgo-gcc-prolog:131: undefined reference to `MILoad' /tmp/go-build163613860/b001/_x002.o: In function `_cgo_2c40edecf6fd_Cfunc_MITransDecrypt': /tmp/go-build/cgo-gcc-prolog:156: undefined reference to `MITransDecrypt' /tmp/go-build163613860/b001/_x002.o: In function `_cgo_2c40edecf6fd_Cfunc_MITransEncrypt': /tmp/go-build/cgo-gcc-prolog:181: undefined reference to `MITransEncrypt' /tmp/go-build163613860/b001/_x002.o: In function `_cgo_2c40edecf6fd_Cfunc_MIFreeSafeHandle': /tmp/go-build/cgo-gcc-prolog:49: undefined reference to `MIFreeSafeHandle' collect2: error: ld returned 1 exit status 由于在编译加解密库mism时,是使用的g++编译器,编译出来的函数名会加上修饰符,导致cgo找不到对应的函数 # 查看符号表 $ nm libmism.a ... 0000000000000490 T _Z12MIMD5DecryptPcRiPKci 000000000000040f T _Z12MIMD5EncryptPcRiPKci 000000000000031f T _Z14MITransDecryptPciPKcil 00000000000002c9 T _Z14MITransEncryptPciPKcil ... 可以看到函数名已经发生变化了. 解决方案 修改代码以C方式编译 #ifdef __cplusplus extern \"C\" { #endif ...... _DLL_EXP_API intptr_t MIGetSafeHandle(); ...... #ifdef __cplusplus } #endif 再次编译后,查看符号表,可以看到函数名没有发生变化. $ nm libmism.a ...... 0000000000000040 T MIFreeSafeHandle 00000000000001d0 T MIGetDecryptDataLen 0000000000000190 T MIGetEncryptDataLen 0000000000000000 T MIGetSafeHandle 0000000000000070 T MILoad 0000000000000150 T MITransDecrypt 0000000000000110 T MITransEncrypt ...... 
","date":"2020-11-03","objectID":"/2020/11/03/cgo/:3:2","tags":["go","cgo"],"title":"记录cgo调用C实现的加解密静态库中遇到的问题","uri":"/2020/11/03/cgo/"},{"categories":["Golang"],"content":"库依赖 继续编译,又报了undefined reference $ go build aesecb.go # command-line-arguments ./libmism.a(AESEncryptHandle.cpp.o): In function `AESEncryptHandle::Encrypt(char*, int\u0026, char const*, int const\u0026)': AESEncryptHandle.cpp:(.text+0x135): undefined reference to `EVP_aes_256_ecb' AESEncryptHandle.cpp:(.text+0x145): undefined reference to `EVP_EncryptInit' AESEncryptHandle.cpp:(.text+0x15c): undefined reference to `EVP_EncryptUpdate' AESEncryptHandle.cpp:(.text+0x171): undefined reference to `EVP_EncryptFinal' AESEncryptHandle.cpp:(.text+0x179): undefined reference to `EVP_CIPHER_CTX_cleanup' ./libmism.a(AESEncryptHandle.cpp.o): In function `AESEncryptHandle::Decrypt(char*, int\u0026, char const*, int const\u0026)': AESEncryptHandle.cpp:(.text+0x215): undefined reference to `EVP_aes_256_ecb' AESEncryptHandle.cpp:(.text+0x225): undefined reference to `EVP_DecryptInit' AESEncryptHandle.cpp:(.text+0x23c): undefined reference to `EVP_DecryptUpdate' AESEncryptHandle.cpp:(.text+0x251): undefined reference to `EVP_DecryptFinal' AESEncryptHandle.cpp:(.text+0x259): undefined reference to `EVP_CIPHER_CTX_cleanup' collect2: error: ld returned 1 exit status 这是加解密库是调用的openssl来实现的,而在go代码里只显示链接了加解密库 解决方案 修改go代码,还要额外链接openssl的库. #cgo LDFLAGS: -L../2nd/crypto/lib/Linux_x86_64 -L../3rd/openssl-OpenSSL_1_0_2-stable/lib/Linux_x86_64 -lmism -lssl -lcrypto -lstdc++ 继续编译,又报了undefined reference $ go build aesecb.go # command-line-arguments ./libcrypto.a(dso_dlfcn.o): In function `dlfcn_globallookup': dso_dlfcn.c:(.text+0x11): undefined reference to `dlopen' dso_dlfcn.c:(.text+0x24): undefined reference to `dlsym' dso_dlfcn.c:(.text+0x2f): undefined reference to `dlclose' ./libcrypto.a(dso_dlfcn.o): In function `dlfcn_bind_func': dso_dlfcn.c:(.text+0x354): undefined reference to `dlsym' dso_dlfcn.c:(.text+0x412): undefined reference to `dlerror' ./libcrypto.a(dso_dlfcn.o): In function `dlfcn_bind_var': dso_dlfcn.c:(.text+0x484): undefined reference to `dlsym' dso_dlfcn.c:(.text+0x542): undefined reference to `dlerror' ./libcrypto.a(dso_dlfcn.o): In function `dlfcn_load': dso_dlfcn.c:(.text+0x5a9): undefined reference to `dlopen' dso_dlfcn.c:(.text+0x60d): undefined reference to `dlclose' dso_dlfcn.c:(.text+0x645): undefined reference to `dlerror' ./libcrypto.a(dso_dlfcn.o): In function `dlfcn_pathbyaddr': dso_dlfcn.c:(.text+0x6d1): undefined reference to `dladdr' dso_dlfcn.c:(.text+0x731): undefined reference to `dlerror' ./libcrypto.a(dso_dlfcn.o): In function `dlfcn_unload': dso_dlfcn.c:(.text+0x792): undefined reference to `dlclose' collect2: error: ld returned 1 exit status 缺少ld引用. 解决方案 修改go代码,还要额外链接ld的库. 
#cgo LDFLAGS: -L../2nd/crypto/lib/Linux_x86_64 -L../3rd/openssl-OpenSSL_1_0_2-stable/lib/Linux_x86_64 -lmism -lssl -lcrypto -lstdc++ -ldl ","date":"2020-11-03","objectID":"/2020/11/03/cgo/:3:3","tags":["go","cgo"],"title":"记录cgo调用C实现的加解密静态库中遇到的问题","uri":"/2020/11/03/cgo/"},{"categories":["Golang"],"content":"类型映射错误 $ go build aesecb.go # command-line-arguments ./aesecb.go:16:10: assignment mismatch: 3 variables but _Cfunc_MIGetSafeHandle returns 1 values ./aesecb.go:17:18: cannot use uintptr(0) (type uintptr) as type *_Ctype_char in argument to _Cfunc_MILoad ./aesecb.go:20:31: cannot use uintptr(unsafe.Pointer(\u0026iLen)) (type uintptr) as type *_Ctype_int in argument to _Cfunc_MIGetEncryptDataLen ./aesecb.go:20:63: cannot use uintptr(unsafe.Pointer(\u0026msg[0])) (type uintptr) as type *_Ctype_char in argument to _Cfunc_MIGetEncryptDataLen ./aesecb.go:20:97: cannot use uintptr(srcLen) (type uintptr) as type _Ctype_int in argument to _Cfunc_MIGetEncryptDataLen ./aesecb.go:22:26: cannot use uintptr(unsafe.Pointer(\u0026buf[0])) (type uintptr) as type *_Ctype_char in argument to _Cfunc_MITransEncrypt ./aesecb.go:22:60: cannot use uintptr(iLen) (type uintptr) as type _Ctype_int in argument to _Cfunc_MITransEncrypt ./aesecb.go:22:75: cannot use uintptr(unsafe.Pointer(\u0026msg[0])) (type uintptr) as type *_Ctype_char in argument to _Cfunc_MITransEncrypt ./aesecb.go:22:109: cannot use uintptr(srcLen) (type uintptr) as type _Ctype_int in argument to _Cfunc_MITransEncrypt ./aesecb.go:27:31: cannot use uintptr(unsafe.Pointer(\u0026iDLen)) (type uintptr) as type *_Ctype_int in argument to _Cfunc_MIGetDecryptDataLen ./aesecb.go:27:31: too many errors 基本上都是用错了类型,参考类型映射来修改. C类型 调用方法 Go类型 字节数 char C.char byte 1 signed char C.schar int8 1 unsigned char C.uchar uint8 1 short int C.short int16 2 short unsigned int C.ushort uint16 2 int C.int int 4 unsigned int C.uint uint32 4 long int C.long int32 or int64 4 long unsigned int C.ulong uint32 or uint64 4 long long int C.longlong int64 8 long long unsigned int C.ulonglong uint64 8 float C.float float32 4 double C.double float64 8 wchar_t C.wchar_t 2 void * unsafe.Pointer 最后完整代码如下 package main /* #cgo CFLAGS: -I../2nd/crypto/include #cgo LDFLAGS: -L../2nd/crypto/lib/Linux_x86_64 -L../3rd/openssl-OpenSSL_1_0_2-stable/lib/Linux_x86_64 -lmism -lssl -lcrypto -lstdc++ -ldl #include \"ISafeInterface.h\" */ import \"C\" import ( \"fmt\" \"unsafe\" ) func main() { msg := []byte(\"I am test trans crypto!\") h := C.MIGetSafeHandle() C.MILoad(nil, C.int(0), h) var iLen int32 var srcLen int = len(msg) C.MIGetEncryptDataLen((*C.int)(\u0026iLen), (*C.char)(unsafe.Pointer(\u0026msg[0])), C.int(srcLen), h) buf := make([]byte, 44) C.MITransEncrypt((*C.char)(unsafe.Pointer(\u0026buf[0])), C.int(iLen), (*C.char)(unsafe.Pointer(\u0026msg[0])), C.int(srcLen), h) fmt.Println(buf) dstLen := len(buf) var iDLen int32 C.MIGetDecryptDataLen((*C.int)(\u0026iDLen), (*C.char)(unsafe.Pointer(\u0026buf[0])), C.int(dstLen), h) fmt.Println(iDLen) newSrc := make([]byte, iDLen) C.MITransDecrypt((*C.char)(unsafe.Pointer(\u0026newSrc[0])), C.int(iDLen), (*C.char)(unsafe.Pointer(\u0026buf[0])), C.int(dstLen), h) fmt.Println(string(newSrc)) C.MIFreeSafeHandle(h) } ","date":"2020-11-03","objectID":"/2020/11/03/cgo/:3:4","tags":["go","cgo"],"title":"记录cgo调用C实现的加解密静态库中遇到的问题","uri":"/2020/11/03/cgo/"},{"categories":["Golang"],"content":"go tool cgo 调用该命令会在当前目录生成_obj文件夹,在里面文件可以看到类型转换的信息.参考命令 # sky @ localhost in ~/go/path/src/aesecb/_obj [11:33:09] $ ll total 48 drwxr-xr-x 2 sky sky 4096 Sep 10 11:32 . 
drwxr-xr-x 3 sky sky 92 Sep 10 11:32 .. -rw-r--r-- 1 sky sky 6264 Sep 10 11:32 _cgo_.o -rw-r--r-- 1 sky sky 605 Sep 10 11:32 _cgo_export.c -rw-r--r-- 1 sky sky 1547 Sep 10 11:32 _cgo_export.h -rw-r--r-- 1 sky sky 13 Sep 10 11:32 _cgo_flags -rw-r--r-- 1 sky sky 5427 Sep 10 11:32 _cgo_gotypes.go -rw-r--r-- 1 sky sky 416 Sep 10 11:32 _cgo_main.c -rw-r--r-- 1 sky sky 2020 Sep 10 11:32 aesecb.cgo1.go -rw-r--r-- 1 sky sky 5710 Sep 10 11:32 aesecb.cgo2.c ","date":"2020-11-03","objectID":"/2020/11/03/cgo/:3:5","tags":["go","cgo"],"title":"记录cgo调用C实现的加解密静态库中遇到的问题","uri":"/2020/11/03/cgo/"},{"categories":null,"content":"关于 ","date":"2020-11-03","objectID":"/about/:0:0","tags":null,"title":"关于","uri":"/about/"}]