<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
<title>xin053</title>
<subtitle>Wandering around the security community, making no progress</subtitle>
<link href="/atom.xml" rel="self"/>
<link href="https://xin053.github.io/"/>
<updated>2017-05-27T13:20:48.775Z</updated>
<id>https://xin053.github.io/</id>
<author>
<name>xin053</name>
</author>
<generator uri="http://hexo.io/">Hexo</generator>
<entry>
<title>Shell Programming</title>
<link href="https://xin053.github.io/2017/03/10/shell%E7%BC%96%E7%A8%8B/"/>
<id>https://xin053.github.io/2017/03/10/shell编程/</id>
<published>2017-03-10T04:38:10.000Z</published>
<updated>2017-05-27T13:20:48.775Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="Hello-World"><a href="#Hello-World" class="headerlink" title="Hello World"></a>Hello World</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="meta">#</span><span class="bash">!/bin/bash</span></div><div class="line"><span class="meta">#</span><span class="bash"> this is a comment</span></div><div class="line">echo 'Hello World!'</div><div class="line">exit</div></pre></td></tr></table></figure>
<p>Save the file as <code>hello.sh</code>, then change its permissions:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> chmod 755 hello.sh</span></div></pre></td></tr></table></figure>
<p>Finally, run it:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> ./hello.sh</span></div><div class="line">Hello World!</div></pre></td></tr></table></figure>
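<p><code>exit</code> is not required, but every command returns an exit status to its parent process: 0 on success, while a nonzero value is usually treated as an error code. Well-written scripts end with <code>exit</code>. When a script ends with <code>exit</code> without an argument, its exit status is that of the last command it executed.</p>
<p><code>echo $?</code> shows the exit status of the previous command.</p>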
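<p>As a minimal sketch of how exit statuses can be captured (the helper name <code>check_dir</code> and the path <code>/no/such/dir/here</code> are hypothetical, not from the original post), the snippet below saves <code>$?</code> after one success and one failure:</p>

```shell
#!/bin/bash
# check_dir is a hypothetical helper: it succeeds (status 0) only if
# its argument is an existing directory.
check_dir() {
    [ -d "$1" ]
}

check_dir /                 # "/" always exists, so this succeeds
status_ok=$?                # save $? at once; the next command overwrites it

# On failure, the right-hand side of || runs and $? still holds the failing status.
check_dir /no/such/dir/here || status_fail=$?

echo "success: ${status_ok}, failure: ${status_fail}"
```

<p>The status has to be copied into a variable immediately, because every later command replaces the value of <code>$?</code>.</p>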
<a id="more"></a>
<h3 id="赋值"><a href="#赋值" class="headerlink" title="Assignment"></a>Assignment</h3><p>Use <code>=</code> for assignment, <strong>with no spaces on either side of <code>=</code></strong>. To read a variable's value, put <code>$</code> in front of its name.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> a=1 <span class="comment"># with "a = 1", the shell would run a command named a with the arguments = and 1</span></span></div><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> <span class="variable">$a</span></span></div><div class="line">1</div></pre></td></tr></table></figure>
<h3 id="变量"><a href="#变量" class="headerlink" title="Variables"></a>Variables</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">hello="a  b   c    d"</div><div class="line">echo $hello # a b c d (variable substitution; word splitting collapses the whitespace)</div><div class="line">echo "$hello" # a  b   c    d (partial quoting; whitespace is preserved)</div><div class="line">echo "${hello}" # a  b   c    d</div><div class="line">echo '$hello' # $hello (full quoting; nothing is expanded)</div></pre></td></tr></table></figure>
<p>As shown, unquoted variable substitution collapses runs of whitespace, and full quoting (single quotes) disables every special character. If you only want to output a variable's value, the <code>"${}"</code> form is recommended.</p>
<h4 id="bash中变量的类型"><a href="#bash中变量的类型" class="headerlink" title="Variable Types in bash"></a>Variable Types in bash</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">a=2334 # an integer</div><div class="line">b=${a/23/BB} # this turns b from an integer into a string (BB34)</div><div class="line">c=${b/BB/23} # this turns c from a string back into an integer (2334)</div></pre></td></tr></table></figure>
<p>So variables in bash are untyped: the value is stored as a string, and the context decides whether it is treated as a number.</p>
<h4 id="特殊变量"><a href="#特殊变量" class="headerlink" title="Special Variables"></a>Special Variables</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> ./scriptname 1 2 3 4 5 6 7 8 9 10</span></div></pre></td></tr></table></figure>
<p><code>1 2 3 4 5 6 7 8 9 10</code> are the 10 arguments passed on the command line. <code>$0</code> is the script name, <code>$1</code> the first argument, <code>${10}</code> the tenth argument, <code>$#</code> the number of positional parameters, and <code>$*</code> all of the positional parameters, treated as a single word when quoted as <code>"$*"</code>.</p>
<p>Each call to <code>shift</code> moves every positional parameter one position forward and discards the original first parameter.</p>
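<p>The interaction of <code>$1</code>, <code>$#</code>, and <code>shift</code> can be sketched as follows. The function name <code>show_args</code> is hypothetical; inside a function the positional parameters refer to the function's own arguments, which makes the behavior easy to try without a separate script:</p>

```shell
#!/bin/bash
# show_args is a hypothetical function: it walks through its arguments
# with shift and records each one together with the remaining count.
show_args() {
    local out=""
    while [ "$#" -gt 0 ]; do
        out="${out}[$1:$#]"   # current first argument and how many are left
        shift                 # drop $1; $2 becomes $1, and so on
    done
    echo "$out"
}

show_args a b c    # prints [a:3][b:2][c:1]
```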
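<h4 id="内部变量"><a href="#内部变量" class="headerlink" title="Internal Variables"></a>Internal Variables</h4><p><code>$BASH</code> - path to the bash binary</p>
<p><code>$FUNCNAME</code> - name of the current function</p>
<p><code>$GROUPS</code> - groups the current user belongs to</p>
<p><code>$HOME</code> - the user's home directory</p>
<p><code>$HOSTNAME</code> - hostname</p>
<p><code>$IFS</code> - internal field separator; it determines how bash recognizes field or word boundaries when it interprets strings</p>
<p><code>$LINENO</code> - the line number of the script line on which it appears</p>
<p><code>$OSTYPE</code> - operating system type</p>
<p><code>$PPID</code> - a process's <code>$PPID</code> is the pid of its parent process</p>
<p><code>$PWD</code> - current working directory</p>
<p><code>$SECONDS</code> - number of seconds the script has been running</p>
<p><code>$SHLVL</code> - shell nesting level</p>
<p><code>$UID</code> - user id</p>
<p><code>$$</code> - pid of the script's own process</p>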
<h4 id="获取变量名"><a href="#获取变量名" class="headerlink" title="Getting Variable Names"></a>Getting Variable Names</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{!prefix*}</span></div><div class="line"><span class="meta">$</span><span class="bash">{!prefix@}</span></div></pre></td></tr></table></figure>
<p>Both expansions return the names of existing variables that start with <code>prefix</code>.</p>
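<p>A small sketch of the prefix expansion, assuming two made-up variables named <code>demo_host</code> and <code>demo_port</code> (neither appears in the original post):</p>

```shell
#!/bin/bash
# The demo_ variables exist only for this illustration.
demo_host="localhost"
demo_port=8080

names="${!demo_*}"    # names of all variables that start with "demo_"
echo "$names"

# Quoted ${!prefix@} yields one word per name; ${!name} then reads
# the variable whose name is stored in "name" (indirect expansion).
for name in "${!demo_@}"; do
    echo "${name}=${!name}"
done
```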
<h3 id="Here-Documents"><a href="#Here-Documents" class="headerlink" title="Here Documents"></a>Here Documents</h3><p>A here document is a form of redirection.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">command << token</div><div class="line">text</div><div class="line">token</div></pre></td></tr></table></figure>
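<p>Here, command is a command that accepts standard input, and token is a string that marks the end of the embedded text. The construct above passes text to command as its standard input.</p>
<p>If <code><<</code> is changed to <code><<-</code>, the shell ignores leading tab characters in text, so the text can be indented, which makes the code more readable.</p>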
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">cat <<- _EOF_</div><div class="line"> hello</div><div class="line"> world</div><div class="line"> !!!!!</div><div class="line">_EOF_</div></pre></td></tr></table></figure>
<p>This technique is commonly used in place of <code>echo</code> to output multiple lines.</p>
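<h3 id="获取用户输入"><a href="#获取用户输入" class="headerlink" title="Getting User Input"></a>Getting User Input</h3><p>Use <code>read</code> to get input from the user.</p>
<p><code>read a</code> stores the user's input in the variable a. If no variable name is given, the default variable <code>REPLY</code> holds the input.</p>
<p><code>read</code> supports the following options:</p>
<p><code>-a array</code> - assign the input to the array array, starting at index 0</p>
<p><code>-n num</code> - read num characters of input instead of a whole line</p>
<p><code>-p prompt</code> - display a prompt for the input</p>
<p><code>-r</code> - raw mode; backslashes are not interpreted as escape characters</p>
<p><code>-s</code> - silent mode; the typed text is not shown on the screen</p>
<p><code>-t seconds</code> - timeout; if there is no input after seconds seconds, return a nonzero exit status</p>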
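<p>To try <code>read</code> without typing anything, the sketch below feeds it from here-strings (<code><<<</code>, a bash feature not covered in the original text). The sample strings are made up:</p>

```shell
#!/bin/bash
# Input comes from here-strings instead of the keyboard so the result is reproducible.
read -r first rest <<< "alpha beta gamma"
echo "first=${first}"    # first=alpha
echo "rest=${rest}"      # rest=beta gamma (the last variable receives the remainder)

# -a splits the line into an array, starting at index 0.
read -r -a words <<< "one two three"
echo "${words[1]}"       # two

# With no variable name, the line is stored in REPLY.
read -r <<< "no name given"
echo "$REPLY"            # no name given
```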
<h3 id="给变量指定默认值"><a href="#给变量指定默认值" class="headerlink" title="Giving Variables Default Values"></a>Giving Variables Default Values</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{parameter:-word}</span></div></pre></td></tr></table></figure>
<p>If <code>parameter</code> is unset or empty, the expansion yields <code>word</code>; if <code>parameter</code> is not empty, it yields the value of <code>parameter</code>.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{parameter:=word}</span></div></pre></td></tr></table></figure>
<p>If <code>parameter</code> is unset or empty, the expansion yields <code>word</code> and also assigns <code>word</code> to <code>parameter</code>; if <code>parameter</code> is not empty, it yields the value of <code>parameter</code>.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{parameter:?word}</span></div></pre></td></tr></table></figure>
<p>If <code>parameter</code> is unset or empty, this expansion makes the script exit with an error and sends <code>word</code> to standard error; if <code>parameter</code> is not empty, it yields the value of <code>parameter</code>.</p>
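<p>The three expansions can be compared side by side. This is only a sketch; <code>color</code> and <code>size</code> are made-up variable names:</p>

```shell
#!/bin/bash
# "color" and "size" are hypothetical variables used only in this example.
unset color size

echo "${color:-red}"     # red   (color itself stays unset)
echo "${color:-blue}"    # blue  (still unset, so the default is used again)

echo "${size:=large}"    # large (and size is now assigned)
echo "${size:-small}"    # large (size is set, so the default is ignored)

# ${parameter:?word} would abort this script, so it is tried in a subshell
# and only the subshell's failure is observed.
if ! ( : "${color:?color is not set}" ) 2>/dev/null; then
    echo "color is required but missing"
fi
```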
<h3 id="函数"><a href="#函数" class="headerlink" title="Functions"></a>Functions</h3><h4 id="函数定义"><a href="#函数定义" class="headerlink" title="Defining Functions"></a>Defining Functions</h4><p>A function can be defined in two forms:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">function name(){</div><div class="line"> commands</div><div class="line"> return</div><div class="line">}</div></pre></td></tr></table></figure>
<p>or</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">name(){</div><div class="line"> commands</div><div class="line"> return</div><div class="line">}</div></pre></td></tr></table></figure>
<p>To call a function, write only its name, without parentheses. A function must be defined before it is called.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line"><span class="meta">#</span><span class="bash">!/bin/bash</span></div><div class="line">function hello(){</div><div class="line"> echo "Hello World!"</div><div class="line"> return</div><div class="line">}</div><div class="line">hello # call the function</div></pre></td></tr></table></figure>
<h4 id="局部变量"><a href="#局部变量" class="headerlink" title="Local Variables"></a>Local Variables</h4><p>Inside a function, use the <code>local</code> keyword to define a local variable.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">function funcname(){</div><div class="line"> local test=1</div><div class="line"> echo $test</div><div class="line"> return</div><div class="line">}</div></pre></td></tr></table></figure>
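<p>To see what <code>local</code> actually protects, the following sketch shadows a global variable inside a function. The names <code>counter</code> and <code>bump</code> are hypothetical:</p>

```shell
#!/bin/bash
# counter and bump are made-up names used only for this illustration.
counter=10

bump() {
    local counter=1              # shadows the global inside this function only
    counter=$((counter + 1))
    echo "inside: ${counter}"
}

bump                             # prints "inside: 2"
echo "outside: ${counter}"       # prints "outside: 10"; the global was never touched
```

<p>Without <code>local</code>, the assignment inside <code>bump</code> would overwrite the global <code>counter</code>.</p>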
<h3 id="if"><a href="#if" class="headerlink" title="if"></a>if</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line">x=5</div><div class="line">if [ "$x" == 5 ]; then # note the space after [, the space before ], and the spaces around ==</div><div class="line"> echo "x equals 5"</div><div class="line">else</div><div class="line"> echo "x does not equal 5"</div><div class="line">fi</div></pre></td></tr></table></figure>
<h3 id="判断"><a href="#判断" class="headerlink" title="Tests"></a>Tests</h3><p><strong>Every test checks the exit status of a command: 0 means the command succeeded, so the condition is true; a nonzero status means it is false.</strong></p>
<h4 id="文件表达式"><a href="#文件表达式" class="headerlink" title="File Expressions"></a>File Expressions</h4><p><code>-d file</code> - file exists and is a directory</p>
<p><code>-e file</code> - file exists</p>
<p><code>-f file</code> - file exists and is a regular file</p>
<p><code>-s file</code> - file exists and its length is greater than 0</p>
<p><code>-r file</code> - file exists and is readable</p>
<p><code>-w file</code> - file exists and is writable</p>
<p><code>-x file</code> - file exists and is executable</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line"><span class="meta">#</span><span class="bash">!/bin/bash</span></div><div class="line"></div><div class="line">FILE=~/.bashrc</div><div class="line"></div><div class="line">if [ -f "$FILE" ]; then</div><div class="line"> echo "$FILE is a file"</div><div class="line">fi</div><div class="line"></div><div class="line">exit</div></pre></td></tr></table></figure>
<h4 id="字符串表达式"><a href="#字符串表达式" class="headerlink" title="String Expressions"></a>String Expressions</h4><p><code>-n string</code> - the length of string is greater than 0</p>
<p><code>-z string</code> - the length of string is 0</p>
<p><code>string1 == string2</code> - string1 equals string2</p>
<p><code>string1 > string2</code> - string1 sorts after string2 (inside <code>[ ]</code> the <code>></code> must be escaped as <code>\></code>, otherwise the shell treats it as a redirection)</p>
<h4 id="其他判断"><a href="#其他判断" class="headerlink" title="Other Tests"></a>Other Tests</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">[[ expression ]]</div></pre></td></tr></table></figure>
<p>This is similar to <code>test</code>, and it also supports:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">string =~ regex</div></pre></td></tr></table></figure>
<p>This returns true if string matches the regular expression regex.</p>
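<p>As one possible use of <code>=~</code>, the sketch below validates that a value is an integer. The helper name <code>is_integer</code> and the sample values are made up for illustration:</p>

```shell
#!/bin/bash
# is_integer is a hypothetical helper built on [[ string =~ regex ]].
is_integer() {
    [[ "$1" =~ ^-?[0-9]+$ ]]   # the regex is left unquoted so it is treated as a pattern
}

for value in 42 -7 3.14 abc; do
    if is_integer "$value"; then
        echo "${value}: integer"
    else
        echo "${value}: not an integer"
    fi
done
```

<p>Because the function's last command is the <code>[[ ]]</code> test, its exit status becomes the function's exit status, which is what <code>if</code> checks.</p>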
<h3 id="while"><a href="#while" class="headerlink" title="while"></a>while</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div></pre></td><td class="code"><pre><div class="line"><span class="meta">#</span><span class="bash">!/bin/bash</span></div><div class="line"></div><div class="line">count=1</div><div class="line">while [ "${count}" -le 5 ]; do</div><div class="line"> echo "${count}"</div><div class="line"> count=$((count + 1))</div><div class="line">done</div><div class="line">echo "finished!"</div><div class="line"></div><div class="line">exit</div></pre></td></tr></table></figure>
<p><code>continue</code> and <code>break</code> can be used inside loops.</p>
<h4 id="循环读取数据"><a href="#循环读取数据" class="headerlink" title="Reading Data in a Loop"></a>Reading Data in a Loop</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="meta">#</span><span class="bash">!/bin/bash</span></div><div class="line"></div><div class="line">while read para1 para2 para3; do</div><div class="line"> ...</div><div class="line">done < test.txt</div></pre></td></tr></table></figure>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="meta">#</span><span class="bash">!/bin/bash</span></div><div class="line"></div><div class="line">sort -k 1,1 -k 2n test.txt | while read para1 para2 para3; do</div><div class="line"> ...</div><div class="line">done</div></pre></td></tr></table></figure>
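<p><code>read</code> returns exit status 0 after each line it reads, until it reaches the end of the file; there it returns a nonzero status, which ends the while loop.</p>
<p>In the pipeline form, the loop runs in a subshell, so when the loop ends, any variable created or assigned inside it is lost.</p>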
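<p>The difference between piping into a loop and redirecting into it can be observed directly. This sketch assumes bash's default settings (in particular, the <code>lastpipe</code> option is off) and uses made-up input generated inline rather than a real <code>test.txt</code>:</p>

```shell
#!/bin/bash
# Sum three numbers twice: once through a pipe, once through a here document.
total=0
printf '1\n2\n3\n' | while read -r n; do
    total=$((total + n))     # runs in a subshell because of the pipe
done
piped_total=$total
echo "after pipeline: ${piped_total}"      # 0, the subshell's total is gone

total=0
while read -r n; do
    total=$((total + n))     # runs in the current shell
done << _EOF_
1
2
3
_EOF_
echo "after redirection: ${total}"         # 6
```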
<h3 id="until"><a href="#until" class="headerlink" title="until"></a>until</h3><p>Similar to while, except that the loop keeps running until the condition becomes true.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div></pre></td><td class="code"><pre><div class="line"><span class="meta">#</span><span class="bash">!/bin/bash</span></div><div class="line"></div><div class="line">count=1</div><div class="line">until [ "${count}" -gt 5 ]; do</div><div class="line"> echo "${count}"</div><div class="line"> count=$((count + 1))</div><div class="line">done</div><div class="line">echo "finished!"</div><div class="line"></div><div class="line">exit</div></pre></td></tr></table></figure>
<h3 id="case"><a href="#case" class="headerlink" title="case"></a>case</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div></pre></td><td class="code"><pre><div class="line">read -p "Enter selection [0-3]"</div><div class="line">case $REPLY in</div><div class="line"> 0) echo "Program terminated."</div><div class="line"> exit</div><div class="line"> ;;</div><div class="line"> 1) echo "Hostname: $HOSTNAME"</div><div class="line"> uptime</div><div class="line"> ;;</div><div class="line"> 2) df -h</div><div class="line"> ;;</div><div class="line"> 3) echo "Hello"</div><div class="line"> ;;</div><div class="line"> *) echo "Invalid entry" >&2</div><div class="line"> exit 1</div><div class="line"> ;;</div><div class="line">esac</div></pre></td></tr></table></figure>
<h4 id="匹配模式"><a href="#匹配模式" class="headerlink" title="Matching Patterns"></a>Matching Patterns</h4><p><code>a)</code> - matches the word <code>a</code></p>
<p><code>a|A)</code> - matches the word <code>a</code> or <code>A</code></p>
<p><code>[[:alpha:]])</code> - matches if the word is a single alphabetic character</p>
<p><code>???)</code> - matches if the word is exactly 3 characters long</p>
<p><code>*.txt)</code> - matches if the word ends with <code>.txt</code></p>
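<p>The patterns listed above can be combined in one <code>case</code> statement. In this sketch the function name <code>classify</code> and the sample words are hypothetical; note that the first pattern that matches wins, so the order of the branches matters:</p>

```shell
#!/bin/bash
# classify is a made-up function that reports which pattern its argument matched.
classify() {
    case "$1" in
        a|A)         echo "the letter a" ;;
        [[:alpha:]]) echo "a single letter" ;;
        *.txt)       echo "a .txt name" ;;
        ???)         echo "exactly three characters" ;;
        *)           echo "something else" ;;
    esac
}

classify A          # the letter a
classify q          # a single letter
classify notes.txt  # a .txt name
classify xyz        # exactly three characters
classify hello      # something else
```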
<h3 id="for"><a href="#for" class="headerlink" title="for"></a>for</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">for i in A B C D; do</div><div class="line"> echo "$i"</div><div class="line">done</div></pre></td></tr></table></figure>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">for i in {A..D}; do</div><div class="line"> echo "$i"</div><div class="line">done</div></pre></td></tr></table></figure>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">for i in cloud*.txt; do</div><div class="line"> echo "$i"</div><div class="line">done</div></pre></td></tr></table></figure>
<p>A C-style form can also be used:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">for (( expression1; expression2; expression3 )); do</div><div class="line"> commands</div><div class="line">done</div></pre></td></tr></table></figure>
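<p>A concrete instance of the C-style form, summing the integers 1 through 5 (the variable names are arbitrary):</p>

```shell
#!/bin/bash
# expression1 initializes i, expression2 is the loop condition,
# expression3 runs after each iteration.
sum=0
for (( i = 1; i <= 5; i++ )); do
    sum=$((sum + i))
done
echo "sum=${sum}"    # sum=15
```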
<h3 id="字符串操作"><a href="#字符串操作" class="headerlink" title="String Operations"></a>String Operations</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{<span class="comment">#parameter}</span></span></div></pre></td></tr></table></figure>
<p>This expands to the length of the string held in <code>parameter</code>.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{parameter:offset} <span class="comment"># extract the string from offset to the end</span></span></div><div class="line"><span class="meta">$</span><span class="bash">{parameter:offset:length} <span class="comment"># extract length characters starting at offset</span></span></div></pre></td></tr></table></figure>
<h4 id="子串消除"><a href="#子串消除" class="headerlink" title="Substring Removal"></a>Substring Removal</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{parameter<span class="comment">#pattern} # expands to parameter with the shortest match of pattern removed from the beginning</span></span></div><div class="line"><span class="meta">$</span><span class="bash">{parameter<span class="comment">##pattern} # expands to parameter with the longest match of pattern removed from the beginning</span></span></div></pre></td></tr></table></figure>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> foo=file.txt.zip</span></div><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> <span class="variable">${foo#*.}</span></span></div><div class="line">txt.zip</div><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> <span class="variable">${foo##*.}</span></span></div><div class="line">zip</div></pre></td></tr></table></figure>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{parameter%pattern}</span></div><div class="line"><span class="meta">$</span><span class="bash">{parameter%%pattern}</span></div></pre></td></tr></table></figure>
<p>These work like <code>#</code> and <code>##</code>, except that matching starts from the end of the string.</p>
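<p>A short sketch of removal from the end, alongside the common path-splitting idioms it enables. The path is a made-up sample value:</p>

```shell
#!/bin/bash
# "path" holds a hypothetical sample value.
path=/tmp/archive/file.txt.zip

echo "${path%.*}"     # /tmp/archive/file.txt  (shortest match of ".*" removed from the end)
echo "${path%%.*}"    # /tmp/archive/file      (longest match removed from the end)
echo "${path##*/}"    # file.txt.zip           (similar to basename)
echo "${path%/*}"     # /tmp/archive           (similar to dirname)
```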
<h4 id="字符串替换"><a href="#字符串替换" class="headerlink" title="String Replacement"></a>String Replacement</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{parameter/pattern/string} <span class="comment"># replace the first match of pattern with string</span></span></div><div class="line"><span class="meta">$</span><span class="bash">{parameter//pattern/string} <span class="comment"># replace every match</span></span></div><div class="line"><span class="meta">$</span><span class="bash">{parameter/<span class="comment">#pattern/string} # replace the match only if it is at the beginning of the string</span></span></div><div class="line"><span class="meta">$</span><span class="bash">{parameter/%pattern/string} <span class="comment"># replace the match only if it is at the end of the string</span></span></div></pre></td></tr></table></figure>
<p>The original value of parameter is not changed.</p>
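<p>The four replacement forms applied to one made-up sample string, as a quick sketch:</p>

```shell
#!/bin/bash
# "text" is a hypothetical sample value.
text="foo bar foo"

echo "${text/foo/X}"     # X bar foo   (first match only)
echo "${text//foo/X}"    # X bar X     (every match)
echo "${text/#foo/X}"    # X bar foo   (the match must be at the start)
echo "${text/%foo/X}"    # foo bar X   (the match must be at the end)
echo "$text"             # foo bar foo (the variable itself is unchanged)
```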
<h4 id="字符串大小写"><a href="#字符串大小写" class="headerlink" title="String Case Conversion"></a>String Case Conversion</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{parameter,,} <span class="comment"># expand the value of parameter entirely in lowercase</span></span></div><div class="line"><span class="meta">$</span><span class="bash">{parameter,} <span class="comment"># lowercase only the first character</span></span></div><div class="line"><span class="meta">$</span><span class="bash">{parameter^^} <span class="comment"># expand the value of parameter entirely in uppercase</span></span></div><div class="line"><span class="meta">$</span><span class="bash">{parameter^} <span class="comment"># uppercase only the first character</span></span></div></pre></td></tr></table></figure>
<p>The original value of parameter is not changed.</p>
<h3 id="数组"><a href="#数组" class="headerlink" title="Arrays"></a>Arrays</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">declare</span> <span class="_">-a</span> array <span class="comment"># declare array as an array</span></span></div><div class="line"><span class="meta">$</span><span class="bash"> array[0]=0</span></div><div class="line"><span class="meta">$</span><span class="bash"> array[1]=1</span></div><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> <span class="variable">${array[0]}</span></span></div><div class="line">0</div><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> <span class="variable">${array[1]}</span></span></div><div class="line">1</div></pre></td></tr></table></figure>
<h4 id="多值赋值"><a href="#多值赋值" class="headerlink" title="Assigning Multiple Values"></a>Assigning Multiple Values</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">test</span>=(a b c d)</span></div><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> <span class="variable">${test[0]}</span></span></div><div class="line">a</div></pre></td></tr></table></figure>
<h4 id="输出整个数组内容"><a href="#输出整个数组内容" class="headerlink" title="Printing the Whole Array"></a>Printing the Whole Array</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> animals=(<span class="string">"a dog"</span> <span class="string">"a cat"</span> <span class="string">"a fish"</span>)</span></div><div class="line"><span class="meta">$</span><span class="bash"> <span class="keyword">for</span> i <span class="keyword">in</span> <span class="string">"<span class="variable">${animals[*]}</span>"</span>; <span class="keyword">do</span> <span class="built_in">echo</span> <span class="variable">$i</span>; <span class="keyword">done</span></span></div><div class="line">a dog a cat a fish</div><div class="line"><span class="meta">$</span><span class="bash"> <span class="keyword">for</span> i <span class="keyword">in</span> <span class="string">"<span class="variable">${animals[@]}</span>"</span>; <span class="keyword">do</span> <span class="built_in">echo</span> <span class="variable">$i</span>; <span class="keyword">done</span></span></div><div class="line">a dog</div><div class="line">a cat</div><div class="line">a fish</div></pre></td></tr></table></figure>
<p>The subscripts <code>*</code> and <code>@</code> can be used to access every element of an array. As the output above shows, inside double quotes <code>*</code> joins all elements into a single word, while <code>@</code> keeps each element as a separate word.</p>
<h4 id="关联数组"><a href="#关联数组" class="headerlink" title="Associative Arrays"></a>Associative Arrays</h4><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">declare</span> -A colors</span></div><div class="line"><span class="meta">$</span><span class="bash"> colors[<span class="string">"red"</span>]=<span class="string">"#ff0000"</span></span></div><div class="line"><span class="meta">$</span><span class="bash"> colors[<span class="string">"green"</span>]=<span class="string">"#00ff00"</span></span></div><div class="line"><span class="meta">$</span><span class="bash"> colors[<span class="string">"blue"</span>]=<span class="string">"#0000ff"</span></span></div><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> <span class="variable">${colors["blue"]}</span></span></div><div class="line"><span class="meta">#</span><span class="bash">0000ff</span></div></pre></td></tr></table></figure>
<h4 id="找到数组使用的下标"><a href="#找到数组使用的下标" class="headerlink" title="找到数组使用的下标"></a>找到数组使用的下标</h4><p>bash允许数组下标包含空格,有时候确定哪个元素真正存在是很有用的</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash">{!array[*]}</span></div><div class="line"><span class="meta">$</span><span class="bash">{!array[@]}</span></div></pre></td></tr></table></figure>
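<p>A minimal sketch of these expansions on an array with gaps. The array name <code>sparse</code> and its values are made up for illustration.</p>

```shell
#!/usr/bin/env bash
# Assign only subscripts 1, 4 and 9, leaving gaps in between.
sparse=([1]="a" [4]="b" [9]="c")

indices="${!sparse[*]}"   # the subscripts that actually exist
count="${#sparse[@]}"     # number of elements, not the highest subscript

echo "indices: $indices, count: $count"
```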
<h3 id="组命令和子shell"><a href="#组命令和子shell" class="headerlink" title="组命令和子shell"></a>Group commands and subshells</h3><p>Group command:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">{ command1; command2; [command3; ...] } # note the spaces inside the braces</div></pre></td></tr></table></figure>
<p>Subshell:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">(command1; command2; [command3; ...])</div></pre></td></tr></table></figure>
<p>Both group commands and subshells are used to manage redirection.</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">{ ls -l; echo "test"; cat foo.txt; } > output.txt</div></pre></td></tr></table></figure>
<p>This combines the output of the three commands and redirects it to <code>output.txt</code>. Note that the last command in a group must be followed by a semicolon or a newline before the closing brace.</p>
<p>A group command executes all of its commands in the current shell, while a subshell executes them in a child shell. Changes made to environment variables and the like inside a subshell are lost once the subshell exits. In most cases we use group commands.</p>
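<p>The following sketch shows the scoping difference described above. The variable names are chosen only for this example.</p>

```shell
#!/usr/bin/env bash
x="original"

{ x="set in group"; }        # runs in the current shell, so the change persists
after_group=$x

( x="set in subshell" )      # runs in a child shell, so the change is lost
after_subshell=$x

echo "$after_group / $after_subshell"
```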
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> <span class="string">"foo"</span> | <span class="built_in">read</span></span></div><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> <span class="variable">$REPLY</span></span></div></pre></td></tr></table></figure>
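<p>The <code>REPLY</code> variable is always empty here, <strong>because commands in a pipeline are always executed in subshells</strong>. bash provides process substitution to work around this problem.</p>
<h4 id="进程替换"><a href="#进程替换" class="headerlink" title="进程替换"></a>Process substitution</h4><p><code><(list)</code> - for processes that produce standard output</p>
<p><code>>(list)</code> - for processes that read standard input</p>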
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">read < <(echo "foo")</div><div class="line">echo $REPLY</div></pre></td></tr></table></figure>
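<p>To compare the two approaches side by side, here is a small sketch. It assumes bash's default settings (in particular, that the <code>lastpipe</code> option is off); the variable names are illustrative.</p>

```shell
#!/usr/bin/env bash
REPLY=""
echo "foo" | read            # read runs in a subshell; its REPLY is discarded
from_pipeline=$REPLY

read < <(echo "foo")         # read runs in the current shell
from_substitution=$REPLY

echo "pipeline: '$from_pipeline', substitution: '$from_substitution'"
```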
<p>Process substitution lets us treat the output of a subshell as an ordinary file for the purposes of redirection. In fact, it is a form of expansion.</p>]]></content>
<summary type="html">
<h2 id="Hello-World"><a href="#Hello-World" class="headerlink" title="Hello World"></a>Hello World</h2><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="meta">#</span><span class="bash">!/bin/bash</span></div><div class="line"><span class="meta">#</span><span class="bash"> this is a comment</span></div><div class="line">echo 'Hello World!'</div><div class="line">exit</div></pre></td></tr></table></figure>
<p>Save the file as <code>hello.sh</code>, then change its permissions:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> chmod 755 hello.sh</span></div></pre></td></tr></table></figure>
<p>Finally, run it:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> ./hello.sh</span></div><div class="line">Hello World!</div></pre></td></tr></table></figure>
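<p><code>exit</code> is not required, but every command returns an exit status to its parent process: 0 on success, while a non-zero value is usually treated as an error code. Well-written scripts end with <code>exit</code>. When a script ends with <code>exit</code> without an argument, its exit status is that of the last command executed.</p>
<p><code>echo $?</code> shows the exit status of the previous command.</p>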
</summary>
<category term="linux" scheme="https://xin053.github.io/categories/linux/"/>
<category term="linux" scheme="https://xin053.github.io/tags/linux/"/>
<category term="shell" scheme="https://xin053.github.io/tags/shell/"/>
</entry>
<entry>
<title>Learning Linux Commands</title>
<link href="https://xin053.github.io/2017/03/08/linux%E5%91%BD%E4%BB%A4%E5%AD%A6%E4%B9%A0/"/>
<id>https://xin053.github.io/2017/03/08/linux命令学习/</id>
<published>2017-03-08T05:26:10.000Z</published>
<updated>2017-05-27T13:20:48.775Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="Linux-命令学习"><a href="#Linux-命令学习" class="headerlink" title="Linux 命令学习"></a>Learning Linux Commands</h2><h3 id="常用命令"><a href="#常用命令" class="headerlink" title="常用命令"></a>Common commands</h3><p>Show disk space usage</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> df -h</span></div></pre></td></tr></table></figure>
<p>Show memory information</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ free -h</div></pre></td></tr></table></figure>
<p>Determine a file's type</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">file filename</div></pre></td></tr></table></figure>
<p>Both <code>less</code> and <code>more</code> can page through a file, but the former can page both forward and backward, while the latter only supports paging forward.</p>
<a id="more"></a>
<p>Open the file manager with root privileges</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> sudo nautilus</span></div></pre></td></tr></table></figure>
<p>Show how a command name is interpreted</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">type command_name</div></pre></td></tr></table></figure>
<p>Get a brief description of a command</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">whatis command_name</div></pre></td></tr></table></figure>
<p>Both <code>help</code> and <code>man</code> display documentation for commands, but the former only covers shell builtins.</p>
<p>Print the first lines of a file</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">head -n line_count filename</div></pre></td></tr></table></figure>
<p>Print the last lines of a file</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">tail -n line_count filename</div></pre></td></tr></table></figure>
<p>Clear the screen; same as <code>ctrl+l</code></p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">clear</div></pre></td></tr></table></figure>
<p>Show the contents of the history list</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">history</div></pre></td></tr></table></figure>
<p>Show the running status of all services</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> service --status-all</span></div></pre></td></tr></table></figure>
<p>Show the running status of a single service, for example ssh</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> service ssh status</span></div></pre></td></tr></table></figure>
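<h3 id="特殊符号"><a href="#特殊符号" class="headerlink" title="特殊符号"></a>Special characters</h3><p><code>;</code> Command separator; lets you write several commands on one line</p>
<p><code>""</code> Partial quoting; suppresses the special meaning of most special characters (<code>$</code>, backquotes and <code>\</code> keep theirs)</p>
<p><code>''</code> Full quoting; suppresses the special meaning of all special characters</p>
<p><code>` </code>Backquote; command substitution</p>
<p><code>?</code> Test operator; in parameter expansion it can test whether a variable is set</p>
<p><code>$?</code> Exit status variable</p>
<p><code>$$</code> Process ID variable; holds the process ID of the running script</p>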
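<p>A short sketch of several of these characters in use. The variable names and the error message are invented for the example, and the <code>${name?...}</code> check is run inside a subshell so that a failure does not end the surrounding script.</p>

```shell
#!/usr/bin/env bash
name="world"
double_quoted="hello $name"          # partial quoting: $name is expanded
single_quoted='hello $name'          # full quoting: kept literally
substituted="hello `echo $name`"     # backquotes run the command

status=0
false || status=$?                   # $? holds the exit status of false

# ${name?message} fails when name is unset.
unset_status=0
( unset name; : "${name?name is not set}" ) 2>/dev/null || unset_status=$?

echo "$double_quoted | $single_quoted | $substituted | $status"
```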
<h3 id="文件操作"><a href="#文件操作" class="headerlink" title="文件操作"></a>File operations</h3><p><code>cp</code> - copy files and directories</p>
<p><code>mv</code> - move/rename files and directories</p>
<p><code>mkdir</code> - create directories</p>
<p><code>rm</code> - remove files and directories</p>
<p><code>ln</code> - create hard links and symbolic links</p>
<h3 id="命令"><a href="#命令" class="headerlink" title="命令"></a>Commands</h3><p>A command can be one of the following four things:</p>
<ol>
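<li>An executable program, like the files we saw in <code>/usr/bin</code>. Programs in this category can be compiled binaries, such as programs written in C and C++, or programs written in scripting languages such as shell, perl, python, ruby, and so on.</li>
<li>A command built into the shell itself. bash supports a number of commands internally, called shell builtins. The cd command, for example, is a shell builtin.</li>
<li>A shell function. These are miniature shell scripts incorporated into the environment. Later chapters cover configuring the environment and writing shell functions; for now, it is enough to know that they exist.</li>
<li>An alias. We can define our own commands, built on top of other commands.</li>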
</ol>
<h3 id="重定向"><a href="#重定向" class="headerlink" title="重定向"></a>Redirection</h3><p><code>></code> truncates the file and then writes the output to it, while <code>>></code> appends to the end of the file</p>
<p>Standard input, standard output and standard error can each be redirected separately; the shell refers to them internally as file descriptors 0, 1 and 2</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> ls <span class="_">-l</span> /bin/usr 2>> ls-error.txt</span></div></pre></td></tr></table></figure>
<p>The command above appends the standard error stream to the file <code>ls-error.txt</code></p>
<p>To redirect both standard output and standard error to the same file, we can use:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> ls <span class="_">-l</span> /bin/usr > ls-output.txt 2>&1</span></div></pre></td></tr></table></figure>
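<p>The command above first redirects standard output to the file and then redirects standard error to standard output</p>
<p><strong>Note that the order of the redirections matters: the redirection of standard error must always come after the redirection of standard output, otherwise it does not work</strong></p>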
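<p>The effect of the ordering can be checked with a small experiment. This is only a sketch: <code>emit</code> is a hypothetical function that writes one line to each stream, and the temporary file names are arbitrary.</p>

```shell
#!/usr/bin/env bash
# emit is a hypothetical helper: one line to stdout, one line to stderr.
emit() { echo "out"; echo "err" >&2; }

tmp=$(mktemp -d)

# Correct order: stdout goes to the file first, then stderr follows stdout.
emit > "$tmp/both.txt" 2>&1

# Wrong order: stderr is pointed at the old stdout (captured here by the
# command substitution), and only then is stdout sent to the file.
leaked=$(emit 2>&1 > "$tmp/only_out.txt")

both=$(cat "$tmp/both.txt")
only_out=$(cat "$tmp/only_out.txt")
rm -r "$tmp"

echo "file with both streams: $both"
echo "file with wrong order: $only_out (leaked: $leaked)"
```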
<p>Recent versions of bash also support a more concise way to redirect both standard output and standard error to the same file</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> ls <span class="_">-l</span> /bin/usr &> ls-output.txt</span></div></pre></td></tr></table></figure>
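<p>Sometimes we do not want a command's output and simply want to throw it away. For that we can use the special device <code>/dev/null</code> (which works like a trash can)</p>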
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> ls <span class="_">-l</span> /bin/usr 2> /dev/null</span></div></pre></td></tr></table></figure>
<p>The command above discards the standard error stream</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> cat /dev/null > filename</span></div></pre></td></tr></table></figure>
<p>This empties the file, creating it if it does not exist. It is equivalent to the following command</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> : > filename</span></div></pre></td></tr></table></figure>
<p><code>:</code> is the null command</p>
<p>The pipe operator <code>|</code> connects the standard output of one command to the standard input of another</p>
<p>For example, with:</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> ll | less</span></div></pre></td></tr></table></figure>
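<p>we can browse all the files in the current directory more comfortably (<code>ll</code> is an alias for a long <code>ls</code> listing on many distributions)</p>
<p>The <code>tee</code> command reads from standard input and writes to both standard output and a file.</p>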
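<p>A minimal sketch of <code>tee</code> in the middle of a pipeline: the data is saved to a file unchanged while it continues down the pipe. The temporary file and sample text are placeholders.</p>

```shell
#!/usr/bin/env bash
log=$(mktemp)

# tee writes a copy to "$log" and passes the same data on to tr.
shown=$(printf 'a\nb\n' | tee "$log" | tr 'a-z' 'A-Z')
saved=$(cat "$log")
rm -f "$log"

echo "passed on: $shown"
echo "saved: $saved"
```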
<h3 id="花括号展开"><a href="#花括号展开" class="headerlink" title="花括号展开"></a>Brace expansion</h3><figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> {1..5}</span></div><div class="line">1 2 3 4 5</div><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">echo</span> {z..a}</span></div><div class="line">z y x w v u t s r q p o n m l k j i h g f e d c b a</div></pre></td></tr></table></figure>
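<p>Brace expansion also accepts comma-separated lists and can be combined with surrounding text. The following sketch uses made-up file names to show both forms.</p>

```shell
#!/usr/bin/env bash
# A range combined with a prefix and a suffix.
files=$(echo file{1..3}.txt)

# Two comma lists side by side expand to every combination.
pairs=$(echo {a,b}{1,2})

echo "$files"
echo "$pairs"
```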
<h3 id="命令替换"><a href="#命令替换" class="headerlink" title="命令替换"></a>Command substitution</h3><p>Command substitution lets us use the output of a command as an expansion</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> ll $(<span class="built_in">which</span> cp)</span></div><div class="line">-rwxr-xr-x 1 root root 151024 2月 18 2016 /bin/cp*</div></pre></td></tr></table></figure>
<p>Backquotes can be used instead of the dollar sign and parentheses</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> ll `<span class="built_in">which</span> cp`</span></div><div class="line">-rwxr-xr-x 1 root root 151024 2月 18 2016 /bin/cp*</div></pre></td></tr></table></figure>
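<p>One practical difference between the two forms shows up when substitutions are nested: <code>$( )</code> nests directly, while inner backquotes must be escaped. A sketch, using <code>/usr/bin/env</code> only as a sample path string:</p>

```shell
#!/usr/bin/env bash
# Nested $( ): the inner command runs first, no escaping needed.
parent=$(basename "$(dirname /usr/bin/env)")

# The same with backquotes: the inner pair has to be escaped.
parent_bq=`basename \`dirname /usr/bin/env\``

echo "$parent $parent_bq"
```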
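<h3 id="特殊权限"><a href="#特殊权限" class="headerlink" title="特殊权限"></a>Special permissions</h3><h4 id="setuid"><a href="#setuid" class="headerlink" title="setuid"></a>setuid</h4><p>When applied to an executable file, it changes the effective user ID from that of the real user (the user actually running the program) to that of the program's owner</p>
<h4 id="setgid"><a href="#setgid" class="headerlink" title="setgid"></a>setgid</h4><p>Similar to the setuid bit, it changes the effective group ID from the real group ID of the user to that of the group that owns the file</p>
<h4 id="sticky"><a href="#sticky" class="headerlink" title="sticky"></a>sticky</h4><p>Linux ignores the sticky bit on files, but when it is set on a directory, it prevents users from deleting or renaming files in it unless the user is the owner of the directory, the owner of the file, or the superuser</p>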
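<p>These bits are set with <code>chmod</code> using a leading octal digit (4 for setuid, 2 for setgid, 1 for sticky). The sketch below works in a throwaway temporary directory with placeholder names and reads the result back from <code>ls</code>; it assumes a filesystem that honours these mode bits.</p>

```shell
#!/usr/bin/env bash
work=$(mktemp -d)
touch "$work/prog"
mkdir "$work/shared"

chmod 4755 "$work/prog"      # leading 4 sets the setuid bit
chmod 1777 "$work/shared"    # leading 1 sets the sticky bit

# Keep only the ten mode characters from the long listing.
prog_mode=$(ls -l "$work/prog" | cut -c1-10)
shared_mode=$(ls -ld "$work/shared" | cut -c1-10)
rm -r "$work"

echo "$prog_mode $shared_mode"
```

<p>An <code>s</code> in the owner's execute position marks setuid, and a <code>t</code> in the last position marks the sticky bit.</p>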
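<h3 id="进程"><a href="#进程" class="headerlink" title="进程"></a>Processes</h3><p><code>ps</code> shows the processes associated with the current TTY (the controlling terminal of a process), <code>ps x</code> shows all of our processes regardless of which terminal controls them, and <code>ps aux</code> additionally shows each process's owner, CPU usage, memory usage and so on</p>
<h4 id="进程状态"><a href="#进程状态" class="headerlink" title="进程状态"></a>Process states</h4><ol>
<li><code>R</code> - Running</li>
<li><code>S</code> - Sleeping</li>
<li><code>D</code> - Uninterruptible sleep; the process is waiting for I/O</li>
<li><code>T</code> - Stopped</li>
<li><code>Z</code> - Zombie process</li>
<li><code><</code> - High-priority process</li>
<li><code>N</code> - Low-priority process</li>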
</ol>
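<p><code>ps</code> only takes a snapshot of the processes, whereas <code>top</code> displays continuously updated information about system processes (refreshed every 3 seconds by default). <code>pstree</code> prints the process list as a tree</p>
<h4 id="进程控制"><a href="#进程控制" class="headerlink" title="进程控制"></a>Process control</h4><p>Appending <code>&</code> to a command runs it in the background immediately</p>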
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> xlogo &</span></div><div class="line">[1] 28236</div></pre></td></tr></table></figure>
<p><code>jobs</code> lists the jobs running in the background of the current terminal along with their status</p>
<p><strong>A process running in the background is immune to all keyboard input and cannot be interrupted with <code>ctrl+c</code>.</strong></p>
<p>Use <code>fg</code> to bring a process back to the foreground</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> xlogo &</span></div><div class="line">[1] 55692</div><div class="line"><span class="meta">$</span><span class="bash"> <span class="built_in">fg</span> %1 # %1 here is called a jobspec</span></div></pre></td></tr></table></figure>
<p>Sometimes we want to stop (pause) a process rather than terminate it, often so that a foreground process can be moved to the background. Pressing <code>ctrl+z</code> stops a foreground process. A stopped process can be resumed in the foreground with <code>fg</code> or moved to the background with <code>bg</code>.</p>
<p>Use <code>kill PID</code> or <code>kill jobspec</code> to terminate a process</p>
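<p>In a script, the same steps can be sketched as follows. The special parameter <code>$!</code> (the PID of the most recent background command) and the <code>wait</code> builtin are not covered above and are brought in here only to make the example self-checking; <code>sleep 60</code> stands in for any long-running program.</p>

```shell
#!/usr/bin/env bash
sleep 60 &                 # start a background process
pid=$!                     # PID of the most recent background command

kill "$pid"                # send the default signal, SIGTERM (15)

status=0
wait "$pid" 2>/dev/null || status=$?   # collect the exit status of the killed process

echo "exit status: $status"
```

<p>A process terminated by a signal reports an exit status of 128 plus the signal number.</p>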
<h3 id="vim"><a href="#vim" class="headerlink" title="vim"></a>vim</h3><p>Common commands:</p>
<ol>
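<li><code>yy</code> - yank (copy) the current line</li>
<li><code>5yy</code> - yank the current line and the four lines that follow</li>
<li><code>y0</code> - yank from the cursor position to the beginning of the current line</li>
<li><code>y$</code> - yank from the cursor position to the end of the current line</li>
<li><code>p</code> - paste</li>
<li><code>d</code> - delete/cut text</li>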
</ol>
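<h3 id="文本处理"><a href="#文本处理" class="headerlink" title="文本处理"></a>Text processing</h3><p><code>cat -A filename</code> shows the non-printing characters in a file</p>
<p><code>cat -n filename</code> prints the file contents with line numbers</p>
<p><code>sort</code> sorts the contents of standard input, or of one or more files named on the command line, and sends the result to standard output.</p>
<p><code>cut</code> extracts sections of text from lines and writes them to standard output</p>
<p><code>paste</code> does the opposite of <code>cut</code>: rather than extracting columns of text from a file, it adds one or more columns. It reads multiple files and merges the fields from each into a single text stream, which it writes to standard output.</p>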
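<p>As a rough sketch of how <code>cut</code>, <code>sort</code> and <code>paste</code> fit together, the example below splits a small colon-separated file into columns and joins them back in a different order. The file names and data are invented for the example.</p>

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
printf 'carol:30\nalice:25\nbob:35\n' > "$tmp/people.txt"

# Extract the first field and sort it.
names=$(cut -d: -f1 "$tmp/people.txt" | sort)

# Split the file into two column files, then merge them in swapped order.
cut -d: -f1 "$tmp/people.txt" > "$tmp/names.txt"
cut -d: -f2 "$tmp/people.txt" > "$tmp/ages.txt"
swapped=$(paste -d, "$tmp/ages.txt" "$tmp/names.txt")
rm -r "$tmp"

echo "$names"
echo "$swapped"
```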
<p><code>sed</code> edits a text stream and is most often used for substitutions.</p>]]></content>
<summary type="html">
<h2 id="Linux-命令学习"><a href="#Linux-命令学习" class="headerlink" title="Linux 命令学习"></a>Learning Linux Commands</h2><h3 id="常用命令"><a href="#常用命令" class="headerlink" title="常用命令"></a>Common commands</h3><p>Show disk space usage</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="meta">$</span><span class="bash"> df -h</span></div></pre></td></tr></table></figure>
<p>Show memory information</p>
<figure class="highlight bash"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">$ free -h</div></pre></td></tr></table></figure>
<p>Determine a file's type</p>
<figure class="highlight shell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">file filename</div></pre></td></tr></table></figure>
<p>Both <code>less</code> and <code>more</code> can page through a file, but the former can page both forward and backward, while the latter only supports paging forward.</p>
</summary>
<category term="linux" scheme="https://xin053.github.io/categories/linux/"/>
<category term="linux" scheme="https://xin053.github.io/tags/linux/"/>
</entry>
<entry>
<title>What's New in Python 3.6</title>
<link href="https://xin053.github.io/2016/12/23/Python3.6%E6%9B%B4%E6%96%B0%E5%86%85%E5%AE%B9/"/>
<id>https://xin053.github.io/2016/12/23/Python3.6更新内容/</id>
<published>2016-12-23T11:15:12.000Z</published>
<updated>2017-05-27T13:20:48.771Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="Python3-6"><a href="#Python3-6" class="headerlink" title="Python3.6"></a>Python3.6</h2><p>At around 6:30 p.m. Beijing time on December 23, 2016, the <a href="https://www.python.org/" target="_blank" rel="external">Python website</a> released the final version of Python 3.6.0. After installing it, you can see that the Windows build was compiled at 8:06 a.m. on December 23, 2016. Python 3.6 spent a long time in testing before its official release, and the developers have said that bug fixes and other improvements to 3.6 (the 3.6.x releases) will begin in early 2017. For the changes in Python 3.6 relative to 3.5, see <a href="https://docs.python.org/3.6/whatsnew/3.6.html" target="_blank" rel="external">What’s New In Python 3.6</a><br>This post explains how to move a working environment from Python 3.5 to Python 3.6 and introduces the new features of Python 3.6.</p>
<p><img src="https://www.python.org/static/img/python-logo.png" alt=""></p>
<a id="more"></a>
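<h2 id="工作环境"><a href="#工作环境" class="headerlink" title="工作环境"></a>Working environment</h2><p>On Windows, each Python version (3.5 and 3.6, for example) is installed into its own directory. If third-party libraries are installed under the Python installation directory, then switching to 3.6 means adding the 3.6 installation directory to the <code>PATH</code> environment variable and reinstalling a large number of third-party libraries into the 3.6 directory. That leaves several copies of the same libraries on the machine. Everything related to 3.5 could be deleted, but reinstalling the commonly used libraries is tedious. So I use a Python virtual environment as my working environment: I created a virtual environment in <code>F:\pythonVE</code> and install all third-party libraries into it. With Python 3.6 freshly installed, all that is needed is to run the following in cmd:</p>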
<figure class="highlight powershell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">python -m venv --upgrade F:\pythonVE</div></pre></td></tr></table></figure>
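<p>Note that <code>python</code> here is the <code>python.exe</code> from 3.6. The <code>--upgrade</code> option upgrades the Python in the virtual environment to this version of Python (3.6).</p>
<p>So only the virtual environment's path needs to be added to <code>PATH</code>. After that it is a matter of gradually updating the third-party packages; it takes time for packages to support 3.6, but this will no doubt happen quickly. <strong>Jupyter's <code>ipython-qtconsole.exe</code> does not work right now because PyQt does not support 3.6 yet (3.6 only came out today, after all), but it should work within a few days. Python 3 is clearly the trend, so don't tell me your main working environment is still Python 2 (that said, Python 2.7.13 was released on December 17).</strong></p>
<p><strong>Note that some packages still have to be updated by hand. For example, lxml is hard to compile on Windows, so it is usually installed from a prebuilt binary. The lxml I downloaded earlier targets Python 3.5, so it has to be uninstalled and replaced with a prebuilt binary that supports 3.6. Some packages also report encoding problems when installed with pip; the simple fix is to download them from <a href="http://www.lfd.uci.edu/~gohlke/pythonlibs/" target="_blank" rel="external">Unofficial Windows Binaries for Python Extension Packages</a> and install them directly.</strong></p>
<p><strong><em>The above is only my own setup. At the moment I use Python purely as a tool, so I do not have to think about version compatibility the way library developers do. In general, though, it is still advisable to keep commonly used packages under the Python installation directory and to create a virtual environment for each specific project, installing packages that match its Python version there.</em></strong></p>
<h2 id="What’s-New-In-Python-3-6"><a href="#What’s-New-In-Python-3-6" class="headerlink" title="What’s New In Python 3.6"></a>What’s New In Python 3.6</h2><p>Main changes:</p>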
<ul>
<li>PEP 468 - Preserving the order of **kwargs in a function</li>
<li>PEP 487 - Simpler customization of class creation</li>
<li>PEP 495 - Local Time Disambiguation</li>
<li>PEP 498 - Literal String Formatting</li>
<li>PEP 506 - Adding A Secrets Module To The Standard Library</li>
<li>PEP 509 - Add a private version to dict</li>
<li>PEP 515 - Underscores in Numeric Literals</li>
<li>PEP 519 - Adding a file system path protocol</li>
<li>PEP 520 - Preserving Class Attribute Definition Order</li>
<li>PEP 523 - Adding a frame evaluation API to CPython</li>
<li>PEP 524 - Make os.urandom() blocking on Linux (during system startup)</li>
<li>PEP 525 - Asynchronous Generators (provisional)</li>
<li>PEP 526 - Syntax for Variable Annotations (provisional)</li>
<li>PEP 528 - Change Windows console encoding to UTF-8</li>
<li>PEP 529 - Change Windows filesystem encoding to UTF-8</li>
<li>PEP 530 - Asynchronous Comprehensions</li>
</ul>
<h3 id="PEP-498-Formatted-string-literals"><a href="#PEP-498-Formatted-string-literals" class="headerlink" title="PEP 498: Formatted string literals"></a>PEP 498: Formatted string literals</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line">>>> import decimal</div><div class="line">>>> name = "Fred"</div><div class="line">>>> f"He said his name is {name}."</div><div class="line">'He said his name is Fred.'</div><div class="line">>>> width = 10</div><div class="line">>>> precision = 4</div><div class="line">>>> value = decimal.Decimal("12.34567")</div><div class="line">>>> f"result: {value:{width}.{precision}}"  # nested fields</div><div class="line">'result:      12.35'</div></pre></td></tr></table></figure>
<p>Prefixing a string with <code>f</code> makes it a formatted string literal: the expressions in braces are evaluated and formatted, much like calling <code>str.format()</code> on the string. It really is convenient.</p>
<h3 id="PEP-526-Syntax-for-variable-annotations"><a href="#PEP-526-Syntax-for-variable-annotations" class="headerlink" title="PEP 526: Syntax for variable annotations"></a>PEP 526: Syntax for variable annotations</h3><p>Adds syntax for annotating the types of variables, including class variables and instance variables (function parameters could already be annotated)</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line">from typing import Dict, List</div><div class="line"></div><div class="line">primes: List[int] = []</div><div class="line"></div><div class="line">captain: str  # Note: no initial value!</div><div class="line"></div><div class="line">class Starship:</div><div class="line">    stats: Dict[str, int] = {}</div></pre></td></tr></table></figure>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">>>> class Starship:</div><div class="line">...     stats: str</div><div class="line">...</div><div class="line">>>> Starship.__annotations__</div><div class="line">{'stats': <class 'str'>}</div></pre></td></tr></table></figure>
<p>Of course, Python remains a dynamic language: these type annotations are simply stored in the <code>__annotations__</code> attribute of the class or module and are not checked at runtime, so they only serve as hints. Even so, the feature is genuinely useful. For the type annotation syntax itself, see <a href="https://www.python.org/dev/peps/pep-0484/" target="_blank" rel="external">PEP 484</a></p>
<h3 id="PEP-515-Underscores-in-Numeric-Literals"><a href="#PEP-515-Underscores-in-Numeric-Literals" class="headerlink" title="PEP 515: Underscores in Numeric Literals"></a>PEP 515: Underscores in Numeric Literals</h3><p>Underscores can be placed between digits in numeric literals to improve readability</p>
<figure class="highlight"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line">>>> 1_000_000_000_000_000</div><div class="line">1000000000000000</div><div class="line">>>> type(1_000_000_000_000_000)</div><div class="line"><class 'int'></div><div class="line">>>> 0x_FF_FF_FF_FF</div><div class="line">4294967295</div></pre></td></tr></table></figure>
<p>String formatting also supports the underscore as a grouping option:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line"><span class="meta">>>> </span><span class="string">'{:_}'</span>.format(<span class="number">1000000</span>)</div><div class="line"><span class="string">'1_000_000'</span></div><div class="line"><span class="meta">>>> </span><span class="string">'{:_x}'</span>.format(<span class="number">0xFFFFFFFF</span>)</div><div class="line"><span class="string">'ffff_ffff'</span></div><div class="line"><span class="meta">>>> </span><span class="string">'{:_X}'</span>.format(<span class="number">0xFFfFFFFF</span>)</div><div class="line"><span class="string">'FFFF_FFFF'</span></div></pre></td></tr></table></figure>
<p>The binary <code>b</code> and octal <code>o</code> presentation types work as well</p>
<h3 id="PEP-525-Asynchronous-Generators"><a href="#PEP-525-Asynchronous-Generators" class="headerlink" title="PEP 525: Asynchronous Generators"></a>PEP 525: Asynchronous Generators</h3><p>Asynchronous generators: in Python 3.6, <code>await</code> and <code>yield</code> can be used in the same function body</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div></pre></td><td class="code"><pre><div class="line">import asyncio</div><div class="line"></div><div class="line">class Ticker:</div><div class="line">    """Yield numbers from 0 to `to` every `delay` seconds."""</div><div class="line"></div><div class="line">    def __init__(self, delay, to):</div><div class="line">        self.delay = delay</div><div class="line">        self.i = 0</div><div class="line">        self.to = to</div><div class="line"></div><div class="line">    def __aiter__(self):</div><div class="line">        return self</div><div class="line"></div><div class="line">    async def __anext__(self):</div><div class="line">        i = self.i</div><div class="line">        if i >= self.to:</div><div class="line">            raise StopAsyncIteration</div><div class="line">        self.i += 1</div><div class="line">        if i:</div><div class="line">            await asyncio.sleep(self.delay)</div><div class="line">        return i</div></pre></td></tr></table></figure>
<p>The code above can now be written more simply as:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">async def ticker(delay, to):</div><div class="line">    """Yield numbers from 0 to `to` every `delay` seconds."""</div><div class="line">    for i in range(to):</div><div class="line">        yield i</div><div class="line">        await asyncio.sleep(delay)</div></pre></td></tr></table></figure>
<h3 id="PEP-530-Asynchronous-Comprehensions"><a href="#PEP-530-Asynchronous-Comprehensions" class="headerlink" title="PEP 530: Asynchronous Comprehensions"></a>PEP 530: Asynchronous Comprehensions</h3><p><code>async for</code> and <code>await</code> can be used in list, set and dict comprehensions and in generator expressions</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">result = []</div><div class="line">async for i in aiter():</div><div class="line">    if i % 2:</div><div class="line">        result.append(i)</div></pre></td></tr></table></figure>
<p>This can be shortened to:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">result = [i <span class="keyword">async</span> <span class="keyword">for</span> i <span class="keyword">in</span> aiter() <span class="keyword">if</span> i % <span class="number">2</span>]</div></pre></td></tr></table></figure>
<p>An example using <code>await</code>:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">result = [<span class="keyword">await</span> fun() <span class="keyword">for</span> fun <span class="keyword">in</span> funcs <span class="keyword">if</span> <span class="keyword">await</span> condition()]</div></pre></td></tr></table></figure>
<h3 id="PEP-487-Simpler-customization-of-class-creation"><a href="#PEP-487-Simpler-customization-of-class-creation" class="headerlink" title="PEP 487: Simpler customization of class creation"></a>PEP 487: Simpler customization of class creation</h3><p>Subclass creation can now be customized without using a metaclass</p>
<p>Whenever a subclass is created, the <code>__init_subclass__()</code> classmethod of the base class is called</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div></pre></td><td class="code"><pre><div class="line">class PluginBase:</div><div class="line">    subclasses = []</div><div class="line"></div><div class="line">    def __init_subclass__(cls, **kwargs):</div><div class="line">        super().__init_subclass__(**kwargs)</div><div class="line">        cls.subclasses.append(cls)</div><div class="line"></div><div class="line">class Plugin1(PluginBase):</div><div class="line">    pass</div><div class="line"></div><div class="line">class Plugin2(PluginBase):</div><div class="line">    pass</div></pre></td></tr></table></figure>
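<h3 id="PEP-487-Descriptor-Protocol-Enhancements"><a href="#PEP-487-Descriptor-Protocol-Enhancements" class="headerlink" title="PEP 487: Descriptor Protocol Enhancements"></a>PEP 487: Descriptor Protocol Enhancements</h3><p>The descriptor protocol gains an optional <code>__set_name__()</code> method. It is called when the owning class is created (not when the descriptor itself is instantiated) and receives the owner class and the attribute name the descriptor was assigned to.</p>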
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">IntField</span>:</span></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__get__</span><span class="params">(self, instance, owner)</span>:</span></div><div class="line"> <span class="keyword">return</span> instance.__dict__[self.name]</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__set__</span><span class="params">(self, instance, value)</span>:</span></div><div class="line"> <span class="keyword">if</span> <span class="keyword">not</span> isinstance(value, int):</div><div class="line"> <span class="keyword">raise</span> ValueError(<span class="string">f'expecting integer in <span class="subst">{self.name}</span>'</span>)</div><div class="line"> instance.__dict__[self.name] = value</div><div class="line"></div><div class="line"> <span class="comment"># this is the new initializer:</span></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__set_name__</span><span class="params">(self, owner, name)</span>:</span></div><div class="line"> self.name = name</div><div class="line"></div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Model</span>:</span></div><div class="line"> int_field = IntField() <span class="comment"># 
__set_name__() is called when Model is created and stores the attribute name int_field</span></div></pre></td></tr></table></figure>
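<p>To make the timing of the hook concrete, here is a minimal sketch (the names <code>Field</code>, <code>calls</code> and <code>Model</code> are made up for this example). It records every call to <code>__set_name__()</code> and shows that nothing is recorded until a class body that uses the object has finished executing:</p>

```python
calls = []


class Field:
    def __set_name__(self, owner, name):
        # Record which class and attribute name this object was bound to.
        calls.append((owner.__name__, name))


field = Field()            # creating the object does not trigger the hook
calls_before = list(calls)


class Model:
    price = field          # the hook runs once the class object is created


print(calls_before)  # []
print(calls)         # [('Model', 'price')]
```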
<h3 id="PEP-519-Adding-a-file-system-path-protocol"><a href="#PEP-519-Adding-a-file-system-path-protocol" class="headerlink" title="PEP 519: Adding a file system path protocol"></a>PEP 519: Adding a file system path protocol</h3><p>Paths have traditionally been treated as plain str or bytes objects, which is one reason the standard-library <code>pathlib</code> module saw limited use. Python 3.6 adds the <code>os.PathLike</code> interface: any object that implements <code>__fspath__()</code> is treated as a path, and <code>os.fspath()</code>, <code>os.fsdecode()</code>, or <code>os.fsencode()</code> can be used to obtain its str or bytes representation.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div></pre></td><td class="code"><pre><div class="line"><span class="meta">>>> </span><span class="keyword">import</span> pathlib</div><div class="line"><span class="meta">>>> </span><span class="keyword">with</span> open(pathlib.Path(<span class="string">"README"</span>)) <span class="keyword">as</span> f:</div><div class="line"><span class="meta">... </span> contents = f.read()</div><div class="line">...</div><div class="line"><span class="meta">>>> </span><span class="keyword">import</span> os.path</div><div class="line"><span class="meta">>>> </span>os.path.splitext(pathlib.Path(<span class="string">"some_file.txt"</span>))</div><div class="line">(<span class="string">'some_file'</span>, <span class="string">'.txt'</span>)</div><div class="line"><span class="meta">>>> </span>os.path.join(<span class="string">"/a/b"</span>, pathlib.Path(<span class="string">"c"</span>))</div><div class="line"><span class="string">'/a/b/c'</span></div><div class="line"><span class="meta">>>> </span><span class="keyword">import</span> os</div><div class="line"><span class="meta">>>> </span>os.fspath(pathlib.Path(<span class="string">"some_file.txt"</span>))</div><div class="line"><span class="string">'some_file.txt'</span></div></pre></td></tr></table></figure>
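<p>The protocol is not limited to <code>pathlib</code>. The sketch below defines a hypothetical <code>ProjectFile</code> class (not part of any library) whose only path-related feature is an <code>__fspath__()</code> method, which is enough for the <code>os</code> helpers to accept it:</p>

```python
import os


class ProjectFile:
    """Hypothetical path-like object that lives under a fixed directory."""

    def __init__(self, name):
        self.name = name

    def __fspath__(self):
        # The protocol requires a str or bytes return value.
        return os.path.join("project", self.name)


path = ProjectFile("notes.txt")
as_text = os.fspath(path)      # str form, separator depends on the platform
as_bytes = os.fsencode(path)   # bytes form in the filesystem encoding

print(as_text)
print(isinstance(path, os.PathLike))  # True
```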
<h3 id="PEP-529-Change-Windows-filesystem-encoding-to-UTF-8"><a href="#PEP-529-Change-Windows-filesystem-encoding-to-UTF-8" class="headerlink" title="PEP 529: Change Windows filesystem encoding to UTF-8"></a>PEP 529: Change Windows filesystem encoding to UTF-8</h3><p>Python 3.6 makes it possible to use paths represented as bytes on Windows without data loss. Such bytes are encoded with the encoding reported by <code>sys.getfilesystemencoding()</code>, which is now <code>UTF-8</code>.</p>
<h3 id="PEP-528-Change-Windows-console-encoding-to-UTF-8"><a href="#PEP-528-Change-Windows-console-encoding-to-UTF-8" class="headerlink" title="PEP 528: Change Windows console encoding to UTF-8"></a>PEP 528: Change Windows console encoding to UTF-8</h3><p>The default console on Windows will now accept all Unicode characters and provide correctly read str objects to Python code. <code>sys.stdin</code>, <code>sys.stdout</code> and <code>sys.stderr</code> now default to utf-8 encoding.</p>
<p>This is very welcome: garbled console output on Windows should no longer be a worry.</p>
<h3 id="PEP-520-Preserving-Class-Attribute-Definition-Order"><a href="#PEP-520-Preserving-Class-Attribute-Definition-Order" class="headerlink" title="PEP 520: Preserving Class Attribute Definition Order"></a>PEP 520: Preserving Class Attribute Definition Order</h3><p>The order in which attributes are defined in a class body is now preserved in the class's <code>__dict__</code>.</p>
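<p>A small sketch of what this means in practice. The class <code>Config</code> and its attributes are invented for the example; the filter only drops the dunder entries that Python adds to every class namespace:</p>

```python
class Config:
    host = "localhost"
    port = 8080
    debug = False


# Iterating over the class namespace yields names in definition order.
defined = [name for name in vars(Config) if not name.startswith("__")]
print(defined)  # ['host', 'port', 'debug']
```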
<h3 id="PEP-468-Preserving-Keyword-Argument-Order"><a href="#PEP-468-Preserving-Keyword-Argument-Order" class="headerlink" title="PEP 468: Preserving Keyword Argument Order"></a>PEP 468: Preserving Keyword Argument Order</h3><p><code>**kwargs</code> in a function signature is now guaranteed to be an insertion-order-preserving mapping.</p>
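<h4 id="New-dict-implementation"><a href="#New-dict-implementation" class="headerlink" title="New dict implementation"></a>New dict implementation</h4><p>The new dict implementation uses 20% to 25% less memory than the one in Python 3.5, and it also preserves insertion order, so dicts are now effectively ordered. In 3.6 this ordering is documented as an implementation detail that should not be relied upon, so <code>OrderedDict</code> remains the safe choice when order matters (insertion order only became a language guarantee in Python 3.7).</p>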
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line"><span class="meta">>>> </span>b = {<span class="string">'one'</span>: <span class="number">1</span>, <span class="string">'two'</span>: <span class="number">2</span>, <span class="string">'three'</span>: <span class="number">3</span>}</div><div class="line"><span class="meta">>>> </span>b</div><div class="line">{<span class="string">'one'</span>: <span class="number">1</span>, <span class="string">'two'</span>: <span class="number">2</span>, <span class="string">'three'</span>: <span class="number">3</span>}</div></pre></td></tr></table></figure>
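<p>For the keyword-argument guarantee of PEP 468 specifically, a minimal sketch (the function name <code>describe</code> and its arguments are arbitrary) shows that <code>**kwargs</code> keeps the order used at the call site:</p>

```python
def describe(**kwargs):
    # kwargs preserves the order in which the caller wrote the keywords.
    return list(kwargs)


order = describe(zeta=1, alpha=2, mid=3)
print(order)  # ['zeta', 'alpha', 'mid']
```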
<h3 id="其他改动"><a href="#其他改动" class="headerlink" title="其他改动"></a>Other changes</h3><p>The <a href="https://docs.python.org/3.6/library/secrets.html#module-secrets" target="_blank" rel="external"><code>secrets</code></a> module was added.</p>
<p>The <code>re</code> module now supports modifier spans in regular expressions. Examples: <code>'(?i:p)ython'</code> matches <code>'python'</code> and <code>'Python'</code>, but not <code>'PYTHON'</code>; <code>'(?i)g(?-i:v)r'</code> matches <code>'GvR'</code> and <code>'gvr'</code>, but not <code>'GVR'</code>.</p>
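<p>The modifier-span behaviour can be checked directly. This is a small sketch using only the standard <code>re</code> module; the variable names are arbitrary:</p>

```python
import re

# Case-insensitivity applies only to the "p" inside the span.
local = re.compile(r"(?i:p)ython")
# Case-insensitive overall, except for the "v" inside the (?-i:...) span.
mixed = re.compile(r"(?i)g(?-i:v)r")

local_hits = [s for s in ("python", "Python", "PYTHON") if local.fullmatch(s)]
mixed_hits = [s for s in ("GvR", "gvr", "GVR") if mixed.fullmatch(s)]

print(local_hits)  # ['python', 'Python']
print(mixed_hits)  # ['GvR', 'gvr']
```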
<p>For the full list of changes, see the official <a href="https://docs.python.org/3.6/whatsnew/3.6.html" target="_blank" rel="external">What’s New In Python 3.6</a>.</p>
<h2 id="参考文档"><a href="#参考文档" class="headerlink" title="参考文档"></a>References</h2><ul>
<li><a href="https://docs.python.org/3.6/whatsnew/3.6.html" target="_blank" rel="external">What’s New In Python 3.6</a></li>
</ul>]]></content>
<summary type="html">
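<h2 id="Python3-6"><a href="#Python3-6" class="headerlink" title="Python3.6"></a>Python3.6</h2><p>At around 6:30 pm Beijing time on 23 December 2016, the <a href="https://www.python.org/" target="_blank" rel="external">official Python website</a> released the final version of Python 3.6.0. After installation, the Windows build reports a build time of 8:06 am on 23 December 2016. Python 3.6 spent a long time in testing before this release, and the core team has said that bug-fix releases (the 3.6.x series) will start in early 2017. For the changes in Python 3.6 relative to 3.5, see <a href="https://docs.python.org/3.6/whatsnew/3.6.html" target="_blank" rel="external">What’s New In Python 3.6</a>.<br>This post explains how to move a working environment from Python 3.5 to Python 3.6 and introduces the new features in Python 3.6.</p>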
<p><img src="https://www.python.org/static/img/python-logo.png" alt=""></p>
</summary>
<category term="Python" scheme="https://xin053.github.io/categories/Python/"/>
<category term="Python" scheme="https://xin053.github.io/tags/Python/"/>
</entry>
<entry>
<title>A detailed guide to the cryptography library</title>
<link href="https://xin053.github.io/2016/12/20/cryptography%E5%8A%A0%E5%AF%86%E5%BA%93%E4%BD%BF%E7%94%A8%E8%AF%A6%E8%A7%A3/"/>
<id>https://xin053.github.io/2016/12/20/cryptography加密库使用详解/</id>
<published>2016-12-20T12:59:43.000Z</published>
<updated>2017-05-27T13:20:48.771Z</updated>
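<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="cryptography简介"><a href="#cryptography简介" class="headerlink" title="cryptography简介"></a>Introduction to cryptography</h2><p>The cryptography package has two layers. The first is a set of high-level recipes: you only need to know how to call the API, not how the encryption works internally, and this is the layer most code should use. The second is a set of low-level cryptographic primitives (the hazmat layer); building your own scheme from these primitives is dangerous without a solid grounding in cryptography. This post covers the high-level symmetric encryption API and the low-level asymmetric API for public and private keys, along with certificates.</p>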
<a id="more"></a>
<h2 id="cryptography使用"><a href="#cryptography使用" class="headerlink" title="cryptography使用"></a>Using cryptography</h2><h3 id="Fernet-对称加密"><a href="#Fernet-对称加密" class="headerlink" title="Fernet(对称加密)"></a>Fernet (symmetric encryption)</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> cryptography.fernet <span class="keyword">import</span> Fernet</div><div class="line"></div><div class="line">key = Fernet.generate_key()</div><div class="line">key <span class="comment"># A URL-safe base64-encoded 32-byte key</span></div><div class="line"><span class="comment"># b'7A7idpk7MjmvTWqZf4_vWwvXwAJmmi4SFRnomqKTrB8='</span></div><div class="line">f = Fernet(key)</div><div class="line">token = f.encrypt(<span class="string">b"my deep dark secret"</span>)</div><div class="line">token</div><div class="line"><span class="comment"># b'gAAAAABYWUWYZywJx9l3UrSUMGa5OS3dlz15NpUuOu-Wk6UNsLnQmtDx2hGdRRhwe62EhzT7OuvLafjzwjf7fASFRLMBQPhq3fa2U_WsFcEUzCFR0ZcxJC8='</span></div><div class="line">f.decrypt(token)</div><div class="line"><span class="comment"># b'my deep dark secret'</span></div></pre></td></tr></table></figure>
<h4 id="Using-passwords-with-Fernet"><a href="#Using-passwords-with-Fernet" class="headerlink" title="Using passwords with Fernet"></a>Using passwords with Fernet</h4><figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div></pre></td><td class="code"><pre><div class="line"><span class="meta">>>> </span><span class="keyword">import</span> base64</div><div class="line"><span class="meta">>>> </span><span class="keyword">import</span> os</div><div class="line"><span class="meta">>>> </span><span class="keyword">from</span> cryptography.fernet <span class="keyword">import</span> Fernet</div><div class="line"><span class="meta">>>> </span><span class="keyword">from</span> cryptography.hazmat.backends <span class="keyword">import</span> default_backend</div><div class="line"><span class="meta">>>> </span><span class="keyword">from</span> cryptography.hazmat.primitives <span class="keyword">import</span> hashes</div><div class="line"><span class="meta">>>> </span><span class="keyword">from</span> cryptography.hazmat.primitives.kdf.pbkdf2 <span class="keyword">import</span> PBKDF2HMAC</div><div class="line"><span class="meta">>>> </span>password = <span class="string">b"password"</span></div><div class="line"><span class="meta">>>> </span>salt = os.urandom(<span class="number">16</span>)</div><div class="line"><span class="meta">>>> </span>kdf = PBKDF2HMAC(</div><div class="line"><span class="meta">... 
</span>    algorithm=hashes.SHA256(),</div><div class="line"><span class="meta">... </span>    length=<span class="number">32</span>,</div><div class="line"><span class="meta">... </span>    salt=salt,</div><div class="line"><span class="meta">... </span>    iterations=<span class="number">100000</span>,</div><div class="line"><span class="meta">... </span>    backend=default_backend()</div><div class="line"><span class="meta">... </span>)</div><div class="line"><span class="meta">>>> </span>key = base64.urlsafe_b64encode(kdf.derive(password))</div><div class="line"><span class="meta">>>> </span>f = Fernet(key)</div><div class="line"><span class="meta">>>> </span>token = f.encrypt(<span class="string">b"Secret message!"</span>)</div><div class="line"><span class="meta">>>> </span>token</div><div class="line"><span class="string">b'...'</span></div><div class="line"><span class="meta">>>> </span>f.decrypt(token)</div><div class="line"><span class="string">b'Secret message!'</span></div></pre></td></tr></table></figure>
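<p>To derive the same key from <code>password</code> later, and so be able to decrypt <code>token</code>, the <code>salt</code> must be stored; it does not need to be kept secret.</p>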
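<p>Why the salt has to be kept can be shown with a short sketch. To stay dependency-free it uses <code>hashlib.pbkdf2_hmac</code> from the standard library as a stand-in for <code>PBKDF2HMAC</code>, with the same hash, iteration count and output length as the example above; the helper name <code>derive_key</code> is an assumption made for this illustration:</p>

```python
import base64
import hashlib
import os


def derive_key(password, salt):
    # PBKDF2-HMAC-SHA256, 100000 iterations, 32-byte output,
    # then URL-safe base64, which is the key format Fernet expects.
    raw = hashlib.pbkdf2_hmac("sha256", password, salt, 100000, dklen=32)
    return base64.urlsafe_b64encode(raw)


salt = os.urandom(16)
key_first = derive_key(b"password", salt)
key_again = derive_key(b"password", salt)             # same salt: same key
key_other = derive_key(b"password", os.urandom(16))   # new salt: different key

print(key_first == key_again)  # True
print(key_first == key_other)  # False, barring a 128-bit salt collision
```

<p>Without the original salt, the password alone cannot reproduce the key, so tokens encrypted earlier would become unreadable.</p>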
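<h3 id="X-509-数字证书标准"><a href="#X-509-数字证书标准" class="headerlink" title="X.509(数字证书标准)"></a>X.509 (digital certificate standard)</h3><p>A digital certificate is an electronic document, signed by a certificate authority (CA), that contains a server's public key together with other information about the site. It attests that the server (website) is genuine rather than forged.</p>
<p>Certificates rely on asymmetric cryptography, that is, a public/private key pair (RSA here): the private key signs and the public key verifies.</p>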
<h4 id="Creating-a-Certificate-Signing-Request-CSR"><a href="#Creating-a-Certificate-Signing-Request-CSR" class="headerlink" title="Creating a Certificate Signing Request (CSR)"></a>Creating a Certificate Signing Request (CSR)</h4><p>When obtaining a certificate from a certificate authority (CA), the usual flow is:</p>
<ol>
<li>You generate a private/public key pair.</li>
<li>You create a request for a certificate, which is signed by your key (to prove that you own that key).</li>
<li>You give your CSR to a CA (but <em>not</em> the private key).</li>
<li>The CA validates that you own the resource (e.g. domain) you want a certificate for.</li>
<li>The CA gives you a certificate, signed by them, which identifies your public key, and the resource you are authenticated for.</li>
<li>You configure your server to use that certificate, combined with your private key, to serve traffic.</li>
</ol>
<p>So the first step is to generate a key pair:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> cryptography.hazmat.backends <span class="keyword">import</span> default_backend</div><div class="line"><span class="keyword">from</span> cryptography.hazmat.primitives <span class="keyword">import</span> serialization</div><div class="line"><span class="keyword">from</span> cryptography.hazmat.primitives.asymmetric <span class="keyword">import</span> rsa</div><div class="line"></div><div class="line">key = rsa.generate_private_key(</div><div class="line"> public_exponent=<span class="number">65537</span>,</div><div class="line"> key_size=<span class="number">2048</span>,</div><div class="line"> backend=default_backend()</div><div class="line">)</div></pre></td></tr></table></figure>
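<p>For how to build the certificate signing request itself, see the <a href="https://cryptography.io/en/latest/x509/tutorial/#creating-a-certificate-signing-request-csr" target="_blank" rel="external">official documentation</a>. The resulting CSR (not a certificate yet) is then sent to the CA. Once the CA has processed it, it returns a certificate signed by the CA, which is what visitors use to verify the site.</p>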
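<p>For orientation, a minimal sketch of building a CSR with the library's <code>CertificateSigningRequestBuilder</code> might look like the following. The domain <code>example.com</code> is a placeholder, a real request would normally carry more subject fields, and it assumes the <code>cryptography</code> package is installed:</p>

```python
from cryptography import x509
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

key = rsa.generate_private_key(
    public_exponent=65537,
    key_size=2048,
    backend=default_backend(),
)

csr = (
    x509.CertificateSigningRequestBuilder()
    # Placeholder subject: only a common name.
    .subject_name(x509.Name([
        x509.NameAttribute(NameOID.COMMON_NAME, "example.com"),
    ]))
    # Names the certificate should be valid for.
    .add_extension(
        x509.SubjectAlternativeName([x509.DNSName("example.com")]),
        critical=False,
    )
    # Signing with the private key proves ownership of the key pair.
    .sign(key, hashes.SHA256(), default_backend())
)

csr_pem = csr.public_bytes(serialization.Encoding.PEM)
print(csr_pem.splitlines()[0])  # b'-----BEGIN CERTIFICATE REQUEST-----'
```

<p>Only <code>csr_pem</code> is handed to the CA; the private key stays on the server.</p>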
<h4 id="RSA-常用操作"><a href="#RSA-常用操作" class="headerlink" title="RSA 常用操作"></a>Common RSA operations</h4><h5 id="生成"><a href="#生成" class="headerlink" title="生成"></a>Generation</h5><figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> cryptography.hazmat.backends <span class="keyword">import</span> default_backend</div><div class="line"><span class="keyword">from</span> cryptography.hazmat.primitives.asymmetric <span class="keyword">import</span> rsa</div><div class="line"></div><div class="line">private_key = rsa.generate_private_key(</div><div class="line">    public_exponent=<span class="number">65537</span>,</div><div class="line">    key_size=<span class="number">2048</span>,</div><div class="line">    backend=default_backend()</div><div class="line">)</div></pre></td></tr></table></figure>
<p>This produces an <code>RSAPrivateKey</code> object. The parameters shown above are fine for general use; see the <a href="https://cryptography.io/en/latest/hazmat/primitives/asymmetric/rsa/#generation" target="_blank" rel="external">official documentation</a> for what each one means.</p>
<p><strong>Private and public keys are generated as a pair, so once <code>generate_private_key</code> has produced an <code>RSAPrivateKey</code> object, the matching <code>RSAPublicKey</code> object can be obtained from it.</strong></p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">public_key = private_key.public_key()</div></pre></td></tr></table></figure>
<p>The reverse is of course impossible: an <code>RSAPrivateKey</code> object cannot be obtained from an <code>RSAPublicKey</code> object.</p>
<h5 id="从pem文件导入"><a href="#从pem文件导入" class="headerlink" title="从pem文件导入"></a>Loading from a PEM file</h5><p>An <code>RSAPrivateKey</code> object can also be loaded from a PEM file.</p>
<p>A PEM file looks like this:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div></pre></td><td class="code"><pre><div class="line">-----BEGIN CERTIFICATE-----</div><div class="line">MIICKjCCAZMCCQDQ8o4kHKdCPDANBgkqhkiG9w0BAQUFADB6MQswCQYDVQQGEwJV</div><div class="line">UzELMAkGA1UECBMCQ0ExCzAJBgNVBAcTAlNGMQ8wDQYDVQQKEwZKb3llbnQxEDAO</div><div class="line">BgNVBAsTB05vZGUuanMxDDAKBgNVBAMTA2NhMTEgMB4GCSqGSIb3DQEJARYRcnlA</div><div class="line">dGlueWNsb3Vkcy5vcmcwHhcNMTEwMzE0MTgyOTEyWhcNMzgwNzI5MTgyOTEyWjB9</div><div class="line">MQswCQYDVQQGEwJVUzELMAkGA1UECBMCQ0ExCzAJBgNVBAcTAlNGMQ8wDQYDVQQK</div><div class="line">EwZKb3llbnQxEDAOBgNVBAsTB05vZGUuanMxDzANBgNVBAMTBmFnZW50MTEgMB4G</div><div class="line">CSqGSIb3DQEJARYRcnlAdGlueWNsb3Vkcy5vcmcwXDANBgkqhkiG9w0BAQEFAANL</div><div class="line">ADBIAkEAnzpAqcoXZxWJz/WFK7BXwD23jlREyG11x7gkydteHvn6PrVBbB5yfu6c</div><div class="line">bk8w3/Ar608AcyMQ9vHjkLQKH7cjEQIDAQABMA0GCSqGSIb3DQEBBQUAA4GBAKha</div><div class="line">HqjCfTIut+m/idKy3AoFh48tBHo3p9Nl5uBjQJmahKdZAaiksL24Pl+NzPQ8LIU+</div><div class="line">FyDHFp6OeJKN6HzZ72Bh9wpBVu6Uj1hwhZhincyTXT80wtSI/BoUAW8Ls2kwPdus</div><div class="line">64LsJhhxqj2m4vPKNRbHB2QxnNrGi30CUf3kt3Ia</div><div class="line">-----END CERTIFICATE-----</div></pre></td></tr></table></figure>
<p><strong><em>A PEM block which starts with <code>-----BEGIN CERTIFICATE-----</code> is not a public or private key, it’s an <a href="https://cryptography.io/en/latest/x509/" target="_blank" rel="external">X.509 Certificate</a>. You can load it using <a href="https://cryptography.io/en/latest/x509/reference/#cryptography.x509.load_pem_x509_certificate" target="_blank" rel="external"><code>load_pem_x509_certificate()</code></a> and extract the public key with <a href="https://cryptography.io/en/latest/x509/reference/#cryptography.x509.Certificate.public_key" target="_blank" rel="external"><code>Certificate.public_key</code></a></em></strong></p>
<p>A private-key PEM file may itself be encrypted, in which case its password is supplied through the <code>password</code> argument. An <code>RSAPrivateKey</code> object is loaded from a PEM file as follows:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> cryptography.hazmat.backends <span class="keyword">import</span> default_backend</div><div class="line"><span class="keyword">from</span> cryptography.hazmat.primitives <span class="keyword">import</span> serialization</div><div class="line"></div><div class="line"><span class="keyword">with</span> open(<span class="string">"path/to/key.pem"</span>, <span class="string">"rb"</span>) <span class="keyword">as</span> key_file:</div><div class="line">    private_key = serialization.load_pem_private_key(</div><div class="line">        key_file.read(),</div><div class="line">        password=<span class="keyword">None</span>,</div><div class="line">        backend=default_backend()</div><div class="line">    )</div></pre></td></tr></table></figure>
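<p>In the same way, private or public keys can be loaded from DER-encoded files, and public keys from OpenSSH-format files.</p>
<h5 id="序列化"><a href="#序列化" class="headerlink" title="序列化"></a>Serialization</h5><p>Both <code>RSAPrivateKey</code> and <code>RSAPublicKey</code> objects can be serialized to PEM.</p>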
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> cryptography.hazmat.primitives <span class="keyword">import</span> serialization</div><div class="line"></div><div class="line">pem = private_key.private_bytes(</div><div class="line"> encoding=serialization.Encoding.PEM,</div><div class="line"> format=serialization.PrivateFormat.PKCS8,</div><div class="line"> encryption_algorithm=serialization.BestAvailableEncryption(<span class="string">b'mypassword'</span>)</div><div class="line">)</div><div class="line"></div><div class="line">pem.splitlines()</div><div class="line"><span class="comment"># [b'-----BEGIN ENCRYPTED PRIVATE KEY-----',</span></div><div class="line"><span class="comment"># b'MIIFHzBJBgkqhkiG9w0BBQ0wPDAbBgkqhkiG9w0BBQwwDgQI4LyuGo+hDoACAggA',</span></div><div class="line"><span class="comment"># b'MB0GCWCGSAFlAwQBKgQQGuA8UxHCt7qLEF29noqffQSCBNBH0rZH59FTTWaPWEV/',</span></div><div class="line"><span class="comment"># ......</span></div><div class="line"><span class="comment"># b'Y6Dt0ACOPHcd8Z2Y9MTJ0QFY8A==',</span></div><div class="line"><span class="comment"># b'-----END ENCRYPTED PRIVATE KEY-----']</span></div></pre></td></tr></table></figure>
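<p>It is strongly recommended to encrypt the private key with a password of your own when serializing it, so that the key is not fully exposed if the file leaks.</p>
<p><strong>This step is called serialization rather than simply saving the private key because the PEM file contains more than the key itself: it also carries important information about the key (consult the relevant documentation for the PEM format). In practice there is no need to parse PEM files by hand; the API provided by the library is enough.</strong></p>
<p>The key can also be serialized without encryption by changing the argument as follows:</p>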
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">encryption_algorithm=serialization.NoEncryption()</div></pre></td></tr></table></figure>
<p>Public keys are serialized as follows:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> cryptography.hazmat.primitives <span class="keyword">import</span> serialization</div><div class="line">public_key = private_key.public_key()</div><div class="line"></div><div class="line">pem = public_key.public_bytes(</div><div class="line"> encoding=serialization.Encoding.PEM,</div><div class="line"> format=serialization.PublicFormat.SubjectPublicKeyInfo</div><div class="line">)</div><div class="line"></div><div class="line">pem.splitlines()</div><div class="line"><span class="comment"># [b'-----BEGIN PUBLIC KEY-----',</span></div><div class="line"><span class="comment"># b'MIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAtboyGrCz1JIVru4+eoKG',</span></div><div class="line"><span class="comment"># b'n/adEsavPDb2FQ6/UkIum392ni/Q9H27chliPXEZWZmEorbJvWeHupuL0ld3IWXi',</span></div><div class="line"><span class="comment"># ......</span></div><div class="line"><span class="comment"># b'LwIDAQAB',</span></div><div class="line"><span class="comment"># b'-----END PUBLIC KEY-----']</span></div></pre></td></tr></table></figure>
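<p>Serialization and loading are inverse operations, which the following sketch checks with a round trip. It assumes the <code>cryptography</code> package is installed; the password is a throwaway value used only for the example:</p>

```python
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

private_key = rsa.generate_private_key(
    public_exponent=65537,
    key_size=2048,
    backend=default_backend(),
)

password = b"mypassword"  # example value only

# Serialize to an encrypted PKCS8 PEM blob.
pem = private_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.BestAvailableEncryption(password),
)

# Load it back using the same password.
restored = serialization.load_pem_private_key(
    pem, password=password, backend=default_backend()
)

# Equal public numbers (modulus and exponent) mean it is the same key pair.
same_key = (
    restored.public_key().public_numbers()
    == private_key.public_key().public_numbers()
)
print(same_key)  # True
```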
<h5 id="签名"><a href="#签名" class="headerlink" title="签名"></a>Signing</h5><p>The private key can be used to sign a message, and anyone holding the public key can then verify the signature.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> cryptography.hazmat.primitives <span class="keyword">import</span> hashes</div><div class="line"><span class="keyword">from</span> cryptography.hazmat.primitives.asymmetric <span class="keyword">import</span> padding</div><div class="line"></div><div class="line">signer = private_key.signer(</div><div class="line"> padding.PSS(</div><div class="line"> mgf=padding.MGF1(hashes.SHA256()),</div><div class="line"> salt_length=padding.PSS.MAX_LENGTH</div><div class="line"> ),</div><div class="line"> hashes.SHA256()</div><div class="line">)</div><div class="line"></div><div class="line">message = <span class="string">b"A message I want to sign"</span></div><div class="line">signer.update(message)</div><div class="line">signature = signer.finalize()</div><div class="line"></div><div class="line">signature</div><div class="line"><span class="comment"># b'\x19\x87!5\xc0\xe3s\x01M\xa5-\xf3......\xce\xf5\x03=F\xb3\xd5\xd1\xf9\xc2\xf2\xbak'</span></div></pre></td></tr></table></figure>
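<p><code>padding</code> here is the RSA signature padding scheme (PSS in this example): it encodes the message digest into a block the size of the RSA modulus before the private-key operation is applied. It is unrelated to the internal block padding of SHA-256, and the 256 in SHA256 is the digest size in bits, not a padding length.</p>
<p>There is also a simpler, one-shot way to sign. The <code>signer()</code> and <code>verifier()</code> context methods were deprecated in cryptography 2.0 and removed in later releases, so with current versions the one-shot form is the one to use:</p>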
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line">message = <span class="string">b"A message I want to sign"</span></div><div class="line">signature = private_key.sign(</div><div class="line"> message,</div><div class="line"> padding.PSS(</div><div class="line"> mgf=padding.MGF1(hashes.SHA256()),</div><div class="line"> salt_length=padding.PSS.MAX_LENGTH</div><div class="line"> ),</div><div class="line"> hashes.SHA256()</div><div class="line">)</div></pre></td></tr></table></figure>
<h5 id="验证"><a href="#验证" class="headerlink" title="验证"></a>Verification</h5><figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div></pre></td><td class="code"><pre><div class="line">public_key = private_key.public_key()</div><div class="line">verifier = public_key.verifier(</div><div class="line">    signature,</div><div class="line">    padding.PSS(</div><div class="line">        mgf=padding.MGF1(hashes.SHA256()),</div><div class="line">        salt_length=padding.PSS.MAX_LENGTH</div><div class="line">    ),</div><div class="line">    hashes.SHA256()</div><div class="line">)</div><div class="line"></div><div class="line">verifier.update(message)</div><div class="line">verifier.verify()</div></pre></td></tr></table></figure>
<p>If verification fails, an <code>InvalidSignature</code> exception is raised. As with signing, there is a simpler one-shot way to verify:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line">public_key.verify(</div><div class="line"> signature,</div><div class="line"> message,</div><div class="line"> padding.PSS(</div><div class="line"> mgf=padding.MGF1(hashes.SHA256()),</div><div class="line"> salt_length=padding.PSS.MAX_LENGTH</div><div class="line"> ),</div><div class="line"> hashes.SHA256()</div><div class="line">)</div></pre></td></tr></table></figure>
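<p>Because a failed check is reported as an exception rather than a return value, callers usually wrap it. The sketch below assumes the <code>cryptography</code> package is installed; the helper name <code>is_valid</code> is hypothetical and not part of the library:</p>

```python
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

private_key = rsa.generate_private_key(
    public_exponent=65537, key_size=2048, backend=default_backend()
)
public_key = private_key.public_key()


def pss():
    # Same PSS parameters as in the examples above.
    return padding.PSS(
        mgf=padding.MGF1(hashes.SHA256()),
        salt_length=padding.PSS.MAX_LENGTH,
    )


def is_valid(key, signature, message):
    """Return True if the signature matches, False otherwise."""
    try:
        key.verify(signature, message, pss(), hashes.SHA256())
        return True
    except InvalidSignature:
        return False


message = b"A message I want to sign"
signature = private_key.sign(message, pss(), hashes.SHA256())

ok_original = is_valid(public_key, signature, message)
ok_tampered = is_valid(public_key, signature, message + b"!")
print(ok_original)  # True
print(ok_tampered)  # False
```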
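<h5 id="加密"><a href="#加密" class="headerlink" title="加密"></a>Encryption</h5><p><strong>Encrypting a message with the private key is pointless, because everyone has your public key; it is public, after all.</strong> Keeping the public key secret instead would defeat its purpose. Encryption therefore means encrypting with the public key, and the private key is then used to decrypt.</p>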
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div></pre></td><td class="code"><pre><div class="line">message = <span class="string">b"encrypted data"</span></div><div class="line">ciphertext = public_key.encrypt(</div><div class="line"> message,</div><div class="line"> padding.OAEP(</div><div class="line"> mgf=padding.MGF1(algorithm=hashes.SHA1()),</div><div class="line"> algorithm=hashes.SHA1(),</div><div class="line"> label=<span class="keyword">None</span></div><div class="line"> )</div><div class="line">)</div><div class="line"></div><div class="line">ciphertext</div><div class="line"><span class="comment"># b'J\x95\xadC\xa9......\x18\xbb\\\xa3\xb3\x13f_N\x89\x07`\xa1'</span></div></pre></td></tr></table></figure>
<h5 id="解密"><a href="#解密" class="headerlink" title="解密"></a>Decryption</h5><figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div></pre></td><td class="code"><pre><div class="line">plaintext = private_key.decrypt(</div><div class="line">    ciphertext,</div><div class="line">    padding.OAEP(</div><div class="line">        mgf=padding.MGF1(algorithm=hashes.SHA1()),</div><div class="line">        algorithm=hashes.SHA1(),</div><div class="line">        label=<span class="keyword">None</span></div><div class="line">    )</div><div class="line">)</div><div class="line"></div><div class="line">plaintext</div><div class="line"><span class="comment"># b'encrypted data'</span></div></pre></td></tr></table></figure>
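<p>One practical limit worth knowing: RSA with OAEP padding can only encrypt short messages. For a modulus of k bytes and a hash with an hLen-byte digest, RFC 8017 bounds the message at k - 2*hLen - 2 bytes. The sketch below just evaluates that formula with the standard library; the helper name <code>oaep_max_message_bytes</code> is made up for the example:</p>

```python
import hashlib


def oaep_max_message_bytes(key_size_bits, hash_name):
    # k is the modulus length in bytes, h_len the digest length in bytes.
    k = key_size_bits // 8
    h_len = hashlib.new(hash_name).digest_size
    return k - 2 * h_len - 2


limit_sha1 = oaep_max_message_bytes(2048, "sha1")
limit_sha256 = oaep_max_message_bytes(2048, "sha256")
print(limit_sha1)    # 214
print(limit_sha256)  # 190
```

<p>Larger payloads are normally encrypted with a symmetric scheme such as Fernet, with RSA protecting only the symmetric key.</p>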
<p>As the examples show, most public- and private-key operations work well with one fixed set of parameters, so they can be wrapped one level further, which is how <a href="https://github.com/istommao/cryptokit/blob/master/cryptokit/rsa.py" target="_blank" rel="external">this project</a> came about.</p>
<h2 id="参考文档"><a href="#参考文档" class="headerlink" title="参考文档"></a>References</h2><ul>
<li><a href="https://cryptography.io/en/latest/" target="_blank" rel="external">Official cryptography documentation</a></li>
<li><a href="https://cryptography.io/en/latest/hazmat/primitives/asymmetric/rsa/" target="_blank" rel="external">cryptography RSA</a></li>
</ul>]]></content>
<summary type="html">
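<h2 id="cryptography简介"><a href="#cryptography简介" class="headerlink" title="cryptography简介"></a>Introduction to cryptography</h2><p>The cryptography module has two layers. The first is a set of high-level cryptographic recipes: you only need to know how to call the API, not how the encryption works internally, and this is the layer most code uses. The second is a set of low-level cryptographic primitives; building your own encryption scheme from these primitives is dangerous unless you understand cryptography well. This article covers the high-level symmetric-encryption API and the low-level asymmetric API for public keys, private keys, and certificates.</p>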
</summary>
<category term="Python模块学习" scheme="https://xin053.github.io/categories/Python%E6%A8%A1%E5%9D%97%E5%AD%A6%E4%B9%A0/"/>
<category term="Python" scheme="https://xin053.github.io/tags/Python/"/>
<category term="cryptography" scheme="https://xin053.github.io/tags/cryptography/"/>
</entry>
<entry>
<title>A Detailed Guide to the yagmail Email-Sending Library</title>
<link href="https://xin053.github.io/2016/12/17/yagmail%E9%82%AE%E4%BB%B6%E5%8F%91%E9%80%81%E5%BA%93%E4%BD%BF%E7%94%A8%E8%AF%A6%E8%A7%A3/"/>
<id>https://xin053.github.io/2016/12/17/yagmail邮件发送库使用详解/</id>
<published>2016-12-17T08:26:07.000Z</published>
<updated>2017-05-27T13:20:48.775Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="yagmail简介"><a href="#yagmail简介" class="headerlink" title="yagmail简介"></a>Introduction to yagmail</h2><p>Handling email with the Python standard library is fairly cumbersome, which is why yagmail exists. At present yagmail can only send mail over SMTP: it cannot read mail and supports no other mail protocol. For everyday use that is more than enough.</p>
<p><img src="https://github.com/kootenpv/yagmail/raw/master/resources/icon.png" style="zoom:35%"></p>
<a id="more"></a>
<h2 id="yagmail使用"><a href="#yagmail使用" class="headerlink" title="yagmail使用"></a>Using yagmail</h2><p>The first step is to create a client with <code>yagmail.SMTP()</code>. To keep your password out of the script file, yagmail uses the <a href="https://github.com/jaraco/keyring/" target="_blank" rel="external">keyring</a> module to store it in the system keyring service.</p>
<p>For background on what a keyring is, see <a href="https://askubuntu.com/questions/32164/what-does-a-keyring-do" target="_blank" rel="external">What does a Keyring do?</a></p>
<p>In the official documentation, the call</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">yagmail.register(<span class="string">'mygmailusername'</span>, <span class="string">'mygmailpassword'</span>)</div></pre></td></tr></table></figure>
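<p>is in fact a wrapper around <code>keyring.set_password('yagmail', 'mygmailusername', 'mygmailpassword')</code>.</p>
<p><code>SMTP()</code> reads a <code>.yagmail</code> file from the user's home directory, but the steps above do not create that file, so you have to create it yourself and write your email address into it.</p>
<p>For example, during my testing the <code>.yagmail</code> file contained:</p>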
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">[email protected]</div></pre></td></tr></table></figure>
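<p>Creating the file by hand works, but it can also be scripted. The following is a minimal sketch using only the standard library; the helper name <code>write_yagmail_file</code>, its <code>home</code> parameter, and the address <code>user@example.com</code> are hypothetical and exist only for illustration.</p>

```python
from pathlib import Path


def write_yagmail_file(address, home=None):
    """Write `address` to a `.yagmail` file and return the file's path.

    `home` is a hypothetical override (handy for experiments); by default
    the file goes into the current user's home directory.
    """
    home_dir = Path(home) if home is not None else Path.home()
    config_path = home_dir / ".yagmail"
    # The file holds nothing but the email address.
    config_path.write_text(address, encoding="utf-8")
    return config_path
```

<p>Calling <code>write_yagmail_file("user@example.com")</code> would then place the file where the article says <code>SMTP()</code> looks for it.</p>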
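<p>Because I had already saved that mailbox's password to the system keyring with <code>register()</code>, the next step is simply to initialize an SMTP client.</p>
<p>Two more things to note. In my testing, 163 Mail readily classified the messages as spam, which made sending fail, and QQ Mail requires turning off <a href="https://aq.qq.com/cn2/safe_service/device_lock" target="_blank" rel="external">mail protection</a>. I did not test other providers, so I recommend QQ Mail here.</p>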
<h3 id="常用邮箱SMTP服务器地址和端口"><a href="#常用邮箱SMTP服务器地址和端口" class="headerlink" title="常用邮箱SMTP服务器地址和端口"></a>SMTP server addresses and ports for common mail providers</h3><figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div><div class="line">60</div><div class="line">61</div><div class="line">62</div><div class="line">63</div><div class="line">64</div><div class="line">65</div><div class="line">66</div><div class="line">67</div><div class="line">68</div><div class="line">69</div><div class="line">70</div><div class="line">71</div><div class="line">72</div><div class="line">73</div><div class="line">74</div><div class="line">75</div><div class="line">76</div><div class="line">77</div><div class="line">78</div><div class="line">79</div></pre></td><td class="code"><pre><div class="line">sina.com:</div><div class="line">POP3 server: pop3.sina.com.cn (port 110)</div><div class="line">SMTP server: smtp.sina.com.cn (port 25)</div><div class="line"></div><div class="line">sinaVIP:</div><div class="line">POP3 server: pop3.vip.sina.com (port 110)</div><div class="line">SMTP server: smtp.vip.sina.com (port 25)</div><div class="line"></div><div class="line">sohu.com:</div><div class="line">POP3 server: pop3.sohu.com (port 110)</div><div class="line">SMTP server: smtp.sohu.com (port 25)</div><div class="line"></div><div class="line">126 Mail:</div><div class="line">POP3 server: pop.126.com (port 110)</div><div class="line">SMTP server: smtp.126.com (port 25)</div><div class="line"></div><div class="line">139 Mail:</div><div class="line">POP3 server: POP.139.com (port 110)</div><div class="line">SMTP server: SMTP.139.com (port 25)</div><div class="line"></div><div class="line">163.com:</div><div class="line">POP3 server: pop.163.com (port 110)</div><div class="line">SMTP server: smtp.163.com (port 25)</div><div class="line"></div><div class="line">QQ Mail:</div><div class="line">POP3 server: pop.qq.com (port 110)</div><div class="line">SMTP server: smtp.qq.com (port 25)</div><div class="line"></div><div class="line">QQ Enterprise Mail:</div><div class="line">POP3 server: pop.exmail.qq.com (SSL enabled, port 995)</div><div class="line">SMTP server: smtp.exmail.qq.com (SSL enabled, port 587/465)</div><div class="line"></div><div class="line">yahoo.com:</div><div class="line">POP3 server: pop.mail.yahoo.com</div><div class="line">SMTP server: smtp.mail.yahoo.com</div><div class="line"></div><div class="line">yahoo.com.cn:</div><div class="line">POP3 server: pop.mail.yahoo.com.cn (port 995)</div><div class="line">SMTP server: smtp.mail.yahoo.com.cn (port 587)</div><div class="line"></div><div class="line">HotMail:</div><div class="line">POP3 server: pop3.live.com (port 995)</div><div class="line">SMTP server: smtp.live.com (port 587)</div><div class="line"></div><div class="line">gmail (google.com):</div><div class="line">POP3 server: pop.gmail.com (SSL enabled, port 995)</div><div class="line">SMTP server: smtp.gmail.com (SSL enabled, port 587)</div><div class="line"></div><div class="line">263.net:</div><div class="line">POP3 server: pop3.263.net (port 110)</div><div class="line">SMTP server: smtp.263.net (port 25)</div><div class="line"></div><div class="line">263.net.cn:</div><div class="line">POP3 server: pop.263.net.cn (port 110)</div><div class="line">SMTP server: smtp.263.net.cn (port 25)</div><div class="line"></div><div class="line">x263.net:</div><div class="line">POP3 server: pop.x263.net (port 110)</div><div class="line">SMTP server: smtp.x263.net (port 25)</div><div class="line"></div><div class="line">21cn.com:</div><div class="line">POP3 server: pop.21cn.com (port 110)</div><div class="line">SMTP server: smtp.21cn.com (port 25)</div><div class="line"></div><div class="line">Foxmail:</div><div class="line">POP3 server: POP.foxmail.com (port 110)</div><div class="line">SMTP server: SMTP.foxmail.com (port 25)</div><div class="line"></div><div class="line">china.com:</div><div class="line">POP3 server: pop.china.com (port 110)</div><div class="line">SMTP server: smtp.china.com (port 25)</div><div class="line"></div><div class="line">tom.com:</div><div class="line">POP3 server: pop.tom.com (port 110)</div><div class="line">SMTP server: smtp.tom.com (port 25)</div><div class="line"></div><div class="line">etang.com:</div><div class="line">POP3 server: pop.etang.com</div><div class="line">SMTP server: smtp.etang.com</div></pre></td></tr></table></figure>
<p><code>yagmail.SMTP()</code> uses Gmail's SMTP service by default, so to use QQ Mail, initialize the SMTP client like this:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">yag = yagmail.SMTP(<span class="string">'[email protected]'</span>, host=<span class="string">'smtp.qq.com'</span>, port=<span class="string">'25'</span>)</div></pre></td></tr></table></figure>
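<p>If you switch between providers, the host and port can be looked up instead of typed each time. This is only a sketch: the <code>SMTP_SERVERS</code> dictionary and the <code>smtp_settings</code> helper are hypothetical names, and the entries are copied from the list above, so they should be checked against each provider's current documentation.</p>

```python
# Host/port pairs copied from the list above (an assumption: providers
# may have changed them since the list was compiled).
SMTP_SERVERS = {
    "qq": ("smtp.qq.com", 25),
    "163": ("smtp.163.com", 25),
    "126": ("smtp.126.com", 25),
    "gmail": ("smtp.gmail.com", 587),
}


def smtp_settings(provider):
    """Return the `host` and `port` keyword arguments for a provider."""
    try:
        host, port = SMTP_SERVERS[provider.lower()]
    except KeyError:
        raise ValueError("unknown provider: %r" % provider) from None
    return {"host": host, "port": port}
```

<p>The returned dictionary could then be unpacked into the call shown above, for example <code>yagmail.SMTP(address, **smtp_settings('qq'))</code>, where <code>address</code> stands for your own mailbox.</p>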
<p>Now you can send an email:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">yag.send(<span class="string">'[email protected]'</span>, <span class="string">'邮件主题'</span>, <span class="string">'这是邮件内容'</span>)</div></pre></td></tr></table></figure>
<p>With that, an email has been sent to <code>[email protected]</code>.</p>
<p>Note the signature of the <code>send()</code> method:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line"><span class="function"><span class="keyword">def</span> <span class="title">send</span><span class="params">(self, to=None, subject=None, contents=None, attachments=None, cc=None, bcc=None,preview_only=False, validate_email=True, throw_invalid_exception=False, headers=None)</span></span></div></pre></td></tr></table></figure>
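<p>If <code>to</code> is omitted, the email is sent to yourself. If <code>to</code> is a list, the email is sent to every address in the list. <code>attachments</code> specifies the attachments and can also be a list, in which case several files are attached.</p>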
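<p>The rule for <code>to</code> can be pictured with a few lines of plain Python. This is an illustrative sketch of the behaviour described here, not yagmail's actual code; <code>resolve_recipients</code> and its parameters are hypothetical.</p>

```python
def resolve_recipients(to, user):
    """Return the list of addresses a message would go to.

    `user` is the sender's own address; `to` may be None, a single
    address, or a list/tuple of addresses.
    """
    if to is None:
        return [user]      # no recipient given: send to yourself
    if isinstance(to, str):
        return [to]        # a single address
    return list(to)        # several addresses: everyone gets the mail
```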
<p>For the <code>contents</code> parameter, the official documentation says:</p>
<ul>
<li>If it is a dictionary it will assume the key is the content and the value is an alias (only for images currently!) e.g. <code>{'/path/to/image.png': 'MyPicture'}</code></li>
<li>It will try to see if the content (string) can be read as a file locally, e.g. <code>'/path/to/image.png'</code></li>
<li>if impossible, it will check if the string is valid html e.g. <code>&lt;h1&gt;This is a big title&lt;/h1&gt;</code></li>
<li>if not, it must be text. e.g. <code>'Hi Dorika!'</code></li>
</ul>
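<p>The order of those checks can be sketched as follows. This is a simplified illustration, not yagmail's implementation: <code>classify_content</code> is a hypothetical helper, and the HTML test is a crude regular expression standing in for whatever validation the library really performs.</p>

```python
import os
import re

# Crude stand-in for "looks like HTML": any opening tag such as <h1>.
_HTML_TAG = re.compile(r"<[a-zA-Z][^>]*>")


def classify_content(item):
    """Classify one `contents` item, checking in the documented order."""
    if isinstance(item, dict):
        return "image-with-alias"   # {path: alias}, images only
    if os.path.isfile(item):
        return "file"               # readable as a local file
    if _HTML_TAG.search(item):
        return "html"               # looks like an HTML fragment
    return "text"                   # anything else is plain text
```

<p>Because the file check comes before the HTML and text checks, a string that happens to name an existing file is treated as that file rather than as message text.</p>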
<h2 id="参考文档"><a href="#参考文档" class="headerlink" title="参考文档"></a>References</h2><ul>
<li><a href="https://github.com/kootenpv/yagmail#no-more-password-and-username" target="_blank" rel="external">Official yagmail documentation</a></li>
<li><a href="http://wenku.baidu.com/link?url=dzf8yMnLf6TwrW44kjjl364hD_qSkRsjtc3T9nUuxwjrzo6ohG-9RxJSES5YupoXuzYe2S4vYRCcTvCE8mwH_8EJEqZOslUxo_nxQmtqAXi" target="_blank" rel="external">Addresses and ports of common mail servers (SMTP, POP3)</a></li>
</ul>]]></content>
<summary type="html">
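<h2 id="yagmail简介"><a href="#yagmail简介" class="headerlink" title="yagmail简介"></a>Introduction to yagmail</h2><p>Handling email with the Python standard library is fairly cumbersome, which is why yagmail exists. At present yagmail can only send mail over SMTP: it cannot read mail and supports no other mail protocol. For everyday use that is more than enough.</p>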
<p><img src="https://github.com/kootenpv/yagmail/raw/master/resources/icon.png" style="zoom:35%"></p>
</summary>
<category term="Python模块学习" scheme="https://xin053.github.io/categories/Python%E6%A8%A1%E5%9D%97%E5%AD%A6%E4%B9%A0/"/>
<category term="Python" scheme="https://xin053.github.io/tags/Python/"/>
<category term="yagmail" scheme="https://xin053.github.io/tags/yagmail/"/>
</entry>
<entry>
<title>A Collection of Key Computer Science Questions</title>
<link href="https://xin053.github.io/2016/12/10/%E8%AE%A1%E7%AE%97%E6%9C%BA%E9%87%8D%E7%82%B9%E9%97%AE%E9%A2%98%E9%9B%86%E9%94%A6/"/>
<id>https://xin053.github.io/2016/12/10/计算机重点问题集锦/</id>
<published>2016-12-10T08:10:12.000Z</published>
<updated>2017-05-27T13:20:48.775Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="简介"><a href="#简介" class="headerlink" title="简介"></a>Introduction</h2><p>Key questions in the computing field that call for deep understanding. <strong>Continuously updated.</strong></p>
<a id="more"></a>
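<h2 id="阻塞非阻塞与同步异步以及并发并行的区别"><a href="#阻塞非阻塞与同步异步以及并发并行的区别" class="headerlink" title="阻塞非阻塞与同步异步以及并发并行的区别"></a>Blocking vs. non-blocking, synchronous vs. asynchronous, and concurrency vs. parallelism</h2><ul>
<li><a href="https://www.zhihu.com/question/19732473/answer/14413599" target="_blank" rel="external">How should the difference between blocking/non-blocking and synchronous/asynchronous be understood?</a></li>
<li><a href="http://blog.csdn.net/qq_24541459/article/details/51704918" target="_blank" rel="external">The difference between multithreading and asynchrony</a></li>
<li><a href="深入理解并发/并行,阻塞/非阻塞,同步/异步">A deeper look at concurrency/parallelism, blocking/non-blocking, and synchronous/asynchronous</a></li>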
</ul>]]></content>
<summary type="html">
<h2 id="简介"><a href="#简介" class="headerlink" title="简介"></a>Introduction</h2><p>Key questions in the computing field that call for deep understanding. <strong>Continuously updated.</strong></p>
</summary>
<category term="WeNeedToKnow" scheme="https://xin053.github.io/categories/WeNeedToKnow/"/>
<category term="集锦" scheme="https://xin053.github.io/tags/%E9%9B%86%E9%94%A6/"/>
</entry>
<entry>
<title>A Detailed Guide to the Scrapy Crawling Library</title>
<link href="https://xin053.github.io/2016/12/10/Scrapy%E7%88%AC%E8%99%AB%E5%BA%93%E4%BD%BF%E7%94%A8%E8%AF%A6%E8%A7%A3/"/>
<id>https://xin053.github.io/2016/12/10/Scrapy爬虫库使用详解/</id>
<published>2016-12-10T04:36:04.000Z</published>
<updated>2017-05-27T13:20:48.771Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="Scrapy简介"><a href="#Scrapy简介" class="headerlink" title="Scrapy简介"></a>Introduction to Scrapy</h2><p><img src="https://scrapy.org/img/scrapylogo.png" alt=""></p>
<p>Scrapy sends its requests asynchronously and filters out duplicate URLs by default. It can parse HTML and XML, export data in several formats, and has a powerful plugin system.</p>
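<p>The idea behind duplicate filtering can be shown with a small generator. This is a deliberately simplified sketch: <code>unique_urls</code> is a hypothetical helper that compares raw URL strings, whereas Scrapy's own filter works on request fingerprints and can be bypassed per request with <code>dont_filter=True</code>.</p>

```python
def unique_urls(urls):
    """Yield each URL the first time it appears and skip later repeats."""
    seen = set()
    for url in urls:
        if url in seen:
            continue        # already scheduled once, so drop it
        seen.add(url)
        yield url
```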
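<p>Scrapy (1.2.2) now supports Python 3, but the official documentation also notes that Python 3 is not supported on Windows, because Scrapy's core dependency <code>Twisted</code> does not yet support Python 3 on Windows. For that reason some people on Zhihu recommend Python 2.7, which also requires installing the <a href="https://www.microsoft.com/en-us/download/details.aspx?id=44266" target="_blank" rel="external">Visual C++ Compiler for Python 2.7</a> (it works on Windows 10 as well). However, according to the Python Developer's Guide, <a href="https://docs.python.org/devguide/#status-of-python-branches" target="_blank" rel="external">Python 2.7 will only be maintained until 2020</a>, and the future of Python is Python 3: practically all mainstream libraries already support Python 3, and many have started to drop Python 2. So I still want to use Python 3 here.</p>
<p>As for why Windows is not supported: Scrapy's dependencies <code>lxml</code> and <code>Twisted</code> cannot be compiled on Windows. We can, however, download prebuilt <code>whl</code> packages and install them with <code>pip</code>. For details, see this blog post: <a href="https://my.oschina.net/wangyuefive/blog/784171" target="_blank" rel="external">Installing Python 3.5 + Scrapy 1.2 on Windows</a></p>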
<a id="more"></a>
<h2 id="Scrapy使用"><a href="#Scrapy使用" class="headerlink" title="Scrapy使用"></a>Using Scrapy</h2><h3 id="创建项目"><a href="#创建项目" class="headerlink" title="创建项目"></a>Creating a project</h3><figure class="highlight powershell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">scrapy startproject test_scrapy</div></pre></td></tr></table></figure>
<p>This creates a <code>test_scrapy</code> folder in the current working directory with the following contents:</p>
<figure class="highlight plain"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div></pre></td><td class="code"><pre><div class="line">test_scrapy/</div><div class="line">    scrapy.cfg            # deploy configuration file</div><div class="line"></div><div class="line">    test_scrapy/          # project's Python module, you'll import your code from here</div><div class="line">        __init__.py</div><div class="line"></div><div class="line">        items.py          # project items definition file</div><div class="line"></div><div class="line">        middlewares.py    # Define here the models for your spider middleware</div><div class="line"></div><div class="line">        pipelines.py      # project pipelines file</div><div class="line"></div><div class="line">        settings.py       # project settings file</div><div class="line"></div><div class="line">        spiders/          # a directory where you'll later put your spiders</div><div class="line">            __init__.py</div></pre></td></tr></table></figure>
<h3 id="第一个爬虫"><a href="#第一个爬虫" class="headerlink" title="第一个爬虫"></a>Your first spider</h3><p>A spider class must subclass <code>scrapy.Spider</code> and define its initial requests, and its file should be placed in the <code>spiders</code> directory.</p>
<p>Create <code>quotes_spider.py</code> in the <code>spiders</code> directory:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> scrapy</div><div class="line"></div><div class="line"></div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">QuotesSpider</span><span class="params">(scrapy.Spider)</span>:</span></div><div class="line">    name = <span class="string">"quotes"</span></div><div class="line"></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">start_requests</span><span class="params">(self)</span>:</span></div><div class="line">        urls = [</div><div class="line">            <span class="string">'http://quotes.toscrape.com/page/1/'</span>,</div><div class="line">            <span class="string">'http://quotes.toscrape.com/page/2/'</span>,</div><div class="line">        ]</div><div class="line">        <span class="keyword">for</span> url <span class="keyword">in</span> urls:</div><div class="line">            <span class="keyword">yield</span> scrapy.Request(url=url, callback=self.parse)</div><div class="line"></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">parse</span><span class="params">(self, response)</span>:</span></div><div class="line">        page = response.url.split(<span class="string">"/"</span>)[<span class="number">-2</span>]</div><div class="line">        filename = <span class="string">'quotes-%s.html'</span> % page</div><div class="line">        <span class="keyword">with</span> open(filename, <span class="string">'wb'</span>) <span class="keyword">as</span> f:</div><div class="line">            f.write(response.body)</div><div class="line">        self.log(<span class="string">'Saved file %s'</span> % filename)</div></pre></td></tr></table></figure>
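<p><code>name</code> is the spider's name; it must be unique within a project.</p>
<p><code>start_requests()</code> must return an iterable of <code>Request</code> objects (a list of <code>Request</code> objects or a generator); these are the spider's initial requests. Scrapy offers a simpler way to get the same effect: define a <code>start_urls</code> list, which Scrapy automatically turns into a generator of <code>Request</code> objects behind the scenes, using the default callback <code>parse()</code>.</p>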
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> scrapy</div><div class="line"></div><div class="line"></div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">QuotesSpider</span><span class="params">(scrapy.Spider)</span>:</span></div><div class="line">    name = <span class="string">"quotes"</span></div><div class="line">    start_urls = [</div><div class="line">        <span class="string">'http://quotes.toscrape.com/page/1/'</span>,</div><div class="line">        <span class="string">'http://quotes.toscrape.com/page/2/'</span>,</div><div class="line">    ]</div><div class="line"></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">parse</span><span class="params">(self, response)</span>:</span></div><div class="line">        page = response.url.split(<span class="string">"/"</span>)[<span class="number">-2</span>]</div><div class="line">        filename = <span class="string">'quotes-%s.html'</span> % page</div><div class="line">        <span class="keyword">with</span> open(filename, <span class="string">'wb'</span>) <span class="keyword">as</span> f:</div><div class="line">            f.write(response.body)</div></pre></td></tr></table></figure>
<p><code>parse()</code> is the default callback. A <code>Request</code> can also specify which callback should handle its response.</p>
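<p>The file name built inside <code>parse()</code> comes from plain string handling, which can be tried without Scrapy. The function name <code>page_filename</code> below is hypothetical; the two statements in its body are the ones used in the spider.</p>

```python
def page_filename(url):
    """Derive the output file name the spider uses for a page URL."""
    # 'http://quotes.toscrape.com/page/1/' splits into
    # ['http:', '', 'quotes.toscrape.com', 'page', '1', ''],
    # so the second-to-last element is the page number.
    page = url.split("/")[-2]
    return "quotes-%s.html" % page
```

<p>Note that this relies on the URL ending with a slash; without it, <code>[-2]</code> would pick up <code>page</code> instead of the number.</p>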
<h3 id="运行爬虫"><a href="#运行爬虫" class="headerlink" title="运行爬虫"></a>Running the spider</h3><p>From the project's root directory, run:</p>
<figure class="highlight powershell"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">scrapy crawl quotes</div></pre></td></tr></table></figure>
<p><code>quotes</code> is the spider's name.</p>
<p>You will see output like the following:</p>
<figure class="highlight powershell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div></pre></td><td class="code"><pre><div class="line">...</div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span class="number">14</span>:<span class="number">39</span>:<span class="number">27</span> [scrapy] INFO: Spider opened</div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span class="number">14</span>:<span class="number">39</span>:<span class="number">27</span> [scrapy] INFO: Crawled <span class="number">0</span> pages (at <span class="number">0</span> pages/min), scraped <span class="number">0</span> items (at <span class="number">0</span> items/min)</div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span class="number">14</span>:<span class="number">39</span>:<span class="number">27</span> [scrapy] DEBUG: Telnet console listening on <span class="number">127.0</span>.<span class="number">0.1</span>:<span class="number">6023</span></div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span 
class="number">14</span>:<span class="number">39</span>:<span class="number">28</span> [scrapy] DEBUG: Crawled (<span class="number">404</span>) <GET http://quotes.toscrape.com/robots.txt> (referer: None)</div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span class="number">14</span>:<span class="number">39</span>:<span class="number">28</span> [scrapy] DEBUG: Crawled (<span class="number">200</span>) <GET http://quotes.toscrape.com/page/<span class="number">1</span>/> (referer: None)</div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span class="number">14</span>:<span class="number">39</span>:<span class="number">28</span> [quotes] DEBUG: Saved file quotes-<span class="number">1</span>.html</div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span class="number">14</span>:<span class="number">39</span>:<span class="number">29</span> [scrapy] DEBUG: Crawled (<span class="number">200</span>) <GET http://quotes.toscrape.com/page/<span class="number">2</span>/> (referer: None)</div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span class="number">14</span>:<span class="number">39</span>:<span class="number">29</span> [quotes] DEBUG: Saved file quotes-<span class="number">2</span>.html</div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span class="number">14</span>:<span class="number">39</span>:<span class="number">29</span> [scrapy] INFO: Closing spider (finished)</div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span class="number">14</span>:<span class="number">39</span>:<span class="number">29</span> [scrapy] INFO: Dumping Scrapy stats:</div><div 
class="line">{<span class="string">'downloader/request_bytes'</span>: <span class="number">675</span>,</div><div class="line"> <span class="string">'downloader/request_count'</span>: <span class="number">3</span>,</div><div class="line"> <span class="string">'downloader/request_method_count/GET'</span>: <span class="number">3</span>,</div><div class="line"> <span class="string">'downloader/response_bytes'</span>: <span class="number">5976</span>,</div><div class="line"> <span class="string">'downloader/response_count'</span>: <span class="number">3</span>,</div><div class="line"> <span class="string">'downloader/response_status_count/200'</span>: <span class="number">2</span>,</div><div class="line"> <span class="string">'downloader/response_status_count/404'</span>: <span class="number">1</span>,</div><div class="line"> <span class="string">'finish_reason'</span>: <span class="string">'finished'</span>,</div><div class="line"> <span class="string">'finish_time'</span>: datetime.datetime(<span class="number">2016</span>, <span class="number">12</span>, <span class="number">11</span>, <span class="number">6</span>, <span class="number">39</span>, <span class="number">29</span>, <span class="number">492581</span>),</div><div class="line"> <span class="string">'log_count/DEBUG'</span>: <span class="number">6</span>,</div><div class="line"> <span class="string">'log_count/INFO'</span>: <span class="number">7</span>,</div><div class="line"> <span class="string">'response_received_count'</span>: <span class="number">3</span>,</div><div class="line"> <span class="string">'scheduler/dequeued'</span>: <span class="number">2</span>,</div><div class="line"> <span class="string">'scheduler/dequeued/memory'</span>: <span class="number">2</span>,</div><div class="line"> <span class="string">'scheduler/enqueued'</span>: <span class="number">2</span>,</div><div class="line"> <span class="string">'scheduler/enqueued/memory'</span>: <span class="number">2</span>,</div><div 
class="line"> <span class="string">'start_time'</span>: datetime.datetime(<span class="number">2016</span>, <span class="number">12</span>, <span class="number">11</span>, <span class="number">6</span>, <span class="number">39</span>, <span class="number">27</span>, <span class="number">724826</span>)}</div><div class="line"><span class="number">2016</span>-<span class="number">12</span>-<span class="number">11</span> <span class="number">14</span>:<span class="number">39</span>:<span class="number">29</span> [scrapy] INFO: Spider closed (finished)</div></pre></td></tr></table></figure>
<p>The run also creates <code>quotes-1.html</code> and <code>quotes-2.html</code> in the root directory.</p>
<h3 id="解析网页"><a href="#解析网页" class="headerlink" title="解析网页"></a>Parsing pages</h3><p>HTML and XML are parsed with CSS selectors; Scrapy also supports <a href="http://www.w3school.com.cn/xpath/" target="_blank" rel="external">XPath expressions</a>.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div></pre></td><td class="code"><pre><div class="line"><span class="meta">>>> </span>response.css(<span class="string">'title'</span>)</div><div class="line">[<Selector xpath=<span class="string">'descendant-or-self::title'</span> data=<span class="string">'<title>Quotes to Scrape</title>'</span>>]</div><div class="line"></div><div class="line"><span class="meta">>>> </span>response.css(<span class="string">'title::text'</span>).extract()</div><div class="line">[<span class="string">'Quotes to Scrape'</span>]</div><div class="line"></div><div class="line"><span class="meta">>>> </span>response.css(<span class="string">'title'</span>).extract()</div><div class="line">[<span class="string">'<title>Quotes to Scrape</title>'</span>]</div><div class="line"></div><div class="line"><span class="meta">>>> </span>response.css(<span class="string">'li.next a'</span>).extract_first()</div><div class="line"><span class="string">'<a href="/page/2/">Next <span aria-hidden="true">→</span></a>'</span></div><div class="line"></div><div class="line"><span class="meta">>>> </span>response.css(<span class="string">'li.next a::attr(href)'</span>).extract_first()</div><div class="line"><span class="string">'/page/2/'</span></div></pre></td></tr></table></figure>
<p><code>response.css()</code> returns a list-like <code>SelectorList</code>. To extract only the first result, you can use either of the following:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="meta">>>> </span>response.css(<span class="string">'title::text'</span>).extract_first()</div><div class="line"><span class="string">'Quotes to Scrape'</span></div><div class="line"></div><div class="line"><span class="meta">>>> </span>response.css(<span class="string">'title::text'</span>)[<span class="number">0</span>].extract()</div><div class="line"><span class="string">'Quotes to Scrape'</span></div></pre></td></tr></table></figure>
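<p>The first form is recommended: if <code>response.css()</code> returns an empty list, the first form returns <code>None</code>, whereas the second raises an <code>IndexError</code>.</p>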
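<p>The difference is the same as between a guarded lookup and plain indexing on an ordinary list. The sketch below uses a hypothetical helper, <code>first_or_none</code>, to mimic what <code>extract_first()</code> does for an empty result; it is not Scrapy code.</p>

```python
def first_or_none(items):
    """Return the first element, or None when the sequence is empty."""
    return items[0] if items else None


results = []                        # what an unmatched selector gives back
safe = first_or_none(results)       # None, no exception

try:
    results[0]                      # plain indexing on an empty list
    indexing_failed = False
except IndexError:
    indexing_failed = True
```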
<p>Besides <code>extract()</code> and <code>extract_first()</code>, you can also use <code>re()</code> to extract data with regular expressions.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line"><span class="meta">>>> </span>response.css(<span class="string">'title::text'</span>).re(<span class="string">r'Quotes.*'</span>)</div><div class="line">[<span class="string">'Quotes to Scrape'</span>]</div><div class="line"><span class="meta">>>> </span>response.css(<span class="string">'title::text'</span>).re(<span class="string">r'Q\w+'</span>)</div><div class="line">[<span class="string">'Quotes'</span>]</div><div class="line"><span class="meta">>>> </span>response.css(<span class="string">'title::text'</span>).re(<span class="string">r'(\w+) to (\w+)'</span>)</div><div class="line">[<span class="string">'Quotes'</span>, <span class="string">'Scrape'</span>]</div></pre></td></tr></table></figure>
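<p>The three results above can be reproduced with the standard <code>re</code> module. This is an approximation for illustration, not the selector's implementation; <code>flat_findall</code> is a hypothetical helper that returns whole matches when the pattern has no groups and a flat list of groups when it does.</p>

```python
import re


def flat_findall(pattern, text):
    """Return matches as a flat list of strings."""
    results = []
    for match in re.finditer(pattern, text):
        groups = match.groups()
        if groups:
            results.extend(groups)          # one entry per capture group
        else:
            results.append(match.group(0))  # no groups: the whole match
    return results
```

<p>For example, <code>flat_findall(r'(\w+) to (\w+)', 'Quotes to Scrape')</code> gives <code>['Quotes', 'Scrape']</code>, matching the last line of the session above.</p>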
<h3 id="Following-links"><a href="#Following-links" class="headerlink" title="Following links"></a>Following links</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> scrapy</div><div class="line"></div><div class="line"></div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">QuotesSpider</span><span class="params">(scrapy.Spider)</span>:</span></div><div class="line"> name = <span class="string">"quotes"</span></div><div class="line"> start_urls = [</div><div class="line"> <span class="string">'http://quotes.toscrape.com/page/1/'</span>,</div><div class="line"> ]</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">parse</span><span class="params">(self, response)</span>:</span></div><div class="line"> <span class="keyword">for</span> quote <span class="keyword">in</span> response.css(<span class="string">'div.quote'</span>):</div><div class="line"> <span class="keyword">yield</span> {</div><div class="line"> <span class="string">'text'</span>: quote.css(<span class="string">'span.text::text'</span>).extract_first(),</div><div class="line"> <span class="string">'author'</span>: quote.css(<span class="string">'span small::text'</span>).extract_first(),</div><div class="line"> <span 
class="comment"># 'author': quote.xpath('span/small/text()').extract_first(),</span></div><div class="line"> <span class="string">'tags'</span>: quote.css(<span class="string">'div.tags a.tag::text'</span>).extract(),</div><div class="line"> }</div><div class="line"></div><div class="line"> next_page = response.css(<span class="string">'li.next a::attr(href)'</span>).extract_first()</div><div class="line"> <span class="keyword">if</span> next_page <span class="keyword">is</span> <span class="keyword">not</span> <span class="keyword">None</span>:</div><div class="line"> next_page = response.urljoin(next_page) <span class="comment"># urljoin()获取完整url地址</span></div><div class="line"> <span class="keyword">yield</span> scrapy.Request(next_page, callback=self.parse)</div></pre></td></tr></table></figure>
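<p>The spider above relies on <code>response.urljoin()</code> to turn a relative <code>href</code> into an absolute URL. A minimal sketch of the same resolution using the standard library's <code>urllib.parse.urljoin</code>; here <code>page_url</code> is an assumed stand-in for <code>response.url</code>, and the <code>'../2/'</code> link is made up for illustration:</p>

```python
from urllib.parse import urljoin

# Assumed stand-in for response.url in this sketch.
page_url = 'http://quotes.toscrape.com/page/1/'

# A root-relative href, like the one extracted with 'li.next a::attr(href)'.
next_page = urljoin(page_url, '/page/2/')

# A path-relative href is resolved against the directory of the current page.
sibling = urljoin(page_url, '../2/')

# An absolute URL is returned unchanged.
absolute = urljoin(page_url, 'http://example.com/other')

print(next_page, sibling, absolute)
```

<p>Resolving links this way means the spider does not need to care whether the site emits relative or absolute links.</p>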
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> scrapy</div><div class="line"></div><div class="line"></div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">AuthorSpider</span><span class="params">(scrapy.Spider)</span>:</span></div><div class="line"> name = <span class="string">'author'</span></div><div class="line"></div><div class="line"> start_urls = [<span class="string">'http://quotes.toscrape.com/'</span>]</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">parse</span><span class="params">(self, response)</span>:</span></div><div class="line"> <span class="comment"># follow links to author pages</span></div><div class="line"> <span class="keyword">for</span> href <span class="keyword">in</span> response.css(<span class="string">'.author+a::attr(href)'</span>).extract():</div><div class="line"> <span class="keyword">yield</span> scrapy.Request(response.urljoin(href),</div><div class="line"> callback=self.parse_author)</div><div class="line"></div><div class="line"> <span class="comment"># follow pagination 
links</span></div><div class="line"> next_page = response.css(<span class="string">'li.next a::attr(href)'</span>).extract_first()</div><div class="line"> <span class="keyword">if</span> next_page <span class="keyword">is</span> <span class="keyword">not</span> <span class="keyword">None</span>:</div><div class="line"> next_page = response.urljoin(next_page)</div><div class="line"> <span class="keyword">yield</span> scrapy.Request(next_page, callback=self.parse)</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">parse_author</span><span class="params">(self, response)</span>:</span></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">extract_with_css</span><span class="params">(query)</span>:</span></div><div class="line"> <span class="keyword">return</span> response.css(query).extract_first().strip()</div><div class="line"></div><div class="line"> <span class="keyword">yield</span> {</div><div class="line"> <span class="string">'name'</span>: extract_with_css(<span class="string">'h3.author-title::text'</span>),</div><div class="line"> <span class="string">'birthdate'</span>: extract_with_css(<span class="string">'.author-born-date::text'</span>),</div><div class="line"> <span class="string">'bio'</span>: extract_with_css(<span class="string">'.author-description::text'</span>),</div><div class="line"> }</div></pre></td></tr></table></figure>
<h3 id="命令行工具"><a href="#命令行工具" class="headerlink" title="命令行工具"></a>命令行工具</h3><figure class="highlight powershell"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div></pre></td><td class="code"><pre><div class="line">C:\WINDOWS\system32>scrapy</div><div class="line">Scrapy <span class="number">1.2</span>.<span class="number">2</span> - no active project</div><div class="line"></div><div class="line">Usage:</div><div class="line"> scrapy <command> [options] [args]</div><div class="line"></div><div class="line">Available commands:</div><div class="line"> bench Run quick benchmark test</div><div class="line"> commands</div><div class="line"> fetch Fetch a URL using the Scrapy downloader</div><div class="line"> genspider Generate new spider using pre-defined templates</div><div class="line"> runspider Run a self-contained spider (without creating a project)</div><div class="line"> settings Get settings values</div><div class="line"> shell Interactive scraping console</div><div class="line"> startproject Create new project</div><div class="line"> version Print Scrapy version</div><div class="line"> view Open URL <span class="keyword">in</span> browser, as seen by Scrapy</div><div class="line"></div><div class="line"> [ more ] More commands available when run from project directory</div><div class="line"></div><div class="line">Use <span class="string">"scrapy <command> -h"</span> to see more info about a command</div></pre></td></tr></table></figure>
<p>For more commands and details on how to use each one, see the <a href="https://doc.scrapy.org/en/latest/topics/commands.html#available-tool-commands" target="_blank" rel="external">official documentation</a>.</p>
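<h3 id="CrawlSpider"><a href="#CrawlSpider" class="headerlink" title="CrawlSpider"></a>CrawlSpider</h3><p>Besides subclassing <code>scrapy.Spider</code>, another commonly used base class is <code>scrapy.spiders.CrawlSpider</code>, which extends the former with <code>Rule</code> objects that describe which links to follow and how to handle them.</p>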
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> scrapy</div><div class="line"><span class="keyword">from</span> scrapy.spiders <span class="keyword">import</span> CrawlSpider, Rule</div><div class="line"><span class="keyword">from</span> scrapy.linkextractors <span class="keyword">import</span> LinkExtractor</div><div class="line"></div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">MySpider</span><span class="params">(CrawlSpider)</span>:</span></div><div class="line">    name = <span class="string">'example.com'</span></div><div class="line">    allowed_domains = [<span class="string">'example.com'</span>]</div><div class="line">    start_urls = [<span class="string">'http://www.example.com'</span>]</div><div class="line"></div><div class="line">    rules = (</div><div class="line">        <span class="comment"># Extract links matching 'category.php' (but not matching 'subsection.php')</span></div><div class="line">        <span class="comment"># and follow links from them (since no callback means follow=True by default).</span></div><div class="line">        Rule(LinkExtractor(allow=(<span class="string">r'category\.php'</span>, ), deny=(<span class="string">r'subsection\.php'</span>, ))),</div><div class="line"></div><div class="line">        <span class="comment"># Extract links matching 'item.php' and parse them with the spider's method parse_item</span></div><div class="line">        Rule(LinkExtractor(allow=(<span class="string">r'item\.php'</span>, )), callback=<span class="string">'parse_item'</span>),</div><div class="line">    )</div><div class="line"></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">parse_item</span><span class="params">(self, response)</span>:</span></div><div class="line">        self.logger.info(<span class="string">'Hi, this is an item page! %s'</span>, response.url)</div><div class="line">        item = {}  <span class="comment"># plain dict: a bare scrapy.Item() declares no fields and rejects these keys</span></div><div class="line">        item[<span class="string">'id'</span>] = response.xpath(<span class="string">'//td[@id="item_id"]/text()'</span>).re(<span class="string">r'ID: (\d+)'</span>)</div><div class="line">        item[<span class="string">'name'</span>] = response.xpath(<span class="string">'//td[@id="item_name"]/text()'</span>).extract()</div><div class="line">        item[<span class="string">'description'</span>] = response.xpath(<span class="string">'//td[@id="item_description"]/text()'</span>).extract()</div><div class="line">        <span class="keyword">return</span> item</div></pre></td></tr></table></figure>
<h3 id="SitemapSpider"><a href="#SitemapSpider" class="headerlink" title="SitemapSpider"></a>SitemapSpider</h3><p><code>scrapy.spiders.SitemapSpider</code> crawls a site by discovering its URLs from sitemaps, which it can also locate through <code>robots.txt</code>.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> scrapy.spiders <span class="keyword">import</span> SitemapSpider</div><div class="line"></div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">MySpider</span><span class="params">(SitemapSpider)</span>:</span></div><div class="line"> sitemap_urls = [<span class="string">'http://www.example.com/robots.txt'</span>]</div><div class="line"> sitemap_rules = [</div><div class="line"> (<span class="string">'/shop/'</span>, <span class="string">'parse_shop'</span>),</div><div class="line"> ]</div><div class="line"> sitemap_follow = [<span class="string">'/sitemap_shops'</span>]</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">parse_shop</span><span class="params">(self, response)</span>:</span></div><div class="line"> <span class="keyword">pass</span> <span class="comment"># ... scrape shop here ...</span></div></pre></td></tr></table></figure>
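<p>In the rules above, URLs containing <code>/shop/</code> are handled by the <code>parse_shop</code> callback, and <code>sitemap_follow</code> restricts the spider to following only sitemaps whose URLs contain <code>/sitemap_shops</code>.</p>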
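<p>Both settings are lists of regular expressions that are searched for inside each URL. The following stdlib-only sketch shows one way such matching can work, assuming the first matching rule wins; <code>pick_callback</code>, <code>should_follow</code> and the example URLs are hypothetical names made up for illustration, not part of Scrapy's API:</p>

```python
import re

# Same values as in the spider above.
sitemap_rules = [('/shop/', 'parse_shop')]
sitemap_follow = ['/sitemap_shops']


def pick_callback(url, rules, default=None):
    """Return the callback name of the first rule whose regex is found in url."""
    for pattern, callback in rules:
        if re.search(pattern, url):
            return callback
    return default


def should_follow(sitemap_url, follow):
    """Return True if any of the follow regexes is found in the sitemap URL."""
    return any(re.search(pattern, sitemap_url) for pattern in follow)


shop_callback = pick_callback('http://www.example.com/shop/42', sitemap_rules)
other_callback = pick_callback('http://www.example.com/about', sitemap_rules)
follow_shops = should_follow('http://www.example.com/sitemap_shops.xml', sitemap_follow)
follow_blog = should_follow('http://www.example.com/sitemap_blog.xml', sitemap_follow)
print(shop_callback, other_callback, follow_shops, follow_blog)
```

<p>Because the patterns are searched rather than matched from the start, <code>/shop/</code> applies to any URL that contains that path segment.</p>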
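<h3 id="Item"><a href="#Item" class="headerlink" title="Item"></a>Item</h3><p>Python's built-in <code>dict</code> has no fixed structure (any key can be set), so Scrapy provides the <code>Item</code> class for declaring the fields an item may contain.</p>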
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> scrapy</div><div class="line"></div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Product</span><span class="params">(scrapy.Item)</span>:</span></div><div class="line"> name = scrapy.Field()</div><div class="line"> price = scrapy.Field()</div><div class="line"> stock = scrapy.Field()</div><div class="line"> last_updated = scrapy.Field(serializer=str)</div></pre></td></tr></table></figure>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div></pre></td><td class="code"><pre><div class="line"><span class="meta">>>> </span>product = Product(name=<span class="string">'Desktop PC'</span>, price=<span class="number">1000</span>)</div><div class="line"><span class="meta">>>> </span><span class="keyword">print</span>(product)</div><div class="line">Product(name=<span class="string">'Desktop PC'</span>, price=<span class="number">1000</span>)</div><div class="line"><span class="meta">>>> </span>product[<span class="string">'name'</span>]</div><div class="line"><span class="string">'Desktop PC'</span></div><div class="line"><span class="meta">>>> </span>product.get(<span class="string">'name'</span>)</div><div class="line"><span class="string">'Desktop PC'</span></div><div class="line"><span class="meta">>>> </span>product[<span class="string">'price'</span>]</div><div class="line"><span class="number">1000</span></div><div class="line"></div><div class="line"><span class="meta">>>> </span>product.keys()</div><div class="line">[<span class="string">'price'</span>, <span class="string">'name'</span>]</div><div class="line"><span class="meta">>>> </span>product.items()</div><div class="line">[(<span class="string">'price'</span>, <span class="number">1000</span>), (<span class="string">'name'</span>, <span class="string">'Desktop PC'</span>)]</div></pre></td></tr></table></figure>
<p>An Item Loader provides a more convenient way to populate an <code>Item</code> with data extracted from a <code>response</code>:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> scrapy.loader <span class="keyword">import</span> ItemLoader</div><div class="line"><span class="keyword">from</span> myproject.items <span class="keyword">import</span> Product</div><div class="line"></div><div class="line"><span class="function"><span class="keyword">def</span> <span class="title">parse</span><span class="params">(self, response)</span>:</span></div><div class="line">    l = ItemLoader(item=Product(), response=response)</div><div class="line">    l.add_xpath(<span class="string">'name'</span>, <span class="string">'//div[@class="product_name"]'</span>)</div><div class="line">    l.add_xpath(<span class="string">'name'</span>, <span class="string">'//div[@class="product_title"]'</span>)</div><div class="line">    l.add_xpath(<span class="string">'price'</span>, <span class="string">'//p[@id="price"]'</span>)</div><div class="line">    l.add_css(<span class="string">'stock'</span>, <span class="string">'p#stock'</span>)</div><div class="line">    l.add_value(<span class="string">'last_updated'</span>, <span class="string">'today'</span>) <span class="comment"># you can also use literal values</span></div><div class="line">    <span class="keyword">return</span> l.load_item()</div></pre></td></tr></table></figure>
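<h3 id="Item-Pipeline"><a href="#Item-Pipeline" class="headerlink" title="Item Pipeline"></a>Item Pipeline</h3><p>After an <code>Item</code> has been scraped, it is sent to the item pipeline for processing. A pipeline is usually a plain class that only needs to implement <code>process_item</code>; it may also implement <code>open_spider()</code> (called when the spider is opened) and <code>close_spider()</code> (called when the spider is closed).</p>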
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> pymongo</div><div class="line"></div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">MongoPipeline</span><span class="params">(object)</span>:</span></div><div class="line"></div><div class="line"> collection_name = <span class="string">'scrapy_items'</span></div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, mongo_uri, mongo_db)</span>:</span></div><div class="line"> self.mongo_uri = mongo_uri</div><div class="line"> self.mongo_db = mongo_db</div><div class="line"></div><div class="line"><span class="meta"> @classmethod</span></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">from_crawler</span><span class="params">(cls, crawler)</span>:</span></div><div class="line"> <span class="keyword">return</span> cls(</div><div class="line"> mongo_uri=crawler.settings.get(<span class="string">'MONGO_URI'</span>),</div><div class="line"> mongo_db=crawler.settings.get(<span class="string">'MONGO_DATABASE'</span>, <span 
class="string">'items'</span>)</div><div class="line"> )</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">open_spider</span><span class="params">(self, spider)</span>:</span></div><div class="line"> self.client = pymongo.MongoClient(self.mongo_uri)</div><div class="line"> self.db = self.client[self.mongo_db]</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">close_spider</span><span class="params">(self, spider)</span>:</span></div><div class="line"> self.client.close()</div><div class="line"></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">process_item</span><span class="params">(self, item, spider)</span>:</span></div><div class="line"> self.db[self.collection_name].insert_one(dict(item))</div><div class="line"> <span class="keyword">return</span> item</div></pre></td></tr></table></figure>
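<p>The same three hooks can be tried out without a database. Below is a minimal sketch of a pipeline that writes each item as one JSON line to a file; the class name <code>JsonLinesPipeline</code>, its <code>path</code> argument and the temporary output file are assumptions made for this illustration, and the hooks are called by hand here instead of by Scrapy:</p>

```python
import json
import os
import tempfile


class JsonLinesPipeline(object):
    """Write every item as one JSON object per line (illustrative pipeline)."""

    def __init__(self, path):
        self.path = path

    def open_spider(self, spider):
        # Called once when the spider is opened: acquire resources here.
        self.file = open(self.path, 'w', encoding='utf-8')

    def close_spider(self, spider):
        # Called once when the spider is closed: release resources here.
        self.file.close()

    def process_item(self, item, spider):
        # Called for every scraped item; return it so later pipelines see it too.
        self.file.write(json.dumps(dict(item)) + '\n')
        return item


# Drive the pipeline manually; in a real crawl Scrapy makes these calls.
output_path = os.path.join(tempfile.mkdtemp(), 'items.jl')
pipeline = JsonLinesPipeline(output_path)
pipeline.open_spider(spider=None)
returned = pipeline.process_item({'name': 'Desktop PC', 'price': 1000}, spider=None)
pipeline.close_spider(spider=None)

with open(output_path, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(lines)
```

<p>In a real project the class would be enabled through the <code>ITEM_PIPELINES</code> setting, and a constructor argument such as <code>path</code> would typically be supplied from settings via <code>from_crawler</code>, as <code>MongoPipeline</code> does above.</p>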
<p>That covers the basics of Scrapy. For more topics, such as logging and sending email, see the <a href="https://doc.scrapy.org/en/latest/index.html" target="_blank" rel="external">official documentation</a>.</p>
<h2 id="参考文档"><a href="#参考文档" class="headerlink" title="参考文档"></a>References</h2><ul>
<li><a href="https://doc.scrapy.org/en/latest/index.html" target="_blank" rel="external">Official Scrapy documentation</a></li>
</ul>]]></content>
<summary type="html">
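<h2 id="Scrapy简介"><a href="#Scrapy简介" class="headerlink" title="Scrapy简介"></a>Introduction to Scrapy</h2><p><img src="https://scrapy.org/img/scrapylogo.png" alt=""></p>
<p>Scrapy issues requests asynchronously and filters out duplicate URLs by default. It can parse HTML and XML, export data in several formats, and has a powerful extension system.</p>
<p>Scrapy (1.2.2) currently supports Python 3, but the official documentation notes that Python 3 is not supported on Windows, because <code>Twisted</code>, a core dependency of Scrapy, does not yet support Python 3 on Windows. For that reason some people on Zhihu recommend using Python 2.7, which requires installing <a href="https://www.microsoft.com/en-us/download/details.aspx?id=44266" target="_blank" rel="external">Visual C++ Compiler for Python 2.7</a>, a package that also works on Windows 10. However, according to the Python Developer's Guide, <a href="https://docs.python.org/devguide/#status-of-python-branches" target="_blank" rel="external">Python 2.7 will only be maintained until 2020</a>, and Python's future lies with Python 3: most mainstream libraries already support Python 3, and many have begun dropping Python 2 support. So I still want to use Python 3 here.</p>
<p>The reason Windows is not supported is that Scrapy's dependencies <code>lxml</code> and <code>Twisted</code> cannot be compiled there. However, we can download prebuilt <code>whl</code> packages and install them with <code>pip</code>. For details, see this blog post: <a href="https://my.oschina.net/wangyuefive/blog/784171" target="_blank" rel="external">python 3.5 + scrapy1.2 windows下的安装</a> (installing Python 3.5 + Scrapy 1.2 on Windows).</p>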
</summary>
<category term="Python模块学习" scheme="https://xin053.github.io/categories/Python%E6%A8%A1%E5%9D%97%E5%AD%A6%E4%B9%A0/"/>
<category term="Python" scheme="https://xin053.github.io/tags/Python/"/>
<category term="Scrapy" scheme="https://xin053.github.io/tags/Scrapy/"/>
<category term="爬虫" scheme="https://xin053.github.io/tags/%E7%88%AC%E8%99%AB/"/>
</entry>
<entry>
<title>A Detailed Guide to the re Regular Expression Library</title>
<link href="https://xin053.github.io/2016/12/01/re%E6%AD%A3%E5%88%99%E5%BA%93%E4%BD%BF%E7%94%A8%E8%AF%A6%E8%A7%A3/"/>
<id>https://xin053.github.io/2016/12/01/re正则库使用详解/</id>
<published>2016-12-01T08:00:45.000Z</published>
<updated>2017-05-27T13:20:48.775Z</updated>
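<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="re简介"><a href="#re简介" class="headerlink" title="re简介"></a>Introduction to re</h2><p>The <code>re</code> module compiles regular expressions into bytecode, which a matching engine written in C then executes, so searching this way is faster than implementing the same search in pure Python code. Note, however, that the same content can often be matched by several different regular expressions, and their performance can differ.</p>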
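<p>As a small sketch of both points: a pattern can be compiled once with <code>re.compile</code> and reused, and two differently written patterns can return identical matches for the same input. The sample strings are made up for illustration, and which pattern runs faster depends on the input, so it is worth measuring (for example with <code>timeit</code>) rather than guessing:</p>

```python
import re

# Compile once, then reuse the compiled pattern object.
word_re = re.compile(r'\w+')
words = word_re.findall('Quotes to Scrape')

# Two different patterns that match the same content in this input.
negated_set_tag = re.compile(r'<[^>]*>')
lazy_tag = re.compile(r'<.*?>')
html = '<a> b <c>'
tags = negated_set_tag.findall(html)
same = tags == lazy_tag.findall(html)

print(words, tags, same)
```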
<h2 id="特殊符号"><a href="#特殊符号" class="headerlink" title="特殊符号"></a>Special characters</h2><figure class="highlight"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">. ^ $ * + ? { } [ ] \ | ( )</div></pre></td></tr></table></figure>
<p>To match any of these special characters literally, escape it with <code>\</code>.</p>
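<p>A short sketch of escaping in practice, using a made-up sample string. Besides writing the backslashes by hand, the standard library's <code>re.escape()</code> escapes every special character in an arbitrary string, which is handy when the text to search for comes from user input:</p>

```python
import re

text = 'price: $5.00 (approx.)'

# Escaped, '.' and '$' match themselves instead of acting as special characters.
literal_dots = re.findall(r'\.', text)
literal_dollar = re.findall(r'\$\d+', text)

# re.escape() builds a pattern that matches the given string literally.
needle = '(approx.)'
found = re.search(re.escape(needle), text).group()

print(literal_dots, literal_dollar, found)
```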
<a id="more"></a>
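<h3 id=""><a href="#" class="headerlink" title="."></a><code>.</code></h3><p>Matches any character except a newline; if the <code>DOTALL</code> flag is specified, it matches any character, including a newline. Note that <code>.</code> always matches exactly one character.</p>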
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> re</div><div class="line">re.findall(<span class="string">r'.'</span>, <span class="string">'\r\nabc'</span>)</div><div class="line"><span class="comment"># ['\r', 'a', 'b', 'c']</span></div><div class="line">re.findall(<span class="string">r'.'</span>, <span class="string">'\r\nabc'</span>, flags=re.DOTALL)</div><div class="line"><span class="comment"># ['\r', '\n', 'a', 'b', 'c']</span></div></pre></td></tr></table></figure>
<h3 id="-1"><a href="#-1" class="headerlink" title="^"></a><code>^</code></h3><p>Matches the start of the string; when the <code>MULTILINE</code> flag is specified, it also matches at the start of each line.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'ab.'</span>, <span class="string">'abcdefabhy'</span>)</div><div class="line"><span class="comment"># ['abc', 'abh']</span></div><div class="line">re.findall(<span class="string">r'^ab.'</span>, <span class="string">'abcdefabhy'</span>)</div><div class="line"><span class="comment"># ['abc']</span></div><div class="line">re.findall(<span class="string">r'^ab.'</span>,</div><div class="line"> <span class="string">'''abcd</span></div><div class="line"> abcd</div><div class="line"> acd</div><div class="line"> abcd''')</div><div class="line"><span class="comment"># ['abc']</span></div><div class="line">re.findall(<span class="string">r'^ab.'</span>,</div><div class="line"> <span class="string">'''abcd</span></div><div class="line"> abcd</div><div class="line"> acd</div><div class="line"> abcd''', flags=re.MULTILINE)</div><div class="line"><span class="comment"># ['abc', 'abc', 'abc']</span></div></pre></td></tr></table></figure>
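<h3 id="-2"><a href="#-2" class="headerlink" title="$"></a><code>$</code></h3><p>Matches the end of the string, or just before a newline at the end of the string; when the <code>MULTILINE</code> flag is specified, it also matches at the end of each line (just before each newline).</p>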
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'.ab$'</span>, <span class="string">'aabcbab'</span>)</div><div class="line"><span class="comment"># ['bab']</span></div><div class="line">re.findall(<span class="string">r'ab.$'</span>, <span class="string">'aabcbab'</span>)</div><div class="line"><span class="comment"># []</span></div><div class="line">re.findall(<span class="string">r'ab.$'</span>, <span class="string">'aabcbab1\n'</span>) <span class="comment"># note: a trailing newline is not the end; $ matches just before it</span></div><div class="line"><span class="comment"># ['ab1']</span></div></pre></td></tr></table></figure>
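<p>The effect of <code>MULTILINE</code> on <code>$</code> can be seen in the following sketch, which uses a made-up three-line string. For contrast it also uses <code>\Z</code>, which matches only at the absolute end of the string:</p>

```python
import re

text = 'one ab1\ntwo ab2\nthree ab3\n'

# Without MULTILINE, '$' matches only at the end (just before the trailing newline).
end_only = re.findall(r'ab.$', text)

# With MULTILINE, '$' also matches just before every newline.
every_line = re.findall(r'ab.$', text, flags=re.MULTILINE)

# '\Z' matches only at the very end, so the trailing newline prevents a match.
strict_end = re.findall(r'ab.\Z', text)

print(end_only, every_line, strict_end)
```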
<h3 id="-3"><a href="#-3" class="headerlink" title="*"></a><code>*</code></h3><p><code>*</code> matches zero or more repetitions of the preceding character or subexpression.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'ab*c'</span>, <span class="string">'ac.abc.abbbbc'</span>)</div><div class="line"><span class="comment"># ['ac', 'abc', 'abbbbc']</span></div></pre></td></tr></table></figure>
<h3 id="-4"><a href="#-4" class="headerlink" title="+"></a><code>+</code></h3><p><code>+</code> matches one or more repetitions of the preceding character or subexpression.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'ab+c'</span>, <span class="string">'ac.abc.abbbbc'</span>)</div><div class="line"><span class="comment"># ['abc', 'abbbbc']</span></div></pre></td></tr></table></figure>
<h3 id="-5"><a href="#-5" class="headerlink" title="?"></a><code>?</code></h3><p><code>?</code> matches zero or one occurrence of the preceding character or subexpression.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'ab?c'</span>, <span class="string">'ac.abc.abbbbc'</span>)</div><div class="line"><span class="comment"># ['ac', 'abc']</span></div></pre></td></tr></table></figure>
<h3 id="-6"><a href="#-6" class="headerlink" title="*? +? ??"></a><code>*?</code> <code>+?</code> <code>??</code></h3><p>The <code>*</code>, <code>+</code> and <code>?</code> quantifiers are all greedy: they match as much text as possible.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'<.*>'</span>, <span class="string">'<a> b <c>'</span>)</div><div class="line"><span class="comment"># ['<a> b <c>']</span></div></pre></td></tr></table></figure>
<p>Appending <code>?</code> to any of these quantifiers makes it non-greedy, so it matches as little text as possible:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'<.*?>'</span>, <span class="string">'<a> b <c>'</span>)</div><div class="line"><span class="comment"># ['<a>', '<c>']</span></div></pre></td></tr></table></figure>
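<p>The example above covers <code>*?</code>. A similar sketch for <code>+?</code> and <code>??</code>, using made-up input strings, shows that they also stop at the shortest possible match:</p>

```python
import re

# '+' takes all three characters at once; '+?' takes one character per match.
greedy_plus = re.findall(r'a+', 'aaa')
lazy_plus = re.findall(r'a+?', 'aaa')

# '?' prefers to include the optional 'b'; '??' prefers to leave it out.
greedy_optional = re.search(r'ab?', 'ab').group()
lazy_optional = re.search(r'ab??', 'ab').group()

print(greedy_plus, lazy_plus, greedy_optional, lazy_optional)
```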
<h3 id="m"><a href="#m" class="headerlink" title="{m}"></a><code>{m}</code></h3><p><code>{m}</code> matches exactly m repetitions of the preceding character or subexpression.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'a{3}b'</span>, <span class="string">'aabaaabaaaab'</span>)</div><div class="line"><span class="comment"># ['aaab', 'aaab']</span></div></pre></td></tr></table></figure>
<h3 id="m-n"><a href="#m-n" class="headerlink" title="{m,n}"></a><code>{m,n}</code></h3><p><code>{m,n}</code> matches from m to n repetitions of the preceding character or subexpression. Note: there must be no space after the <code>,</code>.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'a{2,3}b'</span>, <span class="string">'aabaaabaaaab'</span>)</div><div class="line"><span class="comment"># ['aab', 'aaab', 'aaab']</span></div></pre></td></tr></table></figure>
<p>Omitting m means the lower bound is zero; omitting n means there is no upper bound.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'a{,3}b'</span>, <span class="string">'babaabaaabaaaab'</span>)</div><div class="line"><span class="comment"># ['b', 'ab', 'aab', 'aaab', 'aaab']</span></div><div class="line">re.findall(<span class="string">r'a{2,}b'</span>, <span class="string">'babaabaaabaaaab'</span>)</div><div class="line"><span class="comment"># ['aab', 'aaab', 'aaaab']</span></div></pre></td></tr></table></figure>
<h3 id="m-n-1"><a href="#m-n-1" class="headerlink" title="{m,n}?"></a><code>{m,n}?</code></h3><p><code>{m,n}</code> is greedy and matches as many repetitions as possible; appending <code>?</code> makes it match as few as possible.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'a{2,4}'</span>, <span class="string">'aaaa'</span>)</div><div class="line"><span class="comment"># ['aaaa']</span></div><div class="line">re.findall(<span class="string">r'a{2,4}?'</span>, <span class="string">'aaaa'</span>)</div><div class="line"><span class="comment"># ['aa', 'aa']</span></div></pre></td></tr></table></figure>
<h3 id="-7"><a href="#-7" class="headerlink" title="[]"></a><code>[]</code></h3><p><code>[]</code> specifies a set of characters.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'[a-z]'</span>, <span class="string">'adfzADFZ059'</span>)</div><div class="line"><span class="comment"># ['a', 'd', 'f', 'z']</span></div><div class="line">re.findall(<span class="string">r'[a-zA-Z0-9]'</span>, <span class="string">'adfzADFZ059'</span>)</div><div class="line"><span class="comment"># ['a', 'd', 'f', 'z', 'A', 'D', 'F', 'Z', '0', '5', '9']</span></div></pre></td></tr></table></figure>
<p>Most special characters lose their special meaning inside <code>[]</code>; the remaining ones, such as <code>\</code>, <code>]</code>, and a leading <code>^</code>, still need to be escaped:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'[.$*+?{}|()]'</span>, <span class="string">r'.^$*+?{}[]\|()'</span>)</div><div class="line"><span class="comment"># ['.', '$', '*', '+', '?', '{', '}', '|', '(', ')']</span></div></pre></td></tr></table></figure>
<p>A leading <code>^</code> inside <code>[]</code> negates the set; <code>[^^]</code> matches every character except <code>^</code>:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'[^5]'</span>, <span class="string">'1359'</span>)</div><div class="line"><span class="comment"># ['1', '3', '9']</span></div><div class="line">re.findall(<span class="string">r'[^^]'</span>, <span class="string">'1359^'</span>)</div><div class="line"><span class="comment"># ['1', '3', '5', '9']</span></div></pre></td></tr></table></figure>
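<h3 id="-8"><a href="#-8" class="headerlink" title="|"></a><code>|</code></h3><p><code>|</code> means "or". Alternatives are tried from left to right and the first one that matches wins (short-circuit behavior). Inside <code>[]</code>, <code>|</code> is an ordinary character.</p>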
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'a|bc'</span>, <span class="string">'acbcabc'</span>)</div><div class="line"><span class="comment"># ['a', 'bc', 'a', 'bc']</span></div><div class="line">re.findall(<span class="string">r'[a|b]c'</span>, <span class="string">'acbcabc'</span>)</div><div class="line"><span class="comment"># ['ac', 'bc', 'bc']</span></div></pre></td></tr></table></figure>
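<h3 id="-9"><a href="#-9" class="headerlink" title="(...)"></a><code>(...)</code></h3><p>Matches whatever regular expression is inside the parentheses and marks the start and end of a group. The contents of a group can be retrieved after the match. To match a literal <code>(</code> or <code>)</code>, escape it with a backslash or use <code>[(]</code> and <code>[)]</code>.</p>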
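<p>As a minimal sketch of how group contents are retrieved and how literal parentheses are matched (the sample strings are made up for illustration and are not from the original post):</p>

```python
import re

# Hypothetical sample text, chosen only for illustration.
m = re.search(r'(\d{4})-(\d{2})', 'released 2016-11')
whole = m.group(0)   # the entire match
year = m.group(1)    # contents of the first group
month = m.group(2)   # contents of the second group

# Two ways to match literal parentheses.
escaped = re.findall(r'\(\w+\)', 'f(x) and g(y)')
in_class = re.findall(r'[(]\w+[)]', 'f(x) and g(y)')
```

<p>Both spellings of the literal parenthesis find the same substrings; the backslash form is the more common one.</p>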
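<h3 id="aiLmsux"><a href="#aiLmsux" class="headerlink" title="(?aiLmsux)"></a><code>(?aiLmsux)</code></h3><p>One or more letters from <code>a</code>, <code>i</code>, <code>L</code>, <code>m</code>, <code>s</code>, <code>u</code>, <code>x</code>. The group matches no characters; it only sets the corresponding flags: <code>re.A</code> (ASCII-only matching), <code>re.I</code> (ignore case), <code>re.L</code> (locale dependent), <code>re.M</code> (multi-line mode), <code>re.S</code> (<code>.</code> matches any character, including a newline), <code>re.U</code> (Unicode matching), <code>re.X</code> (verbose mode).</p>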
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'(?i)ab'</span>, <span class="string">'abABAbaB'</span>)</div><div class="line"><span class="comment"># ['ab', 'AB', 'Ab', 'aB']</span></div></pre></td></tr></table></figure>
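<p>The inline flag should behave the same as passing the flag as an argument. A small sketch comparing the three spellings on the same input:</p>

```python
import re

text = 'abABAbaB'

inline = re.findall(r'(?i)ab', text)                        # flag inside the pattern
with_flag = re.findall(r'ab', text, flags=re.I)             # flag as an argument
compiled = re.compile(r'ab', re.IGNORECASE).findall(text)   # flag given at compile time
```

<p>The inline form is handy when the pattern is the only thing you can pass, for example when it comes from a configuration file.</p>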
<h3 id="P-lt-name-gt"><a href="#P-lt-name-gt" class="headerlink" title="(?P<name>...)"></a><code>(?P<name>...)</code></h3><p>Similar to regular parentheses, but the substring matched by the group can also be retrieved by the group name <code>name</code>. The name must be a valid Python identifier and must be unique within the regular expression. A named group is still numbered like an ordinary group, so the name is just an additional way to refer to it.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">m = re.match(<span class="string">r'(?P<name>\w+)'</span>, <span class="string">'zzx:22'</span>)</div><div class="line">m.group(<span class="string">'name'</span>)</div><div class="line"><span class="comment"># 'zzx'</span></div><div class="line">m.group(<span class="number">1</span>)</div><div class="line"><span class="comment"># 'zzx'</span></div></pre></td></tr></table></figure>
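<p>A short sketch of two related features of named groups: <code>groupdict()</code>, which returns all named groups at once, and <code>(?P=name)</code>, which refers back to a named group inside the pattern. The group names <code>user</code>, <code>age</code> and <code>ch</code> are hypothetical and chosen only for this example.</p>

```python
import re

m = re.match(r'(?P<user>\w+):(?P<age>\d+)', 'zzx:22')
by_name = m.groupdict()                      # all named groups as a dict
same = m.group('age') == m.group(2)          # name and number refer to the same group

# (?P=ch) matches whatever the group named "ch" matched earlier.
doubled = re.search(r'(?P<ch>\w)(?P=ch)', 'hello')
```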
<h2 id="special-sequences"><a href="#special-sequences" class="headerlink" title="special sequences"></a>special sequences</h2><h3 id="number"><a href="#number" class="headerlink" title="\number"></a><code>\number</code></h3><p>Matches the contents of the earlier group with that number.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.match(<span class="string">r'(.+) \1 (abc) \2'</span>, <span class="string">'55 55 abc abc'</span>)</div><div class="line"><span class="comment"># <_sre.SRE_Match object; span=(0, 13), match='55 55 abc abc'></span></div></pre></td></tr></table></figure>
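<p>A common use of <code>\number</code> is finding accidentally repeated words. The sketch below assumes words are separated by a single space; the sample sentence is made up.</p>

```python
import re

text = 'this is is a test test'

# \1 must match exactly what group 1 matched, so this finds doubled words.
repeated = re.findall(r'\b(\w+) \1\b', text)

# In the replacement string, \1 inserts the contents of group 1.
deduped = re.sub(r'\b(\w+) \1\b', r'\1', text)
```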
<h3 id="A"><a href="#A" class="headerlink" title="\A"></a><code>\A</code></h3><p>Matches only at the start of the string.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'\Aabc'</span>, <span class="string">'abcabc'</span>)</div><div class="line"><span class="comment"># ['abc']</span></div></pre></td></tr></table></figure>
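<h3 id="b"><a href="#b" class="headerlink" title="\b"></a><code>\b</code></h3><p>Matches the empty string at a word boundary, that is, at the start or end of a word. A word is a sequence of word characters (<code>\w</code>); the whitespace or punctuation next to it is not consumed.</p>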
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'\babc\b'</span>, <span class="string">'abc.'</span>)</div><div class="line"><span class="comment"># ['abc']</span></div><div class="line">re.findall(<span class="string">r'\babc\b'</span>, <span class="string">'abc!'</span>)</div><div class="line"><span class="comment"># ['abc']</span></div><div class="line">re.findall(<span class="string">r'\babc\b'</span>, <span class="string">'abca'</span>)</div><div class="line"><span class="comment"># []</span></div></pre></td></tr></table></figure>
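<p>To illustrate that <code>\b</code> is zero-width, the following sketch looks at the span of a match and at the positions where <code>\b</code> itself matches. The input strings are only examples.</p>

```python
import re

# The '!' after the word is not part of the match.
m = re.search(r'\babc\b', 'abc!')
span = m.span()

# Every match of \b alone is empty; only its position is interesting.
boundaries = [b.start() for b in re.finditer(r'\b', 'abc def')]
all_empty = all(b.group() == '' for b in re.finditer(r'\b', 'abc def'))
```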
<h3 id="B"><a href="#B" class="headerlink" title="\B"></a><code>\B</code></h3><p>The opposite of <code>\b</code>: matches the empty string only where there is no word boundary.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'py\B'</span>, <span class="string">'python'</span>)</div><div class="line"><span class="comment"># ['py']</span></div><div class="line">re.findall(<span class="string">r'py\B'</span>, <span class="string">'py.'</span>)</div><div class="line"><span class="comment"># []</span></div></pre></td></tr></table></figure>
<h3 id="s"><a href="#s" class="headerlink" title="\s"></a><code>\s</code></h3><p>Matches whitespace characters, including <code>[ \t\n\r\f\v]</code>.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'aa\s+bb'</span>, <span class="string">'aa \n\t bb'</span>)</div><div class="line"><span class="comment"># ['aa \n\t bb']</span></div></pre></td></tr></table></figure>
<h3 id="S"><a href="#S" class="headerlink" title="\S"></a><code>\S</code></h3><p>The opposite of <code>\s</code>: matches any non-whitespace character.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'aa\S+bb'</span>, <span class="string">'aahg.!bb'</span>)</div><div class="line"><span class="comment"># ['aahg.!bb']</span></div><div class="line">re.findall(<span class="string">r'aa\S+bb'</span>, <span class="string">'aa bb'</span>)</div><div class="line"><span class="comment"># []</span></div></pre></td></tr></table></figure>
<h3 id="w"><a href="#w" class="headerlink" title="\w"></a><code>\w</code></h3><p>Matches word characters: letters, digits, and the underscore.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'\w+'</span>, <span class="string">'aa3bb 45AS'</span>)</div><div class="line"><span class="comment"># ['aa3bb', '45AS']</span></div></pre></td></tr></table></figure>
<h3 id="W"><a href="#W" class="headerlink" title="\W"></a><code>\W</code></h3><p>The opposite of <code>\w</code>: matches any non-word character.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'\W+'</span>, <span class="string">'aa3bb .! 45AS'</span>)</div><div class="line"><span class="comment"># [' .! ']</span></div></pre></td></tr></table></figure>
<h3 id="Z"><a href="#Z" class="headerlink" title="\Z"></a><code>\Z</code></h3><p>Matches only at the end of the string.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">re.findall(<span class="string">r'ab\Z'</span>, <span class="string">'abab'</span>)</div><div class="line"><span class="comment"># ['ab']</span></div></pre></td></tr></table></figure>
<h2 id="re模块方法"><a href="#re模块方法" class="headerlink" title="re模块方法"></a><code>re</code> module functions</h2><h3 id="re-compile-pattern-flags-0"><a href="#re-compile-pattern-flags-0" class="headerlink" title="re.compile(pattern, flags=0)"></a><code>re.compile(pattern, flags=0)</code></h3><p>Compiles a regular expression pattern into a regular expression object, which can then be used to match against strings.</p>
<h3 id="re-search-pattern-string-flags-0"><a href="#re-search-pattern-string-flags-0" class="headerlink" title="re.search(pattern, string, flags=0)"></a><code>re.search(pattern, string, flags=0)</code></h3><p>Scans through the string and returns a match object for the first location where the pattern matches, or <code>None</code> if there is no match.</p>
<h3 id="re-match-pattern-string-flags-0"><a href="#re-match-pattern-string-flags-0" class="headerlink" title="re.match(pattern, string, flags=0)"></a><code>re.match(pattern, string, flags=0)</code></h3><p>Matches only at the beginning of the string: returns a match object if zero or more characters at the start match the pattern, otherwise <code>None</code>.</p>
<h3 id="re-fullmatch-pattern-string-flags-0"><a href="#re-fullmatch-pattern-string-flags-0" class="headerlink" title="re.fullmatch(pattern, string, flags=0)"></a><code>re.fullmatch(pattern, string, flags=0)</code></h3><p>Returns a match object only if the whole string matches the pattern.</p>
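<p>The difference between <code>search</code>, <code>match</code> and <code>fullmatch</code> is easiest to see side by side. A minimal sketch with made-up inputs:</p>

```python
import re

# search: the match may start anywhere in the string.
found_anywhere = re.search(r'\d+', 'abc123')

# match: the match must start at position 0.
not_at_start = re.match(r'\d+', 'abc123')
at_start = re.match(r'\d+', '123abc')

# fullmatch: the pattern must cover the whole string.
partial = re.fullmatch(r'\d+', '123abc')
complete = re.fullmatch(r'\d+', '123')
```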
<h3 id="re-split-pattern-string-maxsplit-0-flags-0"><a href="#re-split-pattern-string-maxsplit-0-flags-0" class="headerlink" title="re.split(pattern, string, maxsplit=0, flags=0)"></a><code>re.split(pattern, string, maxsplit=0, flags=0)</code></h3><p>Splits the string by the occurrences of the pattern.</p>
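<p>A short sketch of <code>re.split</code>, including <code>maxsplit</code> and the effect of a capturing group in the separator pattern (sample strings are illustrative):</p>

```python
import re

# Split on a comma or semicolon followed by optional whitespace.
parts = re.split(r'[,;]\s*', 'a, b;c')

# maxsplit limits the number of splits; the rest stays in the last item.
first_only = re.split(r'[,;]\s*', 'a, b;c', maxsplit=1)

# With a capturing group, the separators are kept in the result.
with_seps = re.split(r'([,;])', 'a,b;c')
```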
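<h3 id="re-findall-pattern-string-flags-0"><a href="#re-findall-pattern-string-flags-0" class="headerlink" title="re.findall(pattern, string, flags=0)"></a><code>re.findall(pattern, string, flags=0)</code></h3><p>Returns all non-overlapping matches as a list. If the pattern contains one group, the list holds that group's text; if it contains several groups, the list holds tuples of the groups.</p>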
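<p>How the return value of <code>findall</code> changes with the number of groups, sketched on a made-up string:</p>

```python
import re

text = 'a=1 b=2'

no_group = re.findall(r'\w+=\d+', text)        # whole matches
one_group = re.findall(r'\w+=(\d+)', text)     # only the group's text
two_groups = re.findall(r'(\w+)=(\d+)', text)  # one tuple per match
```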
<h3 id="re-finditer-pattern-string-flags-0"><a href="#re-finditer-pattern-string-flags-0" class="headerlink" title="re.finditer(pattern, string, flags=0)"></a><code>re.finditer(pattern, string, flags=0)</code></h3><p>Returns an iterator that yields a match object for each non-overlapping match.</p>
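<p>Unlike <code>findall</code>, <code>finditer</code> gives access to full match objects, so positions are available too. A minimal sketch:</p>

```python
import re

# Collect the matched text together with its (start, end) position.
hits = [(m.group(), m.span()) for m in re.finditer(r'\d+', 'a1 b22')]
```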
<h3 id="re-sub-pattern-repl-string-count-0-flags-0"><a href="#re-sub-pattern-repl-string-count-0-flags-0" class="headerlink" title="re.sub(pattern, repl, string, count=0, flags=0)"></a><code>re.sub(pattern, repl, string, count=0, flags=0)</code></h3><p>Returns the string obtained by replacing matches of the pattern with <code>repl</code>.</p>
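<p><code>repl</code> can be a string (optionally with group references) or a function that receives the match object. A brief sketch with illustrative inputs:</p>

```python
import re

# Group references in the replacement string.
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', 'user@host')

# A function as repl: it receives each match object and returns the new text.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b22')

# count limits how many matches are replaced.
first_only = re.sub(r'\d', '#', 'a1b2', count=1)
```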
<h2 id="Match-Objects"><a href="#Match-Objects" class="headerlink" title="Match Objects"></a>Match Objects</h2><p>Functions such as <code>match()</code> and <code>search()</code> return a <code>Match</code> object; see the <a href="https://docs.python.org/3/library/re.html#match-objects" target="_blank" rel="external">official documentation</a> for its attributes and methods.</p>
<p>Note that group 0 is the entire matched string.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div></pre></td><td class="code"><pre><div class="line">a = re.match(<span class="string">r'\babc\b'</span>, <span class="string">'abc!'</span>)</div><div class="line">a.group()</div><div class="line"><span class="comment"># 'abc'</span></div></pre></td></tr></table></figure>
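<p>A few of the most frequently used match object methods, sketched on a made-up string (the address is hypothetical):</p>

```python
import re

m = re.search(r'(\w+)@(\w+)\.com', 'mail bob@example.com now')

whole = m.group(0)        # the entire match, same as m.group()
parts = m.groups()        # all groups as a tuple
where = m.span()          # (start, end) of the entire match
user_start = m.start(1)   # start position of group 1
host_span = m.span(2)     # (start, end) of group 2
```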
<h2 id="参考文档"><a href="#参考文档" class="headerlink" title="参考文档"></a>References</h2><ul>
<li><a href="https://docs.python.org/3/library/re.html" target="_blank" rel="external">Official re documentation</a></li>
<li><a href="https://docs.python.org/3/howto/regex.html" target="_blank" rel="external">Regular Expression HOWTO</a></li>
</ul>]]></content>
<summary type="html">
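<h2 id="re简介"><a href="#re简介" class="headerlink" title="re简介"></a>Introduction to re</h2><p>Regular expressions are compiled into bytecode, which is then executed by a matching engine written in C; this is usually faster than implementing the same search in plain Python code. The same content can be matched by many different regular expressions, however, and their efficiency varies.</p>
<h2 id="特殊符号"><a href="#特殊符号" class="headerlink" title="特殊符号"></a>Special characters</h2><figure class="highlight"><table><tr><td class="gutter"><pre><div class="line">1</div></pre></td><td class="code"><pre><div class="line">. ^ $ * + ? &#123; &#125; [ ] \ | ( )</div></pre></td></tr></table></figure>
<p>To match these special characters literally, escape them with <code>\</code>.</p>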
</summary>
<category term="Python模块学习" scheme="https://xin053.github.io/categories/Python%E6%A8%A1%E5%9D%97%E5%AD%A6%E4%B9%A0/"/>
<category term="Python" scheme="https://xin053.github.io/tags/Python/"/>
<category term="re" scheme="https://xin053.github.io/tags/re/"/>
</entry>
<entry>
<title>Python descriptors</title>
<link href="https://xin053.github.io/2016/11/29/Python%E6%8F%8F%E8%BF%B0%E7%AC%A6descriptor/"/>
<id>https://xin053.github.io/2016/11/29/Python描述符descriptor/</id>
<published>2016-11-29T10:40:14.000Z</published>
<updated>2017-05-27T13:20:48.771Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="简介"><a href="#简介" class="headerlink" title="简介"></a>Introduction</h2><h3 id="Python描述符-descriptor-解密"><a href="#Python描述符-descriptor-解密" class="headerlink" title="Python描述符(descriptor)解密"></a>Python Descriptors Demystified</h3><p>Original article: <a href="http://nbviewer.ipython.org/urls/gist.github.com/ChrisBeaumont/5758381/raw/descriptor_writeup.ipynb" target="_blank" rel="external">Chris Beaumont</a> Translation: <a href="http://www.geekfan.net/" target="_blank" rel="external">极客范 </a>- <a href="http://www.geekfan.net/author/murong/" target="_blank" rel="external">慕容老匹夫</a></p>
<p>Reposted from: <a href="http://www.geekfan.net/7862/" target="_blank" rel="external">http://www.geekfan.net/7862/</a></p>
<p>Python has many built-in language features that make code concise and easy to understand, including list/set/dictionary comprehensions, properties (<code>property</code>), and decorators. For the most part, these "intermediate" features are well documented and easy to learn.</p>
<p>There is one exception: descriptors. For me at least, descriptors are the core language feature that confused me the longest. There are a few reasons for this:</p>
<ol>
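<li>The official documentation on descriptors is fairly hard to follow and includes no good examples of why you would want to write one (in Raymond Hettinger's defense, his Python articles and videos on other topics have helped me a great deal).</li>
<li>The syntax for writing descriptors looks a bit odd.</li>
<li>Custom descriptors are probably the least-used feature in Python, so it is hard to find good examples in open-source projects.</li>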
</ol>
<p>Once you understand them, though, descriptors do have their uses. This article explains what descriptors can do and why they deserve your attention.</p>
<a id="more"></a>
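<h2 id="一句话概括:描述符就是可重用的属性"><a href="#一句话概括:描述符就是可重用的属性" class="headerlink" title="一句话概括:描述符就是可重用的属性"></a>In a nutshell: descriptors are reusable properties</h2><p>Here is the point up front: fundamentally, descriptors are properties that can be reused. That is, descriptors let you write code like this:</p>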
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">f = Foo()</div><div class="line">b = f.bar</div><div class="line">f.bar = c</div><div class="line"><span class="keyword">del</span> f.bar</div></pre></td></tr></table></figure>
<p>When the interpreter executes this code, it calls custom methods whenever you read the attribute (<code>b = f.bar</code>), assign to it (<code>f.bar = c</code>), or delete it from the instance (<code>del f.bar</code>).</p>
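<p>As a rough preview of what "calling custom methods" means, here is a minimal Python 3 sketch. The <code>Traced</code> class and its <code>calls</code> list are hypothetical names invented for this illustration; the sketch stores a single shared value and is not how a real descriptor should keep per-instance data.</p>

```python
class Traced:
    """Hypothetical descriptor that records which protocol method ran."""

    def __init__(self):
        self.calls = []
        self.value = None  # one shared slot, for demonstration only

    def __get__(self, instance, owner):
        self.calls.append('__get__')
        return self.value

    def __set__(self, instance, value):
        self.calls.append('__set__')
        self.value = value

    def __delete__(self, instance):
        self.calls.append('__delete__')
        self.value = None


class Foo:
    bar = Traced()  # the descriptor lives on the class


f = Foo()
f.bar = 42      # triggers Traced.__set__
b = f.bar       # triggers Traced.__get__
del f.bar       # triggers Traced.__delete__

# Read the log through the class __dict__ so that __get__ is not triggered again.
calls = Foo.__dict__['bar'].calls
```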
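<p>Let's first look at why it is useful to disguise function calls as attribute access.</p>
<h2 id="property——把函数调用伪装成对属性的访问"><a href="#property——把函数调用伪装成对属性的访问" class="headerlink" title="property——把函数调用伪装成对属性的访问"></a>property: disguising function calls as attribute access</h2><p>Imagine you are writing code to manage movie information. The <code>Movie</code> class you end up with might look like this:</p>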
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Movie</span><span class="params">(object)</span>:</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, title, rating, runtime, budget, gross)</span>:</span></div><div class="line">        self.title = title</div><div class="line">        self.rating = rating</div><div class="line">        self.runtime = runtime</div><div class="line">        self.budget = budget</div><div class="line">        self.gross = gross</div><div class="line"> </div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">profit</span><span class="params">(self)</span>:</span></div><div class="line">        <span class="keyword">return</span> self.gross - self.budget</div></pre></td></tr></table></figure>
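<p>You start using this class elsewhere in your project, and then you realize: what if a movie is accidentally given a negative budget? You decide this is a bug and want the <code>Movie</code> class to prevent it. Your first idea is to change <code>Movie</code> like this:</p>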
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Movie</span><span class="params">(object)</span>:</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, title, rating, runtime, budget, gross)</span>:</span></div><div class="line">        self.title = title</div><div class="line">        self.rating = rating</div><div class="line">        self.runtime = runtime</div><div class="line">        self.gross = gross</div><div class="line">        <span class="keyword">if</span> budget < <span class="number">0</span>:</div><div class="line">            <span class="keyword">raise</span> ValueError(<span class="string">"Negative value not allowed: %s"</span> % budget)</div><div class="line">        self.budget = budget</div><div class="line"> </div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">profit</span><span class="params">(self)</span>:</span></div><div class="line">        <span class="keyword">return</span> self.gross - self.budget</div></pre></td></tr></table></figure>
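<p>But this does not work. Other parts of the code assign to the attribute directly (<code>m.budget = ...</code>), so the modified class only catches bad data in <code>__init__</code> and can do nothing for instances that already exist. If someone runs <code>m.budget = -100</code>, nothing stops them. As a Python programmer and a movie fan, what should you do?</p>
<p>Fortunately, Python's <code>property</code> solves this problem. If you have never seen <code>property</code> in use, here is an example:</p>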
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Movie</span><span class="params">(object)</span>:</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, title, rating, runtime, budget, gross)</span>:</span></div><div class="line">        self._budget = <span class="keyword">None</span></div><div class="line"> </div><div class="line">        self.title = title</div><div class="line">        self.rating = rating</div><div class="line">        self.runtime = runtime</div><div class="line">        self.gross = gross</div><div class="line">        self.budget = budget</div><div class="line"> </div><div class="line"><span class="meta">    @property</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">budget</span><span class="params">(self)</span>:</span></div><div class="line">        <span class="keyword">return</span> self._budget</div><div class="line"> </div><div class="line"><span class="meta">    @budget.setter</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">budget</span><span class="params">(self, value)</span>:</span></div><div class="line">        <span class="keyword">if</span> value < <span class="number">0</span>:</div><div class="line">            <span class="keyword">raise</span> ValueError(<span class="string">"Negative value not allowed: %s"</span> % value)</div><div class="line">        self._budget = value</div><div class="line"> </div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">profit</span><span class="params">(self)</span>:</span></div><div class="line">        <span class="keyword">return</span> self.gross - self.budget</div><div class="line"> </div><div class="line">m = Movie(<span class="string">'Casablanca'</span>, <span class="number">97</span>, <span class="number">102</span>, <span class="number">964000</span>, <span class="number">1300000</span>)</div><div class="line"><span class="keyword">print</span>(m.budget)  <span class="comment"># calls the budget getter and returns its result</span></div><div class="line"><span class="keyword">try</span>:</div><div class="line">    m.budget = <span class="number">-100</span>  <span class="comment"># calls the budget setter with -100, which raises ValueError</span></div><div class="line"><span class="keyword">except</span> ValueError:</div><div class="line">    <span class="keyword">print</span>(<span class="string">"Woops. Not allowed"</span>)</div><div class="line"><span class="comment"># Output:</span></div><div class="line"><span class="comment"># 964000</span></div><div class="line"><span class="comment"># Woops. Not allowed</span></div></pre></td></tr></table></figure>
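<p>We specify a <code>getter</code> method with the <code>@property</code> decorator and a <code>setter</code> method with the <code>@budget.setter</code> decorator. Once we do, Python automatically calls the corresponding <code>getter/setter</code> method whenever someone accesses the <code>budget</code> attribute. For example, code such as <code>m.budget = value</code> automatically invokes the method registered with <code>budget.setter</code>.</p>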
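<p>To make the "automatic call" more concrete, the following Python 3 sketch uses a trimmed-down <code>Movie</code> with only a <code>budget</code> field (a simplification made for this example) and calls the property's underlying functions by hand through the standard <code>fget</code> and <code>fset</code> attributes of a <code>property</code> object.</p>

```python
class Movie:
    """Simplified version of the class above, with only the budget field."""

    def __init__(self, budget):
        self.budget = budget  # goes through the setter below

    @property
    def budget(self):
        return self._budget

    @budget.setter
    def budget(self, value):
        if value < 0:
            raise ValueError("Negative value not allowed: %s" % value)
        self._budget = value


m = Movie(964000)

prop = Movie.budget        # accessed on the class, this is the property object itself
via_attr = m.budget        # normal attribute access
via_fget = prop.fget(m)    # the same getter, called explicitly

prop.fset(m, 1000)         # equivalent to m.budget = 1000

try:
    m.budget = -100        # the setter runs and rejects the value
    rejected = False
except ValueError:
    rejected = True
```

<p>In everyday code you would never call <code>fget</code> or <code>fset</code> directly; they are shown here only to expose what attribute access does behind the scenes.</p>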
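<p>Take a moment to appreciate how elegant this is: without <code>property</code>, we would have to hide all instance attributes and provide lots of explicit methods like <code>get_budget</code> and <code>set_budget</code>. Using a class written that way means calling those <code>getter/setter</code> methods over and over, which looks like bloated Java code. Worse, if we did not adopt that style and accessed instance attributes directly, there would be no clean way to add the non-negative check later: we would have to create a <code>set_budget</code> method, then search the whole project's source and replace code like <code>m.budget = value</code> with <code>m.set_budget(value)</code>. What a pain!</p>
<p>So <code>property</code> lets us attach custom code to attribute access and assignment while keeping a simple attribute-access interface for the class. Nicely done!</p>
<h2 id="property的不足"><a href="#property的不足" class="headerlink" title="property的不足"></a>The shortcomings of property</h2><p>The biggest drawback of <code>property</code> is that it cannot be reused. For example, suppose you want to add the non-negative check to the <code>rating</code>, <code>runtime</code>, and <code>gross</code> fields as well. Here is the modified class:</p>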
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div><div class="line">51</div><div class="line">52</div><div class="line">53</div><div class="line">54</div><div class="line">55</div><div class="line">56</div><div class="line">57</div><div class="line">58</div><div class="line">59</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Movie</span><span class="params">(object)</span>:</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, title, rating, runtime, budget, gross)</span>:</span></div><div class="line">        self._rating = <span class="keyword">None</span></div><div class="line">        self._runtime = <span class="keyword">None</span></div><div class="line">        self._budget = <span class="keyword">None</span></div><div class="line">        self._gross = <span class="keyword">None</span></div><div class="line"> </div><div class="line">        self.title = title</div><div class="line">        self.rating = rating</div><div class="line">        self.runtime = runtime</div><div class="line">        self.gross = gross</div><div class="line">        self.budget = budget</div><div class="line"> </div><div class="line">    <span class="comment">#nice</span></div><div class="line"><span class="meta">    @property</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">budget</span><span class="params">(self)</span>:</span></div><div class="line">        <span class="keyword">return</span> self._budget</div><div class="line"> </div><div class="line"><span class="meta">    @budget.setter</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">budget</span><span class="params">(self, value)</span>:</span></div><div class="line">        <span class="keyword">if</span> value < <span class="number">0</span>:</div><div class="line">            <span class="keyword">raise</span> ValueError(<span class="string">"Negative value not allowed: %s"</span> % value)</div><div class="line">        self._budget = value</div><div class="line"> </div><div class="line">    <span class="comment">#ok</span></div><div class="line"><span class="meta">    @property</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">rating</span><span class="params">(self)</span>:</span></div><div class="line">        <span class="keyword">return</span> self._rating</div><div class="line"> </div><div class="line"><span class="meta">    @rating.setter</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">rating</span><span class="params">(self, value)</span>:</span></div><div class="line">        <span class="keyword">if</span> value < <span class="number">0</span>:</div><div class="line">            <span class="keyword">raise</span> ValueError(<span class="string">"Negative value not allowed: %s"</span> % value)</div><div class="line">        self._rating = value</div><div class="line"> </div><div class="line">    <span class="comment">#uhh...</span></div><div class="line"><span class="meta">    @property</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">runtime</span><span class="params">(self)</span>:</span></div><div class="line">        <span class="keyword">return</span> self._runtime</div><div class="line"> </div><div class="line"><span class="meta">    @runtime.setter</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">runtime</span><span class="params">(self, value)</span>:</span></div><div class="line">        <span class="keyword">if</span> value < <span class="number">0</span>:</div><div class="line">            <span class="keyword">raise</span> ValueError(<span class="string">"Negative value not allowed: %s"</span> % value)</div><div class="line">        self._runtime = value</div><div class="line"> </div><div class="line">    <span class="comment">#is this forever?</span></div><div class="line"><span class="meta">    @property</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">gross</span><span class="params">(self)</span>:</span></div><div class="line">        <span class="keyword">return</span> self._gross</div><div class="line"> </div><div class="line"><span class="meta">    @gross.setter</span></div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">gross</span><span class="params">(self, value)</span>:</span></div><div class="line">        <span class="keyword">if</span> value < <span class="number">0</span>:</div><div class="line">            <span class="keyword">raise</span> ValueError(<span class="string">"Negative value not allowed: %s"</span> % value)</div><div class="line">        self._gross = value</div><div class="line"> </div><div class="line">    <span class="function"><span class="keyword">def</span> <span class="title">profit</span><span class="params">(self)</span>:</span></div><div class="line">        <span class="keyword">return</span> self.gross - self.budget</div></pre></td></tr></table></figure>
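<p>The code has grown considerably, and much of it is repeated logic. <code>property</code> makes a class look clean and tidy from the outside, <strong>but it cannot keep the inside equally clean and tidy.</strong></p>
<h2 id="描述符登场(最终的大杀器)"><a href="#描述符登场(最终的大杀器)" class="headerlink" title="描述符登场(最终的大杀器)"></a>Enter descriptors (the ultimate weapon)</h2><p>This is the problem descriptors solve. A descriptor is an upgraded <code>property</code> that lets you write a separate class to handle the repeated <code>property</code> logic. The example below shows how descriptors work (don't worry about the implementation of the <code>NonNegative</code> class for now):</p>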
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div><div class="line">42</div><div class="line">43</div><div class="line">44</div><div class="line">45</div><div class="line">46</div><div class="line">47</div><div class="line">48</div><div class="line">49</div><div class="line">50</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">from</span> weakref <span class="keyword">import</span> WeakKeyDictionary</div><div class="line"> </div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">NonNegative</span><span class="params">(object)</span>:</span></div><div class="line"> <span class="string">"""A descriptor that forbids negative values"""</span></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, default)</span>:</span></div><div class="line"> self.default = 
default</div><div class="line"> self.data = WeakKeyDictionary()</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__get__</span><span class="params">(self, instance, owner)</span>:</span></div><div class="line"> <span class="comment"># we get here when someone calls x.d, and d is a NonNegative instance</span></div><div class="line"> <span class="comment"># instance = x</span></div><div class="line"> <span class="comment"># owner = type(x)</span></div><div class="line"> <span class="keyword">return</span> self.data.get(instance, self.default)</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__set__</span><span class="params">(self, instance, value)</span>:</span></div><div class="line"> <span class="comment"># we get here when someone calls x.d = val, and d is a NonNegative instance</span></div><div class="line"> <span class="comment"># instance = x</span></div><div class="line"> <span class="comment"># value = val</span></div><div class="line"> <span class="keyword">if</span> value < <span class="number">0</span>:</div><div class="line"> <span class="keyword">raise</span> ValueError(<span class="string">"Negative value not allowed: %s"</span> % value)</div><div class="line"> self.data[instance] = value</div><div class="line"> </div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Movie</span><span class="params">(object)</span>:</span></div><div class="line"> </div><div class="line"> <span class="comment">#always put descriptors at the class-level</span></div><div class="line"> rating = NonNegative(<span class="number">0</span>)</div><div class="line"> runtime = NonNegative(<span class="number">0</span>)</div><div class="line"> budget = NonNegative(<span class="number">0</span>)</div><div class="line"> gross = NonNegative(<span class="number">0</span>)</div><div 
class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, title, rating, runtime, budget, gross)</span>:</span></div><div class="line"> self.title = title</div><div class="line"> self.rating = rating</div><div class="line"> self.runtime = runtime</div><div class="line"> self.budget = budget</div><div class="line"> self.gross = gross</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">profit</span><span class="params">(self)</span>:</span></div><div class="line"> <span class="keyword">return</span> self.gross - self.budget</div><div class="line"> </div><div class="line">m = Movie(<span class="string">'Casablanca'</span>, <span class="number">97</span>, <span class="number">102</span>, <span class="number">964000</span>, <span class="number">1300000</span>)</div><div class="line"><span class="keyword">print</span> m.budget <span class="comment"># calls Movie.budget.__get__(m, Movie)</span></div><div class="line">m.rating = <span class="number">100</span> <span class="comment"># calls Movie.rating.__set__(m, 100)</span></div><div class="line"><span class="keyword">try</span>:</div><div class="line"> m.rating = <span class="number">-1</span> <span class="comment"># calls Movie.rating.__set__(m, -1)</span></div><div class="line"><span class="keyword">except</span> ValueError:</div><div class="line"> <span class="keyword">print</span> <span class="string">"Woops, negative value"</span></div><div class="line"> </div><div class="line"><span class="number">964000</span></div><div class="line">Woops, negative value</div></pre></td></tr></table></figure>
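<p>There is some new syntax here, so let's go through it piece by piece:</p>
<p><code>NonNegative</code> is a descriptor because it defines at least one of the <code>__get__</code>, <code>__set__</code>, or <code>__delete__</code> methods.</p>
<p>The <code>Movie</code> class now looks very clean. We create four descriptors at the class level and then use them as if they were ordinary instance attributes. The descriptors take care of the non-negativity checks for us.</p>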
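<h3 id="访问描述符"><a href="#访问描述符" class="headerlink" title="访问描述符"></a>Accessing a descriptor</h3><p>When the interpreter encounters <code>print m.budget</code>, it recognizes <code>budget</code> as a descriptor with a <code>__get__</code> method. It calls <code>Movie.budget.__get__</code> and prints the method's return value, rather than handing <code>m.budget</code> to <code>print</code> directly. This is similar to accessing a <code>property</code>: Python automatically calls a method and returns the result.</p>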
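<p><code>__get__</code> receives two arguments. The first is the instance to the left of the dot (the <code>m</code> in <code>m.budget</code>). The second is the type of that instance, <code>Movie</code>. Parts of the Python <a href="http://docs.python.org/2/reference/datamodel.html#implementing-descriptors" target="_blank" rel="external">documentation</a> call <code>Movie</code> the owner of the descriptor. If we access <code>Movie.budget</code> instead, Python calls <code>Movie.budget.__get__(None, Movie)</code>. So the first argument is either an instance of the owner or <code>None</code>. These arguments may look odd, but they tell the descriptor which object it is acting on behalf of. This will make sense once we look at how the <code>NonNegative</code> class is implemented.</p>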
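<p>To see the two calling conventions side by side, here is a minimal sketch in Python 3 syntax (the article's own listings are Python 2). The <code>Traced</code> class is a hypothetical helper invented for this illustration; it simply records whatever arguments Python passes to <code>__get__</code>.</p>

```python
class Traced(object):
    """Hypothetical descriptor that records the arguments passed to __get__."""

    def __init__(self):
        self.calls = []

    def __get__(self, instance, owner):
        # Remember what Python handed us, then return a fixed value.
        self.calls.append((instance, owner))
        return 42


class Movie(object):
    budget = Traced()


m = Movie()
m.budget       # instance access: Python calls __get__(m, Movie)
Movie.budget   # class access:    Python calls __get__(None, Movie)

# Reading the class __dict__ directly bypasses the descriptor protocol,
# so this returns the Traced object itself without another __get__ call.
tracer = Movie.__dict__['budget']
print(tracer.calls)
```

<p>The recorded list should hold two entries: one with <code>m</code> as the first argument and one with <code>None</code>, both with <code>Movie</code> as the owner.</p>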
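<h3 id="对描述符赋值"><a href="#对描述符赋值" class="headerlink" title="对描述符赋值"></a>Assigning to a descriptor</h3><p>When the interpreter sees <code>m.rating = 100</code>, Python recognizes that <code>rating</code> is a descriptor with a <code>__set__</code> method, so it calls <code>Movie.rating.__set__(m, 100)</code>. As with <code>__get__</code>, the first argument to <code>__set__</code> is the instance to the left of the dot (the <code>m</code> in <code>m.rating = 100</code>). The second argument is the value being assigned (100).</p>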
<h3 id="删除描述符"><a href="#删除描述符" class="headerlink" title="删除描述符"></a>Deleting a descriptor</h3><p>For completeness, a word about deletion. If you call <code>del m.budget</code>, Python calls <code>Movie.budget.__delete__(m)</code>.</p>
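<p>The <code>NonNegative</code> class in this article does not define <code>__delete__</code>, so <code>del m.budget</code> would fail with it as written. The following is a minimal sketch (Python 3 syntax) of what a <code>__delete__</code> method could look like. The <code>Resettable</code> class and its "reset to default" behaviour are assumptions made for illustration, not part of the original example.</p>

```python
from weakref import WeakKeyDictionary


class Resettable(object):
    """Hypothetical descriptor: `del obj.attr` resets the value to the default."""

    def __init__(self, default):
        self.default = default
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return self.data.get(instance, self.default)

    def __set__(self, instance, value):
        self.data[instance] = value

    def __delete__(self, instance):
        # Called for `del instance.attr`; forget the stored value, if any.
        self.data.pop(instance, None)


class Movie(object):
    budget = Resettable(0)


m = Movie()
m.budget = 964000
before = m.budget   # value stored by __set__
del m.budget        # calls Movie.budget.__delete__(m)
after = m.budget    # falls back to the default again
print(before, after)
```

<p>Whether deletion should reset to a default, raise an error, or do something else is a design choice for each descriptor.</p>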
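<h2 id="NonNegative类是如何工作的?"><a href="#NonNegative类是如何工作的?" class="headerlink" title="NonNegative类是如何工作的?"></a>How does the NonNegative class work?</h2><p>With those questions in mind, we can finally reveal how the <code>NonNegative</code> class works. Each <code>NonNegative</code> instance maintains a dictionary that maps owner instances to their data. When we access <code>m.budget</code>, the <code>__get__</code> method looks up the data associated with <code>m</code> and returns it (or returns a default value if nothing is stored). <code>__set__</code> works the same way, with the extra non-negativity check. We use a <code>WeakKeyDictionary</code> instead of a regular dictionary to prevent memory leaks: we don't want an otherwise unused instance to stay alive just because it is a key in the descriptor's dictionary.</p>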
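<p>The memory-leak point can be checked with a short experiment. This is a sketch in Python 3 syntax that assumes CPython, where an object is reclaimed as soon as its last strong reference disappears; the <code>Owner</code> class is a hypothetical stand-in for any class that uses the descriptor.</p>

```python
import gc
from weakref import WeakKeyDictionary


class Owner(object):
    """Hypothetical stand-in for a class whose instances are used as keys."""


strong = {}                  # regular dict: holds strong references to its keys
weak = WeakKeyDictionary()   # holds only weak references to its keys

a, b = Owner(), Owner()
strong[a] = 1
weak[b] = 1

del a, b       # drop our own references to both objects
gc.collect()   # not strictly needed on CPython, but makes the intent explicit

strong_count = len(strong)   # the regular dict still keeps its key alive
weak_count = len(weak)       # the weak dict let its key be collected
print(strong_count, weak_count)
```

<p>With a regular dictionary, every instance that was ever assigned through the descriptor would stay alive for as long as the class itself does.</p>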
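<p>Working with descriptors is a little awkward. Because they live at the class level, every instance of the class shares the same descriptor. This means the descriptor has to manage state for each instance by hand, and it needs the instance passed in explicitly as the first argument to <code>__get__</code>, <code>__set__</code>, and <code>__delete__</code>.</p>
<p>I hope this example makes clear what descriptors are good for: they give you a way to isolate <code>property</code> logic in a separate class. If you find yourself repeating the same logic across different <code>property</code> definitions, take that as a hint that refactoring the code with descriptors may be worth a try.</p>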
<h2 id="秘诀和陷阱"><a href="#秘诀和陷阱" class="headerlink" title="秘诀和陷阱"></a>Tips and pitfalls</h2><h3 id="把描述符放在类的层次上(class-level)"><a href="#把描述符放在类的层次上(class-level)" class="headerlink" title="把描述符放在类的层次上(class level)"></a>Put descriptors at the class level</h3><p>For descriptors to work properly, they must be defined at the class level. If you don't do this, Python will not call <code>__get__</code> and <code>__set__</code> for you automatically.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Broken</span><span class="params">(object)</span>:</span></div><div class="line"> y = NonNegative(<span class="number">5</span>)</div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self)</span>:</span></div><div class="line"> self.x = NonNegative(<span class="number">0</span>) <span class="comment"># NOT a good descriptor</span></div><div class="line"> </div><div class="line">b = Broken()</div><div class="line"><span class="keyword">print</span> <span class="string">"X is %s, Y is %s"</span> % (b.x, b.y)</div><div class="line"> </div><div class="line">X <span class="keyword">is</span> <__main__.NonNegative object at <span class="number">0x10432c250</span>>, Y <span class="keyword">is</span> <span class="number">5</span></div></pre></td></tr></table></figure>
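<p>As you can see, accessing the class-level descriptor <code>y</code> calls <code>__get__</code> automatically, but accessing the instance-level descriptor <code>x</code> just returns the descriptor object itself. So much for the magic.</p>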
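<p>The difference comes from where Python looks for descriptors: on the type of the object, not in the instance's own dictionary. The following minimal sketch (Python 3 syntax) isolates that behaviour. The <code>Five</code> and <code>Holder</code> classes are hypothetical names used only for this illustration.</p>

```python
class Five(object):
    """Hypothetical descriptor whose __get__ always produces 5."""

    def __get__(self, instance, owner):
        return 5


class Holder(object):
    y = Five()            # class level: the descriptor protocol applies

    def __init__(self):
        self.x = Five()   # instance level: stored as a plain object


h = Holder()
class_level = h.y        # found on type(h), so __get__ is called
instance_level = h.x     # found in h.__dict__, returned unchanged
print(class_level, instance_level)
```

<p>Here <code>h.x</code> is simply the <code>Five</code> object stored under the key <code>'x'</code> in <code>h.__dict__</code>; its <code>__get__</code> never runs.</p>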
<h3 id="确保实例的数据只属于实例本身"><a href="#确保实例的数据只属于实例本身" class="headerlink" title="确保实例的数据只属于实例本身"></a>Make sure instance data belongs only to that instance</h3><p>You might be tempted to write the <code>NonNegative</code> descriptor like this:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">BrokenNonNegative</span><span class="params">(object)</span>:</span></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, default)</span>:</span></div><div class="line"> self.value = default</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__get__</span><span class="params">(self, instance, owner)</span>:</span></div><div class="line"> <span class="keyword">return</span> self.value</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__set__</span><span class="params">(self, instance, value)</span>:</span></div><div class="line"> <span class="keyword">if</span> value < <span class="number">0</span>:</div><div class="line"> <span class="keyword">raise</span> ValueError(<span class="string">"Negative value not allowed: %s"</span> % value)</div><div class="line"> self.value = value</div><div class="line"> </div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Foo</span><span class="params">(object)</span>:</span></div><div class="line"> bar = 
BrokenNonNegative(<span class="number">5</span>) </div><div class="line"> </div><div class="line">f = Foo()</div><div class="line"><span class="keyword">try</span>:</div><div class="line"> f.bar = <span class="number">-1</span></div><div class="line"><span class="keyword">except</span> ValueError:</div><div class="line"> <span class="keyword">print</span> <span class="string">"Caught the invalid assignment"</span></div><div class="line"> </div><div class="line">Caught the invalid assignment</div></pre></td></tr></table></figure>
<p>This appears to work. The problem is that every instance of <code>Foo</code> shares the same <code>bar</code>, which leads to some painful results:</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Foo</span><span class="params">(object)</span>:</span></div><div class="line"> bar = BrokenNonNegative(<span class="number">5</span>) </div><div class="line"> </div><div class="line">f = Foo()</div><div class="line">g = Foo()</div><div class="line"> </div><div class="line"><span class="keyword">print</span> <span class="string">"f.bar is %s\ng.bar is %s"</span> % (f.bar, g.bar)</div><div class="line"><span class="keyword">print</span> <span class="string">"Setting f.bar to 10"</span></div><div class="line">f.bar = <span class="number">10</span></div><div class="line"><span class="keyword">print</span> <span class="string">"f.bar is %s\ng.bar is %s"</span> % (f.bar, g.bar) <span class="comment">#ouch</span></div><div class="line">f.bar <span class="keyword">is</span> <span class="number">5</span></div><div class="line">g.bar <span class="keyword">is</span> <span class="number">5</span></div><div class="line">Setting f.bar to <span class="number">10</span></div><div class="line">f.bar <span class="keyword">is</span> <span class="number">10</span></div><div class="line">g.bar <span class="keyword">is</span> <span class="number">10</span></div></pre></td></tr></table></figure>
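<p>This is why <code>NonNegative</code> uses a data dictionary. The first argument to <code>__get__</code> and <code>__set__</code> tells us which instance we are dealing with. <code>NonNegative</code> uses that argument as a dictionary key, so it stores a separate value for each <code>Foo</code> instance.</p>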
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Foo</span><span class="params">(object)</span>:</span></div><div class="line"> bar = NonNegative(<span class="number">5</span>)</div><div class="line"> </div><div class="line">f = Foo()</div><div class="line">g = Foo()</div><div class="line"><span class="keyword">print</span> <span class="string">"f.bar is %s\ng.bar is %s"</span> % (f.bar, g.bar)</div><div class="line"><span class="keyword">print</span> <span class="string">"Setting f.bar to 10"</span></div><div class="line">f.bar = <span class="number">10</span></div><div class="line"><span class="keyword">print</span> <span class="string">"f.bar is %s\ng.bar is %s"</span> % (f.bar, g.bar) <span class="comment">#better</span></div><div class="line">f.bar <span class="keyword">is</span> <span class="number">5</span></div><div class="line">g.bar <span class="keyword">is</span> <span class="number">5</span></div><div class="line">Setting f.bar to <span class="number">10</span></div><div class="line">f.bar <span class="keyword">is</span> <span class="number">10</span></div><div class="line">g.bar <span class="keyword">is</span> <span class="number">5</span></div></pre></td></tr></table></figure>
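<p>This is the really awkward part of descriptors. (Frankly, I don't understand why Python doesn't let you define descriptors at the instance level and simply always dispatch to <code>__get__</code> and <code>__set__</code> when they are present. I'm sure there is a reason this wouldn't work.)</p>
<h3 id="注意不可哈希的描述符所有者"><a href="#注意不可哈希的描述符所有者" class="headerlink" title="注意不可哈希的描述符所有者"></a>Watch out for unhashable descriptor owners</h3><p>The <code>NonNegative</code> class uses a dictionary to keep instance-specific data separate. This normally works fine, unless the owner objects are unhashable:</p>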
<figure class="highlight"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div></pre></td><td class="code"><pre><div class="line">class MoProblems(list): #you can't use lists as dictionary keys</div><div class="line"> x = NonNegative(5)</div><div class="line"> </div><div class="line">m = MoProblems()</div><div class="line">print m.x # womp womp</div><div class="line"> </div><div class="line">TypeError</div><div class="line">Traceback (most recent call last)</div><div class="line"><ipython-input-8-dd73b177bd8d> in <module>()</div><div class="line"> 3 </div><div class="line"> 4 m = MoProblems()</div><div class="line">----> 5 print m.x # womp womp</div><div class="line"> </div><div class="line"><ipython-input-3-6671804ce5d5> in __get__(self, instance, owner)</div><div class="line"> 9 # instance = x</div><div class="line"> 10 # owner = type(x)</div><div class="line">---> 11 return self.data.get(instance, self.default)</div><div class="line"> 12 </div><div class="line"> 13 def __set__(self, instance, value):</div><div class="line"> </div><div class="line">TypeError: unhashable type: 'MoProblems'</div></pre></td></tr></table></figure>
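<p>Because instances of <code>MoProblems</code> (a subclass of <code>list</code>) are unhashable, they cannot be used as keys in the data dictionary of <code>MoProblems.x</code>. There are a few ways around this, but none of them is perfect. The best approach is probably to label your descriptors.</p>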
<figure class="highlight"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div></pre></td><td class="code"><pre><div class="line">class Descriptor(object):</div><div class="line"> </div><div class="line"> def __init__(self, label):</div><div class="line"> self.label = label</div><div class="line"> </div><div class="line"> def __get__(self, instance, owner):</div><div class="line"> print '__get__', instance, owner</div><div class="line"> return instance.__dict__.get(self.label)</div><div class="line"> </div><div class="line"> def __set__(self, instance, value):</div><div class="line"> print '__set__'</div><div class="line"> instance.__dict__[self.label] = value</div><div class="line"> </div><div class="line">class Foo(list):</div><div class="line"> x = Descriptor('x')</div><div class="line"> y = Descriptor('y')</div><div class="line"> </div><div class="line">f = Foo()</div><div class="line">f.x = 5</div><div class="line">print f.x</div><div class="line"> </div><div class="line">__set__</div><div class="line">__get__ [] <class '__main__.Foo'></div><div class="line">5</div></pre></td></tr></table></figure>
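<p>This technique relies on Python's attribute lookup order: a data descriptor found on the class takes precedence over an entry in the instance's <code>__dict__</code>. We give each descriptor in <code>Foo</code> a label that matches the name of the variable it is assigned to, as in <code>x = Descriptor('x')</code>. The descriptor then stores instance-specific data in <code>f.__dict__['x']</code>. Normally this dictionary entry is what Python would return when we ask for <code>f.x</code>. But because <code>Foo.x</code> is a descriptor, Python does not consult <code>f.__dict__['x']</code> during ordinary attribute access, so the descriptor can safely store its data there. Just remember that the label must not be a name that is used for anything else.</p>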
<figure class="highlight"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div></pre></td><td class="code"><pre><div class="line">class Foo(object):</div><div class="line"> x = Descriptor('y')</div><div class="line"> </div><div class="line">f = Foo()</div><div class="line">f.x = 5</div><div class="line">print f.x</div><div class="line"> </div><div class="line">f.y = 4 #oh no!</div><div class="line">print f.x</div><div class="line">__set__</div><div class="line">__get__ <__main__.Foo object at 0x10432c810> <class '__main__.Foo'></div><div class="line">5</div><div class="line">__get__ <__main__.Foo object at 0x10432c810> <class '__main__.Foo'></div><div class="line">4</div></pre></td></tr></table></figure>
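<p>I don't like this approach, because the resulting code is fragile and full of subtleties. But it is quite common, and it works with unhashable owner classes. David Beazley uses it in his <a href="http://www.amazon.com/Python-Essential-Reference-4th-Edition/dp/0672329786/" target="_blank" rel="external">book</a>.</p>
<h3 id="在元类中使用带标签的描述符"><a href="#在元类中使用带标签的描述符" class="headerlink" title="在元类中使用带标签的描述符"></a>Using labeled descriptors with metaclasses</h3><p>Because a descriptor's label is the same as the name of the variable it is assigned to, some people use a metaclass to take care of this bookkeeping automatically.</p>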
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Descriptor</span><span class="params">(object)</span>:</span></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self)</span>:</span></div><div class="line"> <span class="comment">#notice we aren't setting the label here</span></div><div class="line"> self.label = <span class="keyword">None</span></div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__get__</span><span class="params">(self, instance, owner)</span>:</span></div><div class="line"> <span class="keyword">print</span> <span class="string">'__get__. 
Label = %s'</span> % self.label</div><div class="line"> <span class="keyword">return</span> instance.__dict__.get(self.label, <span class="keyword">None</span>)</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__set__</span><span class="params">(self, instance, value)</span>:</span></div><div class="line"> <span class="keyword">print</span> <span class="string">'__set__'</span></div><div class="line"> instance.__dict__[self.label] = value</div><div class="line"> </div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">DescriptorOwner</span><span class="params">(type)</span>:</span></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__new__</span><span class="params">(cls, name, bases, attrs)</span>:</span></div><div class="line"> <span class="comment"># find all descriptors, auto-set their labels</span></div><div class="line"> <span class="keyword">for</span> n, v <span class="keyword">in</span> attrs.items():</div><div class="line"> <span class="keyword">if</span> isinstance(v, Descriptor):</div><div class="line"> v.label = n</div><div class="line"> <span class="keyword">return</span> super(DescriptorOwner, cls).__new__(cls, name, bases, attrs)</div><div class="line"> </div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">Foo</span><span class="params">(object)</span>:</span></div><div class="line"> __metaclass__ = DescriptorOwner</div><div class="line"> x = Descriptor()</div><div class="line"> </div><div class="line">f = Foo()</div><div class="line">f.x = <span class="number">10</span></div><div class="line"><span class="keyword">print</span> f.x</div><div class="line"> </div><div class="line">__set__</div><div class="line">__get__. Label = x</div><div class="line"><span class="number">10</span></div></pre></td></tr></table></figure>
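<p>I won't go into the details of metaclasses; the David Beazley material listed in the references already explains them well. The point here is that the metaclass labels each descriptor automatically, using the name of the variable the descriptor is assigned to.</p>
<p>Although this keeps the label and the variable name from drifting apart, it brings in the complexity of metaclasses. I am skeptical, but you can judge for yourself whether the trade is worth it.</p>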
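<p>The listings in this article are Python 2, where <code>__metaclass__</code> is honoured. On Python 3.6 and later, the data model offers a <code>__set_name__</code> hook that Python calls on each descriptor when the owning class is created, which covers the same bookkeeping without a metaclass. The following is a minimal sketch of that alternative, not something the original article uses; <code>Labeled</code> is a hypothetical class name.</p>

```python
class Labeled(object):
    """Hypothetical labeled descriptor that learns its own name (Python 3.6+)."""

    def __set_name__(self, owner, name):
        # Called once, when the owning class body has been executed.
        self.label = name

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.__dict__.get(self.label)

    def __set__(self, instance, value):
        instance.__dict__[self.label] = value


class Foo(list):    # an unhashable owner, as in the earlier example
    x = Labeled()
    y = Labeled()


f = Foo()
f.x = 10
f.y = 20
print(f.x, f.y, f.__dict__)
```

<p>The data still lives in the instance's <code>__dict__</code> under the label, so the caveat about label collisions applies here as well.</p>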
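<h3 id="访问描述符的方法"><a href="#访问描述符的方法" class="headerlink" title="访问描述符的方法"></a>Accessing a descriptor's methods</h3><p>Descriptors are just classes, so you may want to add methods to them. For example, descriptors are a nice way to implement callback properties. Suppose we want to be notified as soon as some part of a class's state changes. The code below does most of the work:</p>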
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">CallbackProperty</span><span class="params">(object)</span>:</span></div><div class="line"> <span class="string">"""A property that will alert observers when upon updates"""</span></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, default=None)</span>:</span></div><div class="line"> self.data = WeakKeyDictionary()</div><div class="line"> self.default = default</div><div class="line"> self.callbacks = WeakKeyDictionary()</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__get__</span><span class="params">(self, instance, owner)</span>:</span></div><div class="line"> <span class="keyword">return</span> self.data.get(instance, self.default)</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span 
class="title">__set__</span><span class="params">(self, instance, value)</span>:</span> </div><div class="line"> <span class="keyword">for</span> callback <span class="keyword">in</span> self.callbacks.get(instance, []):</div><div class="line"> <span class="comment"># alert callback function of new value</span></div><div class="line"> callback(value)</div><div class="line"> self.data[instance] = value</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">add_callback</span><span class="params">(self, instance, callback)</span>:</span></div><div class="line"> <span class="string">"""Add a new function to call everytime the descriptor updates"""</span></div><div class="line"> <span class="comment">#but how do we get here?!?!</span></div><div class="line"> <span class="keyword">if</span> instance <span class="keyword">not</span> <span class="keyword">in</span> self.callbacks:</div><div class="line"> self.callbacks[instance] = []</div><div class="line"> self.callbacks[instance].append(callback)</div><div class="line"> </div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">BankAccount</span><span class="params">(object)</span>:</span></div><div class="line"> balance = CallbackProperty(<span class="number">0</span>)</div><div class="line"> </div><div class="line"><span class="function"><span class="keyword">def</span> <span class="title">low_balance_warning</span><span class="params">(value)</span>:</span></div><div class="line"> <span class="keyword">if</span> value < <span class="number">100</span>:</div><div class="line"> <span class="keyword">print</span> <span class="string">"You are poor"</span></div><div class="line"> </div><div class="line">ba = BankAccount()</div><div class="line"> </div><div class="line"><span class="comment"># will not work -- try it</span></div><div class="line"><span class="comment">#ba.balance.add_callback(ba, 
low_balance_warning)</span></div></pre></td></tr></table></figure>
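<p>This is an attractive pattern: we can register callback functions that respond to state changes in a class without modifying the class's code at all, which takes real work off our hands. Now all we have to do is call <code>ba.balance.add_callback(ba, low_balance_warning)</code>, so that <code>low_balance_warning</code> is called every time <code>balance</code> changes.</p>
<p>But how do we actually do that? Whenever we try to access a descriptor through an instance, Python calls <code>__get__</code>, so <code>ba.balance</code> gives us the stored value rather than the descriptor. It is as if the <code>add_callback</code> method were out of reach. The trick is to exploit a special case: when the descriptor is accessed through the class, the first argument to <code>__get__</code> is <code>None</code>.</p>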
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div><div class="line">17</div><div class="line">18</div><div class="line">19</div><div class="line">20</div><div class="line">21</div><div class="line">22</div><div class="line">23</div><div class="line">24</div><div class="line">25</div><div class="line">26</div><div class="line">27</div><div class="line">28</div><div class="line">29</div><div class="line">30</div><div class="line">31</div><div class="line">32</div><div class="line">33</div><div class="line">34</div><div class="line">35</div><div class="line">36</div><div class="line">37</div><div class="line">38</div><div class="line">39</div><div class="line">40</div><div class="line">41</div></pre></td><td class="code"><pre><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">CallbackProperty</span><span class="params">(object)</span>:</span></div><div class="line"> <span class="string">"""A property that will alert observers when upon updates"""</span></div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__init__</span><span class="params">(self, default=None)</span>:</span></div><div class="line"> self.data = WeakKeyDictionary()</div><div class="line"> self.default = default</div><div class="line"> self.callbacks = WeakKeyDictionary()</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__get__</span><span class="params">(self, instance, owner)</span>:</span></div><div class="line"> <span 
class="keyword">if</span> instance <span class="keyword">is</span> <span class="keyword">None</span>:</div><div class="line"> <span class="keyword">return</span> self </div><div class="line"> <span class="keyword">return</span> self.data.get(instance, self.default)</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">__set__</span><span class="params">(self, instance, value)</span>:</span></div><div class="line"> <span class="keyword">for</span> callback <span class="keyword">in</span> self.callbacks.get(instance, []):</div><div class="line"> <span class="comment"># alert callback function of new value</span></div><div class="line"> callback(value)</div><div class="line"> self.data[instance] = value</div><div class="line"> </div><div class="line"> <span class="function"><span class="keyword">def</span> <span class="title">add_callback</span><span class="params">(self, instance, callback)</span>:</span></div><div class="line"> <span class="string">"""Add a new function to call everytime the descriptor within instance updates"""</span></div><div class="line"> <span class="keyword">if</span> instance <span class="keyword">not</span> <span class="keyword">in</span> self.callbacks:</div><div class="line"> self.callbacks[instance] = []</div><div class="line"> self.callbacks[instance].append(callback)</div><div class="line"> </div><div class="line"><span class="class"><span class="keyword">class</span> <span class="title">BankAccount</span><span class="params">(object)</span>:</span></div><div class="line"> balance = CallbackProperty(<span class="number">0</span>)</div><div class="line"> </div><div class="line"><span class="function"><span class="keyword">def</span> <span class="title">low_balance_warning</span><span class="params">(value)</span>:</span></div><div class="line"> <span class="keyword">if</span> value < <span class="number">100</span>:</div><div class="line"> <span 
class="keyword">print</span> <span class="string">"You are now poor"</span></div><div class="line"> </div><div class="line">ba = BankAccount()</div><div class="line">BankAccount.balance.add_callback(ba, low_balance_warning)</div><div class="line"> </div><div class="line">ba.balance = <span class="number">5000</span></div><div class="line"><span class="keyword">print</span> <span class="string">"Balance is %s"</span> % ba.balance</div><div class="line">ba.balance = <span class="number">99</span></div><div class="line"><span class="keyword">print</span> <span class="string">"Balance is %s"</span> % ba.balance</div><div class="line">Balance <span class="keyword">is</span> <span class="number">5000</span></div><div class="line">You are now poor</div><div class="line">Balance <span class="keyword">is</span> <span class="number">99</span></div></pre></td></tr></table></figure>
<h2 id="个人总结"><a href="#个人总结" class="headerlink" title="个人总结"></a>Personal Summary</h2><ul>
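<li>A descriptor masquerades as a class attribute. When an instance of the class accesses it with the dot operator, what actually runs is one of the descriptor's three methods.</li>
<li>Attribute lookup starts in the class and its base classes, not in the memory that represents the instance. Python first has to determine whether the "attribute" is a descriptor (an attribute in disguise). If it is a data descriptor, Python does not read or write the instance's <code>__dict__</code>; it calls the descriptor's <code>__get__()</code>, <code>__set__()</code> or <code>__delete__()</code> method instead. The instance <code>__dict__</code> is consulted only after that check, and a non-data descriptor (one that defines only <code>__get__()</code>) is used only if the instance <code>__dict__</code> has no entry for the name.</li>
<li>Because a descriptor only works as a class attribute, all instances of the class share the same descriptor object. A common pattern is therefore to create a dictionary in the descriptor's <code>__init__()</code> that uses the instance itself (the <code>instance</code> argument in the example) as the key and that instance's data as the value.</li>
<li>The first parameter of an ordinary method is <code>self</code> because, when the method is looked up through an instance, Python automatically passes that instance as <code>self</code>. This is what "binding" means, and the result is a bound method. A function attached directly to an instance at runtime, or a function defined outside the class, is not bound this way, so it does not receive <code>self</code> automatically.</li>
<li>From a low-level point of view, classes and objects are both just regions of allocated memory. Some books say that a class is also an object, which is hard to grasp at first; thinking in terms of memory makes it easier. A region is set aside and filled with the relevant data, and that is the class. Instantiating the class sets aside another region and fills it with data (to save space, the class's attributes and methods are not copied in full; the object only records which class it is an instance of), and that is the instance. A class attribute is an attribute whose value lives only in the memory that represents the class, not in the memory that represents the object.</li>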
</ul>
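<p>The lookup order and the per-instance dictionary pattern summarized above can be checked with a small sketch. The names <code>Tracked</code> and <code>Account</code> are made up for illustration and do not come from the article; the point is only that a data descriptor on the class is still used even when the instance <code>__dict__</code> holds an entry with the same name.</p>

```python
from weakref import WeakKeyDictionary


class Tracked(object):
    """Hypothetical data descriptor that stores one value per instance."""

    def __init__(self, default=None):
        self.default = default
        # Keyed by the instance object itself; entries vanish when the
        # instance is garbage collected.
        self.data = WeakKeyDictionary()

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class, return the descriptor
        return self.data.get(instance, self.default)

    def __set__(self, instance, value):
        self.data[instance] = value


class Account(object):
    balance = Tracked(0)


a, b = Account(), Account()
a.balance = 10  # goes through Tracked.__set__, stored under key `a`

# Writing straight into the instance __dict__ does not shadow a data descriptor.
b.__dict__["balance"] = 999

print(a.balance, b.balance)
print(type(Account.balance).__name__)
```

<p>Both instances share the single <code>Tracked</code> object, yet each sees its own value, and <code>b.balance</code> still goes through <code>__get__()</code> despite the entry planted in <code>b.__dict__</code>.</p>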
<h2 id="参考文档"><a href="#参考文档" class="headerlink" title="参考文档"></a>References</h2><ul>
<li><a href="https://www.zhihu.com/question/25391709" target="_blank" rel="external">How should Python's descriptor be understood?</a></li>
<li><a href="https://segmentfault.com/a/1190000004478718" target="_blank" rel="external">Python's descriptor (Part 1)</a></li>
<li><a href="http://www.geekfan.net/7862/" target="_blank" rel="external">Python Descriptors Demystified</a></li>
<li><a href="https://docs.python.org/3/howto/descriptor.html" target="_blank" rel="external">Descriptor HowTo Guide</a></li>
</ul>]]></content>
<summary type="html">
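<h2 id="简介"><a href="#简介" class="headerlink" title="简介"></a>Introduction</h2><h3 id="Python描述符-descriptor-解密"><a href="#Python描述符-descriptor-解密" class="headerlink" title="Python描述符(descriptor)解密"></a>Python Descriptors Demystified</h3><p>Original article: <a href="http://nbviewer.ipython.org/urls/gist.github.com/ChrisBeaumont/5758381/raw/descriptor_writeup.ipynb" target="_blank" rel="external">Chris Beaumont</a> Translation: <a href="http://www.geekfan.net/" target="_blank" rel="external">极客范 </a>- <a href="http://www.geekfan.net/author/murong/" target="_blank" rel="external">慕容老匹夫</a></p>
<p>Reposted from: <a href="http://www.geekfan.net/7862/" target="_blank" rel="external">http://www.geekfan.net/7862/</a></p>
<p>Python includes many built-in language features that keep code concise and easy to understand, among them list/set/dictionary comprehensions, properties, and decorators. For the most part, these "intermediate" language features are well documented and easy to learn.</p>
<p>There is one exception, though: descriptors. For me at least, descriptors were the core language feature that confused me the longest. There are a few reasons for this:</p>
<ol>
<li>The official documentation on descriptors is fairly hard to follow and does not include good examples of why you would need to write one (in Raymond Hettinger's defense, his Python articles and videos on other topics have helped me a great deal).</li>
<li>The syntax for writing descriptors is a little odd.</li>
<li>Custom descriptors are perhaps the least-used feature in Python, so good examples are hard to find in open-source projects.</li>
</ol>
<p>Once you understand them, however, descriptors do have their uses. This article explains what descriptors can be used for and why they deserve your attention.</p>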
</summary>
<category term="Python" scheme="https://xin053.github.io/categories/Python/"/>
<category term="Python" scheme="https://xin053.github.io/tags/Python/"/>
<category term="descriptor" scheme="https://xin053.github.io/tags/descriptor/"/>
</entry>
<entry>
<title>An Introduction to Commonly Used Functions in the os Library</title>
<link href="https://xin053.github.io/2016/11/29/os%E5%BA%93%E5%B8%B8%E7%94%A8%E6%96%B9%E6%B3%95%E4%BD%BF%E7%94%A8%E4%BB%8B%E7%BB%8D/"/>
<id>https://xin053.github.io/2016/11/29/os库常用方法使用介绍/</id>
<published>2016-11-29T01:27:27.000Z</published>
<updated>2017-05-27T13:20:48.775Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="os简介"><a href="#os简介" class="headerlink" title="os简介"></a>Introduction to os</h2><p>The <code>os</code> module provides operating-system-dependent functionality; some of its functions are only available on Unix.</p>
<h2 id="os常用方法"><a href="#os常用方法" class="headerlink" title="os常用方法"></a>Commonly used os functions</h2><h3 id="environ与getenv"><a href="#environ与getenv" class="headerlink" title="environ与getenv"></a>environ and getenv</h3><p>Read environment variables.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> os</div><div class="line">os.environ[<span class="string">"PYTHON_HOME"</span>]</div><div class="line"><span class="comment"># 'F:\\pythonVE'</span></div><div class="line">os.getenv(<span class="string">'PYTHON_HOME'</span>)</div><div class="line"><span class="comment"># 'F:\\pythonVE'</span></div></pre></td></tr></table></figure>
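<p>The two lookups differ when the variable is missing, which a short sketch can show. The variable name below is a made-up placeholder, not one used in the post: <code>os.getenv</code> returns <code>None</code> or a supplied default, while indexing <code>os.environ</code> raises <code>KeyError</code>.</p>

```python
import os

# Hypothetical variable name, removed first so the demo starts from "unset".
name = "DEMO_VARIABLE_THAT_IS_NOT_SET"
os.environ.pop(name, None)

# os.getenv returns None (or the supplied default) for a missing variable.
missing = os.getenv(name)
fallback = os.getenv(name, "fallback")

# os.environ behaves like a dict and raises KeyError instead.
try:
    os.environ[name]
    raised = False
except KeyError:
    raised = True

# Assigning through os.environ makes the value visible to os.getenv.
os.environ[name] = "42"
print(missing, fallback, raised, os.getenv(name))
```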
<a id="more"></a>
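<h3 id="用户与用户组"><a href="#用户与用户组" class="headerlink" title="用户与用户组"></a>Users and groups</h3><p>Most functions for querying the user and group of the current process, such as <code>os.getuid()</code> and <code>os.getegid()</code>, are Unix-only; see <a href="https://docs.python.org/3/library/os.html#os.getegid" target="_blank" rel="external"><code>os</code></a> for details.</p>
<p>One that is also available on Windows:</p>
<p>Get the name of the currently logged-in user:</p>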
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div></pre></td><td class="code"><pre><div class="line">os.getlogin() </div><div class="line"><span class="comment"># 'zzx'</span></div></pre></td></tr></table></figure>
<h3 id="chdir与getcwd"><a href="#chdir与getcwd" class="headerlink" title="chdir与getcwd"></a>chdir and getcwd</h3><p>Change and get the current working directory.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line">os.getcwd()</div><div class="line"><span class="comment"># 'F:\\pythonVE\\Scripts'</span></div><div class="line">os.chdir(<span class="string">'..'</span>)</div><div class="line">os.getcwd()</div><div class="line"><span class="comment"># 'F:\\pythonVE'</span></div></pre></td></tr></table></figure>
<h3 id="listdir与scandir"><a href="#listdir与scandir" class="headerlink" title="listdir与scandir"></a>listdir and scandir</h3><p>List the entries of a directory; if the <code>path</code> argument is omitted, the current directory is used.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div></pre></td><td class="code"><pre><div class="line">os.listdir()</div><div class="line"><span class="comment"># ['Include', 'Lib', 'pip-selfcheck.json', 'pyvenv.cfg', 'Scripts', 'share']</span></div><div class="line">os.listdir(<span class="string">'.'</span>)</div><div class="line"><span class="comment"># ['Include', 'Lib', 'pip-selfcheck.json', 'pyvenv.cfg', 'Scripts', 'share']</span></div></pre></td></tr></table></figure>
<p><code>scandir()</code> serves the same purpose as <code>listdir()</code>, but it returns an iterator of <code>DirEntry</code> objects rather than a list of names.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div></pre></td><td class="code"><pre><div class="line">a = os.scandir()</div><div class="line">a</div><div class="line"><span class="comment"># <nt.ScandirIterator at 0x187d4cc5440></span></div><div class="line">next(a)</div><div class="line"><span class="comment"># <DirEntry 'Include'></span></div><div class="line">next(a)</div><div class="line"><span class="comment"># <DirEntry 'Lib'></span></div></pre></td></tr></table></figure>
<p>A <code>DirEntry</code> object exposes file-related attributes and methods; for details see <a href="https://docs.python.org/3/library/os.html#os.DirEntry" target="_blank" rel="external"><code>os.DirEntry</code></a></p>
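<p>As a rough illustration of what a <code>DirEntry</code> offers, the sketch below builds a throwaway directory (the file and folder names are invented for the example) and reads each entry's name, type and size. It assumes Python 3.6 or later, where <code>os.scandir()</code> can be used as a context manager.</p>

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Create one sub-directory and one small file to scan.
    os.mkdir(os.path.join(root, "subdir"))
    with open(os.path.join(root, "notes.txt"), "w") as f:
        f.write("hello")

    # The with-block closes the iterator and frees its resources.
    with os.scandir(root) as entries:
        listing = sorted(
            (
                entry.name,
                entry.is_dir(),
                entry.stat().st_size if entry.is_file() else None,
            )
            for entry in entries
        )

print(listing)
```

<p>Methods such as <code>is_dir()</code> and <code>is_file()</code> can often answer from information gathered during the scan, which is the main reason to prefer <code>scandir()</code> over <code>listdir()</code> followed by separate <code>os.stat()</code> calls.</p>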
<h3 id="文件系统相关"><a href="#文件系统相关" class="headerlink" title="文件系统相关"></a>File system operations</h3><ul>
<li><code>mkdir()</code> creates a directory</li>
<li><code>remove()</code> deletes a file</li>
<li><code>rmdir()</code> deletes an empty directory</li>
<li><code>rename()</code> renames a file or directory</li>
</ul>
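<p>The four functions listed above can be tried together in a scratch directory. This is only a sketch; the directory and file names are placeholders chosen for the example.</p>

```python
import os
import tempfile

base = tempfile.mkdtemp()             # scratch directory for the demo
folder = os.path.join(base, "demo")
old_name = os.path.join(folder, "a.txt")
new_name = os.path.join(folder, "b.txt")

os.mkdir(folder)                      # create a directory
with open(old_name, "w") as f:
    f.write("data")
os.rename(old_name, new_name)         # rename a file
after_rename = os.listdir(folder)

os.remove(new_name)                   # delete a file
os.rmdir(folder)                      # delete a directory (it must be empty)
os.rmdir(base)
still_exists = os.path.exists(base)
print(after_rename, still_exists)
```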
<h3 id="stat"><a href="#stat" class="headerlink" title="stat"></a>stat</h3><p>Get information about a file.</p>
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div></pre></td><td class="code"><pre><div class="line"><span class="meta">>>> </span><span class="keyword">import</span> os</div><div class="line"><span class="meta">>>> </span>statinfo = os.stat(<span class="string">'somefile.txt'</span>)</div><div class="line"><span class="meta">>>> </span>statinfo</div><div class="line">os.stat_result(st_mode=<span class="number">33188</span>, st_ino=<span class="number">7876932</span>, st_dev=<span class="number">234881026</span>,</div><div class="line">st_nlink=<span class="number">1</span>, st_uid=<span class="number">501</span>, st_gid=<span class="number">501</span>, st_size=<span class="number">264</span>, st_atime=<span class="number">1297230295</span>,</div><div class="line">st_mtime=<span class="number">1297230027</span>, st_ctime=<span class="number">1297230027</span>)</div><div class="line"><span class="meta">>>> </span>statinfo.st_size</div><div class="line"><span class="number">264</span></div></pre></td></tr></table></figure>
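<p>The raw fields of the result are easier to use with a little help from the standard <code>stat</code> module. The sketch below writes a temporary file of 264 bytes (matching the size shown above, but otherwise unrelated to <code>somefile.txt</code>) and inspects it.</p>

```python
import os
import stat
import tempfile

# Create a temporary file with a known size.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 264)
    path = f.name

info = os.stat(path)
size = info.st_size                      # size in bytes
is_regular = stat.S_ISREG(info.st_mode)  # True for a regular file
is_directory = stat.S_ISDIR(info.st_mode)
os.remove(path)

print(size, is_regular, is_directory)
```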
<h3 id="startfile"><a href="#startfile" class="headerlink" title="startfile"></a>startfile</h3><p>Open the given file with the system's default application (available on Windows only).</p>
<h3 id="分隔符-换行符相关"><a href="#分隔符-换行符相关" class="headerlink" title="分隔符 换行符相关"></a>Separators and line endings</h3><figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div><div class="line">6</div><div class="line">7</div><div class="line">8</div><div class="line">9</div><div class="line">10</div><div class="line">11</div><div class="line">12</div><div class="line">13</div><div class="line">14</div><div class="line">15</div><div class="line">16</div></pre></td><td class="code"><pre><div class="line"><span class="meta">>>> </span>os.curdir</div><div class="line"><span class="string">'.'</span></div><div class="line"><span class="meta">>>> </span>os.pardir</div><div class="line"><span class="string">'..'</span></div><div class="line"><span class="meta">>>> </span>os.sep</div><div class="line"><span class="string">'\\'</span></div><div class="line"><span class="meta">>>> </span>os.altsep</div><div class="line"><span class="string">'/'</span></div><div class="line"><span class="meta">>>> </span>os.extsep</div><div class="line"><span class="string">'.'</span></div><div class="line"><span class="meta">>>> </span>os.pathsep</div><div class="line"><span class="string">';'</span></div><div class="line"><span class="meta">>>> </span>os.defpath</div><div class="line"><span class="string">'.;C:\\bin'</span></div><div class="line"><span class="meta">>>> </span>os.linesep</div><div class="line"><span class="string">'\r\n'</span></div></pre></td></tr></table></figure>
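<p>The values above are from Windows and differ on other systems, so code that builds paths usually avoids hard-coding them. A minimal sketch, with path components borrowed from the earlier examples purely as sample strings:</p>

```python
import os

# os.path.join inserts the platform's separator (os.sep) between components.
path = os.path.join("pythonVE", "Scripts", "activate")
parts = path.split(os.sep)

# os.pathsep separates entries in search-path style variables such as PATH.
search_path = os.pathsep.join(["first", "second"])
entries = search_path.split(os.pathsep)

print(path, parts, entries)
```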
<h3 id="参考文档"><a href="#参考文档" class="headerlink" title="参考文档"></a>References</h3><ul>
<li><a href="https://docs.python.org/3/library/os.html" target="_blank" rel="external">Official <code>os</code> documentation</a></li>
</ul>]]></content>
<summary type="html">
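<h2 id="os简介"><a href="#os简介" class="headerlink" title="os简介"></a>Introduction to os</h2><p>The <code>os</code> module provides operating-system-dependent functionality; some of its functions are only available on Unix.</p>
<h2 id="os常用方法"><a href="#os常用方法" class="headerlink" title="os常用方法"></a>Commonly used os functions</h2><h3 id="environ与getenv"><a href="#environ与getenv" class="headerlink" title="environ与getenv"></a>environ and getenv</h3><p>Read environment variables.</p>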
<figure class="highlight python"><table><tr><td class="gutter"><pre><div class="line">1</div><div class="line">2</div><div class="line">3</div><div class="line">4</div><div class="line">5</div></pre></td><td class="code"><pre><div class="line"><span class="keyword">import</span> os</div><div class="line">os.environ[<span class="string">"PYTHON_HOME"</span>]</div><div class="line"><span class="comment"># 'F:\\pythonVE'</span></div><div class="line">os.getenv(<span class="string">'PYTHON_HOME'</span>)</div><div class="line"><span class="comment"># 'F:\\pythonVE'</span></div></pre></td></tr></table></figure>
</summary>
<category term="Python模块学习" scheme="https://xin053.github.io/categories/Python%E6%A8%A1%E5%9D%97%E5%AD%A6%E4%B9%A0/"/>
<category term="Python" scheme="https://xin053.github.io/tags/Python/"/>
<category term="os" scheme="https://xin053.github.io/tags/os/"/>
</entry>
<entry>
<title>Common VS Code Keyboard Shortcuts</title>
<link href="https://xin053.github.io/2016/11/15/VS%20Code%E5%B8%B8%E7%94%A8%E5%BF%AB%E6%8D%B7%E9%94%AE/"/>
<id>https://xin053.github.io/2016/11/15/VS Code常用快捷键/</id>
<published>2016-11-15T02:17:52.000Z</published>
<updated>2017-05-27T13:20:48.771Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="VS-Code常用快捷键"><a href="#VS-Code常用快捷键" class="headerlink" title="VS Code常用快捷键"></a>Common VS Code Keyboard Shortcuts</h2><p><img src="https://code.visualstudio.com/home/home-screenshot-win-lg.png" alt=""></p>
<a id="more"></a>
<h3 id="F1-打开命令模式"><a href="#F1-打开命令模式" class="headerlink" title="F1 打开命令模式"></a>F1 Open the Command Palette</h3><p><img src="http://i.imgur.com/zvthdzV.png" alt=""></p>
<h3 id="Ctrl-X-剪切当前行或选中内容"><a href="#Ctrl-X-剪切当前行或选中内容" class="headerlink" title="Ctrl+X 剪切当前行或选中内容"></a>Ctrl+X Cut the current line or selection</h3><h3 id="Ctrl-C-复制当前行或选中内容"><a href="#Ctrl-C-复制当前行或选中内容" class="headerlink" title="Ctrl+C 复制当前行或选中内容"></a>Ctrl+C Copy the current line or selection</h3><h3 id="Alt-↓-↑-上下移动当前行"><a href="#Alt-↓-↑-上下移动当前行" class="headerlink" title="Alt + ↓ / ↑ 上下移动当前行"></a>Alt + ↓ / ↑ Move the current line down / up</h3><p><img src="http://i.imgur.com/mIvWx1g.gif" alt=""></p>
<h3 id="Shift-Alt-↓-↑-复制当前行并上下移动"><a href="#Shift-Alt-↓-↑-复制当前行并上下移动" class="headerlink" title="Shift+Alt + ↓ / ↑ 复制当前行并上下移动"></a>Shift+Alt + ↓ / ↑ Copy the current line down / up</h3><p><img src="http://i.imgur.com/31nhx1W.gif" alt=""></p>
<h3 id="Ctrl-Enter-在下一行插入光标"><a href="#Ctrl-Enter-在下一行插入光标" class="headerlink" title="Ctrl+Enter 在下一行插入光标"></a>Ctrl+Enter Insert a line below and move the cursor there</h3><p><img src="http://i.imgur.com/y1ADLF0.gif" alt=""></p>
<h3 id="Ctrl-Shift-Enter-在上一行插入光标"><a href="#Ctrl-Shift-Enter-在上一行插入光标" class="headerlink" title="Ctrl+Shift+Enter 在上一行插入光标"></a>Ctrl+Shift+Enter Insert a line above and move the cursor there</h3><p><img src="http://i.imgur.com/G47gJ6m.gif" alt=""></p>
<h3 id="Home-跳到当前行的开始"><a href="#Home-跳到当前行的开始" class="headerlink" title="Home 跳到当前行的开始"></a>Home Jump to the beginning of the current line</h3><h3 id="End-跳到当前行的末尾"><a href="#End-跳到当前行的末尾" class="headerlink" title="End 跳到当前行的末尾"></a>End Jump to the end of the current line</h3><h3 id="Ctrl-Home-跳到当前文件的开始"><a href="#Ctrl-Home-跳到当前文件的开始" class="headerlink" title="Ctrl+Home 跳到当前文件的开始"></a>Ctrl+Home Jump to the beginning of the current file</h3><h3 id="Ctrl-End-跳到当前文件的末尾"><a href="#Ctrl-End-跳到当前文件的末尾" class="headerlink" title="Ctrl+End 跳到当前文件的末尾"></a>Ctrl+End Jump to the end of the current file</h3><h3 id="Ctrl-↑-↓-上下滑动滚动条"><a href="#Ctrl-↑-↓-上下滑动滚动条" class="headerlink" title="Ctrl+↑ / ↓ 上下滑动滚动条"></a>Ctrl+↑ / ↓ Scroll the view up / down</h3><h3 id="Ctrl-G-行跳转"><a href="#Ctrl-G-行跳转" class="headerlink" title="Ctrl+G 行跳转"></a>Ctrl+G Go to line</h3><p><img src="http://i.imgur.com/6Qe94Rm.gif" alt=""></p>
<h3 id="Ctrl-P-文件跳转"><a href="#Ctrl-P-文件跳转" class="headerlink" title="Ctrl+P 文件跳转"></a>Ctrl+P Go to file</h3><p><img src="http://i.imgur.com/2myzjV9.gif" alt=""></p>
<h3 id="Ctrl-Shift-O-符号跳转"><a href="#Ctrl-Shift-O-符号跳转" class="headerlink" title="Ctrl+Shift+O 符号跳转"></a>Ctrl+Shift+O Go to symbol</h3><p><img src="http://i.imgur.com/WLXn40n.gif" alt=""></p>
<h3 id="Alt-←-→-前进或后退-跟鼠标上的宏键功能一样"><a href="#Alt-←-→-前进或后退-跟鼠标上的宏键功能一样" class="headerlink" title="Alt+ ← / → 前进或后退,跟鼠标上的宏键功能一样"></a>Alt+ ← / → Go back or forward, same as the side buttons on a mouse</h3><h3 id="Ctrl-M-通过tab切换焦点"><a href="#Ctrl-M-通过tab切换焦点" class="headerlink" title="Ctrl+M 通过tab切换焦点"></a>Ctrl+M Toggle using Tab to move focus</h3><p><img src="http://i.imgur.com/vCgFttG.gif" alt=""></p>
<h3 id="Alt-Click-插入光标"><a href="#Alt-Click-插入光标" class="headerlink" title="Alt+Click 插入光标"></a>Alt+Click Insert a cursor</h3><p><img src="http://i.imgur.com/eLq7XBG.gif" alt=""></p>
<h3 id="Ctrl-U-撤销上次光标操作"><a href="#Ctrl-U-撤销上次光标操作" class="headerlink" title="Ctrl+U 撤销上次光标操作"></a>Ctrl+U Undo the last cursor operation</h3><h3 id="Ctrl-F2-在所有选中单词后面添加光标"><a href="#Ctrl-F2-在所有选中单词后面添加光标" class="headerlink" title="Ctrl+F2 在所有选中单词后面添加光标"></a>Ctrl+F2 Add a cursor at every occurrence of the selected word</h3><p><img src="http://i.imgur.com/ZOutnSo.gif" alt=""></p>
<h3 id="Shift-Alt-→-←-控制选中范围"><a href="#Shift-Alt-→-←-控制选中范围" class="headerlink" title="Shift+Alt+ → / ← 控制选中范围"></a>Shift+Alt+ → / ← Expand / shrink the selection</h3><p><img src="http://i.imgur.com/BANxAgX.gif" alt=""></p>
<h3 id="代码提示"><a href="#代码提示" class="headerlink" title="代码提示"></a>Code completion</h3><p>The default shortcut is <code>Ctrl + space</code>, but it conflicts with the system shortcut for switching input methods. I was also used to <code>Alt + /</code> for code completion from Java development, so I changed the code-completion shortcut to <code>Alt + /</code>.</p>
<p><img src="http://i.imgur.com/AlvEqJP.png" alt=""></p>
<h3 id="Trigger-parameter-hints"><a href="#Trigger-parameter-hints" class="headerlink" title="Trigger parameter hints"></a>Trigger parameter hints</h3><p>The default shortcut is <code>Ctrl+Shift+Space</code>; because of the same conflict, I changed it to <code>alt+shift+/</code>.</p>
<p><img src="http://i.imgur.com/Fs7CvFL.gif" alt=""></p>
<h3 id="F12-跳转到定义处-与Ctrl-左键效果一样"><a href="#F12-跳转到定义处-与Ctrl-左键效果一样" class="headerlink" title="F12 跳转到定义处 与Ctrl + 左键效果一样"></a>F12 Go to definition, same as Ctrl + left click</h3><h3 id="Alt-F12"><a href="#Alt-F12" class="headerlink" title="Alt + F12"></a>Alt + F12 Peek definition</h3><p><img src="http://i.imgur.com/3YDdhPI.gif" alt=""></p>
<h3 id="Ctrl-Alt-左键-在侧边打开定义"><a href="#Ctrl-Alt-左键-在侧边打开定义" class="headerlink" title="Ctrl + Alt + 左键 在侧边打开定义"></a>Ctrl + Alt + left click Open the definition to the side</h3><p>Same effect as <code>Ctrl+K F12</code>.</p>
<p><img src="http://i.imgur.com/CeE7wOU.gif" alt=""></p>
<h3 id="Shift-F12-Show-References"><a href="#Shift-F12-Show-References" class="headerlink" title="Shift+F12 Show References"></a>Shift+F12 Show References</h3><p><img src="http://i.imgur.com/NiUKjAM.gif" alt=""></p>
<h3 id="F11-全屏"><a href="#F11-全屏" class="headerlink" title="F11 全屏"></a>F11 Toggle full screen</h3><p><strong>These are the commonly used VS Code shortcuts, not including shortcuts provided by extensions. For other shortcuts, see the references.</strong></p>
<h2 id="参考文档"><a href="#参考文档" class="headerlink" title="参考文档"></a>References</h2><ul>
<li><a href="https://go.microsoft.com/fwlink/?linkid=832145" target="_blank" rel="external">Official keyboard shortcuts reference</a></li>
</ul>]]></content>
<summary type="html">
<h2 id="VS-Code常用快捷键"><a href="#VS-Code常用快捷键" class="headerlink" title="VS Code常用快捷键"></a>Common VS Code Keyboard Shortcuts</h2><p><img src="https://code.visualstudio.com/home/home-screenshot-win-lg.png" alt=""></p>
</summary>
<category term="WeNeedToKnow" scheme="https://xin053.github.io/categories/WeNeedToKnow/"/>
<category term="VS Code" scheme="https://xin053.github.io/tags/VS-Code/"/>
</entry>
<entry>
<title>A Detailed Guide to BeautifulSoup, an HTML and XML Parsing Library</title>
<link href="https://xin053.github.io/2016/11/14/BeautifulSoup%20html%E4%B8%8Exml%E8%A7%A3%E6%9E%90%E5%BA%93%E4%BD%BF%E7%94%A8%E8%AF%A6%E8%A7%A3/"/>
<id>https://xin053.github.io/2016/11/14/BeautifulSoup html与xml解析库使用详解/</id>
<published>2016-11-14T07:38:10.000Z</published>
<updated>2017-05-27T13:20:48.767Z</updated>
<content type="html"><![CDATA[<link rel="stylesheet" type="text/css" href="/assets/css/DPlayer.min.css"><script src="/assets/js/DPlayer.min.js"> </script><script src="/assets/js/APlayer.min.js"> </script><h2 id="BeautifulSoup简介"><a href="#BeautifulSoup简介" class="headerlink" title="BeautifulSoup简介"></a>Introduction to BeautifulSoup</h2><p>BeautifulSoup 3 only supports Python 2 and is no longer developed. BeautifulSoup 4 supports both Python 2 and Python 3. The usage described below follows the documentation for version 4.4.</p>
<p><img src="http://www.crummy.com/software/BeautifulSoup/bs4/doc/_images/6.1.jpg" alt=""></p>
<a id="more"></a>
<h2 id="BeautifulSoup使用"><a href="#BeautifulSoup使用" class="headerlink" title="BeautifulSoup使用"></a>Using BeautifulSoup</h2><h3 id="解析器比较"><a href="#解析器比较" class="headerlink" title="解析器比较"></a>Parser comparison</h3><table>
<thead>
<tr>
<th>Parser</th>
<th>Usage</th>
<th>Advantages</th>
<th>Disadvantages</th>
</tr>
</thead>
<tbody>
<tr>
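<td>Python standard library</td>
<td><code>BeautifulSoup(markup,"html.parser")</code></td>
<td>Built into Python's standard library; moderate speed; tolerant of malformed documents</td>
<td>Poor tolerance of malformed documents in versions before Python 2.7.3 or 3.2.2</td>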
</tr>
<tr>
<td>lxml HTML parser</td>
<td><code>BeautifulSoup(markup,"lxml")</code></td>
<td>Fast; tolerant of malformed documents</td>
<td>Requires an external C library</td>
</tr>
<tr>
<td>lxml XML parser</td>
<td><code>BeautifulSoup(markup,["lxml-xml"])</code> or <code>BeautifulSoup(markup,"xml")</code></td>
<td>Fast; the only parser that supports XML</td>
<td>Requires an external C library</td>
</tr>
<tr>
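<td>html5lib</td>
<td><code>BeautifulSoup(markup,"html5lib")</code></td>
<td>Best tolerance of malformed documents; parses documents the way a browser does; produces HTML5-format documents</td>
<td>Slow; does not depend on external C extensions</td>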
</tr>
</tbody>
</table>