<?xml version="1.0" encoding="utf-8"?>
<search>
<entry>
<title>[.NET][C#][SOLID] - DI & IoC (Dependency Injection & Inversion of Control) Explained in Depth</title>
<url>/posts/3588979794/</url>
<content><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>前言</h2><p>IoC (Inversion of Control) 控制反轉,與OOP SOLID原則中的其中一種設計原則有關,也就是其中的DIP(Dependency Inversion Principle),是OOP一個非常重要的程式設計思想,對於軟體開發來說十分重要,下面我將<strong>十分詳細的介紹何為DIP、IoC以及DI、為何要使用它們以及如何實作</strong>,相信大家閱讀完會對這個重要的思想了解的更加透徹。</p>
<span id="more"></span>
<h2 id="定義"><a href="#定義" class="headerlink" title="定義"></a>定義</h2><p>DIP以簡單的一句話說明就是</p>
<blockquote>
<p><strong>DIP - Dependency Inversion Principle</strong></p>
<ul>
<li>A principle, a way of thinking</li>
<li>High-level modules should not depend on low-level modules, and low-level modules should not depend on high-level modules either<br><strong>both should depend on abstractions</strong></li>
</ul>
</blockquote>
<blockquote>
<p><strong>IoC - Inversion of Control</strong></p>
<ul>
<li>A way of thinking</li>
<li>Hand the <strong>control</strong> over an object to a <strong>third-party container (IoC Container)</strong></li>
</ul>
</blockquote>
<blockquote>
<p><strong>DI - Dependency Injection</strong></p>
<ul>
<li>A design pattern</li>
<li>Provides dependencies to the modules that need them via <strong>injection</strong>; it is the concrete realization of IoC and DIP</li>
<li>The depended-on object is injected into the object that passively receives it</li>
</ul>
</blockquote>
<p>The definition of DIP is essential, so please keep it in mind.<br>In other words, code should depend on abstractions rather than concrete implementations. This decouples components from one another and makes them easier to maintain.<br>DI is the implementation technique born to realize DIP and IoC, so when you apply these object-oriented techniques, <strong>always be clear about what you are doing and why</strong>.</p>
<h2 id="好處-為什麼要使用?"><a href="#好處-為什麼要使用?" class="headerlink" title="好處/為什麼要使用?"></a>好處/為什麼要使用?</h2><p>在針對各個名詞解釋與實作之前,我想先讓各位了解DIP以及IoC帶來的好處。</p>
<blockquote>
<ol>
<li>Maintainability</li>
<li>Loose coupling</li>
</ol>
</blockquote>
<ul>
<li><strong>Maintainability</strong> refers to <strong>the time and effort it takes to modify or update a program later on</strong>; if changes are time-consuming and laborious, we say its maintainability is low.</li>
<li>Coupling describes how dependent and interrelated objects are. If class A news up B, and B news up C, they depend on one another directly; classes calling each other this way become entangled, and that is coupling. The tighter the relationships between objects, the higher the coupling. In highly coupled code, any change can easily set off a <strong>chain reaction in which touching one part moves everything else</strong>, so a large codebase should aim for a <strong>low-coupling, high-cohesion</strong> design.</li>
</ul>
<p>DIP, IoC and DI can <strong>decouple our components</strong> and improve maintainability.<br>Essentially, "maintainability" and "loose coupling" are the reasons we learn DIP, IoC and DI.</p>
<p>Below I introduce DIP, IoC and DI one by one, and close with a practical application that combines them all.</p>
<h2 id="DIP"><a href="#DIP" class="headerlink" title="DIP"></a>DIP</h2><blockquote>
<p><strong>DIP - Dependency Inversion Principle</strong></p>
<ul>
<li>A principle, a way of thinking</li>
<li>High-level modules should not depend on low-level modules, and low-level modules should not depend on high-level modules either<br><strong>both should depend on abstractions</strong></li>
</ul>
</blockquote>
<p>What does this mean? Let's look at the example below:</p>
<h3 id="簡單範例"><a href="#簡單範例" class="headerlink" title="簡單範例"></a>簡單範例</h3><figure class="highlight c#"><table><tr><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">Database</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Connect</span>()</span> { <span class="comment">/* database connect logic */</span> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Disconnect</span>()</span> { <span class="comment">/* database disconnect logic */</span> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">SaveData</span>(<span class="params"><span class="built_in">string</span> data</span>)</span> { <span class="comment">/* database save data logic */</span> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">DataAccess</span></span><br><span class="line">{</span><br><span class="line"> <span class="keyword">private</span> Database _database = <span class="keyword">new</span> Database();</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">SaveData</span>(<span class="params"><span class="built_in">string</span> data</span>)</span></span><br><span class="line"> {</span><br><span class="line"> _database.Connect();</span><br><span class="line"> _database.SaveData(data);</span><br><span class="line"> _database.Disconnect();</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The code above violates DIP: DataAccess <strong>directly depends</strong> on the 'Database' class. If Database changes in any way, DataAccess has to change with it, and so does every other class that uses Database. The 'DataAccess' class should therefore depend on an abstract interface rather than on a concrete implementation.</p>
<hr>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">interface</span> <span class="title">IDatabase</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">Connect</span>()</span>;</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">Disconnect</span>()</span>;</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">SaveData</span>(<span class="params"><span class="built_in">string</span> data</span>)</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">SqlServerDatabase</span> : <span class="title">IDatabase</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Connect</span>()</span> { <span class="comment">/* SQL Server database connect logic */</span> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Disconnect</span>()</span> { <span class="comment">/* SQL Server database disconnect logic */</span> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">SaveData</span>(<span class="params"><span class="built_in">string</span> data</span>)</span> { <span class="comment">/* SQL Server database save data logic */</span> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">OracleDatabase</span> : <span class="title">IDatabase</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Connect</span>()</span> { <span class="comment">/* Oracle database connect logic */</span> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Disconnect</span>()</span> { <span class="comment">/* Oracle database disconnect logic */</span> }</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">SaveData</span>(<span class="params"><span class="built_in">string</span> data</span>)</span> { <span class="comment">/* Oracle database save data logic */</span> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">DataAccess</span></span><br><span class="line">{</span><br><span class="line"> <span class="keyword">private</span> IDatabase _database;</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="title">DataAccess</span>()</span></span><br><span class="line"> {</span><br><span class="line"> _database = <span class="keyword">new</span> SqlServerDatabase();</span><br><span class="line"> <span class="comment">//_database = new OracleDatabase(); 在這裡抽換</span></span><br><span 
class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">SaveData</span>(<span class="params"><span class="built_in">string</span> data</span>)</span></span><br><span class="line"> {</span><br><span class="line"> _database.Connect();</span><br><span class="line"> _database.SaveData(data);</span><br><span class="line"> _database.Disconnect();</span><br><span class="line"> }</span><br><span class="line">} </span><br></pre></td></tr></table></figure>
<p>As the example shows, we define an IDatabase that specifies the actions every database should support, let the different concrete databases implement it, and have the program itself (DataAccess) depend on and use only IDatabase. If we later migrate from a SQL Server DB to an Oracle DB, we <strong>only swap the concrete implementation behind IDatabase</strong> (the actual memory the _database reference points to); not a single line inside DataAccess that uses IDatabase has to change.</p>
<p>Notice, however, that although dependency inversion lets us depend on an abstraction, <strong>the program (DataAccess) still has to new up the instance itself</strong>. In other words, the program (the caller) still holds control over the flow around its dependencies, and that is where <strong>Inversion of Control</strong> comes in.</p>
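<p>A common halfway step before bringing in a container is to push the new out of the class and accept the dependency through the constructor. Below is a minimal sketch of that idea (my own illustration, not part of the original example):</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line">public class DataAccess</span><br><span class="line">{</span><br><span class="line">    private readonly IDatabase _database;</span><br><span class="line"></span><br><span class="line">    // The caller now decides which IDatabase implementation to pass in;</span><br><span class="line">    // DataAccess itself no longer news up a concrete class.</span><br><span class="line">    public DataAccess(IDatabase database)</span><br><span class="line">    {</span><br><span class="line">        _database = database;</span><br><span class="line">    }</span><br><span class="line"></span><br><span class="line">    public void SaveData(string data)</span><br><span class="line">    {</span><br><span class="line">        _database.Connect();</span><br><span class="line">        _database.SaveData(data);</span><br><span class="line">        _database.Disconnect();</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>The open question (who calls new, and where) is exactly what Inversion of Control answers next.</p>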
<h2 id="IoC"><a href="#IoC" class="headerlink" title="IoC"></a>IoC</h2><blockquote>
<ul>
<li>Hand the <strong>control</strong> over an object to a <strong>third-party container (IoC Container)</strong></li>
</ul>
</blockquote>
<p>IoC is a design principle, an idea: it suggests inverting various kinds of control in object-oriented design in order to decouple classes from one another. Here, <strong>"control" means all the work beyond a class's own responsibility</strong>, such as the overall flow of the application, the creation of dependencies, and so on.</p>
<blockquote>
<ul>
<li>In other words, apart from <strong>its own responsibility, a class should not take on much other work (SRP)</strong></li>
<li>So the <strong>control over objects (creation, swapping the concrete implementation, and so on)</strong> should be handed to a <strong>third-party container</strong> (a framework or library).</li>
<li>Acquiring resources changes from "active" to "passive"</li>
<li><strong>The "control flow" by which the application obtains its dependencies turns from "active" into "passive"; that is "Inversion of Control"</strong></li>
</ul>
</blockquote>
<p>The two figures below illustrate the dependency relationships after applying IoC</p>
<ul>
<li><p>Before IoC, our application depends directly on concrete classes<br><img data-src="/images/posts/DI-IoC/IoC1.png"></img></p>
</li>
<li><p>After IoC, the IoC Container injects the concrete dependency into the program, which turns from actively depending into passively receiving<br><img data-src="/images/posts/DI-IoC/IoC2.png"></img></p>
</li>
</ul>
<p>The Hollywood Principle also captures inversion of control nicely:</p>
<blockquote>
<p>Don’t call me, I’ll call you.</p>
</blockquote>
<h3 id="IoC-Container"><a href="#IoC-Container" class="headerlink" title="IoC Container"></a>IoC Container</h3><p>廣義上來說, IoC 容器,就是有進行「依賴注入」的地方,<br>你隨便寫一個類別,透過它將所需元件注入給高階模組,便可說是容器。<br>但現在所說的容器通常泛指那些<strong>強大的IoC框架所提供的容器</strong>。</p>
<p>Think of the IoC container as a store of the <strong>dependency implementations the user has registered</strong>. From that registration information, the IoC Container knows which instance the program needs and hands it over, so the high-level module never has to be modified.<br>At <strong>runtime</strong>, when the program needs a dependency instance, the IoC Container injects it, using <strong>Reflection</strong>, that is, reading the program's internal information from its intermediate compiled code.</p>
<p>The two figures below show how IoC uses an IoC Container to achieve inversion of control</p>
<ul>
<li><p>Without an IoC framework, the high-level module actively creates the low-level modules (resources) it needs<br><img data-src="/images/posts/DI-IoC/IoC4.png"></img></p>
</li>
<li><p>With an IoC framework, the required modules are <strong>"registered"</strong> into the IoC Container, and the container <strong>actively injects the concrete dependencies</strong> into the high-level module<br><img data-src="/images/posts/DI-IoC/IoC3.png"></img></p>
</li>
</ul>
<h3 id="簡單範例-1"><a href="#簡單範例-1" class="headerlink" title="簡單範例"></a>簡單範例</h3><p>這邊提供的簡單範例中,IoC Container用簡單的方式實作,實際上這些工作會交給第三方套件或框架完成,這邊使用簡單的方式實作給大家理解</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">interface</span> <span class="title">ILogger</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">Log</span>(<span class="params"><span class="built_in">string</span> message</span>)</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">ConsoleLogger</span> : <span class="title">ILogger</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Log</span>(<span class="params"><span class="built_in">string</span> message</span>)</span></span><br><span class="line"> {</span><br><span class="line"> Console.WriteLine(<span class="string">$"Log: <span class="subst">{message}</span>"</span>);</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">UserService</span></span><br><span class="line">{</span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">readonly</span> ILogger _logger;</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="title">UserService</span>(<span class="params">ILogger logger</span>)</span></span><br><span class="line"> {</span><br><span class="line"> _logger = logger;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">CreateUser</span>(<span class="params"><span class="built_in">string</span> username, <span class="built_in">string</span> password</span>)</span></span><br><span class="line"> {</span><br><span class="line"> _logger.Log(<span class="string">$"Creating user <span class="subst">{username}</span>"</span>);</span><br><span class="line"> <span class="comment">// Implementation to create a user</span></span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title">Program</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">static</span> <span class="keyword">void</span> <span class="title">Main</span>(<span class="params"><span class="built_in">string</span>[] args</span>)</span></span><br><span class="line"> {</span><br><span class="line"> ILogger logger = <span class="keyword">new</span> ConsoleLogger();</span><br><span class="line"> UserService userService = <span class="keyword">new</span> UserService(logger);</span><br><span class="line"> userService.CreateUser(<span class="string">"johndoe"</span>, <span class="string">"secret"</span>);</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>Here, <strong>Program is our IoC Container</strong>: UserService depends on the ILogger abstraction, and based on the information registered in Program, <strong>Program actively injects the concrete dependency (the ConsoleLogger object) into UserService's constructor</strong>. This is constructor injection, which we will revisit in the DI section.<br><strong>If a new logger is needed later, we just create a class that implements ILogger and let Program inject it into UserService; not a single line inside UserService has to change.</strong></p>
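<p>As a sketch of what such a new logger might look like (FileLogger and the file name app.log are illustrative assumptions, not part of the original example):</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line">using System;</span><br><span class="line">using System.IO;</span><br><span class="line"></span><br><span class="line">public class FileLogger : ILogger</span><br><span class="line">{</span><br><span class="line">    public void Log(string message)</span><br><span class="line">    {</span><br><span class="line">        // Append to a file instead of writing to the console;</span><br><span class="line">        // UserService never notices the difference.</span><br><span class="line">        File.AppendAllText("app.log", $"Log: {message}{Environment.NewLine}");</span><br><span class="line">    }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<p>In Main (our hand-rolled container), only the registration line changes: ILogger logger = new FileLogger();</p>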
<h3 id="IoC與DIP的差別"><a href="#IoC與DIP的差別" class="headerlink" title="IoC與DIP的差別"></a>IoC與DIP的差別</h3><p>控制反轉(IoC)與依賴倒轉(DIP)兩者不相等!</p>
<blockquote>
<p>Dependency inversion inverts the "dependency relationship"<br>Inversion of control inverts the "control flow" by which the program obtains its dependencies</p>
</blockquote>
<h2 id="DI"><a href="#DI" class="headerlink" title="DI"></a>DI</h2><blockquote>
<p>Provide dependencies to the modules that need them via <strong>injection</strong>; DI is the concrete realization of IoC and DIP<br>The depended-on object is injected into the object that passively receives it</p>
</blockquote>
<blockquote>
<p><strong>Neither the program nor the developer has to care how an object is created, kept alive, or destroyed</strong><br>In the .NET DI framework there are three lifetimes: Transient, Scoped, and Singleton, covered later in the implementation section.</p>
</blockquote>
<p>The thinking behind DI is roughly:</p>
<blockquote>
<ol>
<li>To uphold DIP, a class should depend only on abstractions</li>
<li>The concrete implementation therefore has to be "injected" into that class somehow</li>
<li>Following IoC, this is best done by a third-party container</li>
</ol>
</blockquote>
<p>DI comes in three main forms:</p>
<blockquote>
<ol>
<li>Constructor Injection</li>
<li>Setter Injection</li>
<li>Interface Injection</li>
</ol>
</blockquote>
<p>Here is a simple example of each</p>
<h3 id="簡單範例-2"><a href="#簡單範例-2" class="headerlink" title="簡單範例"></a>簡單範例</h3><ol>
<li>Constructor Injection<br>The most common form of injection: the IoC Container injects the instance into the caller's constructor, so when the caller is newed (created), the relevant instances are injected into its constructor automatically.<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">interface</span> <span class="title">ILogger</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">Log</span>(<span class="params"><span class="built_in">string</span> message</span>)</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">Logger</span> : <span class="title">ILogger</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Log</span>(<span class="params"><span class="built_in">string</span> message</span>)</span></span><br><span class="line"> {</span><br><span class="line"> Console.WriteLine(<span class="string">"Log: "</span> + message);</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">UserService</span></span><br><span class="line">{</span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">readonly</span> ILogger _logger;</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="title">UserService</span>(<span class="params">ILogger logger</span>)</span></span><br><span class="line"> {</span><br><span class="line"> _logger = logger;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">AddUser</span>(<span class="params"><span class="built_in">string</span> userName</span>)</span></span><br><span class="line"> {</span><br><span class="line"> _logger.Log(<span class="string">"User Added: "</span> + userName);</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
Here, whenever someone news up a UserService, the IoC Container automatically injects the instance registered earlier into UserService's constructor.</li>
</ol>
<hr>
<ol start="2">
<li><p>Setter Injection<br>Injects the instance through a setter method; it lets us inject the dependency after the caller has been instantiated.</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">UserService</span></span><br><span class="line">{</span><br><span class="line"> <span class="keyword">private</span> ILogger _logger;</span><br><span class="line"></span><br><span class="line"> <span class="keyword">public</span> ILogger Logger</span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">set</span> { _logger = <span class="keyword">value</span>; }</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">AddUser</span>(<span class="params"><span class="built_in">string</span> userName</span>)</span></span><br><span class="line"> {</span><br><span class="line"> _logger.Log(<span class="string">"User Added: "</span> + userName);</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
</li>
<li><p>Interface Injection<br>The dependency is injected into the instance through an interface: the interface must define a method for injecting the dependency, and the class implements that interface to realize the concrete DI</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">interface</span> <span class="title">IUserService</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">AddUser</span>(<span class="params"><span class="built_in">string</span> userName</span>)</span>;</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">SetLogger</span>(<span class="params">ILogger logger</span>)</span>; <span class="comment">// 定義注入依賴的方法</span></span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">UserService</span> : <span class="title">IUserService</span></span><br><span class="line">{</span><br><span class="line"> <span class="keyword">private</span> ILogger _logger;</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">AddUser</span>(<span class="params"><span class="built_in">string</span> userName</span>)</span></span><br><span class="line"> {</span><br><span class="line"> _logger.Log(<span class="string">"User Added: "</span> + userName);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">SetLogger</span>(<span class="params">ILogger logger</span>) <span class="comment">// 實際注入依賴</span></span></span><br><span class="line"> {</span><br><span class="line"> _logger = logger;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure></li>
</ol>
<h2 id="DIP、IoC與DI的結合-實際應用"><a href="#DIP、IoC與DI的結合-實際應用" class="headerlink" title="DIP、IoC與DI的結合 - 實際應用"></a>DIP、IoC與DI的結合 - 實際應用</h2><p>為大家總結一下上面講的各種名詞,用這張圖簡單概括<br>先讓大家釐清,DIP, IoC, DI, IoC Container之間的關係<br><img data-src="/images/posts/DI-IoC/sumup.png"></img></p>
<h3 id="生活範例"><a href="#生活範例" class="headerlink" title="生活範例"></a>生活範例</h3><p>讓我們用在「餐廳煮東西」來舉例</p>
<blockquote>
<p>DIP: High-level modules should not depend on low-level modules. Both should depend on abstractions.</p>
</blockquote>
<p>In our example, <strong>the chef is the high-level module</strong> and <strong>the ingredients are the low-level modules</strong>. <ins>The chef should not depend on specific ingredients</ins>, but on an <ins>abstract notion of ingredients that can be used to cook all kinds of dishes</ins>.</p>
<hr>
<blockquote>
<p>Inversion of Control (IoC): The control of the flow of a program is inverted.</p>
</blockquote>
<p>In our example, the customer orders and the chef prepares the meal; as for the control flow of meal preparation, <ins>the customer does not control how the meal is prepared, but simply receives the finished dish</ins>.</p>
<hr>
<blockquote>
<p>IoC Container: A container that manages and controls the creation and life cycle of objects, and also injects their dependencies.</p>
</blockquote>
<p>In our example, think of the kitchen as the IoC Container: it manages the life cycle of every ingredient and kitchen utensil and makes sure the chef gets the ingredients and tools they need.</p>
<hr>
<blockquote>
<p>Dependency Injection (DI): A technique for achieving IoC, where the objects are given their dependencies instead of creating them themselves.</p>
</blockquote>
<p>In our example, the chef is handed the ingredients (by the kitchen) rather than going out to find them.</p>
<h3 id="結合舉例"><a href="#結合舉例" class="headerlink" title="結合舉例"></a>結合舉例</h3><p>接下來我們接續上面的例子,透過程式的方式來講解上面的所有概念(DIP, IoC, IoC Container, DI)</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">interface</span> <span class="title">IChef</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">Cook</span>()</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">Chef</span> : <span class="title">IChef</span></span><br><span class="line">{</span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">readonly</span> IIngredients _ingredients;</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="title">Chef</span>(<span class="params">IIngredients ingredients</span>)</span></span><br><span class="line"> {</span><br><span class="line"> _ingredients = ingredients;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Cook</span>()</span></span><br><span class="line"> {</span><br><span class="line"> Console.WriteLine(<span class="string">"Cooking with "</span> + _ingredients.GetIngredients());</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">interface</span> <span class="title">IIngredients</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="built_in">string</span> <span class="title">GetIngredients</span>()</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">Ingredients</span> : <span class="title">IIngredients</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="built_in">string</span> <span class="title">GetIngredients</span>()</span></span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">return</span> <span class="string">"Tomatoes, Onions, Garlic, and Spices"</span>;</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title">Kitchen</span></span><br><span class="line">{</span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">static</span> IChef _chef;</span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">static</span> IIngredients _ingredients;</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">static</span> <span class="title">Kitchen</span>()</span></span><br><span class="line"> {</span><br><span class="line"> _ingredients = <span class="keyword">new</span> Ingredients();</span><br><span class="line"> _chef = <span class="keyword">new</span> Chef(_ingredients);</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">static</span> IChef <span class="title">GetChef</span>()</span></span><br><span 
class="line"> {</span><br><span class="line"> <span class="keyword">return</span> _chef;</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">class</span> <span class="title">Program</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">static</span> <span class="keyword">void</span> <span class="title">Main</span>(<span class="params"><span class="built_in">string</span>[] args</span>)</span></span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">var</span> chef = Kitchen.GetChef();</span><br><span class="line"> chef.Cook();</span><br><span class="line"> Console.ReadLine();</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<ul>
<li>In the example above, the 'IChef' interface and the 'Chef' class follow <strong>DIP</strong>: they depend on the abstract 'IIngredients' rather than on a specific concrete ingredient.</li>
<li>The 'Chef' class receives its 'IIngredients' dependency via <strong>constructor injection</strong> instead of actively creating an instance itself.</li>
<li>The 'Kitchen' class plays the role of the IoC Container: it manages the creation and life cycle of 'Chef' and 'Ingredients' and injects the 'Ingredients' instance into 'Chef''s constructor.</li>
<li>The Main method can be seen as our application: it obtains the 'Chef' instance through 'Kitchen' and calls its Cook() method.</li>
</ul>
<h2 id="NET-C-實現"><a href="#NET-C-實現" class="headerlink" title=".NET C#實現"></a>.NET C#實現</h2><p>下面我將簡單使用.NET預設的DI框架(Microsoft.Extensions.DependencyInjection)來實現註冊依賴實體,與依賴注入。<br>其中還有一些進階的用法,像是把<ins>註冊相關的邏輯抽提出來寫成擴充方法</ins>,還有使用<ins>Attribute與反射來解決建構元注入太多的問題</ins>,但在這篇教學中先使用最簡單的方法實作,為的是讓各位先理解基本的概念與用法,進階用法會在之後的文章詳細介紹。</p>
<h3 id="DI生命週期與註冊"><a href="#DI生命週期與註冊" class="headerlink" title="DI生命週期與註冊"></a>DI生命週期與註冊</h3><p>在.NET的預設DI框架中,註冊實體物件時可以指定其生命週期,分為三種(重要!)</p>
<blockquote>
<ol>
<li>Transient: a new instance is created <strong>every time the dependency is injected</strong>.</li>
<li>Scoped: a new instance is created <strong>for each request</strong> and reused within that same request (the "request" here usually means an HTTP request).</li>
<li>Singleton: following the Singleton pattern, only one instance is created <strong>from application start to shutdown</strong>; it is reused every time until the program terminates.</li>
</ol>
</blockquote>
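<p>In code, the three lifetimes map to three registration methods. The service names below are hypothetical placeholders, used only to show the mapping:</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line">// Transient: a new instance every time the service is resolved</span><br><span class="line">builder.Services.AddTransient<IEmailSender, EmailSender>();</span><br><span class="line"></span><br><span class="line">// Scoped: one instance per request (scope), reused within that request</span><br><span class="line">builder.Services.AddScoped<IOrderContext, OrderContext>();</span><br><span class="line"></span><br><span class="line">// Singleton: one instance for the whole lifetime of the application</span><br><span class="line">builder.Services.AddSingleton<IAppConfigCache, AppConfigCache>();</span><br></pre></td></tr></table></figure>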
<h3 id="實作"><a href="#實作" class="headerlink" title="實作"></a>實作</h3><p>讓我們繼續以上面餐廳的例子實作<br>首先定義好相關的class與Interface,其中使用DIP我這邊就不特別提了</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line"><span class="keyword">public</span> <span class="keyword">interface</span> <span class="title">IChef</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">void</span> <span class="title">Cook</span>()</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">Chef</span> : <span class="title">IChef</span></span><br><span class="line">{</span><br><span class="line"> <span class="keyword">private</span> <span class="keyword">readonly</span> IIngredients _ingredients;</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="title">Chef</span>(<span class="params">IIngredients ingredients</span>)</span></span><br><span class="line"> {</span><br><span class="line"> _ingredients = ingredients;</span><br><span class="line"> }</span><br><span class="line"></span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="keyword">void</span> <span class="title">Cook</span>()</span></span><br><span class="line"> {</span><br><span class="line"> Console.WriteLine(<span class="string">"Cooking with "</span> + _ingredients.GetIngredients());</span><br><span class="line"> }</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">interface</span> <span class="title">IIngredients</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="built_in">string</span> <span class="title">GetIngredients</span>()</span>;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line"><span class="keyword">public</span> <span class="keyword">class</span> <span class="title">Ingredients</span> : <span class="title">IIngredients</span></span><br><span class="line">{</span><br><span class="line"> <span class="function"><span class="keyword">public</span> <span class="built_in">string</span> <span class="title">GetIngredients</span>()</span></span><br><span class="line"> {</span><br><span class="line"> <span class="keyword">return</span> <span class="string">"Tomatoes, Onions, Garlic, and Spices"</span>;</span><br><span class="line"> }</span><br><span class="line">}</span><br></pre></td></tr></table></figure>
<hr>
<p>Next comes the place where we register the DI implementations, in Program.cs. Only the key part is shown here.</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line"><span class="comment">// ...</span></span><br><span class="line"></span><br><span class="line"><span class="keyword">using</span> Microsoft.Extensions.DependencyInjection;</span><br><span class="line"></span><br><span class="line">builder.Services.AddScoped<IChef, Chef>();</span><br><span class="line">builder.Services.AddScoped<IIngredients, Ingredients>();</span><br><span class="line"></span><br><span class="line"><span class="comment">// ...</span></span><br></pre></td></tr></table></figure>
<p>To explain: builder.Services here is an IServiceCollection. Once 'AddScoped<IIngredients, Ingredients>()' is called, the IoC Container knows it should create an Ingredients instance, match it to one of the three DI forms used in the code, and inject the instance for IIngredients to point to; in our example that is constructor injection. Through <strong>reflection</strong>, the DI framework knows the Chef class's constructor takes an IIngredients, so using the registration information, the IoC Container actively creates an Ingredients instance and injects it into the Chef class's constructor.</p>
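<p>To make the "reflection" part concrete, the toy snippet below shows how a container could discover what Chef's constructor needs (a simplified illustration of the idea, not how Microsoft.Extensions.DependencyInjection is implemented internally):</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line">using System;</span><br><span class="line">using System.Linq;</span><br><span class="line"></span><br><span class="line">var ctor = typeof(Chef).GetConstructors().Single();</span><br><span class="line">foreach (var parameter in ctor.GetParameters())</span><br><span class="line">{</span><br><span class="line">    // Prints "ingredients : IIngredients" - the container matches this</span><br><span class="line">    // type against its registrations to know what to inject.</span><br><span class="line">    Console.WriteLine($"{parameter.Name} : {parameter.ParameterType.Name}");</span><br><span class="line">}</span><br></pre></td></tr></table></figure>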
<p><strong>To recap the flow above</strong></p>
<blockquote>
<ol>
<li>builder.Services.AddScoped<IIngredients, Ingredients>() registers the dependency information and its lifetime with the IoC Container</li>
<li>Using reflection, the IoC Container learns that the Chef class's constructor takes an IIngredients and matches it against the previously registered dependency information</li>
<li>The IoC Container creates an Ingredients instance and injects it into the Chef class's constructor, so the constructor's IIngredients reference points to it</li>
<li>Inside the constructor, the Ingredients instance referenced by the constructor parameter is assigned to the Chef class's private field _ingredients</li>
</ol>
</blockquote>
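<p>Putting the whole flow together, here is a minimal runnable sketch. It assumes a plain console app, so we build the ServiceProvider ourselves instead of using the ASP.NET Core builder shown above:</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line">using Microsoft.Extensions.DependencyInjection;</span><br><span class="line"></span><br><span class="line">var services = new ServiceCollection();</span><br><span class="line">services.AddScoped<IIngredients, Ingredients>();</span><br><span class="line">services.AddScoped<IChef, Chef>();</span><br><span class="line"></span><br><span class="line">using var provider = services.BuildServiceProvider();</span><br><span class="line">using (var scope = provider.CreateScope())</span><br><span class="line">{</span><br><span class="line">    // The container sees that Chef's constructor wants an IIngredients,</span><br><span class="line">    // creates an Ingredients instance, and injects it for us.</span><br><span class="line">    var chef = scope.ServiceProvider.GetRequiredService<IChef>();</span><br><span class="line">    chef.Cook(); // "Cooking with Tomatoes, Onions, Garlic, and Spices"</span><br><span class="line">}</span><br></pre></td></tr></table></figure>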
<h2 id="結語"><a href="#結語" class="headerlink" title="結語"></a>結語</h2><p>這篇文章我十分詳細的介紹了DIP, IoC, DI的概念與實作,這些概念對於軟體開發來說非常重要,但大家也要清楚理解<strong>這些思想要解決的問題,以及使用它們的好處,清楚自己在做什麼,而不是為設計而設計</strong>,其實OOP很多的pattern,都會有其好處以及trade off,因此了解為何使用就顯得非常重要。</p>
<p>P.S.:</p>
<ul>
<li>I personally like to reason about objects and their values in terms of pointers and memory; it helps a lot with understanding pass by value/reference and stack vs. heap allocation, and I highly recommend it.</li>
<li>The Microsoft.Extensions.DependencyInjection namespace uses an IServiceProvider to manage the dependencies registered in our program; we can also obtain instances by injecting this IServiceProvider, which will <strong>play a key role in the later article on evolving our DI setup.</strong></li>
</ul>
<style>
img{
width: 70%;
margin: 15px auto;
}
</style>]]></content>
<categories>
<category>OOP</category>
<category>SOLID</category>
</categories>
<tags>
<tag>OOP</tag>
<tag>SOLID</tag>
<tag>.NET C#</tag>
</tags>
</entry>
<entry>
<title>CC - An Immersive Online Window-Shopping App - System and Tech Overview</title>
<url>/posts/2012237495/</url>
<content><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>前言</h2><p>接續上一篇的文章,這篇文章會著重介紹這個系統<br>以下是我在YouTube上對這個專案的介紹與Demo:</p>
<iframe
src="https://www.youtube.com/embed/aMGnyI2Xe04">
</iframe>
<span class="exturl" data-url="aHR0cHM6Ly9kcml2ZS5nb29nbGUuY29tL2ZpbGUvZC8xTmVDRGk0dWFmQ3UtcF9iVDE2cERLSm50SXluN3FzeUYvdmlldz91c3A9c2hhcmluZw==">Proposal Link<i class="fa fa-external-link-alt"></i></span>
<p>This article closely mirrors the video, covering <strong>motivation and introduction, key features, and system architecture and technology</strong>, in that order<br>For the feature demo, please watch the video</p>
<span id="more"></span>
<p><img data-src="/images/posts/CC-project-demo/1.png"
style="width: 90%; margin: 15px auto;"></p>
<h2 id="動機與介紹"><a href="#動機與介紹" class="headerlink" title="動機與介紹"></a>動機與介紹</h2><img data-src="/images/posts/CC-project-demo/2.png" style="width: 70%; margin: 15px auto;">
首先我們主要的TA之一是線上逛街族, 針對這個族群我們統整出簡單的兩點:第一, 他們在線上瀏覽或滑商品時, 通常沒有特定的消費目的, 只是想要滑滑看看, 無目的的瀏覽行為, 第二, 這種行為主要是以消遣、獲得樂趣為目的, 而不一定是真的想要買商品
<hr>
<img data-src="/images/posts/CC-project-demo/3.png" style="width: 70%; margin: 15px auto;">
Given that audience, our project focuses on the following problems and goals. First, we target needs that arise in non-purposeful consumption scenarios. Second, we want to give users an immersive experience, meaning they receive content they are interested in, comfortably and without interruption, so they can use the app to unwind and kill time. Third, much of this is designed to capture users' micro-moments: data that reflects their preferences and decisions at different points in time. Fourth, the app also targets the livestreaming market through partnerships with streaming platforms. Streamers struggle to keep earning once a stream ends, so beyond extending each product's life cycle, the user-behavior data we capture can give streamers the consumer insight they lack.
<hr>
<img data-src="/images/posts/CC-project-demo/4.png" style="width: 70%; margin: 15px auto;">
To achieve these goals, we rely on several levers. First, we study recommendation algorithms and user-behavior capture, using behavioral data from each moment, rather than the purchase history traditional e-commerce relies on, as the basis for recommendations, so users receive content they care about at every moment and stay immersed. Second, a clean interface lowers the pressure of browsing and makes users more willing to stay in the app. Third, social mechanics such as shared posts, follows, and comments let users discover what friends and family like or recommend, enjoy the community, and increase stickiness. Finally, the behavioral data captured throughout the app can be shared with our third-party partners so merchants understand user preferences better.
<h2 id="重點功能簡介"><a href="#重點功能簡介" class="headerlink" title="重點功能簡介"></a>重點功能簡介</h2><img data-src="/images/posts/CC-project-demo/5.png" style="width: 70%; margin: 15px auto;">
再來介紹一些重點功能的簡短敘述,首先第一個是我們的商品貼文,也是商品的主體,貼文特色主要以滿板設計與資訊收合來達到雜訊最小化的目的,另外後面Demo也會呈現推薦商品的形式,與現在短影音的方式很像,透過推薦與給人耳目一新的商品,帶給使用者殺時間的樂趣,另外,透過商品貼文,也可以成為直播主下播後銷售的利器,與一般電商不同的是,我們主動推薦商家商品,而且是透過使用者當前的行為喜好,而不是被動等待使用者搜尋或是透過購買紀錄來做推薦
<hr>
<img data-src="/images/posts/CC-project-demo/6.png" style="width: 70%; margin: 15px auto;">
Next is the home page: a card-swiping mechanic adds interactivity while capturing user behavior for real-time recommendations. Beyond deepening immersion, this also lets us recommend a wider range of products and increase exposure.
<hr>
<img data-src="/images/posts/CC-project-demo/7.png" style="width: 70%; margin: 15px auto;">
The home page and the explore page both use swiping behavior, dwell time, and click-through rate to refine each other's recommendations, forming a two-way recommendation loop.
<hr>
<img data-src="/images/posts/CC-project-demo/8.png" style="width: 70%; margin: 15px auto;">
Besides recommending products the user is interested in, the explore page mixes in random related products for novelty. The largest post is the product with the highest recommendation score; a matrix-style tree layout gives users a clean way to browse a large number of items at once.
<hr>
<img data-src="/images/posts/CC-project-demo/9.png" style="width: 70%; margin: 15px auto;">
CC provides follow mechanics for shops, buyers, and streamers, plus a personalized feed and comment section, increasing social interaction and using the power of community to boost the app's stickiness.
<hr>
<img data-src="/images/posts/CC-project-demo/10.png" style="width: 70%; margin: 15px auto;">
Shared posts add social fun and form a large platform for consumer insight.
Most importantly, word of mouth (one tells ten, ten tell a hundred) markets the products themselves.
<h2 id="系統架構與技術"><a href="#系統架構與技術" class="headerlink" title="系統架構與技術"></a>系統架構與技術</h2><p>先附上系統架構圖:<br><img data-src="/images/posts/CC-project-demo/CC_structure.jpeg" style="width: 70%; margin: 15px auto;"></p>
<p>The frontend is written mainly in Angular, with a <strong>modular, object-oriented design for maintainability</strong>, and finally wrapped as a cross-platform app via PWA.<br>Using Angular's module and component system, we split features into separate modules for low coupling and high cohesion. Token validation, route guards, API interceptors, data-formatting pipes, and similar concerns are all factored out. Core business logic lives in our service modules, separated from view logic and supplied to each module via dependency injection. This design makes future changes, maintenance, and extension far cheaper in time and effort.</p>
<p>The other main feature is the recommendation algorithm: using labels and Jaccard similarity, we built an example-based recommendation engine, constructed mainly from MongoDB pipelines plus backend logic.</p>
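<p>As a rough illustration of the measure itself (the real engine runs as MongoDB pipelines plus backend code; this C# sketch with made-up tag sets only shows how Jaccard similarity scores overlap):</p>
<figure class="highlight c#"><table><tr><td class="code"><pre><span class="line">using System;</span><br><span class="line">using System.Collections.Generic;</span><br><span class="line">using System.Linq;</span><br><span class="line"></span><br><span class="line">// Jaccard similarity: |A ∩ B| / |A ∪ B|, a score in [0, 1]</span><br><span class="line">static double Jaccard(HashSet<string> a, HashSet<string> b)</span><br><span class="line">{</span><br><span class="line">    if (a.Count == 0 && b.Count == 0) return 0.0;</span><br><span class="line">    int intersection = a.Count(b.Contains);</span><br><span class="line">    int union = a.Count + b.Count - intersection;</span><br><span class="line">    return (double)intersection / union;</span><br><span class="line">}</span><br><span class="line"></span><br><span class="line">// e.g. a user's recent-interest tags vs. a product's tags</span><br><span class="line">var user = new HashSet<string> { "shoes", "sport", "white" };</span><br><span class="line">var item = new HashSet<string> { "shoes", "running", "white" };</span><br><span class="line">Console.WriteLine(Jaccard(user, item)); // 0.5</span><br></pre></td></tr></table></figure>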
<p>This was also my first time using a frontend framework for a reasonably complete project. Coming from a backend background, I picked Angular because it felt the most comfortable to write, almost like writing backend code XD.</p>
<p>The backend is written mainly in NodeJS, with MongoDB as the database. Features are likewise split into modules, with database index optimization and the implementation of the recommendation algorithm.</p>
<p>The web server side is fronted by IIS, with GitLab CI/CD and GitLab Runner providing continuous integration and automated deployment</p>
<style>
.video-container
{
padding-top: 60% !important;
}
</style>]]></content>
<categories>
<category>Projects</category>
<category>side project</category>
</categories>
<tags>
<tag>Projects</tag>
<tag>side project</tag>
</tags>
</entry>
<entry>
<title>Reflections on Joining a Graduation Project and Programming Competitions as a Freshman and Sophomore</title>
<url>/posts/534426495/</url>
<content><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>前言</h2><p>小弟我在剛上大一沒多久,就私mail系上的程式老師,詢問是否可以參與更多的專案或研究,也一併把我的履歷給教授看。<br>沒想到教授大方地給我幾條路選,可以跟教授做研究,出去實習,或是和學長們做專案。<br>大一的我想要慢慢累積實力,於是決定先<strong>和學長們做專案</strong>,去比賽,累積經驗。<br>在歷經<strong>國科會專案、資訊競賽、畢業專案競賽等等</strong>後,我非常感謝教授和學長們給我這個機會,讓我可以擁有這些寶貴的經驗!<br>這篇文章主要著重在心得與分享,比較技術面的內容會在下篇文章詳細介紹。</p>
<span id="more"></span>
<h2 id="比賽心得與定位"><a href="#比賽心得與定位" class="headerlink" title="比賽心得與定位"></a>比賽心得與定位</h2><p>在這個Team內,我主要負責程式開發,包括<strong>前端、伺服器架設與管理和一部分的後端</strong>。<br>前前後後比了<strong>國科會大專生計畫、智慧創新、資訊服務和系上的畢業專案競賽</strong>,大約歷時1年多的時間,在這段時間裡,我們歷經<strong>發想、開發、維護、寫技術文件、UI/UX設計到演算法研究等等</strong>,這也是我大一結束為止做過相對完整的專案。</p>
<hr>
<p>These photos are from the Smart Innovation and Information Service competitions. We simply handed the finished product to the judges to swipe through XD, since our system was relatively complete and we weren't afraid of bugs popping up. We only reached the finals in both, but I now better understand how to prepare for this type of competition (basically, align with whatever topic is currently trending), <del>so next year it's AI, I guess</del><br><img data-src="/images/posts/CC-experience/智慧創新.JPG"
style="width: 70%; margin: 15px auto;"><br><img data-src="/images/posts/CC-experience/資服.jpg"
style="width: 70%; margin: 15px auto;"></p>
<hr>
<p>This is our department's graduation project competition; I was honored to take second place! That day a manager from an IT company even handed me his business card, which left me flattered.<br><img data-src="/images/posts/CC-experience/畢業專案.JPG"
style="width: 70%; margin: 15px auto;"><br><img data-src="/images/posts/CC-experience/畢業專案得獎.JPG"
style="width: 70%; margin: 15px auto;"></p>
<h2 id="結語"><a href="#結語" class="headerlink" title="結語"></a>結語</h2><p>總的來說,這是我<strong>參與多人協作、實作一個完整專案的寶貴經驗</strong>,這也成為我日後開發其他專案的養分(Google學生開發者社群, 資訊競賽, etc.),除了<strong>技術面的大幅成長,專案管理、人際互動、時間管理</strong>也都是成長的一部分,主動去尋找機會,得到的會比你想像的多,雖然有點辛苦就是了,但我覺得很值得!所以時間管理真的超重要,這也讓我在工程師這條路上變得更主動,相信只要努力,自己絕對值得更好的!</p>
<p>比較技術面的內容會在之後的文章提及</p>
]]></content>
<categories>
<category>Reflections</category>
<category>Competitions</category>
</categories>
<tags>
<tag>Reflections</tag>
<tag>Competitions</tag>
</tags>
</entry>
<entry>
<title>NCCU Google Developer Student Club - NCCUPass - Reflections (Second Semester)</title>
<url>/posts/3042436019/</url>
<content><![CDATA[<h1 id="前言"><a href="#前言" class="headerlink" title="前言"></a>前言</h1><p>繼上篇<a href="https://mao-code.github.io/posts/277323671/#more">GDSC的心得文</a>,這篇我會介紹我們下學期的開發經歷、得獎、專案簡介以及未來這個專案的走向,屬於一個小小的紀錄心得文!</p>
<span id="more"></span>
<h1 id="專案"><a href="#專案" class="headerlink" title="專案"></a>專案</h1><h2 id="簡介"><a href="#簡介" class="headerlink" title="簡介"></a>簡介</h2><p>政大通 - NCCUPass 團隊熱切懷抱著重塑校園體驗的夢想,點燃學生的熱情和創造力,以共同打造一個充滿活力、連結力和創新力的校園生活。經由深入校園的人際交流與觀察,我們發現學生面對著種種生活上的不便與困難,然而目前鮮少有專門為學生打造的解決方案。學生間的線下互動與真實聯繫日益減少,導致難以找到同學幫忙或組團合作的機會。此外,午晚餐時的人潮湧入校園周邊餐廳,更帶來了無法即時點餐的困擾。這些現況讓我們看到一個獨特的契機,我們決心以「政大通 - NCCUPass 」 APP 的形式來化解這些困境。</p>
<p>「政大通 - NCCUPass 」不僅提供政大學生一個集社交、實用和創新於一身的平台,更是一個綜合性的應用程式,旨在加強他們的校園體驗。我們希望這個 APP 能成為解決校園生活難題的得力幫手!</p>
<p>在社團「政大 Google 學生開發者社群」的擁護下,我們匯聚各領域的精英,凝聚開放分享的心態,攜手共創這個改變學生生活的嶄新 APP !政大只是我們的第一塊版圖,我們的未來目標是影響全國的大學生,為他們帶來更加豐富、更加便利的校園體驗!</p>
<h2 id="目前功能簡介"><a href="#目前功能簡介" class="headerlink" title="目前功能簡介"></a>目前功能簡介</h2><h3 id="學生任務功能-增強校園互動體驗"><a href="#學生任務功能-增強校園互動體驗" class="headerlink" title="學生任務功能 - 增強校園互動體驗"></a>學生任務功能 - 增強校園互動體驗</h3><ul>
<li>動機發想<ul>
<li>缺乏線下互動與真實聯繫:<br>我們注意到越來越多的大學生在日常生活中依賴數位平台和社交媒體,而非與人面對面的互動。這種現象在校園中尤為明顯,比起在現實中進行真實的互動,學生更傾向於在虛擬世界中建立社交連結。我們對大學生進行訪談、問卷調查和社交媒體分析,收集了大量的數據和反饋。這些數據揭示了大學生內心深處對面對面互動的渴望,以及更真實的交流,更深刻的情感,卻在線下互動中遇到了種種挑戰與限制。基於這些資訊,我們更加確定缺乏線下互動和真實聯繫的問題造成大學生的困擾,這些數據與心靈共鳴,激發了我們的靈感。我們深信,缺乏線下互動和真實聯繫的問題,不僅是個人的困擾,更是一種社會現象,需要我們攜手改變。於是,我們投入心血,開發了一個獨特的解決方案——一個能在現實世界中促進人與人之間有意義連結的奇妙工具。</li>
</ul>
</li>
<li>Feature description<ul>
<li>This feature promotes offline interaction between students and gives them a free, practical platform for completing all kinds of fun tasks. Students can post tasks (picking up a late-night snack, getting a group together for an outing, and so on) and look for other students to take up the challenge; every task is a unique social opportunity for building more real connections and friendships on campus. The feature has many other upsides. It encourages students to engage with campus life and break free of the online world; completing tasks with other students lets them experience precious teamwork and mutual help. Moreover, if classmates are satisfied with each other's help, they can offer money or other rewards, a small chance to earn pocket money on the side! We believe this system encourages creativity and hard work while building a culture of mutual respect and value exchange.</li>
</ul>
</li>
</ul>
<h3 id="預約外帶功能與午餐快選器---用餐省時無等待,專屬學生的美食提前預訂"><a href="#預約外帶功能與午餐快選器---用餐省時無等待,專屬學生的美食提前預訂" class="headerlink" title="預約外帶功能與午餐快選器 - 用餐省時無等待,專屬學生的美食提前預訂"></a>預約外帶功能與午餐快選器 - 用餐省時無等待,專屬學生的美食提前預訂</h3><ul>
<li><p>動機發想</p>
<ul>
<li><p>Insufficient meal supply:<br>Especially at peak lunch and dinner hours, restaurants frequently cannot keep up, and students spend a long time queuing.</p>
</li>
<li><p>Tight meal times:<br>University life moves fast; students often have to squeeze basics like eating into short windows, and queuing for food during busy hours wastes precious time.</p>
</li>
<li><p>Unpredictable demand:<br>During busy hours, demand for meals cannot be estimated, so students may wait a long time or, pressed for time, settle for a less-than-ideal food choice.</p>
</li>
</ul>
</li>
<li><p>Feature description</p>
<ul>
<li>This feature brings students a fast, convenient, no-wait dining experience! We partner with eateries near campus to make sure students never wait long. Just pre-order takeout, head to the shop after class, and the meal is already prepared and ready to grab! No more wasted time in line; enjoy a free and easy mealtime! Pre-order takeout is not just a stylish dining choice for students but a capable helper in a busy study life: during peak hours, students can easily reserve the food they crave without letting pickup delay other important matters.</li>
</ul>
</li>
</ul>
<h2 id="連結"><a href="#連結" class="headerlink" title="連結"></a>連結</h2><ul>
<li><span class="exturl" data-url="aHR0cHM6Ly9uY2N1cGFzcy5jb20v">官方網站<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly93d3cuaW5zdGFncmFtLmNvbS9uY2N1cGFzcy8=">IG粉專<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly93d3cubGlua2VkaW4uY29tL2NvbXBhbnkvbmNjdXBhc3M=">LinkedIn<i class="fa fa-external-link-alt"></i></span></li>
</ul>
<h1 id="得獎"><a href="#得獎" class="headerlink" title="得獎"></a>得獎</h1><p>NCCUPass在當天的期末發表會得到最佳人氣專案奬的殊榮!<br><img data-src="/images/posts/2023NCCUPass/GDSC_award.JPG"
style="width: 70%; margin: 15px auto;"></p>
<p><img data-src="/images/posts/2023NCCUPass/GDSC_award_on_stage.JPG"
style="width: 70%; margin: 15px auto;"></p>
<p>The end-of-term presentation that day:<br><img data-src="/images/posts/2023NCCUPass/GDSC_final.JPG"
style="width: 70%; margin: 15px auto;"></p>
<h1 id="技術"><a href="#技術" class="headerlink" title="技術"></a>技術</h1><p>在經過一個學期後,我們也慢慢拓展技術範圍,未來也將因應需求機動地改變<br>在這邊就只放上系統架構圖,不贅述太多技術細節</p>
<h2 id="系統架構圖"><a href="#系統架構圖" class="headerlink" title="系統架構圖"></a>系統架構圖</h2><p><img data-src="/images/posts/2023NCCUPass/NCCUPass_structure.png"
style="width: 70%; margin: 15px auto;"></p>
<h1 id="未來"><a href="#未來" class="headerlink" title="未來"></a>未來</h1><p>政大是我們的第一塊版圖,我們的目標是全台灣的大學,目前先穩紮穩打在政大站穩腳步,在此期間會積極地與其他組織或社團合作、參與各項競賽以及參加創投機構的活動,我們的理念是讓大學生們彼此間的關係更加密切與真實,使他們的生活更加數位化與便利!</p>
]]></content>
<categories>
<category>Projects</category>
<category>Reflections</category>
<category>GDSC</category>
<category>side project</category>
<category>NCCUPass</category>
</categories>
<tags>
<tag>Projects</tag>
<tag>GDSC</tag>
<tag>NCCUPass</tag>
</tags>
</entry>
<entry>
<title>NCCU Google Developer Student Club - Reflections (First Semester)</title>
<url>/posts/277323671/</url>
<content><![CDATA[<h2 id="前言"><a href="#前言" class="headerlink" title="前言"></a>前言</h2><p>在剛上大二時,除了手邊和學長的專案,我也參加了政大的Google學生開發者社群。主要是希望透過提案,<strong>利用科技來解決問題</strong>,在這裡我也深深感受到Google的文化,那就是<strong>不會怕去嘗試、不怕去解決問題</strong>,我也認識在這個社群中的各個人才。在這個社群中,我除了擔任自己<strong>專案領導人</strong>的角色外,我也身兼<strong>後端技術長</strong>的任務,也是想要發揮我目前所學到的、最成熟的後端知識。除了技術面,如何管理一個團隊、一項專案,也是一門很大的課題。希望在未來可以繼續把自己的這項專案做到完善,解決真正的問題!</p>
<span id="more"></span>
<h2 id="定位與職位"><a href="#定位與職位" class="headerlink" title="定位與職位"></a>定位與職位</h2><ul>
<li><h3 id="專案定位"><a href="#專案定位" class="headerlink" title="專案定位"></a>專案定位</h3></li>
</ul>
<p>My proposal is called NCCUPass. Starting from <strong>campus life at NCCU</strong>, we want to build an app that removes the everyday inconveniences NCCU students face. Through the project team's discussion and research, it should bring a <strong>more convenient, smarter, more digital</strong> campus life, satisfy needs on campus in real time, and weave itself into students' daily lives as an indispensable part of the campus. Thus the NCCUPass proposal was born</p>
<hr>
<p>Below are the poster and photos from our first end-of-term presentation<br>The poster shows the features we planned and were actively developing<br><img data-src="/images/posts/GDSC-NCCUPass-experience-1/poster.jpg"
style="width: 70%; margin: 15px auto;"></p>
<hr>
<p>Photos of our booth on the day of the end-of-term presentation<br><img data-src="/images/posts/GDSC-NCCUPass-experience-1/img4.jpg"
style="width: 70%; margin: 15px auto;"></p>
<p><img data-src="/images/posts/GDSC-NCCUPass-experience-1/img2.JPG"
style="width: 70%; margin: 15px auto;"></p>
<p><img data-src="/images/posts/GDSC-NCCUPass-experience-1/img1.JPG"
style="width: 70%; margin: 15px auto;"></p>
<p>Certificate of participation<br><img data-src="/images/posts/GDSC-NCCUPass-experience-1/GDSC_certificate.png"
style="width: 70%; margin: 15px auto;"></p>
<hr>
<ul>
<li><h3 id="職位"><a href="#職位" class="headerlink" title="職位"></a>職位</h3></li>
</ul>
<p>On this project I serve as <strong>Project Leader</strong> and <strong>Backend Tech Lead</strong>, leading a <strong>team of 11</strong>. The team is split into four groups: frontend, backend, UI/UX, and documentation. Besides owning the core backend work, I help the backend members level up and coordinate their plans. From the project leader's perspective, I also <strong>plan the project's schedule and direction, track progress, coordinate the groups, and handle the interpersonal side</strong>, using various project-management tools and writing all kinds of documents and flowcharts; honestly, pretty exhausting XD. But I met a group of teammates willing to follow me and grow together, and I am truly grateful to them 🙏</p>
<h2 id="技術"><a href="#技術" class="headerlink" title="技術"></a>技術</h2><ul>
<li><h3 id="系統架構圖"><a href="#系統架構圖" class="headerlink" title="系統架構圖"></a>系統架構圖</h3></li>
</ul>
<p>Below is our system architecture diagram; since I am responsible for the backend, it focuses on the backend architecture<br><img data-src="/images/posts/GDSC-NCCUPass-experience-1/NCCUPass-Structure.jpg"
style="width: 70%; margin: 15px auto;"></p>
<ul>
<li><h3 id="後端技術細節"><a href="#後端技術細節" class="headerlink" title="後端技術細節"></a>後端技術細節</h3></li>
</ul>
<p>As the diagram shows, the project is deployed on an Ubuntu host, with all services brought up together via Docker Compose, plus GitLab Runner for continuous integration and automated deployment. The backend is written in .NET C#; I use a software layer architecture pattern combined with various design patterns plus a few variations of my own. The database is a MongoDB replica set, with Redis as a cache; photos and public files live mainly on our file server. The remaining technical details are listed below for anyone curious XD</p>
<figure class="highlight plaintext"><table><tr><td class="code"><pre><span class="line">- Software Layer Architecture pattern (多層)</span><br><span class="line">- Repository pattern</span><br><span class="line">- Unit of Work pattern</span><br><span class="line">- Mediator pattern & CQRS</span><br><span class="line">- Password Salting and Encryption</span><br><span class="line">- JWT & RBAC</span><br><span class="line">- Automapper</span><br><span class="line">- Exception handler抽離</span><br><span class="line">- Dapper & EF combination</span><br><span class="line">- Redis (Cache)</span><br><span class="line">- Docker Networking</span><br><span class="line">- Docker Volume</span><br><span class="line">- Docker hub</span><br><span class="line">- appsettings 組態切換</span><br><span class="line">- MongoDB replica-set</span><br><span class="line"> - key-file (internal authentication)</span><br><span class="line">- Git 多人協作</span><br><span class="line">- Swagger / OpenAPI</span><br><span class="line">- Docker File Server</span><br><span class="line">- Docker mongoDB backup daily</span><br><span class="line">- Docker Compose</span><br><span class="line">- JMeter壓力測試</span><br><span class="line">- GC mode區別(Workstation, Server)</span><br><span class="line">- SignalR 雙向溝通</span><br><span class="line">- 測試 (K6 stress testing, )</span><br><span class="line">- 自動化發送Email (python selenium)</span><br><span class="line">- SSH with Linux server</span><br><span class="line">- FCM (to push device notification)</span><br><span class="line">- Linux server</span><br><span class="line">- Cloudflare domain and SSL/TLS</span><br><span class="line">- Nginx (on server and on docker)</span><br><span class="line"> - redirect (setting files)</span><br><span class="line"> - ssl setting(certificate, key)</span><br><span class="line">- Shell script自動備份資料庫</span><br></pre></td></tr></table></figure>
<p>This list will keep growing, since the project is still in its early stage of development.</p>
<h2 id="結語"><a href="#結語" class="headerlink" title="結語"></a>結語</h2><p>最後,這個專案雖然只是在發展初期,但我希望未來可以發展到我預想的樣子,也非常感謝一路願意跟隨我、幫助我的隊友們,單打獨鬥真的比不上團隊合作👍,也希望各位未來也可以繼續幫助我啦,現在打分享文可能還太早,但我就是想要趁學期末趕快記錄一下哈哈</p>
]]></content>
<categories>
<category>Projects</category>
<category>Reflections</category>
<category>GDSC</category>
<category>side project</category>
<category>NCCUPass</category>
</categories>
<tags>
<tag>Projects</tag>
<tag>GDSC</tag>
<tag>NCCUPass</tag>
</tags>
</entry>
<entry>
<title>Basic OOP - Object-Oriented Fundamentals</title>
<url>/posts/3787153742/</url>
<content><![CDATA[<h1 id="前言"><a href="#前言" class="headerlink" title="前言"></a>前言</h1><p>物件導向(Object-Oriented Programming, OOP)是一種程式設計範式,強調使用包含數據(屬性)和方法(功能)的物件來設計和構建應用程序。提高軟體的重用性、靈活性和擴充性。</p>
<p>而對於物件導向,最基礎需要知道以下:</p>
<blockquote>
<p><strong>One abstraction</strong><br><strong>Two goals</strong><br><strong>Three traits</strong><br><strong>Five principles</strong></p>
</blockquote>
<span id="more"></span>
<h1 id="一個抽象"><a href="#一個抽象" class="headerlink" title="一個抽象"></a>一個抽象</h1><h2 id="抽象-Abstraction"><a href="#抽象-Abstraction" class="headerlink" title="抽象 (Abstraction)"></a>抽象 (Abstraction)</h2><p>在OOP的背景下,抽象是指<strong>隱藏複雜的實作細節</strong>,僅展示物件的必要特性的能力。這簡化了與物件的互動,使編程更直觀、更高效。</p>
<p>例如想要使用交通工具的物件,只需要知道交通工具提供的介面,而不需要知道具體的交通工具是什麼,以及其實作細節<br><img data-src="/images/posts/OOP-basic/abstraction.png"
style="width: 70%; margin: 15px auto;"></p>
<h1 id="兩個目的"><a href="#兩個目的" class="headerlink" title="兩個目的"></a>兩個目的</h1><h2 id="低耦合-Low-Coupling"><a href="#低耦合-Low-Coupling" class="headerlink" title="低耦合 (Low Coupling)"></a>低耦合 (Low Coupling)</h2><p>低耦合是指程式中<strong>不同的類別或模組之間應該有盡可能少的依賴關係</strong>。這使得<strong>一個類別或模組的變更不太可能影響到其他的類別或模組</strong>,從而使得程式更容易維護和擴展。</p>
<h2 id="高內聚(High-Cohesion)"><a href="#高內聚(High-Cohesion)" class="headerlink" title="高內聚(High Cohesion)"></a>高內聚(High Cohesion)</h2><p>高內聚是指一個類別或模組應該只<strong>專注於完成一項特定的任務或一組緊密相關的任務</strong>。這使得程式更有組織,更易於理解和維護。</p>
<p><img data-src="/images/posts/OOP-basic/CandC.png"
style="width: 70%; margin: 15px auto;"></p>
<h1 id="三個特性"><a href="#三個特性" class="headerlink" title="三個特性"></a>三個特性</h1><h2 id="繼承-Inheritance"><a href="#繼承-Inheritance" class="headerlink" title="繼承 (Inheritance)"></a>繼承 (Inheritance)</h2><p>繼承允許新創建的類別(子類別)繼承一個或多個現有類別(父類別)的屬性和方法。這促進了程式碼重用和擴展性。</p>
<h2 id="封裝-Encapsulation"><a href="#封裝-Encapsulation" class="headerlink" title="封裝 (Encapsulation)"></a>封裝 (Encapsulation)</h2><p>封裝是將數據(屬性)和行為(方法)綁定到單個單位(類別)中,並<strong>限制對該單位內部的直接訪問</strong>。這有助於保護數據和隱藏實現細節。</p>
<h2 id="多型-Polymorphism"><a href="#多型-Polymorphism" class="headerlink" title="多型 (Polymorphism)"></a>多型 (Polymorphism)</h2><p>多型允許對<strong>不同類別的物件使用共同的接口</strong>。這意味著可以在不同類別的物件上執行同一操作,而每個類別可以以不同的方式響應相同的操作。</p>
<h1 id="五個原則-SOLID"><a href="#五個原則-SOLID" class="headerlink" title="五個原則 (SOLID)"></a>五個原則 (SOLID)</h1><h2 id="S-單一職責原則(Single-Responsibility-Principle)"><a href="#S-單一職責原則(Single-Responsibility-Principle)" class="headerlink" title="S - 單一職責原則(Single Responsibility Principle):"></a>S - 單一職責原則(Single Responsibility Principle):</h2><p>一個類別應該只有一個改變的理由,這意味著<strong>一個類別應該只做一件事</strong>。</p>
<h2 id="O-開放封閉原則(Open-Closed-Principle)"><a href="#O-開放封閉原則(Open-Closed-Principle)" class="headerlink" title="O - 開放封閉原則(Open/Closed Principle):"></a>O - 開放封閉原則(Open/Closed Principle):</h2><p>軟體實體(類別、模組、函數等)應該<strong>對擴展開放,對修改封閉</strong>。這意味著應該能夠在不修改現有代碼的情況下擴展其功能。</p>
<h2 id="L-里氏替換原則(Liskov-Substitution-Principle)"><a href="#L-里氏替換原則(Liskov-Substitution-Principle)" class="headerlink" title="L - 里氏替換原則(Liskov Substitution Principle):"></a>L - 里氏替換原則(Liskov Substitution Principle):</h2><p><strong>子類別應該能夠替換其父類別而不影響程序的正常運行</strong>。</p>
<h2 id="I-接口隔離原則(Interface-Segregation-Principle)"><a href="#I-接口隔離原則(Interface-Segregation-Principle)" class="headerlink" title="I - 接口隔離原則(Interface Segregation Principle):"></a>I - 接口隔離原則(Interface Segregation Principle):</h2><p>不應強迫客戶依賴於它們不使用的接口。換句話說,<strong>更小和更具體的接口優於大而通用的接口</strong>。</p>
<h2 id="D-依賴反轉原則(Dependency-Inversion-Principle)"><a href="#D-依賴反轉原則(Dependency-Inversion-Principle)" class="headerlink" title="D - 依賴反轉原則(Dependency Inversion Principle):"></a>D - 依賴反轉原則(Dependency Inversion Principle):</h2><p><strong>高層模組不應該依賴低層模組,兩者都應該依賴於抽象</strong>;抽象不應該依賴於細節,細節應該依賴於抽象。這有助於減少類別之間的直接依賴,從而提高系統的靈活性和可重用性。</p>
<h1 id="Design-Pattern概述"><a href="#Design-Pattern概述" class="headerlink" title="Design Pattern概述"></a>Design Pattern概述</h1><p>設計模式是一組作為最佳實踐的解決方案,用來解決特定類型的重複出現的設計問題。這些模式不是現成的程式碼,而是可以在許多不同情況下使用的模板。它們提高了代碼的可重用性、靈活性和維護性。</p>
<p>設計模式通常分為三大類:</p>
<ol>
<li>Creational Patterns<br>Concerned with object-creation mechanisms; they help create objects in ways that keep the system independent of how objects are created and composed.</li>
<li>Structural Patterns<br>Concerned with how objects are composed, typically to form larger object structures</li>
<li>Behavioral Patterns<br>Concerned with communication between objects and the assignment of responsibilities</li>
</ol>
]]></content>
<categories>
<category>OOP</category>
</categories>
<tags>
<tag>OOP</tag>
</tags>
</entry>
<entry>
<title>[NLP][ML] Transformer (1) - Structure</title>
<url>/posts/4283617483/</url>
<content><![CDATA[<h1 id="Overview"><a href="#Overview" class="headerlink" title="Overview"></a>Overview</h1><p>The Transformer is a <strong>deep learning architecture</strong> introduced in the paper “Attention Is All You Need” by Vaswani et al. in 2017. It revolutionized the field of natural language processing (NLP) and brought significant advancements in various <strong>sequence-to-sequence tasks</strong>. The Transformer architecture, thanks to its <strong>attention mechanisms</strong>, enables efficient processing of sequential data while <strong>capturing long-range dependencies</strong>.</p>
<hr>
<p>The Transformer is a <strong>Seq2Seq (Sequence-to-Sequence) model</strong> that uses an <strong>Encoder-Decoder structure</strong>.<br>Below is a simple diagram:</p>
<p><img data-src="/images/posts/NLP-series/transformer-1.gif"
style="width: 70%; margin: 15px auto;"><br>Source: <span class="exturl" data-url="aHR0cHM6Ly9haS5nb29nbGVibG9nLmNvbS8yMDE2LzA5L2EtbmV1cmFsLW5ldHdvcmstZm9yLW1hY2hpbmUuaHRtbA==">https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html<i class="fa fa-external-link-alt"></i></span></p>
<p>The line between the Encoder and Decoder represents the “attention”.<br>The thicker the line, the more attention the Decoder below pays to certain Chinese characters above when generating an English word.</p>
<span id="more"></span>
<p>In the sections below, I will introduce the structure of the Transformer and the attention mechanism.<br>For a <strong>detailed explanation of the key components of the Transformer and the details of attention</strong>, you can refer to the <a href="https://mao-code.github.io/posts/2443192075/#more">next article</a>.</p>
<h1 id="Structure"><a href="#Structure" class="headerlink" title="Structure"></a>Structure</h1><p>Below is the strucutre of Transformer.</p>
<p><img data-src="/images/posts/NLP-series/transformer-2.png"
style="width: 70%; margin: 15px auto;"></p>
<p>The part <strong>on the left of the figure is the Encoder, and the part on the right is the Decoder</strong>.<br>Notice that the structures of the two sides are actually quite similar.<br>The Encoder and Decoder each contain many blocks with the same layer structure, and each block has <strong>multi-head attention and a Feed Forward Network</strong>.</p>
<h2 id="Encoder"><a href="#Encoder" class="headerlink" title="Encoder"></a>Encoder</h2><p><img data-src="/images/posts/NLP-series/transformer-3.png"
style="width: 70%; margin: 15px auto;"></p>
<p>As just mentioned, the Encoder is divided into many blocks.<br>We <strong>first convert the whole input sequence into a row of vectors</strong>, and then each block processes them as follows:</p>
<ol>
<li>First, self-attention outputs a row of vectors, each produced after considering the information of all the input vectors. (I will introduce how self-attention considers all the input information later.)</li>
<li>Feed this row of vectors into the fully connected (FC) feed-forward network.</li>
<li>The final output vector is the output of the block.</li>
</ol>
<hr>
<p>However, what the block does in the original Transformer is more complicated, the details are as follows:<br><img data-src="/images/posts/NLP-series/transformer-4.png"
style="width: 70%; margin: 15px auto;"></p>
<p>Suppose we follow the method just described, and call the <strong>output of self-attention on the input vector $a$</strong>. We also need to <strong>take the original input (call it $b$)</strong> and add it to $a$ to get $a+b$. Such a network architecture is called a <strong>residual connection</strong>.</p>
<p>After that, we apply <strong>layer normalization</strong> to the result $a+b$. It calculates the mean $m$ and standard deviation $\sigma$ of the input vector and then normalizes according to the formula: subtract the mean $m$ from the input and divide by the standard deviation $\sigma$. Only then do we get <strong>the input of the FC network</strong>.</p>
<p>The <strong>FC network also has a residual architecture</strong>, so we add the input of the FC network to its output to get a new output and then do layer normalization again. This is the <strong>real output of a block in the Transformer Encoder</strong>.</p>
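<p>A minimal NumPy sketch of this data flow may help. It assumes <code>self_attention</code> and <code>feed_forward</code> are given functions, and it omits the learnable scale and bias that full layer normalization has:</p>
<pre><code class="python">import numpy as np

def layer_norm(x, eps=1e-6):
    # subtract the mean m and divide by the standard deviation sigma, per vector
    m = x.mean(axis=-1, keepdims=True)
    s = x.std(axis=-1, keepdims=True)
    return (x - m) / (s + eps)

def encoder_block(x, self_attention, feed_forward):
    a = self_attention(x)              # output of self-attention
    x = layer_norm(x + a)              # residual connection, then layer normalization
    f = feed_forward(x)                # FC feed-forward network
    return layer_norm(x + f)           # second residual + layer norm: the block output
</code></pre>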
<hr>
<p>Now, let’s look back at the structure diagram of Encoder:</p>
<p><img data-src="/images/posts/NLP-series/transformer-5.png"
style="width: 70%; margin: 15px auto;"></p>
<p>First, at the input, <strong>the input is converted into vectors through Embedding</strong>, and then <strong>positional encoding</strong> is added (because self-attention alone carries no positional information).</p>
<p>Next we see <strong>Multi-Head Attention</strong>, which is <strong>the self-attention block; Add&Norm means a residual connection plus layer normalization</strong>.</p>
<p>Finally, <strong>doing Add&Norm again after the FC feed-forward network gives the output of the whole block</strong>, and this block is <strong>repeated N times</strong>.</p>
<h2 id="Decoder"><a href="#Decoder" class="headerlink" title="Decoder"></a>Decoder</h2><p>Then let us look at the Decoder:</p>
<p><img data-src="/images/posts/NLP-series/transformer-2.png"
style="width: 70%; margin: 15px auto;"></p>
<p><strong>The sequence generated at the previous time step is fed in as input</strong>; it then goes through the same Embedding and Positional Encoding and enters the block repeated N times.<br>The difference is the extra <strong>“Masked” (note the red box) in the first Multi-Head Attention</strong>. What does it mean?</p>
<p>Masked means that the model will <strong>only pay attention to the part it has already generated</strong> and <strong>will not accidentally pay attention to words generated in the future</strong>. Since the output of the Decoder is generated <strong>one by one</strong>, it has no way to consider its future input. This may still sound a bit vague; I will make it clearer when I talk about self-attention in the next article.</p>
<p>After the block has been repeated N times, a <strong>Linear Layer and Softmax</strong> produce the output <strong>probability distribution</strong> we want; we can sample from this distribution, or take the value with the highest probability, to get the output sequence.</p>
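<p>As a rough sketch of this last step (the logits here are arbitrary numbers standing in for the linear layer’s output):</p>
<pre><code class="python">import numpy as np

def next_token(logits, greedy=True):
    p = np.exp(logits - logits.max())          # softmax: scores become probabilities
    p /= p.sum()
    if greedy:
        return int(np.argmax(p))               # take the most probable token
    return int(np.random.choice(len(p), p=p))  # or sample from the distribution

print(next_token(np.array([1.0, 3.0, 0.5])))   # prints 1
</code></pre>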
<p>The most critical piece, the self-attention mechanism itself, has not yet been explained in detail. Let’s introduce how it attends to the entire input sequence and performs parallel processing in the next article!</p>
<h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1><ul>
<li><span class="exturl" data-url="aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g/dj1lTWx4NWZGTm9ZYw==">3Blue1Brown<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9pdGhlbHAuaXRob21lLmNvbS50dy9hcnRpY2xlcy8xMDI4MDM5Mg==">iThome - Day 27 Transformer (Recommend)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9pdGhlbHAuaXRob21lLmNvbS50dy9hcnRpY2xlcy8xMDI4MTI0Mg==">iThome - Day 28 Self-Attention (Recommend)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9oYWNrbWQuaW8vQGFibGl1L0JrWG16REJtcg==">Transformer 李宏毅深度學習 (Recommend)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9zcGVlY2guZWUubnR1LmVkdS50dy9+aHlsZWUvbWwvbWwyMDIxLWNvdXJzZS1kYXRhL3NlcTJzZXFfdjkucGRm">Transformer 李宏毅老師簡報<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly93d3cueW91dHViZS5jb20vY2hhbm5lbC9VQzJnZ2p0dXVXdnhySEhIaWFESDFkbFE=">李宏毅老師YouTube channel<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzE3MDYuMDM3NjI=">Attention is all you need (paper)<i class="fa fa-external-link-alt"></i></span></li>
</ul>
]]></content>
<tags>
<tag>ML</tag>
<tag>AI</tag>
<tag>NLP</tag>
</tags>
</entry>
<entry>
<title>[NLP][ML] Adapters & LoRA</title>
<url>/posts/38266156/</url>
<content><![CDATA[<h1 id="Overview"><a href="#Overview" class="headerlink" title="Overview"></a>Overview</h1><p>In this article, I will provide an introduction to adapters and LoRA, including their <strong>definitions, purposes, and functions.</strong> I will also explore their various <strong>applications</strong> and, lastly, delve into the distinctions that set them apart(the <strong>differences between them</strong>).</p>
<span id="more"></span>
<h1 id="Adapter"><a href="#Adapter" class="headerlink" title="Adapter"></a>Adapter</h1><h2 id="What-are-Adapters"><a href="#What-are-Adapters" class="headerlink" title="What are Adapters?"></a>What are Adapters?</h2><p>According to <span class="exturl" data-url="aHR0cHM6Ly93d3cuYW5hbHl0aWNzdmlkaHlhLmNvbS9ibG9nLzIwMjMvMDQvdHJhaW5pbmctYW4tYWRhcHRlci1mb3Itcm9iZXJ0YS1tb2RlbC1mb3Itc2VxdWVuY2UtY2xhc3NpZmljYXRpb24tdGFzay8jOn46dGV4dD1BZGFwdGVycyUyMGFyZSUyMGxpZ2h0d2VpZ2h0JTIwYWx0ZXJuYXRpdmVzJTIwdG8sbW9kdWxhciUyMGFwcHJvYWNoJTIwdG8lMjB0cmFuc2ZlciUyMGxlYXJuaW5nLg==">this article<i class="fa fa-external-link-alt"></i></span><br>We can give the definition of adapters:</p>
<blockquote>
<p><strong>Adapters are lightweight alternatives to fully fine-tuned pre-trained models.</strong><br>Currently, <strong>adapters are implemented as small feedforward neural networks</strong> that are <strong>inserted between layers of a pre-trained model.</strong><br>They provide a <strong>parameter-efficient, computationally efficient, and modular approach</strong> to transfer learning. The following image shows added adapter.</p>
</blockquote>
<p>The image below clearly shows the usage flow of adapters<br>(Source: <span class="exturl" data-url="aHR0cHM6Ly9hZGFwdGVyaHViLm1sLw==">AdapterHub<i class="fa fa-external-link-alt"></i></span>)<br><img data-src="/images/posts/NLP-series/adapter.gif"
style="width: 70%; margin: 15px auto;"></p>
<blockquote>
<p>During training, <strong>all the weights of the pre-trained model are frozen</strong> such that only the adapter weights<br>are updated, resulting in <strong>modular knowledge representations</strong>. They can be easily <strong>extracted, interchanged,</strong><br><strong>independently distributed, and dynamically plugged</strong> into a language model. These properties highlight the<br>potential of adapters in advancing the NLP field astronomically.</p>
</blockquote>
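<p>A minimal PyTorch-style sketch may make this concrete. It follows the commonly used bottleneck design (project down, nonlinearity, project up, residual connection); the dimensions and names are illustrative rather than the API of any particular adapter library.</p>
<pre><code class="python">import torch.nn as nn

class Adapter(nn.Module):
    """A small feed-forward network inserted between frozen pre-trained layers."""
    def __init__(self, hidden_dim=768, bottleneck_dim=64):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)  # project down
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck_dim, hidden_dim)    # project back up

    def forward(self, x):
        # residual connection: the frozen model's signal passes through unchanged
        return x + self.up(self.act(self.down(x)))

# during training, the pre-trained weights stay frozen and only the adapters learn:
# for p in pretrained_model.parameters():
#     p.requires_grad = False
</code></pre>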
<h2 id="What-is-the-purpose-and-function-of-adapters"><a href="#What-is-the-purpose-and-function-of-adapters" class="headerlink" title="What is the purpose and function of adapters?"></a>What is the purpose and function of adapters?</h2><ol>
<li>Purpose:<br>Large Language Models are computationally expensive and memory-intensive. <strong>Fine-tuning the entire model for each specific task</strong> can be impractical due to resource constraints. Adapters provide a solution by allowing for more efficient and <strong>targeted modifications</strong> to the model for different tasks. This approach <strong>saves both computational power and memory</strong>, enabling the deployment of a single pre-trained model for multiple tasks.</li>
<li>Functions:<ol>
<li><p>Efficient Fine-tuning: Instead of fine-tuning the entire model, adapters enable <strong>fine-tuning only a small subset of parameters</strong> related to a specific task. This fine-tuning process is faster and requires fewer resources.</p>
</li>
<li><p>Task-specific Modifications: Adapters allow you to add <strong>task-specific layers or modifications</strong> to the pre-trained model without altering the core architecture. This makes it easier to adapt the model for various tasks like text classification, named entity recognition, sentiment analysis, etc.</p>
</li>
<li><p>Versatility: With adapters, a single pre-trained LLM can be <strong>adapted for a wide range of tasks.</strong> This versatility is beneficial in scenarios where deploying and maintaining separate models for each task might be impractical.</p>
</li>
<li><p>Interoperability: Adapters enable the combination of pre-trained models with task-specific modifications in a standardized way. This facilitates sharing, collaboration, and research in the NLP community.</p>
</li>
<li><p>Transfer Learning: Adapters enhance the <strong>effectiveness of transfer learning.</strong> Models pre-trained on <strong>large and diverse datasets can be fine-tuned on smaller</strong>, task-specific datasets using adapters, improving performance on specific tasks.</p>
</li>
<li><p>Incremental Updates: Adapters allow for easy updates to the model. Instead of retraining the entire model, only the adapters related to a specific task need to be fine-tuned when new data or requirements arise.</p>
</li>
</ol>
</li>
</ol>
<p>Overall, adapters are a <strong>mechanism that strikes a balance between the benefits of fine-tuning for specific tasks and the efficiency of reusing pre-trained LLMs.</strong> They enable the NLP community to leverage the power of these large models while tailoring them to a diverse set of applications.</p>
<h2 id="The-applications-of-adapters"><a href="#The-applications-of-adapters" class="headerlink" title="The applications of adapters"></a>The applications of adapters</h2><p>I list some practical applications that can use adapters to enhance.</p>
<ol>
<li><p>Efficient Task Adaptation: Adapters make it possible to fine-tune a pretrained model for specific tasks with minimal computational resources and time. This is particularly useful for industries that require quick adaptation to changing trends or requirements.</p>
</li>
<li><p>Multilingual Applications: Adapters can be used to enable a pretrained model to perform tasks in multiple languages. This is valuable for businesses operating in global markets.</p>
</li>
<li><p>Domain-Specific NLP: Adapting models with domain-specific adapters (e.g., medical, legal, financial) enhances their performance on tasks specific to those domains.</p>
</li>
<li><p>Personalization: Adapters can be used to personalize a general-purpose model for individual users or contexts, leading to more relevant and tailored responses.</p>
</li>
</ol>
<h1 id="LoRA-Low-Rank-Adaptation"><a href="#LoRA-Low-Rank-Adaptation" class="headerlink" title="LoRA (Low-Rank Adaptation)"></a>LoRA (Low-Rank Adaptation)</h1><h2 id="What-is-LoRA"><a href="#What-is-LoRA" class="headerlink" title="What is LoRA?"></a>What is LoRA?</h2><ul>
<li>Low-Rank Adaptation, or LoRA, is proposed, which <strong>freezes the pre-trained model weights</strong> and <strong>injects trainable rank decomposition matrices into each layer</strong> of the <strong>Transformer</strong> architecture, greatly <strong>reducing the number of trainable parameters</strong> for downstream tasks.</li>
</ul>
<p>And let’s see the full fine-tuning definition:</p>
<ul>
<li>Full fine-tuning LLM, which <strong>retrains all model parameters</strong>, becomes less feasible. Using GPT-3 175B as an example — deploying independent instances of fine-tuned models, each with 175B parameters, is prohibitively expensive.</li>
</ul>
<p>In <span class="exturl" data-url="aHR0cHM6Ly9iZHRlY2h0YWxrcy5jb20vMjAyMy8wNS8yMi93aGF0LWlzLWxvcmEv">this article<i class="fa fa-external-link-alt"></i></span>, it clearly compare the fine-tuning and LoRA approaches. And in <span class="exturl" data-url="aHR0cHM6Ly9zaC10c2FuZy5tZWRpdW0uY29tL2JyaWVmLXJldmlldy1sb3JhLWxvdy1yYW5rLWFkYXB0YXRpb24tb2YtbGFyZ2UtbGFuZ3VhZ2UtbW9kZWxzLWZhZjVkZGQ1ODAyZiM6fjp0ZXh0PUxvUkElMkMlMjBMb3clMkRSYW5rJTIwTExNJTIwRmluZSUyRFR1bmluZyUyQyUyMFJlZHVjZSUyMFJlcXVpcmVkJTIwTWVtb3J5JnRleHQ9TG93JTJEUmFuayUyMEFkYXB0YXRpb24lMkMlMjBvciUyMExvUkEsdHJhaW5hYmxlJTIwcGFyYW1ldGVycyUyMGZvciUyMGRvd25zdHJlYW0lMjB0YXNrcy4=">this article<i class="fa fa-external-link-alt"></i></span>, it dives more into the machanism of LoRA. If you wnat to know the full knowledge, you can refer to <span class="exturl" data-url="aHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIxMDYuMDk2ODU=">the paper of LoRA<i class="fa fa-external-link-alt"></i></span>.<br>In the next sections, I will introduce more ideas of LoRA based on these references. Note that I only organize the contents of these articles and add some mark and note on my own.</p>
<hr>
<h3 id="How-does-fine-tuning-LLMs-work"><a href="#How-does-fine-tuning-LLMs-work" class="headerlink" title="How does fine-tuning LLMs work?"></a>How does fine-tuning LLMs work?</h3><p>Open-source LLMs such as LLaMA,Vicuna are foundation models that <strong>have been pre-trained on hundreds of billions of words.</strong> Developers and machine learning engineers can download the model with the <strong>pre-trained weights</strong> and <strong>fine-tune it for downstream tasks</strong> such as <span class="exturl" data-url="aHR0cHM6Ly9iZHRlY2h0YWxrcy5jb20vMjAyMy8wMS8xNi93aGF0LWlzLXJsaGYv">instruction following<i class="fa fa-external-link-alt"></i></span>.</p>
<p>The model is provided input from the <strong>fine-tuning dataset</strong>. It then <strong>predicts the next tokens and compares its output with the ground truth.</strong> It then <strong>adjusts the weights(gradient)</strong> to correct its predictions. By doing this over and over, the LLM becomes fine-tuned to the downstream task.</p>
<p><img data-src="/images/posts/NLP-series/LLM-fintune.png"
style="width: 70%; margin: 15px auto;"></p>
<p>(Source: <span class="exturl" data-url="aHR0cHM6Ly9iZHRlY2h0YWxrcy5jb20vMjAyMy8wNS8yMi93aGF0LWlzLWxvcmEv">What is low-rank adaptation (LoRA)?<i class="fa fa-external-link-alt"></i></span>)</p>
<h3 id="The-idea-of-LoRA"><a href="#The-idea-of-LoRA" class="headerlink" title="The idea of LoRA"></a>The idea of LoRA</h3><p>Now, let’s make a small modification to the fine-tuning process. In this new method, we <strong>freeze the original weights of the model and don’t modify them during the fine-tuning process.</strong> Instead, we apply the modifications to a <strong>separate set of weights</strong> and we add their new values to the original parameters. Let’s call these two sets <strong>“pre-trained” and “fine-tuned” weights</strong>.</p>
<blockquote>
<p><strong>Separating the pre-trained and fine-tuned parameters is an important part of LoRA.</strong><br><img data-src="/images/posts/NLP-series/LoRA-1.png"
style="width: 70%; margin: 15px auto;"></p>
</blockquote>
<h4 id="Low-rank-adaptation"><a href="#Low-rank-adaptation" class="headerlink" title="Low-rank adaptation"></a>Low-rank adaptation</h4><p>Before moving on to LoRA, let’s think about our <strong>model parameters as very large matrices</strong>. If you remember your linear algebra class, <strong>matrices can form vector spaces</strong>. In this case, we’re talking about a <strong>very large vector space with many dimensions</strong> that models language.</p>
<p>Every matrix has a <strong>“rank”</strong>, which is <strong>the number of linearly independent columns it has</strong>. If a column is linearly independent, it means that <strong>it can’t be represented as a combination of other columns in the matrix</strong>. On the other hand, a dependent column is one that can be represented as a combination of one or more columns in the same matrix. You can remove dependent columns from a matrix without losing information.</p>
<p>LoRA, proposed in a <span class="exturl" data-url="aHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIxMDYuMDk2ODU=">paper<i class="fa fa-external-link-alt"></i></span> by researchers at Microsoft, suggests that when fine-tuning an LLM for a downstream task, <strong>you don’t need the full-rank weight matrix.</strong> They proposed that you could preserve most of the learning capacity of the model while <strong>reducing the dimension of the downstream parameters.</strong> (This is why it makes sense to separate the pre-trained and fine-tuned weights.)</p>
<hr>
<p><img data-src="/images/posts/NLP-series/LoRA-2.png"
style="width: 70%; margin: 15px auto;"></p>
<p>Basically, in LoRA, you create <strong>two downstream weight matrices</strong>. One <strong>transforms the input parameters from the original dimension to the low-rank dimension</strong>. And the second matrix <strong>transforms the low-rank data to the output dimensions of the original model</strong>.</p>
<p>During training, <strong>modifications are made to the LoRA parameters</strong>, which are now much fewer than the original weights. This is why they can be trained much faster and at a fraction of the cost of doing full fine-tuning. <strong>At inference time, the output of LoRA is added to the pre-trained parameters to calculate the final values.</strong></p>
<h4 id="More-detail"><a href="#More-detail" class="headerlink" title="More detail"></a>More detail</h4><p><img data-src="/images/posts/NLP-series/LoRA-3.png"
style="width: 70%; margin: 15px auto;"></p>
<ul>
<li>For a pre-trained weight matrix $W_0$, its update is constrained by representing the latter with a low-rank decomposition:</li>
</ul>
<p>$$\begin{equation}<br> W_0 + \Delta{W} = W_0 + BA<br>\end{equation}$$</p>
<ul>
<li>During training, <strong>$W_0$ is frozen</strong> and does not receive gradient updates, while <strong>A and B contain trainable parameters</strong>.</li>
<li>For $h=W_0x$, the modified forward pass yields:</li>
</ul>
<p>$$\begin{equation}<br> h = W_0x + \Delta{W}x = W_0x + BAx<br>\end{equation}$$</p>
<ul>
<li><p>A random <strong>Gaussian initialization is used for A</strong> and <strong>zero is used for B</strong>, so $\Delta{W}=BA$ is zero at the beginning of training. (The method of initializing the weights)</p>
</li>
<li><p>One of the advantages is that when deployed in production, we can <strong>explicitly compute and store $W=W_0+BA$</strong> and perform inference as usual. <strong>No additional latency</strong> compared to other methods, such as appending more layers.</p>
</li>
</ul>
<p>You can see the implementation and more detail of LoRA in <span class="exturl" data-url="aHR0cHM6Ly95b3V0dS5iZS9kQS1OaEN0cnJWRQ==">this video<i class="fa fa-external-link-alt"></i></span><br>and <span class="exturl" data-url="aHR0cHM6Ly9ibG9nLm1sNi5ldS9sb3ctcmFuay1hZGFwdGF0aW9uLWEtdGVjaG5pY2FsLWRlZXAtZGl2ZS03ODJkZWM5OTU3NzI=">this article<i class="fa fa-external-link-alt"></i></span>.</p>
<h2 id="What-is-the-purpose-and-function-of-LoRA"><a href="#What-is-the-purpose-and-function-of-LoRA" class="headerlink" title="What is the purpose and function of LoRA?"></a>What is the purpose and function of LoRA?</h2><ul>
<li>Purpose<ul>
<li>The purpose of LoRA is to make it easier and more efficient to fine-tune LLMs for downstream tasks.</li>
</ul>
</li>
<li>Function<ul>
<li>The function of LoRA is to decompose the LLM into a low-rank representation and then adapt this representation to the target task.</li>
<li>Here are the benefits:<ul>
<li>Reduced number of parameters: The low-rank representation has a much smaller number of parameters than the original LLM, which can make it faster to train and easier to deploy.</li>
<li>Improved performance: The low-rank representation is able to capture the most important features of the LLM, which can lead to improved performance on the downstream task.</li>
</ul>
</li>
</ul>
</li>
</ul>
<h2 id="The-applications-of-LoRA"><a href="#The-applications-of-LoRA" class="headerlink" title="The applications of LoRA"></a>The applications of LoRA</h2><ul>
<li>Fine-tuning large language models for downstream tasks:<br>LoRA can be used to fine-tune large language models (LLMs) for a variety of downstream tasks, such as question answering, summarization, and translation. This can <strong>make LLMs more accessible and easier to use for a wider range of applications.</strong></li>
<li>Improving the efficiency of machine learning models:<br> LoRA can be used to improve the efficiency of machine learning models by reducing the number of parameters. This can <strong>make models faster to train and easier to deploy</strong>.</li>
<li>Compressing large datasets:<br>LoRA can be used to <strong>compress large datasets by representing them in a low-rank format</strong>. This can make datasets easier to <strong>store and transmit</strong>.</li>
<li>Improving the security of machine learning models:<br>LoRA can be used to improve the security of machine learning models by making them more resistant to adversarial attacks.</li>
</ul>
<h1 id="Differences-between-adapter-and-LoRA"><a href="#Differences-between-adapter-and-LoRA" class="headerlink" title="Differences between adapter and LoRA"></a>Differences between adapter and LoRA</h1><table>
<thead>
<th>Feature</th>
<th>Adapter</th>
<th>LoRA</th>
</thead>
<tbody>
<tr>
<td>Approach</td>
<td>Adds additional layers to the pretrained model </td>
<td>Decomposes the pretrained model into a low-rank representation
</td>
</tr>
<tr>
<td>Parameters</td>
<td>Adds a small number of parameters to the pretrained model</td>
<td>Reduces the number of parameters in the pretrained model</td>
</tr>
<tr>
<td>Performance</td>
<td>Effective for a variety of downstream tasks</td>
<td>Particularly effective for tasks that require a large number of parameters</td>
</tr>
<tr>
<td>Speed</td>
<td>Can be faster to train than LoRA</td>
<td>Can be faster at inference time</td>
</tr>
<tr>
<td>Memory usage</td>
<td>Can use more memory than LoRA</td>
<td>Typically uses less memory than adapters</td>
</tr>
</tbody>
</table>
<p>In general, adapters are a good choice for tasks that require a small number of parameters and can be trained quickly, while LoRA is a good choice for tasks that require a large number of parameters and need to be fast at inference time.</p>
<h1 id="References"><a href="#References" class="headerlink" title="References"></a>References</h1><ul>
<li><span class="exturl" data-url="aHR0cHM6Ly93d3cuYW5hbHl0aWNzdmlkaHlhLmNvbS9ibG9nLzIwMjMvMDQvdHJhaW5pbmctYW4tYWRhcHRlci1mb3Itcm9iZXJ0YS1tb2RlbC1mb3Itc2VxdWVuY2UtY2xhc3NpZmljYXRpb24tdGFzay8jOn46dGV4dD1BZGFwdGVycyUyMGFyZSUyMGxpZ2h0d2VpZ2h0JTIwYWx0ZXJuYXRpdmVzJTIwdG8sbW9kdWxhciUyMGFwcHJvYWNoJTIwdG8lMjB0cmFuc2ZlciUyMGxlYXJuaW5nLg==">(Recommend) Training an Adapter for RoBERTa Model for Sequence Classification Task<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9hZGFwdGVyaHViLm1sLw==">AdapterHub<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9odWdnaW5nZmFjZS5jby9kb2NzL2RpZmZ1c2Vycy90cmFpbmluZy9sb3JhIzp+OnRleHQ9TG93JTJEUmFuayUyMEFkYXB0YXRpb24lMjBvZiUyMExhcmdlJTIwTGFuZ3VhZ2UlMjBNb2RlbHMlMjAoTG9SQSklMjBpcyx0cmFpbnMlMjB0aG9zZSUyMG5ld2x5JTIwYWRkZWQlMjB3ZWlnaHRzLg==">Low-Rank Adaptation of Large Language Models (LoRA) (Huggingface)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9zaC10c2FuZy5tZWRpdW0uY29tL2JyaWVmLXJldmlldy1sb3JhLWxvdy1yYW5rLWFkYXB0YXRpb24tb2YtbGFyZ2UtbGFuZ3VhZ2UtbW9kZWxzLWZhZjVkZGQ1ODAyZiM6fjp0ZXh0PUxvUkElMkMlMjBMb3clMkRSYW5rJTIwTExNJTIwRmluZSUyRFR1bmluZyUyQyUyMFJlZHVjZSUyMFJlcXVpcmVkJTIwTWVtb3J5JnRleHQ9TG93JTJEUmFuayUyMEFkYXB0YXRpb24lMkMlMjBvciUyMExvUkEsdHJhaW5hYmxlJTIwcGFyYW1ldGVycyUyMGZvciUyMGRvd25zdHJlYW0lMjB0YXNrcy4=">Brief Review — LoRA: Low-Rank Adaptation of Large Language Models<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9iZHRlY2h0YWxrcy5jb20vMjAyMy8wNS8yMi93aGF0LWlzLWxvcmEv">What is low-rank adaptation (LoRA)?<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzIxMDYuMDk2ODU=">LoRA: Low-Rank Adaptation of Large Language Models (Paper)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9ibG9nLm1sNi5ldS9sb3ctcmFuay1hZGFwdGF0aW9uLWEtdGVjaG5pY2FsLWRlZXAtZGl2ZS03ODJkZWM5OTU3NzI=">(Recommend) Low Rank Adaptation: A Technical Deep Dive<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly95b3V0dS5iZS9kQS1OaEN0cnJWRQ==">Low-rank Adaption of Large Language Models: Explaining the Key Concepts Behind LoRA (Video)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g/dj1VczVaRnAxNlBhVQ==">Fine-tuning LLMs with PEFT and LoRA<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9odWdnaW5nZmFjZS5jby9ibG9nL3BlZnQ=">PEFT: Parameter-Efficient Fine-Tuning of Billion-Scale Models on Low-Resource Hardware<i class="fa fa-external-link-alt"></i></span></li>
</ul>
]]></content>
<tags>
<tag>ML</tag>
<tag>AI</tag>
<tag>NLP</tag>
</tags>
</entry>
<entry>
<title>[NLP][ML] Transformer (3) - More computational detail</title>
<url>/posts/1255359463/</url>
<content><![CDATA[<h1 id="Overview"><a href="#Overview" class="headerlink" title="Overview"></a>Overview</h1><p>In this article, I will focus more on the computing detail in transformer.<br>It will cover self-attention, prallel processing, multi-head self-attention, positional encoding and so on.</p>
<p><img data-src="/images/posts/NLP-series/transformer-2.png"
style="width: 70%; margin: 15px auto;"></p>
<span id="more"></span>
<h1 id="Self-Attention"><a href="#Self-Attention" class="headerlink" title="Self-Attention"></a>Self-Attention</h1><h2 id="Idea"><a href="#Idea" class="headerlink" title="Idea"></a>Idea</h2><ul>
<li>Input:<ul>
<li>$x_1$, …, $x_4$ is a sequence.</li>
<li>Each input first goes through an <strong>embedding (conversion to a vector)</strong> and is multiplied by a weight matrix to become $a_1$, …, $a_4$. These $a_1$, …, $a_4$ are then passed into a self-attention layer.</li>
</ul>
</li>
<li>Each input is multiplied by different vectors:<ul>
<li>$q$: query (to match against others)<ul>
<li>$q_i$ = $W^qa_i$</li>
</ul>
</li>
<li>$k$: key (to be matched)<ul>
<li>$k_i$ = $W^ka_i$</li>
</ul>
</li>
<li>$v$: value, information to be extracted<ul>
<li>$v_i$ = $W^va_i$</li>
</ul>
</li>
</ul>
</li>
<li>The weights $W^q$, $W^k$, $W^v$ are <strong>learned, initially randomly initialized</strong>.</li>
</ul>
<h2 id="Method"><a href="#Method" class="headerlink" title="Method"></a>Method</h2><p><img data-src="/images/posts/NLP-series/transformer-6.png"
style="width: 70%; margin: 15px auto;"></p>
<p><img data-src="/images/posts/NLP-series/transformer-7.png"
style="width: 70%; margin: 15px auto;"></p>
<ol>
<li>Take each query <strong>$q$</strong> and perform attention against each key <strong>$k$</strong> (the two vectors produce a score, the attention score), which essentially computes the <strong>similarity of $q$ and $k$</strong>.<ul>
<li>Scaled Dot-Product: $S(q_1, k_1)$ yields $\alpha_{1,1}$, $S(q_1, k_2)$ yields $\alpha_{1,2}$, and so on.</li>
<li>$\alpha_{1,i}$ = $ q_1 \cdot k_i / \sqrt{d}$ </li>
<li>$d$ represents the <strong>dimensions of $q$ and $k$</strong>. This is a <strong>trick used by the authors in the paper</strong>.</li>
</ul>
</li>
<li>Then apply <strong>Softmax to normalize the values</strong>.</li>
<li>Multiply the resulting $\hat{\alpha}$ with $v$ to get $b$, which is <strong>equivalent to a weighted sum</strong>.</li>
<li>The $b_1$ obtained in the figure is the <strong>first vector (word or character)</strong> of the output sequence.</li>
<li>Each output vector incorporates information from the <strong>entire sequence</strong>.</li>
</ol>
<h1 id="Prallel-Processing"><a href="#Prallel-Processing" class="headerlink" title="Prallel Processing"></a>Prallel Processing</h1><p><img data-src="/images/posts/NLP-series/transformer-8.png"
style="width: 70%; margin: 15px auto;"></p>
<p>$$q_i = W^qa_i$$<br>$$k_i = W^ka_i$$<br>$$v_i = W^va_i$$</p>
<hr>
<p><img data-src="/images/posts/NLP-series/transformer-9.png"
style="width: 70%; margin: 15px auto;"></p>
<ol>
<li>Consider $a_1$, …, $a_4$ as a matrix $I$. Multiply it by the weight matrix $W^q$ to obtain $q_1$, …, $q_4$, forming another matrix $Q$. </li>
<li>The same process yields matrices $K$ and $V$. Next, multiply $k$ and $q$ to get<br>$\alpha_{1,1}$ = $k^T_1 \cdot q_1$,<br>$\alpha_{1,2}$ = $k^T_2 \cdot q_1$,<br>…<br>Stack $k_1$, …, $k_4$ to form matrix $K$, stack $q_1$, …, $q_4$ to form matrix $Q$, and their product is a matrix $A$ composed of the $\alpha$ values, which is the <strong>Attention</strong>. </li>
<li>After applying Softmax, it becomes $\hat{A}$. In each time step, attention exists between each pair of vectors.</li>
</ol>
<hr>
<p><img data-src="/images/posts/NLP-series/transformer-10.png"
style="width: 70%; margin: 15px auto;"></p>
<p>By calculating the <strong>weighted sum of $V$ and $\hat{A}$</strong>, you obtain $b$, and the matrix composed of the $b$ vectors forms the <strong>output matrix $O$</strong>.</p>
<hr>
<h2 id="What-self-attention-layer-do"><a href="#What-self-attention-layer-do" class="headerlink" title="What self-attention layer do"></a>What self-attention layer do</h2><p><img data-src="/images/posts/NLP-series/transformer-11-1.png"
style="width: 70%; margin: 15px auto;"></p>
<p><img data-src="/images/posts/NLP-series/transformer-11-2.png"
style="width: 70%; margin: 15px auto;"></p>
<p>By converting it into matrix multiplication, you can utilize the <strong>GPU to accelerate the computation</strong>.</p>
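<p>A small NumPy sketch of this matrix formulation, with arbitrary dimensions, may make it concrete:</p>
<pre><code class="python">import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv            # one matrix multiply per projection
    d = Q.shape[-1]
    A = Q @ K.T / np.sqrt(d)                    # all pairwise scores alpha at once
    A_hat = np.exp(A - A.max(axis=-1, keepdims=True))
    A_hat = A_hat / A_hat.sum(axis=-1, keepdims=True)  # row-wise softmax
    return A_hat @ V                            # weighted sum of values: output O

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                     # four input vectors a1..a4, dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)      # (4, 8)
</code></pre>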
<h1 id="Multi-head-Self-attention"><a href="#Multi-head-Self-attention" class="headerlink" title="Multi-head Self-attention"></a>Multi-head Self-attention</h1><p><img data-src="/images/posts/NLP-series/transformer-12.png"
style="width: 70%; margin: 15px auto;"></p>
<p>Taking 2 heads as an example:</p>
<ul>
<li>Having <strong>2 heads</strong> means splitting $q, k, v$ into two sets of $q, k, v$. And $q_{i,1}$ will only be multiplied with $k_{i,1}$ to obtain $\alpha_{i,1}$, finally calculating $b_{i,1}$. </li>
<li>Afterward, concatenate $b_{i,1}, b_{i,2}$, apply a transformation, and perform dimension reduction to obtain the final $b_i$.</li>
<li><strong>Each head focuses on different information</strong>; some only care about local information (neighborhood data), while others concentrate on global (long-term) information, and so on.</li>
</ul>
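<p>To make the two-head case concrete, here is a rough NumPy sketch; the output matrix <code>Wo</code> stands in for the final transformation that merges and reduces the per-head outputs:</p>
<pre><code class="python">import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def multi_head(X, Wq, Wk, Wv, Wo, heads=2):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    dh = Q.shape[-1] // heads                   # dimension of each head
    outs = []
    for h in range(heads):                      # q_{i,1} only meets k_{i,1}, and so on
        s = slice(h * dh, (h + 1) * dh)
        outs.append(softmax(Q[:, s] @ K[:, s].T / np.sqrt(dh)) @ V[:, s])
    # concatenate b_{i,1} and b_{i,2}, then transform back to the model dimension
    return np.concatenate(outs, axis=-1) @ Wo
</code></pre>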
<h1 id="Positional-Encoding"><a href="#Positional-Encoding" class="headerlink" title="Positional Encoding"></a>Positional Encoding</h1><p><img data-src="/images/posts/NLP-series/transformer-13.png"
style="width: 40%; margin: 15px auto;"></p>
<p>In the attention mechanism, the order of words in the input sentence doesn’t matter.</p>
<hr>
<p><img data-src="/images/posts/NLP-series/transformer-14.png"
style="width: 70%; margin: 15px auto;"></p>
<ul>
<li>Self-attention alone carries no positional information => therefore a <strong>unique position vector $e_i$</strong> is added for each position; it is not learned but set by humans.</li>
<li>Other methods: <strong>use a one-hot vector $p_i$</strong> appended to $x_i$ to denote its position.</li>
</ul>
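<p>One concrete hand-crafted choice is the sinusoidal encoding used in the original paper. A short sketch (assuming an even <code>d_model</code>):</p>
<pre><code class="python">import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe                                   # the e_i vectors, added to the embeddings
</code></pre>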
<h1 id="Seq2seq-with-Attention"><a href="#Seq2seq-with-Attention" class="headerlink" title="Seq2seq with Attention"></a>Seq2seq with Attention</h1><p><img data-src="/images/posts/NLP-series/transformer-15.png"
style="width: 70%; margin: 15px auto;"></p>
<p>The original seq2seq model consists of two RNNs, an Encoder and a Decoder, and can be applied to machine translation.</p>
<p>In the diagram above, the Encoder originally contained bidirectional RNNs, while the Decoder contained a unidirectional RNN. In the diagram below, <strong>both (the bi- and unidirectional RNNs) have been replaced with Self-Attention layers</strong>, achieving the same purpose and enabling <strong>parallel processing</strong>.</p>
<p><img data-src="/images/posts/NLP-series/transformer-16.png"
style="width: 70%; margin: 15px auto;"></p>
<h1 id="Look-into-the-detail-of-Transformer-Model"><a href="#Look-into-the-detail-of-Transformer-Model" class="headerlink" title="Look into the detail of Transformer Model"></a>Look into the detail of Transformer Model</h1><p><img data-src="/images/posts/NLP-series/transformer-2.png"
style="width: 70%; margin: 15px auto;"></p>
<p>Take Chinese-to-English translation as an example.</p>
<h2 id="Encoder-Part"><a href="#Encoder-Part" class="headerlink" title="Encoder Part:"></a>Encoder Part:</h2><ol>
<li>The input goes through <strong>Input Embedding</strong> and, to incorporate <strong>positional information</strong>, is augmented with the manually set Positional Encoding. It then enters the block that <strong>repeats N times</strong>.</li>
</ol>
<hr>
<p><img data-src="/images/posts/NLP-series/transformer-17.png"
style="width: 70%; margin: 15px auto;"></p>
<ol start="2">
<li>Multi-head:<br>Within the Encoder, it utilizes <strong>Multi-head Attention,</strong> which means there <strong>are multiple sets of $q$, $k$, $v$</strong>. Inside this mechanism, the individual $q$, $k$, $v$ multiplications with $a$ are performed, leading to the calculation of $\alpha$ and ultimately $b$.</li>
</ol>
<hr>
<p><img data-src="/images/posts/NLP-series/transformer-18.png"
style="width: 70%; margin: 15px auto;"></p>
<ol start="3">
<li>Add & Norm (residual connection):<br>The <strong>input</strong> of Multi-head Attention, denoted as <strong>$a$</strong>, is added to the <strong>output $b$</strong>, resulting in <strong>$b^\prime$</strong>. Following this, <strong>Layer Normalization</strong> is performed. </li>
<li>Once the calculations are completed, the result is passed through the <strong>feed-forward network</strong>, followed by another <strong>Add & Norm</strong> step.</li>
</ol>
<h2 id="Decoder-Part"><a href="#Decoder-Part" class="headerlink" title="Decoder Part"></a>Decoder Part</h2><p><img data-src="/images/posts/NLP-series/transformer-18-2.png"
style="width: 70%; margin: 15px auto;"></p>
<ol>
<li><p>The Decoder <strong>input is the output from the previous time step</strong>. It goes through output embedding, considering positional information, and is augmented with manually set positional encoding. It then enters the block that repeats n times.</p>
</li>
<li><p><strong>Masked Multi-head Attention</strong>:<br>Attention is performed, where <strong>“Masked” indicates attending only to the already generated sequence</strong> (a concrete sketch of the mask follows this list). This is followed by an Add & Norm layer.</p>
</li>
<li><p>Next, it undergoes a Multi-head Attention layer, attending to the <strong>previous output of the Encoder</strong>, followed by another Add & Norm layer.</p>
</li>
<li><p>After the computations, it is passed to the <strong>feed-forward network</strong>. Subsequently, <strong>Linear and Softmax</strong> operations are applied to generate the <strong>final output</strong>.</p>
</li>
</ol>
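<p>To make the mask in step 2 concrete, here is a minimal NumPy sketch of a causal mask: scores above the diagonal (the future positions) are set to negative infinity before the softmax, so they receive zero weight.</p>
<pre><code class="python">import numpy as np

n = 4                                                # length of the generated sequence
scores = np.random.default_rng(0).normal(size=(n, n))
future = np.triu(np.ones((n, n), dtype=bool), k=1)   # True above the diagonal
scores[future] = -np.inf                             # blocked before the softmax
w = np.exp(scores - scores.max(axis=-1, keepdims=True))
w = w / w.sum(axis=-1, keepdims=True)
print(np.round(w, 2))                                # row i attends only to positions up to i
</code></pre>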
<hr>
<p>Last but not least, here is the definition and purpose of the encoder and decoder (as given in the previous article):</p>
<blockquote>
<p>Encoder-Decoder Architecture:<br> The Transformer’s architecture is divided into an <strong>encoder and a decoder</strong>. The <strong>encoder processes the input sequence, capturing its contextual information</strong>, while the <strong>decoder generates the output sequence</strong>. This architecture is widely used in tasks like machine translation.</p>
</blockquote>
<h1 id="Attention-Visualization"><a href="#Attention-Visualization" class="headerlink" title="Attention Visualization"></a>Attention Visualization</h1><h2 id="single-head"><a href="#single-head" class="headerlink" title="single-head"></a>single-head</h2><p><img data-src="/images/posts/NLP-series/transformer-19.png"
style="width: 70%; margin: 15px auto;"></p>
<p>The relationships between words: the thicker the line, the more related the words.</p>
<h2 id="multi-head"><a href="#multi-head" class="headerlink" title="multi-head"></a>multi-head</h2><p><img data-src="/images/posts/NLP-series/transformer-20.png"
style="width: 70%; margin: 15px auto;"></p>
<p>The results obtained by pairing different sets of $q$ and $k$ vectors differ, indicating that each set holds a different type of information: some focus on local aspects (below) and others on global aspects (above).</p>
<h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1><ul>
<li><span class="exturl" data-url="aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g/dj1lTWx4NWZGTm9ZYw==">3Blue1Brown<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9pdGhlbHAuaXRob21lLmNvbS50dy9hcnRpY2xlcy8xMDI4MDM5Mg==">iThome - Day 27 Transformer (Recommend)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9pdGhlbHAuaXRob21lLmNvbS50dy9hcnRpY2xlcy8xMDI4MTI0Mg==">iThome - Day 28 Self-Attention (Recommend)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9oYWNrbWQuaW8vQGFibGl1L0JrWG16REJtcg==">Transformer 李宏毅深度學習 (Recommend)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9zcGVlY2guZWUubnR1LmVkdS50dy9+aHlsZWUvbWwvbWwyMDIxLWNvdXJzZS1kYXRhL3NlcTJzZXFfdjkucGRm">Transformer 李宏毅老師簡報<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly93d3cueW91dHViZS5jb20vY2hhbm5lbC9VQzJnZ2p0dXVXdnhySEhIaWFESDFkbFE=">李宏毅老師YouTube channel<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzE3MDYuMDM3NjI=">Attention is all you need (paper)<i class="fa fa-external-link-alt"></i></span></li>
</ul>
]]></content>
<tags>
<tag>ML</tag>
<tag>AI</tag>
<tag>NLP</tag>
</tags>
</entry>
<entry>
<title>[NLP][ML] Transformer (2) - Attention & Summary</title>
<url>/posts/2443192075/</url>
<content><![CDATA[<h1 id="Overview"><a href="#Overview" class="headerlink" title="Overview"></a>Overview</h1><p>Self-attention allows the model to <strong>weigh the importance of different parts of an input sequence against each other</strong>, capturing <strong>relationships</strong> and dependencies between elements within the sequence. This is particularly powerful for tasks involving sequential or contextual information, such as language translation, text generation, and more.</p>
<p>What Self-Attention aims to do is replace what an RNN can do.<br>Its input/output is the same as an RNN’s, and its biggest advantages are:</p>
<ul>
<li>Can parallelize operations</li>
<li>Each output vector has seen the entire input sequence, so there is no need to stack several layers as a CNN does.</li>
</ul>
<span id="more"></span>
<h1 id="Difference-between-attention-and-self-attention"><a href="#Difference-between-attention-and-self-attention" class="headerlink" title="Difference between attention and self-attention"></a>Difference between attention and self-attention</h1><p><strong>attention is a broader concept of selectively focusing on information</strong>, while <strong>self-attention is a specific implementation of this concept</strong> where elements within the same sequence are attended to. Self-attention is a fundamental building block of the Transformer architecture, allowing it to capture relationships and dependencies within sequences effectively.</p>
<h1 id="Attention-Machanism"><a href="#Attention-Machanism" class="headerlink" title="Attention (Machanism)"></a>Attention (Machanism)</h1><p>main idea:<br>use <strong>triples</strong><br>$$<Q,K,V>$$</p>
<p>Represents the <strong>attention mechanism</strong>, expresses the <strong>similarity between Query and Key</strong>, and then assigns the value of <strong>Value according to the similarity</strong><br><strong>formula</strong>:<br>$$Attention(Q,K,V) = softmax(\frac{QK^T}{\sqrt{d_k}})V$$</p>
<h1 id="Self-Attention-Layer-Key"><a href="#Self-Attention-Layer-Key" class="headerlink" title="Self-Attention(Layer) (Key)"></a>Self-Attention(Layer) (<strong>Key</strong>)</h1><h2 id="Computing-process"><a href="#Computing-process" class="headerlink" title="Computing process"></a>Computing process</h2><ol>
<li><p>Suppose the input is four vectors $a_1$~$a_4$; Self-Attention then outputs another row of vectors $b$, where each $b$ is generated after considering all of the $a$ vectors.</p>
</li>
<li><p>To compute $b_1$, the first step is to start from $a_1$ and find the <strong>other vectors in this sequence related to $a_1$</strong>. We use “$\alpha$” to represent the <strong>similarity</strong> of each vector to $a_1$.</p>
</li>
<li><p>It must be mentioned here that there are <strong>3 very important values in the Self-Attention mechanism</strong>: <strong>Query, Key, Value</strong>. They respectively represent <strong>the value used to match, the value to be matched, and the information to be extracted</strong>.</p>
</li>
<li><p>As for determining the <strong>correlation between two vectors</strong>, the most commonly used method is the <strong>dot product (here, the scaled dot product)</strong>. It takes two vectors as input and multiplies them with two different matrices. The left vector is multiplied by the matrix $W^q$ (Query matrix), and the right vector is multiplied by the matrix $W^k$ (Key matrix). The values of $W^q$ and $W^k$ are both <strong>randomly initialized and obtained through training</strong>.</p>
</li>
<li><p>Next, after obtaining the two vectors $q$ and $k$, the <strong>dot product is computed between them</strong>: summing the products of their elements yields a <strong>scalar (magnitude)</strong>. This scalar is represented as $\alpha$, which we consider as <strong>the degree of correlation between the two vectors.</strong></p>
</li>
</ol>
<p><img data-src="/images/posts/NLP-series/transformer-6.png"
style="width: 70%; margin: 15px auto;"></p>
<p>Next, we apply what was just introduced to Self-Attention.</p>
<ol>
<li>First, we calculate the relationships “$\alpha$” between $a_1$ and $a_2$, $a_3$, $a_4$ individually. <ul>
<li>We multiply $a_1$ by $W^q$ to obtain $q_1$. </li>
<li>Then, we multiply $a_2$, $a_3$, $a_4$ by $W^k$ respectively and compute the inner products to determine the relationship “$\alpha$” between $a_1$ and each vector. </li>
<li>Applying the Softmax function yields $\alpha’$. </li>
<li>With this $\alpha’$, we can extract crucial information from this sequence!</li>
</ul>
</li>
</ol>
<hr>
<p><img data-src="/images/posts/NLP-series/transformer-7.png"
style="width: 70%; margin: 15px auto;"></p>
<ol start="2">
<li>How to extract important information using $\alpha’$? The steps are as follows:</li>
</ol>
<ul>
<li>First, multiply $a_1$ ~ $a_4$ by $W^v$ to obtain new vectors, denoted as $v_1$, $v_2$, $v_3$ and $v_4$, respectively (where $W^v$ is the Value matrix).</li>
<li>Next, multiply each vector here, $v_1$ ~ $v_4$, by $\alpha’$, and then sum them to obtain the output $b_1$ (formula written in the top-right corner of the image).</li>
</ul>
<p>If a certain vector receives a higher score - for instance, if the relationship between $a_1$ and $a_2$ is strong, leading to a large value for $\alpha_{1,2}’$ - then after performing the weighted sum, the value of $b_1$ obtained could be very close to $v_2$.</p>
<p>Now that we know how to compute $b_1$, it naturally follows that we can obtain $b_2$, $b_3$, and $b_4$ using the same method. With this, we have completed the explanation of the internal computation process of Self-Attention.</p>
<p>Last but not least, the <strong>similarity matrix ($\alpha_{i,j}$)</strong> is exactly the <strong>attention</strong> in the self-attention layer, i.e., each element’s importance or relevance to the other elements in the same sequence.</p>
<h1 id="Summary"><a href="#Summary" class="headerlink" title="Summary"></a>Summary</h1><h2 id="Attention-Score-Weight-and-Output"><a href="#Attention-Score-Weight-and-Output" class="headerlink" title="Attention (Score, Weight and Output)"></a>Attention (Score, Weight and Output)</h2><ul>
<li>$ \text{Attention Score} = QK^T $</li>
<li>$ \text{Attention Weights} = softmax(\frac{\text{Attention Score}}{\sqrt{d_k}}) $</li>
<li>$ \text{Attention Output} = (\text{Attention Weights})V $</li>
</ul>
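<p>A tiny worked example with made-up numbers, following these three formulas step by step:</p>
<pre><code class="python">import numpy as np

Q = np.array([[1.0, 0.0]])                  # one query
K = np.array([[1.0, 0.0], [0.0, 1.0]])      # two keys
V = np.array([[10.0, 0.0], [0.0, 10.0]])    # two values
d_k = K.shape[-1]

score = Q @ K.T                             # Attention Score: [[1. 0.]]
w = np.exp(score / np.sqrt(d_k))
w = w / w.sum(axis=-1, keepdims=True)       # Attention Weights: about [[0.67 0.33]]
out = w @ V                                 # Attention Output: about [[6.7 3.3]]
print(score, w.round(2), out.round(2))
</code></pre>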
<h2 id="Key-Components-of-Transformers"><a href="#Key-Components-of-Transformers" class="headerlink" title="Key Components of Transformers"></a>Key Components of Transformers</h2><ol>
<li><p>Self-Attention Mechanism:<br>The core innovation of the Transformer is the self-attention mechanism, which allows the model to <strong>weigh the importance of different words in a sequence relative to each other</strong>. It computes attention scores for each word by considering its <strong>relationships with all other words</strong> in the same sequence. Self-attention enables capturing context and dependencies between words <strong>regardless of their distance</strong>.</p>
</li>
<li><p>Multi-Head Attention:<br>To capture different types of relationships, the Transformer employs multi-head attention. <strong>Multiple sets of self-attention mechanisms (attention heads) run in parallel</strong>, and their outputs are concatenated and linearly transformed to create a more comprehensive representation.</p>
</li>
<li><p>Positional Encodings:<br>Since the Transformer does not inherently understand <strong>the order of words in a sequence</strong> (unlike recurrent networks(RNN)), positional encodings are added to the input embeddings. These encodings provide <strong>information about the positions of words within the sequence</strong>.</p>
</li>
<li><p>Encoder-Decoder Architecture:<br>The Transformer’s architecture is divided into an <strong>encoder and a decoder</strong>. The <strong>encoder processes the input sequence, capturing its contextual information</strong>, while the <strong>decoder generates the output sequence</strong>. This architecture is widely used in tasks like machine translation.</p>
</li>
<li><p>Residual Connections and Layer Normalization:<br>To address the <strong>vanishing gradient problem</strong>, residual connections (skip connections) are used around each sub-layer in the encoder and decoder. Layer normalization is also applied to <strong>stabilize</strong> the training process.</p>
</li>
<li><p>Position-wise Feed-Forward Networks:<br>After the self-attention layers, each position’s representation is passed through a position-wise feed-forward neural network, which <strong>adds non-linearity to the model</strong>.</p>
</li>
<li><p>Scaled Dot-Product Attention:<br>The self-attention mechanism involves computing the <strong>dot product of query, key, and value vectors</strong>. To control the scale of the dot products and <strong>avoid large gradients</strong>, the dot products are divided by the <strong>square root of the dimension of the key vectors</strong>.</p>
</li>
<li><p>Masked Self-Attention in Decoding:<br>During decoding, the self-attention mechanism is modified to <strong>ensure that each position can only attend to previous positions</strong>. This masking <strong>prevents the model from “cheating” by looking ahead in the output sequence</strong>.</p>
</li>
<li><p>Transformer Variants:<br> The original Transformer model has inspired various extensions and improvements, such as BERT, GPT, and more. BERT focuses on pretraining language representations, while GPT is designed for autoregressive text generation.</p>
</li>
</ol>
<p>The Transformer architecture has become the foundation for many state-of-the-art NLP models due to its ability to <strong>capture context, parallelize computations, and handle long-range dependencies effectively</strong>. It has led to significant advancements in machine translation, text generation, sentiment analysis, and various other NLP tasks.</p>
<h1 id="Reference"><a href="#Reference" class="headerlink" title="Reference"></a>Reference</h1><ul>
<li><span class="exturl" data-url="aHR0cHM6Ly93d3cueW91dHViZS5jb20vd2F0Y2g/dj1lTWx4NWZGTm9ZYw==">3Blue1Brown<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9pdGhlbHAuaXRob21lLmNvbS50dy9hcnRpY2xlcy8xMDI4MDM5Mg==">iThome - Day 27 Transformer (Recommend)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9pdGhlbHAuaXRob21lLmNvbS50dy9hcnRpY2xlcy8xMDI4MTI0Mg==">iThome - Day 28 Self-Attention (Recommend)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9oYWNrbWQuaW8vQGFibGl1L0JrWG16REJtcg==">Transformer 李宏毅深度學習 (Recommend)<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9zcGVlY2guZWUubnR1LmVkdS50dy9+aHlsZWUvbWwvbWwyMDIxLWNvdXJzZS1kYXRhL3NlcTJzZXFfdjkucGRm">Transformer 李宏毅老師簡報<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly93d3cueW91dHViZS5jb20vY2hhbm5lbC9VQzJnZ2p0dXVXdnhySEhIaWFESDFkbFE=">李宏毅老師YouTube channel<i class="fa fa-external-link-alt"></i></span></li>
<li><span class="exturl" data-url="aHR0cHM6Ly9hcnhpdi5vcmcvYWJzLzE3MDYuMDM3NjI=">Attention is all you need (paper)<i class="fa fa-external-link-alt"></i></span></li>
</ul>
]]></content>
<tags>
<tag>ML</tag>
<tag>AI</tag>
<tag>NLP</tag>
</tags>
</entry>
<entry>
<title>AWS Global Infrastructure</title>
<url>/posts/3777980008/</url>
<content><![CDATA[<h1 id="Exploring-the-AWS-Global-Infrastructure"><a href="#Exploring-the-AWS-Global-Infrastructure" class="headerlink" title="Exploring the AWS Global Infrastructure"></a>Exploring the AWS Global Infrastructure</h1><h2 id="Introduction"><a href="#Introduction" class="headerlink" title="Introduction"></a>Introduction</h2><p>In today’s digital landscape, ensuring the fault tolerance, stability, and high availability of applications is paramount. Amazon Web Services (AWS) provides a robust global infrastructure designed to meet these needs. This article explores the key components of the AWS global infrastructure, including Regions, Availability Zones, Local Zones, and Points of Presence.</p>
<span id="more"></span>
<h2 id="AWS-Regions-Geographical-Isolation-for-Fault-Tolerance"><a href="#AWS-Regions-Geographical-Isolation-for-Fault-Tolerance" class="headerlink" title="AWS Regions: Geographical Isolation for Fault Tolerance"></a>AWS Regions: Geographical Isolation for Fault Tolerance</h2><h3 id="Isolation-and-Data-Residency"><a href="#Isolation-and-Data-Residency" class="headerlink" title="Isolation and Data Residency"></a>Isolation and Data Residency</h3><p>AWS Regions are isolated geographic areas that enhance fault tolerance and stability. Each Region operates independently, and resources are not automatically replicated across Regions. This design ensures that data stored in one Region stays within that Region unless explicitly replicated, aiding compliance with regulatory requirements and optimizing network latency.</p>
<h3 id="Service-Availability"><a href="#Service-Availability" class="headerlink" title="Service Availability"></a>Service Availability</h3><p>Not all AWS services are available in every Region. To check which services are offered in a specific Region, you can refer to the <span class="exturl" data-url="aHR0cHM6Ly9hd3MuYW1hem9uLmNvbS9hYm91dC1hd3MvZ2xvYmFsLWluZnJhc3RydWN0dXJlL3JlZ2lvbmFsLXByb2R1Y3Qtc2VydmljZXMv">AWS Region Table<i class="fa fa-external-link-alt"></i></span>.</p>
<h2 id="Availability-Zones-Building-Blocks-of-Resilient-Applications"><a href="#Availability-Zones-Building-Blocks-of-Resilient-Applications" class="headerlink" title="Availability Zones: Building Blocks of Resilient Applications"></a>Availability Zones: Building Blocks of Resilient Applications</h2><h3 id="Structure-and-Fault-Isolation"><a href="#Structure-and-Fault-Isolation" class="headerlink" title="Structure and Fault Isolation"></a>Structure and Fault Isolation</h3><p>Each AWS Region consists of multiple Availability Zones (AZs). An AZ includes one or more data centers designed to be independent failure zones. These zones are physically separated, reducing the risk of simultaneous failure due to localized events.</p>
<h3 id="Power-and-Connectivity"><a href="#Power-and-Connectivity" class="headerlink" title="Power and Connectivity"></a>Power and Connectivity</h3><p>Availability Zones have their own power supplies and networking connections, further enhancing fault isolation. AWS recommends distributing applications across multiple AZs to achieve high availability and resilience.</p>
<h2 id="Local-Zones-Reducing-Latency-for-End-Users"><a href="#Local-Zones-Reducing-Latency-for-End-Users" class="headerlink" title="Local Zones: Reducing Latency for End-Users"></a>Local Zones: Reducing Latency for End-Users</h2><h3 id="Purpose-and-Use-Cases"><a href="#Purpose-and-Use-Cases" class="headerlink" title="Purpose and Use Cases"></a>Purpose and Use Cases</h3><p>AWS Local Zones extend AWS Regions by bringing services closer to large population centers. This reduces latency for end-users, making them ideal for applications requiring real-time processing, such as media content creation and gaming.</p>
<h3 id="Supported-Services-and-Connectivity"><a href="#Supported-Services-and-Connectivity" class="headerlink" title="Supported Services and Connectivity"></a>Supported Services and Connectivity</h3><p>Local Zones support a variety of AWS services, including Amazon EC2, Amazon VPC, and Amazon EBS. They provide a high-bandwidth, secure connection to other AWS services in the Region, ensuring seamless integration and performance.</p>
<h2 id="Data-Centers-The-Backbone-of-AWS-Infrastructure"><a href="#Data-Centers-The-Backbone-of-AWS-Infrastructure" class="headerlink" title="Data Centers: The Backbone of AWS Infrastructure"></a>Data Centers: The Backbone of AWS Infrastructure</h2><h3 id="High-Availability-and-Redundancy"><a href="#High-Availability-and-Redundancy" class="headerlink" title="High Availability and Redundancy"></a>High Availability and Redundancy</h3><p>AWS data centers are the physical locations where data resides and processing occurs. Designed with high availability in mind, they use custom network equipment and protocols. Core applications are deployed in an N+1 configuration, ensuring load balancing and failover capabilities.</p>
<h2 id="Points-of-Presence-Enhancing-Content-Delivery"><a href="#Points-of-Presence-Enhancing-Content-Delivery" class="headerlink" title="Points of Presence: Enhancing Content Delivery"></a>Points of Presence: Enhancing Content Delivery</h2><h3 id="Content-Delivery-Network-CDN"><a href="#Content-Delivery-Network-CDN" class="headerlink" title="Content Delivery Network (CDN)"></a>Content Delivery Network (CDN)</h3><p>AWS uses Points of Presence (PoPs) to deliver content with low latency through services like Amazon CloudFront and Amazon Route 53. These PoPs include Edge Locations and Regional Edge Caches, which cache content closer to users, improving performance and reducing the load on origin servers.</p>
<h2 id="Key-Takeaways"><a href="#Key-Takeaways" class="headerlink" title="Key Takeaways"></a>Key Takeaways</h2><ul>
<li><strong>Regions</strong>: Choose based on compliance and latency requirements.</li>
<li><strong>Availability Zones</strong>: Utilize multiple AZs for fault isolation and redundancy.</li>
<li><strong>Local Zones</strong>: Reduce latency for latency-sensitive applications.</li>
<li><strong>Points of Presence</strong>: Enhance content delivery and performance.</li>
</ul>
<p>By leveraging the AWS global infrastructure, you can build robust, high-performing, and resilient applications that meet your business needs.</p>
<h2 id="Conclusion"><a href="#Conclusion" class="headerlink" title="Conclusion"></a>Conclusion</h2><p>Understanding the components of the AWS global infrastructure is crucial for designing effective cloud solutions. By strategically utilizing Regions, Availability Zones, Local Zones, and Points of Presence, you can optimize your applications for performance, reliability, and compliance.</p>
<p>For more information on AWS global infrastructure, visit the <span class="exturl" data-url="aHR0cHM6Ly9hd3MuYW1hem9uLmNvbS9hYm91dC1hd3MvZ2xvYmFsLWluZnJhc3RydWN0dXJlLw==">AWS Global Infrastructure page<i class="fa fa-external-link-alt"></i></span>.</p>
]]></content>
<tags>
<tag>cloud</tag>
<tag>aws</tag>
</tags>
</entry>
<entry>
<title>Understanding CIDR - A Guide to Classless Inter-Domain Routing</title>
<url>/posts/1789986274/</url>
<content><![CDATA[<h1 id="Foreword"><a href="#Foreword" class="headerlink" title="Foreword"></a>Foreword</h1><p>Classless Inter-Domain Routing (CIDR) revolutionized IP address allocation and routing on the internet. By moving away from the rigid class-based system (Classes A, B, and C), CIDR introduced a more flexible and efficient method for managing IP spaces. This article delves into what CIDR is, how it works, and how to compute CIDR blocks for network planning.</p>
<span id="more"></span>
<h1 id="What-is-CIDR"><a href="#What-is-CIDR" class="headerlink" title="What is CIDR?"></a>What is CIDR?</h1><p>CIDR stands for <strong>Classless Inter-Domain Routing</strong>. It’s a method used for <strong>allocating IP addresses and routing Internet Protocol packets</strong>. CIDR allows for variable-length subnet masking which enables a more efficient allocation of IP addresses. It’s designed to replace the older system based on classes (A, B, C) to improve address space allocation and enhance routing scalability on the internet.</p>
<h1 id="Key-Concepts-of-CIDR"><a href="#Key-Concepts-of-CIDR" class="headerlink" title="Key Concepts of CIDR"></a>Key Concepts of CIDR</h1><ul>
<li>IP Address: A unique numerical label assigned to each device connected to a network that uses the Internet Protocol for communication (e.g., 192.168.1.0).</li>
<li>Subnet Mask: Defines a <strong>range</strong> of IP addresses considered to be in <strong>the same network segment</strong> (e.g., the mask 255.255.255.0 is represented as /24 in CIDR).</li>
<li>CIDR Notation: A compact representation of an IP address and its associated routing prefix, in a format like 192.168.1.0/24 (see the sketch after this list).</li>
</ul>
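<p>Python’s standard-library <code>ipaddress</code> module makes these three concepts concrete; a minimal sketch:</p>
<pre><code>import ipaddress

# CIDR notation bundles an address range with its prefix length.
network = ipaddress.ip_network("192.168.1.0/24")

print(network.network_address)  # 192.168.1.0
print(network.netmask)          # 255.255.255.0
print(network.prefixlen)        # 24
</code></pre>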
<h1 id="How-CIDR-Works"><a href="#How-CIDR-Works" class="headerlink" title="How CIDR Works"></a>How CIDR Works</h1><p>CIDR introduces flexibility in the allocation of IP addresses by <strong>varying the length of the subnet portion of the address</strong>. Unlike the fixed subnet masks of the class-based system, CIDR notation allows the network boundary to be set anywhere, enabling both smaller and larger blocks of addresses to be allocated as needed.</p>
<h1 id="Computing-CIDR"><a href="#Computing-CIDR" class="headerlink" title="Computing CIDR"></a>Computing CIDR</h1><p>To compute a CIDR block, you need the starting IP address and the size of the network (i.e., how many addresses you need).</p>
<h2 id="Example"><a href="#Example" class="headerlink" title="Example"></a>Example</h2><p>Suppose you have an IP address of 192.168.1.0 and need to support 254 devices. You would start with the base IP address 192.168.1.0 and then use a <strong>subnet mask that supports 254 devices</strong>. The CIDR block 192.168.1.0/24 uses a subnet mask of <strong>255.255.255.0</strong>, allowing for 256 addresses total (the last part bits), which after accounting for the <strong>network and broadcast addresses</strong>, leaves 254 usable addresses for devices.</p>
<h2 id="Calculating-Subnets-and-Hosts"><a href="#Calculating-Subnets-and-Hosts" class="headerlink" title="Calculating Subnets and Hosts"></a>Calculating Subnets and Hosts</h2><ul>
<li>Subnets: The number of available subnets depends on how many bits are borrowed from the host portion for subnetting; borrowing <em>n</em> bits yields 2<sup>n</sup> subnets, so each additional bit doubles the count.</li>
<li>Hosts: The number of usable host addresses in a subnet is given by the formula below.</li>
</ul>
<p>$$ 2^{(32 - \text{subnet mask length})} - 2 $$</p>
<p>The subtraction of 2 accounts for the network and broadcast addresses, which cannot be assigned to hosts.</p>
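<p>Both counts can be verified in code. A sketch that borrows one bit from a /24 to create two /25 subnets:</p>
<pre><code>import ipaddress

network = ipaddress.ip_network("192.168.1.0/24")

# Borrowing 1 bit yields 2**1 = 2 subnets.
for subnet in network.subnets(prefixlen_diff=1):
    usable = subnet.num_addresses - 2  # 2**(32 - 25) - 2 = 126
    print(subnet, "usable hosts:", usable)
</code></pre>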
<h1 id="Extra"><a href="#Extra" class="headerlink" title="Extra"></a>Extra</h1><h2 id="Network-Address"><a href="#Network-Address" class="headerlink" title="Network Address"></a>Network Address</h2><blockquote>
<p>The network address represents the start of an IP address range assigned to a network. It is used to identify the network itself.</p>
</blockquote>
<p>The network address is calculated by applying the subnet mask to any IP address within the network, resulting in the lowest possible address in the range. In binary terms, the network address is formed by performing a bitwise AND operation between any IP address in the network and the subnet mask. This address is not assignable to any individual device within the network because it is used to identify the network as a whole.</p>
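<p>The bitwise AND described above can be reproduced directly on the raw 32-bit integers; a minimal sketch:</p>
<pre><code>import ipaddress

ip   = int(ipaddress.ip_address("192.168.1.57"))
mask = int(ipaddress.ip_address("255.255.255.0"))

# IP AND mask zeroes out the host bits, leaving the network address.
print(ipaddress.ip_address(ip & mask))  # 192.168.1.0
</code></pre>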
<h2 id="Broadcast-Address"><a href="#Broadcast-Address" class="headerlink" title="Broadcast Address"></a>Broadcast Address</h2><blockquote>
<p>The broadcast address is the last address in a network range and is used to send data to all devices within that network. </p>
</blockquote>
<p>When a packet is sent to the broadcast address, it is delivered to all hosts in the network rather than a single recipient. The broadcast address is determined by inverting the subnet mask (flipping every bit, so 255.255.255.0 becomes 0.0.0.255) and performing a bitwise OR operation with the network address. Like the network address, the broadcast address is not assignable to any device, as its purpose is to facilitate the broadcasting of messages to all devices on the network.</p>
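<p>The same integer trick yields the broadcast address: invert the mask within 32 bits, then OR it with the network address. A minimal sketch:</p>
<pre><code>import ipaddress

network_address = int(ipaddress.ip_address("192.168.1.0"))
mask            = int(ipaddress.ip_address("255.255.255.0"))

# Inverting 255.255.255.0 within 32 bits gives 0.0.0.255; ORing it
# with the network address sets every host bit to 1.
wildcard = mask ^ 0xFFFFFFFF
print(ipaddress.ip_address(network_address | wildcard))  # 192.168.1.255
</code></pre>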
<h2 id="Subnet-Mask"><a href="#Subnet-Mask" class="headerlink" title="Subnet Mask"></a>Subnet Mask</h2><p>A subnet mask is a 32-bit number that masks an IP address and divides the IP address into network address and host address. Subnet masks are made up of two parts:</p>
<ol>
<li>The network part, which identifies a particular network and is represented by the binary 1s in the mask.</li>
<li>The host part, which identifies a specific device (host) on that network and is represented by the binary 0s in the mask.</li>
</ol>
<p>For example, in the subnet mask 255.255.255.0 or in CIDR notation /24, the first 24 bits are the network part (all 1s in binary), and the last 8 bits are the host part (all 0s in binary). This means any IP address with the same first 24 bits belongs to the same network, and the last 8 bits can vary to represent different devices within that network.</p>
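<p>Printing the mask in binary makes the two parts visible; a small sketch:</p>
<pre><code>import ipaddress

mask = int(ipaddress.ip_address("255.255.255.0"))

# 24 ones (network part) followed by 8 zeros (host part).
print(format(mask, "032b"))  # 11111111111111111111111100000000
</code></pre>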
<h3 id="Detailed-Example"><a href="#Detailed-Example" class="headerlink" title="Detailed Example"></a>Detailed Example</h3><p>Let’s consider the network 192.168.1.0/24:</p>
<ul>
<li>IP Address Range: 192.168.1.0 to 192.168.1.255</li>
<li>Subnet Mask: 255.255.255.0 or /24 in CIDR notation</li>
<li>Network Address: 192.168.1.0 (the first address in the range, represents the network itself)</li>