55. 1.3.1 Biopotentials and Action Potentials (8)
• The cell membrane of an axon can be modeled by the circuit shown in Figure 1.8.
• Here C_m represents the capacitance of the cell membrane, while g_K, g_Cl, and g_Na represent how easily the corresponding ions pass through the membrane.
• g_K and g_Na are drawn as variable resistors to reflect a property of the membrane: its permeability to potassium and sodium ions changes over time.
57. 1.3.1 Biopotentials and Action Potentials (9)
• V_m denotes the membrane potential of the axon, and E_K, E_Cl, and E_Na denote the membrane potentials derived from the Nernst equation [1]. The Nernst equation is defined as:

    E_k = (RT / zF) ln(Ion_o / Ion_i)

where R is the gas constant, T is the absolute temperature, z is the valence of the ion, F is the Faraday constant, Ion_o is the ion concentration outside the cell membrane, and Ion_i is the ion concentration inside the cell membrane.
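As a quick check of the formula, here is a minimal sketch in Python that evaluates the Nernst potential for potassium at body temperature (the concentration values are typical textbook figures chosen for illustration, not values from this chapter):

```python
import math

R = 8.314      # gas constant, J/(mol*K)
F = 96485.0    # Faraday constant, C/mol

def nernst_potential(ion_out, ion_in, z, T=310.0):
    """Nernst potential E = (RT / zF) * ln(Ion_o / Ion_i), in volts."""
    return (R * T) / (z * F) * math.log(ion_out / ion_in)

# Example: illustrative potassium concentrations, mmol/L
E_K = nernst_potential(ion_out=5.0, ion_in=140.0, z=1)
print(f"E_K = {E_K * 1000:.1f} mV")   # roughly -89 mV
```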
66. 1.4 The Artificial Neuron Model (4)
The input-output relation of an artificial neuron can be described by the following equations:

    u_j = Σ_{i=1}^{p} w_ji x_i    (1.3)
    y_j = φ(u_j − θ_j)    (1.4)

where
w_ji is the synaptic weight from the i-th input to the j-th neuron;
θ_j is the threshold of the neuron;
x = (x_1, …, x_p)^T is the p-dimensional input;
u_j is the total input received by the j-th neuron, whose physical meaning is the membrane potential at the axon hillock;
φ(·) is the activation function;
y_j is the output of the neuron, i.e., the firing rate.
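A minimal sketch of Eqs. (1.3)-(1.4) in Python; the hard limiter used for φ is one of the activation functions introduced two slides below, and the weights and inputs are made-up values:

```python
import numpy as np

def neuron_output(w, x, theta, phi):
    """Eqs. (1.3)-(1.4): u_j = sum_i w_ji * x_i, then y_j = phi(u_j - theta_j)."""
    u = np.dot(w, x)
    return phi(u - theta)

hard_limiter = lambda v: 1.0 if v >= 0 else 0.0

w = np.array([0.5, -0.3, 0.8])   # example synaptic weights
x = np.array([1.0, 2.0, 0.5])    # example input vector
print(neuron_output(w, x, theta=0.2, phi=hard_limiter))  # -> 1.0
```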
67. 1.4 The Artificial Neuron Model (5)
If we let w_j0 represent θ_j, the above equations can be rewritten as:

    v_j = Σ_{i=0}^{p} w_ji x_i = w_j^T x    (1.5)

and

    y_j = φ(v_j)    (1.6)

where w_j = [w_j0, w_j1, …, w_jp]^T and x = [−1, x_1, x_2, …, x_p]^T.
68. 1.4 The Artificial Neuron Model (6)
Four types of activation functions are commonly used, including the following two:
• Hard limiter (threshold function):

    φ(v) = 1 if v ≥ 0
           0 if v < 0

Figure 1.12: the hard-limiter function.
• Piecewise linear function:

    φ(v) = 1   if v > v_1
           cv  if v_2 ≤ v ≤ v_1
           0   if v < v_2

Figure 1.13: the piecewise linear function.
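Both functions are easy to express in vectorized code; a small sketch follows, where v_1, v_2, and c are free parameters (the slide does not fix them; c is usually chosen so that c·v_1 = 1, making the segments join):

```python
import numpy as np

def hard_limiter(v):
    """Threshold function: 1 for v >= 0, else 0."""
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v, v1=2.0, v2=0.0, c=0.5):
    """1 above v1, 0 below v2, linear (c*v) in between."""
    return np.where(v > v1, 1.0, np.where(v < v2, 0.0, c * v))

v = np.linspace(-2, 2, 9)
print(hard_limiter(v))
print(piecewise_linear(v))
```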
79. 1.7 Learning Rules of Neural Networks (3)
A general learning rule can be described mathematically as:

    w_ji(n + 1) = w_ji(n) + Δw_ji(n)

where
w_ji(n) and w_ji(n + 1) are the weights before and after the adjustment, respectively;
Δw_ji(n) is the change the neuron must make, after being stimulated, in order to achieve the learning effect.
This change Δw_ji(n) is usually some function of (1) the current input x_i(n), (2) the current weight w_ji(n), and (3) the desired output d_j (absent in unsupervised learning).
80. 1.7.1 The Hebbian Learning Rule
• The neuropsychologist Hebb wrote in his book [7]:
  When an axon of cell A is near enough to excite cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased.
• This gives the following learning rule:

    w_ji(n + 1) = w_ji(n) + F(y_j(n), x_i(n))    (1.14)

This Hebbian rule is a feedforward, unsupervised learning rule. Its most commonly used form is:

    w_ji(n + 1) = w_ji(n) + η y_j(n) x_i(n)    (1.15)
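A minimal sketch of Eq. (1.15), assuming a linear neuron output for illustration (the rule itself does not fix φ):

```python
import numpy as np

def hebbian_update(w, x, y, eta=0.1):
    """Eq. (1.15): w_ji(n+1) = w_ji(n) + eta * y_j(n) * x_i(n)."""
    return w + eta * y * x

w = np.array([0.2, -0.1, 0.4])   # current weights (made-up values)
x = np.array([1.0, 0.5, -1.0])   # current input
y = float(np.dot(w, x))          # linear neuron output assumed
print(hebbian_update(w, x, y))
```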
81. 1.7.2 Error-Correction Learning Rules (1)
• The basic idea of error-correction learning is this: when the actual output y_j(n) of a neuron differs from the desired target d_j(n), their difference is defined as the error signal:

    e_j(n) = d_j(n) − y_j(n)

• A particular "cost function" is chosen to reflect the error signal as a physical quantity;
• The ultimate goal of error-correction learning is to adjust the weights so that the cost function keeps decreasing, i.e., so that the actual output gets as close to the target as possible. The gradient descent method is generally used to search for a set of weights that minimizes the cost function.
82. 1.7.2 Error-Correction Learning Rules (2)
1. The Widrow-Hoff learning rule
The cost function is defined as:

    E = (1/2) Σ_j e_j²(n) = (1/2) Σ_j (d_j(n) − v_j(n))²
      = (1/2) Σ_j (d_j(n) − w_j^T(n) x(n))²    (1.18)

Gradient descent therefore gives:

    Δw_j(n) = −η ∂E/∂w_j(n)
            = η (d_j(n) − w_j^T(n) x(n)) x(n)    (1.19)
            = η (d_j(n) − v_j(n)) x(n)

This learning rule is sometimes also called the least mean square (LMS) algorithm.
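A minimal LMS training loop in Python, using toy data; the input vector includes the fixed −1 threshold component from Eq. (1.5), and w_true is a made-up target:

```python
import numpy as np

def lms_step(w, x, d, eta=0.05):
    """Eq. (1.19): w <- w + eta * (d - w^T x) * x."""
    return w + eta * (d - np.dot(w, x)) * x

rng = np.random.default_rng(0)
w_true = np.array([0.5, 1.0, -2.0])          # hypothetical target weights
w = np.zeros(3)
for _ in range(2000):
    x = np.append(-1.0, rng.normal(size=2))  # x = [-1, x1, x2]
    d = np.dot(w_true, x)                    # noiseless desired output
    w = lms_step(w, x, d)
print(w)   # converges toward w_true
```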
83. 1.7.2 Error-Correction Learning Rules (3)
2. The Delta learning rule
Neural networks that use this learning rule adopt activation functions that are continuous and differentiable, and the cost function is defined as:

    E = (1/2) Σ_j e_j²(n) = (1/2) Σ_j (d_j(n) − y_j(n))²    (1.20)

Gradient descent therefore gives:

    Δw_j(n) = −η ∂E/∂w_j(n)    (1.21)
            = η (d_j(n) − y_j(n)) φ'(v_j(n)) x(n)

In fact, when φ(v_j(n)) = v_j(n), the Widrow-Hoff rule can be viewed as a special case of the Delta learning rule.
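A sketch of one Delta-rule step, assuming a logistic sigmoid for φ (the slide requires only that φ be continuous and differentiable):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def delta_step(w, x, d, eta=0.5):
    """Eq. (1.21): w <- w + eta * (d - y) * phi'(v) * x, with phi = sigmoid."""
    v = np.dot(w, x)
    y = sigmoid(v)
    dphi = y * (1.0 - y)    # sigmoid derivative, written in terms of its output
    return w + eta * (d - y) * dphi * x

w = delta_step(np.zeros(3), np.array([-1.0, 0.5, 1.0]), d=1.0)
print(w)
```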
84. 1.7.3 Competitive Learning
• Competitive learning is sometimes called "winner-take-all" learning.
Step 1: selecting the winner
Suppose the network has K neurons. If

    w_k^T(n) x(n) = max_{j=1,2,…,K} w_j^T(n) x(n)    (1.22)

then the k-th neuron is the winner.
Step 2: adjusting the weights

    Δw_j(n) = η (x(n) − w_j(n))  if j = k    (1.23)
              0                  if j ≠ k
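A compact sketch of Eqs. (1.22)-(1.23); the weight matrix and input are illustrative:

```python
import numpy as np

def winner_take_all_step(W, x, eta=0.1):
    """Step 1: winner k maximizes w_j^T x. Step 2: only row k moves toward x."""
    k = int(np.argmax(W @ x))
    W[k] += eta * (x - W[k])
    return k

W = np.random.default_rng(1).normal(size=(3, 2))  # K = 3 neurons, 2-D inputs
x = np.array([1.0, 0.5])
print(winner_take_all_step(W, x))
```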
108. The Network Algorithm
• From the input values {X} of the input units of a training example, compute the outputs {H} of the hidden units in the hidden layer as:

    H_k = f(net_k) = f(Σ_i W_ik X_i − θ_k)

• From the hidden-unit outputs {H}, compute the inferred outputs {Y} of the output units in the output layer as:

    Y_j = f(net_j) = f(Σ_k W_kj H_k − θ_j)
109. The Network Algorithm
• Error function:

    E = (1/2) Σ_j (T_j − Y_j)²

• The gradient steepest descent method:

    ΔW = −η ∂E/∂W

where η is the learning rate, which controls the step size of each weight update.
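A compact sketch of the forward pass and one steepest-descent step for this two-layer network. The sigmoid for f is an assumption (the slides leave f unspecified), the shapes are illustrative, and threshold updates are omitted for brevity:

```python
import numpy as np

f = lambda v: 1.0 / (1.0 + np.exp(-v))            # sigmoid assumed for f

def forward(X, W_ih, theta_h, W_ho, theta_o):
    """H_k = f(sum_i W_ik X_i - theta_k); Y_j = f(sum_k W_kj H_k - theta_j)."""
    H = f(X @ W_ih - theta_h)
    Y = f(H @ W_ho - theta_o)
    return H, Y

def descent_step(X, T, W_ih, theta_h, W_ho, theta_o, eta=0.5):
    """One step of Delta W = -eta * dE/dW for E = (1/2) sum_j (T_j - Y_j)^2."""
    H, Y = forward(X, W_ih, theta_h, W_ho, theta_o)
    delta_o = (T - Y) * Y * (1 - Y)               # output-layer local gradient
    delta_h = (delta_o @ W_ho.T) * H * (1 - H)    # backpropagated to hidden layer
    W_ho += eta * np.outer(H, delta_o)
    W_ih += eta * np.outer(X, delta_h)
    return 0.5 * np.sum((T - Y) ** 2)             # current error E
```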
117. Architecture of DSNN
Figure: the DSNN architecture — an input surface and an output surface, with hidden neurons deployed between them inside a virtual 3-D cube space of edge length Ld.
118. Architecture of DSNN
Figure: the DSNN layer diagram — input layer (X1, X2, …, Xx), hidden layer (N1, N2, …, Nn), and output layer (Y1, Y2, …, Yy), connected by weights such as wX1,1, w2,4, w4,Y1, w9,n, and wn,Yy.
119. The Wavelet-based Neural Network Classifier
Figure: overall flow of the classifier —
• Disturbance waveform → estimate the amplitude and subtract the estimated perfect waveform from the disturbance waveform → detection of amplitude-irregular disturbances.
• Wavelet transforms → detection of impulsive transient disturbances.
• Dynamic structural neural networks → detection of harmonic distortion and voltage flicker.
• Output the final result of detection.
120. The Wavelet
• This work utilizes the hierarchical wavelet transform technique to extract the time and frequency information, using the Daubechies wavelet transform with the 16-coefficient filter.
Figure: the four-scale hierarchical decomposition of G0(n).
121. The Neural Network Classifier
• The features detected and extracted from the wavelet transform are then fed into the DSNN to identify the types of PQ variations.
• The inputs of the DSNN are the standard deviations of the wavelet transform coefficients at each level of the hierarchical wavelet transform.
• The outputs of the DSNN are the types of disturbances along with their critical values.
122. Wavelet Transform
• Let f(t) denote the original time-domain signal. The continuous wavelet transform is defined as follows:

    CWT_f(a, b) = (1/√a) ∫_{−∞}^{∞} f(t) ψ((t − b)/a) dt

where ψ(t) represents the mother wavelet, a is the scale parameter, and b is the time-shift parameter.
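A direct numerical sketch of this definition, using a real Morlet-style mother wavelet chosen purely for illustration (the classifier itself uses Daubechies filters):

```python
import numpy as np

def psi(t):
    """Illustrative real Morlet-style mother wavelet."""
    return np.cos(5.0 * t) * np.exp(-t**2 / 2.0)

def cwt_point(f, t, a, b):
    """CWT_f(a,b) = (1/sqrt(a)) * integral of f(t)*psi((t-b)/a) dt (trapezoid rule)."""
    return np.trapz(f * psi((t - b) / a), t) / np.sqrt(a)

t = np.linspace(0, 1, 2048)
f = np.sin(2 * np.pi * 60 * t)     # 60 Hz test signal
print(cwt_point(f, t, a=0.02, b=0.5))
```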
123. Wavelet Transform
• The mother wavelet ψ(t) is a compactly supported function that must satisfy the following condition:

    ∫_{−∞}^{∞} ψ(t) dt = 0

• To satisfy the equation above, a wavelet is constructed so that it has a high order of vanishing moments. A wavelet has vanishing moments of order N if

    ∫_{−∞}^{∞} t^p ψ(t) dt = 0    for p = 0, 1, …, N−1
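A quick numerical illustration of the moment conditions, using the Haar wavelet (which has a single vanishing moment; it is not one of the wavelets used in this work):

```python
import numpy as np

def haar(t):
    """Haar mother wavelet: +1 on [0, 0.5), -1 on [0.5, 1), 0 elsewhere."""
    return np.where((t >= 0) & (t < 0.5), 1.0,
                    np.where((t >= 0.5) & (t < 1.0), -1.0, 0.0))

t = np.linspace(-1, 2, 300001)
print(np.trapz(haar(t), t))        # p = 0 moment: ~0 (vanishes)
print(np.trapz(t * haar(t), t))    # p = 1 moment: ~ -0.25 (does not vanish)
```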
124. Architecture of DSNN
• The distinct features of the DSNN: it can tune itself and adjust its learning capacity.
• The structure of the hidden layer of the network must be reconfigurable during the training process.
125. Architecture of DSNN
• The length of the edge of the virtual 3-D cube space is defined as follows:

    Ld = ρ × (10 × N)

where N is the total initial number of neurons in the network, (10 × N)³ is the space used for deploying the initial neurons, and ρ is the space reserve factor that preserves extra room for placing newly generated neurons.
• Typically, ρ is predetermined within an interval from 1.5 to 3, or the interval can be set according to experiments.
126. Model of Neurons
Figure: model of an input vector feeding into the hidden neurons (input neuron i → weight w_io, bias b_o → hidden neuron o, outputs y_i and y_o), and model of signal propagation between two hidden neurons (hidden neuron i → weight w_ij, bias b_j → hidden neuron j, outputs y_i and y_j).
127. Model of Neurons
The output of a hidden neuron is given by

    y_o(n) = φ_o( Σ_{i∈C} w_io(n) · y_i(n) + b_o(n) )

where y_o is the output of neuron o,
C denotes the index set of the input neurons,
n is the iteration number of the process,
w_io is the synaptic weight between neuron i and neuron o,
y_i is the input from neuron i,
b_o is the bias of neuron o, and
φ_o is the activation function.
128. Supervised Training of Output Neurons
The output error is defined as follows:

    e_o(n) = d_o(n) − y_o(n)

where e_o is the error of output neuron o,
d_o is the target value of output neuron o, and
y_o is the actual output value of output neuron o.
129. Supervised Training of Output Neurons
• The correction Δw_io(n) can be calculated by:

    Δw_io(n) = l · η · e_o(n) · y_o(n)

    l = 1   if Δy_o(n) > 0
        −1  if Δy_o(n) < 0

where Δw_io(n) is the weight correction value of the connection from origin neuron i to destination neuron o,
η is the learning rate, and
l is the refinement direction indicator that decides the direction of the weight tuning.
130. Supervised Training of Output Neurons
• The correction Δb_o(n) is defined as:

    Δb_o(n) = η · e_o(n)

where Δb_o is the bias correction value of output neuron o.
• The weight and bias are adjusted by the following formulas:

    w_io(n + 1) = w_io(n) + Δw_io
    b_o(n + 1) = b_o(n) + Δb_o

where w_io(n + 1) and b_o(n + 1) are the refined weight and bias of output neuron o.
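A minimal sketch of one output-neuron update, transcribing the slide's formulas literally (note that the weight correction depends on y_o rather than the individual inputs). The tanh activation follows slide 152; all values are illustrative:

```python
import numpy as np

def train_output_neuron(w_io, b_o, y_in, d_o, y_o_prev, eta=0.1):
    """One DSNN output-neuron update: dw = l*eta*e_o*y_o, db = eta*e_o."""
    y_o = np.tanh(np.dot(w_io, y_in) + b_o)     # phi_o assumed tanh (slide 152)
    e_o = d_o - y_o                             # output error
    l = 1.0 if (y_o - y_o_prev) > 0 else -1.0   # refinement direction indicator
    w_io = w_io + l * eta * e_o * y_o           # weight correction
    b_o = b_o + eta * e_o                       # bias correction
    return w_io, b_o, y_o
```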
131. Supervised Training of Hidden Neurons
• The update toward the hidden neurons starts from:

    g_i(n) = η · e_o(n) · y_o(n)

where g_i(n) is the tuning momentum passed from the output neurons back to the hidden neurons.
• The momentum of hidden neuron i is defined as:

    s_i(n) = g_i(n) / C_i
    g_j(n) = η · s_i(n) · y_i(n)

where s_i(n) is the momentum of hidden neuron i, and
g_j(n) is the tuning momentum of hidden neuron j connected to hidden neuron i.
132. Supervised Training of Hidden Neurons
• The weight correction is

    Δw_ji(n) = l · η · s_i(n) · y_i(n)

    l = 1   if Δy_i(n) > 0
        −1  if Δy_i(n) < 0

where Δw_ji(n) is the weight correction value of the connection from origin neuron j to destination neuron i.
• The correction to b_i(n) is

    Δb_i(n) = η · s_i(n)

where Δb_i is the bias correction value of hidden neuron i.
133. Supervised Training of Hidden Neurons
• The tuning indicator for the backward neurons is described as follows:

    g_j(n) = η · s_i(n) · |y_i(n)|

where g_j(n) is the tuning indicator for hidden neuron j connected to hidden neuron i.
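A sketch of how these hidden-layer quantities could be computed. C_i is taken to be the number of connections into neuron i, an assumption the slides do not state explicitly:

```python
import numpy as np

def hidden_momenta(e_o, y_o, y_i, C_i, eta=0.1):
    """DSNN hidden-layer quantities: g_i = eta*e_o*y_o, s_i = g_i/C_i,
    and the backward tuning indicator g_j = eta*s_i*|y_i|."""
    g_i = eta * e_o * y_o
    s_i = g_i / C_i
    g_j = eta * s_i * np.abs(y_i)
    return g_i, s_i, g_j

def update_hidden(w_ji, b_i, s_i, y_i, y_i_prev, eta=0.1):
    """dw_ji = l*eta*s_i*y_i, with l the sign of the last output change;
    db_i = eta*s_i."""
    l = 1.0 if (y_i - y_i_prev) > 0 else -1.0
    return w_ji + l * eta * s_i * y_i, b_i + eta * s_i
```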
134. Flow Chart of the Tuning of the Weighting and Bias of the Output Neuron
Figure: the output neuron o (weight w_i,o, bias b_o) produces y_o, which is compared with the target d_o to form the error e_o; delta blocks T_w and T_b, together with a delay element producing Δy_o, generate the corrections Δw_i,o and Δb_o.
135. Dynamic Structure
• The dynamic structure creates new neurons and neural connections.
• The restructuring algorithm can produce or prune
neurons and the connections between the neurons
in an unsupervised manner.
Figure: neurons N1, N2, N3, …, Nk (outputs y1, y3, yk) pull the free connector of neuron Nn along the grow direction through the weights Wnf1, Wnf2, and Wnf3.
136. Dynamic Structure
• The correction of the coordinates of the free connectors can be formulated as follows:

    Δ(x_fn, y_fn, z_fn) = Σ_j D · (g_j / L_j) · (x_j, y_j, z_j)

    D = 1   if attraction
        −1  if repulsion

where Δ(x_fn, y_fn, z_fn) is the correction of the coordinates of the free connector, L_j is the distance between the free connector and the scanned neuron, and (x_j, y_j, z_j) are the coordinates of the scanned neuron.
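A sketch implementing this correction, following the slide's formula literally (the scanned neuron's absolute coordinates, not the direction vector, appear in the sum):

```python
import numpy as np

def move_free_connector(p_fn, neuron_coords, g, attract):
    """Correction of a free connector's coordinates:
    delta = sum_j D * (g_j / L_j) * (x_j, y_j, z_j)."""
    delta = np.zeros(3)
    for p_j, g_j, att in zip(neuron_coords, g, attract):
        L_j = np.linalg.norm(p_j - p_fn)    # distance free connector <-> neuron j
        D = 1.0 if att else -1.0            # +1 attraction, -1 repulsion
        delta += D * (g_j / L_j) * p_j
    return p_fn + delta
```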
137. Creating New Neurons
• The probability P of a new neuron being created is given by:

    P = Σ_i e_i · (N_max_h − N_h) / N_max_h

where e_i is the error of output neuron i,
N_h is the current number of hidden neurons in the middle layer, and
N_max_h is the maximum number of neurons that can be created in the virtual cube space.
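A one-function sketch of this rule; the capacity factor is independent of i, so it can be pulled out of the sum. The numbers are illustrative:

```python
def creation_probability(errors, N_h, N_max_h):
    """P = (sum_i e_i) * (N_max_h - N_h) / N_max_h: the chance of spawning
    a new neuron shrinks as the cube space fills up."""
    return sum(errors) * (N_max_h - N_h) / N_max_h

print(creation_probability([0.2, 0.1], N_h=50, N_max_h=200))  # -> 0.225
```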
138. Block Diagram of the DSNN
Figure: block diagram of the detection flow —
• The input disturbance waveform is checked by the impulse detector; if an impulsive transient is found, it is reported as an impulsive transient disturbance and removed by the impulsive transient filter.
• The RMS voltage calculation estimates the amplitude of the fundamental; if the waveform is a sag, swell, or interrupt, the corresponding disturbance is reported.
• Otherwise, a perfect waveform with the estimated amplitude at the system frequency is generated and subtracted from the input; the residual is passed through the Daubechies-8 wavelet transform.
• The standard deviations of the 1-to-4-scale wavelet coefficients (D1, D2, D3, D4, S4) feed the dynamic structural neural networks (neural weighting and bias), which report harmonic distortions and/or voltage flicker; then the flow ends.
139. Amplitude Estimator
• The estimated RMS value of the voltage can be calculated by the following equation:

    RMS = sqrt( (1/M) Σ_{t=1}^{M} f(t)² )

where f(t) represents the value of the voltage sampled from the disturbance waveform, and M is the total number of sampling points.
• To reduce the computational complexity, the RMS value of f(t) can be approximated by

    RMS_A = (1/M) Σ_{t=1}^{M} |f(t)|
140. Amplitude Estimator
• The amplitude of the fundamental voltage can then be predicted as

    AmpEst = √2 × RMS
    AmpEst_A = 1.5725 × RMS_A

where AmpEst is the estimated amplitude of the fundamental voltage obtained from the RMS value, and
AmpEst_A is the approximate estimated amplitude of the fundamental voltage obtained from the approximate RMS value RMS_A.
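A small sketch of both estimators on a synthetic 1 pu sinusoid (for a pure sinusoid, mean|f| = (2/π) × amplitude, which is why the 1.5725 ≈ π/2 factor recovers the amplitude):

```python
import numpy as np

def amplitude_estimates(f):
    """Exact route: Amp = sqrt(2)*RMS. Approximate route: Amp ~ 1.5725*mean(|f|)."""
    M = len(f)
    rms = np.sqrt(np.sum(f**2) / M)
    rms_a = np.sum(np.abs(f)) / M
    return np.sqrt(2) * rms, 1.5725 * rms_a

t = np.arange(300) / 30.0                 # 30 samples per cycle, 10 cycles
f = 1.0 * np.sin(2 * np.pi * t)
print(amplitude_estimates(f))             # both close to 1.0
```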
141. Wavelet Transform
• According to the estimated amplitude AmpEst produced by the amplitude estimator, a perfect sinusoidal waveform with amplitude AmpEst can be generated.
• Subtracting the generated perfect sinusoidal waveform from the original measured waveform yields the disturbance signal. The wavelet transform is then applied to the extracted disturbance signal for analysis.
142. Wavelet Transform
• The disturbance features reside in four scales of the decomposed high-pass and low-pass signals.
• The first-scale high-pass signal is more sensitive than the other scales of decomposed signals because it contains the high-frequency band of the signal.
• It is therefore employed for extracting the features of the impulsive transient disturbance within the disturbance waveform.
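A sketch of the four-scale decomposition using the PyWavelets package; `db8` is assumed here, since the slides mention both a 16-coefficient Daubechies filter and "Daubechies-8", and pywt's `db8` has 16 filter coefficients:

```python
import numpy as np
import pywt

# Synthetic disturbance residual: a short impulse on a quiet baseline
x = np.zeros(1024)
x[500:503] = [0.5, 1.0, 0.4]

# Four-scale DWT; coefficients come back as [S4, D4, D3, D2, D1]
S4, D4, D3, D2, D1 = pywt.wavedec(x, 'db8', level=4)

# DSNN inputs: the standard deviation of each band
features = [np.std(c) for c in (D1, D2, D3, D4, S4)]
print(features)
```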
143. Feature Extraction of Impulsive Transient
Figures: an example of an impulsive transient disturbance, and the results of the wavelet analysis in the high-pass band and the low-pass band, respectively.
144. Feature Extraction of Impulsive Transient
• The mean and standard deviation of the signal in the high-pass band (D1) are calculated as follows to identify the impulse disturbance:

    μ1 = (1/(M/2)) Σ_{t=1}^{M/2} D1(t)
    ρ1 = sqrt( (1/(M/2)) Σ_{t=1}^{M/2} (D1(t) − μ1)² )

where μ1 and ρ1 are the mean and standard deviation of the signal in the high-pass band (D1), respectively.
• An impulsive transient disturbance event is identified according to the following rule:

    ∃ t such that D1(t) ≥ μ1 + 1.25 ρ1
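A direct transcription of this rule into code (operating on the D1 band from the decomposition sketch above):

```python
import numpy as np

def impulsive_transient_detected(D1):
    """Rule from the slide: an impulsive transient event exists if some
    sample of the first-scale high-pass band D1 reaches mean + 1.25*std."""
    mu1 = np.mean(D1)
    rho1 = np.std(D1)      # population standard deviation
    return bool(np.any(D1 >= mu1 + 1.25 * rho1))
```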
145. Impulsive Transient Removal
• However, the impulsive transient disturbance may contain multiple frequency components, which could make the decomposed signals contain irregular disturbances.
• Hence, once the impulsive transient disturbance has been identified, its components must be removed from all scales of the decomposed signals.
• Then, the mean and standard deviation of each scale of the decomposed signals D1, D2, D3, D4, and S4 are calculated again to identify the other disturbances.
• This procedure prevents the subsequent DSNN classifier from misclassifying.
146. Example of a Hybrid of Harmonic and Flicker
Figures: an example waveform combining several harmonic distortions and voltage flicker, and the decomposed signals D1, D2, D3, D4, and S4 from the 4-scale wavelet transform.
147. Generating Waveform Data (Training/Testing Dataset)

Condition Name                   Disturbances Included   Options
Single Disturbance Waveform      1                       All types of PQ disturbances.
Dual Disturbances Waveform       2                       One is randomly chosen from Type A, B, or C; the other is randomly chosen from Type D, E, or F.
Multiple Disturbances Waveform   2~4                     One is randomly chosen from Type A, B, or C; the others are randomly chosen from Type D, E, or F.
148. Types of PQ Disturbances

Type   Name                              RMS (pu)     Duration
A      Momentary Swell Disturbance       1.1~1.4      30 cycles~3 sec
B      Momentary Sag Disturbance         0.1~0.9      30 cycles~3 sec
C      Momentary Interrupt Disturbance   <0.1         0.5 cycles~3 sec
D      Impulsive Transient Disturbance   —            Microseconds to milliseconds
E      Harmonic Distortion               0~0.2        —
F      Voltage Flicker                   0.001~0.07   —
149. Multiple PQ Disturbances
• Field measurements show that a PQ event usually contains multiple types of disturbances.
• Recognizing a waveform that consists of multiple disturbances is far more complex than recognizing one that consists of a single disturbance.
• This work develops a new method capable of recognizing several typical types of disturbances in a measured waveform and identifying their critical values.
150. Multiple PQ Disturbances
Hybrid of voltage flicker and impulsive transient disturbance.
Hybrid of momentary sag disturbance and voltage flicker.
Hybrid of momentary sag disturbance and high-frequency harmonic distortions.
151. Examples
Single Disturbance
Dual Disturbances
Multiple Disturbances
152. Experimental Results (Parameters)
• This section presents the classification results for 6 types of disturbances under 3 kinds of conditions.
• The sampling rate of the voltage waveform is 30 points per cycle, the fundamental frequency is 60 Hz, and the amplitude is 1 pu.
• The parameters of the proposed DSNN are: the space reserve factor ρ is set to 1.5, the number of initially generated neurons is 50, and the activation function φo of the neurons is the hyperbolic tangent function.
153. Experimental Results (Parameters)
• The disturbance waveforms are randomly generated according to the definitions of IEEE Std. 1159.
• The minimum amplitudes of harmonic distortions and voltage flicker are both 0.01 pu.
• There are 3000 randomly generated waveforms covering the three kinds of PQ variations. 250 waveforms of each kind of PQ variation, plus 100 normal waveforms, are utilized for training the DSNN.