Chapter 4: BAM and the Hopfield Networks

What is Associative Memory and BAM?
Associative memory is memory organized as paired associations between pieces of information. For example, when we think of a friend's name we also recall the friend's face, and when we hear a song we may think of the singer who performs it.
Bidirectional Associative Memory (BAM) is an associative memory whose associations can be followed in both directions, "there and back".
[Figure: photo of Bam Janista, from Phuying magazine, issue 340, October 1999]

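Types of Associative Memory
An associative memory consists of a set of ordered pairs $\{(\mathbf{x}_1,\mathbf{y}_1), (\mathbf{x}_2,\mathbf{y}_2), \dots, (\mathbf{x}_L,\mathbf{y}_L)\}$, where $\mathbf{x}_i$ and $\mathbf{y}_i$ are associated patterns (for example, a person's name = x and a photograph of that person = y) expressed as vectors $\mathbf{x}_i \in R^N$ and $\mathbf{y}_i \in R^M$. These ordered pairs are called exemplars.
An associative memory can be written as a mapping function $F(\mathbf{x}_i) = \mathbf{y}_i$.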

Types of Associative Memory (cont.)
The associated pairs $(\mathbf{x}_i,\mathbf{y}_i)$ come in several forms:
1. Heteroassociative memory
$F(\mathbf{x}_i) = \mathbf{y}_i$, where $\mathbf{x}_i$ and $\mathbf{y}_i$ are patterns of different kinds (such as a name and a photograph). When an arbitrary input $\mathbf{x}$ is presented, $F(\mathbf{x})$ returns the $\mathbf{y}_i$ whose $\mathbf{x}_i$ is closest to $\mathbf{x}$.
Example: F relates x = first name to y = surname. Given the incomplete input x = "Thaksi", F("Thaksi") gives the output y = "Shinawatra".

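Types of Associative Memory (cont.)
2. Interpolative associative memory
$F(\mathbf{x}_i) = \mathbf{y}_i$, and when an input $\mathbf{x} = \mathbf{x}_i + \mathbf{d}$ is presented, $F(\mathbf{x})$ returns $\mathbf{y}_i + \mathbf{e}$: a perturbed input gives a correspondingly perturbed output.
Example: F relates x = first name to y = surname. Given the incomplete input x = "Thaksi", F("Thaksi") gives the incomplete output y = "Shinawat".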

Types of Associative Memory (cont.)
3. Autoassociative memory
$F(\mathbf{x}_i) = \mathbf{x}_i$. When an arbitrary input $\mathbf{x}$ is presented, $F(\mathbf{x})$ returns the $\mathbf{x}_i$ that is closest to $\mathbf{x}$.
Example: F relates a name to the same name. Given the incomplete input x = "Thaksi", F("Thaksi") gives the output "Thaksin".
Notation: bold upright letters denote vectors and small italic letters denote the components of a vector. For example, $x_i$ is a component of the vector $\mathbf{x}$.

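Hamming Distance
An associative memory has to assess how "close" two patterns are. In general a distance is used as the measure, for example the Euclidean distance.
The Euclidean distance between two vectors $\mathbf{x}$ and $\mathbf{y}$ is computed from
$$d = \sqrt{\sum_{i=1}^{N} (x_i - y_i)^2}$$
where $\mathbf{x} = (x_1, x_2, \dots, x_N)^T$ and $\mathbf{y} = (y_1, y_2, \dots, y_N)^T$.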

Hamming Distance
For binary patterns $\mathbf{x} = (x_1, x_2, \dots, x_N)^T$ and $\mathbf{y} = (y_1, y_2, \dots, y_N)^T$ with $x_i, y_i \in \{+1,-1\}$, the Hamming distance is the usual measure of distance:
h = number of mismatched components of x and y
or
h = number of bits that are different between x and y

Hamming Distance vs Euclidean Distance and Hamming Space
Because we restrict $x_i, y_i \in \{+1,-1\}$, the possible patterns are the vertices of a hypercube (the figure shows the 3-dimensional case).
[Figure: cube centered on the origin (0,0,0) with vertices (1,1,1), (1,1,-1), (1,-1,1), (1,-1,-1), (-1,1,1), (-1,1,-1), (-1,-1,1), (-1,-1,-1)]
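The two distances are directly related for bipolar patterns: every mismatched component contributes $(\pm 2)^2 = 4$ to the squared Euclidean distance, so $d = 2\sqrt{h}$. The following is a minimal Python sketch of that relation (the slides use MATLAB; the function names here are illustrative, not from the slides).

```python
import math

def hamming_distance(x, y):
    """Count the components in which two equal-length bipolar vectors differ."""
    return sum(1 for a, b in zip(x, y) if a != b)

def euclidean_distance(x, y):
    """Ordinary Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

# Two vertices of the 3-D cube from the figure
x = (1, -1, 1)
y = (-1, 1, 1)

h = hamming_distance(x, y)
d = euclidean_distance(x, y)

# For bipolar vectors the two printed distances should agree: d = 2*sqrt(h)
print(h, d, 2 * math.sqrt(h))
```

Because the two measures are monotonically related, ranking exemplars by Hamming distance or by Euclidean distance picks the same nearest pattern.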

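Linear Associator : Building a Map from Orthonormal Vectors
When the input vectors are orthonormal,
$$\mathbf{x}_i^T\mathbf{x}_j = \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}$$
($\mathbf{x}^T$ = transpose of $\mathbf{x}$), a simple mapping function F can be built as
$$F(\mathbf{x}) = (\mathbf{y}_1\mathbf{x}_1^T + \mathbf{y}_2\mathbf{x}_2^T + \dots + \mathbf{y}_L\mathbf{x}_L^T)\,\mathbf{x}$$
When $\mathbf{x} = \mathbf{x}_2$ we get
$$F(\mathbf{x}_2) = \mathbf{y}_1\delta_{12} + \mathbf{y}_2\delta_{22} + \dots + \mathbf{y}_L\delta_{L2} = \mathbf{y}_2$$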

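Linear Associator (cont.)
When $\mathbf{x} = \mathbf{x}_i + \mathbf{d}$ we get
$$F(\mathbf{x}_i + \mathbf{d}) = \mathbf{y}_i + \mathbf{e}$$
where
$$\mathbf{e} = (\mathbf{y}_1\mathbf{x}_1^T + \mathbf{y}_2\mathbf{x}_2^T + \dots + \mathbf{y}_L\mathbf{x}_L^T)\,\mathbf{d}$$
In general the input vectors are not orthonormal, so the mapping function on the previous slide is not obtained exactly, but the same principle can be adapted.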
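The linear associator can be checked numerically. The sketch below is a minimal Python illustration (the slides use MATLAB) that assumes a hand-picked pair of orthonormal inputs; the vectors `xs`, `ys`, `d` and the helper names are made up for the example and do not come from the slides.

```python
def outer_sum(xs, ys):
    """Return the M x N matrix sum_i y_i x_i^T as a list of rows."""
    n, m = len(xs[0]), len(ys[0])
    return [[sum(y[r] * x[c] for x, y in zip(xs, ys)) for c in range(n)]
            for r in range(m)]

def matvec(w, x):
    """Multiply matrix w (list of rows) by vector x."""
    return [sum(wrc * xc for wrc, xc in zip(row, x)) for row in w]

# Assumed example data: two orthonormal inputs (unit length, mutually orthogonal)
xs = [[0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5]]
ys = [[1.0, 2.0], [3.0, -1.0]]
w = outer_sum(xs, ys)

# Presenting x_2 recalls y_2
recalled = matvec(w, xs[1])

# Presenting x_2 + d gives y_2 + e, with e = w d
d = [0.1, 0.0, 0.0, 0.0]
perturbed = matvec(w, [a + b for a, b in zip(xs[1], d)])
e = matvec(w, d)
print(recalled, perturbed, e)
```

The perturbation passes through the same linear map as the signal, which is why this memory behaves interpolatively rather than snapping to the nearest exemplar.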

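The BAM : Network Architecture
We need to build a bidirectional mapping function that can map back and forth between $\mathbf{x}$ and $\mathbf{y}$.
[Figure: BAM architecture. Input layer x at the bottom, output layer y at the top; the feedforward part F maps x to y and the feedback part F^-1 maps y back to x.]
Note: each node stands for one component of the vector on that side.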

The BAM : Processing
To run a BAM network we start from initial values (x(0), y(0)). The network then works through the following steps:
1. Feedforward part: present the input x(t) at the input layer, then compute and update y(t+1) (t = iteration number).
2. Feedback part: present the y(t+1) obtained in step 1 at the output layer, then compute and update x(t+1).
3. Repeat steps 1 and 2 until nothing changes or t > tmax.

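The BAM : Mathematics
Feedforward part
The mapping function can be built in the form of a weight matrix:
$$\mathbf{w} = \mathbf{y}_1\mathbf{x}_1^T + \mathbf{y}_2\mathbf{x}_2^T + \dots + \mathbf{y}_L\mathbf{x}_L^T$$
and
$$\mathbf{net}^y = \mathbf{w}\,\mathbf{x}$$
where each component of $\mathbf{y}$ is computed from
$$y_i(t+1) = \begin{cases} +1 & net^y_i > 0 \\ y_i(t) & net^y_i = 0 \\ -1 & net^y_i < 0 \end{cases}$$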

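The BAM : Mathematics (cont.)
Feedback part
$$\mathbf{net}^x = \mathbf{w}^T\mathbf{y}$$
where each component of $\mathbf{x}$ is computed from
$$x_i(t+1) = \begin{cases} +1 & net^x_i > 0 \\ x_i(t) & net^x_i = 0 \\ -1 & net^x_i < 0 \end{cases}$$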

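The BAM : Mathematics (cont.)
Consider
$$\mathbf{net}^y = \mathbf{w}\,\mathbf{x} = \mathbf{y}_1\mathbf{x}_1^T\mathbf{x} + \mathbf{y}_2\mathbf{x}_2^T\mathbf{x} + \dots + \mathbf{y}_L\mathbf{x}_L^T\mathbf{x}$$
Since
$$\mathbf{x}_1^T\mathbf{x} = N - 2h(\mathbf{x}_1,\mathbf{x})$$
where N = number of components of the vector $\mathbf{x}$ and $h(\mathbf{x}_1,\mathbf{x})$ = Hamming distance between $\mathbf{x}_1$ and $\mathbf{x}$ (the N - h matching components each contribute +1 and the h mismatched ones each contribute -1), and likewise for every other exemplar, we get
$$\mathbf{net}^y = \sum_{k=1}^{L} \big(N - 2h(\mathbf{x}_k,\mathbf{x})\big)\,\mathbf{y}_k$$
Therefore, when $\mathbf{x}_k$ is the exemplar closest to $\mathbf{x}$, $h(\mathbf{x}_k,\mathbf{x})$ is smallest and the term $\big(N - 2h(\mathbf{x}_k,\mathbf{x})\big)\,\mathbf{y}_k$ is largest, so the output is more likely to be $\mathbf{y}_k$ than any other pattern. (The answer may still not be $\mathbf{y}_k$, because the cross talk from the other terms may be larger.)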
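The identity $\mathbf{x}_k^T\mathbf{x} = N - 2h(\mathbf{x}_k,\mathbf{x})$ and the resulting weighting of each $\mathbf{y}_k$ can be verified directly. This is a minimal Python sketch (the slides use MATLAB) using the exemplars and the noisy probe from the BAM example in this chapter; the helper names are illustrative.

```python
def hamming(x, y):
    """Number of mismatched components."""
    return sum(1 for a, b in zip(x, y) if a != b)

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

# Exemplars and noisy probe taken from the chapter's BAM example
x1 = [1, -1, -1, 1, -1, 1, 1, -1, -1, 1]
y1 = [1, -1, -1, -1, -1, 1]
x2 = [1, 1, 1, -1, -1, -1, 1, 1, -1, -1]
y2 = [1, 1, 1, 1, -1, -1]
x = [-1, -1, -1, 1, -1, 1, 1, -1, -1, 1]
N = len(x)

# Coefficient multiplying each y_k in net^y
coeffs = [N - 2 * hamming(xk, x) for xk in (x1, x2)]

# net^y assembled from the coefficients instead of from the weight matrix
net_y = [coeffs[0] * a + coeffs[1] * b for a, b in zip(y1, y2)]
print(coeffs, net_y)
```

Here the probe is one bit away from `x1` and seven bits away from `x2`, so `y1` enters with a large positive weight and `y2` with a small negative one.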

The BAM : Example
Matlab code
    x1 = [1 -1 -1 1 -1 1 1 -1 -1 1]';    % Exemplar No. 1
    y1 = [1 -1 -1 -1 -1 1]';
    x2 = [1 1 1 -1 -1 -1 1 1 -1 -1]';    % Exemplar No. 2
    y2 = [1 1 1 1 -1 -1]';
    w = y1*x1' + y2*x2'                  % Weight matrix
Weight matrix
    w =
         2   0   0   0  -2   0   2   0  -2   0
         0   2   2  -2   0  -2   0   2   0  -2
         0   2   2  -2   0  -2   0   2   0  -2
         0   2   2  -2   0  -2   0   2   0  -2
        -2   0   0   0   2   0  -2   0   2   0
         0  -2  -2   2   0   2   0  -2   0   2

The BAM : Example (cont.)
    x0 = [-1 -1 -1 1 -1 1 1 -1 -1 1]';   % Unknown input
    y0 = [1 1 1 1 -1 -1]';
    xold = x0;                           % Set initial values
    yold = y0;
    for i=1:5
        % Feedforward part
        nety = w*xold
        ynew = (nety>0)*1 + (nety==0).*yold - (nety<0)*1
        % Feedback part
        netx = w'*ynew
        xnew = (netx>0)*1 + (netx==0).*xold - (netx<0)*1
        % Stop if no change
        if isequal(xnew, xold)
            break;
        end
        % Update values for the next round
        xold = xnew;
        yold = ynew;
        pause
    end

The BAM : Example (cont.)
Iteration 1:
    nety = [ 4 -12 -12 -12 -4 12 ]'
    ynew = [ 1 -1 -1 -1 -1 1 ]'
    netx = [ 4 -8 -8 8 -4 8 4 -8 -4 8 ]'
    xnew = [ 1 -1 -1 1 -1 1 1 -1 -1 1 ]'
Final state: xnew = x1 (and ynew = y1)
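For readers without MATLAB, the same recall loop can be sketched in plain Python. This is an illustrative port under the update rule used in the example (keep the old value when the net input is zero), not the slides' own code; the function names `sign_update` and `bam_recall` are made up here, and this version stops only when neither x nor y changes.

```python
def sign_update(net, old):
    """Bipolar threshold: +1 if net > 0, -1 if net < 0, old value if net == 0."""
    return [1 if n > 0 else -1 if n < 0 else o for n, o in zip(net, old)]

def bam_recall(w, x, y, max_iter=5):
    """Alternate feedforward (x -> y) and feedback (y -> x) passes until stable."""
    for _ in range(max_iter):
        net_y = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        y_new = sign_update(net_y, y)
        net_x = [sum(w[i][j] * y_new[i] for i in range(len(w)))
                 for j in range(len(x))]
        x_new = sign_update(net_x, x)
        if x_new == x and y_new == y:
            break  # no change in either layer
        x, y = x_new, y_new
    return x, y

# Exemplars and probe from the example
x1 = [1, -1, -1, 1, -1, 1, 1, -1, -1, 1]
y1 = [1, -1, -1, -1, -1, 1]
x2 = [1, 1, 1, -1, -1, -1, 1, 1, -1, -1]
y2 = [1, 1, 1, 1, -1, -1]

# w = y1*x1' + y2*x2'
w = [[y1[i] * x1[j] + y2[i] * x2[j] for j in range(len(x1))]
     for i in range(len(y1))]

x0 = [-1, -1, -1, 1, -1, 1, 1, -1, -1, 1]
y0 = [1, 1, 1, 1, -1, -1]
x_final, y_final = bam_recall(w, x0, y0)
print(x_final, y_final)
```

Starting from a probe that is one bit away from `x1` but paired with the wrong output `y2`, the network settles on the stored pair (`x1`, `y1`).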

The BAM : Difference between BAM and Feedforward Networks
Feedforward networks (previous chapters):
• The network adjusts its own weights in order to minimize an error or cost function. This happens during training.
• The error function can be viewed as an energy, which the network tries to make as low as possible.
• The network reaches a stable state when the weights no longer change.
BAM network:
• The weights are computed in advance. There is no training phase that measures an error and then adjusts the weights.
• The network does not adjust its weights. It adjusts the states x and y instead, so x and y can be viewed as the state variables of the system.

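The BAM : Energy Function
We want to prove that a BAM can reach a stable state. To do so we must define an energy function for the BAM network.
Written as a function of the input-output vectors, the energy function of the BAM network is
$$E(\mathbf{x},\mathbf{y}) = -\mathbf{y}^T\mathbf{w}\,\mathbf{x}$$
where $\mathbf{x}$ and $\mathbf{y}$ are the input-output vectors at any given moment.
Written as a function of the components of the input-output vectors, it is
$$E = -\sum_{i=1}^{M}\sum_{j=1}^{N} y_i\,w_{ij}\,x_j$$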

The BAM : Energy Function: Lyapunov Function
Lyapunov's theorem states that if we can find a bounded function of a dynamical system such that every change of the state variables makes the function decrease, then the system can have a stable solution. Such a function is called a Lyapunov function or energy function.
Applied to the BAM network and the BAM energy function, the theorem gives:
1. Any change in x or y during network processing makes the energy function decrease.
2. The energy of the network is always greater than or equal to Emin, where
$$E_{min} = -\sum_{i=1}^{M}\sum_{j=1}^{N} |w_{ij}|$$
3. When E changes, it changes by a finite amount.

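The BAM : Energy Function: Lyapunov Function (cont.)
Lyapunov's theorem means that the system has a stable solution (not necessarily the correct one, even though the answer converges to a definite value) if the energy never increases from one round to the next and is bounded below:
$$E(0) \ge E(1) \ge E(2) \ge \dots \ge E(n) \ge E_{min}$$
[Figure: energy at round 0, round 1, round 2, ..., round n]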

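The BAM : Proof of the BAM Energy Theorem
1. Any change in x or y during network processing makes the energy function decrease.
Proof: consider the component $y_k$ and separate its term from the energy:
$$E = -\sum_{i=1}^{M}\sum_{j=1}^{N} y_i\,w_{ij}\,x_j = -\sum_{i \ne k}\sum_{j=1}^{N} y_i\,w_{ij}\,x_j \;-\; y_k\sum_{j=1}^{N} w_{kj}\,x_j$$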

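The BAM : Proof of the BAM Energy Theorem (cont.)
When the component $y_k$ changes to $y_k^{new}$, the energy becomes
$$E^{new} = -\sum_{i \ne k}\sum_{j=1}^{N} y_i\,w_{ij}\,x_j \;-\; y_k^{new}\sum_{j=1}^{N} w_{kj}\,x_j$$
so the change in energy is
$$\Delta E = E^{new} - E = (y_k - y_k^{new})\sum_{j=1}^{N} w_{kj}\,x_j$$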

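The BAM : Proof of the BAM Energy Theorem (cont.)
From the equation for the output nodes,
$$y_k^{new} = \begin{cases} +1 & \sum_j w_{kj}x_j > 0 \\ y_k & \sum_j w_{kj}x_j = 0 \\ -1 & \sum_j w_{kj}x_j < 0 \end{cases}$$
Case 1: $y_k$ changes from -1 to $y_k^{new} = +1$.
Then $y_k - y_k^{new} = -2 < 0$. Because $y_k$ can change from -1 to +1 only when $\sum_j w_{kj}x_j > 0$, it follows that
$$\Delta E = (y_k - y_k^{new})\sum_{j=1}^{N} w_{kj}\,x_j < 0$$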

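The BAM : Proof of the BAM Energy Theorem (cont.)
Case 2: $y_k$ changes from +1 to $y_k^{new} = -1$.
Then $y_k - y_k^{new} = 2 > 0$. Because $y_k$ can change from +1 to -1 only when $\sum_j w_{kj}x_j < 0$, it follows that
$$\Delta E = (y_k - y_k^{new})\sum_{j=1}^{N} w_{kj}\,x_j < 0$$
Case 3: $y_k$ does not change.
Then $y_k - y_k^{new} = 0$ and $\Delta E = 0$.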
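The three cases can be checked numerically by updating the output components one at a time and comparing the measured change in energy with $\Delta E = (y_k - y_k^{new})\sum_j w_{kj}x_j$. The sketch below is a minimal Python illustration (the slides use MATLAB) on the chapter's example data; updating one component per step is an assumption made here to mirror the proof, and the variable names are illustrative.

```python
def energy(w, x, y):
    """BAM energy E = -sum_i sum_j y_i w_ij x_j."""
    return -sum(y[i] * w[i][j] * x[j]
                for i in range(len(y)) for j in range(len(x)))

# Exemplars from the chapter's example
x1 = [1, -1, -1, 1, -1, 1, 1, -1, -1, 1]
y1 = [1, -1, -1, -1, -1, 1]
x2 = [1, 1, 1, -1, -1, -1, 1, 1, -1, -1]
y2 = [1, 1, 1, 1, -1, -1]
w = [[y1[i] * x1[j] + y2[i] * x2[j] for j in range(len(x1))]
     for i in range(len(y1))]

# Initial state from the example
x = [-1, -1, -1, 1, -1, 1, 1, -1, -1, 1]
y = [1, 1, 1, 1, -1, -1]

deltas = []
for k in range(len(y)):
    net_k = sum(w[k][j] * x[j] for j in range(len(x)))
    y_new_k = 1 if net_k > 0 else -1 if net_k < 0 else y[k]
    predicted = (y[k] - y_new_k) * net_k   # formula from the proof
    before = energy(w, x, y)
    y[k] = y_new_k                          # apply the single-component update
    measured = energy(w, x, y) - before
    assert measured == predicted and predicted <= 0
    deltas.append(measured)

print(deltas)
```

Components that flip lower the energy and components that stay put leave it unchanged, matching cases 1 to 3.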

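The BAM : Proof of the BAM Energy Theorem (cont.)
2. The energy of the network is always greater than or equal to Emin, where
$$E_{min} = -\sum_{i=1}^{M}\sum_{j=1}^{N} |w_{ij}|$$
Proof: since $|x_j| = |y_i| = 1$,
$$E = -\sum_{i=1}^{M}\sum_{j=1}^{N} y_i\,w_{ij}\,x_j \;\ge\; -\sum_{i=1}^{M}\sum_{j=1}^{N} |y_i|\,|w_{ij}|\,|x_j| \;=\; -\sum_{i=1}^{M}\sum_{j=1}^{N} |w_{ij}| \;=\; E_{min}$$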

The BAM : Energy Function Example
Iteration 0:
    xnew = [ -1 -1 -1 1 -1 1 1 -1 -1 1 ]'
    ynew = [ 1 1 1 1 -1 -1 ]'
    E = 40
Iteration 1:
    xnew = [ 1 -1 -1 1 -1 1 1 -1 -1 1 ]'
    ynew = [ 1 -1 -1 -1 -1 1 ]'
    E = -64
Emin = -64
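The energies quoted in this example can be reproduced from $E = -\sum_i\sum_j y_i w_{ij} x_j$ and $E_{min} = -\sum_i\sum_j |w_{ij}|$. The following is a minimal Python sketch (the slides use MATLAB); the function names are illustrative.

```python
def energy(w, x, y):
    """BAM energy E = -y' * w * x, written out component by component."""
    return -sum(y[i] * w[i][j] * x[j]
                for i in range(len(y)) for j in range(len(x)))

def energy_lower_bound(w):
    """Emin = -(sum of absolute weights)."""
    return -sum(abs(wij) for row in w for wij in row)

# Exemplars from the chapter's example
x1 = [1, -1, -1, 1, -1, 1, 1, -1, -1, 1]
y1 = [1, -1, -1, -1, -1, 1]
x2 = [1, 1, 1, -1, -1, -1, 1, 1, -1, -1]
y2 = [1, 1, 1, 1, -1, -1]
w = [[y1[i] * x1[j] + y2[i] * x2[j] for j in range(len(x1))]
     for i in range(len(y1))]

# Iteration 0 state (the unknown input) and iteration 1 state (the recalled pair)
x0 = [-1, -1, -1, 1, -1, 1, 1, -1, -1, 1]
y0 = [1, 1, 1, 1, -1, -1]

e0 = energy(w, x0, y0)
e1 = energy(w, x1, y1)
e_min = energy_lower_bound(w)
print(e0, e1, e_min)
```

In this particular example the recalled state happens to sit exactly on the lower bound; in general the bound need not be reached.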

Hopfield Network
A Hopfield memory is a kind of BAM whose input and output have the same form (autoassociative memory):
$F(\mathbf{x}_i) = \mathbf{x}_i$
[Figure: a single layer x mapped onto itself through F]

Hopfield Network : Processing
To run a Hopfield network we start from an initial value x(0). The network then works through the following steps:
1. Present the input x(t) to the network, then compute and update x(t+1) (t = iteration number).
2. Present the x(t+1) obtained in step 1 to the network, then compute and update x(t+2).
3. Repeat step 2 until nothing changes or t > tmax.

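Hopfield Network : Mathematics
The weight matrix of a Hopfield network is computed from
$$\mathbf{w} = \frac{1}{L}\sum_{k=1}^{L}\mathbf{x}_k\mathbf{x}_k^T$$
(L = number of exemplars) and
$$\mathbf{net}^x = \mathbf{w}\,\mathbf{x}$$
where each component of $\mathbf{x}$ is computed from
$$x_i(t+1) = \begin{cases} +1 & net^x_i > U_i \\ x_i(t) & net^x_i = U_i \\ -1 & net^x_i < U_i \end{cases}$$
with $U_i$ = threshold value of each unit.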

Hopfield Network: Example
Exemplars: binary images of "A" and "T" of size 16x16 pixels
    a = imread('a.bmp');                        % Read image file
    a = double(a)*2-1;                          % Convert from binary 0,1 to bipolar -1,+1
    imagesc(a);                                 % Display the image
    a = reshape(a,[size(a,1)*size(a,2) 1]);     % Convert the image to a column vector
    t = imread('t.bmp');
    t = double(t)*2-1;
    imagesc(t);
    t = reshape(t,[size(t,1)*size(t,2) 1]);
    w = (a*a' + t*t')/2;                        % Compute weight matrix

Hopfield Network: Example (cont.)
    x = imread('a.bmp');                        % Read input image file
    x = double(x)*2-1;
    x = reshape(x,[size(x,1)*size(x,2) 1]);
    rand('state',6);                            % Seed the random number generator
    for i=1:size(x,1)
        if rand(1)<0.1
            x(i) = -x(i);                       % Add 10% salt and pepper noise
        end
    end

Hopfield Network: Example (cont.)
    xold = x;
    xnew = xold;                        % Preallocate so xnew stays a column vector
    for i=1:20
        netout = w*xold;                % Compute network output
        for j = 1:size(netout,1)        % Update x
            if netout(j) < 0
                xnew(j) = -1;
            elseif netout(j) == 0
                xnew(j) = xold(j);
            else
                xnew(j) = 1;
            end
        end
        if isequal(xnew, xold)
            break                       % Break if no change
        end
        xold = xnew;
    end
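The image files used above are not part of this transcript, so the sketch below illustrates the same procedure in plain Python on two small made-up patterns instead (the slides use MATLAB). The patterns `p1` and `p2`, the noisy probe, and the function names are assumptions for the example; the threshold is taken as zero, as in the MATLAB loop.

```python
def train(patterns):
    """Hopfield weights w = (1/L) * sum_k x_k x_k^T (diagonal kept, as in the slides)."""
    n, count = len(patterns[0]), len(patterns)
    return [[sum(p[i] * p[j] for p in patterns) / count for j in range(n)]
            for i in range(n)]

def recall(w, x, max_iter=20):
    """Synchronous update with zero threshold until the state stops changing."""
    x = list(x)
    for _ in range(max_iter):
        net = [sum(wij * xj for wij, xj in zip(row, x)) for row in w]
        x_new = [1 if n > 0 else -1 if n < 0 else old
                 for n, old in zip(net, x)]
        if x_new == x:
            break  # no change
        x = x_new
    return x

# Assumed example data: two orthogonal bipolar patterns of length 8
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train([p1, p2])

# p1 with its first component flipped
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]
restored = recall(w, noisy)
print(restored)
```

With only two orthogonal exemplars the cross talk is small, so a one-bit error is corrected in a single pass.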

Hopfield Network: Example (cont.)
Results:
[Figure: initial input; output at iteration 1]
The Hopfield network tries to produce the stored pattern closest to its input, as if each exemplar exerted an attractive force that pulls nearby patterns toward itself. For this reason exemplars are sometimes called attractors.

Hopfield Network: Weak point (cont.)
When the number of exemplars is increased, the result can differ from every exemplar.
[Figure: initial input; final iteration]

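Hopfield Network: Weak point
Analysis: suppose the input is the exemplar $\mathbf{x}_n$, and let $x_{k,i}$ denote component i of exemplar $\mathbf{x}_k$. With $\mathbf{w} = \frac{1}{L}\sum_k \mathbf{x}_k\mathbf{x}_k^T$, each component of $\mathbf{net}^x$ is
$$net^x_i = \sum_{j=1}^{N} w_{ij}\,x_{n,j} = \frac{1}{L}\sum_{k=1}^{L} x_{k,i}\sum_{j=1}^{N} x_{k,j}\,x_{n,j} = \frac{N}{L}\,x_{n,i} \;+\; \frac{1}{L}\sum_{k \ne n} x_{k,i}\sum_{j=1}^{N} x_{k,j}\,x_{n,j}$$
The first term is the contribution from $\mathbf{x}_n$; the second term is the cross talk from the other patterns.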

Hopfield Network: Weak point (cont.)
If we want $F(\mathbf{x}_n) = \mathbf{x}_n$, then:
• $net^x_i$ and $x_{n,i}$ must have the same sign.
• The cross talk term must not change the sign of the contribution from $\mathbf{x}_n$.
In general, when the number of exemplars is large the cross talk becomes large and makes the network output unstable.
The number of exemplars that can be stored in the network with an error below the acceptable level is a function of the number of nodes.
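The split into a contribution from $\mathbf{x}_n$ and a cross-talk term can be computed explicitly. The sketch below is a minimal Python illustration (the slides use MATLAB) assuming $\mathbf{w} = \frac{1}{L}\sum_k \mathbf{x}_k\mathbf{x}_k^T$ and a zero threshold; the pattern sets `few` and `many` and the function names are made up for the example. Exact fractions are used to avoid rounding questions.

```python
from fractions import Fraction

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))

def decompose(patterns, n):
    """Split net^x for input patterns[n] into (signal, crosstalk) component lists."""
    count, size = len(patterns), len(patterns[0])
    xn = patterns[n]
    signal = [Fraction(size, count) * xi for xi in xn]
    crosstalk = [sum(Fraction(dot(xk, xn), count) * xk[i]
                     for k, xk in enumerate(patterns) if k != n)
                 for i in range(size)]
    return signal, crosstalk

def is_fixed_point(patterns, n):
    """True if no component of patterns[n] would be flipped by one update.

    A net input of exactly zero keeps the old value, so it counts as stable.
    """
    signal, crosstalk = decompose(patterns, n)
    return all((s + c) * xi >= 0
               for s, c, xi in zip(signal, crosstalk, patterns[n]))

# Assumed example data: two exemplars on five nodes ...
few = [[1, 1, 1, 1, -1], [1, 1, 1, -1, 1]]
# ... versus five strongly overlapping exemplars on the same five nodes
many = [[1, 1, 1, 1, 1], [1, 1, 1, 1, -1], [1, 1, 1, -1, 1],
        [1, 1, -1, 1, 1], [1, -1, 1, 1, 1]]

signal, crosstalk = decompose(many, 1)
print(is_fixed_point(few, 0), is_fixed_point(many, 1))
print(signal[4], crosstalk[4])
```

In the crowded set the cross talk on the last component (6/5) outweighs the signal (-1), so that component would flip and the stored pattern is not recovered.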
