The First Stages of Visual Information Processing: A Neural Model

The First Stages of Visual Information Processing: A Neural Model. Avi Libster, Yehuda Brody. Supervisor: Dr. Oren Shriki

The first stages of visual information processing.

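The first stages of visual information processing: from the retina to the cortex. Figure labels: thalamus, primary visual cortex.

From the retina to the cortex: in cortical area V1 a kind of "topographic map" of the visual field is formed. Figure labels: thalamus, primary visual cortex, cell 1, cell 2, cell 3.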

The cells that take part in the pathway have receptive fields (RF)

Cells in the primary visual cortex are sensitive to oriented lines (Receptive Field)

The role of the receptive fields is to detect the regions of change in the image

Cells in the primary visual cortex respond to oriented lines, in proportion to the contrast in the image. Figure: cell activity versus stimulus angle (in degrees), at different contrast levels.

Each neuron in the visual cortex is "responsible" for a particular location in the visual field and for a particular orientation

Cells in the primary visual cortex are sensitive to oriented lines (Receptive Field) ***Movie***

Orientation map in the primary visual cortex - schematic. Local processing unit; each orientation is assigned a color.

Orientation map in the primary visual cortex - experimental results. Enlargement of one of the pinwheels.

Our research questions • How do the receptive fields develop? • Can we explain why each cell is sensitive only to a particular region of the visual field and to a particular orientation? • What advantages do the receptive fields offer for representing information?

ICA: Independent Component Analysis. Basic assumptions of the algorithm • The input is a linear combination of N sources, each with a non-Gaussian distribution • The sources are statistically independent (uncorrelatedness is a weaker condition that follows from statistical independence)

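ICA Continued. We use the strong version of the central limit theorem: let X1, X2, … be a sequence of independent random variables that satisfies: 1. The mean of each random variable is zero. 2. The variance of each random variable is finite. 3. The third absolute moment (the skewness term) of each random variable is finite. 4. The sum of the third absolute moments, divided by the cube of the standard deviation of the sum, tends to zero (Lyapunov's condition). Then the distribution of the sum of the sequence, normalized by its standard deviation, tends to N(0,1)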

ICA Continued. The idea, therefore, is that in order to reconstruct the input or the sources we need to move away from the normal distribution (because the distribution of a sum made up of only some of the random variables is less normal than that of a sum made up of all of them).
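The claim that partial sums are "less normal" than full sums can be checked numerically. The sketch below is only an illustration: it assumes uniform sources and uses excess kurtosis as the measure of non-normality (zero for a Gaussian). Neither choice comes from the slides, and the names `excess_kurtosis`, `sources` and `kurt_by_n` are hypothetical.

```python
import numpy as np

def excess_kurtosis(samples):
    """Excess kurtosis of a 1-D sample: 0 for a Gaussian, nonzero otherwise."""
    z = (samples - samples.mean()) / samples.std()
    return float(np.mean(z ** 4) - 3.0)

rng = np.random.default_rng(0)
# Assumed independent, non-Gaussian sources: uniform on [-1, 1].
sources = rng.uniform(-1.0, 1.0, size=(8, 200_000))

# Excess kurtosis of the sum of the first n sources.
kurt_by_n = {n: excess_kurtosis(sources[:n].sum(axis=0)) for n in (1, 2, 4, 8)}
for n, k in kurt_by_n.items():
    print(f"sum of {n} sources: excess kurtosis = {k:+.3f}")
```

For independent, identically distributed sources the excess kurtosis of the sum shrinks like 1/n, so the printed values move toward zero as more sources are mixed. Un-mixing should push the measure the other way.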

Solving Problems Using ICA

Step 1: Deciding what the sources are and how to derive the output

In our case, each neuron in the input layer is a pixel in the picture. Each neuron in the output layer receives all the pixels of the picture.

Overcomplete representation. Figure labels: input from the retina (X), weight matrix W, the visual cortex (S), with S > X. S can be viewed as the independent sources that cause the reaction on the retina: X is that reaction, and S is a reconstruction of the sources that caused it.

Natural scene 12 X 12 patch

After the pictures were cut into 12x12 patches, we have the matrix X, which represents the training data. Row i contains the values of pixel i in each patch; the number of rows equals the patch size, i.e. 144 in our case. Column j contains all of the pixel values of patch j; the number of columns equals the number of patches, i.e. 15000 in our case.
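The layout of X can be sketched as follows. This is a minimal illustration: the slides do not say how patch locations were chosen, so random sampling is an assumption, a random array stands in for a natural scene, and `extract_patches` is a hypothetical helper name.

```python
import numpy as np

def extract_patches(image, patch_size, n_patches, rng):
    """Cut random square patches from a 2-D image; one flattened patch per column."""
    height, width = image.shape
    X = np.empty((patch_size * patch_size, n_patches))
    for j in range(n_patches):
        top = rng.integers(0, height - patch_size + 1)
        left = rng.integers(0, width - patch_size + 1)
        patch = image[top:top + patch_size, left:left + patch_size]
        X[:, j] = patch.ravel()  # row i of X holds pixel i of every patch
    return X

rng = np.random.default_rng(0)
image = rng.random((256, 256))  # stand-in for a natural scene (assumption)
X = extract_patches(image, patch_size=12, n_patches=15000, rng=rng)
print(X.shape)  # (144, 15000)
```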

Deriving the output. The function shown on this slide calculates the output of neuron i when presented with patch k.

The input-output function of each neuron is nonlinear. It squashes the output and matches the neuron's sensitivity to the range of input values, which gives biologically plausible behavior.

Matrix W represents the connections between the input neurons and the output neurons. Row i holds the N weights (N is the number of input neurons) onto output neuron i. Column j holds the M weights (M is the number of output neurons) from input neuron j.

Overview so far…
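Putting the last three slides together, the output of neuron i for patch k can be sketched as a squashing function applied to a weighted sum of the pixels. The logistic function used here is an assumption (the slides only say the function is nonlinear), and `logistic` and `network_output` are hypothetical names.

```python
import numpy as np

def logistic(u):
    """Assumed squashing nonlinearity g(u), with values in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-u))

def network_output(W, X):
    """S[i, k] = g(sum_j W[i, j] * X[j, k]): output of neuron i for patch k."""
    return logistic(W @ X)

rng = np.random.default_rng(0)
N, M, K = 4, 6, 5                       # inputs, outputs, patches (toy sizes)
W = rng.normal(scale=0.5, size=(M, N))  # row i: weights onto output neuron i
X = rng.normal(size=(N, K))             # column k: patch k
S = network_output(W, X)
print(S.shape)  # (6, 5)
```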

Step 2: Preprocess the data

Why do we preprocess the data? • To satisfy the conditions of the strong central limit theorem, at least in part if not fully. • To reduce the dimensionality, which serves two purposes: 1. The computation becomes easier. 2. We can choose the network's input and output sizes.

Methods of preprocessing • Centering (subtracting the mean from the data): at least one condition is met, and calculating the correlation matrix between the pixels, <XX'>, becomes easier. • PCA: reduces the dimensions of the data and gets rid of the second-order statistics.

After preprocessing • The data is called whitened. • The mean of the data is zero. • The pixels are uncorrelated. • We reduced the data dimension from 144 to 100. • Some additional processing is done so that the filters can be reconstructed later.
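The preprocessing steps can be sketched as centering followed by PCA whitening. This is an illustration under assumptions: synthetic data stands in for the patches, `pca_whiten` is a hypothetical name, and the returned `dewhitener` is only one guess at the "processing" needed to map learned filters back to pixel space.

```python
import numpy as np

def pca_whiten(X, n_components):
    """Center X (pixels x patches), keep the top principal components,
    and rescale each of them to unit variance."""
    mean = X.mean(axis=1, keepdims=True)
    Xc = X - mean                                # centering
    cov = Xc @ Xc.T / Xc.shape[1]                # correlation matrix <XX'>
    eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    E, d = eigvecs[:, order], eigvals[order]
    whitener = (E / np.sqrt(d)).T                # shape (n_components, n_pixels)
    dewhitener = E * np.sqrt(d)                  # maps back toward pixel space
    return whitener @ Xc, whitener, dewhitener, mean

rng = np.random.default_rng(0)
mixing = rng.normal(size=(144, 144))
X = mixing @ rng.laplace(size=(144, 5000)) + 3.0  # synthetic correlated "pixels"
Z, whitener, dewhitener, mean = pca_whiten(X, n_components=100)
print(Z.shape)  # (100, 5000)
```

After this step the rows of `Z` have zero mean and an identity covariance matrix, which is what "whitened" means on the slide.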

Step 3: Cost function and the learning rule

ICA Again… ICA can be implemented in several ways. The main difference between them is the method used to estimate how close the output distribution is to normal.

Infomax approach. The purpose of the learning process is to improve the representation of information in the output layer. Using mathematical tools from information theory, information can be quantified, and algorithms can be developed that improve the network's information representation by changing the weights. We assume that the brain uses similar methods to represent information better.

Three important information-theoretic quantities. Information: defined as -log(p(x)); the rarer a given value, the more information it carries. Entropy: the mean information of a random variable. Mutual information: the amount of uncertainty about a variable Y that is resolved by observing X.
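For discrete variables the three quantities can be computed directly from a probability table. The sketch below uses base-2 logarithms (bits) and two toy joint distributions; both choices, and the names `entropy` and `mutual_information`, are illustrative assumptions.

```python
import numpy as np

def entropy(p):
    """H = -sum p * log2(p), in bits; zero-probability entries contribute nothing."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint probability table."""
    joint = np.asarray(joint, dtype=float)
    return (entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0))
            - entropy(joint.ravel()))

independent = np.outer([0.5, 0.5], [0.25, 0.75])  # p(x, y) = p(x) * p(y)
identical = np.array([[0.5, 0.0], [0.0, 0.5]])    # Y is a copy of X

print(entropy([0.5, 0.5]))
print(mutual_information(independent))
print(mutual_information(identical))
```

A fair coin carries one bit of entropy. Independent variables share no information, while observing an exact copy resolves all of the uncertainty.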

Infomax basic assumptions, each with its intuition • Minimize the mutual information between neurons of the output layer: the activity of one neuron of the output layer shouldn't give information about the activity of the other neurons. • Maximize the mutual information between the input and the output: a different reaction in the output layer for each input pattern, and a consistent reaction for the same pattern. • The noise level is fixed: only the entropy of the output affects the total MI between the layers.

I/O layers mutual information. The mutual information between the input and the output layers is I(s;x) = H(s) - H(s|x). Because s is a function of x and the noise level is fixed, H(s|x) does not depend on the weights, so the mutual information depends only on the entropy of the output layer. We want to maximize H(s).

Output layer mutual information. From the equation on this slide (although it is written for discrete values) we can see that if the neurons are statistically independent, the argument of the log equals one, so the log is zero and the mutual information is zero. We want to minimize the MI in the output layer.

Estimating the output distribution. After long and painful math, we derive the expression on this slide as the estimate of the distribution of s. Chi is called the susceptibility matrix.

Estimating the entropy of the output. Slide annotations on the derivation: the estimate of P(s) is substituted because we don't use the explicit equation for the entropy of s; the integral is solved; H(x) is treated as zero or constant, for the same reason as previously mentioned.

The cost function. The minus sign ensures that the error decreases as the value on the right increases. Annotation: the sum of the squared changes in s_i with respect to the input.

Geometrical interpretation of the cost function. Geometrically, the goal is to maximize the volume change in the transformation from the N-dimensional input x to the M-dimensional output s. This improves discrimination and increases the mutual information between the input and the output.
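The volume-change picture can be made concrete. For a map from N to M dimensions with Jacobian chi (chi[i, j] = ds_i/dx_j), a small input volume is scaled by sqrt(det(chi' chi)). The sketch below assumes a logistic network s = g(Wx) and a cost of minus the mean log volume change. That form is consistent with the description on the slides but is not copied from them, and the function names are hypothetical.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def susceptibility(W, x):
    """chi[i, j] = d s_i / d x_j for s = logistic(W @ x): g'(u_i) * W[i, j]."""
    s = logistic(W @ x)
    return (s * (1.0 - s))[:, None] * W

def infomax_cost(W, X):
    """Assumed cost: minus the mean log volume change,
    -0.5 * <log det(chi' chi)>, averaged over the columns (patches) of X."""
    total = 0.0
    for x in X.T:
        chi = susceptibility(W, x)
        _, logdet = np.linalg.slogdet(chi.T @ chi)
        total += -0.5 * logdet
    return total / X.shape[1]

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))   # M = 5 outputs, N = 3 inputs (overcomplete)
X = rng.normal(size=(3, 50))
print(infomax_cost(W, X))
```

When M = N the expression reduces to minus the mean of log|det chi|, the familiar square-matrix case.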

Learning rules. Using the gradient descent method, we define learning rules (how to change W in response to a given set of outputs). Annotations: the learning rate; the derivatives of the cost function with respect to W.

Step 4: Writing the simulation in MATLAB and getting results
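A gradient descent learning rule has the form W ← W - (learning rate) × dE/dW. The sketch below uses the square case (M = N) with a logistic nonlinearity, where the gradient has the closed form known from Bell and Sejnowski's infomax rule. It is a stand-in to show the mechanics, not the rule derived in this presentation, and the data and function names are assumptions.

```python
import numpy as np

def logistic(u):
    return 1.0 / (1.0 + np.exp(-u))

def cost(W, X):
    """E(W) = -( log|det W| + <sum_i log g'(u_i)> ), with u = W x and g logistic."""
    S = logistic(W @ X)
    log_slopes = np.log(S * (1.0 - S)).sum(axis=0)
    return -(np.linalg.slogdet(W)[1] + log_slopes.mean())

def cost_gradient(W, X):
    """dE/dW = -( inv(W)' + <(1 - 2 s) x'> ), averaged over patches."""
    S = logistic(W @ X)
    return -(np.linalg.inv(W).T + (1.0 - 2.0 * S) @ X.T / X.shape[1])

def learn(W, X, learning_rate=0.05, n_steps=200):
    """Plain batch gradient descent on the cost."""
    for _ in range(n_steps):
        W = W - learning_rate * cost_gradient(W, X)
    return W

rng = np.random.default_rng(0)
sources = rng.laplace(size=(2, 2000))                 # assumed non-Gaussian sources
X = np.array([[1.0, 0.5], [0.3, 1.0]]) @ sources      # assumed linear mixing
W_learned = learn(np.eye(2), X)
print(cost(np.eye(2), X), cost(W_learned, X))
```

The second printed cost should be lower than the first, since each step moves W against the gradient with a small learning rate.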

Our results: the properties of the cells that emerge through learning resemble those of the cells in the first stages of visual information processing in the brain.

Fitting Gabor-type filters

Histogram of the distribution of orientations obtained from the receptive fields. Axes: number of cells versus angle (rad).
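For reference, a Gabor filter is a sinusoidal grating multiplied by a Gaussian envelope. The sketch below only generates such a filter; the fitting procedure is not described in the slides and is not reproduced here. The isotropic envelope, the parameter names and the example values are all simplifying assumptions.

```python
import numpy as np

def gabor(size, wavelength, theta, sigma, phase=0.0):
    """2-D Gabor patch: isotropic Gaussian envelope times a cosine grating
    whose direction of modulation is at angle theta (radians)."""
    half = (size - 1) / 2.0
    coords = np.arange(size) - half
    x, y = np.meshgrid(coords, coords)
    x_rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * x_rot / wavelength + phase)

patch = gabor(size=12, wavelength=6.0, theta=np.pi / 4, sigma=3.0)
print(patch.shape)  # (12, 12)
```

Fitting would then mean searching over `wavelength`, `theta`, `sigma` and `phase` for the patch closest to a learned receptive field. The fitted `theta` values are the kind of quantity an orientation histogram would summarize.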

What's next? • Topographic mapping. • Adding feedback connections. • Presenting stimuli to the network and examining how it copes with "illusions". • Analytical investigation of the network's properties. • The mathematical models described offer new tools for studying neurological disorders and mental illnesses, such as epilepsy, autism and schizophrenia.

Criticism of the current model • Its distance from physiological reality. • Criticism of the way physiological data is collected, in general.

“When we will discover the laws underlying natural computation . . . . . . we will finally understand the nature of computation itself.” --J. von Neumann
