
# Games, Random Numbers and Introduction to simple statistics - PowerPoint PPT Presentation

Games, Random Numbers and Introduction to simple statistics. PRNG: Pseudo Random Number Generator. 蔡文能


### Games, Random Numbers and Introduction to simple statistics

PRNG

Pseudo Random Number Generator

• What is a random number?

• How are random numbers generated?

• rand() in the C language: linear congruential generators

• Why is it called “pseudo-random”? (The P is silent.)

• How can we get “true” randomness?

• Applications of random numbers

• Other topics related to random numbers

• Introduction to simple statistics

• http://www.atariarchives.org/basicgames/showpage.php?page=14

• An ancient game of two players

• One pile of match sticks (or stones)

• Players take turns removing sticks: at least 1, at most maxTake

• The rules can specify that taking the last stick either wins or loses!

• Winning strategy?
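For this single-pile game, a well-known strategy works modulo maxTake + 1. When taking the last stick wins, a player should leave a multiple of maxTake + 1; when taking the last stick loses, leave one more than a multiple. From a position where no such move exists, every move loses against perfect play, so a program might as well move at random, which is one reason games use random numbers. The sketch below assumes exactly these rules; `best_take` is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: how many sticks to take from a pile of `pile`
   when each move removes 1..maxTake sticks (pile >= 1 assumed).
   lastWins != 0: whoever takes the last stick wins.
   lastWins == 0: whoever takes the last stick loses.
   Returns a winning move, or a random legal move if the position is lost. */
int best_take(int pile, int maxTake, int lastWins) {
    int period = maxTake + 1;
    /* target: leave a multiple of period (last wins),
       or leave period*k + 1 sticks (last loses) */
    int take = lastWins ? pile % period : (pile - 1) % period;
    if (take == 0) {
        /* losing position: no winning move exists, so play a random legal move */
        int limit = pile < maxTake ? pile : maxTake;
        take = 1 + rand() % limit;
    }
    return take;
}
```

For example, with 10 sticks and maxTake = 3 under last-stick-wins rules, `best_take(10, 3, 1)` returns 2, leaving 8 = 2 × 4.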

Games need random numbers! Why?

Bulls and Cows Game

http://5ko.free.fr/en/bk.html
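Bulls and Cows needs a fresh random secret every game; otherwise a player would face the same number each time. A minimal sketch, assuming the common variant with a 4-digit secret of distinct digits (the page linked above describes the exact rules): the secret is drawn with a partial Fisher-Yates shuffle driven by rand(), and a guess is scored as bulls (right digit, right place) and cows (right digit, wrong place). `make_secret` and `score` are hypothetical names.

```c
#include <stdlib.h>

#define DIGITS 4   /* assumption: 4 distinct digits, leading 0 allowed */

/* Fill secret[0..DIGITS-1] with distinct random digits using a partial
   Fisher-Yates shuffle of 0..9. */
void make_secret(int secret[DIGITS]) {
    int pool[10], i;
    for (i = 0; i < 10; i++) pool[i] = i;
    for (i = 0; i < DIGITS; i++) {
        int j = i + rand() % (10 - i);   /* pick from the still-unused part */
        int t = pool[i]; pool[i] = pool[j]; pool[j] = t;
        secret[i] = pool[i];
    }
}

/* Score a guess (assumed to have distinct digits):
   bulls = right digit in the right place, cows = right digit in the wrong place. */
void score(const int secret[DIGITS], const int guess[DIGITS],
           int *bulls, int *cows) {
    int i, j;
    *bulls = *cows = 0;
    for (i = 0; i < DIGITS; i++)
        for (j = 0; j < DIGITS; j++)
            if (secret[i] == guess[j]) {
                if (i == j) (*bulls)++;
                else (*cows)++;
            }
}
```

For secret 1234 and guess 1325 the score is 1 bull (the 1) and 2 cows (the 3 and the 2).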

Games need random numbers! Why?

NIM Game

• http://en.wikipedia.org/wiki/Nim

• Nim is a two-player mathematical game of strategy in which players take turns removing objects from distinct heaps. On each turn, a player must remove at least one object, and may remove any number of objects provided they all come from the same heap.

• The rules can specify that taking the last object either wins or loses!

• Winning strategy?
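For Nim under normal play (whoever takes the last object wins), the classic winning strategy uses the nim-sum, the bitwise XOR of all heap sizes: the player to move is in a losing position exactly when the nim-sum is 0, and otherwise some heap h can be reduced to h XOR nim-sum. The sketch below assumes normal play (the misère rule, where taking the last object loses, needs an adjustment near the end of the game that is not shown) and falls back to a random move when no winning move exists; `nim_move` is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper for normal-play Nim (taking the last object wins).
   Sets *heap to the chosen heap index and returns how many objects to remove.
   If the nim-sum (XOR of all heap sizes) is nonzero, the move leaves a nim-sum
   of 0, a losing position for the opponent. Otherwise a random legal move is
   returned. Assumes at least one heap is nonempty. */
int nim_move(const int heaps[], int n, int *heap) {
    int i, x = 0;
    for (i = 0; i < n; i++) x ^= heaps[i];
    if (x != 0) {
        for (i = 0; i < n; i++) {
            int target = heaps[i] ^ x;
            if (target < heaps[i]) {      /* this heap can be reduced to target */
                *heap = i;
                return heaps[i] - target;
            }
        }
    }
    /* nim-sum is 0: pick a random nonempty heap and a random amount */
    do { i = rand() % n; } while (heaps[i] == 0);
    *heap = i;
    return 1 + rand() % heaps[i];
}
```

With heaps 3, 4, 5 the nim-sum is 3 XOR 4 XOR 5 = 2, so the sketch removes 2 from the first heap, leaving 1, 4, 5 with nim-sum 0.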

Games need random numbers! Why?

• A sequence of independent random numbers with a specified distribution, such as the uniform distribution (all values equally probable)

• Actually, the sequence generated is not random, but it appears to be. Sequences generated in a deterministic way are usually called Pseudo-Random sequences.

Other distributions: normal, exponential, gamma, Poisson, …

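`<stdlib.h>`

rand() and srand() in Turbo C++ (simplified model)

For static global variables, see Section 4.6 of the K&R textbook.

    #define RAND_MAX 0x7fffu

    /* C requires rand() to behave as if srand(1) had been called first */
    static unsigned long seed = 1;

    int rand(void) {
        seed = seed * 1103515245 + 12345;
        /* return bits 16..30: the low bits of an LCG with a power-of-two
           modulus have very short periods */
        return (int)((seed / 65536) % (RAND_MAX + 1));
    }

    void srand(unsigned int newseed) {
        seed = newseed;
    }

static makes seed invisible to functions in other files.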

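`<stdlib.h>`

rand() and srand() with gcc on Unix (simplified model)

For static global variables, see Section 4.6 of the K&R textbook.

    #define RAND_MAX 0x7fffffffu

    /* C requires rand() to behave as if srand(1) had been called first */
    static unsigned long seed = 1;

    int rand(void) {
        seed = seed * 1103515245 + 12345;
        return (int)(seed % (RAND_MAX + 1));   /* low 31 bits: 0 .. 2^31 - 1 */
    }

    void srand(unsigned int newseed) {
        seed = newseed;
    }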

Pseudo random number

• Linear Congruential Generators

• Simple way to generate pseudo-random numbers

• Easily cracked

• Produce finite sequences of numbers

• Each number is tied to the others

• Some sequences of numbers will not ever be generated

• Cryptographic random number generators

• Entropy sensors (i.e., extracted randomness)
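The first weaknesses listed above can be seen directly in the gcc-style generator shown earlier, whose recurrence is x' = (1103515245·x + 12345) mod 2³¹ and whose output is the new state. A minimal sketch (`lcg_next` is a hypothetical name) re-implements that recurrence with 32-bit unsigned arithmetic:

```c
#include <stdint.h>

/* Same recurrence as the gcc-style model: x' = (1103515245 x + 12345) mod 2^31.
   uint32_t arithmetic wraps mod 2^32, and masking keeps the low 31 bits,
   which equals the result mod 2^31 because 2^31 divides 2^32. */
uint32_t lcg_next(uint32_t x) {
    return (x * 1103515245u + 12345u) & 0x7fffffffu;
}
```

Because the next output depends only on the current one, anyone who sees one value can call `lcg_next` on it to predict the rest of the sequence ("easily cracked"). And since both 1103515245 and 12345 are odd, the lowest bit of x' is the opposite of the lowest bit of x, so the low bit simply alternates 0, 1, 0, 1, ….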

Linear Congruential Generator (LCG) for Uniform Random Digits

• Preferred method: begin with a seed $x_0$ and successively generate the next pseudo-random number by $x_{i+1} = (a x_i + c) \bmod m$, for $i = 0, 1, 2, \ldots$, where

• m is the largest prime less than the largest integer the computer can store

• a is relatively prime to m

• c is arbitrary

• Let $\lfloor A \rfloor$ be the largest integer not greater than A (i.e., keep only the integer part, for A ≥ 0);

then $N \bmod m = N - \lfloor N/m \rfloor \cdot m$

• Accept an LCG whose m, a, and c pass the same statistical tests that known uniform random digits pass

In C/C++/Java, mod is written as %.
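The recurrence and the floor-based definition of mod can be checked on tiny parameters. The sketch below (hypothetical helper names; it assumes m ≤ 256 and x0 < m so that a small table can record when each value first appears) measures the length of the cycle an LCG eventually enters:

```c
/* x_{i+1} = (a x_i + c) mod m */
unsigned lcg_step(unsigned x, unsigned a, unsigned c, unsigned m) {
    return (a * x + c) % m;
}

/* N mod m computed as N - floor(N/m) * m; for N >= 0 this equals N % m */
unsigned mod_by_floor(unsigned n, unsigned m) {
    return n - (n / m) * m;   /* unsigned division truncates, i.e. floors */
}

/* Length of the cycle eventually reached from seed x0 (m <= 256, x0 < m).
   first[v] remembers the step at which value v first appeared. */
int lcg_period(unsigned a, unsigned c, unsigned m, unsigned x0) {
    int first[256], i;
    unsigned x = x0;
    for (i = 0; i < 256; i++) first[i] = -1;
    for (i = 0; ; i++) {
        if (first[x] >= 0) return i - first[x];   /* value seen before: cycle found */
        first[x] = i;
        x = lcg_step(x, a, c, m);
    }
}
```

With m = 16, a = 5, c = 3 the generator visits all 16 values before repeating, while a = 3, c = 1 repeats after only 8 values; the multiplicative case c = 0 with the prime m = 7 and a = 3 cycles through all 6 nonzero values. The choice of m, a, and c clearly matters.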

1. Simulation

2. Recreation (game programming)

3. Sampling

4. Numerical analysis

5. Decision making: randomness is an essential part of optimal strategies (in game theory)

6. Game programs, . . .
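As a small instance of items 1 and 4 (simulation and numerical analysis), π can be estimated by throwing random points into the unit square and counting how many land inside the quarter circle x² + y² ≤ 1, whose area is π/4. A minimal sketch using rand(); `estimate_pi` is a hypothetical name:

```c
#include <stdlib.h>

/* Monte Carlo estimate of pi: throw n random points into the unit square;
   the fraction landing inside the quarter circle x^2 + y^2 <= 1 approaches pi/4. */
double estimate_pi(long n) {
    long i, inside = 0;
    for (i = 0; i < n; i++) {
        double x = rand() / (RAND_MAX + 1.0);
        double y = rand() / (RAND_MAX + 1.0);
        if (x * x + y * y <= 1.0) inside++;
    }
    return 4.0 * inside / n;
}
```

The estimate changes with the seed; its statistical error shrinks roughly like 1/√n, which is the kind of question the confidence-interval material later in the slides addresses.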

Uniform Distribution

• Every value in the range occurs with the same probability
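In C a uniform integer in a range is often written lo + rand() % (hi − lo + 1), but when RAND_MAX + 1 is not a multiple of the range size, the smaller values come up slightly more often. A minimal sketch of one common fix, rejection sampling, with a hypothetical `rand_range` helper (it assumes hi − lo ≤ RAND_MAX):

```c
#include <stdlib.h>

/* Uniform integer in [lo, hi] (requires hi - lo <= RAND_MAX).
   Draws that fall into the incomplete last block of size `range`
   are rejected, so every remaining residue is equally likely. */
int rand_range(int lo, int hi) {
    unsigned long range = (unsigned long)(hi - lo) + 1;
    unsigned long limit = ((unsigned long)RAND_MAX + 1) / range * range;
    unsigned long r;
    do {
        r = (unsigned long)rand();
    } while (r >= limit);           /* reject the biased tail */
    return lo + (int)(r % range);
}
```

For a die, rand_range(1, 6) returns each face with equal probability, up to the quality of rand() itself.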

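Normal Distribution

Standard Normal Distribution

• N(0, 1)

• Mean 0

• Standard deviation 1

• In statistics, the most commonly used continuous distribution is the normal distribution. In the real world it is often used to describe the behavior of many variables, such as exam scores, body weight, IQ, and store revenue.

• If X is a normal random variable, we write X ~ N(μ, σ²),

• where the parameter μ is the mean and σ² is the variance.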

Central Limit Theorem (CLT)

• As the number of observations n increases, the distribution of the mean of n independent and identically distributed (i.i.d.) random variables approaches a normal distribution.

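    #include <stdlib.h>

    /* approximate standard normal generator, N(0, 1):
       the sum of 12 independent U[0,1) values has mean 6 and variance 12 * (1/12) = 1,
       so by the CLT (sum - 6) is approximately N(0, 1); results always lie in [-6, 6) */
    double randNormal(void) {
        int i;
        double ans = 0.0;
        for (i = 1; i <= 12; ++i)
            ans = ans + rand() / (1.0 + RAND_MAX);   /* uniform in [0, 1) */
        return ans - 6.0;   /* N(0, 1) */
    }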
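A quick way to check randNormal() is to draw many values and compare the sample mean and variance with the targets 0 and 1. The sketch repeats the function so it compiles on its own; `check_normal` is a hypothetical helper that uses the sum-of-squares shortcut for the variance:

```c
#include <stdlib.h>

/* copy of the slide's generator so this block compiles on its own */
double randNormal(void) {
    int i;
    double ans = 0.0;
    for (i = 1; i <= 12; ++i) ans += rand() / (1.0 + RAND_MAX);
    return ans - 6.0;
}

/* Draw n values and report their sample mean and sample variance
   (divisor n - 1), which should be close to 0 and 1. */
void check_normal(int n, double *mean, double *var) {
    int i;
    double sum = 0.0, sumsq = 0.0;
    for (i = 0; i < n; i++) {
        double z = randNormal();
        sum += z;
        sumsq += z * z;
    }
    *mean = sum / n;
    *var = (sumsq - sum * sum / n) / (n - 1);
}
```

Matching mean and variance is necessary but not sufficient: this generator can never produce values outside [−6, 6), so it cannot model extreme tail events.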

• Pseudo-Random Number Generators(PRNG) depend solely on a seed, which determines the entire sequence of numbers returned.

• How can we get “true” randomness? Change the random seed.

• How random is the seed?

• Process ID, UserID: bad idea!

• Current time: srand( time(0) ); // good

If you use the time, I may be able to guess which seed you used (the microsecond part might be difficult to guess, but its range is limited).
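The warning about time-based seeds can be made concrete. The sketch below assumes the victim seeds the gcc-style LCG model from earlier with a time value in seconds (as srand(time(0)) would) and reveals its first output; an attacker who knows the time to within an hour simply tries all 3,601 candidate seeds. All names are hypothetical, and the model LCG stands in for a real library rand():

```c
#include <stdint.h>

/* gcc-style LCG model: x' = (1103515245 x + 12345) mod 2^31 */
static uint32_t model_next(uint32_t x) {
    return (x * 1103515245u + 12345u) & 0x7fffffffu;
}

/* Victim: seeds with a time value t and reveals its first output. */
uint32_t first_output(uint32_t t) {
    return model_next(t);
}

/* Attacker: knows the output and that t lies in [t_lo, t_hi] (t_lo <= t_hi).
   Tries every candidate seed; returns 1 and stores the seed if one matches. */
int guess_seed(uint32_t observed, uint32_t t_lo, uint32_t t_hi, uint32_t *found) {
    uint32_t t = t_lo;
    for (;;) {
        if (first_output(t) == observed) {
            *found = t;
            return 1;
        }
        if (t == t_hi) return 0;   /* window exhausted */
        t++;
    }
}
```

Because a·t + c mod 2³¹ is one-to-one in t (a is odd), a match inside such a window identifies the seed uniquely, after which every later number is known.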

### Introduction to simple Statistics

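• The College Entrance Examination Center noted that scores on the Chinese subject of the 2007 (ROC year 96) Department Required Test were close to a “normal distribution”: most test takers scored in the middle, with fewer high and low scores.

• The Ministry of Education revised the criteria for identifying gifted students: starting from the 2007 academic year (ROC academic year 96), the threshold for every category of giftedness was raised to “two standard deviations above the mean, or percentile rank 97 or higher”.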

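2010 Greater Kaohsiung mayoral election poll

• For the Greater Kaohsiung mayoral election to be held next year, the latest poll published by the biweekly Wealth Magazine (《財訊》) shows that 50% of Kaohsiung residents support Chen Chu (陳菊), while Eric Chu (朱立倫) has only 32% support; if legislator Huang Chao-shun (黃昭順), the KMT figure most visibly positioning for next year's race, faced Chen Chu, the gap would be even larger: 19% to 60%.

• This Wealth Magazine poll was commissioned from the Shanshui Public Opinion Research company (山水民意研究公司), using random sampling of residential telephones in Taipei and Kaohsiung. The Kaohsiung survey was conducted on November 2 to 3 with 1,273 valid respondents; at the 95% confidence level, the margin of error is about ±2.75 percentage points.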

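Sampling

2005 Nantou County Magistrate election survey

2009 Nantou County Magistrate election poll

2008 presidential election: Apple Daily poll

October 2006 Taipei mayoral candidate poll

2005 Taipei County Magistrate election poll

• According to a TVBS poll conducted November 21 to 22, KMT Taipei County candidate Chou Hsi-wei (周錫瑋) had 48% support, while DPP candidate Luo Wen-jia (羅文嘉) had 27%.

• Compared with the poll a month earlier, despite the outbreak of the Yongzhou case (永洲案) and the recent uproar over smears on the “瑋哥部落格 (BLOG)”, Chou Hsi-wei's support rose rather than fell, by 2 percentage points, while Luo Wen-jia's fell by 4 percentage points.

• The TVBS Poll Center successfully interviewed 1,033 Taipei County residents aged 20 and over between November 21 and 22; at the 95% confidence level, the sampling error is about ±3.0 percentage points.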

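• President Franklin Roosevelt sought re-election (1936); Kansas Governor Alf Landon was the Republican presidential candidate.

• The U.S. economy was gradually recovering from the Great Depression.

• Nine million people were unemployed, and real income fell by one third between 1929 and 1933.

• Governor Landon's campaign theme was “small government”. His slogan was The spender must go.

• President Roosevelt's campaign theme was “expanding domestic demand” (deficit financing). His slogan was Balance the budget of the American people first.

• Claim 1: Most observers believed President Roosevelt would win by a landslide.

• Claim 2: The Literary Digest magazine predicted that Landon would win 57% to 43%.

• This figure was based on a poll of 2.4 million people.

• Since 1916 the magazine had predicted every election correctly with its method.

• Result: Roosevelt won 62% to 38%. Why?

• A new competitor, Gallup, polled as well:

• Sampling 3,000 people from the Literary Digest's 2.4-million-person sample, Gallup predicted that Landon would win 56% to 44%.

• From his own sample of 50,000 people, Gallup predicted that Roosevelt would win 56% to 44%.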

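What did the Literary Digest get wrong?

• Sampling method: it mailed 10 million questionnaires and received 2.4 million back, but the recipients were chosen from telephone directories and club membership lists.

• At the time there were only 11 million residential telephones, while 9 million people were unemployed.

Possible sources of the problem:

• Selection bias: the Literary Digest sample contained too many wealthy people, and that year rich and poor voters differed sharply in their voting preferences.

• Non-response bias: a low response rate.

• In Chicago, for example, questionnaires were mailed to one third of the registered voters and about 20% were returned; more than half of those said they would vote for Landon, yet Roosevelt won two thirds of the vote.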

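• When we use $\hat{p}$ to construct a 95% confidence interval for $p$, the bound on error of estimation is $B = 1.96\sqrt{\hat{p}(1-\hat{p})/n}$

• Solving for the sample size: $n = \left(\frac{z_{\alpha/2}}{B}\right)^2 p(1-p)$, with $z_{\alpha/2} = 1.96$ for 95% confidence

• The estimated standard deviation of $\hat{p}$ is $\sqrt{\hat{p}(1-\hat{p})/n}$

• $1-\alpha$ = confidence coefficient (confidence level)

• $B$ = the bound on error of estimation

• Using a conservative value of $p = 0.5$ in the formula for the required sample size gives

$n = \left(\frac{1.96}{0.03}\right)^2 p(1-p) = \left(\frac{1.96}{0.03}\right)^2 \cdot 0.5(1-0.5) = 1067.11$

• Thus, n would need to be 1068 in order to estimate $p$ to within 0.03 with 95% confidence.

At the 95% confidence level, the sampling error is within ±3 percentage points.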

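• Suppose the Taipei Main Station plaza plans to install a weight-statistics scale: anyone who steps on it immediately sees their weight

• and the scale immediately displays the following statistics:

n : how many people have been weighed here so far

Average : the mean weight

STD : the standard deviation of these n people's weights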

Descriptive Statistics (diagram):

Distribution: frequency distribution, Histogram

Central tendency: Mean, Median, Mode

Dispersion: Range, Standard deviation, Variance

N

Not P (inferential stats)

Dispersion: how scattered or spread out the data are

Distribution: how the data are distributed

Central tendency: where the data tend to concentrate

• Parameters (common statistical parameters)

• Mean: the average of the data

• Median: the value of the middle observation

• Mode: the value with the greatest frequency

• Standard Deviation: a measure of the typical deviation from the mean

• Variance: the square of the standard deviation

• Range: for example, in a spreadsheet, Max(B2:B60) − Min(B2:B60)
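The parameters listed above take only a few lines to compute once the data are sorted. A minimal sketch with a hypothetical `describe` function (it sorts the array in place, assumes n ≥ 1, and breaks ties for the mode by taking the smallest value):

```c
#include <stdlib.h>

static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median: middle value, or the average of the two middle values when n is even.
   Mode: the value with the greatest frequency (the smallest one if tied). */
void describe(double data[], int n, double *mean, double *median,
              double *mode, double *range) {
    int i, run = 1, best = 1;
    double sum = 0.0;
    qsort(data, n, sizeof data[0], cmp_double);
    for (i = 0; i < n; i++) sum += data[i];
    *mean = sum / n;
    *median = (n % 2) ? data[n / 2] : (data[n / 2 - 1] + data[n / 2]) / 2.0;
    *mode = data[0];
    for (i = 1; i < n; i++) {
        run = (data[i] == data[i - 1]) ? run + 1 : 1;   /* length of current run */
        if (run > best) { best = run; *mode = data[i]; }
    }
    *range = data[n - 1] - data[0];   /* Max - Min */
}
```

For the data 3, 1, 4, 1, 5, 9, 2, 6 it reports mean 3.875, median 3.5, mode 1, and range 8.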

Mean and Variance

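Population Mean / Sample Mean

$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

Sample Variance

$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$

• Variance describes the spread (variation) of the data around the mean.

• The sample variance s² estimates the population variance from the sample data.

• The standard deviation is the square root of s².

Compute Variance without mean

$\text{Variance} = \frac{1}{n}\left(\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right)$ (the sum of squares minus the square of the sum divided by n, all divided by n)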
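The shortcut formula can be checked against the definition. Both functions below are hypothetical helpers that return the population variance (dividing by n, as in the formula above):

```c
/* Two-pass: compute the mean first, then the average squared deviation. */
double variance_two_pass(const double x[], int n) {
    int i;
    double mean = 0.0, ss = 0.0;
    for (i = 0; i < n; i++) mean += x[i];
    mean /= n;
    for (i = 0; i < n; i++) ss += (x[i] - mean) * (x[i] - mean);
    return ss / n;
}

/* One-pass: (sum of squares - (sum)^2 / n) / n, no mean needed in advance. */
double variance_one_pass(const double x[], int n) {
    int i;
    double sum = 0.0, sumsq = 0.0;
    for (i = 0; i < n; i++) { sum += x[i]; sumsq += x[i] * x[i]; }
    return (sumsq - sum * sum / n) / n;
}
```

For the data 2, 4, 4, 4, 5, 5, 7, 9 both return 4. In floating point the one-pass form can lose precision through cancellation when the values are large and nearly equal; in that case the two-pass form is safer.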

From Wikipedia.org

• The probability distribution of sample means is (approximately) a normal distribution

• If infinitely many samples of n ≥ 30 observations are drawn from the same population, where X follows any distribution with mean μ and standard deviation σ, then the sample mean $\bar{X}$ is approximately $N(\mu, \sigma/\sqrt{n})$.

Central Limit Theorem

• For a population with mean $\mu$ and variance $\sigma^2$, the sampling distribution of the means of all possible samples of size n generated from the population will be approximately normally distributed, with the mean of the sampling distribution equal to $\mu$ and the variance equal to $\sigma^2/n$, assuming that the sample size is sufficiently large.

• Described by

• μ (mean)

• σ (standard deviation)

• Variance = the square of the standard deviation

• Written as N(μ, σ) or N(μ, σ²)

• Area under the curve is equal to 1

• Standard Normal Distribution: N(0, 1)

Why is the Normal Distribution important?

• It can be a good mathematical model for some distributions of real data

• ACT Scores

• Repeated careful measurements of the same quantity

• It is a good approximation for different types of chance outcomes (like tossing a coin)

• It is a very useful distribution for modeling roughly symmetric distributions

• Many statistical inference procedures are based on the normal distribution

• Sampling Distributions are roughly normal (TBC…)

Normal Distribution (figure): the black line marks the mean µ; the red lines mark µ ± 1 standard deviation (68.26% interval); the green lines mark µ ± 2 standard deviations (95.44% interval).

What about 3 standard deviations from the mean? The interval µ ± 3σ covers 99.74%.

95% confidence interval: µ ± 1.96 standard deviations.

68-95-99.7 Rule for Normal Curves

68.26% of the observations fall within σ of the mean µ

95.44% of the observations fall within 2σ of the mean µ

99.74% of the observations fall within 3σ of the mean µ

• It is important to distinguish between empirical and theoretical distributions

• Different notation for each distribution

Density function of Normal Distribution

• The exact density curve for a particular normal distribution is described by giving its mean (μ) and its standard deviation (σ)

• density at x = $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$

Confidence Intervals (CI) for µ, from a single sample mean

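• When we use software to simulate a real environment, we usually run the simulation many times with random numbers. Suppose the first run gives the result X1 and the second X2; after n repetitions we have n values X1, X2, ..., Xn. These n values are not all the same, so which one is correct? Intuitively, the average of the n results should be more trustworthy.

• But how much can we trust this average (the sample mean)?

• This question is addressed by the confidence interval and the significance level.

• In practice, it is impossible to obtain the exact true result from a finite number of simulation runs.

• What we can do is obtain a probability bound: from the simulation results we compute a lower limit c1 and an upper limit c2 such that, with a high probability 1 − α, the interval from c1 to c2 contains the true mean μ.

$P\{c_1 \le \mu \le c_2\} = 1 - \alpha$

α is called the significance level;

100(1 − α)% is called the confidence level, expressed as a percentage;

1 − α is called the confidence coefficient.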

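• Consider the example of sampling 16 hospitals to estimate the average number of discharged patients across 393 hospitals:

• there are $\binom{393}{16}$ different possible samples, an astronomically large number.

• By the Central Limit Theorem, the distribution of these sample means is bell-shaped, centered at the average number of discharged patients over all hospitals, and most 16-hospital sample means lie not far from that center (law of large numbers).

For a more reliable sampling method, the sample should be selected at random.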
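One way to draw such a random sample by computer is selection sampling (Knuth's Algorithm S, a technique not described in these slides): walk through the N items once and take item t with probability (number still needed)/(number of items left). Every subset of size k is then equally likely. A minimal sketch with a hypothetical `random_sample` function:

```c
#include <stdlib.h>

/* Choose k of the N items 0..N-1 so that every k-subset is equally likely.
   Item t is taken with probability (k - chosen) / (N - t).
   Writes the chosen indices to out[] in increasing order. */
void random_sample(int N, int k, int out[]) {
    int t, chosen = 0;
    for (t = 0; t < N && chosen < k; t++) {
        double u = rand() / (RAND_MAX + 1.0);   /* uniform in [0, 1) */
        if ((N - t) * u < k - chosen)           /* P = (k - chosen) / (N - t) */
            out[chosen++] = t;
    }
}
```

random_sample(393, 16, idx) picks 16 of the 393 hospitals and returns their indices in increasing order; once the items left equal the items still needed, every remaining item is taken, so exactly k are always chosen.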

Hypothesis Testing

• The null hypothesis for the test is that all population means (level means) are the same. (H0)

• The alternative hypothesis is that one or more population means differ from the others. (H1)

More on PRNGs

• Search for “PRNG” at http://gogle.com

• ANSI X9.17 PRNG

(PRNG = Pseudo Random Number Generator)

• The middle-square method devised by von Neumann

• Von Neumann architecture?

• PRNG in RC4 (RC4 is used for encryption in 802.11 wireless networks)

• http://www.rsa.com

• http://www.wisdom.weizmann.ac.il/~itsik/RC4/rc4.html

• WEP : RC4 Stream cipher

• ANSI X9.17 PRNG: use 3DES and a key K

• T_i = E_K(current timestamp)

• output[i] = E_K(T_i ⊕ seed[i])

• seed[i+1] = E_K(T_i ⊕ output[i])

• Weaknesses

• Only 64 bits are used for T_i

• seed[i+1] can easily be predicted if the state is compromised

John von Neumann suggested in 1946 producing random numbers with the arithmetic operations of a computer: the "middle-square" method squares the previous random number and extracts the middle digits. For example, to generate 10-digit numbers: if the previous number was 5772156649, its square is 33317792380594909201, so the next number is 7923805949.

The "middle-square" method has proved to be a comparatively poor source of random numbers. If zero appears as a number in the sequence, it will perpetuate itself forever.
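The 10-digit example can be reproduced in C. The square of a 10-digit number can have 20 digits, which may overflow a 64-bit integer, so the sketch splits x into two 5-digit halves before extracting the middle digits. `middle_square_next` is a hypothetical name:

```c
#include <stdint.h>

/* One step of the middle-square method for 10-digit numbers (x < 10^10):
   keep the middle 10 digits of the 20-digit square, i.e. floor(x^2 / 10^5) mod 10^10.
   With x = a*10^5 + b (a, b < 10^5):
   floor(x^2 / 10^5) = a^2 * 10^5 + 2ab + floor(b^2 / 10^5), which fits in 64 bits. */
uint64_t middle_square_next(uint64_t x) {
    const uint64_t E5 = 100000ULL, E10 = 10000000000ULL;
    uint64_t a = x / E5, b = x % E5;
    uint64_t shifted = a * a * E5 + 2 * a * b + (b * b) / E5;
    return shifted % E10;
}
```

middle_square_next(5772156649) returns 7923805949, as in the example, and middle_square_next(0) returns 0, showing how a zero perpetuates itself.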

Von Neumann architecture(http://wikipedia.org/)

• The term von Neumann architecture refers to a computer design model that uses a single storage structure to hold both programs and data. The term von Neumann machine can be used to describe such a computer, but that term has other meanings as well. The separation of storage from the processing unit is implicit in the von Neumann architecture.

• The term "stored-program computer" is generally used to mean a computer of this design.

Von Neumann bottleneck?

RC4 PRNG (1/2)

    /* key scheduling: S starts as 0..255 and is shuffled under key K of length klen */
    for (I = 0; I < 256; I++)
        S[I] = I;
    for (I = J = 0; I < 256; I++) {
        J = (J + S[I] + K[I % klen]) % 256;
        SWAP(S[I], S[J]);
    }
    I = J = 0;

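RC4 PRNG (2/2)

    /* produce one keystream byte; all index arithmetic is mod 256 */
    unsigned char rc4byte(void)
    {
        I = (I + 1) % 256;
        J = (J + S[I]) % 256;
        SWAP(S[I], S[J]);
        return S[(S[I] + S[J]) % 256];
    }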
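The two RC4 slides can be combined into a complete, self-contained sketch. Here I and J are kept in a struct (hypothetical names), SWAP is written out, and one function performs both encryption and decryption by XOR with the keystream. The check uses the widely published test vector in which key "Key" and plaintext "Plaintext" give ciphertext BB F3 16 E8 D9 40 AF 0A D3. RC4 appears here as a PRNG example; it is no longer considered secure for new systems.

```c
#include <stddef.h>

/* RC4 state: a permutation S of 0..255 plus two indices */
typedef struct {
    unsigned char S[256];
    unsigned i, j;
} RC4;

/* key-scheduling algorithm (the first slide's loops) */
void rc4_init(RC4 *st, const unsigned char *K, size_t klen) {
    unsigned i, j = 0;
    unsigned char t;
    for (i = 0; i < 256; i++) st->S[i] = (unsigned char)i;
    for (i = 0; i < 256; i++) {
        j = (j + st->S[i] + K[i % klen]) % 256;
        t = st->S[i]; st->S[i] = st->S[j]; st->S[j] = t;   /* SWAP */
    }
    st->i = st->j = 0;
}

/* one keystream byte (the second slide's rc4byte) */
unsigned char rc4_byte(RC4 *st) {
    unsigned char t;
    st->i = (st->i + 1) % 256;
    st->j = (st->j + st->S[st->i]) % 256;
    t = st->S[st->i]; st->S[st->i] = st->S[st->j]; st->S[st->j] = t;
    return st->S[(st->S[st->i] + st->S[st->j]) % 256];
}

/* encryption and decryption are the same operation: c = p XOR b, p = c XOR b */
void rc4_crypt(RC4 *st, const unsigned char *in, unsigned char *out, size_t n) {
    size_t k;
    for (k = 0; k < n; k++) out[k] = in[k] ^ rc4_byte(st);
}
```

Re-initializing with the same key and running rc4_crypt on the ciphertext recovers the plaintext.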

Byte version

WEP: RC4 encryption and decryption (http://rsa.com), figure: the encryption key K drives RC4 to produce a random bit stream b, and the plaintext bit stream p is XORed with b to give the ciphertext bit stream c = p ⊕ b.

Decryption works in the same way: p = c ⊕ b

WEP: Wired Equivalent Privacy


http://www.csie.nctu.edu.tw/~tsaiwn/introcs/

http://gogle.com/