Entropy and Information

Statistical entropy is a probabilistic measure of uncertainty or ignorance; information is a measure of a reduction in that uncertainty. Entropy (or uncertainty) and its complement, information, are perhaps the most fundamental quantitative measures in cybernetics, extending the more qualitative concepts of variety and constraint to the probabilistic domain.

Variety and constraint, the basic concepts of cybernetics, can be measured in a more general form by introducing probabilities. Assume that we do not know the precise state s of a system, but only the probability distribution P(s) that the system would be in state s. Variety V can then be expressed as entropy H (as originally defined by Boltzmann for statistical mechanics):

H(P) = - Sum over s of P(s) * log P(s)

H reaches its maximum value if all states are equiprobable, that is, if we have no indication whatsoever to assume that one state is more probable than another state. Thus it is natural that in this case entropy H reduces to variety V. Like variety, H expresses our uncertainty or ignorance about the system's state. It is clear that H = 0 if and only if the probability of a certain state is 1 (and of all other states 0). In that case we have maximal certainty or complete information about what state the system is in.
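As a concrete illustration (ours, not the encyclopedia's), here is a minimal Python sketch of this formula; the function name entropy and the choice of a base-2 logarithm, which measures H in bits, are our own conventions:

```python
import math

def entropy(probs):
    # H(P) = - sum over s of P(s) * log2(P(s)); states with P(s) = 0 contribute nothing.
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # equiprobable states: maximum, log2(4) = 2.0
print(entropy([1.0, 0.0, 0.0, 0.0]))      # one certain state: H = 0.0
```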
We define constraint as that which reduces uncertainty, that is, the difference between maximal and actual uncertainty. This difference can also be interpreted in a different way, as information, and historically H was introduced by Shannon as a measure of the capacity for information transmission of a communication channel. Indeed, if we get some information about the state of the system (e.g. through observation), then this will reduce our uncertainty about the system's state by excluding, or reducing the probability of, a number of states. The information I we receive from an observation is equal to the degree to which uncertainty is reduced:

I = H(before) - H(after)

If the observation completely determines the state of the system (H(after) = 0), then information I reduces to the initial entropy or uncertainty H.
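A worked example of our own, to make the formula concrete: for a fair six-sided die, H(before) = log2(6) ≈ 2.585 bits; learning only that the roll is even leaves three equiprobable states, so H(after) = log2(3) ≈ 1.585 bits, and the observation delivered exactly one bit:

```python
import math

h_before = math.log2(6)    # fair die: six equiprobable states, ~2.585 bits
h_after = math.log2(3)     # told "the roll is even": three states remain, ~1.585 bits
print(h_before - h_after)  # I = H(before) - H(after) = 1.0 bit
```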
Although Shannon came to disavow the use of the term "information" to describe this measure, because it is purely syntactic and ignores the meaning of the signal, his theory came to be known as Information Theory nonetheless. H has been vigorously pursued as a measure for a number of higher-order relational concepts, including complexity and organization. Entropies, correlates to entropies, and correlates to such important results as Shannon's 10th Theorem and the Second Law of Thermodynamics have been sought in biology, ecology, psychology, sociology, and economics. We also note that there are other methods of weighting the state of a system which do not adhere to probability theory's additivity condition that the sum of the probabilities must be 1. These methods, involving concepts from fuzzy systems theory and possibility theory, lead to alternative information theories. Together with probability theory these are called Generalized Information Theory (GIT). While GIT methods are under development, the probabilistic approach to information theory still dominates applications.

Reference: Heylighen F. & Joslyn C. (2001): "Cybernetics and Second Order Cybernetics", in: R.A. Meyers (ed.), Encyclopedia of Physical Science & Technology, Vol. 4 (3rd ed.), Academic Press, New York, pp. 155-170.

An Intuitive Explanation of the Information Entropy of a Random Variable
Or: How to Play Twenty Questions

Daniel Shawcross Wilkerson
8 and 15 October 2006, 26 January 2009

Abstract

Information Entropy is not often explained well, yet it is quite straightforward to understand, and that understanding can be quite helpful in everyday life. Here we explain Information Entropy from scratch using simple mathematics and examples from everyday life, in particular deriving from first principles the best method of playing the game Twenty Questions.
Twenty Questions

There is a popular game called Twenty Questions that works like this. One person is the Knower and picks a point out of the probability space of all objects (thinks of an object). The other is the Guesser and asks the Knower to evaluate various random variables (questions) at that point (answer the questions of the object). The Guesser wins if he can guess the object the Knower is thinking about by asking at most twenty questions.

Here is a stupid way to play the game:

Knower: I got one.
Guesser: Is it my left big toe?
Knower: No.
Guesser: Is it the Washington Monument?
Knower: No.
Guesser: Is it Aunt June?
...

Anyone who has played this game can do better. But can we examine rigorously how, and say precisely by how much, we can do better?
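As a preview of where the essay is heading (our observation, added for orientation): each yes/no answer is one bit, so twenty well-chosen questions can in principle distinguish up to 2^20 objects, while the guess-one-object-at-a-time strategy above only rules one candidate in or out per question:

```python
questions = 20
print(2 ** questions)  # 1048576: outcomes of twenty ideal yes/no questions
print(questions + 1)   # 21: objects covered by twenty "is it X?" guesses
```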
Interestingness

When my telephone rings, I do the experiment of answering it: sometimes it's a hot girl; sometimes it is a telemarketer. Even if we know which is more likely, we never know for sure. I called a girl last night and she immediately exclaimed "we were just talking about you!"; we are quite surprised when the measurement of picking up the phone has even the slightest predictability, such as being the person about whom we were just speaking. Picking up the phone is quite an interesting experiment to do.

On the other hand, there are also much less interesting experiments: we basically know how they are going to turn out. When I turn the key in my car, it starts. I have a well-maintained car and a great mechanic, so it doesn't surprise me when my car starts. It isn't interesting enough to even enter into a discussion with another person when making plans with them: I don't say "I'll be there at 8pm, as long as my car starts"; I take it completely for granted. Starting my car is not a very interesting thing to do.

Further, we rely on our ability to predict the difference between interesting and un-interesting experiments. In particular, we rely on most of our day consisting of rather un-interesting experiments, such as taking a step, consulting our pay-stub, or greeting a friend. If all of these events were suddenly to become interesting, we would be in trouble. What if it were a coin-flip whether my car starts, the floor is there, my paycheck comes, or my old friend recognizes me? Life would be unlivable. We are careful to know when an interesting experiment is coming: we put our attention on it, we plan for the various eventualities, it punctuates the otherwise more predictable parts of our life.

While we may consider some things in life to be predictable or known and others random, there is in fact no fundamental separation. Each moment is an experiment. Is there a way to measure which experiments are more interesting and which are less?
Random variables

A random variable is a mathematical abstraction for modeling what we call in the real world a measurement, observation, or experiment. The definition that mathematicians give is that it is a function of a set you don't know anything about called the probability space (they add a few other technical conditions that basically just make the math work). You could think of the probability space as the complete state of the world beyond your knowledge.

That is, if my phone rings, to me whoever is at the other end of the line is random: although it might more often be one person than another, I really don't know. But to someone who has a god-like knowledge of the complete state of the universe, they know before I answer that it is my mom (forget the problems resulting from quantum mechanics for the moment). Another way to think about it is that mathematics is deterministic, so for us to be able to apply mathematics to this question, we have to squeeze all the randomness down into one thing, so we put it into the mysterious and unknowable probability space and then think of our random variables as being deterministic functions of it.
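A tiny Python sketch of this picture (our illustration; the names and the 0.3 probability are invented for the example): the hidden point stands in for the complete state of the world, and the random variable is just an ordinary, deterministic function of it:

```python
import random

def pick_world_state():
    # The "probability space": one hidden point we never get to see directly.
    return random.random()

def caller(world):
    # A random variable: a deterministic function of the hidden point.
    return "mom" if world < 0.3 else "telemarketer"

world = pick_world_state()  # all the randomness happens here, once
print(caller(world))        # the "measurement" merely reads off a function value
```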
Information

A bit is a binary decision between two things. If you ask for a milkshake and the waiter says "chocolate or vanilla?", in theory he just wants one bit of information from you. He might ask "whipped cream?"; he wants another bit. "Sprinkles?"; another bit. But notice that these two possibilities and another two and another two actually multiply, for a total of eight. That is, the total number of possibilities is two to the power of the number of bits; here 2^3 = 8:

chocolate, no whipped cream, no sprinkles,
chocolate, no whipped cream, sprinkles,
chocolate, whipped cream, no sprinkles,
chocolate, whipped cream, sprinkles,
vanilla, no whipped cream, no sprinkles,
vanilla, no whipped cream, sprinkles,
vanilla, whipped cream, no sprinkles,
vanilla, whipped cream, sprinkles.
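The multiplication of choices is easy to check mechanically; a short sketch of ours using Python's standard itertools.product:

```python
from itertools import product

# Three independent binary choices multiply: 2 * 2 * 2 = 2**3 = 8 possible orders.
orders = list(product(["chocolate", "vanilla"],
                      ["no whipped cream", "whipped cream"],
                      ["no sprinkles", "sprinkles"]))
for order in orders:
    print(", ".join(order))
print(len(orders))  # 8
```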
Events

When a random variable has a particular value, this is called an event. That is, the set of possible world-states where the person calling me on the phone is my mom is an event; when we are in one of those worlds we say the "mom is calling" event has occurred. From the perspective of all possible worlds at once, we say the subset of the probability space where it is mom that is calling is the event.

Note that this event really is a subset of the possible-worlds space or probability space. When mom is calling, the number of birds nesting in the Campanile tower on the Berkeley campus could be two, or it could be zero; it doesn't matter, because either way my mom is calling. So an event consists of a whole subset of possible worlds. Only some of the information about a given world is relevant: its membership in the event-set; other information is not relevant.
A random variable as an information-losing filter

If we look at all possible values of a random variable, we notice that their events are a partition of the probability space. That is, any possible world is in exactly one of them: the events cover the space and also do not overlap.

It is helpful to partition the idea of a measurement of a random variable into two steps:

Pick a point in the probability space.
Compute the random variable's function of that point and return its value.

Now, note that a random variable can be thought of as an information-losing filter for the state of the world observed in step 1. That is, when we pick a point from the probability space, we do not get to find out all of the information about that point; instead we only get to find out the value of the random variable function when computed on that point. The particular value we get out of the random variable tells us that an event has occurred, but all we know from that is that the state of the world is one of the many points in that event-set, the set of points that would give that value. That is, observing only the value of the random variable, some information about the state of the world has been lost.
Expected value

Suppose you play a game where you flip a fair coin: if it comes up heads you win a dollar, and if it comes up tails you win nothing. How much do you expect to win?

I basically expect to win fifty cents. Now, that's a bit odd, because I can't actually win fifty cents! However, what we mean is "expect on average over identical independent games played in the long run". We discuss further what that means exactly below, but for the moment your intuition will work rather well here.

The number fifty cents is called the expected value. We computed it as follows:

Consider each possible event;
Compute its value and its probability;
Multiply the value and probability;
Add those up.

That is, in the previous example we did this: (1/2 * $0) + (1/2 * $1) = $0.50.

It can be shown in our mathematical model called probability theory that the amount you actually win over independent games played in the long run is in fact fifty cents per play on average. Such results are called Limit Theorems, the famous one being the Central Limit Theorem. These are beyond the scope of this essay, but most people don't find this understanding of expectation so unintuitive, so we rely on intuition for our purposes here.

In general, for a random variable that has a value that is a number (so we can add and multiply it), the expected value is the sum, for each outcome, of the product of the probability of the outcome and the value of the outcome. This is a useful number because in the long run, you do tend to get what you expect. If we call our random variable R, we get the following formula.

Expected Value of R = Sum over x of Prob(R = x) * Value(x)
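The recipe above translates directly into code; a minimal sketch of ours, representing a distribution as a list of (probability, value) pairs:

```python
def expected_value(distribution):
    # Sum over outcomes of probability * value, exactly as in the formula above.
    return sum(prob * value for prob, value in distribution)

# The coin game: 1/2 chance of $0 (tails), 1/2 chance of $1 (heads).
print(expected_value([(0.5, 0.0), (0.5, 1.0)]))  # 0.5, i.e. fifty cents
```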
Measuring information preserved by a random variable

Now we come to the main question: can we measure how much information is preserved by a random variable? Another way to say it is: how interesting is that random variable? If a random variable told us the whole state of the world, that would be pretty interesting; we would get lots of information. However, if a random variable always said 0, that would be pretty boring. How do we measure this interestingness?

Imagine a one-meter-wide by 10-cm-high board tiled with colored rectangles. Suppose our complete state of the world, or probability space, is a point somewhere on this board. Our random variable is the color of the region that the point is in.

+--------+----+----+----------------+
|   A    | B  | C  |       D        |
+--------+----+----+----------------+

Information about the point's vertical position has no effect on the outcome, so we just ignore it; it is part of the information that is irretrievably lost. However, we do get some information about its horizontal position from the random variable output: the color.

Let's say that the horizontal position of the point is represented by a string of zeros and ones as follows. If the point is on the left half, the first digit in the string is zero; otherwise, if it is on the right half, the first digit is one. Once we are restricted to only one half of the board, we can repeat this process of naming which half of the half we are on, and so on. We may pin down the position of the point as narrowly as we like.

In mathematics one just assumes that this process can continue forever, but at small enough scales, quantum mechanics intervenes. Let's imagine we go out to ten digits, so our resolution would be 2^-10 m, which is about 10^-3 = one thousandth of a meter, or one millimeter. So in reality a point is picked at the millimeter granularity, one in about a thousand possibilities, but we only get to find out one of four possibilities. Information is clearly lost.
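Here is a sketch of ours of the two-step measurement on this board, at the millimeter granularity just described. The region widths are our reading of the bit counts worked out in the next section (A the left quarter, B and C an eighth each, D the right half): the hidden point carries ten bits of horizontal position, but the observer only ever sees the color.

```python
import random

def color(x):
    # The random variable: an information-losing filter from position to color.
    if x < 0.25:  return "A"  # positions whose bit string starts 00
    if x < 0.375: return "B"  # starts 010
    if x < 0.5:   return "C"  # starts 011
    return "D"                # starts 1

k = random.randrange(1024)  # hidden point: 10 bits (~millimeter) of position
bits = f"{k:010b}"          # the full ten-bit name of the point's position
print(bits, "->", color(k / 1024))  # we observe only the color; the rest is lost
```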
Information and probability of events

Further, the information we get when we do get a color is non-uniform. Let's consider each color and see how many bits of the real ten-bit string we can recover.

D: all we find out is that the point is somewhere on the right. This just tells us that the first bit of the point is 1. Only one bit of information is preserved.
A: we find out that the first bit is 0 and the second is 0. Two bits preserved.
B: the first bit is 0, the second 1, the third 0. Three bits.
C: the first bit is 0, the second 1, the third 1. Again, three bits.

Wow, sometimes we can get a whole three bits of information! Recall however that we can't control a random variable: you never know which point from the probability space is going to happen. What we really want to know is: when played independently in the long run, how much information do we expect to get on average?
Logarithm

Have you ever heard "he has a six-figure income"? If you have ever referred to the magnitude of someone's income by the number of figures it has, then you already understand logarithm. Basically, every time the number of figures in someone's income goes up by one, we know that they make ten times more money. That is, their income is an exponential function of their digits. Reversing it, their digits are a logarithmic function of their income. That's all there is to the logarithm.

The logarithm base 10 of x, also written log_10(x), just means how many 10s you have to multiply together to get x. For example, log_10(100) = 2, as two 10s multiplied give 100. Further, log_10(10) = 1, because you only need one 10 to make a 10 (duh!). It is important to realize that log_10(1) = 0: if you want to multiply things together to get one, there is nothing to do! When I was a little kid that one drove me crazy; I ran around the house yelling at anyone who would listen: "No! You can't get something from no multiplies at all!" It was straightforward once I realized that, in the context of multiplication, a sequence of multiplies starts at one, the multiplicative identity, rather than zero, the additive identity, so you get a one for free, as it were.

What if x is not a power of 10? Well, first notice that if you multiply something by 3 twice, you almost get 10; that is, the square root of 10 is about 3.16. So if you have to multiply by 3.16, then that only counts as multiplying by half of a 10, so the logarithm of the product only goes up by 1/2. That is:

log_10(31.6) = log_10(10 * 3.16)

but the number of 10s we need to multiply together to get the product of two numbers is just the sum of the number of 10s we needed to multiply to get each number separately, so

= log_10(10) + log_10(3.16) = 1 + 0.5 = 1.5.

Above, 10 is the base of the logarithm, but we could have just as easily used another number, such as 2, and we would have written log_2(x) instead. You get the idea.
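The two facts used above, that logs turn products into sums and that the base is just a choice, are easy to check numerically; a quick sketch of ours:

```python
import math

print(math.log10(31.6))                   # ~1.4997, i.e. about 1.5
print(math.log10(10) + math.log10(3.16))  # same value: log(a*b) = log(a) + log(b)
print(math.log2(8))                       # 3.0: three 2s multiply to give 8
```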
Entropy: Expected information

To compute expected information we need to know the information value of each color-event, which we computed in the previous section, and its probability. Now note for a moment how we pick the point on the board: while some random variables may be non-uniform (the events may have different probabilities), the points in the probability space are always of the same value. That is, the probability of an event is measured as the size of the part of the probability space that results in that event. Therefore, when considering the probability of a color-event we can just consider what percentage of the board is covered by that color. Let's consider each color again and note the probability of its occurrence.

D: all we know is that the point is somewhere on the right. This is half the board, so there is a 1/2 chance of this.
A: one fourth of the board is covered, so the chance is 1/4.
B: takes one eighth of the board, so the chance is 1/8.
C: again one eighth, so the chance is 1/8.
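Putting the two lists together (our completion of the computation this section sets up; the per-color bit counts and probabilities are all from the text above), the expected information, which is exactly the entropy of the color random variable, comes out as 1/2*1 + 1/4*2 + 1/8*3 + 1/8*3 = 1.75 bits:

```python
# (probability, bits preserved) for colors D, A, B, C, from the two lists above
board = [(1/2, 1), (1/4, 2), (1/8, 3), (1/8, 3)]

# Expected information = sum over events of probability * information, in bits.
print(sum(p * bits for p, bits in board))  # 1.75 bits per observation on average
```

Notice also that each color's bit count equals log_2(1/probability): one bit for the half-board D, two for the quarter-board A, three for the eighth-boards B and C. That relationship between probability and information is the pattern the entropy formula generalizes.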