Monday, July 9, 2018

Shalizi On Jaynes’ Arrow Of Time

Though early, grossly Orientalist, commentators* described Jnana Yogi Swami Cosma Rohilla Shalizi in terms similar to Descartes, it is clear from the historical record that Sw. Shalizi wrote his works in robes of saffron and kermes. As a small contribution to restoring historical reality, this paper demonstrates how the recent publication by Carnegie-Mellon of Sw. Shalizi’s deutero-canonical writings throws new light on one of the canonical Shalizi texts in the Cornell Canon.

Some historical background is necessary. It is well known that Sw. Shalizi was a controversialist in the school of Guru Josiah Gibbs, opposing the faction led by Adhyapakah Edwin Jaynes. Gershom Scholem’s wonderful essay on the politics of saints “Religious Authority & Mysticism” gives a wonderful examination of how Sw. Shalizi likely felt about his work.

Adhyapakah Jaynes proclaimed, in seeming harmony with the tradition from Laplace to Schopenhauer, that the so-called “thermodynamic functions” - heat, pressure, etc. - had a mere conventional existence. Reality - the microstate - is something much stranger.

Against this preaching, Sw. Shalizi proposes that the true tradition would affirm the existence of heat even when there was no observer present. There seems to be an error in transcription in the Carnegie-Mellon text, as sometimes the text seems to be implying that without thermodynamic functions there would be no history prior to consciousness. But evolution of the objective microstate is not a function of the subjective macrostate. Fortunately, examination of the Cornell Canon is sufficient to demonstrate that Sw. Shalizi was aware of this.

Sw. Shalizi’s objection was much more subtle. Adhyapakah Jaynes essentially denied Mahapandita Ludwig Boltzmann’s claim that large numbers of particles play a role in thermalization, arguing that subjective ignorance is sufficient. Of course, Sw. Shalizi could not stand such a rejection of tradition.

We now move to the main text, Sw. Shalizi’s controversial article “False Jnanachakra”. In it, Sw. Shalizi imagines a contest between a mighty asura Andhaka and Bhagavan Shiva. The Mahayogi bets that if Andhaka can turn the Jnanachakra then Andhaka may have a night with Parvati. Of course, Andhaka cannot resist such a carnal opportunity. The price he pays is this: if Andhaka cannot but Shiva can, then Andhaka must die and live the rest of his lives in non-violence and celibacy.

The Jnanachakra is a weightless jade wheel with a single needle along its rim which may be lowered or raised. It has no effect on the pendulum. The Jnanachakra, as its name suggests, is merely a metaphor for the player’s knowledge. Beneath the Jnanachakra is an n-ary pendulum whose fobs beyond the first are invisible. Based only on macroscopic observation, the player attempting to turn the wheel must raise or lower the needle which will be pushed by the pendulum in the correct direction.

The observable state space of a single visible fob is a cylinder with one axis, up-down, as the velocity of the fob and the other axis, around, as the angular position of the fob. The same for the Jnanachakra. Holding the current velocity of the Jnanachakra constant gives us a circle around the phase space. We want the needle down only when the fob is above the current circle in phase space.

Andhaka’s strategy is to lower the needle in front of the fob when the fob is swinging right faster than the Jnanachakra. He believes this will allow him to rotate the Jnanachakra counterclockwise. For a short time, this works. But the hidden fobs cause the visible fob to bounce randomly on a long timescale - a timescale which can be calculated from coupling of the fobs (cite pg 44 Chaos & Coarse Graining In Stat Mech). Eventually, Andhaka’s predictions will be no better than chance. Therefore Andhaka’s strategy will fail in the long run.

Now the Mahabuddhi goes to the wheel. He follows a similar strategy. At first the Jnanachakra seems to thermalize - bounce randomly. But soon the wheel begins turning counterclockwise. Why? Shiva can *learn* the motion of hidden fobs by observing the system at large. Since the state space of the system is fixed, in the long run Shiva learns the whole system perfectly. Following the teachings of Guru Shannon, Sw. Shalizi calculates the rate at which one may learn and finds it is either 0 (for Andhaka) or exponentially rapid (for Shiva).

Now Sw. Shalizi makes his attack. Nrityapriya’s dance requires only learning a finite state space and converges rapidly. Why cannot Andhaka do the same? Sw. Shalizi challenges the anti-Boltzmannian to explain without reference to the enormous size of the state space. If he cannot, then the anti-Boltzmannian has admitted that an objective property of reality - the dimension of the state space - plays the role Adhyapakah Jaynes has denied.

The case for Bhagavan Shiva and against the asura Andhaka can be made explicit. Start by recalling that the rate of learning is either zero or exponential. 
Andhaka’s capacity to learn is constant. As the dimension of the state space - the number of fobs - grows large, one soon finds that the asura cannot converge on the true state because he doesn’t have enough memory to hold the state in his mind. Therefore his learning rate is zero.
But Akshayaguna is different. His capacity to learn grows large holding the dimension of the state space constant. Therefore his learning rate is exponential.

There is no denying that Sw. Shalizi’s logic is absolutely sound. But before the publication of Three Toed Sloth, the full extent of his argument couldn’t be appreciated. Sw. Shalizi's implicit claim is that subjective arrows of time must have a consistent direction only if the state space of possibilities is much larger than the capacity of any relevant learner. But Sw. Shalizi’s Cornell Canon piece doesn’t explicitly argue why the learning capacity of a mere asura is insufficient.

Only now are we learning that Sw. Shalizi intended this work as part of a campaign of Arhat Ashby against the Jaynesians. Sw. Shalizi intends to use Ashby’s Law Of Requisite Variety, showing that a model of a system must be as complex as the system itself. This demonstrates that nobody but one who is unified with Brahman may have a backwards arrow of subjective time.

In sum, Carnegie-Mellon’s publication of Shalizi’s deutero-canonical work has opened new fields for scholarship. Already his brief note has been revealed to be more than a mere grumbling of a lover of controversy. His is a philosophical distinction between the monotonic subjective arrow of time of an ordinary being and the free arrow of time of an extraordinary one. All philologists of this canon  must take note.
*In case it isn't clear: this is written as a parody of Western commentary on Eastern religion - thus all the Sanskrit jargon despite none of the figures speaking the language. Why? Well, it seemed funny at the time. Of course, this blogpost shouldn't be considered scientific, much less serious theology.

Saturday, March 17, 2018

Two Dogmas Of Bayesianism

W V O Quine

There are different levels of disagreeing with a theory. To illustrate, we can imagine ourselves the central planner dispensing funds for researchers. The lowest level of disagreement is outright rejection - if a plan for a spaceship begins with "Assume we can violate the laws of thermodynamics..." I would simply not disburse any funds. Above this, there is non-fundamentalness. If I see a numerical simulation for a spaceship engine which I know doesn't respect energy conservation exactly, I might not dismiss it out of hand but would pay for investigation into whether the simulation flaws are fundamental. Off to the side, above rejection and below non-fundamentalness but not between either, there is suspicion. To speak ostensively: WVO Quine's attitude toward Carnap's dogmas was suspicion - he neither thought them to be wastes of paper nor without flaw. The notion of "analyticity" - words being true by definition - seemed both worth investigating and a bad foundation for the reconstruction of scientific investigation.

As Quine was suspicious of any empiricism that took analyticity as atomic, I am suspicious of the philosophies that in the main call themselves "Bayesianism". What exactly Bayesianism consists of and when it began is not easy to say because self-described Bayesians do not all agree with one another.

The followers of de Finetti and Savage would trace Bayesianism to the logicians Ramsey and Wittgenstein. And indeed, whoever you think deserves credit as the origin of Bayesian philosophy, one must grant that Ramsey & Wittgenstein gave unusually clear statements of Bayesian philosophy. In Wittgenstein, the intuitions of Bayesian philosophy were expounded in the set of propositions 5.1* of his Tractatus. Ramsey gave these notions a more explicit development in his paper Truth And Probability. The particular style of the arguments on pages 19 - 23 of this paper has become known as "Dutch Book Arguments".

The general concept is simple. We start with a Wittgensteinian metaphysics: each possible state of the world corresponds to some proposition p. The propositions can be "arranged in a series" - there is an ordering Rpq such that there is an isomorphism between the propositions and the real numbers that respects the probability calculus. This ordering is something like "q is at least as good as p". Further, Ramsey realized, the relation R comes pretty near to an intuitive idea of rational behavior. And even more interesting is that the converse also holds (or nearly holds): "If anyone's mental condition violated these laws ... [he] could have a book made against him by a cunning bett[o]r and would then stand to lose in any event". This philosophical interpretation of two way equivalence between orderings on propositions and probability calculus is what I call the first dogma of Bayesianism.

Many other Bayesians - such as the followers of Jaynes - would trace it to the great physicists Laplace and Gibbs. The Bayesianism of early physicists is, granting that it exists at all, implicit and by example. Gibbs asked us to imagine a large collection of experiments floating in idea space. We should expect* that our actual experiment could be any one of those experiments, the great mass of which are functionally identical. Consider a classical ideal gas held in a stiff container. In the ensemble of possible experiments, there are a few where the gas is entirely in the lower half of the bottle. But in the great mass of possible experiments the gas has had time to spread through the bottle. Therefore, for the great mass of possibilities, the volume the gas occupies would be the volume of the bottle. This means that if we measure the temperature, we get the pressure for free. Using Iverson Brackets, we see that for each possible temperature t and pressure p, Prob(P=p|T=t) = [p =(nR/V)t] or thereabouts. The information we get out of measuring the temperature - that is, putting a probability distribution on temperature - is a probability distribution on pressure. The lesson the Gibbs example teaches ostensively is - supposedly - that what we want out of an experiment is a "posterior probability". This is the second dogma of Bayesianism.
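To make the Iverson bracket concrete, here is a minimal Python sketch of the ideal gas example. The function names, the default of one mole in one cubic metre and the toy temperature distribution are all assumptions made for illustration; nothing here comes from Gibbs.

```python
import math

R = 8.314  # gas constant in J/(mol K)


def iverson(condition):
    """Iverson bracket: 1 if the condition holds, 0 otherwise."""
    return 1 if condition else 0


def prob_pressure_given_temperature(p, t, n=1.0, volume=1.0):
    """Prob(P=p | T=t) = [p = (nR/V) t], compared up to float rounding."""
    predicted = n * R * t / volume
    return iverson(math.isclose(p, predicted, rel_tol=1e-9))


def pressure_distribution(temperature_dist, n=1.0, volume=1.0):
    """Push a distribution on temperature forward to one on pressure."""
    out = {}
    for t, prob in temperature_dist.items():
        p = n * R * t / volume
        out[p] = out.get(p, 0.0) + prob
    return out


# Hypothetical belief about the temperature (in kelvin) after a measurement.
belief = {300.0: 0.5, 310.0: 0.5}
print(pressure_distribution(belief))
```

The point of the sketch is only that the conditional probability is degenerate: once the temperature is pinned down, the bracket leaves exactly one pressure with probability one.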


Both dogmas are open to question. Let us question them.


It is almost too easy to pick on the Dutch Book concept. Ramsey himself expresses a great deal of skepticism about the general argument - "I have not worked out the mathematical logic of this in detail, because this would, I think, be rather like working out to seven places of decimals a result only valid to two.". Ramsey even gives a cogent criticism of the application of Dutch Book Argument (one foolishly tossed aside as ignorable by Nozick in The Nature Of Rationality):

"The old-established way of measuring a person's belief is to propose a bet, and see what are the lowest odds which he will accept. This method I regard as fundamentally sound; but it suffers from being insufficiently general, and from being necessarily inexact. It is inexact partly because of the diminishing marginal utility of money, partly because the person may have a special eagerness or reluctance to bet, because he either enjoys or dislikes excitement or for any other reason, e.g. to make a book."

The assumption "Value Is Linear Over Money" implies the Ramsey - von Neumann - Morgenstern axioms easily, but it is also a false empirical proposition. Defining a behavioristic theory in terms of behavior towards non-existent objects is the height of folly. Money doesn't stop being money because you are using it in a Bayesian probability example.

This brings us to the deeper problem of rationality in non-monetary societies. Rationality was supposed to reduce the probability calculus to something more basic. But now it seems to imply rationality was invented with coinage. Did rationality change when we (who is this 'we'?) went off the gold standard? These are absurd implications but phrasing a theory of rationality in terms of money seems to imply them. What about Dawkins' "Selfish Gene", isn't its rational pursuit of "self-interest" (in abstract terms) one of the deep facts about it? Surely this is a theory that might be right or wrong, not a theory that is a priori wrong, and certainly not one that is a priori wrong because genes don't care about money.

Related to this is that the Dutch Book hypothesis seems to accept that people are "irrationally irrational" - they take their posted odds far too literally. As phrased in the Stanford Encyclopedia Of Philosophy article on Dutch Books:

"An incoherent agent might not be confronted by a clever bookie who could, or would, take advantage of her, perhaps because she can take effective measures to avoid such folk. Even if so confronted, the agent can always prevent a sure loss by simply refusing to bet."

This argument is clarified by thinking in evolutionary terms. Let there be three kinds of birds: blue jays, cuckoos and dodos. Further assume that we've solved the problem from two paragraphs ago and have an understanding of what it means for an animal to bet. Dodos are trusting and give 110%. Dodos will offer and accept some bets that don't conform to the probability calculus (and fair bets). Cuckoos - as is well known - are underhanded and will cheat a dodo if they can. They will offer but never accept bets that don't conform to the probability calculus (and accept fair bets). Blue jays are rational. They accept and offer only fair bets. Assuming spatially mixed populations there are seven distinct cases: Dodos only, Blue Jays only, Cuckoos only, Dodo/Blue Jay, Dodo/Cuckoo, Cuckoo/Blue Jay and Dodo/Cuckoo/Blue Jay. Contrary to naive Bayesian theory, each of the pure cases is stable. Dodo-dodo interaction is a wash, every unfair bet lost is an unfair bet won by a dodo. Pure blue jay and pure cuckoo interactions are identical. Dodo/blue jay interactions may come out favorably to the dodo but never to the blue jay, the dodos can do favors for each other but the blue jays can neither offer nor accept favors from dodos or blue jays. The dodo is weakly evolutionarily dominant over the blue jay. Dodo/cuckoo interaction can only work out in the cuckoo's favor, but contrariwise a dodo can do a favor for a dodo but a cuckoo cannot offer a favor to a cuckoo (though it can accept such an offer). We'll call this a wash. Cuckoo/blue jay interaction is identical to cuckoo/cuckoo and blue jay/blue jay interaction, so neither can drive the other out. Finally, a total mix is unstable - dodos can drive out blue jays. The strong Bayesian blue jay is on the bottom. Introducing space (a la Skyrms) makes the problem even more interesting. One can easily imagine an inner ring of dodos pushing an ever expanding ring of blue jays shielding them from the cuckoos.
There are plenty of senses of the word "stable" under which such a ring system is stable.
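The count of seven population mixes is just the number of non-empty subsets of three species. A small Python sketch that lists them; the tuple ordering and the function name are my own choices, not part of the argument:

```python
from itertools import combinations

SPECIES = ("Dodo", "Cuckoo", "Blue Jay")


def population_mixes(species=SPECIES):
    """Return every non-empty combination of species, smallest first."""
    mixes = []
    for size in range(1, len(species) + 1):
        mixes.extend(combinations(species, size))
    return mixes


for mix in population_mixes():
    print("/".join(mix))
```

Three pure cases, three pairs and one total mix: 2^3 - 1 = 7.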

What does the above analysis tell us? We called Dodo/Cuckoo interactions a wash, but it really depends on the set of irrational offers Dodos and Cuckoos accept & reject/offer. As Ramsey says, for a young male Dodo "choice ... depend[s] on the precise form in which the options were offered him...". Ramsey finds this "absurd.". But to avoid this by assuming that we are living in a world which is functionally only blue jays seems an unforced restriction.


The logic underlying Bayesian theory itself is impoverished. It is a propositional logic lacking even monadic predicates (I can point to Jaynes as an example of Bayesian theory not going beyond propositional logic). This makes Bayesian logic less expressive than Aristotelian logic! That's no good!

Such a primitive logic has a difficult time with sentences which are "infinite" - have countably many terms - or even just have infinite representations. (An example infinite representation of the rational number 1/4 is a lazily evaluated list that spits out [0, ., 2, 5, 0, 0, ...]) Many Bayesians, such as Savage and de Finetti, are suspicious of infinite combinations of propositions for more-or-less the same reasons Wittgenstein was. Others, such as Jaynes and Jeffery, are confident that infinite combinations are allowed because to suppose otherwise would make every outcome depend on how one represented it. Even if one accepts infinite combinations, there also is the (related?) problem of sentences which would take infinitely many observations to justify.


An example of a simple finite sentence that takes infinitely many experiments to test is "A particular agent is Bayesian rational.". This is a monadic sentence that isn't in propositional logic. Unless one tests every possible combination of logical atoms, there's no way of telling the next one is out of place. The non-"Bayes observability" of Bayes rationality has inspired an enormous amount of commentary from people like Quine and Donald Davidson who accept the Wittgensteinian logical behaviorism that Bayesianism is founded upon.


So, is this literature confused or getting at something deeper? I don't know. But I believe that one can now see the reason Dutch Book arguments are hard to knock down isn't because there aren't plenty of criticisms. The real reason is the criticisms don't seem to go to the core of the theory. A criticism of Dutch Book needs to be like a statistical mechanics criticism of thermodynamics - it must explain both what the Dutch Book gets as well as what it misses. There does seem to be something there which will survive.

What You Want Is The Posterior

Okay okay okay, be that as it may, isn't it the case that what we want is the posterior odds? If D is some description of the world and E is some evidence we know to be actual, then we want P(D|E), right? Well, hold your horses. I say there's the probability of a description given evidence and probability of the truth of a theory and never the twain shall meet. To see it, let's meet our old uncle Noam.

Chomsky was talking about other people. Who he thought he was talking about isn't important, but he was really talking about Andrey Markov and Claude Shannon. Andrey Markov developed a crude mathematical description of poetry. He could write a machine with two states "print a vowel" and "print a consonant". He could give a probability for transferring between those states - given that you just printed a vowel/consonant, what is the probability you will print a consonant/vowel? The result will be nonsense, but look astonishingly like Russian. You can get more and more Russian like words just by increasing the state space. Our friend Claude Shannon comes in and tells us "Look, in each language there are finitely many words. Otherwise, language would be unlearnable. Grammar is just a grouping of words - nouns, verbs, adjectives, adverbs. By grouping the words and drawing connections between them in the right way we can get astonishingly English looking sentences pretty quickly.".
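The two-state machine is easy to sketch in Python. Everything specific below is an assumption for illustration: I use the Latin alphabet rather than Russian, the transition probabilities are made up rather than estimated from any text, and letters within a state are drawn uniformly.

```python
import random

VOWELS = "aeiou"
CONSONANTS = "bcdfghjklmnpqrstvwxyz"

# Hypothetical transition probabilities between the two states.
TRANSITIONS = {
    "vowel": {"vowel": 0.15, "consonant": 0.85},
    "consonant": {"vowel": 0.65, "consonant": 0.35},
}


def generate(length, seed=0, start="consonant"):
    """Emit `length` letters from the two-state vowel/consonant chain."""
    rng = random.Random(seed)  # seeded so a run can be repeated
    state = start
    letters = []
    for _ in range(length):
        alphabet = VOWELS if state == "vowel" else CONSONANTS
        letters.append(rng.choice(alphabet))
        next_states = list(TRANSITIONS[state])
        weights = [TRANSITIONS[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights)[0]
    return "".join(letters)


print(generate(40))
```

Enlarging the state space, as the paragraph above describes, amounts to replacing the two keys of `TRANSITIONS` with a longer list of states (pairs of letters, word classes and so on) while keeping the same loop.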

Now our "friend" Chomsky. He says "Look, you have English looking strings. They're syntactically correct. But you don't have English! The strings are not semantically constrained. There is no way to write a non-trivial machine that both passes Shakespeare and fails a nonsense sentence like 'Colorless green ideas sleep furiously.'! In order to know that sentence fails the machine needs to know ideas can't sleep, ideas can't be green, green things can't be colorless and one cannot sleep in a furious manner.".

Now, this is a scientific debate about models for language. Can it be put in a Bayesian manner? It cannot. The Markov-Shannon machine - by construction - matches the probabilistic behavior of the strings of the language. But it cannot be completely correct (otherwise, the set of regular expressions would be the Turing complete languages). Therefore, the statement that "What we want is the posterior odds!" cannot be entirely true.

I have said before that I am in the "Let Ten Thousand Flowers Bloom" school of probability. I think these arguments are interesting and worth considering. But it is also clear that despite books like Nozick's The Nature Of Rationality and Jaynes' similar tome, Bayesian philosophy does not replace the complicated mysteries of probability theories with simple clarities.

There is a further criticism that applying behavioristic analysis to research papers is unwise but I will make that another day.

* Both Ramsey's paper and Gibbs' book assume that the expectation operator exists for every relevant probability distribution. This is a minor flaw that can be removed without difficulty, so I will not mention it again.

Friday, January 12, 2018

Absolute And Comparative Advantage



Adam Smith

Adam Smith imagined two firms with a choice of production of two products*. The two firms' entrepreneurs are Alice and Bob. Alice and Bob have the choice of producing two independent goods - maybe apples and blackboards. The firms have what has become known as "constant technology", the ratio of their outputs and their inputs is a constant independent of said quantities (different for each firm and product to avoid indeterminacy). Unemployment is ignored in both inputs and outputs - all inputs are bought and all output sold (Say's Law). The law of one price holds.

Each entrepreneur is then left with only one decision: how much of each good shall I make?

Leonid Kantorovich


The fastest way to this answer is through the theory of linear programming. Start by denoting \( L_{firm,product}\) the amount of labor Alice or Bob uses to make apples or blackboards. If \(L \) is the amount of labor available in society, then we have as a constraint

\[ L_{Alice,apples} + L_{Bob,apples} + L_{Alice,blackboards} + L_{Bob,blackboards} = L \]

The outputs are denoted \( Y_{firm,product} \) and the prices are \(p_{product}\) so that the total output is

 \(Y = p_{apples} (Y_{Alice,apples}+Y_{Bob,apples})+p_{blackboards} (Y_{Alice,blackboards}+Y_{Bob,blackboards}) \).

Finally, the technical coefficients are \( a_{firm,product} = Y_{firm,product} / L_{firm,product} \) . Our goal then is to maximize the above equation. We know from LP theory* that only the vertices matter. Why? Well, the short answer is interior point optimization. Look at this picture:


The dark lines are the constraints and the darkened point is a guess at the optimum. The line through that point has a slope equal to the price ratio. It's obvious that we get more output by moving to a guess in the direction of the arrows that is still feasible. The triangle made by the price ratio and the constraints (and its interior) are the "interior points". If an interior point set is one point, it must be the optimum. It's obvious that unless the price ratio is exactly the same slope as one of the boundaries, the only way to squeeze the interior point set down to one point is to choose one of the vertices.

Okay, so let's cycle through those vertices. The solution where no labor is used is the global minimum (the entire feasible set is the interior point set!). This also doesn't match the full employment constraint, so toss it.

Next, there's the class where only one firm is buying labor to produce output (so that three \( L_{i,j}=0\) and one \( L_{i,j}=L\)). We'll call this the \(Y_{one}\) solution as in "Why one?". Mathematically, it is written

\[ Y_{one} = \max_{i,j}(p_j a_{i,j}L) \]
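As a rough check on the formula, here is a Python sketch that evaluates \(p_j a_{i,j} L\) for each firm/product pair and picks the largest. The prices, technical coefficients and labor supply are hypothetical numbers chosen only to make the arithmetic easy.

```python
# Hypothetical prices p_j, technical coefficients a_{i,j} and labor supply L.
PRICES = {"apples": 2.0, "blackboards": 5.0}
COEFFICIENTS = {
    ("Alice", "apples"): 3.0,
    ("Alice", "blackboards"): 1.0,
    ("Bob", "apples"): 1.0,
    ("Bob", "blackboards"): 0.5,
}
LABOR = 10.0


def y_one(prices, coefficients, labor):
    """Return (output value, firm, product) for the best single use of labor."""
    def value(pair):
        firm, product = pair
        return prices[product] * coefficients[pair] * labor

    firm, product = max(coefficients, key=value)
    return value((firm, product)), firm, product


print(y_one(PRICES, COEFFICIENTS, LABOR))
```

With these made-up numbers Alice growing apples wins, at \(2 \times 3 \times 10 = 60\) units of value.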

The next class of solutions is the instructive one. In these vertices, the non-negativity constraint is active on two possibilities - that is to say: labor is being hired for two reasons. But there's a problem. If both entrepreneurs are making apples, then \(Y = p_{apples} (a_{Alice,apples} L_{Alice,apples} + a_{Bob,apples}L_{Bob,apples}) \). But if Alice is more efficient, then this can't be a maximum, because moving some labor from Bob's firm to Alice would increase output. This knocks out the two competitive vertices and leaves the four non-competitive vertices. Either one firm allocates labor to both products (and the other is dead) or two firms specialize in one product.

Read the above paragraph again, slowly. It's important to understand the next part. If you look at the vertices with three or all four labor hiring reasons, then the accounting for \( Y \) always has at least one term like the above. For instance, one of the three term vertices is \(Y = p_{apples} (a_{Alice,apples} L_{Alice,apples} + a_{Bob,apples}L_{Bob,apples}) + p_{blackboards}a_{Bob,blackboards}L_{Bob,blackboards} \) . But this can't be a maximum, because of the above paragraph - if Alice is more efficient in apple growing we can get more output by moving some labor from Bob's firm to hers.

There are therefore only three possible classes of extreme vertices: Alice or Bob make one thing, Alice or Bob make everything and finally Alice and Bob specialize.

David Ricardo

Phew! The great economist David Ricardo was uncomfortable with Smith's story. The vertices where one firm did everything and the other was non-existent troubled him.

Why would a non-existent firm trouble someone? Well, Ricardo was exploring an analogy between firms and nations. Nations have different technical coefficients - nobody needs to explain why California produces more wine than Utah or why Kazakhstan produces more uranium than Italy. But the idea that a country could "dominate" and produce everything was very troubling - not to mention that it seemed contrary to the facts. There was a political economy problem as well. Absolute advantage seems to suggest that global and local output could be at cross purposes - global output might be maximized at the minimum of local output.

Ricardo "solved" this problem by ... restricting free trade! Yes, the classic argument for free trade assumes trade restrictions. Ricardo supposed that labor and capital couldn't move over national borders, but that consumption goods can.

John von Neumann

In the language of Linear Programming, Ricardo breaks the one labor constraint in the above problem into two:

\[ L_{Alice,apples} + L_{Alice,blackboards} = L_{Alice} \]

\[ L_{Bob,apples} +L_{Bob,blackboards} = L_{Bob}\]


Now the "one firm produces everything" equilibria are knocked out - they don't satisfy the above constraints. How they will specialize depends on the price ratio and technical coefficients, but they must always produce.
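Once each firm has its own fixed labor supply, the linear program splits into one small problem per firm: at given prices, each firm puts all of its labor into whichever product has the larger \(p_j a_{i,j}\). A Python sketch of that rule, with hypothetical prices, coefficients and labor supplies:

```python
# Hypothetical data: Alice is absolutely better at both goods.
PRICES = {"apples": 2.0, "blackboards": 5.0}
COEFFICIENTS = {
    ("Alice", "apples"): 3.0,
    ("Alice", "blackboards"): 1.0,
    ("Bob", "apples"): 1.0,
    ("Bob", "blackboards"): 0.5,
}
LABOR_BY_FIRM = {"Alice": 6.0, "Bob": 4.0}


def ricardo_plan(prices, coefficients, labor_by_firm):
    """Map each firm to (chosen product, value of its output)."""
    plan = {}
    for firm, labor in labor_by_firm.items():
        # Value produced per unit of this firm's labor in each product.
        product = max(prices, key=lambda j: prices[j] * coefficients[(firm, j)])
        value = prices[product] * coefficients[(firm, product)] * labor
        plan[firm] = (product, value)
    return plan


print(ricardo_plan(PRICES, COEFFICIENTS, LABOR_BY_FIRM))
```

In this made-up example Alice has the higher technical coefficient in both goods, yet Bob still produces (blackboards), because his labor cannot move to Alice's firm. At other price ratios both firms could pick the same product, which is the sense in which the pattern of specialization depends on prices.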




Comparative advantage is often thought of as a "long run" view - this is the supposed justification for the assumption of full employment and full consumption. But if capital and labor can flow over borders, it is a short run view that ignores unemployment and underconsumption.


Or is there a better interpretation of how comparative advantage works?

* The Adam Smith of my imagination.

Thursday, December 28, 2017

Who Cares About Ergodic Systems?

This is a quick teaching post. This stuff is high school level, but to make it formal can push it beyond the research level into high level philosophy.

 
The creator of Dr Pepper, Dr Alderton
 
First, some physical intuition. I pour some Dr Pepper into a cup. What shape does the fluid take on? There are really three fluids - Dr Pepper (largely water, basically incompressible), carbon dioxide (which takes the shape of tiny, interacting bubbles) and ambient air. There are many forces - solid resistance, buoyancy force, skin and interaction forces on the bubbles, gravity and (since the Dr Pepper and carbon dioxide are colder than the ambient temperature) thermal forces. The short run question of what happens to the fluids is complicated and depends on many tiny factors.

Despite this, the long run solution is easy - basic fluid statics tells us that the Dr Pepper will take the form of the cup and basic thermostatics tells us it will be the same temperature as the ambient atmosphere.

 
John von Neumann

How do we capture this intuition that - roughly - in the short run history matters but in the long run only structure matters? For many years, physicists and mathematicians have turned to Ergodic Theory to answer this question. Ergodic theory doesn't exactly have a great reputation.

Many people - including high powered top level experts - think that not only does ergodic theory require the formal manipulation skills of a von Neumann, the geometric insight of a Clerk Maxwell and the engineering experience of a Shannon - it doesn't even solve the problem.

But really ergodic theory is very simple - except for all the parts that are hard. Shannon's paper can be polished off in a couple days, and (with all due respect to Joe Doob) it's not clear that there is more to the theory than that.

You don't want to take a few days. Well, here's the few minutes version.
 

A connected and a disconnected network

We start with the intuitive idea of a network. We call the nodes the states. There are finitely many states and they each have a name. From a given state, there is a rule to transfer to one of the other states to which that node is connected. The rule and the network together are called the system. In theory the rule can be anything, for instance it might be "always go as far down as possible" where down is defined geometrically or topologically. The rules can be probabilistic.

A system is called "ergodic" if the long run amount of time spent at each node is independent of which node you start at. The idea of state gives us the short run detail dependence and the ergodicity gives us long run structure dependence.

For my deliberately dumb "go as far down as you can" rule on the above connected network, I have six possible runs

Pink, Purple, Purple, Purple...
Brown, Purple, Purple, Purple...
Blue, Black,  Purple, Purple, Purple...
Orange, Purple, Purple, Purple...
Black, Purple, Purple, Purple...
Purple, Purple, Purple...

No matter where I start, the long run relative frequency \(f_{purple} = 1\) and all others are \( 0 \). Therefore, this dumb system is ergodic. If we try the same thing on the disconnected network:

Black, Brown, Pink, Pink, Pink...
Brown, Pink, Pink, Pink...
Purple, Orange, Orange, Orange...
Blue, Orange, Orange, Orange...
Orange, Orange, Orange...

For the first two starting places the relative frequency of pink goes to one, for the other three, the relative frequency of orange goes to one. This dumb system is non-ergodic. But notice it is two ergodic pieces. In general, a non-ergodic system can be severed into ergodic components (in this case, the two connected subnetworks).
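The two sets of runs can be checked mechanically. In the Python sketch below each network is encoded as a "next state" table read off the runs listed above (the pictures themselves are not used, and the Pink to Pink step on the disconnected network is inferred from the runs that end in Pink). For a deterministic rule the long run frequencies are uniform over whatever cycle the run falls into.

```python
# "Go as far down as you can", read off the listed runs.
CONNECTED = {
    "Pink": "Purple", "Brown": "Purple", "Blue": "Black",
    "Orange": "Purple", "Black": "Purple", "Purple": "Purple",
}
DISCONNECTED = {
    "Black": "Brown", "Brown": "Pink", "Pink": "Pink",
    "Purple": "Orange", "Blue": "Orange", "Orange": "Orange",
}


def long_run_frequencies(step, start):
    """Follow the rule until a state repeats; return frequencies on that cycle."""
    seen = []
    state = start
    while state not in seen:
        seen.append(state)
        state = step[state]
    cycle = seen[seen.index(state):]
    return {s: 1 / len(cycle) for s in cycle}


def is_ergodic(step):
    """Ergodic here means: same long run frequencies from every start."""
    limits = [long_run_frequencies(step, s) for s in step]
    return all(limit == limits[0] for limit in limits)


print(is_ergodic(CONNECTED), is_ergodic(DISCONNECTED))
```

Collecting the distinct answers of `long_run_frequencies` over all starting states gives one entry per ergodic component, which is a quick way to see the "two ergodic pieces".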

The underlying network being connected isn't in general sufficient for being ergodic. On the above left graph, sever the Orange-Purple connection and follow the "go down" rule (question to check if you understand: what are the two ergodic subsystems?). It turns out* that the kinds of rules that are of physical interest are usually of the form "Given that I am on state N, I go down each connection NM with a certain probability \( p_{NM}\neq 0 \)". For such a rule, being connected is sufficient for ergodicity**. So in this informal blog post I'll choose rules and networks such that connectedness and ergodicity are equivalent.
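For a probabilistic rule, "long run amount of time spent at each node" can be computed as the time average of the occupation probabilities. The Python sketch below does this for a hypothetical three-node line network N - M - K in which every connection is taken with nonzero probability; the node names, the probabilities and the step count are assumptions for illustration.

```python
# Hypothetical connected network N - M - K with nonzero probabilities p_NM.
WALK = {
    "N": {"M": 1.0},
    "M": {"N": 0.5, "K": 0.5},
    "K": {"M": 1.0},
}


def time_averaged_occupation(transition, start, steps=2000):
    """Average, over `steps` ticks, the probability of sitting at each node."""
    states = list(transition)
    dist = {s: 0.0 for s in states}
    dist[start] = 1.0
    totals = {s: 0.0 for s in states}
    for _ in range(steps):
        for s in states:
            totals[s] += dist[s]
        # Advance the distribution by one tick of the rule.
        new_dist = {s: 0.0 for s in states}
        for s, mass in dist.items():
            for target, prob in transition[s].items():
                new_dist[target] += mass * prob
        dist = new_dist
    return {s: totals[s] / steps for s in states}


for start in WALK:
    print(start, time_averaged_occupation(WALK, start))
```

Whichever node the walk starts from, the averages settle near a quarter, a half and a quarter, which is the start-independence that the definition of ergodicity asks for.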

The ergodic distribution tells us the long run behavior of the system, but it also teaches us about the medium run behavior. We know that if the frequency at a state is "too low" (compared to the ergodic frequency), then we will see a flow into that state. This is more or less a definition of what a flow is!


This is all well and good, but what does it have to do with physics? A continuous system is ergodic if one can cut up the possible states of the system into a discrete ergodic system. Let's make a pair of networks out of a physical model - a billiards model. I mentally divide a square billiard table into four regions A, B, C and D.


Being in a region isn't sufficient to fix the dynamics - I need to know the velocities. I think that it's obvious that velocity digitizes into four chunks based on the number of regions away from the starting region you end up in after a time step. So the states are really:

A0, A1, A2, A3
B0, B1, B2, B3
C0, C1, C2, C3
D0, D1, D2, D3

Each 0 connects only with itself (remember, the billiard isn't necessarily staying still, it could be through all four blocks in one tick). There's a cycle A1 connects to B1 connects to C1 connects to D3 connects to C3 connects to B3 connects to A1. There are three cycles of length two, A2 connects to C2 connects to A2, B2 connects to D2 connects to B2 and A3 connects to D1 connects to A3. This is illustrated below



This particular digitization of the underlying continuous system isn't ergodic. If you start off with A0, then \( f_{A0} =1 \), if you start off with a non-zero velocity state, then \( f_{A0} =0 \). That's enough to show that this isn't an ergodic system.
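The same bookkeeping works for this digitization. The sketch below transcribes the connections exactly as described above, assuming each "connects to" is one deterministic step per tick (that reading, and the function names, are mine). Every state then has exactly one successor and one predecessor, so the ergodic pieces are just the closed loops.

```python
from fractions import Fraction

# Successor map transcribed from the connections described above.
SUCCESSOR = {
    "A0": "A0", "B0": "B0", "C0": "C0", "D0": "D0",
    "A1": "B1", "B1": "C1", "C1": "D3", "D3": "C3", "C3": "B3", "B3": "A1",
    "A2": "C2", "C2": "A2",
    "B2": "D2", "D2": "B2",
    "A3": "D1", "D1": "A3",
}

def orbit(successor, start):
    """States visited from `start` before returning to it (every state here lies on a loop)."""
    states = [start]
    state = successor[start]
    while state != start:
        states.append(state)
        state = successor[state]
    return states

def components(successor):
    """Split the states into their closed loops, i.e. the ergodic pieces."""
    seen, found = set(), []
    for s in successor:
        if s not in seen:
            loop = orbit(successor, s)
            seen.update(loop)
            found.append(loop)
    return found

def frequency(successor, start, target):
    """Long run relative frequency of `target` when starting from `start`."""
    loop = orbit(successor, start)
    return Fraction(loop.count(target), len(loop))

print(len(components(SUCCESSOR)))
print(frequency(SUCCESSOR, "A0", "A0"), frequency(SUCCESSOR, "A1", "A0"))
```

The frequency of A0 depends on where you start, and the graph falls apart into several separate loops, which is another way of saying this digitization isn't ergodic.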

It turns out that there is no nontrivial digitization of this system that is ergodic. That's because this system is exactly solvable... and I won't tell you why that's connected to ergodicity***.


Let's put a circular block in the middle of the square. Now the graph isn't disconnected. A particle that started at four blocks per tick can have its angle of attack changed by hitting the circular block, so that it now moves at 3 blocks per tick (that is, it may be turned around because 3=-1). I don't know if this particular graph is really ergodic and I'm not going to check. In a tour de force, Yakov Sinai proved that this system has an ergodic digitization. This shows that the system is itself ergodic.

That means if the particle isn't in, say, C enough (compared to the ergodic distribution) we will see a flow towards C, just as in the discrete case. This is how ergodicity connects to physical quantities.

Finally: wasn't that Black Thought freestyle great?

*By the magic of symbolic dynamics
** By the magic of Markov Chains
*** I'll let wikipedia do it

Thursday, December 21, 2017

What We Talk About When We Talk About Food

The vast majority of Discourse about health is lies. Go to a supermarket and look at the "health shakes". Beyond the outright lies that are the overwhelmingly greater part there are the tentative part-truths - usually represented as absolute certainties. Supposedly there could logically be definite truths. I have never seen them.

How do we talk about food? In episode 2 of Frasier, Frasier Crane discusses his preferred breakfast with his father. A "low fat, high fiber" breakfast with (terribly expensive) plain black coffee. There's a lot to be said about the implicit social views of eating in this scene. But instead imagine this: out of the woodwork I wander onto the show and say "Actually Dr Crane, you would be better off substituting that bran muffin for bacon."

What does that mean? A calorie neutral substitution? A mass neutral substitution? A subjective substitution? The second is not a joke - it's clear (is it?) that a person who is fed via IV would "feel hungry" and attempt to eat whether the IV was sugar or oil. In the language of economics such a person would hit their first order conditions for diet optimality (i.e. calories would be right) but not their second order conditions. They're at a utility minimum. This cannot last. Believing in diet advice unstable to perturbations is unscientific - completely methodologically unsound. The third option is not necessarily unscientific - subjective feelings of fullness are related to the health relevant properties of food. Attempting to lose weight via a simplistic, objective calorie/mass accounting system may again put you at an unstable equilibrium. These kinds of yo-yo unstable diets aren't obviously healthy.

The market for food is broken, possibly irrevocably broken. There has never been, in all the history of mankind, a society where farm labor is economically valued. As a result, all industrial societies prop up production - often in highly distortionary ways. It is obvious that, for instance, the US overproduces corn carbohydrates. This is a Bad Thing.

On the consumer side, it must be admitted by any person who desires to be taken seriously that branding and monopolistic competition more generally are real. Government intervention has been ineffective at policing this, even when it has been pointed in the right direction. This is unsurprising - Coase on the right and Stiglitz on the left are always fond of pointing out that the conditions in which governments/markets can fail are the exact conditions in which markets/governments can fail.

This brings us back to the first point. How do we talk about food? There is an enormous signaling problem here. Just as every prospective worker has an incentive to appear to be valuable to a prospective employer, so every prospective meal has an incentive to appear to be healthy to a prospective eater. (Healthiness & productivity of course defined variationally) Eaters therefore statistically discriminate, choosing foods with a few easily observed outward signs of "healthiness". Sugary cereals put photos of nutritious meals on the box. Having blueberries and a green container turns a malted into a diet food. More informative statistics - carbohydrate and protein and fat measures, total calories, ingredients, etc. - are buried in confusing, neutrally colored, small type statistical abstracts.

Adding exercise to a given diet is generally good, since exercise to a large extent determines the distribution of variable masses for a person of a given mass. We may not be indifferent between being Akebono and Bob Sapp. As Kimball notes above & every bodybuilder knows - total weight balances the mass of food that comes in and goes out. (Also, cardio health is good, even though the mass of the cardio system isn't particularly variable.) But there are huge difficulties here. First, it isn't methodologically sound to assume a person can vary exercise and not vary their diet (it assumes that their old equilibrium was unstable, which is exactly what it is not for an obese person who can't lose weight). Second, the market for exercise is not obviously healthy. As with food, every prospective exercise plan has an incentive to seem healthy & sustainable even if it is not. Survivor bias is endemic here - everyone who keeps up My Super Special Program long enough achieves their weight & weight distribution targets and everyone who doesn't leaves.

You can't sit around thinking about your diet all day. Simplistic accounting techniques can beat advanced techniques just because they're easier to understand. It may not be easy to account for carbohydrate, protein or fat intake simply because there are so many kinds of carbohydrates, proteins and fats that eating the "wrong" kind may not give the eater noticeable feedback. Fasting can then empirically outperform theoretically superior modes because it's easy to notice when you cheat.

These are only the simplest economic metaphors. Beyond this there are cultural and even political factors. But that'll have to wait for another post.

Saturday, December 16, 2017

Nozick On ... Inequality?

Robert Nozick was a Harvard philosopher, a political philosopher among other things. He was an odd duck with an interesting sense of humor - speculating that autofellatio played a role in classical Hindu yoga was typical of his crass jokes. But he was very serious about one thing - Nozick sincerely believed in a philosophical theory of social desert - that one should be allowed to own all and only the goods to which one is entitled. Nozick traced this theory to the Lockean theory of production & distribution. Locke believed that one owned those goods which one mixed with one's labour (if you were European anyway).

Nozick brought his beliefs to so-called "libertarian" ends. If you are entitled to those goods to the extent which you mixed your labor into them, then it is not clear that you are entitled to any public goods at all. Nozick had an argument - the Utility Monster argument - that utilitarianism (the philosophical position that public policy should aim at some measure of aggregate happiness) could not be a priori true. Consider a society consisting of two consumer classes, one with decreasing returns to consumption (normal people) and one with increasing returns to consumption (utility monsters). Holding a nation's output constant, the utilitarian political advisor says it is always worth it to tax the normal people and subsidize the utility monsters, which seems unjust a priori. Nozick says, in short, that utilitarianism is false because it can excuse income inequality.
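To see the shape of the argument, here is a toy version. The functional forms are my own assumptions for illustration, not Nozick's: one normal person with utility \( u(c) = \ln(1+c) \) (decreasing returns), one utility monster with utility \( v(c) = e^{c} - 1 \) (increasing returns), and a fixed output \( Y > 0 \), of which \( t \) goes to the monster and \( Y - t \) to the normal person. Aggregate happiness is

\[ W(t) = \ln(1 + Y - t) + e^{t} - 1, \qquad 0 \le t \le Y, \]

\[ W'(t) = e^{t} - \frac{1}{1 + Y - t}. \]

On this range \( e^{t} \ge 1 \), with equality only at \( t = 0 \), and \( \frac{1}{1+Y-t} \le 1 \), with equality only at \( t = Y \). Both can't hold at once when \( Y > 0 \), so \( W'(t) > 0 \) everywhere and \( W \) is maximized at \( t = Y \). In this toy society the utilitarian advisor hands the monster everything.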

But here Nozick reaches an impasse. You see, it's not clear that an entitlement theory of desert avoids income inequality. In fact, Nozick argues that entitlement is true despite the fact that it can justify income inequality. His argument is not complex. First of all, he assumes that the set of just objects is closed - any outcome that is reached by individually just actions is just. Next he constructs a possible world where income inequality seems justified. In this imaginary Pittsburgh, everyone starts with $1. But one person is special - he's Wilt Chamberlain. If even one person pays 1¢ to see an exhibition from Wilt The Stilt then he becomes - through no fault of his own - the richest man in Pittsburgh. Why is this unjust?

There are several responses to this. One is tu quoque - why is this income inequality obviously good but utilitarian income inequality obviously bad? But there are sharper critiques. Despite the appearance of dollars and cents, Nozick's example is not economic. The people of imaginary Pittsburgh are not given alternate uses for their money. Why would they hold cash? Why does Chamberlain hold cash? This points us to the deeper problem arising from Nozick's economic ignorance - income is a flow but he treats it as a stock. It is of interest to Nozick that Chamberlain's capital receives dividends, but it's clear he hasn't placed those dividends in a society - his "possible world" is unworthy of the name.

The choice of Wilt Chamberlain is careful rhetoric. Most capital that pays dividends as in the Wilt Chamberlain case can be transferred - in the case of extreme regimes, land can be nationalized, factories can be seized, etc. But Nozick clearly believes Chamberlain's God given talents are just that - inborn natural talent (that Nozick thinks this about Chamberlain brings up the issue of race in ways I won't address) that can't be transferred. There might be a similar point about transferable utility in the utility monster case. The only way to redistribute the returns on Chamberlain-capital is to tax the income - right?

Well, maybe, maybe not. Most modern theorists on inequality concentrate on wealth inequality rather than income inequality. Unlike income inequality, wealth inequality doesn't correlate with Wilt Chamberlain-like capital - not with things that a priori seem justified. Not IQ, for instance. This is a positive, not an a priori, case but it is still a hole in Nozick's point. Even if one believes that Chamberlain deserves the income he receives from his non-transferable capital, that doesn't mean that one believes he deserves his stock of wealth. So Nozick's argument is all a little bit old fashioned.

In sum, I think that Nozick's argument is unpersuasive for two reasons. It isn't obvious that he solves what he thinks are problems with other theories and even the internal coherence of his story is questionable. Still, Invariances is a good book.

Wednesday, August 30, 2017

Price Gouging Is Bad

 J L Austin

J L Austin once wrote a book How To Do Things With Words. One of the most important - or at least most analyzed - branches of words is "prices" - those signals that firms use to advertise their willingness to part with wares. Prices reflect many things: cost of production*, noise, willingness to purchase and the spatial, temporal & political relations between the seller(s) and the purchaser(s).

Milton Friedman, Theodore Schultz & George Stigler

There's an easy case. In the Heaven of "perfect competition", the price of a good balances two aspects: 1) the aggregate of all consumers and potential consumers must be indifferent between purchasing and not purchasing an extra bit of that good and 2) the aggregate of all producers must be indifferent between manufacturing and not manufacturing an extra bit of that good.

In this topsy-turvy Never-Never Land, "price gouging" - a sudden price rise immediately after a natural disaster - is Actually Good. Technocratically and morally good. A rising price reflects a greater need on the part of the consumer, which will be met by profit hungry producers**.

Despite what Richard Posner tells you, we do not live in this place.

Harold Hotelling


It is simply not the case that the rise in price necessarily reflects a change in demand. The change in price can reflect a rise in the monopoly power of firms - a flood creates huge transaction costs. Recall the famous Hotelling Spatial Model of competition - one of the earliest completely specified monopolistic competition models. What happens when the transaction costs increase? We already know this. Transparency goes down, consumer surplus is consumed ... and profits go up. Exactly what is observed.
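For the curious, the textbook version of this can be sketched in a few lines of Python. Everything specific here is my assumption, not something from the post: two firms sit at opposite ends of a unit line of consumers, everyone buys exactly one unit, the transaction cost is a linear travel cost `t` per unit distance, and both firms have marginal cost `c`. The indifferent consumer then sits at \( x = 1/2 + (p_2 - p_1)/(2t) \), and maximizing \( (p_1 - c)\,x \) gives the best response used below.

```python
def best_response(rival_price, t, c):
    """Profit maximizing price against a rival charging `rival_price` (assumed linear city)."""
    return (t + rival_price + c) / 2.0

def equilibrium(t, c=1.0, rounds=200):
    """Iterate best responses from marginal cost pricing until the prices settle."""
    p1 = p2 = c
    for _ in range(rounds):
        p1, p2 = best_response(p2, t, c), best_response(p1, t, c)
    share1 = 0.5 + (p2 - p1) / (2.0 * t)  # position of the indifferent consumer
    profit1 = (p1 - c) * share1
    return p1, profit1

for t in (0.5, 1.0, 2.0):
    price, profit = equilibrium(t)
    print(t, round(price, 6), round(profit, 6))
```

Under these assumptions the price settles at `c + t` and each firm's profit at `t / 2`: costs and demand are unchanged, yet a higher transaction cost raises both the markup and the profit.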

Ed Chamberlin

Neither is it the case that profit hungry firms can necessarily enter the market to meet demand. In order for a firm to enter a market, the long term expected profit to an entrepreneur must be non-negative. They must be able to overcome, for instance, fixed costs and compete with established firms with increasing returns. A flood creates higher fixed costs for entry - depressing the number of firms that can enter. A bit of price gouging is probably not enough to overcome this effect.



Okay, but let's say you really want to believe in this "perfect competition" story. Maybe it isn't right in detail but you think it gives you the right ... laws of motion. Maybe not always but on average, in a broad sense of the term "average". This was Ol' Frank Knight's opinion on the nature of perfect competition predictions, so you're in good company.

Yes, you admit that most people who hold this position are just contrarians who haven't thought beyond the textbook case. But you're not. You genuinely believe that local multipliers are generally strong enough that price increases - perhaps alongside government spending - generally return devastated regions to the "status quo ante clades". This is a defensible position, econometrically. At least with small disasters, it seems to be true: every rainy day increases transaction costs - but they don't all destroy the city.

Then how do you take into account that this isn't true in general? Do you think that the unregulated markets of the late 19th century just didn't gouge enough?

I'll make my long story short: automatic disaster relief > price gouging. It's true that there should be changes in economic fundamentals: rain taxes, infrastructure investment, enforcing flood insurance laws, fixing zoning so that flood absorbing lands aren't eaten by sprawl (this is probably irreversible at this point - urban sprawl is one of the worst ecological disasters in history but nobody does anything about it...) - but locally, around the disaster, the important thing is to get spending back and let the multiplier work itself out. There's no a priori reason to think price gouging will help and not hurt.


Prices are signals, words. Don't think those words can't be "Screw you!".

*"cost" should be understood in a very wide sense.

**It has been well established since Walras that "perfect competition" means constant returns, so small firms can always come up to meet demand in that mystic realm.