Peirce - Science: §16. Reasoning from Samples

§16. Reasoning from Samples

92. Many persons seem to suppose that the state of things asserted in the premisses of an induction renders the state of things asserted in the conclusion probable. The fact that Macaulay's essay on Bacon was admired in its day shows how little the absurdity of such a position was perceived. Even John Stuart Mill holds that the uniformity of nature makes the one state of things follow from the other. He overlooks the circumstance that if so it ought to follow necessarily, while in truth no definite probability can be assigned to it without absurd consequences. He also overlooks the fact that inductive reasoning does not invariably infer a uniformity; it may infer a diversity. I watch the throws of a die, I notice that about half are odd and half are even, and that they follow one another with the utmost irregularity. I conclude that about half of all the throws of that die are odd and that the odd and even follow one another with great irregularity. How can any principle of uniformity account for the truth of such an induction? Mill never made up his mind in what sense he took the phrase »uniformity of nature« when he spoke of it as the basis of induction. In some passages he clearly means any special uniformity by which a given character is likely to belong to the whole of a species, a genus, a family, or a class if it belongs to any members of that group. In this sense, as well as in others, overlooked by Mill, there is no doubt the knowledge of a uniformity strengthens an inductive conclusion; but it is equally free from doubt that such knowledge is not essential to induction. But in other passages Mill holds that it is not the knowledge of the uniformity, but the uniformity itself that supports induction, and furthermore that it is no special uniformity but a general uniformity in nature. 
Mill's mind was certainly acute and vigorous, but it was not mathematically accurate; and it is by that trait that I am forced to explain his not seeing that this general uniformity could not be so defined as not on the one hand to appear manifestly false or on the other hand to render no support to induction, or both. He says it means that under similar circumstances similar events will occur. But this is vague. Does he mean that objects alike in all respects but one are alike in that one? But plainly no two different real objects are alike in all respects but one. Does he mean that objects sufficiently alike in other respects are alike in any given respect? But that would be but another way of saying that no two different objects are alike in all respects but one. It is obviously true; but it has no bearing on induction, where we deal with objects which we well know are, like all existing things, alike in numberless respects and unlike in numberless other respects.¹⁾

93. The truth is that induction is reasoning from a sample taken at random to the whole lot sampled. A sample is a random one, provided it is drawn by such machinery, artificial or physiological, that in the long run any one individual of the whole lot would get taken as often as any other. Therefore, judging of the statistical composition of a whole lot from a sample is judging by a method which will be right on the average in the long run, and, by the reasoning of the doctrine of chances, will be nearly right oftener than it will be far from right.

94. That this does justify induction is a mathematical proposition beyond dispute. It has been objected that the sampling cannot be random in this sense. But this is an idea which flies far away from the plain facts. Thirty throws of a die constitute an approximately random sample of all the throws of that die; and that the randomness should be approximate is all that is required.
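The claim that such judgments are right on the average, and nearly right oftener than far from right, can be checked by a small simulation. The following is only a sketch under stated assumptions: a fair six-sided die, samples of thirty throws, and a tolerance of 0.1 around one half as the meaning of "nearly right". The function names, the tolerance, and the seed are illustrative choices, not anything given in the text.

```python
import random

def odd_fraction(throws, rng):
    """Fraction of odd faces among `throws` simulated throws of a fair die."""
    odd = sum(1 for _ in range(throws) if rng.randint(1, 6) % 2 == 1)
    return odd / throws

def simulate(samples=2000, throws=30, tolerance=0.1, seed=0):
    """Draw many samples of `throws` throws and summarise the estimates.

    Returns the mean estimated proportion of odd throws and the share of
    samples whose estimate lies within `tolerance` of the true value 0.5.
    """
    rng = random.Random(seed)  # fixed seed (an assumption) so runs repeat
    estimates = [odd_fraction(throws, rng) for _ in range(samples)]
    mean_estimate = sum(estimates) / samples
    near = sum(1 for e in estimates if abs(e - 0.5) <= tolerance)
    return mean_estimate, near / samples

mean_estimate, near_fraction = simulate()
print(f"mean estimate of the proportion of odd throws: {mean_estimate:.3f}")
print(f"share of samples within 0.1 of one half: {near_fraction:.3f}")
```

Under these assumptions the mean estimate should sit close to one half, and well over half of the samples should fall within the tolerance, which is the sense in which the method is nearly right oftener than it is far from right.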

95. This account of the rationale of induction is distinguished from others in that it has as its consequences two rules of inductive inference which are very frequently violated, although they have sometimes been insisted upon. The first of these is that the sample must be a random one. Upon that I shall not dwell here. The other rule is that the character whose proportionate frequency in the lot sampled is to be ascertained must not be determined by the character of the particular sample taken. For example, we must not take a sample of eminent men, and, studying over them, find that they have certain characters and conclude that all eminent men will have those characters. We must first decide for what character we propose to examine the sample, and only after that decision examine the sample. The reason is that any sample will be peculiar and unlike the average of the lot sampled in innumerable respects. At the same time it will be approximately like the average of the whole lot in the great majority of respects.
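The reason given for the second rule can be illustrated numerically. The sketch below is hypothetical throughout: it invents a family of one hundred arbitrary "characters" of a year (the last figure of a*year + b equals the tens figure of the year), draws random samples of four years, and compares how often one character fixed in advance fits the whole sample with how often some character found by searching after the fact fits it. The family of rules, the range of years, and the seed are assumptions made only for illustration.

```python
import random

def rule_holds(a, b, year):
    """Hypothetical character: last figure of a*year + b equals the tens figure of year."""
    return (a * year + b) % 10 == (year // 10) % 10

def simulate(trials=5000, sample_size=4, seed=1):
    """Compare a predesignated rule with a rule chosen after seeing the sample.

    Returns two rates over `trials` random samples:
    - how often the single rule (a=2, b=1), fixed beforehand, fits every year;
    - how often at least one of the 100 rules fits every year.
    """
    rng = random.Random(seed)
    rules = [(a, b) for a in range(10) for b in range(10)]
    predesignated_hits = 0
    post_hoc_hits = 0
    for _ in range(trials):
        years = [rng.randint(1000, 1999) for _ in range(sample_size)]
        if all(rule_holds(2, 1, y) for y in years):
            predesignated_hits += 1
        if any(all(rule_holds(a, b, y) for y in years) for a, b in rules):
            post_hoc_hits += 1
    return predesignated_hits / trials, post_hoc_hits / trials

predesignated_rate, post_hoc_rate = simulate()
print(f"predesignated rule fits the whole sample: {predesignated_rate:.4f}")
print(f"some rule found after the fact fits:      {post_hoc_rate:.4f}")
```

Each rule taken singly holds for about one year in ten, so a rule named in advance almost never fits all four years. A rule picked after inspecting the sample fits far more often, although it says nothing about the lot.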

96. In order to illustrate the necessity of this rule I take a random sample of eminent persons. It is quite a random one, for it consists of the first names on pages 100, 300, 500, 700, 900, of Phillips's Great Index of Biography [Biographical Reference, second edition, 1881]. The names are as follows:

Name	Born	Died
Francis Baring	1740	1810 Sept. 12
Vicomte de Custine	1760	1794 Jan. 3
Hippostrates	(of uncertain age)
Marquis d'O.	1535	1594 Oct. 24
Theocrenes	1480	1536 Oct. 18

Now I might, in violation of the above rule of predesignation, draw the following inductions:

1. Three-fourths of these men (counting only the four whose dates are given) were born in a year whose date ends in a cipher. Hence about three-fourths of all eminent men are probably so born. But, in fact, only one in ten is so born.

2. Three eminent men out of four die in autumn. In fact, only one out of four.

3. All eminent men die on a day of the month divisible by three. In fact, one out of three.

4. All eminent men die in years whose date doubled and increased by one gives a number whose last figure is the same as that in the tens' place of the date itself. In fact, only one in ten.

5. All eminent men who were living in any year ending in forty-four died at an age which after subtracting four becomes divisible by eleven. All others die at an age which increased by ten is divisible by eleven.
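The arithmetic behind these five spurious inductions can be checked mechanically. The sketch below encodes the four dated men from the list (Hippostrates is left out, since no dates are given for him) and counts how many satisfy each rule; it also computes the true frequencies of rules 1, 3 and 4. Two points are assumptions rather than anything stated in the text: autumn is taken to mean September through November, and age is taken as the plain difference between the year of death and the year of birth. All function names are illustrative.

```python
import calendar

# (name, birth year, death year, death month, death day) as listed in the text.
# Hippostrates is omitted because the text gives no dates for him.
SAMPLE = [
    ("Francis Baring", 1740, 1810, 9, 12),
    ("Vicomte de Custine", 1760, 1794, 1, 3),
    ("Marquis d'O.", 1535, 1594, 10, 24),
    ("Theocrenes", 1480, 1536, 10, 18),
]

def born_in_cipher_year(born):
    return born % 10 == 0

def died_in_autumn(month):
    # Assumption: autumn means September through November.
    return month in (9, 10, 11)

def died_on_day_divisible_by_three(day):
    return day % 3 == 0

def doubled_year_rule(died):
    # Last figure of 2 * year + 1 equals the tens figure of the year.
    return (2 * died + 1) % 10 == (died // 10) % 10

def age_rule(born, died):
    # Assumption: age is the plain difference of the two years.
    age = died - born
    alive_in_44 = any(year % 100 == 44 for year in range(born, died + 1))
    if alive_in_44:
        return (age - 4) % 11 == 0
    return (age + 10) % 11 == 0

def sample_counts():
    """How many of the four dated men satisfy each of rules 1 to 5."""
    return [
        sum(born_in_cipher_year(b) for _, b, _, _, _ in SAMPLE),
        sum(died_in_autumn(m) for _, _, _, m, _ in SAMPLE),
        sum(died_on_day_divisible_by_three(d) for _, _, _, _, d in SAMPLE),
        sum(doubled_year_rule(y) for _, _, y, _, _ in SAMPLE),
        sum(age_rule(b, y) for _, b, y, _, _ in SAMPLE),
    ]

def base_rates():
    """True frequencies of rules 1, 3 and 4.

    Years are taken over 1000-1999 and days over a non-leap year (2023).
    """
    years = range(1000, 2000)
    days = [d for m in range(1, 13)
            for d in range(1, calendar.monthrange(2023, m)[1] + 1)]
    return (
        sum(born_in_cipher_year(y) for y in years) / len(years),
        sum(died_on_day_divisible_by_three(d) for d in days) / len(days),
        sum(doubled_year_rule(y) for y in years) / len(years),
    )

print(sample_counts())
print(base_rates())
```

Under these assumptions the counts are three, three, four, four and four out of four, while the true frequencies of rules 1 and 4 are one in ten and that of rule 3 is about one in three. Every pattern fits the sample, yet each was chosen only after the sample had been inspected.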

97. This rule is recognized in the requirement of physicists that a theory shall furnish predictions which shall be verified before any particular weight is accorded to it. The medical men, too, who deserve special mention for the reason that they have had since Galen a logical tradition of their own, recognize this rule, however dimly, in their working against reasoning »post hoc, ergo propter hoc.«. . .