03 September 2009

The Laws of Elbonia

A substantial body of comparative legal scholarship considers statements applicable to large, conceptually infinite numbers of countries. Such statements gain in credibility if they are supported by evidence from large samples of countries. Processing such vast evidence requires quantitative methods. Designing the requisite numerical measures of law is not straightforward, but an important insight from statistics suggests that this problem can be overcome by appropriate research design. While in practice considering more countries comes at the expense of less information per country, on balance large sample, quantitative research designs promise to yield interesting insights for comparative law.

From here (Holger Spamann of Harvard Law School, American Journal of Comparative Law, Forthcoming, Harvard Law and Economics Discussion Paper No. 32 "Large Sample, Quantitative Research Designs for Comparative Law?").

The (sad?) truth is that there are only a couple hundred countries in existence, not an infinite number of countries, and that due to historical ties and processes of legal development, there are far fewer real life examples of law in any area. No amount of statistical massaging can cure this reality.

UPDATED: The post title refers to Elbonia, a fictional country that frequently appears in the comic strip "Dilbert." It alludes to the "conceptually infinite numbers of countries" mentioned in the abstract of this paper.

While the state of affairs the author describes in the abstract would be bad if true, the body of the paper, and the footnotes it uses to support that assertion, undercut this claim. Instead, the paper recites that a large share of international law studies actually purport to compare only two or three countries. The papers cited in support of the claim that they make "statements applicable to large, conceptually infinite numbers of countries," several of which I have read, don't really support that assertion except under a tortured interpretation the author gives them.

Of course, what I say in the original post is another way of saying that the data from the countries in the sample are not independent. For example, many of these countries used to be part of the same country and have since split up. Korea and Japan both based large parts of their own civil codes on more or less literal translations of the German civil code into their own vernaculars.

The data points are also not comparable even when superficially similar. One of the main conclusions from projects attempting to strengthen international legal institutions is that rules and laws that are superficially the same don't act that way in the field; law's impact is deeply context-specific. As the author notes in the study, two of the most widely cited examples of this kind of analysis have been discredited upon reanalysis.

Also, many countries have bad statistical data collection. For example, tax data and banking statistics may be far more accurate in some countries than others. GDP data are often wildly inaccurate in undeveloped economies because so much of these nations' economic activity takes place in the gray or non-monetary economy. Questions used as examples in the study, such as the relationship between legal environments and how many words are put into contracts, are highly vulnerable to sampling bias and to country-specific issues like word length (e.g., German compresses into single words what would be phrases or sentences in English). The bigger the sample, the more likely it is that bad data is introduced into it. This means that large samples pose a grave risk of increasing, rather than reducing, error by lowering average data quality.

Even seemingly high-quality data sets, such as surveys of international business executives about corruption levels in particular countries, can mask bad data problems. Small countries with small economies and little international trade will be the subject of fewer informed data points than large countries with large economies and a great deal of international trade. This makes the sampling error much greater in smaller countries, but often this isn't disclosed. Indeed, some papers in comparative law that rely on survey data don't consider issues of statistical significance at a country-by-country level at all. Likewise, even when the number of responding business executives in such a survey is large enough to be statistically significant, the number of transactions or incidents those executives rely on in assessing a country that engages in little international trade may itself be very small.

Even relatively superficial international legal information, like determining citation forms for large numbers of countries (a project I worked on while on the Michigan Journal of International Law in law school), is much more difficult to compile in practice, even with a premier international legal library, than one might suspect. Outside the OECD, even for countries as large and economically important as China, and even using a library with a collection of legal resources reportedly as good as any library actually in China, the quality of the information available is depressingly poor and hard to place in context. Amnesty International, for example, doesn't even purport to be able to accurately determine how many executions are carried out in China each year.

The paper does not meaningfully address serious questions like weighting. Treating each country as an equal, separate observation gives disproportionate weight to small, economically insignificant countries. It also means that an event like the breakup of the Soviet Union or Yugoslavia significantly changes the sample size even when the underlying data may not have changed at all with regard to a particular issue. But using GDP or population as weights, for example, gives increased importance to a small number of big countries that then dominate the sample, undermining the statistical benefits of a larger sample size. For example, more than a third of the world's population lives in India or China. A carefully chosen sample of twenty countries can easily include a supermajority of the world's population and GDP, but it is, of course, harder pressed to capture issues like systemic differences between large and small countries.
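Kish's effective sample size is one standard way to put a number on this weighting trade-off: with weights w_i, n_eff = (Σw_i)² / Σw_i², which equals the number of countries when all weights are equal and shrinks as a few weights dominate. The minimal Python sketch below uses purely hypothetical weights (two giant countries each weighted 50 times as heavily as each of eighteen others), not real population or GDP figures, and the function name is illustrative.

```python
def kish_effective_n(weights):
    """Kish's effective sample size: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return total * total / total_sq


# HYPOTHETICAL weights, not real data: twenty countries weighted equally...
equal = [1.0] * 20
# ...versus the same twenty where two giants each carry 50 times the weight of the rest.
skewed = [50.0, 50.0] + [1.0] * 18

print(kish_effective_n(equal))            # 20.0
print(round(kish_effective_n(skewed), 2))  # 2.77
```

Under these illustrative weights, the twenty-country sample carries roughly the information of fewer than three equally weighted countries, which is the sense in which weighting by size undermines the benefit of a larger sample.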

When independence, lack of comparability, weighting, and data accuracy issues are considered, a larger sample may actually increase error.

For example, while civil law codes theoretically apply in Sudan, there are so few lawyers there, particularly in Southern Sudan, which is only marginally under the control of the central government anyway, that including quantitative data about Sudan in a sample of countries with a civil law code would reduce the accuracy of the study.

Another subtle but fairly common quantitative comparison between countries that is frequently misleading is the number of attorneys per capita, because "attorney" means different things in different countries.

These kinds of impacts are frequently subtle yet serious. For example, one could do a quantitative study of comparative letter of credit law (a letter of credit is a specialized form of bank guarantee often used in international trade) that would look very convincing until one learned that the vast majority of letter of credit instruments provide for arbitration and opt out of local law by agreement in favor of International Chamber of Commerce rules, and that the situations where local letter of credit law was used differed greatly from country to country.

The concern about data quality is particularly great in large sample comparative law research because these studies are very often driven by domestic policy. Many are indexes of "economic freedom," "transparency," or some similarly vague concept, prepared to a great extent for use in political debates.

The only pertinent publication in Spamann's biography is a 2006 paper concerning the "Anti-Director Rights Index." His 2008 SSRN paper on the topic acknowledges that this leading example of large sample comparative law was so grossly inaccurate that its original results are basically meaningless.

How can we trust him to be any more honest or accurate than the original study's author? Keeping authors honest is frequently a genuine concern in these studies; no one is pure in their academic interests here. It is hard enough to monitor data analysis in an in-depth, small-sample comparative law paper. Monitoring data quality with large samples is much harder. It also bears mention that the particular paper in question is sponsored by a foundation with a strong conservative policy agenda in its own work, which is characteristic of the law and economics subdiscipline it generally funds. This doesn't help the paper's credibility.

And since when did some subset of 180 data points become "vast evidence" incapable of analysis without quantitative statistical methods? Moreover, many comparative law questions, such as relative allocations of federal power or differences in voting laws, have an even smaller total data set, because relatively few countries have federal states and not all countries conduct elections. When the actual data set is incomplete, the trade-off between gains in statistical power from larger data sets and losses from reduced data quality is a very serious concern. The statistical power of a study is also reduced if one is seeking to compare two incomplete subsamples of the total data set to each other, rather than simply determining some statistical quality of the entire world.

Even an expanded sample is too small to seriously reduce experimental error unless it is very nearly complete. While a survey sample of 3,000 data points provides very similar levels of accuracy whether the total population is 300,000 or 300,000,000, very small samples (say, 100 or fewer) quickly become nearly as inaccurate as samples of the same size drawn from a large population, even if the total data set is quite small. For example, in a truly random survey of 100 members of a population of 180 on a yes-or-no question, the random sampling margin of error at the usual 95% confidence level is +/- 6.55%. This low resolution can capture only the grossest discrepancies.
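The 6.55% figure can be reproduced with the textbook normal-approximation margin of error for a proportion, multiplied by the finite population correction. This is a minimal Python sketch assuming simple random sampling, the worst-case proportion p = 0.5, and z = 1.96 (about 95% confidence); the function name is illustrative.

```python
import math


def margin_of_error(n, population=None, p=0.5, z=1.96):
    """Margin of error for a proportion from a simple random sample of size n.

    Uses z * sqrt(p(1-p)/n); when a finite population size N is given, the
    result is scaled by the finite population correction sqrt((N - n)/(N - 1)).
    p=0.5 is the worst case; z=1.96 corresponds to roughly 95% confidence.
    """
    se = math.sqrt(p * (1 - p) / n)
    if population is not None:
        se *= math.sqrt((population - n) / (population - 1))
    return z * se


print(f"{margin_of_error(100, 180):.2%}")          # 6.55%
print(f"{margin_of_error(100):.2%}")               # 9.80%
print(f"{margin_of_error(3000, 300_000):.2%}")     # 1.78%
print(f"{margin_of_error(3000, 300_000_000):.2%}")  # 1.79%
```

Sampling 100 of 180 countries only trims the margin from about 9.8% to about 6.55%, while 3,000 responses give roughly +/- 1.8% whether the population is 300,000 or 300,000,000.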

His work advancing the same thesis for the Association of American Law Schools in the area of crime and punishment is similarly weak; the sociological literature on the subject treats the same issues much more rigorously. In his crime and punishment work he admits that this kind of study can never show causation because of small sample size, but he fails to quantify how big the inaccuracies involved can get, or how serious the non-statistical sources of error are that can't be cured by the law of averages (which doesn't apply in this context).

When the only way you can do a job is poorly, it may be better not to do it at all. Studies with high margins of error can convey a misleading sense of accuracy and knowledge, when the most useful fact may actually be knowing that we don't know. If anything, this paper casts doubt on the desirability of ever conducting large sample, quantitative research in comparative law. If this is the best case that can be made for it, maybe it isn't worth doing.

Holger Spamann's paper as found on the Social Science Research Network, which I cite above in the original post, is alarmingly sloppy and has a remarkably shallow analysis for a fifth-year PhD student at Harvard Law School, even for a mere twelve-page discussion paper. It doesn't engage the literature on law and economic development in a meaningful way, even though that is mostly what the research method he advocates is used to address, and even though it is his core area of PhD research. Most of the serious issues, which are basic considerations in any undergraduate social science research design class, aren't even acknowledged. Indeed, given his academic background, it isn't obvious that he received this kind of research design training as an undergraduate; his undergraduate degrees appear to be in economics and in law, and programs in both fields often omit that subject. Yet surely he has had at least one social science research design class in graduate school.

If I were teaching an undergraduate class where his paper was submitted, I might give it a B- at best; for a student at his level of education, I wouldn't consider it passing work. This is hardly what one would expect of a PhD/SJD student who specializes in this area at one of the nation's most prestigious law schools.

N.B.: Spamann's recent work as a junior author of a paper on bankers' compensation may be much better. It has received notice from the national media and policy makers, and its basic conclusion is a sound one, although I haven't examined the details of that paper. I came to a similar conclusion in my recent presentation at the Law and Society Conference in Denver this year. Co-authorship also makes it hard to figure out whose work is whose. At any rate, economic analysis of individual incentives in the well-understood and well-measured American financial sector is a very different matter from research design. One can be brilliant at one and a dismal failure at the other.
