Bayesian Reasoning Is Discouraged In Guilt-Innocence Trials
The legal system, especially the criminal justice system, in the United States aspires mightily to avoid Bayesian statistical reasoning, in which your end analysis of probabilities is influenced by your "priors" - i.e. your expectation of what the probabilities will be going forward based upon past experience.
For example, one of the reasons that most criminal trials are jury trials is that the jury is not told (at least if the defendant refuses to testify, which he has a right to do without an adverse inference under the 5th Amendment) about a defendant's prior criminal record.
Defendants routinely decline to testify because this opens the door to admission of evidence of their prior criminal records.
But, empirical evidence suggests that even though juries are strongly admonished to ignore a defendant's failure to testify when rendering their verdict, juries do, in fact, penalize defendants for failing to testify to approximately the same degree that they penalize defendants who testify and whose prior criminal history is revealed to them as a result.
Our willingness to admit Bayesian prior establishing evidence of witness credibility undermines our efforts to suppress Bayesian prior establishing evidence that someone has a propensity to commit a crime.
The British resolve this conundrum by allowing criminal defendants to testify without being under oath and without revealing their criminal histories, and not creating strong incentives for defendants to suppress this testimony probably does more to enhance the accuracy of fact finding in criminal trials than to undermine it. The mere fact that someone has a huge self-interest in providing self-serving testimony when one is a criminal defendant is more than sufficient to cause juries to consider that evidence skeptically, especially when the offense charged is a serious one, even without knowing that a defendant may have a significant criminal history.
Also, while it may create an unfair snowball effect for a jury to know that a defendant has a criminal history, increasing the probability of a wrongful conviction in the instant case where one would not have been made if the jury had not known that fact, criminal defense attorneys, if they are clever, can present evidence that strongly implies a lack of a criminal history without openly saying so. They can call their criminal defendant client to the stand to testify. They can try to sprinkle in references in passing to activities that someone with a criminal history couldn't have engaged in (like notarizing a document or practicing in some other licensed profession). They can solicit testimony about things like a long, continuous employment history that would be inconsistent with having served time in prison. And they can ask questions like "do you know how a drug test works?" that, answered in the negative, indicate a lack of a criminal record. So, suppressing the truth can work.
And, given that the majority of participants in the criminal justice system are recidivists, and that juries presume as much, perhaps defendants with no criminal history ought to be entitled to disavow that presumption by telling the jury that fact. If people are inherently Bayesian in how they evaluate probabilities, and that is what a trier of fact in a criminal trial is charged with doing, then an absence of evidence to the contrary of a widely held Bayesian prior of the jury is substantively prejudicial.
Also, a judge in a rare criminal bench trial will usually know a defendant's criminal history. Even though the judge is supposed to ignore it when rendering a guilt or innocence verdict, it is unlikely that judges actually do so, and certainly, prosecutors negotiating plea bargains, which are the source of the lion's share of criminal convictions, do not ignore a defendant's criminal history.
So, while the criminal justice system gives the appearance of not considering a defendant's prior criminal history in an effort to make the process seem more legitimate, in practice, the signaling of guilt caused by a failure to testify and other cues inevitably influences juries just as strongly, and undermines one of the foundational premises of the exclusion of Bayesian prior propensity evidence from juries.
The theory is that we want to avoid a "snowball effect" in which past arrests (whether or not resulting in convictions) and past convictions, increase the probability that someone will be wrongfully convicted going forward.
Similarly, we exclude "propensity" evidence at trial, unless that evidence is so particularized and compelling that the modus operandi it demonstrates has become an identifying trademark linking a crime to a defendant.
On the other hand, this disdain for Bayesian priors in the criminal justice system evaporates once we depart the trial that determines guilt or innocence on the merits.
Also, practices in the guilt or innocence trial phase designed to suppress criminal investigation practices based upon inappropriate Bayesian priors, such as the Fourth Amendment exclusionary rule, which acquits people known to be guilty or makes their acquittal or favorable plea bargain more likely, undermine the legitimacy of the formal guilt-innocence phase of the trial in the eyes of law enforcement and victims. This can cause law enforcement and victims to feel morally justified in engaging in retaliatory cheating through false testimony in affidavits, hearings and trials, when they believe that a defendant is guilty but cannot present the evidence which is the actual basis of their beliefs to a judge or jury.
Bayesian Reasoning In Criminal Investigations
Police routinely focus on the "usual suspects" with prior criminal records when investigating crimes, because the usual suspects are usually the people who are guilty.
Knowledge of an individual's prior criminal history can either mean that a suspect has no legally enforceable right to privacy because he is on probation or parole, or can be used by law enforcement and prosecuting attorneys as one part of demonstration of probable cause for a search, a seizure, a wiretap, an arrest, an indictment for a felony, or a use of force, or as part of demonstration of reasonable suspicion for a "Terry stop," tested in a preliminary hearing, a grand jury presentation, or in defending a civil rights lawsuit.
Police are legally prohibited from resorting to Bayesian priors based upon race in making decisions with a probable cause or reasonable suspicion threshold, unless, of course, a witness or the electronic or photographic equivalent or DNA evidence establishes or at least strongly suggests a suspect's race. But, of course, there is pervasive evidence that police do so anyway in almost all matters in which they are vested with discretion.
Police are also permitted to consider, and routinely do consider, in their investigative work the arrest records of people who were never convicted following those arrests. And, it is often through this portal that impermissible racial bias is laundered. Terry stops and arrests, made without probable cause, of minority individuals (especially young minority men) who are not guilty of the offense for which they are arrested turn into arrest records that police are allowed to legally consider, and into custodial searches that reveal evidence of crimes, such as drug possession or driving with a suspended license, that would never have been discovered but for the wrongful stop.
The Fourth Amendment exclusionary rule seeks to limit the extent to which effectively random searches and arrests based solely upon racial priors turn into convictions, but police frequently treat arrest records, which are the result of police discretionary decision making, as more credible than convictions, which are influenced by plea bargaining between prosecutors and criminal defense lawyers for all sorts of reasons (the lion's share of convictions are a result of guilty pleas rather than trials). So, the snowball effects that the system tries to avoid at trial absolutely emerge in the pre-trial investigation portion of the criminal justice process.
Because racism is fundamentally somewhat flawed Bayesian reasoning, rather than mere ignorance, both white and minority police officers often employ it.
But, this is problematic in multiple respects.
One is that it leads to snowball effects that create pervasive, lasting, inappropriate bias that can turn people who weren't on the prison track into criminals, when they weren't criminals at first.
Another is that if police perceive that young black and Hispanic men are criminals, in general, even if they haven't been caught yet, they are less likely to be morally troubled when those young men are wrongfully arrested, searched or convicted of crimes which they didn't commit, because the police feel that those young men had it coming to them for crimes for which the young men were not caught. And, police have to be self-policing, and are less likely to take self-policing action when they believe that no moral harm has occurred, even if rules were technically violated.
Perhaps most importantly, even if racial Bayesian priors have a real factual basis because people fitting a demographic profile are significantly more likely to commit crimes in a particular context than people who do not, the costs that acting on those priors imposes on law abiding people who happen to be of the same race are crushing.
"Driving while black" in a predominantly white neighborhood prompts police calls and stops made without reasonable suspicion or probable cause with great frequency.
Innocent unarmed individuals holding cell phones are shot in their backyards because police assumed that the cell phone was a gun.
Law abiding minority individuals learn to distrust police, and lose the full benefits of protection from the criminal justice system, because police don't get the information that would give rise to legitimate probable cause from the community and because they refrain from calling the police for assistance when there is a great risk that they or loved ones will be the victims of police misconduct if they do.
Basically, law abiding minorities don't get the benefit of the doubt and leniency from law enforcement that causes many stops of whites to result in a warning. Dubious offenses that almost never result in criminal charges, like driving with a defective headlight, jaywalking, driving just a few miles an hour above the speed limit, possession of small amounts of marijuana, open container law violations, and the like, are enforced to the full extent of the law against them.
This mistreatment of law abiding minorities, even if motivated by reality based Bayesian priors, undermines the legitimacy of law enforcement for whole communities, stigmatizes people who shouldn't have criminal records in ways that make it harder for them to function legitimately, and undermines the incentives the criminal justice system is supposed to create to encourage people to follow the law.
Bayesian Reasoning In Sentencing
Similarly, once a defendant has been convicted, Bayesian priors are routinely used to determine an appropriate sentence for a defendant, through formal sentence enhancers for recidivist defendants, through formal consideration as part of sentencing guidelines in the federal system and in some states such as Florida, and through informal evaluation of a pre-sentencing report by a judge who has discretion to choose from a range of permissible sentences for an offense. The Bayesian priors are not restricted to prior criminal records either.
While judges (and juries imposing death sentences) are legally forbidden from using race as a prior in making sentencing decisions (although overwhelming statistical evidence demonstrates that this is done pervasively), they are permitted to consider a defendant's education, marital status, employment status, community ties, motive for the crime, age, substance abuse issues, mental health and more. Gender is routinely considered as well, whether or not this is legally proper. Judges are also permitted to consider gut level perceptions of the individual defendant from a sentencing hearing and that perception inevitably is influenced by race and cultural differences between (or similarities with) the judge and the defendant. Judges are more lenient when they can empathize with a convicted defendant and the more similar a convicted defendant is in culture and life experiences to a judge, the more a judge will empathize with the defendant.
Similarly, Bayesian reasoning is pretty much mandated in parole hearings in states that have indeterminate sentences, to determine the likelihood that a convicted criminal will reoffend upon release.
One important reform in sentencing is to use empirically validated risk assessment tools to overcome the individual and less accurate Bayesian priors of individual judges acting in a non-systemic manner based upon gut instincts. But, while these tools can be more accurate than judges acting based upon interpersonal interactions in a brief sentencing hearing, there is still room for hidden bias based upon which factors are and are not considered by the risk assessment tool and upon how the information used as inputs by the tool is collected, and there is likely room for misuse or gamesmanship of the tool because it is a black box whose implicit reasoning is often not disclosed or not easily understood.
Analysis
Would we be better off to acknowledge that reasoning based upon Bayesian priors in the criminal justice system is inevitable, in order to better regulate it?
Perhaps providing juries, judges and police with more information that would make their Bayesian priors more accurate would be a more fruitful approach than our current one.
But, because Bayesian priors based on race and other impermissible factors can simultaneously make decision making more accurate overall, and make life oppressively unfair to generally law abiding people who share those identifiers, criteria that evaluate the cost of false positives and false negatives equally are inherently flawed as well. Simply causing priorities other than mere raw accuracy to be more widely acknowledged, for example, in creating risk assessment tools and devising rules in the criminal justice system and in civil rights laws, might go a long way towards improving our policy decisions in those matters.
Rather than maintaining the pretense that we can ignore some facts to improve accuracy, which is true less often than we pretend it to be, we might be better off acknowledging that some facts that are ignored could improve accuracy and that we will continue to ignore them because preventing false positives is more important than preventing false negatives. A person with a propensity to commit crimes who is acquitted will usually offend again and be caught and punished the next time (and if he doesn't, the reform arising from the acquittal is itself a benefit to society). But, a person who is generally law abiding who is wrongfully convicted may suffer so much harm that they are dragged down into a life of crime as a consequence of the wrongful conviction.
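One way to make the point about unequal error costs concrete is a minimal sketch in Python. It assumes a deliberately simplified model that is not in the post itself: the trier of fact holds a single probability of guilt, a wrongful conviction (false positive) carries one cost, and a wrongful acquittal (false negative) carries another. The function names and the 10-to-1 cost ratio are hypothetical choices made only for illustration.

```python
def expected_costs(p_guilt, cost_false_conviction, cost_false_acquittal):
    """Return (expected cost of convicting, expected cost of acquitting).

    Convicting is an error only if the defendant is innocent;
    acquitting is an error only if the defendant is guilty.
    """
    convict = (1.0 - p_guilt) * cost_false_conviction
    acquit = p_guilt * cost_false_acquittal
    return convict, acquit


def conviction_threshold(cost_false_conviction, cost_false_acquittal):
    """Probability of guilt above which convicting has the lower expected cost.

    Solving (1 - p) * C_fp < p * C_fn for p gives p > C_fp / (C_fp + C_fn).
    """
    return cost_false_conviction / (cost_false_conviction + cost_false_acquittal)


# Equal weighting of both errors: convict whenever guilt is more likely than not.
equal_threshold = conviction_threshold(1.0, 1.0)

# Hypothetical assumption: a wrongful conviction is ten times as costly
# as a wrongful acquittal.
weighted_threshold = conviction_threshold(10.0, 1.0)

# With the hypothetical 10-to-1 weighting, an 80% probability of guilt
# still favors acquittal on expected cost.
convict_cost, acquit_cost = expected_costs(0.8, 10.0, 1.0)

print(f"equal-cost threshold:    {equal_threshold:.3f}")
print(f"weighted-cost threshold: {weighted_threshold:.3f}")
print(f"at p=0.8: convict={convict_cost:.2f}, acquit={acquit_cost:.2f}")
```

Under these assumptions, a rule that weighs both errors equally convicts at any probability above one half, while a rule that weighs wrongful convictions more heavily demands a much higher probability, even though it produces more total errors.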
I have been a practicing statistician for many years (although I don't have much experience with Bayesian analysis) and am now in law school and just starting Evidence. Rule 401 of the FRE says that evidence is relevant if it makes a fact more or less probable. This made me think of Bayes Theorem (and also led me to your blog):
P(Guilt|Evidence) = P(Evidence|Guilt)*P(Guilt)/P(Evidence)
If defendants are innocent until proven guilty, shouldn't P(Guilt) start at 0, in which case you would never be able to prove guilt? In practice, what P(Guilt) are juries actually supposed to start with? Or is (as your post suggests) Bayesian reasoning out the window when it comes to the law?
Perhaps of more interest: given some prior non-zero probability, how much does the probability have to increase or decrease in order for the evidence to be relevant? And, if there are multiple explanations for a particular piece of evidence, some consistent with guilt and some not, are we stuck with the expected value (average probability), even though the actual probability is either 0% or 100%?
I have a feeling my Evidence professor is not going to appreciate my curiosity in this area.
P(Guilt) is actually prohibited from being zero institutionally because a showing of "probable cause" is required to make a valid arrest or to take a felony case to trial, and because prosecutors are ethically required to have "probable cause" that the person prosecuted committed a crime, before charges are brought.
Yeah, I thought of that as well. Those two concepts seem to be at odds with each other, but practically (and statistically) there would have to be a non-zero probability of guilt to start with.
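The exchange above can be illustrated with a minimal sketch in Python. It expands P(Evidence) with the law of total probability and shows that a prior of exactly zero can never be moved by any evidence, while a non-zero prior can. The function name and all of the numbers are hypothetical, chosen only for illustration; treating evidence as relevant whenever the likelihood ratio differs from 1 is one possible reading of the "more or less probable" language, not a statement of how courts apply it.

```python
def posterior_guilt(prior, p_evidence_given_guilt, p_evidence_given_innocence):
    """Bayes' theorem: P(Guilt|Evidence) = P(Evidence|Guilt) * P(Guilt) / P(Evidence).

    P(Evidence) is expanded as
    P(Evidence|Guilt) * P(Guilt) + P(Evidence|Innocence) * (1 - P(Guilt)).
    """
    numerator = p_evidence_given_guilt * prior
    p_evidence = numerator + p_evidence_given_innocence * (1.0 - prior)
    if p_evidence == 0.0:
        raise ValueError("the evidence has zero probability under both hypotheses")
    return numerator / p_evidence


# A prior of exactly zero stays at zero no matter how strong the evidence is.
from_zero_prior = posterior_guilt(0.0, 0.9, 0.1)

# A non-zero prior moves when the evidence is more likely under guilt
# than under innocence (likelihood ratio 0.9 / 0.1 = 9).
from_even_prior = posterior_guilt(0.5, 0.9, 0.1)

# Evidence that is equally likely either way (likelihood ratio 1)
# leaves the prior unchanged.
uninformative = posterior_guilt(0.5, 0.3, 0.3)

print(from_zero_prior, from_even_prior, uninformative)
```

On this sketch, the size of the shift depends only on the prior and on the ratio P(Evidence|Guilt) / P(Evidence|Innocence), so evidence with several possible explanations still moves the probability as long as those explanations are not equally likely under guilt and innocence.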