The Apgar score for evaluating the health of newborn babies, devised by Dr. Virginia Apgar in 1952, was in effect one of the earliest examples of an artificial intelligence expert system. It distills the wisdom of experts into a simple, evidence-based measure that produces an empirically validated result.
These methods are often more accurate than human judgment because they weigh the material facts consistently and ignore the immaterial facts that often cloud human judgment.
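To make the idea concrete, here is a minimal sketch of how a score like Apgar's works: five observations, each rated 0, 1, or 2, are added into a total from 0 to 10. The class and function names are hypothetical, chosen only for illustration, and the interpretation bands (7 to 10, 4 to 6, 0 to 3) follow a commonly cited convention rather than anything stated above. It is not a clinical tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ApgarObservation:
    """Five criteria, each rated 0, 1, or 2 by the person examining the baby."""
    appearance: int   # skin color
    pulse: int        # heart rate
    grimace: int      # reflex response
    activity: int     # muscle tone
    respiration: int  # breathing effort


def apgar_total(obs: ApgarObservation) -> int:
    """Add the five ratings; every factor is visible and equally weighted."""
    ratings = (obs.appearance, obs.pulse, obs.grimace,
               obs.activity, obs.respiration)
    for rating in ratings:
        if rating not in (0, 1, 2):
            raise ValueError(f"each rating must be 0, 1, or 2, got {rating!r}")
    return sum(ratings)


def apgar_band(total: int) -> str:
    """Map a total to a band (assumed convention, not from the text above)."""
    if not 0 <= total <= 10:
        raise ValueError(f"total must be between 0 and 10, got {total!r}")
    if total >= 7:
        return "reassuring"
    if total >= 4:
        return "moderately abnormal"
    return "low"


if __name__ == "__main__":
    baby = ApgarObservation(appearance=1, pulse=2, grimace=2,
                            activity=2, respiration=2)
    total = apgar_total(baby)
    print(total, apgar_band(total))
```

Because the score is a plain sum of a handful of named inputs, anyone using it can see exactly which facts were considered and how much each one counted.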
But if they become too complex, they can pose subtle dangers, because they are often not sufficiently transparent.
"Black box" assessments make it hard to discern which relevant factors have been omitted or underweighted. People using these systems can also give them inaccurate inputs because they do not recognize how much a particular input matters.
For example, an expert system for evaluating the health care needs of severely ill Medicaid patients denied many people adequate care because one of the key factors was vaguely worded and often entered incorrectly.
These systems can also consider factors that are correlated with an outcome but do not cause it, and that should not be considered for a variety of reasons. For example, many risk-scoring systems used in sentencing decisions inappropriately consider race or proxies for race.