Science: More a matter of testing or questioning?

In a recent Newsweek article, Sharon Begley proclaims:

… I hereby make the heretical argument that it is time to stop cramming kids’ heads with the Krebs cycle, Ohm’s law, and the myriad other facts that constitute today’s science curricula. Instead, what we need to teach is the ability to detect Bad Science — BS, if you will.

… Science is not a collection of facts but a way of interrogating the world.

Setting aside the fact that having to learn such things as “the Krebs cycle [and] Ohm’s law” in no way precludes developing skill in “interrogating the world”, Begley’s remarks could be seen not only as recommending that a new priority be given to the development of questioning skills but also, if taken to a logical extreme, as a call for an end to commonplace deference to established fact and proclaimed expertise.

Begley, however, intends no such radical understanding. Unfortunately, it may take just such a radical approach – even if only in brief or intermittent forays – to actually “detect Bad Science”.

In any event, Begley’s brief article does next to nothing to establish what distinguishes bad science from good science. She seems at times to be right on the verge of making some such distinctions, but, instead, she resorts to the convention of accepting as evidence the results which come from certain forms of investigation, without ever questioning what it is about those forms that assures the quality of the evidence obtained – if indeed the form of an investigation is ever sufficient to assure good quality evidence.

For instance, Begley notes that:

All too many [scientists (and people in general)] put too much credence in observational studies, in which people who happen to behave one way (eating a lot of olive oil, drinking in moderation) have one health outcome, while people who choose to behave the opposite way have a different health outcome. This is a reasonable way to generate a hypothesis, but not something to guide your life by, and certainly no basis for the health advice such scientists peddle.

Begley’s point about scientists peddling health advice is reiterated in an article written by David H. Freedman:

… much of what biomedical researchers conclude in published studies – conclusions that doctors keep in mind when they prescribe antibiotics or blood-pressure medication, or when they advise us to consume more fiber or less meat, or when they recommend surgery for heart disease or back pain – is misleading, exaggerated, and often flat-out wrong.

This article goes on to note that much

published research … [is] remarkably unscientific, based largely on observations of a small number of cases.

But, is research ever “unscientific” merely because of the number of observations upon which conclusions are based?

Begley says that observational studies provide “a reasonable way to generate a hypothesis”, and she is correct. This, then, would seem sufficient reason – indeed, it IS sufficient reason – for insisting that it is NOT the number of observations or tests that determines whether some research is scientific or unscientific. To put it in Begley’s terms, it is NOT the number of observations that determines whether some science is bad or good.

Even so, Begley goes on to say that

If knowledge comes from intuition or anecdote, it is likely wrong.

It is very important that Begley insert the qualifier “likely” for at least two reasons.

First of all, hypothesis generation is possibly the single most necessary aspect of any sort of science, and it can rightly be said to depend, at least sometimes, on a researcher’s intuition. It follows that when good science confirms a hypothesis which was formed from intuition, the resulting knowledge comes from intuition and is not wrong.

Furthermore, with regard to anecdotes, large studies – that is to say, studies which include a large number of observations or cases – can be regarded as including a large number of anecdotes. In fact, in biomedical research in particular, and as will be discussed presently, there are very good reasons for thinking of large studies as collections of anecdotes or anecdotal information.

Be that as it may, within the scientific community (including the biomedical sciences community), anecdotes are typically ignored when not dismissed out of hand. As the Freedman article notes:

Doctors may notice that their patients don’t seem to fare as well with certain treatments as the literature would lead them to expect, but the field is appropriately conditioned to subjugate such anecdotal evidence to study findings.

Of course, the passage immediately above serves well (no doubt unintentionally) as an indictment of the apparently conventional notions about what the terms “good science” and “scientific” are supposed to indicate. After all, if “good science” or “scientific” research requires that observers (be conditioned to) either “subjugate” or deny those of their own anecdotal observations which do not comport with large studies, then to say that some investigation is good science, or that it is scientific, is to say nothing more than that it encapsulates the current convention that defines acceptable thinking.

That, of course, is an absolutely preposterous way of judging either what is “good science” or what is “scientific”!

Both the Begley and the Freedman articles stress that large randomized studies are to be preferred, and while that seems intuitively sensible on the face of it, the question which should immediately come to mind is, “Why?”

Is it because having more data makes the data generally more reliable? But, then, what is it that would even make it appropriate to describe data in terms of reliability? Is it a noticeable consistency within the data that indicates their reliability? Some kind of consistency is certainly necessary in order for there to be the patterns which are yet another necessary aspect of science. Data, of course, are results; so, the reproduction of results by other researchers could contribute another level of consistency on top of the pattern consistency derived from previously collected data, and this added layer of consistency could provide an increased sense of data reliability.

Data collection in and of itself, or testing in and of itself, even in the context of a hypothesis under investigation, is not sufficient for the activities undertaken to be called science. There remains the entirely separate matter of how the collected data get interpreted, and this means that the quality of a scientific endeavor is largely – if not exclusively – assessed according to the interpretations and the manner of their expression as conclusions.

However, interpretations are nothing other than descriptions of data after they are analyzed. As a process in itself, analysis can be regarded as reduction to constituents, but, since science – in order to function – depends upon there being patterns, all data eventually need to be analyzed for patterns.

This is to say that in seeking orderliness or patterns, the practice of science relies upon the manipulation of data in order that there be interpretation.

Of course, the word “manipulation” is burdened with a negative connotation, for instance as it gets used in the following passage from the Freedman article:

One of the researchers, a biostatistician named Georgia Salanti, fired up a laptop and projector and started to take the group through a study she and a few colleagues were completing that asked this question: were drug companies manipulating published research to make their drugs look good? Salanti ticked off data that seemed to indicate they were …

After other team members variously object to the conclusions Salanti presents, the lead researcher, Dr. John Ioannidis, puts forth a remark which has considerable implications for all assessments of the quality of any research. Ioannidis says:

Maybe sometimes it’s the questions that are biased, not the answers.

Ioannidis admits that not all bias is overt, but the Freedman article makes it seem as if the sort of bias that Ioannidis has in mind pertains chiefly to “an intellectual conflict of interest that pressures researchers to find whatever it is that is most likely to get them funded.”

However, an even more basic sort of bias is that which results simply from the manner in which a question is asked. After all, the question itself indicates the pattern(s) to be sought. This is not to say that there is anything necessarily or terribly wrong with there being a pre-determined agenda for a research project; rather, the point is that any assessment of the quality of a study must include consideration of, and judgment about, the quality of the question(s) being asked.

For example, in the Freedman article, it is noted that

80 percent of non-randomized studies (by far the most common type) turn out to be wrong, as do 25 percent of supposedly gold-standard randomized trials, and as much as 10 percent of the platinum-standard large randomized trials.

According to the article, Ioannidis attributes the cited error rates to

researchers … frequently manipulating data analyses, chasing career-advancing findings rather than good science, and even using the peer-review process – in which journals ask researchers to help decide which studies to publish – to suppress opposing views.

However, the article also notes that when Ioannidis started investigating how “studies were going wrong”, he

discovered that the range of errors being committed was astonishing: from what questions researchers posed, to how they set up the studies, to which patients they recruited for the studies, to which measurements they took, to how they analyzed the data, to how they presented their results, to how particular studies came to be published in medical journals.

This is to say that much of what Ioannidis found can be described as errors in reasoning – beginning with the quality of the questions being asked, where that quality itself depends on how adequately the researcher(s) identify (and describe) the problem, including the context in which it occurs.

Much ado is made in the Freedman article – as well as in the Begley article – about how very error-prone many of what Begley refers to as “observational studies” have been (these could just as well be – and are probably better – called correlational studies).

What neither Freedman nor Begley (and maybe not even Ioannidis) realizes is that employing the randomized form of investigation cannot result in “good” science (even when large numbers of cases are involved) so long as the question being addressed is itself lacking in quality.

For instance, the Freedman article notes that

widely prescribed antidepressants such as Prozac, Zoloft, and Paxil were revealed to be no more effective than a placebo for most cases of depression.

Is such a conclusion, even if drawn from an indubitably appropriate interpretation of absolutely adequate data, rightly deemed to be “good” science?

If the question which the research sought to answer were, “Are Prozac, Zoloft, and Paxil more effective than a placebo for cases of depression?”, would that question be a “good” enough question to have resulted in “good” science?

There is not really any reason to call it “bad” science. Maybe it would just be “not especially interesting” science, and maybe the science could be made better were the question more along the lines of: “Do we have any reason or evidence for thinking that Prozac, Zoloft, and Paxil are ever effective for cases of depression?”

In that case, however, the conclusion would be more likely stated as: “Prozac, Zoloft, and Paxil were revealed to be more effective than a placebo in only a small percentage of cases of depression; accordingly, until further research is able to differentiate the types of cases in which these drugs provide increased benefit, it is recommended that treatment for cases of depression be initiated with placebo.”

This is to say that, contrary to what is apparently the conventional way of thinking and contrary to the impressions promulgated by the likes of Begley and Freedman, “good” science or “scientific” research is in no way necessarily incompatible with acting on the basis of anecdotes.

When properly done, what “good” science – even if it is correlative science – can do (in biomedicine, for instance) is provide a multiplicity of approaches which, taken together, can make for a potentially more effective regimen for problem solving, and it can also reveal new or better defined and understood problems.
