Why exactly no placebo punishments?

In 2009, the Journal of Criminology [a Dutch journal, ‘Tijdschrift voor Criminologie’, hereafter ‘TvC’] published the article “Recidivism after community service and after prison sentences”. The authors claimed to have demonstrated that offenders reoffend less after community service than after a prison sentence. At the beginning of 2011, the study was the subject of debate in the Dutch Parliament. Mrs Helder (PVV [1]) was brutally attacked in parliament and on social media for rejecting the claims.

André van Delft and the undersigned studied the article in question in detail and concluded that it was flimsy. We decided to take this display of poor scientific work seriously. This is not only research of great social relevance; its claims are also adopted uncritically in media and politics.

It turned out to be quite a job. Our efforts yielded little. However, part of our criticism was eventually published in TvC.

Communication with the journal

We first wrote an extensive (about 7,000 words), scientifically formulated critique, which we submitted to TvC in March 2012. We asked for the article by Wermink et al. to be withdrawn. At the end of May 2012, TvC announced that it would not withdraw the article and would not have our critique peer-reviewed; however, we were invited to write a discussion paper.

We asked whether the journal could then include a link to our full article, which we would post online. Initially, the editors agreed.

In July we sent in our discussion paper, neatly keeping to the maximum of 2,500 words. We also started a website, KeizersenKleren.nl [Emperors & Clothes, now defunct], and posted the complete article on it as its first contribution. In October, TvC replied: the editors asked us to soften or remove the sharpest criticism. At the beginning of December we were told that the version we had modified accordingly would be published and that we would receive the edited text at the end of January 2013. In the meantime, we had been told that there would, after all, be no link to the more extensive criticism…

At the beginning of February 2013 we asked for the text, and on February 21 we received the neatly edited text with the request: “Please respond no later than tomorrow”…
We complied with this request…
On March 28, we discovered that our contribution had appeared in the journal, accompanied by an afterword by the authors under a curious title that does not match the content of the piece: “The better helmsmen (also) row with the oars they have”.

Many words

The first thing we noticed was that the ‘rebuttal’ ran to more than 4,000 words. Roughly the first 1,000 of these are a general reflection on empirical research that does not relate directly to the original study and certainly not to our criticism.

We did not want to get into an argument about copyright and therefore did not include the piece by Wermink et al. on the website. Instead, we produced the text below, ‘Recidivism, community service and imprisonment: a critical discussion’, in which we present our own contribution in a special way and conclude with some brief comments on the piece by Wermink et al.
The special way is the use of different background colours, with the following meaning:

ORANGE:

These parts were either completely ignored in the authors’ response or seriously distorted before being answered.

YELLOW:

These parts of our criticism were addressed, but the response was very poor.

Recidivism, community service and imprisonment: a critical discussion

In the third issue of 2009, the Journal of Criminology published the article ‘Recidivism after community service and after prison sentences’ by Wermink, Blokland, Nieuwbeerta and Tollenaar.

In this contribution we comment on the operational definition of recidivism used, the unusual interpretation of the judicial process and, in particular, on the statistics used.

From the summary of the article we quote:

‘The purpose of this article is to compare the recidivism of those sentenced to community service with the recidivism of adult offenders sentenced to prison in the Netherlands. We use longitudinal, judicial data (…) To take into account possible selection effects, [we apply] “propensity score matching” and “matching by variable”.’

The success or failure of community service as an alternative to imprisonment touches directly on a major political disagreement; in short: should criminals be punished or helped? The research therefore received a lot of media attention.

Recidivism behaviour versus criminal recidivism

There is consensus in the literature on what is meant by recidivism: there is recidivism when someone who has committed a crime or offence later does so again. However, measuring recidivism is not easy. The present study, like many others, relies on data from the judiciary, and thus on criminal recidivism. By definition, no exact data are available on the frequency of reoffending that is never judicially assessed, the dark number (the ‘dark figure’ of crime). However, data on reports and victims make it certain that only a fraction of all criminal acts ultimately end up in court.

On the first page, the authors explicitly link their research to the objective of reducing ‘recidivism’, without further specification: ‘The question we are trying to address in this article is to what extent community service, in view of the recidivism in the following period, is a good alternative to prison sentences.’ The existence of a dark number is therefore not taken into consideration by the researchers.

Implicitly, the researchers apparently assume that whether someone is caught, charged and convicted is a matter of chance. To us, that does not seem a legitimate assumption.

There is another reason why, in our view, leaving out the part of crime that never reaches court undermines the relevance of the study: by limiting itself to criminal recidivism in general and to first offenders in particular, it leaves the deterrent effect of punishment, that is, general prevention, out of the picture.

The judicial process challenged

The researchers rightly conclude that, ideally, experiments with random assignment of community service or imprisonment should be carried out. As things stood, in their words, it was crucial ‘to control for selection processes’.

In a general sense, the authors dwell extensively on this necessity, but they do not devote a word to the conditions and limitations of the methods chosen for that ‘control’. The study used groups that had already been formed by judges. The judges had made an active selection, which has a very direct and very negative impact on the reliability and relevance of the results. They did their best to impose community service and prison sentences on those suspects for whom each kind of punishment was best suited.
Those efforts of the judges have therefore been swept off the table by the researchers. And not just those of the judges, by the way, but also those of the Public Prosecution Service and, to a certain extent, even of the defence. After all, they too actively contribute to this selection, which is destructive for the reliability and relevance of this research. Imagine being asked to evaluate whether judges assign the right type of punishment, community service or detention: the recidivism rate would then naturally count as an important indication of the judges’ success or failure.

The statistical methodology used

Propensity score matching, the method used by the researchers, processes observational group data in such a way that they can be treated as if they were the results of a controlled experiment.

The method has gained traction particularly in research into the use of medicines. The present study concerned criminal behaviour by people convicted of a crime for the first time. The researchers labelled the group sentenced to imprisonment as the control group, as if imprisonment, unlike community service, were not a ‘treatment’ that could affect the later tendency to commit crime.

In criminology, and even in the criticised article, mention is made of the phenomenon that people in prison learn to become ‘better’ criminals. Against that background in particular, presenting imprisonment as if it stood for an ‘untreated’ group is rather strange. Partly because of this, the question of why the judge chose one punishment or the other disappears, by definition, completely from view. Were convicts given their sentence because of positive expectations about the educational effect of community service, negative expectations regarding imprisonment, the desire for retribution, or the consideration that society would in any case be protected from the convicted person’s conduct for some time?

It could be that community service has no effect at all on the likelihood of recidivism, while detention increases it. In an older article by three of the four authors involved, this was even a main conclusion: ‘detention has a criminogenic effect’ (Nieuwbeerta et al., 2007).

Six steps

The methodology used involved six separate steps:

1. choice of variables;

2. establishing a model based on logistic regression;

3. calculation of a propensity: assigning each person a ‘chance’ of being sentenced to community service rather than to prison;

4. the one-to-one matching of pairs of convicts: people with an approximately equal ‘chance’ of getting community service, one of whom received community service, while ‘his match’ actually went to prison;

5. disregarding part of the data;

6. comparison of two groups, one selected from those given community service and one from those given prison sentences, and interpretation of the differences found.
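Steps 1 to 3 can be made concrete with a minimal sketch. It is not the researchers' model: the two variables, the eight invented cases and the helper names `fit_logistic` and `propensity` are assumptions chosen only to show what a fitted logistic model and a propensity score are.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def propensity(weights, features):
    """Step 3: estimated 'chance' of community service for one person."""
    score = weights[0] + sum(w * v for w, v in zip(weights[1:], features))
    return sigmoid(score)

def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Step 2: fit a logistic regression by batch gradient ascent.

    Returns the weights, with the intercept first.
    """
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, labels):
            error = label - propensity(weights, features)
            gradient[0] += error
            for j, value in enumerate(features):
                gradient[j + 1] += error * value
        weights = [w + learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights

# Step 1 (hypothetical variables): [violent offence (0/1), age in decades].
rows = [[0, 2.0], [0, 2.5], [0, 3.0], [1, 2.0],
        [1, 3.5], [1, 4.0], [0, 4.5], [1, 2.5]]
# 1 = community service, 0 = prison (invented labels, not the study's data).
labels = [1, 1, 1, 0, 0, 0, 0, 1]

weights = fit_logistic(rows, labels)
scores = [propensity(weights, row) for row in rows]
for row, score in zip(rows, scores):
    print(row, round(score, 2))
```

Everything that follows in the method rests on these scores. Any reason for the judge's choice that is not among the chosen variables cannot show up in them.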

Steps 1 and 2 are presented incompletely in a table: the parameter values that indicate the relative predictive weight of the variables in the fitted model are not shown. More seriously, there is no indication of how well the resulting model fits. What is shown is the significance of the regression coefficient for each variable separately. To the statistical layman, that may sound like something that also substantiates relevance and reliability, but in reality such significance says nothing without theoretical underpinning.
Since no information is provided about whether the model ‘fits’, all that substantiates the choice of variables is two lines in the article. One of them refers to the authority of Nagin et al. (2009), the other to a piece by Monahan (2006). The piece by Monahan that is referred to simply does not say what the authors claim it says.

Incomplete matching

There are two ways of matching: nearest neighbour matching and matching within a maximum difference in propensity score. Had the first been chosen unambiguously, each individual from the smaller group could have been linked with an individual from the larger group. Instead, a mixture of both methods was chosen: ‘A person from the control group was linked to an individual from the experimental group, when the difference in the estimated probability of community service for both persons did not exceed 0.05.’ This resulted in persons being dropped on both sides of the comparison.

For 39 percent of the people in the control group – because of that maximum – no match could be found. Not surprising: some offenses will rarely, if ever, lead to community service, others almost always.
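How such a maximum difference removes people can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Wermink et al.: the greedy order, the function name `caliper_match` and the eight invented scores are ours; only the maximum of 0.05 comes from the quotation above.

```python
def caliper_match(community, prison, caliper=0.05):
    """Greedy one-to-one matching on propensity score with a maximum difference.

    Both arguments map a person id to an estimated probability of community
    service. Returns (pairs, unmatched community ids, unmatched prison ids).
    """
    available = dict(prison)
    pairs, unmatched_community = [], []
    for cs_id, cs_score in sorted(community.items(), key=lambda item: item[1]):
        # Only prison cases within the caliper qualify as a match.
        candidates = [
            (abs(cs_score - p_score), p_id)
            for p_id, p_score in available.items()
            if abs(cs_score - p_score) <= caliper
        ]
        if candidates:
            _, best_id = min(candidates)   # nearest neighbour among those
            pairs.append((cs_id, best_id))
            del available[best_id]         # each prison case is used once
        else:
            unmatched_community.append(cs_id)
    return pairs, unmatched_community, sorted(available)

# Invented scores: two cases at either extreme, four in the middle.
community = {"cs1": 0.92, "cs2": 0.60, "cs3": 0.55, "cs4": 0.97}
prison = {"p1": 0.58, "p2": 0.10, "p3": 0.54, "p4": 0.15}

pairs, dropped_community, dropped_prison = caliper_match(community, prison)
print(pairs)
print(dropped_community, dropped_prison)
```

In this toy example only the middle of the range survives: the cases that almost certainly lead to community service and those that almost never do find no partner and drop out of the comparison.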

The nature of the offence is a very obvious predictor. The group of 39 percent was, however, not removed on the basis of this variable, but on the basis of the constructed propensity score, of which the type of crime is only one building block.
This brings us to another problematic aspect of the design of the analysis. Nearly 73 percent of those sentenced to community service and 39 percent of those sentenced to prison were left out of the comparisons because they did not ‘fit’, their difference in propensity score being too large. Yet they were taken into account when the model for that same propensity score was set up.

The actual assessment
The method used to establish the significance of the numerical results is not explained; the article only indicates at length how ‘significant’ the differences found are.
The article by Rosenbaum and Rubin, to which the authors refer extensively, already stated in its title that it was about sampling methods. In the study by Wermink et al., however, there was no sampling: data from the complete populations of convicts were available.

When, in a sense, the entire population has been studied, the use of a statistical significance test is strange and unnecessary. Instead of the statistical concept, we can then simply talk about everyday significance: being ‘of significance’. In statistics we mainly speak of significance to indicate to what extent results found on the basis of samples may be considered to hold for the entire population. Whether results for an entire population are ‘significant’ is not something you establish statistically.
On the basis of readily available data, one could simply have made a statement about the number of offences that, according to the authors, may not have been committed thanks to the fact that, of the 11,308 people convicted in 1997, a selection of 2,123 were not given the ‘toughest possible judicial response to [the] most serious offences’ (this particular way of describing imprisonment is taken from the aforementioned 2007 article by three of the four authors involved). The claim ‘50% fewer convictions over a period of eight years’ would then have become ‘preventing more than one offence per day’.
Apart from the inadequate substantiation of the claim, the impact here is not really ‘of significance’ compared with the actual number of crimes committed, which runs into millions per year.
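The order of magnitude can be checked with a back-of-the-envelope calculation. The eight years and the selection of 2,123 come from the text above; ignoring leap days, and taking one million crimes per year as the low end of ‘millions per year’, are our assumptions for the sake of the sketch.

```python
YEARS = 8                   # follow-up period named in the claim
SELECTION = 2123            # people in the selection mentioned above
DAYS = YEARS * 365          # leap days ignored for simplicity

def offences_per_day(prevented_total, days=DAYS):
    """Average number of prevented offences per day over the whole period."""
    return prevented_total / days

# 'More than one offence per day' means more than this many over eight years.
threshold_total = DAYS
# The same threshold expressed per person in the selection.
per_person = threshold_total / SELECTION

# Assumption: one million crimes per year as the low end of 'millions per year'.
ASSUMED_CRIMES_PER_YEAR = 1_000_000
share_of_yearly_crime = 365 / ASSUMED_CRIMES_PER_YEAR

print(threshold_total, round(per_person, 2), f"{share_of_yearly_crime:.4%}")
```

Under these assumptions, one prevented offence per day amounts to 2,920 offences in eight years, about 1.38 per person in the selection, and to less than 0.04 percent of the assumed yearly total.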

Correlation versus causation

The researchers claim to have shown a correlation between the type of punishment after an initial conviction and the number of times the same person was later arrested and convicted again. Correlation does not show causation. Data can correlate because there is a common causal factor behind both. It is not difficult to name candidates for this. Think of intelligence, especially social intelligence, and of the extent to which someone is ‘incorrigible’. Someone who, even in court, is unable to hold back in his statements, and is therefore sentenced to prison rather than community service, will also score lower on the ability to avoid being arrested, charged and convicted for new offences.
Convicts who are able to give the judge the impression that they can and want to better their lives, and are therefore given community service, will be convicted less often of new offences, for one of two reasons. Either the impression was right and no new offences were committed, or the cunning that helped them deceive the judge also works to their advantage when it comes to the likelihood of being arrested, charged and convicted for new offences.
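A small simulation shows how such a hidden common factor alone can produce the reported pattern. All numbers are invented: a single trait (`skill`, a hypothetical stand-in for the social intelligence mentioned above) raises the chance of community service and lowers the chance of reconviction, while the sentence itself has no effect at all.

```python
import random

def simulate(n=20000, seed=1):
    """Reconviction rate per sentence when the sentence has no causal effect."""
    rng = random.Random(seed)
    counts = {"community": [0, 0], "prison": [0, 0]}  # [people, reconvicted]
    for _ in range(n):
        skill = rng.random()  # hidden trait, never seen by the analyst
        # The trait makes community service more likely ...
        sentence = "community" if rng.random() < skill else "prison"
        # ... and reconviction less likely, regardless of the sentence.
        reconvicted = rng.random() < 0.6 - 0.4 * skill
        counts[sentence][0] += 1
        counts[sentence][1] += reconvicted
    return {name: hits / people for name, (people, hits) in counts.items()}

rates = simulate()
print(rates)
```

In this sketch the community service group is reconvicted clearly less often than the prison group, although by construction the punishment changes nothing. Matching on a propensity score does not remove the difference if the trait is not among the variables in the model.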

Science versus policy advice
Since the conclusion of the article begins with a comment about the theory, it seems that the angle was primarily academic and not focused on policy evaluation or advice. What has been achieved in that respect is both meagre and unclear. The conclusion is merely that the research results ‘question the theory of deterrence’. But was that really what was tested? We note that what was tested was, at best, a system of hypotheses.
Was it then a matter of policy evaluation or policy advice? The researchers conclude about this:
‘Our results are also relevant for policymakers (…) our study [shows] that offenders reoffend less after community service than after a prison sentence, which results in additional cost savings in terms of preventing crime damage.’
Because the results are called ‘also’ relevant for policymakers, it is once again suggested that the angle was mainly an academic one. But then again: which policymakers are concerned? Isn’t this actually referring to the judges, who are urged here to opt for community service more often across the board? That would be in line with the fact that the very first line of the article is about the judges and a little further on the ‘central question’ is formulated as:
‘To what extent [are] community service orders a good alternative to prison sentences in relation to the recidivism of the punished after their sentence?’
Given that significantly more community service orders than prison sentences were already being imposed in 1997, and that the proportion of community service has grown even further since then, this is a remarkable angle. The judges, after all, seem to be fully aware that community service is a good alternative. From this study the judges do not get the slightest indication of when to impose community service and when imprisonment: they simply have to choose community service (even) more often.
And what do policymakers gain from this research?
On the basis of this research, policymakers have no way of determining whether the claimed lower recidivism says something positive about community service or something negative about prison sentences. Other forms of punishment, such as house arrest, are not taken into account. That punishment has the advantage of a deterrent effect without the disadvantage of imprisonment as a school for crime.

Research in this area should not only look at the direct effect of the punishment on those punished; a range of other influences will also have to be considered. One possibility is that reintegrating prisoners is so difficult that it pushes them into further crime (because of the label and the associated disadvantage, loss of work and girlfriend, the acquired ‘training’ to become a ‘better’ criminal, wanting to maintain their reputation for toughness). The possibility that the probation service fails, or that conditions in prison have little or no deterrent effect, should also be taken into account as a possible explanation.

Comments

Thanks to the use of colour, it can be seen almost at a glance that the authors have not responded well to our criticism. It is even worse than it looks. The non-coloured parts are almost all neutral, descriptive passages. Moreover, the more important and critical a passage was, the less and the worse it was dealt with.

I would like to highlight three of the most critical points and point out two rather spicy details.

In the extensive piece we had included the following statement about the so-called Propensity Score Matching, from Thomas Love, a man with a great track record in statistics:

But if our propensity model misses an important reason why subjects are selected to treatment or control, we’ll be in trouble.

The authors use a lot of words not to talk about this pitfall.

It ties in directly with the false reference to Monahan’s piece. In the discussion paper this criticism was watered down at the editors’ insistence. The more critical formulation concluded as follows:

The sentence ‘These characteristics are also known to be of interest to judges when they make decisions with regard to the trial of perpetrators (Monahan, 2006)’ is therefore decidedly not supported by Monahan, 2006. Our hope that only an unfortunate error had occurred here, for example that another text by Monahan should have been referred to, was dashed by the observation that REC-EN contains no reference to Monahan at all. Apparently, correcting this ‘error’ in that way was not an option.

The rebuttal offered for ignoring Love’s warning contains a spicy reference to an “English article that appeared about this study”. Spicy for two reasons. First of all, that article has largely the same content as the offending document, and it was written by its four authors plus one more: D. Nagin. That is the same person who is cited as an authoritative source for the choice of those parameters, alongside the wrongly cited Monahan. Even more spicy: that English article contains no reference to the Dutch one!

The second extremely spicy aspect is the response to our criticism that the ‘efforts of the judges have therefore been swept off the table by the researchers’. The authors do not say it in so many words, but in fact their reaction amounts to an even more emphatic denigration of the role of the judges. Noteworthy: the same theme returned in another piece by Wermink and other authors, which appeared on the website Keizers en Kleren as the second non-peer review:

Moreover, we believe that judges themselves do not know exactly how they arrive at a certain sentence. Within the very wide discretionary power that the court has (for the time being) numerous legal but also psychological factors can play a role of which the decision maker is not aware.

The third and final critical point is the most embarrassing for Nieuwbeerta et al.

We cannot avoid quoting a piece from the ‘rebuttal’. It’s about the phenomenon ‘placebo punishments’.

The ideal way to determine the effect of community service would, as with a new drug, be a fully randomized experiment. Several aspects of criminal law practice, however, complicate the set-up and execution of such an experiment. We will mention three of those aspects here. Firstly, within criminal law, the requirement applies that the sentence must be an adequate response to the offence committed by the accused. This principle is at odds with the random assignment of a certain punishment. In criminal law practice, judges will always try to impose the ‘most appropriate’ punishment, which can lead to structural differences between offender groups that receive different sentences. Secondly, the nature of criminal law precludes leaving suspects unaffected who, in the opinion of the judge, deserve punishment: there can therefore be no question of a ‘placebo’ sentence within criminal law. Finally, research into the effects of punishment cannot escape a problem that is inherent to criminological science as a whole: when determining the effect of punishment on the criminal behaviour of the punished, there is usually no way of directly measuring this criminal behaviour.

What is stated is not nonsensical.

What is shameful is what it lacks: the observation that a placebo penalty is logically absolutely impossible. One cannot impose a punishment that is not really a punishment but only looks like a punishment to the convicted person.

It is even more painful that last December, in our response on Emperors and Clothes to Professor Nieuwbeerta’s inaugural lecture, we had already written about this kind of “punishment that is secretly not a punishment”.

Literature

Monahan, J. (2006). Structured violence risk assessment. In: R. Simon & K. Tardiff (eds.). American psychiatric publishing textbook on violence assessment and management. Washington, DC: American Psychiatric Publishing.

Nagin, D.S., Cullen, F.T. & Jonson, C.L. (2009). Imprisonment and reoffending. Crime and Justice, 115-200.

Nieuwbeerta, P., Nagin, D.S. & Blokland, A.J. (2007). Het meten van effecten van gevangenisstraf op crimineel gedrag in een niet-experimentele studie. Mens & Maatschappij, 82, 272-299.

Rosenbaum, P. & Rubin, D. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.

Wermink, H., Blokland, A., Nieuwbeerta, P. & Tollenaar, N. (2009). Recidive na werkstraffen en na gevangenisstraffen: een gematchte vergelijking. Tijdschrift voor Criminologie, 51(3), 211-227.

NOTES:

1) PVV was then and still is the main opposition party in Dutch parliament

The PDF version of the original article by Wermink et al. was published on the website ‘Reclassering.nl’ (the Dutch probation service). It can no longer be found there, nor on the site of the journal that originally published it.

P.S.: In the bibliography of the article by Wermink et al., the reference contains, in addition to the erroneous year, another typo: they write in the title ‘on’ instead of ‘and’.