Prison Accountability and Performance Measures, Volokh Emory L. J. 2014
PRISON ACCOUNTABILITY AND PERFORMANCE MEASURES Alexander Volokh* A few decades of comparative studies of public vs. private performance have failed to give a strong edge to either sector in terms of quality. That supposed market incentives haven’t delivered spectacular results is unsurprising, since by and large market incentives haven’t been allowed to work: outcomes are rarely measured and are even more rarely made the basis of compensation, and prison providers are rarely given substantial flexibility to experiment with alternative models. This Article argues that performance measures should be implemented more widely in evaluating prisons. Implementing performance measures would advance our knowledge of which sector does a better job, facilitate a regime of competitive neutrality between the public and private sectors, promote greater clarity about the goals of prisons, and, perhaps most importantly, allow the use of performance-based contracts. Performance measures and performance-based contracts have their critiques, for instance: (1) the theoretical impossibility of knowing the proper prices, (2) the ways they would change the composition of the industry, for instance by reducing public-interestedness or discouraging risk-averse providers, and (3) potentially undesirable strategic behavior that would result, for instance manipulation in the choice of goals, distortion of effort away from hard-to-measure dimensions or away from hard-to-serve inmates, or outright falsification of the numbers. I argue that these concerns are serious but aren’t so serious as to preclude substantial further experimentation. * Assistant Professor, Emory Law School, avolokh@emory.edu. I am grateful to Michael J. Broyde, Russell C. Gabriel, Leonard Gilroy, Linda Hardyman, Erica J. Hashimoto, Christina Mulligan, Carl Nink, Usha Rodrigues, Sarah M. Shalf, and the participants at the Emory/UGA joint faculty colloquium for their input and assistance. 
I am also grateful to Kedar Bhatia and Julia Hueckel for their able research assistance, and to the law librarians at Emory Law School. [*Need to insert Thrower acknowledgment.] forthcoming EMORY L.J. (2014) Electronic copy available at: http://ssrn.com/abstract=2336155

TABLE OF CONTENTS
I. Introduction
II. The Failure of Comparative Effectiveness Studies
  A. Which Sector Costs Less?
    1. Difficulties in Calculating Costs
    2. Competing Cost Estimates
  B. Which Sector Provides Higher Quality?
    1. Difficulties in Figuring Out Quality
    2. Which Sector Leads to Less Recidivism?
  C. The Limits of Comparative Effectiveness
III. Why Use Performance Measures?
  A. The Puzzle of Prisons?
  B. Accountability, Flexibility, and Neutrality
    1. To Know What Works
    2. To Implement Competitive Neutrality
    3. To Express What We Want
  C. For Performance-Based Contracting
    1. Limited Current Efforts
    2. The Range of Possible Contracts
    3. The Feasibility of Merit Pay in the Public Sector
  D. What Measures to Choose
IV. Concerns and Critiques
  A. What Prices to Set
  B. Effects on Market Structure
    1. Public-Interestedness
    2. Risk and Capital Requirements
  C. Undesirable Strategic Behavior
    1. Manipulating the Goals
    2. Distortion Across Dimensions of Performance
    3. Distortion Across Types of Inmates
    4. Falsifying Performance Measures
V. Conclusion

Draft—Please do not circulate

Here arises a feature of the Circumlocution Office, not previously mentioned in the present record. When that admirable Department got into trouble, and was, by some infuriated members of Parliament . . . attacked on the merits . . . as an Institution wholly abominable and Bedlamite; then the noble or right honourable [member] who represented it in the House, would smite that member and cleave him asunder, with a statement of the quantity of business (for the prevention of business) done by the Circumlocution Office. Then would that noble or right honourable [member] hold in his hand a paper containing a few figures, to which, with the permission of the House, he would entreat its attention. . . . Then would the noble or right honourable [member] perceive, sir, from this little document, which he thought might carry conviction even to the perversest mind . . . , that within the short compass of the last financial half-year, this much-maligned Department . . . had written and received fifteen thousand letters . . . , had written twenty-four thousand minutes . . .
, and thirty-two thousand five hundred and seventeen memoranda . . . . [T]he sheets of foolscap paper it had devoted to the public service would pave the footways on both sides of Oxford Street from end to end, and leave nearly a quarter of a mile to spare for the park . . . ; while of tape—red tape—it had used enough to stretch, in graceful festoons, from Hyde Park Corner to the General Post Office. . . . No one . . . would [then] have the hardihood to hint that the more the Circumlocution Office did, the less was done, and that the greatest blessing it could confer on an unhappy public would be to do nothing. — Charles Dickens, Little Dorrit1 The results obtained from ENRD’s civil and criminal cases in fiscal year 2012 alone were outstanding. We secured over $397 million in civil and stipulated penalties, cost recoveries, natural resource damages, and other civil 1 CHARLES DICKENS, LITTLE DORRIT, bk. 2, ch. 8, at 489–90 (Wordsworth Classics 1996) [1855–57]. Draft—Please do not circulate Electronic copy available at: http://ssrn.com/abstract=2336155 4 VOLOKH monetary relief, including almost $133 million recovered for the Superfund. We obtained over $6.9 billion in corrective measures through court orders and settlements, which will go a long way toward protecting our air, water and other natural resources. We concluded 47 criminal cases against 83 defendants, obtaining nearly 21 years in confinement and over $38 million in criminal fines, restitution, community service funds and special assessments. — DOJ’s Environment & Natural Resources Division Annual Report, 20122 I. INTRODUCTION “Isn’t everything to be said on [private prisons] already in print?” asks Sharon Dolovich.3 She means the question to be merely rhetorical; and so do I.4 The comparative effectiveness debate, to the extent it’s relevant5—and I think it is6—has stalled, simply because the empirical literature, exhaustive as it is, is so bad. 
“The current weight of the evidence on prison privatization in the United States is so light that it defies interpretation,” write prison researcher Gerald Gaes and his coauthors.7 (The theory isn’t much better: the same authors characterize prison performance as a “theoretically bereft domain.”8) To intelligently choose between public and private provision, we should at least know which sector costs 2 U.S. DEP’T OF JUST., ENV’T & NAT. RES. DIV., ENRD ACCOMPLISHMENTS REPORT, FISCAL YEAR 2012, at 4–5 (2013). 3 Sharon Dolovich, How Privatization Thinks, in GOVERNMENT BY CONTRACT: OUTSOURCING AND AMERICAN DEMOCRACY 128, 129 (Jody Freeman & Martha Minow eds., 2009). 4 Not that her perspective is the same as mine, but we both agree that there’s still something left to say on the subject. 5 Dolovich herself is wary of premature engagement with the comparative effectiveness debate without having sorted through the necessary normative issues beforehand. See id. at 128–29; Sharon Dolovich, State Punishment and Private Prisons, 55 DUKE L.J. 437, 447 n.20 (2005). 6 See Alexander Volokh, Privatization and the Elusive Employee-Contractor Distinction, 46 UC DAVIS L. REV. 133 (2012). 7 GERALD G. GAES ET AL., MEASURING PRISON PERFORMANCE: GOVERNMENT PRIVATIZATION AND ACCOUNTABILITY 184 (2004). 8 Id. at 123. 
Draft—Please do not circulate 2013] PERFORMANCE MEASURES 5 less, but we don’t; and we should at least know which sector provides higher quality, but we don’t have a great sense of that either.9 This seems puzzling: readers of the voluminous debate on private prisons can be forgiven for thinking that market incentives should make private prison firms either (1) cut wasteful expenditures and produce innovative services10 or (2) cut corners on essential inmate care and security and lead to a humanitarian disaster.11 Let’s focus on the positive claims for private prisons: if the private sector is so clearly superior, shouldn’t the difference hit us between the eyes?12 On second thought, this isn’t so puzzling after all. The advantages of market provision are often said to be that, what with the rigidities and low-incentive structure of government agencies, private firms have greater incentive and greater flexibility to figure out how to achieve any desired level of quality. But this assumes that (1) particular levels of quality are desired or encouraged, and (2) private firms are given the flexibility to achieve these levels. It turns out that both of these assumptions are wrong. Let’s take the quality problem first. Why not tally up the quality at a public prison, do the same at a comparable private prison, and compare the two quality measures? The trouble here is that— despite the scores of studies that have been produced purporting to measure quality differences—good performance measures are rarely used. As I document in Part II, this means that comparative quality studies are hard to interpret if one wants to know which sector is better. (This hasn’t prevented both partisans and detractors of private prisons from producing loosely reasoned pieces that oversell the findings of their favorite studies.) 9 These aren’t the only things we should know. 
For instance, we can also care about where accountability is greater, which sector might be more likely to push the substantive criminal law in a more pro-incarceration direction, and the like. See, e.g., Developments in the Law—The Law of Prisons, 115 HARV. L. REV. 1838, 1868–91 (2002) (my student note); Alexander Volokh, Privatization and the Law and Economics of Political Advocacy, 60 STAN. L. REV. 1197 (2008). 10 See, e.g., GEOFFREY F. SEGAL & ADRIAN T. MOORE, WEIGHING THE WATCHMEN: EVALUATING THE COSTS AND BENEFITS OF OUTSOURCING CORRECTIONAL SERVICES, PART II: REVIEWING THE LITERATURE ON COST AND QUALITY COMPARISONS (Reason Pub. Pol’y Inst., Pol’y Study No. 290, Jan. 2002); Samuel Jan Brakel & Kimberly Ingersoll Gaylord, Prison Privatization and Public Policy, in CHANGING THE GUARD: PRIVATE PRISONS AND THE CONTROL OF CRIME 125, 134–43 (Alexander Tabarrok ed., 2003). 11 See, e.g., Dolovich, supra note 5, at 474–80. 12 See, e.g., Philippe C. Schmitter, The “Organizational Development” of International Organizations, 25 INT’L ORG. 917, 932 (1971) (“interocular impact test”). Draft—Please do not circulate 6 VOLOKH It doesn’t have to be that way. Criminologists have produced no shortage of performance measures that are appropriate for evaluating prisons, using variables like in-prison violence, the quality of prison health care, the degree of crowding, and—which I think is immensely important—recidivism.13 The most important thing about a performance measure is that it measure performance, that is, outcomes. Inputs like money spent, guards hired, or programs offered are of quite limited value, since the whole point is to see whether the money spent is worthwhile, whether the guards hired are necessary, and whether the programs are effective. 
Outputs like the number of doctor visits or the number of graduates of rehabilitative programs—like the number of memos written by Dickens’s Circumlocution Office14 or the number of years of prison resulting from DOJ prosecutions15—are also of limited value. Doctor visits might just be make-work; the rehabilitative programs may not actually be rehabilitative. (The Circumlocution Office, whose function is to prevent things from being done,16 has a zero or negative contribution to performance; and the prosecutions that maximize prison time aren’t necessarily the same as those that most improve the environment.) What we care about—prisoner health, decent conditions, actual rehabilitation—are the outcomes that we should actually measure, to the extent possible.17

Why should we use performance measures? There are several reasons, which I canvass in Part III. First, it’s good just to know whether the public or private sector has higher quality, for instance in evaluating whether one’s state should outsource or insource a particular project, or be one of the 19 states that ban private prisons altogether.18 Naturally, many factors determine performance other than the quality of the management and the facilities: for instance, a prison can have better performance numbers because it was sent a better crop of people. But certainly having performance measures is better than useless.

Second, it would help to implement a regime of competitive neutrality, where the public and private sectors could bid against each other and individual projects could shuffle from one sector to another. Competitive neutrality might be better than an all-public or all-private regime, but to implement it properly, the auctions should be evenhanded, which means that proposed costs and proposed quality targets should be fairly comparable. Performance measures would allow a winning contractor to commit to deliver a particular level of performance, and would allow governments to levy the appropriate contractual fine if this level isn’t achieved.

Third, it would help policymakers express what’s desirable in prisons. One would think that this had been done already; but prison contracts are written in input and output terms because this is largely how the industry works and thinks. Performance measures have been a byproduct of the debate over prison privatization: the different sides in the debate needed them to argue in favor of or against privatization; and the development of these measures has in turn spurred serious thinking about what prisons should accomplish, which has had accountability benefits for the public sector as well.

13 I first (briefly) advocated performance measures for prison accountability in my student note. See Developments, supra note 9, at 1887–88.
14 See text accompanying supra note 1.
15 See text accompanying supra note 2.
16 DICKENS, supra note 1, bk. 1, ch. 10, at 101–18.
17 See also BERYL A. RADIN, CHALLENGING THE PERFORMANCE MOVEMENT: ACCOUNTABILITY, COMPLEXITY, AND DEMOCRATIC VALUES 15–16 (2006) (defining “input,” “output,” “outcome,” and other terms).
18 See E. ANN CARSON & WILLIAM J. SABOL, PRISONERS IN 2011, at 32 appx. tbl. 15 (U.S. Dep’t of Just., Bur. of Just. Stats., Dec. 2012) (listing Arkansas, Delaware, Illinois, Iowa, Maine, Massachusetts, Michigan, Minnesota, Missouri, Nebraska, Nevada, New Hampshire, New York, North Dakota, Oregon, Rhode Island, Utah, Washington, and West Virginia as states with no inmates in private prisons in 2011).
Perhaps most importantly, the use of performance measures would allow the spread of performance-based contracting, where— instead of levying a fine for not delivering a particular level of performance—the contract fee varies continuously with the level of performance delivered. Once accountability is tied to actual performance, giving prison providers the flexibility to choose how to do their job becomes more attractive. Part IV discusses critiques of using performance measures as part of a compensation scheme. One concern is that the true social benefits of various aspects of performance are unknowable, either in principle or in practice, so that determining the proper prices will inevitably fail. Where a service is closely bound up with justice concerns, a focus on efficiency pricing may be inappropriate: it might demean the service or give insufficient weight to non-efficiency goals. A second problem is that the use of performance measures will alter the composition of providers in the industry, in ways that are Draft—Please do not circulate 8 VOLOKH perhaps undesirable. One way this might happen is that, in the presence of monetary incentives, public-interested people may be less attracted to corrections. A different way performance measures can alter the composition of the industry is by increasing risk for providers. Providers can only control inputs, and the connection between inputs and outcomes is highly variable, because it depends on a great many variables, many of which are beyond the prison’s control—such as general social conditions or the underlying quality of the inmates. The relationship between any of these variables and outcomes is not very well known. One might care about the fairness of rewarding or penalizing providers based on factors beyond their control, though in an auction system, such windfalls will be canceled out by competitive bidding. 
More seriously, the riskiness might bias the set of available providers in favor of the largest and best-capitalized firms, and perhaps discourage experimentation with risky but promising techniques. This means that the sensitivity of price to outcomes might have to be limited, which might also limit the incentive effects. A third problem is that providers may engage in undesirable strategic behavior. They might manipulate the performance goals so they reflect goals that are easy to meet. They might focus their effort on the measurable dimensions of performance and slight the unmeasurable ones. (For example, what are the true outcomes of the justice system? Some outcomes, like case backlogs, are measurable, but other important outcomes, like accuracy of adjudication, aren’t—and measuring one runs the risk of distorting the agency’s effort away from the unmeasured outcomes.19) Similarly, providers will want to choose the easiest-to-treat populations (“creaming” or “cherry-picking”), and (given a population) fail to treat the hardest-to-treat members (“parking”). And, of course, any system based on particular numbers comes with the risk that someone might try to falsify the numbers. The good news is that, for prisons, there’s hope that these concerns can be fairly addressed. At the very least, these concerns don’t seem so serious as to preclude far more experimentation than 19 One might think that the reversal rate is a measure of accuracy of adjudication. But this isn’t true, because (1) the cases selected for appeal aren’t random (in the absence of some special process to verify accuracy), and (2) given deferential standards of review, judges can work to insulate their decisions from appellate review if they’re so inclined—for instance, by making them more intensely fact-based. Draft—Please do not circulate 2013] PERFORMANCE MEASURES 9 has been happening so far. 
We actually have access to reasonably good performance measures that reasonably cover the important dimensions of prison quality, none of which have to be limited to efficiency-based measures. These measures should be set by corrections departments, not by contractors. Riskiness can be addressed, at least in part, by only making part of the payment depend on performance. Social Impact Bonds have some promise in encouraging nonprofit-sector financing; in any event, the prison market is already highly concentrated, so there is no current vast population of nonprofits and small companies to lose. Cherry-picking can be addressed by giving contractors no say in what inmates they’re given, and parking can be addressed, at least in part, by making monetary rewards depend on observable characteristics of the inmate (if, indeed, it’s a problem at all). Outright falsification of performance measures is a serious problem, which requires seriously investing in monitoring and ensuring robust disclosure regimes. None of these are perfect fixes, but we don’t need perfection; we just need an improvement over the status quo.

II. THE FAILURE OF COMPARATIVE EFFECTIVENESS STUDIES

It’s somewhat surprising that, for all the ink spilled on private prisons over the last thirty years, we have precious little good information on what are surely the most important questions: when it comes to cost or quality, are private prisons better or worse than public prisons? It’s safe to say that, so far at least, the political process hasn’t encouraged rigorous comparative evaluations of public and private prisons. Some states allow privatization without requiring cost and quality evaluations at all.20 The 19 states that don’t privatize at all21 might, for all I know, be right to do so, but of course their stance doesn’t promote comparative evaluation. When studies are done, they’re usually so inadequate from a methodological perspective that we can’t reach any firm comparative conclusions. Section A below discusses the problems with cost
Section A below discusses the problems with cost 20 See Developments, supra note 9, at 1873–74; Alexis M. Durham III, Evaluating Privatized Correctional Institutions: Obstacles to Effective Assessment, FED. PROBATION, June 1988, at 65, 67. 21 See supra note 18 and accompanying text. Draft—Please do not circulate 10 VOLOKH comparison studies, and Section B discusses the problems with quality comparison studies. Section C takes a broader view and notes that even well-done comparative effectiveness studies don’t answer all our questions. A. Which Sector Costs Less? 1. Difficulties in Calculating Costs How do we determine whether the private sector costs more or less than the public sector? Ideally, we could work off of a large database of public and private prisons and run a regression in which we controlled for jurisdiction, demographic factors, size, and the like. In practice, this large database doesn’t exist, and so the typical study chooses a small set of public and private prisons that are supposedly comparable. 
Unfortunately, this comparability tends to be elusive; the public and private facilities compared often “differ in ways that confound comparison of costs.”22 Sometimes no comparable facilities exist.23 Even where there are two prisons in the jurisdiction housing inmates of the same sex and security classification, they generally differ in size, age, level of crowding, inmate age mix, inmate health mix, and facility design.24 In particular, adjusting facilities to take into account different numbers of inmates is problematic, since facilities with more inmates, other things equal, benefit from economies of scale.25

The GAO explained recently that “[i]t is not currently feasible to conduct a methodologically sound cost comparison of BOP and private low and minimum security facilities because these facilities differ in several characteristics and BOP does not collect comparable data to determine the impact of these differences on cost.”26 The data problem mostly comes from the private side: information collected by BOP from private facilities isn’t necessarily reported the same way that public data is reported, and the reliability of the data is uncertain.27 Moreover, “[w]hile private contractors told us that they maintain some data for their records, these officials said that the data are not readily available or in a format that would enable a methodologically sound cost comparison at this time.”28 Not only do federal regulations not require that this data be collected,29 but also, and more troublingly, at the time of the GAO study in 2007, the BOP didn’t believe there was value in developing the data collection methods that would make valid public-private cost comparison methods possible.30

Probably more seriously, public and private prisons have accounting procedures that “make the very identification of comparable costs difficult.”31 First, public systems, unlike private ones, don’t spread the costs of capital assets over the life of the assets, which overstates public costs when the assets are acquired and understates them in all other years.32 Second, various public expenditures, including employee benefits and medical, utilities, legal work, insurance, supplies and equipment, and various contracted services, are often borne by various other agencies in government, which might understate public costs by 30–40%.33 One of the often-ignored costs in the public sector is the cost of borrowing capital.34 Conversely, governments bear some of the costs of private firms, for instance, in various cases, contract monitoring, inspection and licensing, personnel training, inmate transportation, case management, and emergency response teams.35 And third, when public or private prisons incur overhead expenditures, there’s no obvious way of allocating overhead to particular facilities—Gerald Gaes gives a specific numerical example involving Oklahoma, a high-privatization state, where a difference in overhead accounting can alter the estimate of the cost of privatization by 7.4%.36

As a bottom-line matter, McDonald says “the uncounted costs of public operation are probably larger than of private operation”;37 I tend to agree, but it’s hard to say for sure.

2. Competing Cost Estimates

The best way to see the importance of various assumptions is to look at a handful of cases where different people tried to estimate the same cost. Without committing myself to which way is correct, I’ll provide three examples: from Texas in 1987, from Florida in the late 1990s, and from the federal Taft facility in 1999–2002.

a. Texas

In Texas, private prisons were authorized in 1987 with the passage of SB 251,38 which required that private prisons show a 10% savings to the state compared to public prisons.39 Calculating the per-diem cost of public incarceration in Texas thus became important, since the maximum contract price for private providers would be 90% of that cost.

The Texas Department of Corrections40 came up with an estimate of $27.62.41 The Legislative Budget Board, however, proposed a number of additions to this cost, to better take into account the costs of complying with Ruiz v. Estelle,42 building costs, the state’s cost to provide additional programs that private firms would be required to provide, and the like.43 All these adjustments raised the estimated per-diem cost by about 50%—to $41.67.44 In the end, contracts were awarded within a range of $28.72 to $33.80—between the two estimates, though closer to the first one.45

22 DOUGLAS MCDONALD ET AL., PRIVATE PRISONS IN THE UNITED STATES: AN ASSESSMENT OF CURRENT PRACTICE 33 (Abt Assocs. Inc., 1998).
23 Id. at 45 (making this claim about the Arizona facilities compared in CHARLES W. THOMAS, COMPARING THE COST AND PERFORMANCE OF PUBLIC AND PRIVATE PRISONS IN ARIZONA (1997)); see also SCOTT D. CAMP & GERALD G. GAES, PRIVATE PRISONS IN THE UNITED STATES, 1999: AN ASSESSMENT OF GROWTH, PERFORMANCE, CUSTODY STANDARDS, AND TRAINING REQUIREMENTS 15 (Fed. Bur. of Prisons 2000).
24 Id. at 34–35; see also Robert B. Levinson, Okeechobee: An Evaluation of Privatization in Corrections, PRISON J., Oct. 1985, at 77.
25 Gerry Gaes, Cost, Performance Studies Look at Prison Privatization, NIJ JOURNAL, Mar. 2008, at 32, 34; Douglas C. McDonald, The Costs of Operating Public and Private Correctional Facilities, in PRIVATE PRISONS AND THE PUBLIC INTEREST 86, 101 (Douglas C. McDonald ed., 1990).
26 COST OF PRISONS: BUREAU OF PRISONS NEEDS BETTER DATA TO ASSESS ALTERNATIVES FOR ACQUIRING LOW AND MINIMUM SECURITY FACILITIES 4 (Gov’t Accountability Off., 2007).
27 Id. at 12–13.
28 Id. at 5, 11–12.
29 Id. at 13.
30 Id. at 7, 19, 30. The BOP’s view seems to have been chiefly based on the fact that it used private contractors to run facilities for criminal aliens and wasn’t expecting to receive funding to run its own. Id. at 7, 19, 30. The BOP also believed that the Taft cost study, see text accompanying infra notes 55–58, was already a sufficient cost study. COST OF PRISONS, supra note 26, at 7, 19, 21, 30.
31 MCDONALD ET AL., supra note 22, at 33; McDonald, supra note 25, at 88–89, 97–100.
32 MCDONALD ET AL., supra note 22, at 35.
33 Id. at 36.
34 See McDonald, supra note 25, at 106.
35 MCDONALD ET AL., supra note 22, at 36–37.
36 Gerald G. Gaes, The Current Status of Prison Privatization Research on American Prisons, at 17–18 (Aug. 2010); id. at 17 (“Other complications arise from the appropriate treatment of property, sales, or income taxes paid by private contractors, as well as profits from inmate phone calls and commissary accounts.”); see also MCDONALD ET AL., supra note 22, at 37. Private companies are also loath to divulge their own financial details. See McDonald, supra note 25, at 89; NSW PARLIAMENT, LEGIS. ASSEMBLY, PUB. ACCOUNTS COMM., VALUE FOR MONEY FROM NSW CORRECTIONAL CENTRES 23 (Rep. No. 13/53 (No. 156), Sept. 2005); FLA. LEGIS., OFF. OF PROG. POL’Y ANAL. & GOV’T ACCOUNTABILITY, PERFORMANCE AUDIT OF THE GADSDEN CORRECTIONAL INSTITUTION 2 (Rep. No. 95-48, 1996).
37 McDonald, supra note 25, at 100.
38 C. ELAINE CUMMINS, PRIVATE PRISONS IN TEXAS, 1987–2000, at 15 (doctoral thesis, American Univ., Washington, D.C., 2001) (on file with author).
39 Id. at 42; see also TEX. GOV’T CODE § 495.003(c)(4).

b.
Florida

In Florida, the Office of Program Policy Analysis and Government Accountability (OPPAGA) compared two private facilities, Bay Correctional Facility and Moore Haven Correctional Facility, with a public facility, Lawtey Correctional Institution.46 After various adjustments, OPPAGA calculated that the per-diem operating cost was $46.08 at Bay and $44.18 at Moore Haven, versus $45.98 at Lawtey; that is, Bay was 0.2% more expensive and Moore Haven 3.9% cheaper than the public facility.47 The Florida Department of Corrections had come up with its own numbers: $45.04 at Bay and $46.32 at Moore Haven, versus $45.37 at Lawtey:48 Bay was 0.7% cheaper and Moore Haven 2.1% more expensive.

The Corrections Corporation of America, which operated Bay, submitted comments to the OPPAGA report, disputing its analysis.49 It disagreed that Lawtey was comparable,50 and suggested its own adjustments to OPPAGA’s numbers for all three facilities.

40 Now absorbed into the Texas Department of Criminal Justice. See TEX. DEP’T OF CRIM. JUST., AGENCY STRATEGIC PLAN, FISCAL YEARS 2013–17, at 2 (2012).
41 CUMMINS, supra note 38, at 155.
42 503 F. Supp. 1265 (S.D. Tex. 1980).
43 CUMMINS, supra note 38, at 156–57.
44 Id. at 156 tbl.9.
45 Id. at 158. One facility received an extra $7.41 for an “intensive substance abuse treatment program.” Id.; see also GAES ET AL., supra note 7, at 87–88.
46 STATE OF FLA., OFFICE OF PROGRAM POL’Y ANALYSIS & GOV’T ACCOUNTABILITY, REVIEW OF BAY CORRECTIONAL FACILITY AND MOORE HAVEN CORRECTIONAL FACILITY, at 9 (Report No. 97-68, 1998) [hereinafter OPPAGA].
47 Id. at 9 exh.4.
48 FLA. DEP’T OF CORR., Budget, in 1996–97 ANNUAL REPORT, available at http://www.dc.state.fl.us/pub/annual/9697/budget.html. These estimates were analyzed in FLA. DEP’T OF CORR., PRIVATIZATION IN THE FLORIDA DEPARTMENT OF CORRECTIONS (1998). See GAES ET AL., supra note 7, at 191 n.4.
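The percentage gaps quoted above for the OPPAGA and Department of Corrections estimates can be reproduced with a few lines of arithmetic. The sketch below is only illustrative: it assumes that each percentage is taken relative to Lawtey's per-diem cost (the sources quoted here don't say which base was used), and the names `estimates`, `percent_vs_public`, and `compare` are hypothetical labels introduced for the example.

```python
# Per-diem operating costs in dollars, as reported in the text above.
estimates = {
    "OPPAGA": {"Bay": 46.08, "Moore Haven": 44.18, "Lawtey": 45.98},
    "Florida DOC": {"Bay": 45.04, "Moore Haven": 46.32, "Lawtey": 45.37},
}


def percent_vs_public(private_cost, public_cost):
    """Signed gap as a percentage of the public facility's cost.

    Assumption: the public cost is the base of the percentage.
    Positive means the private facility is more expensive.
    """
    return (private_cost - public_cost) / public_cost * 100


def compare(costs, public="Lawtey"):
    """Return each private facility's gap versus the public one, to one decimal."""
    return {
        name: round(percent_vs_public(cost, costs[public]), 1)
        for name, cost in costs.items()
        if name != public
    }


for source, costs in estimates.items():
    for facility, gap in compare(costs).items():
        label = "more expensive" if gap > 0 else "cheaper"
        print(f"{source}: {facility} is {abs(gap):.1f}% {label} than Lawtey")
```

Run on the same three facilities, the two sets of inputs put Bay and Moore Haven on opposite sides of the public benchmark, which is the point of the comparison: the sign of the "savings" turns on which adjustments one accepts.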
Under CCA’s analysis, Bay cost $45.16 and Moore Haven cost $46.32, versus $49.30 for Lawtey, which comes out to cost savings of 8.4% for Bay and 6.0% for Moore Haven.51 (OPPAGA, understandably, disputed CCA’s modifications.52)
c. Taft
Perhaps the best example of competing, side-by-side cost studies comes from the evaluation of the federal facility in Taft, California, operated by The GEO Group. A Bureau of Prisons cost study by Julianne Nelson compared the costs of Taft in fiscal years 1999 through 2002 to those of three federal public facilities: Elkton, Forrest City, and Yazoo City.53 The Taft costs ranged from $33.42 to $38.62; the costs of the three public facilities ranged from $34.84 to $40.71. Taft was cheaper than all comparison facilities in all years, by up to $2.42 (about 6.6%)—except in fiscal year 2001, when the Taft facility was more expensive than the public Elkton facility by $0.25 (about 0.7%).54 Sloppily averaging over all years and all comparison institutions, the savings was about 2.8%. A National Institute of Justice study by Douglas McDonald and Kenneth Carlson55 found much higher cost savings. They calculated Taft costs ranging from $33.25 to $38.37, and public facility costs ranging from $39.46 to $46.38.56 Private-sector savings ranged from 9.0% to 18.4%. Again averaging over all years and all comparison institutions, the savings was about 15.0%: the two cost
49 OPPAGA, supra note 46, at 55–61 (Corrections Corporation of America’s comments, with OPPAGA’s comments interspersed throughout). 50 Id. at 57. 51 Id. at 61. 52 Id. at 59. 53 JULIANNE NELSON, COMPETITION IN CORRECTIONS: COMPARING PUBLIC AND PRIVATE SECTOR OPERATIONS 10 (CNA Corp., 2005). 54 Id. at 42 fig.5. The study also compared actual GEO costs to hypothetical costs if Taft had been kept in-house. This comparison gave the edge to the public sector, id.
at 25–26, but I don’t stress this result because it’s based on a comparison with a hypothetical public institution, not on actual public-sector costs. 55 DOUGLAS C. MCDONALD & KENNETH CARLSON, CONTRACTING FOR IMPRISONMENT IN THE FEDERAL PRISON SYSTEM: COST AND PERFORMANCE OF THE PRIVATELY OPERATED TAFT CORRECTIONAL INSTITUTION (Abt Assocs. Inc., 2005). 56 Id. at 48 tbl.2.18.
studies differ in their estimates of private-sector savings by a factor of about five. Why such a difference? First, the Nelson study (but not the McDonald and Carlson study) adjusted expenditures to iron out Taft’s economies of scale from handling about 300 more inmates each year than the public facilities.57 Second, the studies differed in what they included in overhead costs, with the Nelson study allocating a far higher overhead rate.58 These examples should be enough to give a sense of the complications in cost comparisons; given these difficulties, it’s not surprising that most studies have fallen short.
B. Which Sector Provides Higher Quality?
1. Difficulties in Figuring Out Quality
Moving on to quality comparisons, the picture is similarly grim. As with cost comparisons, sometimes no comparable facility exists in the same jurisdiction.59 Some studies solve that problem by looking at prisons in different jurisdictions, an approach that has its own problems.60 (If one had a large database with several prisons in each jurisdiction, one could control for the jurisdiction, but this approach is of course unavailable when comparing two prisons, each in its own jurisdiction.) Many studies just don’t control for clearly relevant variables in determining whether a facility is truly comparable.61
57 Gaes, supra note 25, at 34. 58 Id. at 34–35; Gaes, supra note 36, at 20. 59 MCDONALD ET AL., supra note 22, at 54–55 (discussing Arizona facilities compared in THOMAS, supra note 23); Gerald G.
Gaes et al., The Performance of Privately Operated Prisons: A Review of Research, printed as Appendix 2 with separate page numbering in MCDONALD ET AL., supra note 22, at 12 (same). 60 MCDONALD ET AL., supra note 22, at 55 (discussing CHARLES H. LOGAN, WELL KEPT: COMPARING QUALITY OF CONFINEMENT IN A PUBLIC AND A PRIVATE PRISON (Nat. Inst. of Just. 1991); Charles H. Logan, Well Kept: Comparing Quality of Confinement in Private and Public Prisons, 83 J. CRIM. L. & CRIMINOLOGY 577 (1992)). 61 Gaes et al., supra note 59, at 5 (criticizing the use of univariate methods in the comparison of Kentucky facilities in URBAN INSTITUTE, COMPARISON OF PRIVATELY AND PUBLICLY OPERATED CORRECTIONAL FACILITIES IN KENTUCKY AND MASSACHUSETTS (1989)); id. at 18 (discussing lack of information on characteristics of inmate populations in WILLIAM G. ARCHAMBEAULT & DONALD R. DEIS, JR., COST EFFECTIVENESS COMPARISONS OF PRIVATE VERSUS PUBLIC PRISONS IN LOUISIANA: A COMPREHENSIVE ANALYSIS OF ALLEN, AVOYELLES, AND WINN CORRECTIONAL CENTERS (1996)); id. at 19 (discussing lack of control for differences in number of inmates at some comparison prisons in ARCHAMBEAULT & DEIS, supra); Scott D. Camp & Gerald G. Gaes, Private Adult Prisons: What Do We Really Know and Why Don’t We Know More?, in PRIVATIZATION IN CRIMI(continued next page) 58 Draft—Please do not circulate 16 VOLOKH Often, the comparability problem boils down to differences in inmate populations; one prison may have a more difficult population than the other, even if they have the same security level. 
Usually prisons have different populations because of the luck of the draw,62 but sometimes it’s by design, as happened in Arizona, when the Department of Corrections made “an effort to refrain from assigning prisons to [the private prison] if they [had] serious or chronic medical problems, serious psychiatric problems, or [were] deemed to be unlikely to benefit from the substance abuse program that is provided at the facility.”63 It’s actually quite common to not send certain inmates to private prisons; the most common restriction in contracts is on inmates with special medical needs.64 Not that all prisons must have totally random assignment; it can be rational to tailor prisoner assignment to, say, the programming available at a prison. But such practices do have “the unintended effect of undermining cost comparisons.”65 Another practice that undermines cost comparisons is contractual terms limiting the private contractor’s medical costs.66 NAL JUSTICE: PAST, PRESENT, AND FUTURE 283, 285–87 (David Shichor & Michael J. Gilbert eds., 2001) (critiquing ARCHAMBEAULT & DEIS, supra, and THOMAS, supra note 23); GAES ET AL., supra note 7, at 51–53 (discussing ARCHAMBEAULT & DEIS, supra). 62 Gaes et al., supra note 59, at 4 (discussing comparison of Kentucky facilities in URBAN INSTITUTE, supra note 61, where public sector had more difficult adult population while private sector had more difficult juvenile population); id. at 9 (discussing TENN. SELECT OVERSIGHT CMTE. ON CORR., COMPARATIVE EVALUATION OF PRIVATELY-MANAGED CCA PRISON (SOUTH CENTRAL CORRECTIONAL CENTER) AND STATE-MANAGED PROTOTYPICAL PRISONS (NORTHEAST CORRECTIONAL CENTER, NORTHWEST CORRECTIONAL CENTER (1995)); id. at 11 (discussing STATE OF WASH. LEGIS. BUDGET CMTE., DEPARTMENT OF CORRECTIONS PRIVATIZATION FEASIBILITY STUDY (Report 96-2, 1996)); id. at 20 (criticizing the use of the Angola facility as a comparison facility in ARCHAMBEAULT & DEIS, supra note 61); id. 
at 20–21 (discussing that low urinalysis hit rates in ARCHAMBEAULT & DEIS, supra note 61, could indicate a population less inclined to use drugs, and low medical risk scores could indicate a population less in need of serious medical care). 63 THOMAS, supra note 23, at 73. 64 CAMP & GAES, supra note 23, at 21–22 (some restrictions in effect in 62.5% of the contracts surveyed; special medical needs restriction in 50% of contracts; other restrictions include high-publicity inmates and gang members). 65 THOMAS, supra note 23, at 73. 66 See, e.g., Contract Between the State of Tennessee and Corrections Corporation of America, RFS-329.44-004 [hereinafter Tennessee CCA 2007 contract] ¶ A.4.g.13)(a), at 13, available at http://www.capitol.tn.gov/joint/committees/fiscal-review/archives/106ga/contracts/RFS%20329.4400408%20Correction%20%28CCA%20-%20amendment%201%29.pdf (“If the inmate is hospitalized, the Contractor shall not be responsible for inpatient-Hospital Costs which exceed $4,000.00 per inmate per admission.”); id. ¶ A.4.g.13)(b) (“The Contractor shall not be responsible for the cost of providing anti-retroviral medications therapeutically indicated for the treatment of inmates with AIDS or HIV infection.”). By its terms, this contract covers services at the South Central Correctional Center, id. at 1, and runs from 2007 to 2010, id. at 22.
Draft—Please do not circulate 2013] PERFORMANCE MEASURES 17 Some performance studies rely on surveys administered to a non-random sample of inmates67 or potentially biased staff surveys,68 or generally to populations of inmates or staff that aren’t randomly assigned to public and private prisons.69 Survey data isn’t useless, but it’s rarely used with the appropriate sensitivity to its limitations.70 The higher-quality survey-based studies don’t give the edge to either sector.71 Most damningly, many studies don’t rely on actual performance measures,72 relying instead on facility audits that are largely process-based.73 Some supposed performance measures don’t necessarily indicate good performance,74 especially when the prisons are compared based on a “laundry list” of available data items (for instance, staff satisfaction) whose relevance to good performance hasn’t been theoretically established.75 Gerald Gaes and his coauthors conclude that most studies are “fundamentally flawed,” and agree with the GAO’s conclusion that there is “little information that is widely applicable to various correctional settings.”76 67 Gaes et al., supra note 59, at 6 (discussing DALE K. SECHREST & DAVID SHICHOR, FINAL REPORT: EXPLORATORY STUDY OF CALIFORNIA’S COMMUNITY CORRECTIONAL FACILITIES (Cal. Dep’t of Corr., Parole & Comm. Servs. Div., 1994)). 68 Gaes et al., supra note 59, at 24 (discussing staff surveys in LOGAN, supra note 60; Logan, supra note 60). 69 GAES ET AL., supra note 7, at 74–76 (critiquing Judith Greene, Comparing Private and Public Prison Services and Programs in Minnesota: Findings from Prisoner Interviews, 2 CURR. ISS. IN CRIM. JUST. 202 (1999); Judith Greene, Lack of Correctional Services, in CAPITALIST PUNISHMENT: PRISON PRIVATIZATION & HUMAN RIGHTS (Andrew Coyle et al. eds., 2003); LOGAN, supra note 60; Logan, supra note 60); Scott D. Camp et al., Quality of Prison Operations in the US Federal Sector: A Comparison with a Private Prison, 4 PUNISH. & SOC. 
27, 32–34 (2002). For a general discussion of methods, see Scott D. Camp et al., Creating Performance Measures from Survey Data: A Practical Discussion, 3 CORR. MGMT. Q. 71 (1999). 70 See RICHARD W. HARDING, PRIVATE PRISONS AND PUBLIC ACCOUNTABILITY 115–19 (1997). 71 Scott D. Camp et al., Using Inmate Survey Data in Assessing Prison Performance: A Case Study Comparing Private and Public Prisons, 27 CRIM. JUST. REV. 26, 31 (2003); Camp et al., Quality of Prison Operations, supra note 69, at 49–50; see also GAES ET AL., supra note 7, at 83. 72 Gaes et al., supra note 59, at 9 (discussing TENN. SELECT OVERSIGHT CMTE. ON CORR., supra note 62). 73 Not that prison audits are useless; Gerald Gaes, in fact, who is a big booster of performance measurement, discusses how audits could be improved to be made more useful. GAES ET AL., supra note 7, at 31–37. 74 Gaes et al., supra note 59, at 20 (discussing, in the context of ARCHAMBEAULT & DEIS, supra note 61, how a low count of disciplinary actions could indicate either good or bad performance); id. at 25–27 (discussing similar difficulties in interpreting items in LOGAN, supra note 60; Logan, supra note 60). 75 Camp & Gaes, supra note 61, at 286. 76 Gaes et al., supra note 59, at 31 (citing PRIVATE AND PUBLIC PRISONS: STUDIES COMPARING OPERATIONAL COSTS AND/OR QUALITY OF SERVICE 11 (Gen. Account. Off., 1996)). Draft—Please do not circulate 18 VOLOKH I would add that accountability mechanisms vary widely—the standard U.S. 
model, the Florida model, and the UK model are different,77 and these in turn differ from the French model78 or the model proposed for prison privatization in Israel before the Israeli Supreme Court invalidated the experiment.79 When a prison study finds some result about comparative quality, that tells us something about comparative quality within that accountability structure; if a private prison performed inadequately under one accountability structure, it might do better under a better one.80 Consider, for instance, the performance evaluations of the private federal Taft facility. As with the cost studies discussed above,81 we have two competing studies, the National Institute of Justice one by McDonald and Carlson82 and a Bureau of Prisons study by Scott Camp and Dawn Daggett83—the companion paper to Julianne Nelson’s cost paper.84 The Bureau of Prisons has evaluated public prisons by the Key Indicators/Strategic Support System since 1989.85 Taft, alas, didn’t use that system, but instead used the system designed in the contract for awarding performance-related bonuses.86 Therefore, McDonald and Carlson could only compare Taft’s performance with that of the public comparison prisons on a limited number of dimensions,87 and many of these dimensions—like accreditation of the facility, staffing levels, or frequency of seeing a doctor 88— aren’t even outcomes. Taft had lower assault rates than the average of its comparison institutions, though they were within the range of 77 See, e.g., David E. Pozen, Managing a Correctional Marketplace: Prison Privatization in the United States and the United Kingdom, 19 J.L. & POL. 253, 276–81 (2003) (comparing American and British accountability systems); HARDING, supra note 70, at 158–165 (describing the “basic model” of accountability, the UK model, and the Florida model, and proposing a new model). 
78 See JON VAGG, PRISON SYSTEMS: A COMPARATIVE STUDY OF ACCOUNTABILITY IN ENGLAND, FRANCE, GERMANY, AND THE NETHERLANDS 305–07 (1994). 79 See HCJ 2605/05 Acad. Ctr. of Law & Bus., Human Rights Div. v. Minister of Finance [2009] (Isr.) ¶ 18, http://elyon1.court.gov.il/files_eng/05/050/026/n39/05026050.n39.htm; Volokh, supra note 6, at 180–85, 198–99 (discussing this opinion). 80 Gaes, supra note 36, at 30, also calls for more study of different accountability structures. 81 See text accompanying supra notes 53–56. 82 MCDONALD & CARLSON, supra note 55. 83 SCOTT D. CAMP & DAWN M. DAGGETT, EVALUATION OF THE TAFT DEMONSTRATION PROJECT: PERFORMANCE OF A PRIVATE-SECTOR PRISON AND THE BOP (2005). 84 NELSON, supra note 53. 85 MCDONALD & CARLSON, supra note 55, at 119; see also text accompanying infra note 297– 298. 86 Gaes, supra note 25, at 35; text accompanying infra note 168. 87 MCDONALD & CARLSON, supra note 55, at 119. 88 Id. at 143. Draft—Please do not circulate 2013] PERFORMANCE MEASURES 19 observed assault rates.89 No inmates or staff were killed.90 There were two escapes, which was higher than at public prisons.91 Drug use was also higher at Taft, as was the frequency of submitting grievances.92 On this very limited analysis, Taft seems neither clearly better nor clearly worse than its public counterparts. The Camp and Daggett study, on the other hand, created performance measures from inmate misconduct data,93 and concluded not only that Taft “had higher counts than expected for most forms of misconduct, including all forms of misconduct considered together,”94 but also that Taft “had the largest deviation of observed from expected values for most of the time period examined.” Camp and Daggett’s performance assessment was thus more pessimistic than McDonald and Carlson’s. 
According to Gerald Gaes, the strongest studies include one from Tennessee, which shows essentially no difference, one from Washington, which shows somewhat positive results,95 and three more recent studies of federal prisons by himself and coauthors, which found public and private prisons to be equivalent on some measures, higher on others, and lower on yet others.96 2. Which Sector Leads to Less Recidivism? Recidivism reduction is really just one dimension of prison quality, though it’s a particularly relevant one that deserves its own section. If we found that inmates at private prisons were less likely to reoffend than comparable inmates at public prisons, this would be an important factor in any comparison of public and private pris- 89 Id. at 126, 127 fig.4.2. To focus on the three comparison prisons from the cost analyses, Elkton’s assault rate was similar to what would have been expected, while Taft, like Forrest City and Yazoo City, had lower rates than what would have been expected. Yazoo City’s was the lowest. Gaes, supra note 25, at 36. 90 MCDONALD & CARLSON, supra note 55, at 128. 91 Id. at 128. 92 Id. at 143. 93 CAMP & DAGGETT, supra note 83, at 36. 94 Id. at 59. 95 Gaes et al., supra note 59, at 31. 96 Gaes, supra note 36, at 26 (citing Camp et al., supra note 71; Scott D. Camp et al., The Influence of Prisons on Inmate Misconduct: A Multilevel Investigation, 20 JUST. Q. 501 (2003); Camp et al., Quality of Prison Operations, supra note 69). Draft—Please do not circulate 20 VOLOKH ons. Unfortunately, recidivism comparisons haven’t been very good either. 
A study from the late 1990s by Lonn Lanza-Kaduce and coauthors reported that inmates released from private prisons were less likely to reoffend than a matched sample of inmates released from public prisons, and had less serious offenses if they did reoffend.97 But this study has been critiqued on various grounds.98 First, not all the recidivism measures are significant: while various reoffense-related rates were found to be significantly lower in the private sector,99 and while the seriousness of reoffending was found to be significantly lower in the private sector,100 a time-to-failure analysis found that there was no significant difference in the “length of time that a release ‘survived’ without an arrest during the 12-month period.”101 Second, the public inmates seem not to have been well matched to the private inmates; they only seemed so when their descriptive variables were described at a high level of generality (e.g., custody level vs. “the underlying continuous score measuring custody level,” whether inmates had two or more incarcerations vs. the actual number of incarcerations, etc.).102 Third, the authors seem to have made the questionable decision to assign an inmate to the sector he was released from, even if he had spent time in several sectors: thus, an inmate who spent years in public prison and was transferred to private prison shortly before his release was classified as a private prison releasee.103 Fourth, a private releasee who reoffended could take longer to be entered in the system than a public releasee,104 so the truly comparable number of private recidivists may well have been larger than reported.
97 Lonn Lanza-Kaduce et al., A Comparative Recidivism Analysis of Releasees from Private and Public Prisons, 45 CRIME & DELINQ. 28 (1999); see also Lonn Lanza-Kaduce et al., The Devil in the Details: The Case Against the Case Study of Private Prisons, Criminological Research, and Conflict of Interest, 46 CRIME & DELINQ. 92, 96–97 (2000). 98 The critiques are discussed in GAES ET AL., supra note 7, at 24–26. Gaes et al. argue, see id. at 27, that several of the critiques continue to apply to a later paper with a longer follow-up period, L. Lanza-Kaduce & S. Maggard, The Long-Term Recidivism of Public and Private Prisoners, paper presented at the National Conference of the Bureau of Justice Statistics and Justice Research and Statistics Association, New Orleans, 2001. 99 The difference in rearrest rates is significant at the 1% level and the difference in resentencing rates is significant at the 5% level, but the differences in reincarceration rates and for any indication of recidivism are only significant at the 10% level. Lanza-Kaduce et al., supra note 97, at 36–37. 100 Id. at 37–38. 101 Id. at 38–41. 102 GAES ET AL., supra note 7, at 25 (citing FLA. DEP’T OF CORR., BUR. OF RES. & DATA ANALYSIS, PRELIMINARY ASSESSMENT OF A STUDY ENTITLED “A COMPARATIVE RECIDIVISM ANALYSIS OF RELEASEES FROM PRIVATE AND PUBLIC PRISONS IN FLORIDA” (1998)). 103 Id. at 26.
A later study by David Farabee and Kevin Knight105 that “corrected for some of these deficiencies”106 found no comparative difference in the reoffense or reincarceration rates of males or juveniles over a three-year post-release period, though women had lower recidivism in the private sector.107 However, this study may still suffer from the problem of the attribution of inmates who spent some time in each sector, as well as possible selection bias to the extent that private prisons got a different type of inmate than public prisons did.108 Another study by William Bales and coauthors,109 even more rigorous,110 likewise found no statistically significant difference between public-inmate and private-inmate recidivism.111 A more recent study, by Andrew Spivak and Susan Sharp, reported that private prisons were (statistically) significantly worse in six out of eight models tested.112 But the authors noted that some skepticism was in order before concluding that public prisons necessarily did better on recidivism. Populations aren’t randomly assigned to public and private prisons: that private prisons engage in “cream-skimming” is a persistent complaint.113 Recall the case in 104 Id. 105 DAVID FARABEE & KEVIN KNIGHT, A COMPARISON OF PUBLIC AND PRIVATE PRISONS IN FLORIDA: DURING- AND POST-PRISON PERFORMANCE INDICATORS (Query Research, 2002). 106 GAES ET AL., supra note 7, at 27. 107 FARABEE & KNIGHT, supra note 105, at ii–iii, 20–25. 108 GAES ET AL., supra note 7, at 28. 109 William D. Bales et al., Recidivism of Public and Private State Prison Inmates in Florida, 4 CRIMINOLOGY & PUB. POL’Y 57 (2005). 110 See Gaes, supra note 36, at 9. 111 Bales et al., supra note 109, at 69, 72, 74. 112 Andrew L. Spivak & Susan F. Sharp, Inmate Recidivism as a Measure of Private Prison Performance, 54 CRIME & DELINQ. 482, 500 tbl.5, 501 (2008). 113 See, e.g., GAES ET AL., supra note 7, at 28; Dolovich, supra note 5, at 505; John J.
DiIulio, Jr., The Duty to Govern: A Critical Perspective on the Private Management of Prisons and Jails, in PRIVATE PRISONS AND THE PUBLIC INTEREST, supra note 25, at 155, 166–67 (stating that private firms “engage in correctional creaming when they bid,” meaning that they avoid bidding on facilities that they expect will “bring negative media attention, legislative inquiries, staff unrest, lawsuits, and judicial intervention,” “the Atticas and Riker Islands of the country”); Richard A. Oppel Jr., Private Prisons Found to Offer Little in Savings, N.Y. TIMES, May 18, 2011 (discussing Arizona Department of Corrections study stating that private prisons “often house only relatively healthy inmates” and quoting State Representative Chad Campbell calling this practice “cherry-picking”); STATE OF ARIZONA, OFFICE OF THE AUDITOR GENERAL, DEPARTMENT OF CORRECTIONS—PRISON POPULATION GROWTH 20 (Report No. 10-08, 2010) (“[P]rivate prisons do not accept inmates in need of more serious medical care . . . .”); ARIZONA DEP’T OF CORR., FY 2009 OPERATING PER CAPITA (continued next page) 105 Draft—Please do not circulate 22 VOLOKH Arizona, where the Department of Corrections made “an effort to refrain from assigning prisons to [the private Marana Community Correctional Facility] if they [had] serious or chronic medical problems, serious psychiatric problems, or [were] deemed to be unlikely to benefit from the substance abuse program that [was] provided at the facility.”114 But the phenomenon can also run the other way. 
One of the authors of the recidivism study, Andrew Spivak, writes that when he was “a case manager at a medium-security public prison in Oklahoma in 1998, he noted an inclination for case management staff (himself included) to use transfer requests to private prisons as a method for removing more troublesome inmates from case loads.”115 Moreover, recidivism data is itself often flawed.116 Recidivism has to be not only proved (which requires good databases) but also defined.117 Recidivism isn’t self-defining—it could include arrest; reconviction; incarceration; or parole violation, suspension, or revocation; and it could give different weights to different offenses depending on their seriousness.118 Which definition one uses makes a difference in one’s conclusions about correctional effectiveness,119 as well as affecting the scope of innovation.120 The choice of how long to monitor obviously matters as well: “[m]ost severe offences occur in the second and third year after release.”121 Recidivism measures might also vary because of variations in, say,
COST REPORT 2 (2010) (discussing inmates “returned to state prisons due to an increase of their medical scores that exceeds contractual exclusions”); id. at 4 (same); id. at 10 n.1 (similar); id. at 12–16 (discussing medical, mental health, and other restrictions on inmates that can be sent to particular private prisons). Compare Gaes et al., supra note 59, at 34–35 (stressing that the federal Taft facility, the subject of the comparative study reported at text accompanying supra notes 53–58 and supra notes 81–92, will house inmates equivalent to those at the comparison facilities). 114 THOMAS, supra note 23, at 73; see text accompanying supra note 63. 115 Spivak & Sharp, supra note 112, at 503–04. 116 MICHAEL D.
MALTZ, RECIDIVISM 58–60 (1984); PUBLIC AND PRIVATE PRISONS, supra note 76, at 30–31 (discussing SECHREST & SHICHOR, supra note 67) (“Sufficient data were not available to adequately complete the analysis comparing the inmates released from the community correctional facilities to inmates released from other correctional institutions in the state.”); Gaes et al., supra note 59, at 7 (same). 117 Brakel & Gaylord, supra note 10, at 154. 118 MALTZ, supra note 116, at 62. 119 Id. at 63; see also JAMES DICKER, PAYMENT-BY-OUTCOME IN OFFENDER MANAGEMENT 16 (2020 Public Services Trust at the RSA, Case Study 2, Feb. 2011) (“[N]either reconviction nor reimprisonment rates capture all re-offending behaviour, as only about 45% of offenders who are reconvicted are incarcerated and it is possible to be recalled to prison for breaching license conditions without being reconvicted.”). 120 DICKER, supra note 119, at 18. 121 Id. at 16–17.
enforcement of parole conditions, independent of the true recidivism of the underlying population.122 The study of the comparative recidivism of the public and private sector could thus use a lot of improvement.123
C. The Limits of Comparative Effectiveness
After having read the foregoing, one should be fairly dismayed at the state of comparative public-private prison research.124 In fact, it gets worse. An overarching problem is that most studies compare either cost or quality, not both. It’s hard to draw strong conclusions from such studies, even if they’re state-of-the-art at what they’re examining.125 If we find that a private prison costs less, how do we know that they didn’t achieve that result by cutting quality?
(This is the standard critique of private prisons.126) If we find that a private prison costs more, how do we know that they didn’t cost more because of the fancy and expensive programs they implemented?127 (According to Douglas McDonald, this was exactly the problem with the cost comparison of the Silverdale Detention Center in Hamilton County, Tennessee.128) 122 MALTZ, supra note 116, at 66–67. 123 See also Gaes, supra note 36, at 9–12 (discussing these studies). 124 Some studies are actually meta-analyses. See Gaes, supra note 36, at 3–6 (discussing meta-analyses and literature reviews). Two recent meta-analyses showed little difference between the public and private sectors. One, only analyzing costs, found no statistical difference between the public and private sectors. Travis C. Pratt & Jeff Maahs, Are Private Prisons More Cost-Effective Than Public Prisons? A Meta-Analysis of Evaluation Research Studies, 45 CRIME & DELINQ. 358, 365 & 366 tbl.2 (1999). Another, looking at both cost and quality, found that the private sector was both slightly cheaper and slightly worse; but with such small effects, the authors concluded that “prison privatization provides neither a clear advantage nor disadvantage.” Brad W. Lundahl et al., Prison Privatization: A Meta-Analysis of Cost and Quality of Confinement Indicators, 19 RES. ON SOC. WORK PRACTICE 383, 392 (2009). A third—more a literature review than a meta-analysis—reported that the comparison was “inconclusive,” Dina Perrone & Travis C. Pratt, Comparing the Quality of Confinement and Cost-Effectiveness of Public Versus Private Prisons: What We Know, Why We Do Not Know More, and Where to Go from Here, 83 PRISON J. 301 (2003); and in any event there was no formal attempt to control for differences between the public and private prisons compared. Given that many of the underlying studies are flawed in various ways, it’s not clear how you do better by aggregating them.
When studies done in vastly different ways and subject to different sources of bias are aggregated in a meta-analysis, the results are “garbage in, garbage out.” 125 See PUBLIC AND PRIVATE PRISONS, supra note 76, at 13. 126 See, e.g., Dolovich, supra note 5, at 474–80. 127 See Developments, supra note 9, at 1875; MCDONALD ET AL., supra note 22, at 34–35. 128 McDonald, supra note 25, at 91. 123 Draft—Please do not circulate 24 VOLOKH Our goal should be to determine the production function for public and private prisons; this is the only way we’ll find out whether privatization moves us to a higher production possibilities frontier or merely shifts us to a different cost-quality combination on the existing frontier.129 Realizing this allows us to throw out a lot of studies from the outset. At least people are taking more seriously the need to develop valid comparisons. Governments need to mandate, by regulation or by contract, that the information necessary to do valid comparisons become available, even if collecting this extra data would add to private facilities’ cost.130 Until we get a better handle on what works, public and private prisons should be required to live up to the same standards to facilitate comparisons. Private prisons should get the same types of inmates as public prisons—neither better nor worse131—and they should be restricted in whom they can transfer out.132 Having spent so long bemoaning the paucity of good comparative effectiveness studies, I should note that there’s more to life than comparative effectiveness. Even ignoring any differences between the public and private sectors, privatization can have systemic effects, altering how the public sector works.133 For one thing, privatization can, for better or worse, change the public sector as well. Suppose private prisons are better than public prisons but competitive pressures lead public prisons to improve as well.134 A comparative study may not be able to find any differ129 Cf. Caroline M. 
Hoxby, School Choice and School Competition: Evidence from the United States, 10 SWED. ECON. POL’Y REV. 1, 42–43 (2003) (“If school choice is to be public policy, and not merely an experiment, then the question we need to answer is whether students’ achievement would rise if they attended voucher or charter schools that had resources like those available to them in regular public schools. In other words, we should ask the achievement question, holding resources constant (as well as holding students’ ability, motivation, and other characteristics constant.”). 130 COST OF PRISONS, supra note 26, at 5, 13–14, 17, 19–20, 30. 131 See text accompanying supra notes 113–115. 132 See STATE OF FLORIDA, OFF. OF PROG. POL’Y ANAL. & GOV’T ACCOUNTABILITY, REVIEW OF CORRECTIONAL PRIVATIZATION 4 (Rep. No. 95-12, 1995) (recommending restrictions on transfers out of private prisons). 133 Cf. Hoxby, supra note 129, at 19 (“[school] choice can affect productivity through a variety of long-term, general equilibrium mechanisms that are not immediately available to an administrator,” like bidding up the wages of successful teachers and altering the mix of people who choose teaching as a career, making parents into more informed consumers by encouraging the spread of information about schools, altering what curricula are adopted, and the like). 134 Charles W. Thomas, Correctional Privatization in America: An Assessment of Its Historical Origins, Present Status, and Future Prospects, in CHANGING THE GUARD, supra note 10, at 57, 59. See also infra Part III.A (privatization can improve accountability of public sector). 
Draft—Please do not circulate 2013] PERFORMANCE MEASURES 25 ence between the two sectors, and yet one can still say that privatization was a success.135 (Indeed, one study does suggest that for prisons, privatization might drive public agencies to be more efficient,136 though the statistical significance of this effect seems highly sensitive to the precise specification,137 and selection bias is a confounding issue.138) Similarly, if private prisons really do cost less and therefore allow for greater increases in capacity, thus relieving overcrowding across the board, that won’t show up in a comparative study.139 Likewise if best practices migrate from one sector to another through a process of cross-fertilization:140 Richard Harding calls this “the paradox of cross-fertilization—that regimes progressively become more similar than dissimilar to each other.”141 Alternatively, what if privatization leads to a race to the bottom? If private prison cost-cutting is harmful, and if public prisons have to cut costs to stay competitive, we may have lower quality, including higher recidivism, across the board.142 135 Cf. Hoxby, supra note 129, at 43 (suggesting that concentrating on the effect on student achievement of private schooling vs. public schooling is wrongheaded in the school choice debate, because school choice can be a success if, through competition, it leads to improvements in the public sector, so that there never emerges any difference between public and private school outcomes). 136 James F. Blumstein et al., Do Government Agencies Respond to Market Pressures? Evidence from Private Prisons, 15 VA. J. L. & SOC. POL’Y 446 (2008); see also James F. Blumstein & Mark A. Cohen, The Interrelationship Between Public and Private Prisons: Does the Existence of Prisoners Under Private Management Affect the Rate of Growth in Expenditures on Prisoners Under Public Management? (April 2003). 
137 See Blumstein et al., supra note 136, at 465 (insignificant effect with two different specifications, significant effect with a third). 138 The authors estimate the effect using a two-stage regression where the first stage represents the probability of privatizing, but this method doesn’t always take care of selection effects. See Alexander Volokh, Do Faith-Based Prisons Work?, 63 ALA. L. REV. 43, 67–73 (2011). Gaes also critiques the study, see Gaes, supra note 36, at 12–14. I’ve discussed or critiqued selection bias in many places. See Volokh, supra note 9, at 1245–47; Alexander Volokh, Choosing Interpretive Methods: A Positive Theory of Judges and Everyone Else, 83 NYU L. REV. 769, 803–19 (2008); Alexander Volokh, Privatization, Free-Riding, and Industry-Expanding Lobbying, 30 INT’L REV. L. & ECON. 62, 68 (2010); Alexander Volokh, The Effect of Privatization on Public and Private Prison Lobbies, in 3 PRISON PRIVATIZATION: THE MANY FACETS OF A CONTROVERSIAL INDUSTRY 7, 24– 26 (Byron Eugene Price & John Charles Morris eds., 2012). 139 Developments, supra note 9, at 1875. 140 I discuss cross-fertilization at greater length below, see text accompanying infra note 190. 141 Richard Harding, Private Prisons, in 28 CRIME AND JUSTICE: A REVIEW OF RESEARCH 265, 334 (2001). But see Tony Ward, Book Review, 3 THEO. CRIMINOLOGY 125, 126 (1999) (reviewing HARDING, supra note 70) (conceding that Harding’s cross-fertilization argument is valid but noting that “[t]here seems to be a ‘heads I win, tails you lose’ quality to [Harding’s cross-fertilization] argument (if public prisons turn out to be better than private ones, that just proves that competition is good for them!)”). 142 Gerald G. Gaes, Prison Privatization in Florida: Promise, Premise, and Performance, 4 CRIMIN. & PUB. 
POL’Y 83, 87 (2005); GAES ET AL., supra note 7, at 108; HARDING, supra note 69, at (continued next page)

In either of these two cases, good empirical evaluations are necessary, though detecting such dynamic, system-wide effects will require before-and-after studies, not comparative snapshots. Finally, to step back a bit from the privatization debate, regardless of what comparative effectiveness analyses show, both sectors may fall short of the ideal, so this exercise shouldn’t blind us to the continuing need to reform the whole system.143 I’ll add that, even if the public and private sectors are equivalent, one can argue against privatization on the grounds that, assuming it costs less, it enables greater expansion of the prison system and therefore may increase incarceration and hinder the search for alternative penal policies.144

III. WHY USE PERFORMANCE MEASURES?

A. The Puzzle of Prisons?

The moral so far is that the whole empirical literature on public and private prisons is highly inconclusive.145 As I noted in the Introduction, this should be somewhat of a puzzle for activists on both sides who claim that privatization should turn prisons into either humanitarian disaster zones or models of quality and efficiency.146 Of course, that the empirical literature is inconclusive doesn’t mean the sectors are equivalent; it means that current methods haven’t been good enough to detect the difference. A methodologically deficient literature could hide evidence of either good or bad quality.
But if the differences are great enough, you’d think they might show through even with bad methods.147 The tentative conclusion I draw from the literature, though, is that there may be modest quality differences between the sectors, but not huge; the public sector is better on some dimensions and worse on others, and there’s no good evidence that either sector 138 (noting that reductions in public prisons’ staffing levels in response to competition could be alternatively characterized as “cross-fertilization” or “industrial blackmail”). 143 Dolovich, supra note 5, at 442. 144 See Volokh, supra note 6, at 142–43 & n.30 (collecting sources making this argument). 145 See Alexander Volokh, The Modest Effect of Minneci v. Pollard on Inmate Litigants, 46 AKRON L. REV. 287, 324 (2013). 146 See supra text accompanying notes 10–11. 147 See supra note 12. Draft—Please do not circulate 2013] PERFORMANCE MEASURES 27 does better at reducing recidivism. And while the private sector is probably cheaper, it remains to be seen whether the cost savings is on the order of 15% (respectable) or on the order of 3% (somewhat negligible).148 But this puzzle largely disappears when we consider the institutional environment of private prisons. In many areas, the private sector has been good at delivering better results at a lower cost. This is because private producers are accountable to customers who care about the quality of the end product, and because they have the flexibility to change how they do things in response to problems they may encounter. Neither of these conditions is true for private prisons—not even slightly, not even as a first approximation. I’ve noted above that there’s limited evidence of private firm innovation.149 But this is because private prisons are highly constrained in how they operate. Private prison contracts essentially “‘governmentalize’ the private sector,”150 reproducing public prison regulations in the private contract. 
Privatization can come to resemble an exercise in who can better pretend to be a public prison.151 For instance, back in 1985, Robert Levinson complained of a contract with the Eckerd Foundation for the management of the Okeechobee School for Boys in which “[v]irtually every” contract item: concerned input activities and pertained to administrative/operational functions. Thus, Eckerd could have been in total compliance with all contractual provisions even if every released client committed a new offense on the first day in the community. Moreover, at no point in the contract 148 See supra text accompanying notes 53–56. See, e.g., Camp & Gaes, supra note 61, at 287; Dolovich, supra note 5, at 476; Scott D. Camp, Editorial Introduction: Private Prisons & Recidivism, 4 CRIMIN. & PUB. POL’Y 55, 55 (2005). 150 Thomas, supra note 134, at 64; see also id. at 82, 116 n.15, 100–02. 151 MCDONALD ET AL., supra note 125, at 49; DOUGLAS MCDONALD & CARL PATTEN JR., GOVERNMENTS’ MANAGEMENT OF PRIVATE PRISONS 18 (Abt Assocs. Inc., 2003); Gaes et al., supra note 59, at 12 (“Generally speaking, the contract [discussed in THOMAS, supra note 23] stipulates that [the private provider] run the . . . facility in a manner similar to that in which the state would have operated the prison.”); id. at 17 (“Basically, the State of Arizona has taken the position that a private contractor should be given the opportunity to demonstrate it can out perform the state in running an Arizona prison according to Arizona Department of Corrections policy.”); Durham, supra note 20, at 67; Harding, supra note 141, at 303. 149 Draft—Please do not circulate 28 VOLOKH were the criteria for noncompliance stated nor its consequences specified.152 More recently, in Arizona, an auditor general report stated: The Department requires that private prisons mirror state-operated facilities, and performs extensive oversight activities to ensure that its contractors meet its requirements. 
In order to maintain uniform standards for state and private prisons, the Department requires contractors to follow Department Orders, Director’s Instructions, Technical Manuals, Institution Orders, and Post Orders. These requirements extend to specific details, such as following the same daily menus as state-operated facilities. Contractors may request waivers from the Department for policies that are not applicable to private prisons, such as state fiscal management practices, employee evaluations, and employee benefits.153

The same daily menus! In Tennessee, “it even appears that private sector innovation was deliberately thwarted by making the private sector provider . . . abide by [state Department of Corrections] policy” in running the facility.154 Subjecting private contractors to public regulations is actually quite common;155 one exception to this trend is Florida, where public and private prisons are controlled by different agencies,156 and the agency that regulates private prisons tries to balance “setting policy and encouraging innovation.”157 More generally, input specification in private-prison contracts is routine, though of course the level of inputs specified can (and should) be “output-driven” in the sense that it’s “related to output objectives.”158 For instance, one can find liquidated damages provisions for certain input-based breaches like not complying with the state’s policies or not filling certain required positions.159

If inputs and procedures are highly regulated, it’s not surprising that the evidence for private-sector improvements isn’t overwhelming. The market is a discovery process; one shouldn’t expect different methods to emerge unless innovation is permitted. And not only permitted: one shouldn’t expect different methods to emerge unless the incentives favor it. If the premise of privatization is that incentives work, particularly given the greater flexibility of private industry, micromanaging inputs and failing to incorporate the full range of desirable outcomes into the contract price mean giving up on much of the possible benefit of privatization. But the efforts to measure performance in various areas of government, from the Job Training Partnership Act of 1982160 to the Government Performance and Results Act of 1993161 (and the limited efforts to make funding contingent on those performance measures162), have largely passed prisons by.

Outcome measures aren’t totally absent. Contracts do include a limited range of outcome measures, for instance limited penalties for escapes.163 But by and large, outcome-based compensation is

152 Levinson, supra note 24, at 87; see also id. at 88.
153 DEBRA K. DAVENPORT, PERFORMANCE AUDIT: ARIZONA DEPARTMENT OF CORRECTIONS, PRIVATE PRISONS, at 9 (2001); see also Thomas, supra note 134, at 101.
154 Gaes et al., supra note 59, at 10.
155 CAMP & GAES, supra note 23, at vii (“[P]rivate contractors were typically obligated to use the training standards and policies of the public agencies.”); id. at 28. But see id. at ix (“[T]he private sector, even when there is no contractual obligation, has adopted the standards and policies of their public sector counterparts.”); id. at 32.
156 Id. at x, 32–33; see also HARDING, supra note 70, at 161.
157 CAMP & GAES, supra note 23, at x; see also Harding, supra note 141, at 303–04 (similar situation in Western Australia).
158 HARDING, supra note 70, at 67–68; see also Peter H. Kyle, Contracting for Performance: Restructuring the Private Prison Market, 54 WM. & MARY L. REV. 2087, 2111 (2013) (“[S]ome states have started to require the provision of vocational services . . . .”). Harding doesn’t distinguish between outputs and outcomes, see text accompanying supra note 17, so when he refers to outputs here, he means something like outcomes.
Harding also suggests “intermediate outputs” as a synonym for “output-driven inputs,” HARDING, supra note 69, at 67–68; perhaps this concept is close to what I refer to as simply “outputs.” See also RADIN, supra note 17, at 15 (defining “output” and “intermediate outcome” differently). 159 Leonard Gilroy, Innovators in Action 2012: Creating a Culture of Competition to Improve Corrections, REASON FOUND. (May 31, 2012), available at http://reason.org/news/printer/ohiocorrections-competition. 160 Pub. L. No. 103-62, 107 Stat. 285 (codified in scattered sections of 5 U.S.C. and 31 U.S.C.); see also Matthew S. Schoen, Good Enough for Government Work?: The Government Performance Results Act of 1993 and Its Impact on Federal Agencies, 32 SETON HALL LEGIS. J. 455, 467 (2008). 161 Pub. L. No. 97-300, § 106(b)(1), 96 Stat. 1322, 1333 (creating 29 U.S.C. § 1516 (repealed by Workforce Investment Act of 1998, Pub. L. No. 105-220, § 199(b)(2), 112 Stat. 936, 1059)) (permissible performance measures for job-training organizations include “(A) placement in unsubsidized employment, (B) retention in unsubsidized employment, (C) the increase in earnings, including hourly wages, and (D) reduction in the number of individuals and families receiving cash welfare payments and the amounts of such payments”); Laurence E. Lynn, Jr., Requiring Bureaucracies to Perform: What Have We Learned from the U.S. Government Performance and Results Act (GPRA)?, 17 POLITIQUES ET MANAGEMENT PUBLIC 1, 3 (1999). 162 See infra Part IV.C.1. 163 See, e.g., Tennessee CCA 2007 contract, ¶ A.4.x.2), at 17 (“In the event of an escape resulting in whole or in part from Contractor’s failure to perform pursuant to the provisions of this Contract, the State may seek damages in a court of competent jurisdiction.”) (note that there’s no provision for paying for escapes not stemming from non-performance; ¶ A.4.x.1, at 17, only requires that the contractor “exercise its best efforts to prevent escapes”). 
rare.164 And to the extent there are outcome-based rewards or penalties, “the amounts involved commonly have little or no correlation with the true magnitude of what independent contractors accomplished or failed to accomplish,” and “the dollar value of the reward or sanction is often too trivial to encourage superior performance or to deter defective performance.”165

Even developing outcome measures hasn’t been a high priority.166 In 1998, not that long ago, Douglas McDonald and his coauthors identified two exceptional cases of performance-based compensation: the Bureau of Prisons’ contract with Wackenhut167 for the operation of the Taft Correctional Institution, “which contain[ed] provisions for an award-fee incentive worth up to 5 percent of paid invoices,” and a District of Columbia contract with CCA for the Correctional Treatment Facility, “which permit[ted] financial rewards for meeting targets based on performance indicators.”168

Florida would recently have taken a good step in this direction if the bill in question169 hadn’t been defeated. The bill would have required that private prison contracts make provision for measuring a number of dimensions of performance (though note that some of these are output measures): number of batteries, number of major disciplinary reports, percentage of negative random drug tests, number of escapes, percentage of inmates in “a facility that provides at least one of the inmate’s primary program needs,” and so on.170 The number of escapes also showed up in a more specific way: the contractor would have been required to reimburse the state for the costs of escapes.171 The Florida bill would also have required

164 See Pozen, supra note 77, at 282–83; Kenneth L. Avio, The Economics of Prisons, 6 EUR. J.L. & ECON.
143, 150 (1998); Thomas, supra note 134, at 107 (“[I]f there are contracts that include product-oriented requirements that go beyond mere evidence of participation, then they are contracts I have never read.”).
165 Thomas, supra note 134, at 109.
166 See Durham, supra note 20, at 67.
167 “Wackenhut Corrections Corp. changed its name to The GEO Group in November 2003 under the terms of a share purchase agreement with another company.” See Volokh, supra note 9, at 1229 n.131.
168 See MCDONALD ET AL., supra note 125, at 52.
169 Florida Senate, SB 2038, first engrossed version (Feb. 13, 2012), http://flsenate.gov/Session/Bill/2012/2038/BillText/e1/PDF.
170 Id. § 1, at 10–12 (creating FLA. STAT. § 944.7115(8)(f)(1)(a)–(r)).
171 Id. § 1, at 13 (creating FLA. STAT. § 944.7115(11)).

various performance measures for work release centers.172 (I discuss various other performance measures below.173)

The following sections develop these themes and discuss two distinct kinds of benefit from using performance measures. The first, discussed in Section B, is a pure accountability advantage: we, as citizens and policymakers, would know how well our prisons are doing, we’d be better informed in deciding which sector to choose, either systemwide or on discrete projects, and we could think more clearly about what prisons should be doing. The second, discussed in Section C, goes more to harnessing incentives to improve the system over time: incorporating such measures into contracts, and tying providers’ compensation to how well they do, would give providers a reason to care about quality and simultaneously let us grant them greater flexibility. Section D discusses the normative issues involved in choosing the actual measures.

B. Accountability, Neutrality, and Goal Setting

1. To Know What Works

We all want to improve prisons. But forget about that for a moment.
Even before any of these improvements were possible, performance measures would have the obvious effect of allowing us to measure performance. This would be a great step forward in researchers’ ability to conduct quality studies. We would have a better sense of which sector provides better quality; combine that with better cost studies that take into account the pitfalls described above,174 and we’d be better able to decide whether to be, or not be, one of the 19 states that ban private prisons entirely.175 If we do ban private prisons entirely, performance measures would help us These were: “(a) The percent of employment of supervised individuals; (b) The illegal substance use by supervised individuals; (c) The victim restitution paid by supervised individuals; (d) Compliance by supervised individuals with no-contact orders; (e) The number of serious incidents occurring at the facility; and (f) The number of absconders.” Id. § 1, at 12 (creating FLA. STAT. § 944.7115(8)(f)(2)(a)–(f)). 173 See infra Part III.D. 174 See supra Part II.A. 175 See supra note 18 and accompanying text. 172 Draft—Please do not circulate 32 VOLOKH determine which public prisons performed badly and where to look for improvement.176 2. To Implement Competitive Neutrality Suppose we decide not to ban private prisons entirely. Should we then contract out the entire prison system? Probably not: someone has to be able to run a facility if the current contractor has fallen down on the job or gone bankrupt,177 and given how concentrated the private prison industry currently is,178 it may not always be realistic to count on being able to easily bring in a competitor when this happens. How much of the system, then, should we privatize? The standard way to proceed is to choose particular prisons to privatize and put them up to bid to private firms, or to contract with private firms to use their own prisons. 
A more beneficial approach, though, would be to have a regime of “competitive neutrality,” where the public and private sector compete on the same projects.179 The best system may be one of mixed public and private management, where private programs “complement existing public programs rather than replace them.”180 (Health-care reformers’ advocacy of the “public option” in health insurance was premised on a similar idea: that public participation can make competition more fair by disciplining private providers more than they would discipline each other.181) 176 See Marc Holzer & Arie Halachmi, Measurement as a Means of Accountability, 19 INT’L J. PUB. ADMIN. 1921 (1996) (measurement improves accountability of public sector); Aloysius Bavon, Innovations in Performance Measurement Systems: A Comparative Perspective, 18 INT’L J. PUB. ADMIN. 491, 493, 502 (1995) (performance measurement arose as a result of the perceived inefficiency of the public sector). 177 HARDING, supra note 70, at 158 (“The state must in the last resort be able to reclaim private prisons.”); Michael J. Gilbert, How Much Is Too Much Privatization in Criminal Justice?, in PRIVATIZATION IN CRIMINAL JUSTICE, supra note 61, at 41, at 76–77. 178 Volokh, supra note 9, at 1237–38. 179 WILLIAM D. EGGERS, COMPETITIVE NEUTRALITY: ENSURING A LEVEL PLAYING FIELD IN MANAGED COMPETITIONS 6 (Reason Found., How-to Guide No. 18, 1998); Gaes, supra note 36, at 24. 180 Patrick Anderson et al., Private Corrections: Feast or Fiasco?, PRISON J., Oct. 1985, at 32, 38. 181 See JACOB S. HACKER, THE CASE FOR PUBLIC PLAN CHOICE IN NATIONAL HEALTH REFORM: KEY TO COST CONTROL AND QUALITY COVERAGE 1–2 (Berkeley Law Ctr. on Health, Econ., & Fam. Security, Dec. 16, 2008), available at http://ourfuture.org/report/case-public-plan-choicenational-health-reform; see also WILLIAM A. 
NISKANEN, JR., BUREAUCRACY AND REPRESENTATIVE GOVERNMENT 217 (1971) (“In the 1930’s, the primary case for the creation of public power authorities was to provide a ‘yardstick’ with which to evaluate private electric utility monopolies.”).

For instance, Gary Mohr, director of the Ohio Department of Rehabilitation and Correction, has talked about creating a “culture of competition” in corrections.182 Ohio has pursued a combination of outsourcing and insourcing: some public prisons have been sold or their management has been contracted out to the private sector, while one private prison has been taken in-house.183 The result, according to Mohr, is that one can “ratchet[] up the best practices that can be created from both the public sector and multiple private vendors.”184

But for this sort of system to work, we have to be able to fairly compare private-sector and public-sector bids before the fact. The cross-fertilization that’s supposed to result from competitive neutrality depends on flexibility; otherwise, both sectors will try to do the same thing. But, without performance measures, flexibility undermines the ability to do the comparative analysis of bids that’s necessary to successfully implement cross-fertilization; the most straightforward way of making efficiency comparisons without performance measures is to mandate that the private sector replicate every public-sector procedure, down to the tiniest detail.
And indeed, this is what Mohr did when selling the North Central Correctional Complex facility to the private sector.185 But with performance measures, and with an understanding of how proposed programs and methods translate into performance, he would have been able to take different proposals, translate them into expected performance, and thus have a basis for comparison, even if the proposals were radically dissimilar.186 (The beliefs about expected performance would then have to be verified by evaluating the winning contractor’s performance after the fact.)

182 Gilroy, supra note 159.
183 Id.
184 Id.
185 Id. (“[I]n the [request for proposals], . . . [w]e replicated the post assignments and the staffing pattern and the policies and the food requirements. We basically said, ‘you must identify a minimum of a 5 percent savings’ from exactly the cost of what it has cost us to operate North Central.”).
186 Ohio actually has performance metrics, which are a combination of output and outcome measures, covering “everything from violence indicators, to use of force indicators, to program completion indicators (GED, etc.), to recidivism data.” Id. But they apparently weren’t used in the way described above.
187 See supra Part II.A.1.

In particular, recall the problems involved in figuring out the public sector’s true costs:187 the same problems can make for unfair competitions if public providers’ bids don’t include the costs they bear that are paid for by other departments, their different tax treatment, and the like.188 So it’s not surprising that such a regime is rare in the United States.189

One of the advantages of competitive neutrality is that, as in Ohio, prisons can be both outsourced and insourced at different times, depending on who wins the contract, so particular prisons can “churn” between the public and private sectors.
The result, according to Richard Harding, would be a “process of positive cross-fertilization,”190 where best practices migrate from one sector to another. “The opening up of the private sector,” Harding writes, “may heighten awareness of how sloppy public accountability has often been in the past, leading to the creation of innovative mechanisms applicable to both the private and the public sectors.”191 In fact, Harding argues, systemic improvement has been one of the best consequences of privatization,192 so narrowly focusing on which sector is better in a static sense is almost beside the point.193

3. To Express What We Want

Measuring performance would do more than just let us know which sector is better and promote cross-fertilization by facilitating a competitive neutrality regime. On an even higher level, it would encourage governments to better conceptualize what makes for a good prison, an exercise that’s long overdue.194 Jon Vagg, for instance, argues that, in the UK, private prisons “were a key factor in persuading the administration that standards were necessary, if only for the purpose of monitoring contractual

188 See EGGERS, supra note 179, at 1, 8–11.
189 Thomas, supra note 134, at 81; id. at 86 (“I am aware of no example in the United States that reveals fair competition between public and private providers of correctional services. Until both of these policy failures are corrected, achieving many of the potential benefits of privatization will be impossible.”); Harding, supra note 141, at 334 (also rare in Australia and UK).
190 HARDING, supra note 70, at 115; see also id. at 162; Developments, supra note 9, at 1890–91; Gilroy, supra note 159.
191 HARDING, supra note 70, at 22–23.
192 Richard Harding, Private Prisons, in 28 CRIME AND JUSTICE: A REVIEW OF RESEARCH 265, 272–73, 331–36 (2001).
193 There remains the fear that, instead of system-wide improvement through cross-fertilization, we’ll get a race to the bottom, as Gaes worries.
See text accompanying supra note 142. But good performance measures help avoid that problem. 194 See DICKER, supra note 119, at 6 (“[P]ayment-by-outcome . . . compels commissioners to state explicitly the goals of policy.”). 189 Draft—Please do not circulate 2013] PERFORMANCE MEASURES 35 compliance.”195 And that example isn’t just a fluke. Prisons have been operating for centuries,196 and yet it was the experience of privatization that spurred the development of performance measures, as private-prison critics made arguments that privatization harmed quality and private-prison advocates made arguments to the contrary.197 Now that performance measures exist, one can use them to evaluate both the private and the public sectors, to the benefit of both. C. For Performance-Based Contracting With performance measures, we can go further than just knowing how good public and private prisons are, implementing competitive neutrality, and formulating the proper goals of the prison system—important as all that is. We can also incorporate the performance measures into contracts and make compensation contingent on performance, finally giving prison providers strong incentives to deliver high quality. 1. Limited Current Efforts Performance-based compensation is being implemented in the United States to a very limited extent. As noted above,198 5% of the contract price at the Bureau of Prisons’ Taft facility was performance-based. Taft was a demonstration project, which should give one a sense of how new this enterprise is.199 The UK is now on the forefront of performance-based compensation, which it calls “[p]ayment-by-outcome” or “payment-byresults.”200 The idea was floated in a 2008 Conservative Party 195 VAGG, supra note 78, at 307. See, e.g., RALPH B. PUGH, IMPRISONMENT IN MEDIEVAL ENGLAND (1968); G. GELTNER, THE MEDIEVAL PRISON: A SOCIAL HISTORY (2008); Edward M. 
Peters, Prison Before the Prison: The Ancient and Medieval Worlds, in THE OXFORD HISTORY OF THE PRISON: THE PRACTICE OF PUNISHMENT IN WESTERN SOCIETY 3 (Norval Morris & David J. Rothman eds., 1995). 197 HARDING, supra note 70, at 22; GAES ET AL., supra note 7, at xi, 153, 180; cf. NISKANEN, supra note 181, at 217 (“[T]he case for the private supply of some public services is . . . to provide a yardstick to evaluate the performance of budget-maximizing monopoly bureaus.”). 198 See text accompanying supra notes 167–168. 199 Also, in Kansas, SB14 rewards community corrections agencies for reductions in recidivism beyond a set target. See CONSERVATIVE PARTY, PRISONS WITH A PURPOSE: OUR SENTENCING AND REHABILITATION REVOLUTION TO BREAK THE CYCLE OF CRIME 74 (Security Agenda, Policy Green Paper No. 4, 2008). 200 DICKER, supra note 119, at 6. 196 Draft—Please do not circulate 36 VOLOKH Green Paper201 and, once the Conservative Party came into power, developed in a 2010 Green Paper from the Ministry of Justice.202 Payment-by-results is being introduced in three prisons: two private prisons, Peterborough203 and Doncaster,204 and a public prison, Leeds,205 though the plan is to extend the model to all prisons by 2015.206 The measure is the 12-month reconviction rate,207 compared to a matched comparison group. 
At Peterborough, performance-based “[p]ayments start when the reconviction rate of the intervention group is 7.5% less than that of the matched comparison group, with increasing returns up to a maximum rate of 13%.”208 “The Peterborough pilot is the first in the world where private investors have assumed financial risk for reducing reoffending.”209 In addition to having access to a range of prison programs to prevent recidivism, offenders at Doncaster are assigned case managers to support them during their sentence and after release, offering advice and help on employment, housing, and benefits issues.210 (Earlier experience with payment-by-results was “primarily limited to the welfare to work market[,] where success [was] varied and limited.”211) A parallel program focused on finding jobs for offenders, called Job Deal, compensates providers based on employment rates.212 Compensation is 70% fixed and 30% conditional; a third of the conditional payment is for an output measure, “successfully 201 CONSERVATIVE PARTY, supra note 199, at 49, 72–75. UK MIN. OF JUST., BREAKING THE CYCLE: EFFECTIVE PUNISHMENT, REHABILITATION AND SENTENCING OF OFFENDERS (2010). 203 Id. at 13. 204 Wesley Johnson, Payment-by-Results Project Bid to Cut Reoffending, INDEP., Oct. 11, 2011, available at http://www.independent.co.uk/news/uk/crime/paymentbyresults-project-bid-to-cutreoffending-2368793.html; UK Min. of Just., Innovative Rehabilitation—Payment by Results at Doncaster Prison, Oct. 13, 2011, available at https://www.gov.uk/government/news/innovativerehabilitation-payment-by-results-at-doncaster-prison. 205 Joe Inwood, State-Run Leeds Prison to Be Paid on Results, BBC NEWS, Oct. 27, 2011, available at http://www.bbc.co.uk/news/uk-england-leeds-15479570. Leeds Prison is also called Armley. Id. 206 Id. 207 DICKER, supra note 119, at 13 & 30 n.29. 208 Id. at 13. At Doncaster, payments start when the reduction is 5%. UK Min. of Just., supra note 204. 209 DICKER, supra note 119, at 13. 
210 There’s also a 24-hour help line. Johnson, supra note 204; UK Min. of Just., supra note 204. 211 CHRIS NICHOLSON, REHABILITATION WORKS: ENSURING PAYMENT BY RESULTS CUTS REOFFENDING 5 (CentreForum, 2011); see also id. at 21–24 (discussing the experience with payment by results in the welfare to work context, characterizing the “Pathways to Work” program as unsuccessful and the “Enterprise Zones” program as reasonably successful). 212 DICKER, supra note 119, at 13. 202 Draft—Please do not circulate 2013] PERFORMANCE MEASURES 37 enrolling offenders” in the program; another third is for “a combination of outputs and processes” such as “helping clients open bank accounts”; and another third is “for achieving ‘hard outcomes.’”213 Note, though, that even these “hard outcomes” are softer than they might seem, because they include finding a job but also include “enrolling in further learning.”214 Some additional payment-by-results programs have also been proposed by the government or by the Social Market Foundation, focusing either on reoffending rates or on other outcomes or outputs like “drug use cessation or employment.”215 2. The Range of Possible Contracts a. General Considerations These examples suggest how performance-based contracts could be structured. The contract could provide that the contract price is not just the usual flat per-diem per prisoner,216 but an incentive payment that—as a simple example—could vary (positively) with how many inmates find jobs or (negatively) with how many inmates are rearrested within two years.217 Outcome measurements may not always be available for all dimensions of quality, so some measurement of inputs may continue to be necessary.218 But as far as possible, the ideal should be to make compensation contingent not on inputs like guard training, or 213 Id. at 14. Id. Id. 216 Dolovich, supra note 5, at 474; see also Tennessee CCA 2007 contract, supra note 66, ¶ C.3, at 22 (laying out schedule of per-diems). 217 Kenneth L. 
Avio, On Private Prisons: An Economic Analysis of the Model Contract and Model Statute for Private Incarceration, 17 NEW ENG. J. ON CRIM. & CIV. CONFIN. 265, 294–95 (1991); Daniel L. Low, Nonprofit Private Prisons: The Next Generation of Prison Management, 29 NEW ENG. J. ON CRIM. & CIV. CONFINEMENT 1, 46 (2003); Gaes, supra note 36, at 23 (citing GAES ET AL., supra note 7); Kyle, supra note 158, at 2111–12. 218 Durham suggests that “process-oriented monitoring methods” continue to be used: “For instance, a system of frequent accounting of staffing levels can detect shortfalls in staffing that may lead to a diminution in service provision. . . . If the change in staffing levels is detected relatively quickly, efforts can be made to either restore institutional staff to initial levels or to alter the evaluation design.”). Durham, supra note 20, at 66; see also Sidney A. Shapiro & Rena Steinzor, Capture, Accountability, and Regulatory Metrics, 86 TEX. L. REV. 1741, 1775, 1779 (2008); DICKER, supra note 119, at 16 (suggesting intermediate outcomes such as drug misuse, stability of relationships, or becoming debt-free). Cf. Shapiro & Steinzor, supra, at 1768 (in context of EPA and GPRA); GEN. ACCOUNTING OFFICE, PERFORMANCE-BASED ORGANIZATIONS: LESSONS FROM THE BRITISH NEXT STEPS INITIATIVE 7 (1997) (in context of British Next Steps agencies). 
214 215 Draft—Please do not circulate 38 VOLOKH even on outputs like the number of GEDs granted or the number of rehabilitative programs offered or ACA accreditation,219 but primarily on actual outcomes like the extent of unconstitutional conditions or how well prisoners are actually rehabilitated or how many prisoners get jobs.220 The amount of the bonus can be a flat fee, or it could be more complicated—in the case of recidivism bonuses, the bonus could be inmate-specific, depending on “the probability and social cost of recidivism for each inmate”—or it could even be determined by competitive bidding.221 It’s often charged that private prisons have little incentive to invest in rehabilitation, and in fact have an incentive to try to increase recidivism, so that they can get (at least some of) the same inmates back later; if this is so, the bonuses should be at least high enough to counteract this incentive, so rehabilitating inmates is affirmatively attractive to prison firms.222 Though I focus here on monetary rewards and penalties, there are other possibilities. High performance could, instead of increasing a firm’s compensation in the individual contract, merely confer a reputational benefit, increasing its probability of winning future bids.223 One could give out certificates224 or “even simply publiciz[e] league tables of recidivism performance.”225 Or one could reward good performers by giving them more flexibility in future contracts.226 219 See MCDONALD ET AL., supra note 125, at 49 (“Correctional administrators . . . reported that 57 of the contracts in force at the end of 1997 required that facilities achieve ACA accreditation within a specified time.”). 220 Kyle, supra note 158, at 2112–13. 221 Low, supra note 217, at 46; see also infra Part IV.C.3. 222 Pozen, supra note 77, at 283–84; Avio, supra note 164, at 150; James T. Gentry, Note, The Panopticon Revisited: The Problem of Monitoring Private Prisons, 96 YALE L.J. 353, 362–63 (1986). 
223 DICKER, supra note 119, at 25; CONSERVATIVE PARTY, supra note 199, at 73–74 (describing Avon Park Youth Academy in Florida as “a prison rewarded by results,” even though its only reward was having its contract renewed, “a decision clearly influenced” by its lower recidivism results).
224 Burt S. Barnow, The Effects of Performance Standards on State and Local Programs, in EVALUATING WELFARE AND TRAINING PROGRAMS 277, 286 (Charles F. Manski & Irwin Garfinkel eds., 1992).
225 Pozen, supra note 77, at 283.
226 Barnow, supra note 224, at 286.

b. Rewards or Penalties

Going back to monetary incentives, one can choose between penalties for bad performance and rewards for good performance,227 though the difference needn’t be that important. Consider a “rewards” contract that offers a $1 per diem reward for each unit of quality on a hypothetical zero-to-ten scale, so the potential reward is $0 to $10. Suppose Acme Corrections Corp. expects to achieve a quality level of 5 at a total cost of $35 per diem.228 Then it would be willing to submit a bid of $30 or above for the project; it would just cover its costs with the $30 payment plus the $5 reward. (Recall that prison bids are bids on how much money the contractor will get from the government; a $30 per diem winning bid means that the contractor will be paid $30 per inmate-day.)
Suppose bidding is competitive, other firms have similar technology, and Acme is the most efficient firm; then Acme wins the auction with its $30 bid.229 (A less efficient firm, say one that would require $36 per diem to achieve quality level 5, wouldn’t bid below $31, so Acme, as a more efficient firm, would be automatically rewarded up front for its higher quality by having a better chance of winning the auction.230 The bids don’t tell us the true social cost, the true cost to the government, or the true quality—that requires waiting for the actual realized level of quality, which determines the level of the reward—but they do signal which firm is (or believes that it is) more efficient.231) Now consider an alternative “penalties” contract that offers a $1 penalty for each unit of quality below 10 (e.g., 7 units of quality lead to a $3 penalty). This contract has equivalent incentive effects to the previous one: a provider will invest in a unit of quality as long as its cost of doing so is under $1.232 Therefore, these incentives, as before, make Acme expect to achieve the same quality level of 5, which we have seen carries a total cost of $35 per diem.

227 Thomas, supra note 134, at 108–09.
228 This is taking into account the incentive effects of the $1-per-unit reward. Perhaps earlier, with fixed-price contracts, Acme only achieved, say, a quality level of 3 at a total cost of $32.
229 I discuss auction-theoretic considerations like the winner’s curse at text accompanying infra note 249.
230 See Gentry, supra note 222, at 363.
231 See also text accompanying infra notes 249–250.
232 Here, I’m abstracting away from behavioral factors that might make rewards more attractive than punishments. See BEHAVIORAL LAW AND ECONOMICS (Cass R. Sunstein ed., 2000).

Now Acme is willing to submit a bid of $40 or above for the project; it would just cover its cost with the $40 payment minus the $5 penalty.
Again, with the competitive bidding assumptions listed above, Acme wins the auction with its $40 bid. So even though the contracts look different, they have essentially identical incentives, and any superficial differences between them are, roughly speaking, ironed out in the bidding process. The provider’s degree of risk aversion doesn’t change the result. The government can offer contracts with penalties, but then it will pay more to the winning bidder; or it can offer contracts with rewards, and the winning bidder will be satisfied with less. (One difference might be in the timing of the payments: if the base price is paid up front while rewards or penalties are processed some time later, the first contract is somewhat less valuable than the second because its payments are more delayed.233) c. Controlling for Baselines In the same way, it probably doesn’t make a huge difference whether the compensation takes into account the baseline level of quality. Controlling for baselines is a huge issue in the literature on performance measures.234 For instance, an early paper on performance measures, by Gloria Grizzle and coauthors,235 discussed methodological issues regarding what makes for a good performance measure. A large part of the discussion focused on doing the proper econometric modeling to figure out the causal factors behind a performance measure. Figuring out these causal factors is important at least for two reasons (beyond merely understanding the process). One is to have a sense of what input or output measures to use if 233 See text accompanying infra note 384. See, e.g., Kyle, supra note 158, at 2112 (controlling for “age, prior criminal history, and sex”); id. 
at 2113 & n.136 (controlling for crime rates); DICKER, supra note 119, at 20 & fig.2 (use “performance of control groups” or a whole range of control methods); Barnow, supra note 224, at 281 (“[P]erformance management systems [could] measure outcomes relative to [a] standard [that is] set to take into account what would have occurred in the absence of the program . . . .”); GAES ET AL., supra note 7, at 159 (citing C.J. Heinrich, Outcomes-Based Performance Management in the Public Sector: Implications for Government Accountability and Effectiveness, 62 PUB. ADMIN. REV. 712 (2002) (questioning “whether outcome measures in the absence of a control or comparison group can provide meaningful information” in context of Job Training Partnership Act). 235 GLORIA A. GRIZZLE ET AL., BASIC ISSUES IN CORRECTIONS PERFORMANCE 4 (Nat’l Inst. of Just., 1982). 234 Draft—Please do not circulate 2013] PERFORMANCE MEASURES 41 the outcome measures aren’t available in a given case.236 Another is to be able to properly assign credit, so providers who get a bad (or good) population of inmates aren’t blamed (or praised) for bad (or good) results.237 Similarly, Gerald Gaes and his coauthors argue that “social scientists should push ultimate outcomes as far as they can be pushed,”238 but that, in light of the other factors that affect recidivism, “[i]t is also desirable to have more direct measures of intermediate changes to human behavior that precede desistance, and that may be influenced by criminal justice interventions.”239 They don’t directly list desirable performance measures—they give an example of performance measures for the specific element of “Prison Security Performance,”240 though they stress that one should do a similar exercise for other elements of prison performance such as health care.241 The main characteristic of their approach is its emphasis on adequately modeling prison performance in terms of individual-level and institutional-level independent variables, so that 
one can properly attribute credit where credit is due, avoid blaming prisons for factors beyond their control like the characteristics of the inmates, and figure out what inputs are actually important in producing prison performance.242 For instance, for health care, rather than measure (or in addition to measuring) the prevalence of a disease in the prison, which indicates the potential for transmission, it would be useful to use the number of cases in the incoming population as a baseline, and measure the number of new cases.243 Is all this necessary? Let’s do our numerical example again: Consider the rewards contract discussed above, with a $1 per diem reward for every unit of quality on a zero-to-ten scale;244 the winning bidder, who expected to deliver quality level 5 at a cost of $35, would have won the contract with a bid of $30. Now consider 236 See infra Part III.D. GRIZZLE ET AL., supra note 235, at 91. 238 GAES ET AL., supra note 7, at 7. 239 Id. 240 Id. at 142 tbl.10.1. 241 Id. at 141. 242 Id. at 144 (discussing differences with Logan model); see also id. at 4 (suggesting “develop[ing] an expected rate of crime for a community or an expected rate of misconduct for a prison based on characteristics of the people and inmates”). 243 Id. at 38. 244 See text accompanying supra notes 228–229. 237 Draft—Please do not circulate 42 VOLOKH a rewards contract that controls for the baseline level of quality; suppose the expected level of quality for this prison is 4, so a quality level of 5 would yield a reward of $1. The only effect of the quality adjustment is to reduce reward payments by $4. A bidder who was willing to bid $30 on the unadjusted contract would be willing to bid $34 on the adjusted contract, to take into account the $4 reduction in the expected reward. Either way, the payoff is the same to the contractor—and the price is the same to the government. 
The government saves $4 on reward payments but pays it all out again in the base contract price that emerges from the auction.

Jeremy Bentham argued against controlling for baselines two centuries ago:

I would make [the contractor] pay so much for every one that died, without troubling myself whether any care of his could have kept the man alive. To be sure he would make me pay for this in the contract; but as I should receive it from him afterwards, what it cost me in the long run would be no great matter. . . .

. . . [Under this system,] you need not doubt of his fondness of these his adopted children; of whom whosoever may chance while under his wing to depart this vale of tears, will be sure to leave one sincere mourner at least . . . .245

To be sure, the bidder has to have a way to figure out that the expected level of quality is 4. This requires two things. First, the bidder should have a belief about the proper model to predict the baseline quality level; different bidders can have competing beliefs about reality that lead them to different predictions. Second, it needs to have enough information about the population of inmates to be able to plug into its model. Where either of these is absent, the contractor won’t know how much to bid—this might lead to excessive payments from the taxpayer’s point of view or insufficient payments from the contractor’s point of view—but the incentive effects will remain the same.

So while adjusting for the baseline is relevant for various reasons—it allows one to more accurately assign praise or blame, rank different facilities,246 and so on—it doesn’t seem absolutely necessary for a compensation scheme to provide the proper incentives for improvement.

245 Gentry, supra note 222, at 362 n.52 (quoting 1 J. BENTHAM, PANOPTICON, at 71–73 (Dublin 1791)).
246 See GAES ET AL., supra note 7, at 144 (concern with rank-ordering institutions).
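The arithmetic behind the rewards, penalties, and baseline-adjusted contracts can be checked with a short sketch. The Python below is only an illustration, not part of the original analysis: it uses the figures from the running example ($35 expected cost, expected quality of 5, $1 per unit of quality, a baseline of 4) and assumes, as the text does, competitive bidding in which the winner bids down to the price that just covers its expected cost. All function and variable names are hypothetical.

```python
# Figures from the running example in the text (dollars per diem).
COST = 35.0        # Acme's expected total cost at quality level 5
QUALITY = 5        # Acme's expected quality on the zero-to-ten scale
RATE = 1.0         # dollars paid (or withheld) per unit of quality
MAX_QUALITY = 10   # top of the quality scale
BASELINE = 4       # expected baseline quality in the adjusted contract


def rewards_adjustment(quality):
    """Rewards contract: +$1 for each unit of quality."""
    return RATE * quality


def penalties_adjustment(quality):
    """Penalties contract: -$1 for each unit of quality below 10."""
    return -RATE * (MAX_QUALITY - quality)


def baseline_adjustment(quality):
    """Baseline-adjusted rewards: +$1 per unit of quality above the baseline.

    Assumption: the text does not say what happens below the baseline;
    this sketch simply lets the adjustment go negative.
    """
    return RATE * (quality - BASELINE)


def break_even_bid(adjustment, cost=COST, quality=QUALITY):
    """Lowest base price at which expected payments just cover expected cost."""
    return cost - adjustment(quality)


CONTRACTS = {
    "rewards": rewards_adjustment,
    "penalties": penalties_adjustment,
    "baseline-adjusted": baseline_adjustment,
}

for name, adjustment in CONTRACTS.items():
    bid = break_even_bid(adjustment)
    net = bid + adjustment(QUALITY)  # what the government expects to pay in total
    slope = adjustment(QUALITY + 1) - adjustment(QUALITY)  # pay for one more unit
    print(f"{name}: bid ${bid:.2f}, net payment ${net:.2f}, marginal incentive ${slope:.2f}")
```

Under these assumptions the break-even bids come out at $30, $40, and $34, the expected net payment is $35 in every case, and the payment for one additional unit of quality is $1 under all three contracts, which is the sense in which the contracts differ in form but not in incentives.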
Moreover, risk aversion makes a difference here,247 but not in the way one would expect. Controlling for baselines might even increase risk, depending on the uncertainty in the calculation of the baseline.248 If the contractor gets too little, there is the concern that it might not be able to fund the project and might go bankrupt within the contractual term. But this is the same concern that happens with all bidding. Whether or not we adjust the payment for the baseline, the winning bid under a low-bid system will be subject to the “winner’s curse.”249 As a simple example, consider many firms with identical technology. They each have slightly different models for predicting how profitable a prison will be, and firms with higher predictions will submit lower bids. At most one of these models is correct; everyone else’s model is incorrect to some degree. The lowest bid will thus come from the bidder who makes the most wildly incorrect overestimate of his profits. Sophisticated bidders adjust their bids to take the winner’s curse into account, but the winning bidder might either be unsophisticated or end up not having adjusted his bid enough. So the threat of contractors who go bankrupt—or of contractors who bid low and then try to hold the government up for more money250—is real. But, again, this happens regardless of whether we adjust for baselines. The solution is instead to require bonds, to rely on a track record of past performance (and restrict complete newcomers to small projects until they’ve proven themselves), or otherwise to try to weed out financially unsophisticated or untrustworthy parties.

247 Recall that it didn’t in the reasoning establishing the equivalence of reward and penalty contracts, see supra Part III.C.2.b.
248 Without controlling for baselines, the winning contractor gets a contract price of P and a performance-based reward R, bears costs of C, and his payoff is P + R – C; the variance of the payoff is var(R) + var(C) if R and C are independent. Now let’s control for baselines; for simplicity, assume this just involves subtracting an adjustment A from the reward, where A is determined by the expected baseline level of performance. The contract price becomes P', and the contractor’s new payoff is P' + R – A – C. If A has no randomness—everyone knows the government’s formula and everyone knows the underlying data that the government is plugging into the formula—then var(A) = 0 and the variance of the new payoff is the same var(R) + var(C). But if the data or the formula is somewhat uncertain, var(A) is positive, so the variance of the new payoff is var(R) + var(A) + var(C) if R, A, and C are independent, which is greater. This doesn’t necessarily have to happen. Suppose, for instance, that R, A, and C aren’t independent, but instead there’s some negative covariance among R, A, and C. Then the randomness of A might cancel out some of the randomness of R and C, and the adjustment can indeed reduce risk. The point in the text, though, is that this needn’t be the case, and the adjustment, though often defended as a risk-reducing move for contractors, could end up doing the opposite.
249 See, e.g., PATRICK BOLTON & MATHIAS DEWATRIPONT, CONTRACT THEORY 283–85 (2005).
250 See Robert W. Poole Jr., Privatization, CONCISE ENCYCLOPEDIA OF ECONOMICS (2007), http://www.econlib.org/library/Enc/Privatization.html; Mary Sigler, Private Prisons, Public Functions, and the Meaning of Punishment, 38 FLA. ST. U. L. REV. 149, 155 (2010); Jody Freeman, The Private Role in Public Governance, 75 NYU L. REV. 543, 574 (2000). See generally OLIVER HART, (continued next page)

d. Discrete vs.
Continuous Measures Note that, in the preceding example, the contract price varied continuously with the level of quality.251 Another possibility would have been to use a binary compensation scheme, where the reward or penalty is contingent on whether one reaches a particular target. This could look like “Get a fixed reward only if you achieve less than 50% recidivism.”252 These binary schemes, while easier to implement, are problematic in several ways. Providers who don’t expect to be able to reach anywhere near the target have little incentive to try to achieve anything at all.253 Providers who do expect to be able to reach the target quite comfortably have little incentive to try to achieve anything additional.254 Providers who may or may not be able to reach the target are subjected to more risk than they would bear under a continuous scheme.255 Perhaps a large corporation might act somewhat risk-neutrally, so risk won’t matter; but smaller firms or nonprofits may refrain from bidding, or may require more money FIRMS, CONTRACTS AND FINANCIAL STRUCTURE (1995) (discussing opportunistic behavior in contract relationships). 251 Well, the example as worded involved discrete jumps, but one can easily imagine the prorated version. The “continuous” scheme is also called a “distance travelled” scheme. DICKER, supra note 119, at 16; see text accompanying infra notes 411–414, 425. 252 See also HARDING, supra note 70, at 68 (“x per cent of participants [in a remedial literacy class] reaching attainment level y in z months”). 253 DICKER, supra note 119, at 19 (a continuous measure “may incentivise providers to engage with high-risk offenders who are unlikely to achieve absolute desistance”); HARDING, supra note 70, at 68. 254 On the other hand, incentives are very large for those who could be just under the cutoff but could also reach the cutoff; but even then, unless the cutoff is a magical point, it’s probably more socially optimal to provide continuous incentives. 
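The incentive problem with a binary target can be made concrete with a small sketch. The Python below is illustrative only: the 50% recidivism target comes from the example in the text, but the size of the fixed reward, the per-point rate of the continuous schedule, and all function names are assumptions chosen for the illustration. It compares how much extra reward a provider earns from cutting recidivism by one more percentage point under each schedule.

```python
TARGET = 50.0         # recidivism target from the text's example, in percent
FIXED_REWARD = 50.0   # hypothetical size of the all-or-nothing reward
RATE = 1.0            # hypothetical continuous reward per point below 100%


def binary_reward(recidivism):
    """Fixed reward only if recidivism is below the target."""
    return FIXED_REWARD if recidivism < TARGET else 0.0


def continuous_reward(recidivism):
    """Reward prorated to every percentage point of recidivism avoided."""
    return RATE * (100.0 - recidivism)


def marginal_gain(schedule, recidivism, step=1.0):
    """Extra reward from lowering recidivism by `step` percentage points."""
    return schedule(recidivism - step) - schedule(recidivism)


# A provider far above the target, one sitting at the cutoff,
# and one already comfortably below it.
for level in (80.0, 50.0, 30.0):
    print(
        f"recidivism {level:.0f}%: "
        f"binary gain {marginal_gain(binary_reward, level):.2f}, "
        f"continuous gain {marginal_gain(continuous_reward, level):.2f}"
    )
```

With these hypothetical numbers, the binary schedule pays nothing for improvement from 80% or from 30% and pays the whole reward for the single point that crosses the cutoff, while the continuous schedule pays the same amount for every point.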
255 Kyle also notes the following advantage of a sliding scale: it “would reduce the likelihood that private companies would receive an undeserved windfall—the farther in standard deviations from the mean the private prison is, the more likely a causal relationship that should be rewarded exists.” Kyle, supra note 158, at 2112. More accurately, this depends on the likely effect of rehabilitative measures versus the likely magnitude of unobserved factors: it could be that a truly exceptional performance in fact reflects an unusually (and unobservedly) good or rehabilitable crop of inmates. Draft—Please do not circulate 2013] PERFORMANCE MEASURES 45 to take the project, or may be reluctant to try high-expected-value but risky strategies.256 (Of course, one could also imagine intermediate reward schemes: for example, the reward could be almost flat for any level of recidivism above 50% and increase rapidly at or below 50%, for instance “Get a reward of $0.01 for every percentage-point reduction of recidivism below 100% and down to 50%, and then a reward of $1.00 for every percentage-point reduction beyond 50%.”257 British performance contracts, where payments don’t start until the decrease in recidivism is 5% or 7.5%, and where payments are capped once the decrease is high enough, fit this mold.258 At this point I won’t do anything more than signal the existence of such contracts, though the optimal slope of the compensation scheme is something I’ll return to below when I discuss risk allocation.259) The same is true of penalties that may occur during the contractual term. Governments can terminate their contracts260—this is a form of binary scheme—though this is a rare remedy that tends to be reserved for the most extreme abuses.261 Providing for graduated financial penalties for abuses of different severity is probably a better solution than merely providing for contract rescission, because draconian penalties are less likely to be used. 
Not that termination isn’t appropriate in extreme cases—governments should always retain the ability to take over a prison if a contract is terminated;262 the need to retain a credible threat of termination is one reason to prefer that governments, not prison firms, own the prisons.263

256 See also infra Part IV.B.2. Some also mention the possibility that the public could see the continuous measure as being “too lenient.” DICKER, supra note 119, at 20.
257 DICKER, supra note 119, at 24 (“minimum threshold of achievement that providers must attain before payments commence.”); id. at 25 (discussing a “target accelerator,” where increases are rewarded at an increasing rate).
258 See supra note 208 and accompanying text.
259 See infra Part IV.B.2.
260 See Tennessee CCA 2007 contract, supra note 66, ¶ D.3, at 24 (“The State may terminate this Contract without cause for any reason.”); id. ¶ D.4, at 24 (“If the Contractor fails to properly perform its obligations under this Contract in a timely or proper manner, or if the Contractor violates any terms of this Contract, the State shall have the right to immediately terminate the Contract and withhold payments in excess of fair compensation for completed services.”).
261 See Developments, supra note 9, at 1883–84; Dolovich, supra note 5, at 495–500.
262 See text accompanying supra note 177.
263 See Levinson, supra note 24, at 90.

3.
The Feasibility of Merit Pay in the Public Sector

Note, also, that while I’ve been primarily concentrating on incentives for private firms, there’s no inherent reason why performance-based compensation can’t also be considered for public prison wardens264—consider the example of Leeds noted above265—especially if we simultaneously pursue competitive neutrality.266 As John Donahue says, “the fundamental distinction is between competitive output-based relationships and noncompetitive input-based relationships rather than between profit-seekers and civil servants per se.”267

Proposals to reward public servants for high performance aren’t rare,268 and merit-based compensation in the public sector has increased in recent years,269 but it’s still hard to find in corrections.270 Researchers differ on how feasible merit pay is in the public sector;271 I won’t resolve the argument here, except to note that the Government Performance and Results Act of 1993 has a procedure by which agencies can make proposals “to waive administrative procedural requirements and controls, including specification of personnel staffing levels, limitations on compensation or remuneration, and prohibitions or restrictions on funding transfers . . . in return for specific individual or organization accountability to achieve a performance goal.”272 Any such proposal, according to the statute, must “describe the anticipated effects on performance resulting from greater managerial or organizational flexibility, discretion, and authority, and . . . quantify the expected improvements in performance resulting from any waiver,”273 “precisely express the monetary change in compensation or remuneration amounts, such as bonuses or awards, that shall result from meeting, exceeding, or failing to meet performance goals,”274 and be “endorsed by the agency that established the requirement.”275 Just reading the statutory language—and this is a statute that purports to encourage flexibility—doesn’t exactly give one confidence that public-sector flexibility is easy to come by, at least in the federal system.

At the very least, though, to the extent performance-based compensation is a good idea in the private sector, it may well also be a good idea in the public sector.276 How feasible that is depends on the relevant state or federal law.

264 Rick Hills, Merit Pay for Prison Wardens?, PRAWFSBLAWG, Mar. 3, 2008, http://prawfsblawg.blogs.com/prawfsblawg/2008/03/tying-the-salar.html.
265 See text accompanying supra note 205.
266 See text accompanying supra notes 179–192.
267 JOHN D. DONAHUE, THE PRIVATIZATION DECISION: PUBLIC ENDS, PRIVATE MEANS 82 (1989) (italics omitted).
268 See NISKANEN, supra note 181, at 201–09; Lynn, supra note 161, at 11; Barnow, supra note 224, at 307–08; cf. also David N. Figlio & Lawrence W. Kenny, Individual Teacher Incentives and Student Performance, 91 J. PUB. ECON. 901 (2007) (examining effects of teacher merit pay).
269 See Jon D. Michaels, Privatization’s Progeny, 101 GEO. L.J. 1023, 1048–49 & nn.124–25 (2013).
270 Thomas, supra note 134, at 109.
271 Compare Harding, supra note 141, at 304 (“The financial incentive should drive performance in a way that is impossible in the state-funded public sector.”), and MCDONALD & PATTEN, supra note 151, at xxvii (“When structuring contracts, [governments] have opportunities to create incentives and mechanisms for accountability that are more difficult to implement in existing public organizations.”), with GAES ET AL., supra note 7, at 151 (“There is certainly no reason why public administrators cannot award bonuses to the best performing public prison managers and their employees, while also demoting, firing, or transferring the managers who are substandard.”), and id. at 180.
272 31 U.S.C. § 9703(a).

D.
What Measures to Choose

The earlier discussion of how to define recidivism277 shows that a lot rides on choosing the outcome measures judiciously. This applies across the board, not just to recidivism.

This section considers two distinct aspects of performance measures. The first is that wherever outcome measures have been used, output measures haven’t been abandoned. The second is that what outcomes to measure—and even whether something counts as an output or outcome measure—is inevitably a value-laden question, which must be resolved for a performance-based compensation scheme to go forward. The inevitable incompleteness of outcome measures—and therefore the need to supplement outcomes with outputs—can give rise to undesirable strategic behavior, which I discuss in a later section.278

273 Id. § 9703(b).
274 Id. § 9703(c).
275 Id. § 9703(d).
276 Some of the disadvantages of performance-based compensation may apply with different force in the public than in the private sector. For instance, the concern that market incentives will discourage public-interested people from entering the industry, see infra Part IV.B.1, seems to not apply at all to private providers, who are presumably already profit-motivated.
277 See text accompanying supra notes 116–122.
278 See infra Part IV.C.2. This section only covers what measures should rationally be chosen, not the real-world possibilities for manipulation in the choice of goals. That sort of strategic behavior is covered infra Part IV.C.1.

Adopting specific outcomes to measure is equivalent to adopting what John DiIulio calls an “operational” goal—“an image of a desired future state of affairs that can be compared unambiguously to an actual or existing state of affairs.”279 “‘Improving the quality of public education in America’ is a nonoperational goal; ‘Increasing the average verbal and math SAT scores of public school students by 20% between the year 1992 and the year 2000’ is an operational goal.”280 Similarly, “[r]eforming criminals” is nonoperational, while “[d]oubling the rate of inmate participation in prison industry programs” is operational.281 That last goal was output-based, but there’s no reason we can’t, as in the education example, adopt an outcome-based goal—we could just agree on a convenient if arbitrary measure of how well criminals are reformed, such as the two-year reconviction rate. Moreover, there’s no reason to adopt a numerical target as the goal (which would be binary); the goal might merely be (thinking more continuously) to reduce the rate as far as possible.282 And there’s no reason to adopt a unique goal: multiple operational goals can be implemented in one part of an overall index that determines compensation.283

A useful way to explore this question is to examine some existing prison performance measures.

Perhaps one of the oldest formal approaches284 to measuring prison performance is the Correctional Institutions Environment Scale285 developed by Rudolf Moos in the late 1960s286 and often used in the 1970s.287

279 John J. DiIulio, Jr., Measuring Performance When There Is No Bottom Line, in PERFORMANCE MEASURES FOR THE CRIMINAL JUSTICE SYSTEM 142, 144 (John J. DiIulio et al. eds., Bur. of Just. Stats. 1993).
280 Id.
281 Id.
282 See text accompanying supra notes 251–256.
283 Of course, one should also set the weights to be put on the various measures in the index. See infra Part IV.A. Cf. also Barnow, supra note 224, at 284 (“Even if the program has a single objective, it may be advantageous to use several measures as proxies if an ideal measure cannot be developed.”); GRIZZLE ET AL., supra note 235, at 80.
284 A survey article in 1975 reviewed 231 studies of particular performance measures, but at that time, in the authors’ opinions, there had apparently never been any comprehensive approach. (Presumably the Moos approach, if it was considered, was thought to be insufficiently comprehensive or not performance-oriented.) The American Correctional Association had published comprehensive standards in the late 1970s, but they were primarily process-oriented. GRIZZLE ET AL., supra note 235, at 4 (citing DOUGLAS LIPTON ET AL., THE EFFECTIVENESS OF CORRECTIONAL TREATMENT: A SURVEY OF TREATMENT EVALUATION STUDIES (1975); AM. CORR. ASS’N, MANUAL OF STANDARDS FOR ADULT CORRECTIONAL INSTITUTIONS (1977)).
285 Michael Montgomery, Performance Measures and Private Prisons, in 3 PRISON PRIVATIZATION, supra note 138, at 187, 193.

The Moos scale contains several subscales: “Involvement,” “Support,” “Expressiveness,” “Autonomy,” “Practical Orientation,” “Personal Problem Orientation,” “Order and Organization,” “Clarity,” and “Staff Control.”288 These elements generally aren’t true performance measures, and it’s immediately apparent from their definitions that some are highly impressionistic. The “Involvement” variable measures “how active and energetic residents are . . .”; the “Support” variable measures “the extent to which residents are encouraged to be helpful and supportive . .
.”; and so on, with an emphasis on measuring the extent of supportiveness and encouragement.289 The scale was criticized because it wasn’t clear what the difference between some of the elements was and to what extent they were correlated,290 and even to what extent they described a real phenomenon.291 Some critics wrote that “when the CIES is administered and the individual scores are tallied and averaged, we really have no idea what the scores on the nine subscales indicate.”292 Ultimately, the scale was “determined not to possess acceptable validity.”293 A later approach, described in 1980 in a report by Martha Burt, uses five types of measures: “Measures of Security” including the escape rate and escape seriousness, “Measures of Living and Safety Conditions” such as victimization, overcrowding, and sanitation, “Measures of Inmate Health” (both physical and mental), “Intermediate Products of Programs and Services” like improvements in basic skills and vocational education completed, and “Measures of Post-Release Success” including employment success and recidivism.294 Only the fourth category is explicitly labeled “Intermediate Products,”295 but some of the other measures 286 Kevin N. Wright & James Boudouris, An Assessment of the Moos Correctional Institutions Environment Scale, 19 J. RES. IN CRIME & DELINQ. 255, 255 (1982). 287 Id. (citing sources using the Moos scale in the 1970s). 288 Id. at 257 (quoting RUDOLF H. MOOS, EVALUATING CORRECTIONAL AND COMMUNITY SETTINGS 41 (1975)). 289 Id. 290 Id. at 256; Elaine Selo, Book Review, 4 J. CRIM. JUST. 348, 349 (1976) (reviewing MOOS, supra note 288). 291 Wright & Boudouris, supra note 286, at 258. 292 Id. at 274. 293 Montgomery, supra note 285, at 193. 294 MARTHA R. BURT, MEASURING PRISON RESULTS: WAYS TO MONITOR AND EVALUATE CORRECTIONS PERFORMANCE ii (Final Report, Nat’l Inst. of Just., 1980). 295 Id. at 97–105.
are also outputs, not outcomes—see, for instance, the use of hospitalizations and sick days in the measures of inmate health.296 The mixing of output and outcome measures is fairly typical; John DiIulio criticizes BOP’s Key Indicators/Strategic Support System297 for also “indiscriminate[ly] mix[ing] . . . process [i.e., input or output] and performance [i.e., outcome] measures.”298 But DiIulio himself has measured prison quality in terms of “order (rates of individual and collective violence and other forms of misconduct), amenity (availability of clean cells, decent food, etc.), and service (availability of work opportunities, educational programs, etc.)”:299 note the output measures in the inclusion of the availability (not the effectiveness) of programming. The MTC Institute, the research arm of the private prison firm Management & Training Corp. (MTC), likewise calls for holding prisons accountable for “outcomes”; but these “outcomes” include not only assaults, escapes, recidivism, overcrowding, and the like, but also outputs like “[s]ubstance abuse education/treatment completions” and “[p]roportion of inmates participating in spiritual development program(s).”300 The American Correctional Association’s performance-based standards for correctional health care301 raise the same issue. Some of these are true outcomes, like “the rate of positive tuberculin skin tests”302 or the suicide rate,303 though others are process measures or expected practices, like whether an offender “is informed about access to health systems and the grievance procedure.”304 And the Prison Social Climate Survey, which is based on inmate and staff surveys, likewise mixes outcomes (such as crowding305 or safety306) with outputs (such as whether the prison is a pleasant place to work for staff307).
296 Id. at 72. 297 See WILLIAM G. SAYLOR, DEVELOPING A STRATEGIC SUPPORT SYSTEM: MONITORING THE BUREAU’S PERFORMANCE VIA TRENDS IN KEY INDICATORS (1988). 298 DiIulio, supra note 279, at 150–52. 299 John J. DiIulio, Jr., Recovering the Public Management Variable: Lessons from Schools, Prisons, and Armies, 49 PUB. ADM. REV. 127, 129 (1989) (citing JOHN J. DIIULIO, JR., GOVERNING PRISONS: A COMPARATIVE STUDY OF CORRECTIONAL MANAGEMENT (1987)). 300 MTC INST., MEASURING SUCCESS: IMPROVING THE EFFECTIVENESS OF CORRECTIONAL FACILITIES 5 (2006). 301 AM. CORR. ASS’N, PERFORMANCE-BASED STANDARDS FOR CORRECTIONAL HEALTH CARE IN ADULT CORRECTIONAL INSTITUTIONS (2002). These standards are discussed in GAES ET AL., supra note 7, at 37–38. 302 GAES ET AL., supra note 7, at 37. 303 Id. at 38. 304 Id. at 37. 305 Michael W. Ross et al., Measurement of Prison Social Climate: A Comparison of an Inmate Measure in England and the USA, 10 PUNISH. & SOC. 447, 461 (2008).
It’s clear, then, that outcome and output measures tend to go together; no doubt this is because not all outcomes are well measurable. Moreover, the choice of measures, and even the basic question of whether to classify a measure as an output or an outcome, are inevitably value-laden. We can see this clearly by examining Charles Logan’s “quality of confinement” index, one of the more highly regarded prison performance measures.308 Logan’s performance indicators focus on eight broad categories: 1. “Security (‘keep them in’).” 2. “Safety (‘keep them safe’).” 3. “Order (‘keep them in line’).” 4. “Care (‘keep them healthy’).” 5. “Activity (‘keep them busy’).” 6. “Justice (‘do it fairly’).” 7. “Conditions (‘without undue suffering’).” 8.
“Management (‘as efficiently as possible’).”309 Each of these categories contains a number of subdimensions: for instance, the “security” category contains the subdimensions of security procedures, drug use, significant incidents, community exposure, freedom of movement, and staffing adequacy.310 The “safety” category contains safety of inmates, safety of staff, dangerousness of inmates, safety of environment, and (again) staffing adequacy.311 And, finally, Logan decomposes these subdimensions into specific numerical measures: number of escapes, proportion of staff who have observed staff ignoring inmate misconduct, ratio of resident population to security staff, drug-related incidents, and so on.312 In all—over all eight dimensions—there are a few hundred measures.313 Logan used this index to evaluate three women’s prisons in New Mexico and West Virginia.314
306 Id. at 466. 307 WILLIAM G. SAYLOR ET AL., PRISON SOCIAL CLIMATE SURVEY: RELIABILITY AND VALIDITY ANALYSES OF THE WORK ENVIRONMENT CONSTRUCTS 3–8 (1996); see also text accompanying supra note 85. 308 Charles H. Logan, Criminal Justice Performance Measures for Prisons, in PERFORMANCE MEASURES FOR THE CRIMINAL JUSTICE SYSTEM, supra note 279, at 19. GAES ET AL., supra note 7, at xi (calling Logan’s approach “one serious attempt to develop a coherent theoretical and empirical approach to prison performance measurement”); id. at 5–8 (discussing Logan’s model). Joan Petersilia has also developed performance measures for community corrections. See Joan Petersilia, Measuring the Performance of Community Corrections, in PERFORMANCE MEASURES FOR THE CRIMINAL JUSTICE SYSTEM, supra, at 60. But many of these are input measures (“Number and type of supervision contacts”), output measures (“Number of hours/days performed community service”), or outcome measures that can be easily gamed (“Number of arrests and technical violation[s] during supervision”). Id. at 77–78. 309 Logan, supra note 308, at 27–32. 310 Id. at 34. 311 Id.
None of Logan’s measures involve how many inmates get rehabilitated. But this is also intentional. First, actual rehabilitation is out of the direct control of prisons. Logan has a preference for measuring things that are within prisons’ “direct sphere of influence”;315 what we measure “ought to be achievable and measurable mostly within the prison itself.”316 Second, including rehabilitation endorses the rehabilitative model of criminal punishment, and Logan makes it clear that his model is retributive, not rehabilitative.317 Prisons, in his view, shouldn’t “add to (any more than . . . avoid or . . . compensate for) the pain and suffering inherent in being forcibly separated from civil society[;] . . . coercive confinement carries with it an obligation to meet the basic needs of prisons at a reasonable standard of decency.”318 Logan’s concern for focusing on what a prison can control and focusing on the rehabilitative goal merge in the following statement: “a prison does not have to justify itself as a tool of rehabilitation or crime control or any other instrumental purpose at which an army of critics will forever claim it to be a failure.”319 (Of course “[i]t would be very nice if the prison programs [counted in the ‘activity’ dimension] had rehabilitative effects,” and perhaps they do, but whether they do or don’t doesn’t enter into the index.320) Fair enough. What this illustrates is that you can’t judge particular measures to be desirable unless you have a normative theory that proclaims certain goals to be desirable, and such a political discussion is necessary before one can commit oneself to a particular form of performance measures.321
312 Id. at 42–43. 313 Id. at 43–57. 314 LOGAN, supra note 60, at 7–11, 13, 17; Logan, supra note 60, at 577–78, 583 fig.1. 315 Logan, supra note 308, at 24. 316 Id. 317 Id. at 19, 21, 24. 318 Id. at 25. 319 Id. at 26. 320 Id. at 29 n.7.
“[W]ithout declared goals, we cannot hold a jurisdiction accountable, and performance measurement is meaningless.”322 This normative issue arises wherever performance measurements are used. John DiIulio describes how John Chubb and Terry Moe “measure school performance strictly in terms of pupils’ achievements on a battery of standardized tests, accepting the schools’ value as instruments of socialization and civics training as important but secondary.”323 On the relative value of test scores vs. socialization, your mileage may vary. Likewise, for the correctional system, there is a great variety of available goals;324 prisons should punish, rehabilitate, deter, incapacitate, and reintegrate—all, says John DiIulio, “without violating the public conscience (humane treatment), jeopardizing the public law (constitutional rights), emptying the public purse (cost containment), or weakening the tradition of State and local public administration (federalism).”325 So we need to have a political discussion about what the appropriate goals are. One’s normative theory also affects whether a particular measure is an output or an outcome; this classification,326 which I’ve been using casually so far as if it were value-neutral, is in fact anything but. If we didn’t care about inmates but only cared about the outside world, perhaps only recidivism would be relevant. The quality of living conditions or inmate literacy would merely be outputs, which we would care about only to the extent that they affected recidivism; they wouldn’t need to independently enter the compensation function as long as we already counted recidivism.
But we might independently care about inmates’ living conditions for many reasons; if we do, living conditions become an actual outcome of the system.
321 John DiIulio thus seems incorrect when he states that Logan’s work “dispels the worry that any such measurement scheme is bound to be based exclusively on one or another moral or ideological view of the ‘ends of criminal justice’” and that his measures “encompass and satisfy every major school of thought about ‘what prisons are for.’” DiIulio, supra note 279, at 152. 322 GAES ET AL., supra note 7, at xii. 323 DiIulio, supra note 279, at 129 (citing John E. Chubb, Why the Current Wave of School Reform Will Fail, PUB. INTEREST, Winter 1988, at 28; JOHN E. CHUBB & TERRY M. MOE, POLITICS, MARKETS, AND AMERICA’S SCHOOLS (1990)). 324 GAES ET AL., supra note 7, at 10–16 tbl.1.1; see also text accompanying supra note 283. 325 John J. DiIulio, Jr., Rethinking the Criminal Justice System: Toward a New Paradigm, in PERFORMANCE MEASURES FOR THE CRIMINAL JUSTICE SYSTEM, supra note 279, at 1, 6 (italics omitted). 326 See text accompanying supra note 17.
Thus, some of Logan’s dimensions, like “activity,” which I’m inclined to call an output measure,327 might be an outcome measure given Logan’s normative perspective. The same goes for variables like prison employees’ job satisfaction328 (which I consider an output measure because it’s only instrumentally relevant to prison quality, but which others who care about labor conditions might treat differently) or whether inmates have difficulty concentrating329 (which—unlike, say, overcrowding or physical safety330—many may not consider an appropriate dimension for prison evaluation). Some of the measures, though, for instance the number of urinalysis tests that are conducted based on suspicion, are output measures under any definition, and these have the problem that it’s ambiguous whether they’re good or bad.
Do we want more or fewer urinalysis tests based on suspicion? More tests could mean that drug use has gone up; or it could mean that prison authorities are getting more serious about controlling drug use. Even worse, prison authorities’ stringency is something prison authorities themselves can control; this is a serious problem, which I discuss below.331 As a final note, I’ll mention that while it’s vitally important to have good cost measures that are adequate for comparing public and private prisons, it’s not necessary to include cost in the private contractor’s compensation. If we couldn’t measure quality, perhaps there would be a role for rate-of-return regulation, which might at least limit some of the private sector’s harmful cost-cutting tendencies.332 But if we’re going to engage in quality measurement, we might as well enforce quality directly by getting the rewards or penalties “right”;333 let the private firms worry about their own costs.334
327 See DiIulio, supra note 279, at 152 (distinguishing between certain “process measures” and certain “performance measures” within Logan’s “security” dimension); see also Gaes, supra note 36, at 23 (“[J]urisdictions that buy prison services are most concerned about internal performance measures such as order, health, case management, program services, and safety.”). 328 See text accompanying supra note 307. 329 Ross et al., supra note 305, at 464. 330 See text accompanying supra note 305. 331 See infra Part IV.C.2. 332 Cf. W. KIP VISCUSI ET AL., ECONOMICS OF REGULATION AND ANTITRUST 430–36 (4th ed. 2005) (discussing the theory of traditional rate-of-return regulation, primarily in the context of electric utilities).
IV. CONCERNS AND CRITIQUES
Despite the advantages discussed in the previous section, the use of performance measures has its pitfalls. One concern, so obvious as not to merit its own section heading, is the issue of administrative costs.
Recidivism-based contracts require that one track released prisoners adequately; perhaps there would be substantial startup costs.335 But if performance-based contracting is beneficial at all, its benefits are probably great enough that these startup costs are worthwhile.336 This Part focuses on other concerns and critiques. First, there is the concern that one can’t set the proper prices in a theoretically defensible way. Second, there’s the concern that performance-based compensation will affect market structure, either by driving out the public-interested or by driving out the risk-averse. Third, there’s the concern that performance-based compensation will lead to undesirable strategic behavior, for instance via manipulation of the choice of performance goals, by distorting effort across various dimensions of performance, by distorting effort across various types of inmate, and by encouraging outright falsification.
A. What Prices to Set
The focus on performance measures might seem grating to those who criticize the turn toward efficiency analysis and comparative effectiveness and stress moral considerations.337 But one can support performance measures without endorsing efficiency in any way—in fact, as a better way of achieving particular moral goals.
333 See infra Part IV.A. 334 Cf. Shapiro & Steinzor, supra note 218, at 1767 (questioning whether reducing regulatory cost to the private sector should be a GPRA performance measure for the FDA). 335 See Durham, supra note 20, at 66; id. at 67 (“‘At none of the sites we examined were attempts made by government to evaluate rehabilitative success.’ (quoting COUNCIL OF STATE GOV’TS & URBAN INST., ISSUES IN CONTRACTING FOR THE PRIVATE OPERATION OF PRISONS AND JAILS 115 (1987))). 336 Cf. Low, supra note 217, at 64. One might also measure a random sample of inmates, see id. at 46, though this might exacerbate risk issues. See infra Part IV.B.2. 337 Sharon Dolovich critiques “comparative efficiency” analysis and stresses moral considerations, see, e.g., Dolovich, supra note 3; Dolovich, supra note 5, though to my knowledge she hasn’t opined on performance measures.
I myself have been critical of a focus on efficiency in the context of regulatory cost-benefit analysis,338 another example of hard-numbers-based accountability. To restate the problems of cost-benefit analysis in the prison context: What’s the social value of having less recidivism? To ask this in an economic context, we’d have to know either the maximum amount people would be willing to pay to reduce crime, or the minimum amount people would accept to acquiesce in an increase in crime. These are in general different amounts, and the choice between them is value-laden.339 Suppose we choose one of these numbers to measure; we may find that, when surveyed, some people—who reject the very notion of paying or being paid for reductions or increases in crime—give answers of zero or infinity for their willingness to pay or accept; the number we’re seeking may just not exist for these people.340 Some people may have true willingnesses to pay or accept, but they don’t even know what they are: we only come to know such numbers because of our experience paying for and consuming goods and services in the real world, but increases and decreases in crime generally aren’t traded in markets. So the very act of asking for the number may bring some number into being, but there’s no reason to suppose it’s accurate.341 Or, people may know the number, but there’s no incentive for them to truthfully reveal it in surveys. Even if we use non-survey-based estimation methods—how much higher are house prices in lower-crime areas?
how much do people pay to avoid crime?—econometric analysis isn’t good enough to give us the correct number.342 The political process is also likely to manipulate the numbers.343 Moreover, concerns that are hard to quantify can be systematically slighted.344 338 See Alexander Volokh, Rationality or Rationalism? The Positive and Normative Flaws of Cost-Benefit Analysis, 48 HOUS. L. REV. 79 (2011). 339 See id. at 82–83. 340 See id. at 84. 341 See id. at 85–86. 342 See id. at 86–88. 343 See, e.g., Frank Ackerman & Lisa Heinzerling, Pricing the Priceless: Cost-Benefit Analysis of Environmental Protection, 150 U. PA. L. REV. 1553, 1580 (2002) (regulated industry has incentive to overstate costs). 344 See id. at 1579–80. This gives rise to potentially serious strategic behavior, which I address infra Part IV.C.2.
In short, “[w]hile cost-benefit analysis may look like rationality, perhaps it’s merely rationalism.”345 And these are just the problems for people who accept the utilitarian basis of cost-benefit analysis. The problems for those who reject utilitarianism as a moral philosophy are even greater.346 Surely corrections policy, of all things, should be decided with respect to morality and human values rather than numbers? These are real problems with cost-benefit analysis, and they potentially infect performance-based contracting as well. Setting the incentives in a performance-based contract means either setting the relative weights of every component of performance,347 or (equivalently) setting the separate rewards or penalties for every component of performance.348 Getting the prices “right,” in an efficiency sense, requires knowing the social value of the different components of performance;349 if that social value doesn’t exist or can’t be measured, it’s an impossible task. I agree and disagree with this critique.
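The equivalence between setting relative weights and setting separate rewards can be checked with a small numerical sketch. The prices and performance scores below are hypothetical, chosen only for illustration, and the function names are not drawn from any actual contract or source.

```python
# Hypothetical per-unit rewards (prices) and measured performance on three
# components of prison quality; every number here is an illustrative assumption.


def pay_from_prices(prices, performance):
    """Pay a separate reward for each component: price times score, summed."""
    return sum(p * x for p, x in zip(prices, performance))


def pay_from_index(prices, performance):
    """Pay a single price P on a weighted performance index."""
    total_price = sum(prices)                    # P, the sum of the prices
    weights = [p / total_price for p in prices]  # weight = price / P
    index = sum(w * x for w, x in zip(weights, performance))
    return total_price * index


prices = [2.0, 1.0, 1.0]
performance = [3.0, 4.0, 5.0]
print(pay_from_prices(prices, performance))
print(pay_from_index(prices, performance))
```

Both calls print 15.0 for these inputs, so choosing the weights and the overall price is the same decision as choosing the separate prices. The sketch says nothing about what the "right" prices are, which is the harder question.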
As to the moral objection: even though moral values have an extremely important place in criminal law and policy, I have no essential problem with using economic incentives to improve outcomes in the process. I’ve argued elsewhere that the valid arguments for or against private prisons generally are essentially empirical;350 measuring performance is an essential part of that debate, even though the choice of outcomes to measure is a value-laden enterprise;351 and attaching incentives to those performance measures is eminently justifiable if the result is a morally more just correctional system.
345 Volokh, supra note 338, at 88. 346 See id. at 88–91. 347 See supra note 283 and accompanying text. 348 These two approaches are identical. Let $x_i$ be the $i$th component of performance and $p_i$ be the reward for that component. Then the total performance-based component of compensation is $\sum_i p_i x_i$. Let $P$ be the sum of the prices ($P = \sum_i p_i$). Then the performance-based component of compensation can be expressed as $P \sum_i (p_i/P) x_i = P \sum_i w_i x_i$, where $w_i = p_i/P$ is the weight placed on the $i$th component of performance and $P$ is the price attached to the overall performance index $\sum_i w_i x_i$. 349 Not that the price necessarily has to be equal to the social value—paying the price requires incurring the deadweight losses involved in raising tax money, and making incentives so high-powered might make the contract too risky. See infra Part IV.B.2 for a discussion of optimal risk allocation. But at least the optimal prices (or at least the relative optimal prices of the different components of performance), from an efficiency perspective, will probably bear some relation to social value. 350 See generally Volokh, supra note 6. 351 See text accompanying supra notes 321–324.
As to the theoretical incoherence objection, I’m sympathetic.
But the enterprise can still be salvaged if we adopt a humble attitude.352 Rather than trying to achieve incentives that are correct in some abstract sense,353 we can just try to muddle through and ameliorate the problems of the current system by attaching some weight to factors that traditionally haven’t been rewarded. None of this requires buying into the efficiency norm.354 Maybe the weights will be wrong, but “[t]he basic question . . . is whether the dangers of providing improper incentives through imperfect models outweigh the benefits of providing program direction and accountability.”355 Is adding this element of imperfect, numbers-based accountability better than not? The remaining sections in this Part address this question.
B. Effects on Market Structure
This section discusses how performance-based compensation can change the composition of providers. First, it will attract providers who respond better to market incentives, which might affect the overall public-interestedness of the industry. Second, because performance-based compensation is riskier than flat-rate compensation, it will discourage the more risk-averse providers.
1. Public-Interestedness
Todd Henderson and Fred Tung address this concern in the context of performance-based compensation for regulators. If regulators are currently public-interested, introducing market incentives might change the culture within the agency. “Once diligence has been priced, perhaps some regulators will slack.”356 352 Cf. Christopher C. DeMuth & Douglas H. Ginsburg, Rationalism in Regulation, 108 MICH. L. REV. 877, 885 (2010) (Richard Revesz and Michael Livermore “regard regulatory cost-benefit analysis as a device for social engineering. . . . Our view of cost-benefit analysis is much more modest. . . . [W]e think that many important political questions . . . cannot be effectively decided by cost-benefit analysis.”). 353 See DiIulio, supra note 279, at 146. 354 Barnow, supra note 224, at 279.
355 Barnow, supra note 224, at 307; see also Henderson & Tung, infra note 356, at 36 (“We make no attempt to offer firm prescriptions for the optimal ratio [between debt and equity]. The mix should induce regulators to care about bank profits but not at the expense of risk shifting to creditors.”). 356 M. Todd Henderson & Frederick Tung, Pay for Regulator Performance, 85 S. CAL. L. REV. 1003, 1056–57 (2012).
This form of compensation will also affect the mix of people who choose to be regulators. “Public service motives might be displaced by financial motivations among new hires . . . . Eventually, the composition of the regulatory agency could change for the worse.”357 Henderson and Tung conclude, citing the crowding out literature,358 that this is possible, though not necessary: “public spiritedness and financial reward [might not be] mutually exclusive, up to a point.”359 Moreover, changing the mix of individuals “could be a good,” given the failures of the current crop of people.360 The same arguments can be applied to performance-based compensation for prison providers. I would add that, to the extent we’re considering performance-based compensation for private firms rather than public servants,361 we don’t need to worry about making providers any more mercenary than they already are: if there’s one thing advocates and opponents of private prisons agree on, it’s that private prison providers are a profit-oriented bunch. Not that the profit motive is inconsistent with public-interestedness: public servants “profit” from their employment too without being accused of thereby necessarily becoming mercenaries;362 moreover, corrections professionals move between the public and private sectors and presumably take their professionalism with them.
Finally, as I discuss further below,363 performance-based compensation, combined with social impact bonds, allows nonprofits to raise money from private investors, so to this extent, introducing the profit motive may turn out to be a great boon for charitable and public-interested providers.
357 Id. at 1057. 358 Id. at 1057 n.182 (citing Ernst Fehr & Armin Falk, Psychological Foundations of Incentives, 46 EUR. ECON. REV. 687, 688 (2002); Uri Gneezy & Aldo Rustichini, A Fine Is a Price, 29 J. LEGAL STUD. 1, 14 (2000)). 359 Id. at 1057. 360 Id. 361 But see supra Part III.C.3 (discussing possibilities for merit pay for public prison wardens). 362 See Volokh, supra note 6, at 178–85. 363 See infra Part IV.B.2.
2. Risk and Capital Requirements
a. The Risk Is in the Slope
We’ve seen, in the discussion of Charles Logan’s approach above,364 the concern that performance measures be based on factors that the relevant actor can actually control. Such concerns crop up frequently;365 James Q. Wilson even says, in the context of police departments, that public order and safety aren’t “‘real’ measures of overall success” because whatever about them is measurable “can only partially, if at all, be affected by police behavior.”366 When he does favor a “micro-level measure of success” of whether the neighborhood is becoming safer and more orderly,367 he still limits it to cases where the level of danger and disorder is “amenable . . . to improvement by a given, feasible level of police and public action.”368 The concern in the literature over controlling for baselines is similarly motivated.369 This seems mistaken: overall public order and safety are measures of the success of police departments, and (given that prison programs and conditions affect recidivism to some extent370) lower recidivism is a measure of the success of prisons.371 364 See text accompanying supra notes 327–316.
365 See, e.g., Petersilia, supra note 308, at 66; DICKER, supra note 119, at 17; GRIZZLE ET AL., supra note 235, at 48–49.
366 James Q. Wilson, The Problem of Defining Agency Success, in PERFORMANCE MEASURES FOR THE CRIMINAL JUSTICE SYSTEM, supra note 279, at 156, 159; see also DiIulio, supra note 325, at 1–2, 13.
367 Id. at 160–62.
368 Id. at 161.
369 See text accompanying supra notes 234–243.
370 See Camp et al., supra note 96; DIIULIO, supra note 365, at 106–45; DiIulio, supra note 325, at 2; M. Keith Chen & Jesse M. Shapiro, Do Harsher Prison Conditions Reduce Recidivism? A Discontinuity-Based Approach, 9 AM. L. & ECON. REV. 1, 17–21 (2007); Francesco Drago et al., Prison Conditions and Recidivism, 13 AM. L. & ECON. REV. 103, 120–25 (2011); Daniel S. Nagin et al., Imprisonment and Reoffending, 38 CRIME & JUST. 115, 115 (2009); Rafael Di Tella & Ernesto Schargrodsky, Criminal Recidivism After Prison and Electronic Monitoring 28 (Nat'l Bureau of Econ. Research, Working Paper No. 15602, 2009, rev. 2010). See also GAES ET AL., supra note 7, at 124 (citing S.D. Bushway et al., An Empirical Framework for Studying Desistance, 39 CRIMINOLOGY 491 (2001); J. Grogger, The Effect of Arrest on the Employment and Earnings of Young Men, 110 Q.J. ECON. 51 (1995); J. Kling, The Effect of Prison Sentence Length on the Subsequent Employment and Earnings of a Criminal Defendant (Woodrow Wilson Sch. Econ. Disc. Paper 208, 1999); R.J. SAMPSON & J.H. LAUB, CRIME IN THE MAKING: PATHWAYS AND TURNING POINTS THROUGH LIFE (1993)); id. at 129 (citing A.R. Piquero et al., Assessing the Impact of Exposure Time and Incapacitation on Longitudinal Trajectories of Criminal Offending, 16 J. ADOL. RES. 54 (2001)); id. at 136 (citing G.G. Gaes & N. Kendig, The Skill Sets and Health Care Needs of Releasing Offenders, paper presented at the National Policy Conference, From Prison to Home: The Effect of Incarceration and Reentry on Children, Families, and Communities, Jan. 30–31, 2002).

It’s true that these measures come with a lot of noise attached—that is, with a lot of omitted variables reflecting the contribution of other people’s efforts, as well as environmental variables.372 But that doesn’t mean it’s wrong to use them for purposes of accountability, or even to tie compensation to them.

There are two concerns about using these noisy measures: first, that the level of the unobserved variables at the beginning of the contract might establish a high-recidivism baseline, for which the contractor will have to be compensated very highly, or a low-recidivism baseline, for which the contractor will collect more than it deserves; and second, that variation in the unobserved variables might create a lot of risk for the contractor.373

As to the first concern, recall the earlier discussion about whether to control for baselines.374 Whether or not we adjust the contract price to take into account the baseline expected level of performance should have little effect on government expenditures: a high baseline translates into less quality being attributed to the contractor and thus into lower payments, and so the contractor will demand more money at the bidding stage, and vice versa. The same reasoning addresses the second concern: because controlling for baselines doesn’t affect the contractor’s payout—it basically amounts to adding or subtracting a constant, which is subtracted or added right back at the bidding stage—it also doesn’t necessarily affect risk.375 What definitely affects risk is not the level of compensation, but its slope. A contract that compensates the contractor based on the portion of performance he was able to control isn’t necessarily less risky than one that doesn’t, but a contract where the per-quality-unit price is lower is less risky.
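The point that risk sits in the slope rather than in the level can be illustrated with a short simulation. The sketch below is only an illustration under assumed numbers: the payout rule `fixed_fee + slope * (quality - baseline)`, the normally distributed quality draws, and every name and parameter value are hypothetical and are not taken from the numerical example in the text.

```python
import random
import statistics


def payouts(slope, fixed_fee, baseline, quality_draws):
    """Payout for each realized quality level under a linear contract.

    Assumed (hypothetical) rule: fixed_fee + slope * (quality - baseline).
    """
    return [fixed_fee + slope * (q - baseline) for q in quality_draws]


# Hypothetical noisy quality outcomes; the mean and spread are illustrative only.
rng = random.Random(0)
quality_draws = [rng.gauss(5.0, 2.0) for _ in range(2000)]

# Controlling for a baseline (and adjusting the fixed fee at the bidding stage)
# shifts every payout by a constant, so the spread of payouts is unchanged.
sd_no_baseline = statistics.pstdev(payouts(1.0, 10.0, 0.0, quality_draws))
sd_with_baseline = statistics.pstdev(payouts(1.0, 12.0, 5.0, quality_draws))

# Changing the per-quality-unit price scales the spread of payouts proportionally.
sd_by_slope = {
    slope: statistics.pstdev(payouts(slope, 10.0, 0.0, quality_draws))
    for slope in (1.0, 0.5, 0.0)
}

print("baseline ignored:", round(sd_no_baseline, 4))
print("baseline controlled:", round(sd_with_baseline, 4))
for slope, sd in sd_by_slope.items():
    print(f"slope ${slope:.2f} per unit -> payout std. dev. {sd:.4f}")
```

Under these assumptions the standard deviation of the payout is the same whether or not the baseline is controlled for, is halved when the slope is halved, and is zero for the $0 slope (the fixed-price case). The sketch does not model cost-plus contracts, whose lower risk comes from cost reimbursement rather than from the slope on quality.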
Thus, in the numerical example discussed earlier,376 a contract with a $1 reward per quality unit (regardless of the fixed component of the contract) is riskier than a contract with a $0.50 reward per quality unit; an even less risky contract is one with a $0 reward per quality unit, that is, a fixed-price contract, which is close to the norm; and the least risky possible contract is the cost-plus contract typical of rate-of-return regulation.377

371 DiIulio, supra note 325, at 5 (“[C]rime rates and recidivism rates are indeed important[, though not the only,] measures of the system’s performance, which ought to be continually used and refined.”).
372 Barnow, supra note 224, at 281 (these are “gross outcome measures . . . in the sense that they do not necessarily reflect gains from the program”).
373 HARDING, supra note 70, at 68 (“[T]he human variables are too volatile for any contractor to be expected to stand or fall by outputs alone . . . .”); Kyle, supra note 158, at 2112; Lynn, supra note 161, at 12.
374 See supra Part III.C.2.c.
375 See text accompanying supra note 248.
376 See text accompanying supra notes 228–232, 244.

Compensation based on a continuous quality measure is less risky than compensation based on a discrete quality measure (as long as the provider has some chance of being on either side of the cutoff);378 thus, “$1 for each quality unit” is less risky than “$5 but only if you get five quality units.”

Do we care? Perhaps large corporations like CCA or The GEO Group, which are publicly traded379 and diversified across many contracts,380 can handle the risk; and together they cover three-quarters of the industry.381 Smaller, privately held companies like MTC382 may be more sensitive to risk. Various potential entrants, especially nonprofits,383 must be even more sensitive.
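The earlier comparison between “$1 for each quality unit” and “$5 but only if you get five quality units” can be checked with a small exact calculation. This is a minimal sketch under stated assumptions: the equally likely quality outcomes are hypothetical, “five quality units” is read as “at least five,” and the function names are illustrative only.

```python
from fractions import Fraction


def mean_and_variance(payout, outcomes):
    """Exact mean and variance of payout(q) over equally likely outcomes."""
    values = [Fraction(payout(q)) for q in outcomes]
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return mean, variance


def continuous(q):
    """'$1 for each quality unit'."""
    return q


def discrete(q):
    """'$5 but only if you get five quality units' (assumed: at least five)."""
    return 5 if q >= 5 else 0


# Hypothetical case 1: the provider may land on either side of the cutoff.
spread_outcomes = [3, 4, 5, 6, 7]
# Hypothetical case 2: the provider always clears the cutoff.
safe_outcomes = [5, 6, 7]

for label, outcomes in (("either side", spread_outcomes), ("always above", safe_outcomes)):
    c_mean, c_var = mean_and_variance(continuous, outcomes)
    d_mean, d_var = mean_and_variance(discrete, outcomes)
    print(f"{label}: continuous mean {c_mean}, variance {c_var}; "
          f"discrete mean {d_mean}, variance {d_var}")
```

With outcomes on both sides of the cutoff, the discrete scheme has payout variance 6 against 2 for the continuous scheme, even though its expected payout is lower (3 against 5). When the provider always clears the cutoff, the ordering reverses (variance 0 against 2/3), which is why the claim is limited to providers with some chance of being on either side.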
Adopting high-powered (i.e., high-slope) contracts may scare away the most risk-sensitive potential bidders, leaving the field to a few large corporations. (And it isn’t just a matter of risk: if the fixed part of the contract is paid up front while the reward is paid later, possibly a few years later once recidivism statistics come in, this might disadvantage small companies or nonprofits with limited access to capital markets.384)

377 See text accompanying supra note 332.
378 See text accompanying supra notes 251–263.
379 See CCA, About CCA, http://www.cca.com/about/ (CCA joined the NYSE in 1994); The GEO Group, Inc., Historic Milestones, http://www.geogroup.com/history (GEO joined the NYSE in 1996).
380 See CCA, supra note 379 (“CCA houses more than 80,000 inmates in more than 60 facilities . . . . CCA currently partners with all three federal corrections agencies . . . , 16 states, more than a dozen local municipalities, and Puerto Rico and the U.S. Virgin Islands.”); The GEO Group, Inc., Who We Are, http://www.geogroup.com/about_us (“GEO's operations include the management and/or ownership of 95 correctional, detention and residential treatment facilities encompassing approximately 72,000 beds.”).
381 See Volokh, supra note 9, at 1237 & n.182 (data from 1999).
382 See Management & Training Corp., Overview, http://www.mtctrains.com/about-mtc/overview (“Management & Training Corporation (MTC) is a privately-held company”); Volokh, supra note 9, at 1237 & n.182 (5–8% share for MTC in 1999).
383 For discussions of the possibility of nonprofit prisons, see Low, supra note 217; Richard Moran, A Third Option: Nonprofit Prisons, N.Y. TIMES, Aug. 23, 1997, at 23. Compare with discussions of the advantages of nonprofit schools: see Byron W. Brown, Why Governments Run Schools, 11 ECON. EDUC. REV. 287, 293–96 (1992); John Morley, Note, For-Profit and Nonprofit Charter Schools: An Agency Approach, 115 YALE L.J. 1782 (2006). Cf. also Education: Raising the Bar, ECONOMIST, June 15, 2013, at 30 (discussing risk issues for schools and teachers resulting from educational accountability schemes).

This has potential implications for the competitiveness of the industry,385 possibilities for innovation,386 and the political influence that drives changes in criminal law.387

But the contract doesn’t have to be especially high-stakes.388 The optimal level of risk transfer is probably less than 100%. Rewarding the contractor for increases in quality with a price equal to the social value of quality gives the contractor great incentives but also (since the per-unit reward will be high) subjects him to high risk.389 Flat-fee contracts are relatively low-risk390 but also low-incentive. Some moderate level of risk transfer will optimally balance incentives with risk.391 Thus, the incentive-based portion of the contract is only 10% of the contract price in the UK’s Doncaster prison,392 and was only 5% in the federal Bureau of Prisons’ Taft demonstration project.393 Recall that in Britain’s Job Deal program, 30% of the payment is conditional, and only a third of that is related to “hard outcomes,” and even some of those outcomes are slightly “soft.”394 For the cash-flow issue noted above,395 one can also “[c]hange the timing of payments to providers,” for instance by making “a payment every six months for each offender who has not been reconvicted.”396

384 NICHOLSON, supra note 211, at 6 (“The working capital requirements of a [payment-byresults] system will cause problems for Small and Medium Size Enterprises and the third sector [i.e., nonprofits] in bidding for contracts.”).
385 DICKER, supra note 119, at 24 (high incentives, through high risk, will “reduce the diversity of the market” by making it less attractive for nonprofits or small companies).
386 Id. at 23.
On the relationship between market concentration and innovation, see Richard Gilbert, Looking for Mr. Schumpeter: Where Are We in the Competition-Innovation Debate, in 6 INNOVATION POLICY AND THE ECONOMY 159 (Adam B. Jaffe et al. eds., 2006) (relationship is inconclusive).
387 See Volokh, supra note 9, at 1213–14 (arguing that the degree of concentration of the industry can affect the political influence the industry exerts); Volokh, Privatization, Free-Riding, supra note 138, at 64 (same); Volokh, The Effect of Privatization, supra note 138, at 10–11 (same).
388 DICKER, supra note 119, at 6.
389 See supra note 349.
390 Though not zero-risk: recall that the least risky contracts are cost-plus. See text accompanying supra note 377.
391 DICKER, supra note 119, at 23–24; NICHOLSON, supra note 211, at 6–7. See generally BOLTON & DEWATRIPONT, supra note 249, at 13 (“[W]hen both employer and employee are risk averse, they will optimally share business risk.”).
392 Johnson, supra note 204.
393 See text accompanying supra notes 167–168.
394 See text accompanying supra note 213.
395 See text accompanying supra note 384.
396 DICKER, supra note 119, at 24.

b. Financing Nonprofits: Social Impact Bonds

The need to encourage the nonprofit sector calls for innovative funding mechanisms. Nonprofit prisons have been suggested397 though never implemented.398 But in light of the widespread concern that private prison firms will cut quality to save money,399 the nonprofit form seems like an obvious alternative. Ed Glaeser and Andrei Shleifer discuss the value of nonprofit status: by weakening the provider’s incentives to maximize profits, nonprofit status can be a valuable signal of quality when quality itself is non-verifiable.
(Even using performance measures, it’s reasonable to suppose that some aspects of quality will remain non-verifiable; the value of nonprofit status depends on how important these remaining non-verifiable components are.400) Moreover, altruistic entrepreneurs will tend to be attracted to the nonprofit form.401 And Timothy Besley and Maitreesh Ghatak show that, when both a provider and the government can make productive investments in a project, and when the provider is altruistic, then the provider should own the project if it values it more than the government does.402 Privatization can thus be more beneficial in the presence of altruistic providers. But banks or private equity houses are unlikely to finance such nonprofits, especially when the nonprofits don’t have much of a track record.403

Social Impact Bonds have been proposed as a funding mechanism for nonprofits. Rather than contracting directly with a provider, the government contracts with a middleman. This middleman, a “social impact bond-issuing organization,”404 has two functions.

397 See sources cited supra note 383.
398 See Low, supra note 217, at 5 (suggesting creation of nonprofit prisons on “an experimental basis”).
399 See, e.g., Dolovich, supra note 5, at 474–80.
400 See infra Part IV.C.2.
401 Edward L. Glaeser & Andrei Shleifer, Not-for-Profit Entrepreneurs, 81 J. PUB. ECON. 99 (2001).
402 Timothy Besley & Maitreesh Ghatak, Government Versus Private Ownership of Public Goods, 116 Q.J. ECON. 1343, 1347 (2001).
403 NICHOLSON, supra note 211, at 6–7.
404 JEFFREY B. LIEBMAN, SOCIAL IMPACT BONDS: A PROMISING NEW FINANCING MODEL TO ACCELERATE SOCIAL INNOVATION AND IMPROVE GOVERNMENT PERFORMANCE 2 (Ctr. for Am. Prog., 2011).

First, it hires the staff to provide the service. Second, it sells bonds to investors, particularly philanthropic ones;405 these bonds are essentially claims to a portion of the performance-based compensation.
If the service provider fulfills the performance-based goals and receives its reward from the government, the investors make money; otherwise they don’t. At the Peterborough prison in the UK, the government doesn’t pay anything unless recidivism is 7.5% less than in a comparison group,406 and payments are capped when the difference reaches 13%;407 at Doncaster, payments don’t start until the difference is 5%.408 The provider’s employees may well be paid something like a flat wage, so their monetary incentives aren’t great; but the bond-issuing organization and the philanthropic investors (whose money is on the line) are probably better at monitoring the staff than the government would be. It remains to be seen, though, whether the philanthropic sector will provide enough funds for nonprofit prison providers to be a viable alternative to for-profit corporations.409

C. Undesirable Strategic Behavior

Perhaps the biggest disadvantage of using performance-based compensation is the strategic behavior it may spawn. This strategic behavior may come in several flavors. First, there is the possibility of manipulating the performance goals themselves. Second, effort may be distorted away from some dimensions and toward others. Third, effort may be distorted away from some groups of inmates and toward others. And fourth, performance measures may simply be falsified.

405 Though social impact bonds in the U.S. have been funded by non-philanthropic types such as Goldman Sachs. Social Impact Bonds: Being Good Pays, ECONOMIST, Aug. 18, 2012, at 28.
406 LIEBMAN, supra note 404, at 2.
407 See text accompanying supra note 208.
408 See supra note 208.
409 NICHOLSON, supra note 211, at 16, 18; Social Market Found., Big Hurdles to Be Overcome if Social Impact Bonds to Move from Margins of Public Services, Says Think Tank (July 31, 2013), http://www.smf.co.uk/media/news/big-hurdles-be-overcome-if-social-impact-bonds-move-marginspublic-services-says-think-tank/; Tom Clougherty, Pioneering Social Impact Bonds in the United Kingdom, REASON FOUND. (Aug. 13, 2013), http://reason.org/news/show/pioneering-social-impactbonds.

1. Manipulating the Goals

The Government Performance and Results Act of 1993 (GPRA)410 is one example of a recent effort to inject performance measures into government agencies that hasn’t lived up to the hopes of its supporters. One of the problems was that setting the performance goals was left to the agencies that were to be evaluated. Agencies “tr[ied] to protect themselves by devising euphemistic performance goals in order to ensure that they [could] ‘pass’ their own grading criteria.”411 The Patent and Trademark Office, faced with rising backlogs, set itself progressively longer targets of “average total pendency” from year to year, rising from 27.7 months in fiscal year 2003 to 29.8 months in 2004, 31.0 months in 2005, and 31.3 months in 2006.412 (John DiIulio had warned of a similar danger: “that measurement-driven government workers will, so to speak, ‘set up the target in order to facilitate shooting.’”413) A similar problem was observed in the UK, where “Next Steps agencies,” a type of performance-based organization, set their own targets, which often reflected merely an incremental improvement rather than an assessment of what was possible.414

Why would agencies set goals in such unambitious ways?
Perhaps because agencies feared being punished for bad performance with budget cuts.415 Various politicians have indeed suggested that agencies’ funding be tied to their performance results,416 and agencies’ performance results have indeed been relevant to the administration’s budget proposals,417 so this fear may have been reasonable—though it’s also possible that performance scores have merely given political cover for cuts to programs that the administration wanted to defund for other reasons.418 On the other hand, the link between funding and performance results isn’t that tight,419 so agencies’ concern to look good may also have been a matter of good public relations.

410 See supra note 160.
411 Shapiro & Steinzor, supra note 218, at 1744; see also id. at 1760 (“[A]gencies compelled to function in an antiregulatory, even hostile, political atmosphere are predictably reluctant to tell the truth to power. Instead, their goal has become convincing congressional and White House overseers that they are performing well despite budgets that are inadequate for effective implementation of their missions.”).
412 Schoen, supra note 160, at 480.
413 DiIulio, supra note 279, at 154.
414 GEN. ACCOUNTING OFFICE, supra note 218, at 7.
415 Shapiro & Steinzor, supra note 218, at 1744.
416 Schoen, supra note 160, at 464 (citing The Results Act: Are We Getting Results?: Hearings Before the H. Comm. On Gov’t Reform, 105th Cong. 42 (1997), at 20 (statement of Rep. Dick Armey, H. Majority Leader)); id. at 465 (citing Seven Years of GPRA: Has the Results Act Provided Results?: Hearing Before the Subcomm. On Gov’t Mgmt., Info., and Tech. of the H. Comm. on Gov’t Reform, 105th Cong. 21 (2000) (statement of Rep. Pete Sessions, Chairman, Results Caucus)); id. at 466 (citing OMB, THE PRESIDENT’S MANAGEMENT AGENDA 29 (2002), available at http://www.whitehouse.gov/omb/budget/fy2002/mgmt.pdf).
The problem here is that agencies were allowed to think up their own performance goals; that they weren’t required to meet those goals (and indeed, that often the performance information simply wasn’t used in decisionmaking420); and that the goals were binary rather than continuous outcome measures,421 e.g. that the EPA “will achieve and maintain at least 95 percent of the maximum score on readiness evaluation criteria in each region”422 or “complete an additional 975 Superfund-lead hazardous substance removal actions.”423

These problems have easy fixes, though perhaps they weren’t so easy in the context of the GPRA, where the problem was primarily giving performance incentives to public agencies. Goals in prison contracts—or in merit pay systems for public prison wardens424—should be set by the Department of Corrections or the relevant contracting authority; goals shouldn’t be set by those whom we want to comply with them. No one should be “required” to meet any performance standard, but compensation should be tied to these measures; providers’ self-interest should take care of the rest. And adopting continuous outcome measures, rather than binary goals, reduces the ability to choose easy goals: one can game “achieve x% recidivism” by setting an appropriately high level of x, but it’s harder to game the general effort of reducing recidivism where additional reductions are met with additional rewards.425

417 EILEEN C. NORCROSS & KYLE MCKENZIE, MERCATUS CENTER, GEORGE MASON UNIV., AN ANALYSIS OF THE OFFICE OF MANAGEMENT AND BUDGET’S PROGRAM ASSESSMENT RATING TOOL (PART) FOR FISCAL YEAR 2007, at 22 (May 2006); Eileen Norcross & Joseph Adamson, An Analysis of the Office of Management and Budget’s Program Assessment Rating Tool (PART) for Fiscal Year 2008, working paper, Mercatus Center, at 25 (2007).
418 John B. Gilmour & David E. Lewis, Does Performance Budgeting Work? An Examination of the Office of Management & Budget’s PART Scores, 66 PUB. ADMIN. REV. 742, 751 (2006); Norcross & Adamson, supra note 417, at 29–30.
419 See, e.g., Jerry Ellig, Has GPRA Increased the Availability and Use of Performance Information?, Mercatus Ctr. Working Paper No. 09-03, Mar. 2009, at 5; Teresa Curristine, Reforming the U.S. Department of Transportation: Challenges and Opportunities of the Government Performance and Results Act for Federal-State Relations, 32 PUBLIUS 25, 42 (2002).
420 Ellig, supra note 419, at 1 (citing Jerry Brito & Jerry Ellig, Toward a More Perfect Union: Regulatory Analysis and Performance Management, forthcoming FLA. ST. U. BUS. REV.); id. at 2 (citing GAO, GOVERNMENT PERFORMANCE: LESSONS LEARNED FOR THE NEXT ADMINISTRATION ON USING PERFORMANCE INFORMATION TO IMPROVE RESULTS (2008)); Schoen, supra note 160, at 466 (citing 10 Years of GPRA—Results, Demonstrated: Hearings Before the Subcomm. On Gov’t Efficiency and Fin. Mgmt. of the H. Comm. on Gov’t Reform, 108th Cong. 4 (2004) (statement of Rep. Edolphus Towns, Member, Subcomm. on Gov’t Efficiency and Fin. Mgmt. of the H. Comm. on Gov’t Reform)).
421 See supra Part III.C.2.d.
422 Shapiro & Steinzor, supra note 218, at 1764 (quoting EPA, 2006–2011 EPA STRATEGIC PLAN: CHARTING OUR COURSE 67 (2006), available at http://www.epa.gov/ocfo/plan/2006/entire_report.pdf).
423 Id. at 1765 (quoting 2006–2011 EPA STRATEGIC PLAN, supra note 422, at 67); see also id. at 1773 (“[A]ttain water quality standards for all pollutants and impairments in more than 2,250 water bodies . . . . [R]emove at least 5,600 . . . specific causes of water body impairment . . . . [I]mprove water quality conditions in 250 . . . impaired watersheds nationwide . . . .”) (quoting 2006–2011 EPA STRATEGIC PLAN, supra note 422, at 67).

2. Distortion Across Dimensions of Performance

Everyone agrees that, in most areas, performance has multiple dimensions.426 Each dimension, in a performance-based contract, will have its price,427 and the relative prices of different dimensions will determine how the contractor will allocate his effort among them.428 So far, so good, as long as the set of performance measures is complete. But what if some dimensions of performance are unmeasurable?429 Just as cost-benefit analysis is accused of slighting the soft factors,430 so might performance measures be biased in favor of the measurable. The result is that the contractor’s work effort will be biased in the direction of increasing the measurable dimensions of performance.431

424 See supra Part III.C.3.
425 See also Barnow, supra note 224, at 287 (discussing “whether the size of the award should vary with the extent to which standards are exceeded”); id. at 291–92 (“The national standards are set, based on experience in prior years, so that approximately 75 percent of the nation’s [providers] will exceed the standards . . . .”).
426 See text accompanying supra notes 283, 325–324.
427 See text accompanying supra note 348.
428 See Bengt Holmstrom & Paul Milgrom, Multitask Principal-Agent Analyses: Incentive Contracts, Asset Ownership, and Job Design, 7 J.L., ECON., & ORG. 24, 25 (special issue 1991) (“In general, when there are multiple tasks, incentive pay serves not only to allocate risks and to motivate hard work, it also serves to direct the allocation of the agents’ attention among their various duties.”).
429 See text accompanying supra note 344 (noting retributivism as a possible unmeasurable dimension).
430 See text accompanying supra note 344.
431 GRIZZLE ET AL., supra note 235, at 50–51.

Consider a hypothetical example involving education.
Suppose there are two measures of educational quality: “hard” (e.g., knowledge of facts) and “soft” (e.g., citizenship, critical thinking, socialization). Without hard accountability, it might be hard to give teachers serious incentives, so they will slack in their overall work effort, but divide their time between hard and soft education in a balanced way. With hard accountability, teachers can get much higher-powered incentives, but these incentives will tend to be skewed toward the hard measures of education. Thus, the teachers will provide more overall work effort, but their time will be skewed toward hard education.432

How serious is this problem? It depends on how important it is to have a balance between hard and soft factors, how hard the soft factors really are to measure, and how harmful the status quo of low work effort is.433 It also depends on whether the one type of education makes the other type easier or harder for the teacher; an excessively high-powered accountability system focusing, say, on standardized test scores could easily promote a “teaching to the test” strategy that can be antithetical to critical thinking (at the very least by taking up class time that could be otherwise used);434 this isn’t necessarily so, but it may be likely.435 Providing high-powered but skewed accountability may be beneficial in severely dysfunctional school systems where neither hard nor soft factors are taught well, but it may be harmful in better school systems.

Analogously, in the prison context, one can imagine two dimensions of quality: humane in-prison conditions and low recidivism after prison. Suppose one of these is harder to measure than the other. In-prison conditions could be harder to measure if effective monitoring is difficult;436 or perhaps recidivism is harder to measure if there aren’t good databases of offenders, especially if released inmates often commit their crimes in other states.437 Whichever one turns out to be less measurable, we can expect effort to be skewed toward the more measurable one.

432 See generally Holmstrom & Milgrom, supra note 428, at 25 (“It would be better, . . . critics argue, to pay a fixed wage without any incentive scheme than to base teachers’ compensation only on the limited dimensions of student achievement that can be effectively measured.”) (italics omitted); see also Education: Raising the Bar, supra note 383; Peter Smith, On the Unintended Consequences of Publishing Performance Data in the Public Sector, 18 INT’L J. PUB. ADMIN. 277, 284 (1995) (discussing “tunnel vision”).
433 See Holmstrom & Milgrom, supra note 428, at 26 (“[T]he desirability of providing incentives for any one activity decreases with the difficulty of measuring performance in any other activities that make competing demands on the agent’s time and attention.”).
434 This assumes that test scores really are a true outcome measure, even if a partial one. Perhaps this is too charitable, though: it may be better to characterize test scores as proxy measures for a type of intelligence, and “teaching to the test” as a form of manipulation, as described below. See text accompanying infra note 446.
435 See Holmstrom & Milgrom, supra note 432, at 25; id. at 32–33 (desirability of incentives for measurable task depends on whether measurable and unmeasurable tasks are complements or substitutes in agent’s cost function).

Would it make a difference if prison policies were skewed toward humane conditions or toward reducing recidivism? If the two go together—if humane conditions are, on balance, effective at reducing recidivism438—then the inability to monitor both dimensions can be harmless.
On the other hand, if bad prison conditions, on balance, reduce recidivism through a general deterrent effect,439 a focus on recidivism could lead to bad prison conditions—in which case there’s no guarantee that high-powered accountability would improve overall quality in the absence of effective in-prison monitoring. Since the precise determinants of recidivism aren’t well understood, this shows the importance of properly monitoring whatever is considered desirable in the prison.440

In the extreme case, where some tasks remain completely unmeasurable and shirking on those tasks is highly detrimental to overall quality, we should junk the idea of high-powered incentives: the traditional input-and-output approach may then be optimal.441

If an unmeasurable outcome is represented in the accountability scheme by some inputs or outputs as proxies, the possibilities for undesirable strategic behavior multiply. The previous examples involved ignoring the unmeasurable elements and maximizing the measurable component of performance, rather than maximizing overall performance. Replacing unmeasurable elements with proxies within the provider’s direct control leads to pursuing the proxies for their own sake—which one can uncharitably call “manipulating” the proxy measures.

436 See infra Part IV.C.4.
437 [*Cite source on problems with recidivism monitoring.]
438 See sources cited supra note 370.
439 See, e.g., Lawrence Katz et al., Prison Conditions, Capital Punishment, and Deterrence, 5 AM. L. & ECON. REV. 318, 331 (2003); Kelly Bedard & Eric Helland, The Location of Women's Prisons and the Deterrence Effect of “Harder” Time, 24 INT'L REV. L. & ECON. 147, 159–61 (2004); Alexander Volokh, Prison Vouchers, 160 U. PA. L. REV. 779, 843–45 (2012). But see TOM R. TYLER, WHY PEOPLE OBEY THE LAW 64 (1990) (“The most important normative influence on compliance with the law is the person's assessment that following the law accords with his or her sense of right and wrong; a second factor is the person's feeling of obligation to obey the law and allegiance to legal authorities.”); Paul H. Robinson & John M. Darley, The Role of Deterrence in the Formulation of Criminal Law Rules: At Its Worst When Doing Its Best, 91 GEO. L.J. 949, 953–56 (2003).
440 See infra Part IV.C.4.
441 See Holmstrom & Milgrom, supra note 428, at 27 (“[I]ncentives for a task can be provided in two ways: either the task itself can be rewarded or the marginal opportunity cost for the task itself can be lowered by removing or reducing the incentives on competing tasks. Constraints are substitutes for performance incentives and are extensively used when it is hard to assess the performance of the agent.”).

For example, consider recidivism rates, which I’ve been treating throughout as a true outcome measure. In reality, no one knows true recidivism rates; we don’t know that a released inmate has committed a crime unless we catch him (and, depending on the recidivism measure we’re using, unless we convict him or reincarcerate him442). So in reality, rather than using the unmeasurable dimension of recidivism, we’re using the measurable proxy of, say, rearrest rates. If the relationship between rearrest rates and true recidivism is stable, using this proxy can be harmless; but more important still is that the contractor not be able to manipulate the rates in ways that don’t correspond to true social improvements.

Thus, if in-prison misconduct is penalized, corrections officers will use their discretion very differently when deciding whether to write up an offense.443 If urinalysis tests based on suspicion are rewarded, we can magically expect more inmates to seem suspicious.
Perhaps the output (drug tests based on suspicion) seems to have a straightforward correlation with the outcome (inmate drug use, if one chooses to consider that an outcome444); but make it a subject of compensation, and you can’t rely on that correlation anymore. Administrators will start pursuing the output for its own sake.

Similarly, in the context of community corrections, Joan Petersilia criticizes the use of recidivism rates as an outcome measure: if the number of arrests increases, is that bad because more people are committing offenses? Or is it good because probation officers are better at detecting technical violations and sending released offenders back to prison?445 If we decided that increased arrest rates were bad and attached penalties to that variable, we might find arrest rates plummeting, but merely because probation officers stopped supervising their charges very closely.

442 See text accompanying supra note 118.
443 GAES ET AL., supra note 7, at 51.
444 I prefer to think of drug use as neutral in itself, though one can want to control inmate drug use instrumentally for the sake of outcomes like violence or rehabilitation.
445 Petersilia, supra note 308, at 66–67; see also GAES ET AL., supra note 7, at 23–24; see also supra note 74.

Recidivism may thus be a bad measure for the accountability of probation officers. But it can be a good measure for the accountability of prisons, provided that prisons leave supervision and rearrest to entirely separate actors. This is a reason to insist on the separation of prisons and probation officers, not granting contracts to criminal justice providers that are too integrated, and more generally preventing prisons from giving any incentives at all, even subtle ones, to probation officers.446 Similarly, the results of drug testing can be an acceptable measure, but random testing is better than testing based on suspicion.
In-prison misconduct can be an acceptable measure, but it should be the type of serious misconduct that’s least likely to be overlooked or characterized as something else. We might even have to guard against other kinds of gaming: if prisons can affect where prisoners are released, for instance by partnering with post-release job placement programs that have good contacts in particular areas, they can try to have prisoners released in areas where policing is weaker. For understandable political economy reasons, a state Department of Corrections might choose to ignore the welfare of people in other states and tie compensation only to an in-state measure of recidivism; then, the prison does better by finding out-of-state jobs for its inmates. A prison might also try to prevent recidivism by “paying offenders to desist,” but this might be controversial.447 (Of course, even if we only use performance measures to reward providers, providers will inevitably have to translate these incentives into specific input or output-based incentives to reward their own staff, at least in part—there are limits to the possibilities of stock options.448 But presumably then the provider will have better incentives and better ability to monitor its own staff than the government has to monitor the provider.) 446 See also Smith, supra note 432, at 286 (discussing “suboptimization” and “measure fixa- tion”). 447 448 DICKER, supra note 119, at 19. On the use of stock options in private prisons, see Volokh, supra note 6, at 174. Draft—Please do not circulate 2013] PERFORMANCE MEASURES 73 3. 
Distortion Across Types of Inmates

One common complaint about high-powered outcome-based incentives is that they’ll lead to two related phenomena: “creaming” (taking only the easiest inmates) and “parking” (not providing services to the most difficult inmates).449 There’s an easy way to prevent providers from taking the easiest inmates: insist that providers take all comers,450 limit opportunities for providers to transfer inmates they don’t like out of the prison, and have assigning agencies not discriminate either in favor of or against particular providers in assignment.451

There remains, though, the concern that providers will be, for instance, more enthusiastic about providing rehabilitative services to those who are more likely to benefit from them. There are two lines of response to this concern. First, paying the same rate, regardless of how hard the offender is to serve, will clearly lead to parking;452 one can therefore provide payments that are inmate-specific, where a harder-to-serve inmate’s desistance from crime is rewarded more generously than an easier-to-serve inmate’s. These payments can be based on the observable characteristics of the inmate; some characteristics might be illegal to consider while others can be better observed by the provider than by the government, so there will inevitably be some degree of mismatch.453 But a system of nonuniform rewards can generally alleviate parking.

The second line of response would question whether parking is even bad. Suppose some inmates are hard to rehabilitate, so prisons, in the presence of uniform rewards, will tend to spend less time trying to rehabilitate them. Is this bad? Some nonuniformity of rewards will be inevitable: presumably a murder by a released inmate will be penalized more heavily than a minor crime. But suppose there’s a group of inmates whose recidivism is equally harmful. Wouldn’t it be socially beneficial for the provider to concentrate its resources on the ones whose crimes can be prevented most cheaply, so that more inmates can be treated at the same cost? At least, so an efficiency framework might counsel. If one subscribes to a certain form of equity where everyone should have some amount of (even ineffective) rehabilitation, one might want to fall back on the solution I mentioned above: offering higher payments for the harder-to-treat inmates454 or, if that can’t be done reliably, mandating some amount of inputs or outputs.

449 DICKER, supra note 119, at 23; see also Inwood, supra note 205; Kyle, supra note 158, at 2112; Barnow, supra note 224, at 287, 297–98, 305–06; Pozen, supra note 77, at 283; RICHARD A. MCGOWAN, PRIVATIZE THIS?: ASSESSING THE OPPORTUNITIES AND COSTS OF PRIVATIZATION 166 (2011).
450 See Gilroy, supra note 159 (“So literally, you have the private vendor take over the exact same population, and then use the same metrics you use to assess the public facilities.”); cf. Volokh, supra note 439, at 806–07.
451 See text accompanying supra notes 131–132.
452 DICKER, supra note 119, at 24; NICHOLSON, supra note 211, at 6–7; David Boyle, The Perils of Obsessive Measurement, RSA: 21ST CENTURY ENLIGHTENMENT, Nov. 1, 2010, available at http://comment.rsablogs.org.uk/2010/11/01/perils-obsessive-measurement/.
453 Cf. Volokh, supra note 439, at 806–07.

4.
Falsifying Performance Measures

Finally, when high-stakes compensation depends on numbers, there’s an obvious incentive to falsify the numbers themselves.455 Reports of school cheating scandals are commonplace.456 Similarly, in the prison context, private providers plausibly prefer to underreport incidents, at least if they wouldn’t inevitably become known.457 Failure to report is grounds for contract termination, which can cut in the other direction, but contract termination is a strong remedy that’s rarely used.458 Public prisons, on the other hand, might have an incentive to overreport to get more funds, unless they’re in competition with private facilities.459 Whichever way the incentives cut, the fact that compensation will inevitably be to some extent based on variables reported by the provider means that it’s important to seriously invest in monitoring.

Currently, monitoring practices vary quite a lot, “from minimal attention from a centrally located contract administrator to a combination of a contract administrator and one or more on-site monitors.”460 The monitors themselves may have responsibility for more than one facility, which puts them on site at any particular prison once a quarter, once a week, or daily.461 Instead, contracts should provide for a full-time, on-site monitor462 with “unlimited access to the correctional facilities and assigned correctional units,”463 who isn’t the provider’s employee (even if the contract might mandate that the provider pay the monitor’s salary as part of the deal).464 Because the capture of monitors is an enduring concern,465 other forms of monitoring are possible: a public-interest group could be given inspection rights,466 the surrounding community might be designated as a third-party beneficiary,467 or the constitutional tort regime for prisons could be strengthened (rather than weakened, which is the current trend).468

A strong disclosure regime is also probably a good idea.469 One way of guaranteeing disclosure is to subject private prisons under contract with the federal government to the Freedom of Information Act,470 perhaps along the lines of the often-proposed Private Prison Information Act. Private prison firms themselves aren’t “agencies” for the purposes of FOIA,471 and the Bureau of Prisons isn’t covered if it hasn’t “created and retained” or doesn’t actually possess the documents.472 Even after these hurdles, much qualifying information, like contracts or incident reports, would be exempt under exemption 4, which protects “trade secrets and commercial or financial information . . . [that is] privileged or confidential.”473 Exemption 4 could be applied either if “disclosure could impair the reliability of data,”474 or if “disclosure would cause substantial competitive injury to the provider.”475 The competitive injury justification could be fairly broad: knowing the terms of a contract, for instance, can reveal the terms of the winning proposal to the winning firm’s competitors.476 Indeed, FOIA has been criticized as “a lawful tool of industrial espionage.”477 On the other hand, says Cásarez, FOIA provides for the disclosure of “reasonably segregable portion[s]” of documents,478 which “should include monitoring and reporting requirements.”479 Logan counsels against “saddl[ing] private prison operators with expensive monitoring requirements ‘far beyond those that exist for government prisons,’”480 but FOIA applicability would cut in the direction of establishing parity. Similar legislative fixes are possible in the states: for instance, in Florida and Georgia, open records acts “already apply to private organizations that act on behalf of state agencies.”481 All of this (as well as any relevant public-law value) could also be imposed on private contractors by contract; Jody Freeman calls this process “publicization.”482

Another possibility is to assure access to the prison by the public and the press.483 Bentham, who had smart things to say about the bidding process two centuries ago,484 also argued for “essentially unrestricted public access”485 to (private) facilities. His prison design:

enables the whole establishment to be inspected almost at a view; it would be my study to render it a spectacle, as persons of all classes would, in the way of amusement, be curious to partake of, and that not only on Sundays at the time of Divine service, but on ordinary days at meal times or times of work, providing therefore a system of inspection, universal, free, and gratuitous, the most effectual and permanent securities against abuse.486

I don’t want to endorse watching prisoners as a source of amusement, but the idea of public access does seem to have some advantages in terms of accountability.

454 Id. at 25.
455 Boyle, supra note 452; Smith, supra note 432, at 292.
456 See, e.g., Emily Richmond, Did High-Stakes Testing Cause the Atlanta Schools Cheating Scandal?, THEATLANTIC.COM, Apr. 3, 2013, http://www.theatlantic.com/national/archive/2013/04/did-high-stakes-testing-cause-the-atlanta-schools-cheating-scandal/274619/.
457 See Gaes et al., supra note 59, at 18; Developments, supra note 9, at 1884; JOEL DYER, THE PERPETUAL PRISONER MACHINE: HOW AMERICA PROFITS FROM CRIME 211, 221 (2000); Low, supra note 217, at 39 (citing JOHN L. CLARK ET AL., REPORT TO THE ATTORNEY GENERAL: INSPECTION AND REVIEW OF THE NORTHEAST OHIO CORRECTIONAL CENTER, at VII.B.2 (1998) (reporting that CCA’s legal counsel advised administrators against writing reports about incidents because of concern over legal liability); id. at VIII, XI; HARDING, supra note 70, at 323–24).
458 See text accompanying supra notes 260–261.
459 See Gaes et al., supra note 59, at 18.
460 MCDONALD ET AL., supra note 125, at 50.
461 Id. at 50, 51 tbl. 4.1.
462 Thomas, supra note 134, at 109.
463 Fla. SB 2038, supra note 169, § 1, at 9 (creating FLA. STAT. § 944.7115(8)(d)).
464 Id.; see also Gilroy, supra note 159 (full-time monitor at each private prison in Ohio plus surprise inspections by Correctional Institution Inspection Committee); Nicole B. Cásarez, Furthering the Accountability Principle in Privatized Federal Corrections: The Need for Access to Private Prison Records, 28 U. MICH. J.L. REFORM 249, 293 (1995) (citing Ira P. Robbins, The Legal Dimensions of Private Incarceration, 38 AM. U. L. REV. 531, 752 (1989)) (Robbins’s Model Contract “calls for an employee of the contracting agency to have access to prison facilities and all records kept by the contractor at all times”); Low, supra note 217 (citing CLARK ET AL., supra note 457, at R-24, ch. XI).
465 Cásarez, supra note 464, at 295; Dolovich, supra note 5, at 490–95.
466 See Low, supra note 217, at 38.
467 See Freeman, supra note 482, at 1317.
468 See Volokh, supra note 145.
469 See also id. at 293 (American Correctional Association requires that certain records be maintained “for facility accreditation and the contracting agency”).
470 5 U.S.C. § 552.
471 Cásarez, supra note 464, at 268–79; Forsham v. Harris, 445 U.S. 169 (1980) (whether private firm subject to FOIA depends on whether subject to extensive, day-to-day government control).
472 Cásarez, supra note 464, at 279–84; Kissinger v. Reporters Comm. for Freedom of the Press, 445 U.S. 136 (1980) (FOIA requires agency to disclose only documents it has “created and retained”).
473 Cásarez, supra note 464, at 284–91; 5 U.S.C. § 552(b)(4).
474 Cásarez, supra note 464, at 287 (citing Critical Mass Energy Project v. NRC, 975 F.2d 871, 878 (D.C. Cir. 1992)).
475 Id.
476 Id. at 289; see also text accompanying supra note 36.
477 Id. at 292 (quoting Stephen S. Madsen, Note, Protecting Confidential Business Information from Federal Agency Disclosure After Chrysler Corp. v. Brown, 80 COLUM. L. REV. 109, 113 (1980)).
478 5 U.S.C. § 552(b).
479 Cásarez, supra note 464, at 289.
480 Id. at 260 (quoting CHARLES LOGAN, PRIVATE PRISONS: CONS AND PROS 147 (1990)).
481 Id. at 296 (quoting FLA. STAT. ANN. § 119.011(2); GA. CODE ANN. § 50-18-70(a)).
482 Pronounced [pŭb'lĭ-kĭ-zā'shən]. Jody Freeman, Extending Public Law Norms Through Privatization, 116 HARV. L. REV. 1285, 1285 (2003).
483 Id. at 299 (citing Robbins, supra note 464, at 752–53 (Model Contract § 6(B))).
484 See text accompanying supra note 245.
485 Durham, supra note 20, at 69.
486 Id. (quoting JEREMY BENTHAM, A BENTHAM READER 200 (Mary Peter Mack ed., 1969)).

V. CONCLUSION

The failure of the comparative effectiveness studies, therefore, is completely understandable. Aside from the methodological problems, it’s quite plausible that the results of prison privatization have been inconclusive because the changes in prison management that would lead to better performance are often neither permitted nor rewarded. Using performance measures would change this by helping us do valid comparative studies, enabling the fair public-private competitions that are a hallmark of competitive neutrality, and pushing policymakers to clearly formulate what we want out of prisons. Even better, using performance measures directly to drive compensation has the potential to radically alter prison outcomes by rewarding good performance and penalizing bad performance; this is clearly applicable to private prisons but could possibly be used for public prison wardens as well.

The critiques are serious, but I don’t believe they undermine the experiment too seriously. The information necessary to calculate the True Social Values in an efficiency framework may never be available, but we can approach the exercise with an air of humility, seeking only to improve incentives at the margins, not to achieve optimal social engineering. The use of market incentives probably won’t alter the public-interestedness of those who work at private prison firms, but it might alter the mix of people who choose to work in the public sector; on the other hand, combined with social impact bonds, performance-based compensation can also spur the growth of nonprofit providers. Because small firms and nonprofits are particularly sensitive to risk, the incentives should only be moderately high-powered, to trade off incentives and risk tolerance.

Performance-based compensation will give rise to certain possibly undesirable strategic behavior. If providers can set their own goals, they’ll be inclined to set them in ways that are easy to meet; this is why providers shouldn’t set the goals at all, and in any event compensation should be based on the level of a continuous variable, not a binary goal.
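The difference between paying on a binary goal and paying on the level of a continuous variable can be made concrete with a short sketch. This is only an illustration under assumed numbers: the 40 percent target, the 50 percent baseline, and the dollar amounts are hypothetical and aren’t drawn from any actual contract or from the sources cited here.

```python
# Hypothetical figures throughout; none of them come from a real contract.
TARGET_PCT = 40       # assumed binary goal: recidivism at or below 40%
BONUS = 100_000       # assumed lump-sum bonus for meeting the goal
BASELINE_PCT = 50     # assumed baseline for the continuous scheme
PER_POINT = 10_000    # assumed payment per percentage point below baseline


def binary_payment(recidivism_pct: int) -> int:
    """All-or-nothing: pay the bonus only if the goal is met."""
    return BONUS if recidivism_pct <= TARGET_PCT else 0


def continuous_payment(recidivism_pct: int) -> int:
    """Pay in proportion to the measured level; negative values are penalties."""
    return PER_POINT * (BASELINE_PCT - recidivism_pct)


if __name__ == "__main__":
    for pct in (48, 41, 40, 35):
        print(pct, binary_payment(pct), continuous_payment(pct))
```

Under these assumptions, the binary scheme pays nothing for an improvement from 48 to 41 percent and nothing extra for an improvement from 40 to 35 percent, so everything turns on where the threshold is set. The continuous scheme pays the same amount for each percentage point, which is one way of seeing why the choice of goal matters less when compensation tracks the level itself.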
If some dimensions of quality are hard to measure, performance-based compensation will bias providers’ effort toward the more measurable aspects of performance; this means that some reliance on inputs and outputs will still be necessary, having due regard for the need to avoid choosing measures that can be easily and undesirably manipulated by providers. Compensation schemes might lead providers to concentrate on treating certain inmates and neglect others; even if this is bad (which isn’t clear), the problem can be alleviated by inmate-specific rewards. Finally, the levels of the measures themselves can be falsified, which points to the need for serious investments in monitoring and robust disclosure regimes.

These concerns are real, but the lesson to take from them is that more experimentation is required to see how much of a real-world effect they have and to what degree they really vitiate the promise of performance incentives. The status quo, where the level of experimentation is close to zero, is unlikely to be optimal.
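The claim that inmate-specific rewards can alleviate parking can be illustrated with a toy model. The sketch assumes a provider that serves an inmate only when the expected reward exceeds the cost of treatment; the two inmates, their costs, their chances of success, and the reward amounts are all hypothetical, and the `hard_to_serve` flag stands in for whatever observable characteristics a real scheme would be allowed to use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inmate:
    label: str
    hard_to_serve: bool       # assumed to be an observable characteristic
    treatment_cost: int       # provider's assumed cost of serving this inmate
    success_chance_pct: int   # assumed chance (%) that treatment leads to desistance


def served(inmates, reward_for):
    """Labels of inmates whose expected reward exceeds the cost of treating them."""
    chosen = []
    for inmate in inmates:
        expected_reward = reward_for(inmate) * inmate.success_chance_pct / 100
        if expected_reward > inmate.treatment_cost:
            chosen.append(inmate.label)
    return chosen


def uniform_reward(inmate):
    """Same hypothetical payment for every inmate who desists."""
    return 10_000


def inmate_specific_reward(inmate):
    """Hypothetical higher payment for the harder-to-serve inmate's desistance."""
    return 40_000 if inmate.hard_to_serve else 10_000


INMATES = [
    Inmate("easier", hard_to_serve=False, treatment_cost=2_000, success_chance_pct=50),
    Inmate("harder", hard_to_serve=True, treatment_cost=6_000, success_chance_pct=20),
]

if __name__ == "__main__":
    print(served(INMATES, uniform_reward))
    print(served(INMATES, inmate_specific_reward))
```

With the uniform reward, only the easier inmate is worth serving, and the harder one is parked; with the higher reward for the harder inmate, both are served. The model deliberately ignores the mismatch that arises when the provider observes difficulty better than the government does, so it shows the direction of the effect rather than its size.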