Executive Summary
This paper investigates whether U.S. Immigration and Customs Enforcement (ICE) administrative data record legal severity neutrally or are systematically shaped by the enforcement pipeline through which individuals enter the immigration system. We analyze ICE arrest records from October 2022 to March 2026 and test whether the method of apprehension, especially the Criminal Alien Program (CAP) versus community enforcement, predicts deportation outcomes beyond what the official threat score and criminality classification explain. The findings show that pipeline assignment is not a passive descriptor of how a case was found. It is an independent predictor of case outcome, an upstream determinant of official classification, and a mechanism through which administrative infrastructure produces unequal enforcement intensity.
The analysis proceeds in two stages. Phase 1 begins with a spatial audit testing whether general public-service administrative visibility, measured through Federally Qualified Health Center density, predicts ICE enforcement intensity. After the data analysis, that hypothesis is rejected. Public health infrastructure does not explain enforcement patterns; law-enforcement data infrastructure does (see secondary analysis). Phase 2 therefore shifts to individual ICE arrest records and examines whether the enforcement pipeline itself structures classification and outcomes. Across models, individuals processed through community enforcement have substantially lower odds of deportation than otherwise comparable individuals processed through CAP. This relationship persists after controlling for threat level, criminality classification, year, gender, and geographic jurisdiction. At the same official threat level, CAP cases predict an 82% deportation probability compared with 63% for community enforcement. Mediation analysis shows that only 6.7% of the pipeline effect operates through the official threat score, meaning that most of the effect is independent of the score and upstream of it.
This analysis does not claim to prove individual-level causal discrimination, because pipeline assignment is not random and unobserved case severity may remain. Instead, a robust form of algorithmic/data bias is identified: the dataset's official classifications do not merely measure legal severity; they partially encode the administrative pipeline that generated the case. The ICEberg Effect names this phenomenon: what appears in the dataset as individual risk is partly the visible surface of a deeper institutional data infrastructure.
From Individual Risk to Pipeline Risk
Administrative datasets are often treated as factual records of events: a person was arrested, assigned a legal category, given a threat score, and eventually deported or not deported. In this view, the data describe reality: a threat score represents threat, a criminality classification represents criminality, and an apprehension method is merely a procedural detail. If a model predicts deportation from these variables, it appears to be learning from neutral administrative facts.
This project challenges that assumption. It asks whether ICE enforcement data reflect only the legal characteristics of individuals, or whether the data also encode the administrative pipeline through which those individuals became visible to ICE. The distinction matters because algorithmic bias does not need to originate inside a machine-learning model. A model trained on administrative data can inherit distortions already embedded in the data-generating process. If the pipeline that produces the data also shapes labels, scores, and outcomes, then downstream analysis will mistake institutional processes for individual risk.
The central argument of this analysis is that ICE enforcement data contains this kind of pipeline bias. The key variable is apprehension_method: how the person entered ICE's enforcement system. We distinguish between cases generated through the Criminal Alien Program (CAP), which screens jail and prison databases; Community Enforcement, which captures arrests without prior criminal-database contact; 287(g), which reflects local police database participation; and other enforcement channels.
The central question is therefore:
Does the enforcement pipeline independently predict deportation outcomes after controlling for the official threat score and legal classifications?
The answer is yes. CAP cases are much more likely to end in deportation than community enforcement cases, even at the same threat level and within the same criminality labels. The pipeline also predicts assignment into institutional enforcement programs and appears to contaminate the threat score itself. This means that the dataset's official variables do not fully absorb the pipeline effect. The pipeline is not downstream of risk; it is part of how risk is constructed.
This is what we call the ICEberg Effect. The observed deportation outcome is the visible tip. Beneath it is a data infrastructure (jail databases, police databases, community enforcement practices, classification systems, and policy shifts) that determines who becomes visible, how they are categorized, and how their cases move through the system.
Theoretical Framework: Data Bias Before the Model
The project is located within the study of algorithmic bias, but it does not require ICE to be using a formal predictive algorithm. The relevant "algorithm" is broader: an administrative decision pipeline that takes inputs, sorts cases into categories, assigns scores, and produces outcomes. In many public-sector systems, the data pipeline itself performs algorithmic work before any statistical model is built.
This framing follows work on construct validity and measurement bias. Obermeyer et al. show that biased outcomes can arise when a proxy variable does not measure the construct it claims to measure. In their case, medical cost was used as a proxy for health need, producing racial bias because cost reflected unequal access to care. The same logic applies here. ICE's threat score and criminality variables appear to measure legal severity, but our evidence suggests they also reflect administrative pathways. The measured variable is not the latent construct. It is a contaminated proxy.
The project also follows the logic of administrative legibility. States act through categories, registries, and databases. Those categories do not merely describe populations; they make populations governable. CAP is a particularly important example because it connects immigration enforcement to jail and prison databases. A person processed through CAP is not simply "more criminal" in the abstract; they are made visible through a criminal-justice data infrastructure that is already institutionally connected to removal.
This matters because bias can enter through multiple stages:
- Visibility bias: who becomes visible to ICE through a database pipeline.
- Measurement bias: how the person is classified once visible.
- Priority bias: how the person is assigned a threat score or enforcement program.
- Outcome bias: how likely the case is to end in formal deportation.
The empirical question is whether these stages are independent or whether the pipeline links them together. If pipeline assignment predicts classification, score, program assignment, and outcome, then the dataset is not simply recording legal severity. It is recording a chain of administrative transformations.
Data and Variables
The main analysis uses ICE arrest records covering October 2022 through March 2026. The dataset contains 713,464 individual arrest records. The key variables are apprehension method, apprehension criminality, case threat level, case status, citizenship country, gender, apprehension Area of Responsibility (AOR), apprehension date, and birth year.
The dependent variable is a binary deportation outcome. A case is coded as deported if the case status indicates formal removal, exclusion, or deportation. This is a conservative outcome definition because it does not initially include voluntary departure. A later robustness check expands the outcome definition to include voluntary departure.
The main independent variable is pipeline category, derived from apprehension method. We classify cases into four main categories:
- CAP (Criminal Alien Program): cases generated through jail or prison database screening.
- Community enforcement: cases without prior criminal-database contact.
- 287(g): cases connected to local police database cooperation.
- Other: border, fugitive, worksite, and miscellaneous enforcement pathways.
CAP is used as the reference category because it is the pipeline most directly tied to criminal-justice data infrastructure and produces the highest deportation intensity.
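The four-way classification above can be sketched as a small mapping function. This is only an illustration: the raw strings stored in apprehension_method are not listed in this paper, so every string below, along with the names `COMMUNITY_METHODS` and `pipeline_category`, is a placeholder assumption rather than the coding scheme actually used.

```python
from collections import Counter

# Placeholder raw values: the real apprehension_method strings are not given
# in the paper, so these are assumptions chosen only to make the sketch run.
COMMUNITY_METHODS = {"located", "non-custodial arrest"}


def pipeline_category(apprehension_method: str) -> str:
    """Collapse a raw apprehension method into one of four pipeline categories."""
    method = apprehension_method.strip().lower()
    if method.startswith("cap"):
        return "CAP"
    if "287(g)" in method:
        return "287(g)"
    if method in COMMUNITY_METHODS:
        return "Community"
    # Border, fugitive, worksite, and miscellaneous pathways fall through here.
    return "Other"


sample = [
    "CAP Local Incarceration",
    "Located",
    "287(g) Program",
    "Worksite Enforcement",
    "CAP State Incarceration",
]
counts = Counter(pipeline_category(m) for m in sample)
print(counts)
```

Keeping the mapping in one function makes it easy to audit which raw methods end up in each pipeline, which matters because every later comparison depends on this step.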
The key control variables are:
- Threat level: official ICE priority score.
- Criminality classification: official case criminality label.
- Year fixed effects: to account for changing enforcement policy over time.
- Gender: to account for demographic differences.
- AOR fixed effects: to account for geographic enforcement differences across ICE field offices.
A major conceptual distinction is necessary: 287(g) appears in two different ways across the project. In Phase 1, 287(g) is a state-level policy control. In Phase 2, 287(g) is an individual-level pipeline category derived from apprehension_method. These are not the same variable. The Phase 2 models do not include a state-level 287(g) policy variable because the goal is to estimate the predictive role of individual pipeline assignment, not to model state policy effects.
The Rejected Public-Health Visibility Hypothesis
The question originally asked was whether general administrative accessibility makes communities more visible to ICE, and specifically whether states with denser Federally Qualified Health Center (FQHC) infrastructure also show higher ICE arrest or detainer intensity. The theoretical intuition was that public-service infrastructure might increase state visibility and therefore enforcement exposure.
To test this, we constructed an Administrative Accessibility Index using the log of FQHC sites per 100,000 people, standardized across states. We combined HRSA health center data, ICE state-level arrests and detainers, ACS demographic controls, and policy controls including sanctuary and 287(g) status. The models used OLS with HC3 robust standard errors and permutation placebo tests.
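As a minimal sketch of the index construction described above, the following computes a standardized log density from site counts and populations. The state labels and counts are made up for illustration, the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which variant was used.

```python
import math
import statistics


def accessibility_index(sites_by_state, population_by_state):
    """Standardized log of FQHC sites per 100,000 residents, one value per state."""
    log_density = {
        state: math.log(sites_by_state[state] / population_by_state[state] * 100_000)
        for state in sites_by_state
    }
    values = list(log_density.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # assumption: population SD across states
    return {state: (value - mean) / sd for state, value in log_density.items()}


# Made-up counts for three placeholder states (not HRSA or ACS figures).
sites = {"A": 120, "B": 45, "C": 300}
population = {"A": 4_000_000, "B": 900_000, "C": 20_000_000}
index = accessibility_index(sites, population)
print(index)
```

A state with zero sites would make the logarithm undefined, so a real implementation would need an explicit rule for that case.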
The result was not supportive. FQHC density did not positively predict arrests. For detainers, the association was negative rather than positive. By contrast, 287(g) policy was a dominant predictor of both arrests and detainers. This rejected the original hypothesis and redirected the project.
The conclusion from Phase 1 is important because it prevents the analysis from becoming a vague "administrative visibility" story. The evidence does not support the claim that any administrative infrastructure increases ICE enforcement. Instead, the relevant infrastructure is specifically law-enforcement data infrastructure. That is why the project turns to the individual-level ICE dataset and asks whether the enforcement pipeline itself predicts outcomes.
See the Secondary Analysis for the full analysis and figures.
Pipeline Bias in Individual ICE Records
Phase 2 is the core of the project. It tests whether the apprehension_method predicts deportation outcomes above and beyond the official variables that supposedly capture case severity. If threat level and criminality fully captured legal risk, then apprehension_method should add little information once those variables are controlled. If pipeline still matters, then the dataset is encoding something more than legal severity.
5.1 Regression: Does Pipeline Add Predictive Power Beyond Threat?
The first model applies the most direct test: regress the deportation outcome on threat level, then add apprehension method.
The baseline model estimates:
Deported ~ Threat Level
Threat level significantly predicts deportation. This is expected: official priority scores are related to outcomes. However, the model explains very little variance by itself. The pseudo R² is approximately 0.020, meaning that the official score alone captures only a small part of the outcome-generating process.
The second model estimates:
Deported ~ Threat Level + Apprehension Method
When pipeline is added, model fit improves substantially. The pseudo R² rises from 0.020 to 0.045, and the likelihood-ratio test is extremely significant. Community enforcement has an odds ratio of approximately 0.376 relative to CAP, meaning that community cases have much lower odds of deportation than CAP cases even after controlling for threat level.
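To give a sense of scale for the likelihood-ratio test, the sketch below converts the two pseudo R² values into an implied test statistic. It assumes the reported pseudo R² is McFadden's, that both models were fit on all 713,464 records, and that the overall deportation rate is 50%. None of these is stated in the text, so the output is an order-of-magnitude illustration only.

```python
import math


def null_log_likelihood(n_cases: int, base_rate: float) -> float:
    """Log-likelihood of an intercept-only model for a binary outcome."""
    return n_cases * (
        base_rate * math.log(base_rate)
        + (1 - base_rate) * math.log(1 - base_rate)
    )


def lr_statistic(r2_reduced: float, r2_full: float, ll_null: float) -> float:
    """LR statistic implied by two McFadden pseudo R2 values.

    McFadden's R2 = 1 - LL_model / LL_null, so LL_model = (1 - R2) * LL_null
    and LR = 2 * (LL_full - LL_reduced) = 2 * (R2_full - R2_reduced) * |LL_null|.
    """
    return 2 * (r2_full - r2_reduced) * abs(ll_null)


# N is the record count reported in the paper; the 0.5 base rate is a
# placeholder assumption, not a reported figure.
ll0 = null_log_likelihood(713_464, 0.5)
lr = lr_statistic(0.020, 0.045, ll0)
print(f"implied LR statistic: {lr:,.0f}")
```

Under these assumptions the statistic is in the tens of thousands. With only a handful of added pipeline parameters, that is far beyond any conventional chi-square threshold, and a different base rate would change the number but not the conclusion.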
The predicted probabilities make the result intuitive. At the same average threat level, CAP cases predict an 82% deportation probability, while community enforcement cases predict 63%. This is an 18.8 percentage-point difference at identical official score.
This figure is the cleanest response to the claim that the threat score already explains deportation. It does not. The pipeline adds independent predictive information.
Interpretation: The official score matters, but it is not enough. Once apprehension method is added, the model becomes substantially more predictive. This demonstrates incremental predictive validity: pipeline assignment contains information about deportation outcomes that is not contained in threat level.
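The reported odds ratio and the predicted probabilities can be cross-checked against each other. The sketch below applies the community-versus-CAP odds ratio of 0.376 to the 82% CAP probability. The function name is hypothetical, and the inputs are the rounded figures from the text, so small rounding differences from the reported gap are expected.

```python
def shift_probability(reference_prob: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a reference probability and return the new probability."""
    reference_odds = reference_prob / (1 - reference_prob)
    new_odds = reference_odds * odds_ratio
    return new_odds / (1 + new_odds)


cap_prob = 0.82        # CAP predicted probability reported above (rounded)
community_or = 0.376   # community-vs-CAP odds ratio reported above
community_prob = shift_probability(cap_prob, community_or)
gap_points = (cap_prob - community_prob) * 100
print(f"community: {community_prob:.3f}, gap: {gap_points:.1f} points")
```

The implied community probability comes out near 63%, with a gap of roughly 19 points, which agrees with the reported figures up to rounding of the inputs.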
5.2 The Regression Chain: Threat, Pipeline, Controls, and Mediation
The next figure extends the argument by decomposing the relationship across four models.
- Model A: threat only.
- Model B: pipeline only.
- Model C: threat + pipeline.
- Model D: threat + pipeline + criminality + year fixed effects.
The key result is that pipeline alone fits the outcome better than threat alone. In the summary, Model A has pseudo R² = 0.020, while Model B has pseudo R² = 0.030. This means the administrative pipeline is more predictive of deportation than the official threat score when each is considered separately.
Model C then shows that pipeline remains significant after controlling for threat. The community enforcement odds ratio remains far below one. Model D adds criminality and year fixed effects. The effect attenuates, as expected, but remains large and significant.
The mediation decomposition is especially important. If the pipeline effect operated mainly through the threat score, then controlling for threat should substantially reduce the pipeline coefficient. It does not. Only 6.7% of the pipeline effect is mediated by threat score. In other words, 93.3% of the effect remains independent of the official score.
This is a major finding. It means the bias is not simply that CAP cases receive higher threat scores and therefore get deported more. Rather, the pipeline predicts deportation mostly outside the threat-score mechanism.
Interpretation: The pipeline is not just correlated with the score. It operates upstream and independently. The official score does not absorb the administrative pipeline effect.
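One common way to compute a mediated share is the difference method on the log-odds scale, sketched below. Whether this is the decomposition used for the 6.7% figure is an assumption. The direct odds ratio is the reported 0.376, but the pipeline-only odds ratio is not reported in the text, so the 0.350 used here is a placeholder chosen for illustration.

```python
import math


def proportion_mediated(total_or: float, direct_or: float) -> float:
    """Share of a log-odds effect that disappears once the mediator is controlled.

    total_or:  odds ratio from the model without the mediator (pipeline only)
    direct_or: odds ratio from the model that also includes the mediator (threat)
    """
    total = math.log(total_or)
    direct = math.log(direct_or)
    return (total - direct) / total


# direct_or is the reported community-vs-CAP odds ratio after controlling for
# threat; total_or is a placeholder because the pipeline-only odds ratio is
# not reported in the text.
share = proportion_mediated(total_or=0.350, direct_or=0.376)
print(f"mediated: {share:.1%}, direct: {1 - share:.1%}")
```

Logit coefficients are not strictly comparable across nested models because the residual scale changes when covariates are added, so a share computed this way is best read as approximate.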
Same Legal Label, Different Pipeline, Different Fate
The next analysis asks whether the gap persists within identical criminality classifications. This is crucial because a simple objection would be that CAP cases are more likely to involve serious criminal conduct. If that were the entire explanation, then within the same criminality label, pipeline gaps should disappear or become negligible. They do not.
The figure compares deportation rates across pipeline categories within the same official criminality labels: convicted criminal, pending charges, and immigration violator/no criminal record. CAP cases have higher deportation rates than community cases across categories. The result is especially important for immigration violators with no criminal record: CAP cases are deported at 35.6%, while community cases are deported at 19.5%, a ratio of 1.83x.
This does not prove causality, but it sharply narrows the alternative explanations. The gap is not only because CAP contains more convicted criminals. The gap also appears among people officially classified as having no criminal record.
The figure also includes hollow upper-bound bars for unresolved cases, which address right-censoring. If all unresolved community cases eventually resulted in deportation, some gaps would shrink. But the persistence of the gap in key categories shows that unresolved cases alone cannot explain the pattern.
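The upper-bound logic for unresolved cases can be sketched as follows. The counts are placeholders rather than values from the dataset, and the function name is hypothetical.

```python
def deportation_bounds(deported: int, unresolved: int, total: int):
    """Observed rate and the upper bound if every unresolved case ended in deportation."""
    observed = deported / total
    upper = (deported + unresolved) / total
    return observed, upper


# Placeholder counts for one criminality label; not taken from the dataset.
cap_obs, cap_upper = deportation_bounds(deported=820, unresolved=60, total=1000)
com_obs, com_upper = deportation_bounds(deported=600, unresolved=150, total=1000)

# Strictest check: the gap survives censoring if even the community upper
# bound stays below the observed CAP rate.
gap_survives = com_upper < cap_obs
print(cap_obs, com_upper, gap_survives)
```

This is a deliberately conservative comparison: it credits the community pipeline with every unresolved case while crediting CAP with none.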
Interpretation: Official labels do not fully determine outcomes. The same label has different enforcement consequences depending on the database pipeline that produced the case.
Summary Table
| Criminality Label | Pipeline | N | Deportation Rate | Ratio vs Community |
|---|---|---|---|---|
| Convicted Criminal | CAP Database | 131,284 | 82.1% | 1.36× |
| Convicted Criminal | Community | 43,526 | 60.6% | - |
| Pending Charges | CAP Database | 68,810 | 58.8% | 1.32× |
| Pending Charges | Community | 33,939 | 44.7% | - |
| Immigration Violator | CAP Database | 13,856 | 35.6% | 1.83× |
| Immigration Violator | Community | 197,405 | 19.5% | - |
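The table itself is enough to run a rough check of whether pipeline adds information beyond the criminality label. The sketch below is not the paper's regression. It compares a model with one deportation rate per label against a model with one rate per label-and-pipeline cell, using deported counts approximated as N times the rounded rate. It covers only the CAP and community rows shown in the table.

```python
import math

# (criminality label, pipeline): (N, deportation rate) from the summary table.
CELLS = {
    ("Convicted Criminal", "CAP"): (131_284, 0.821),
    ("Convicted Criminal", "Community"): (43_526, 0.606),
    ("Pending Charges", "CAP"): (68_810, 0.588),
    ("Pending Charges", "Community"): (33_939, 0.447),
    ("Immigration Violator", "CAP"): (13_856, 0.356),
    ("Immigration Violator", "Community"): (197_405, 0.195),
}


def binomial_ll(deported: float, total: float) -> float:
    """Maximized Bernoulli log-likelihood for a group with one shared rate."""
    p = deported / total
    return deported * math.log(p) + (total - deported) * math.log(1 - p)


def pooled_ll(keys) -> float:
    """Log-likelihood when all listed cells share a single deportation rate."""
    total = sum(CELLS[k][0] for k in keys)
    deported = sum(CELLS[k][0] * CELLS[k][1] for k in keys)  # approximate counts
    return binomial_ll(deported, total)


labels = sorted({label for label, _ in CELLS})

# Label-only model: one rate per criminality label, pipelines pooled.
ll_label = sum(pooled_ll([k for k in CELLS if k[0] == label]) for label in labels)
# Label-by-pipeline model: every cell keeps its own rate.
ll_cell = sum(pooled_ll([k]) for k in CELLS)

# Three extra parameters, so the statistic is compared with chi-square(3).
lr = 2 * (ll_cell - ll_label)
ratios = {
    label: CELLS[(label, "CAP")][1] / CELLS[(label, "Community")][1]
    for label in labels
}
print(f"LR statistic: {lr:,.0f}")
print({label: round(r, 2) for label, r in ratios.items()})
```

The 5% critical value for a chi-square with three degrees of freedom is about 7.81, and the statistic here is far larger. The ratios recomputed from rounded rates can differ from the table in the last digit. For convicted criminals they give about 1.35 against the table's 1.36, which was presumably computed from unrounded rates.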
Nationality and Structured Pipeline Assignment
The next question is whether pipeline assignment itself is structured. If pipeline were random conditional on legal severity, we would not expect large systematic variation by nationality. Yet the data show substantial variation.
CAP rates vary 3.8x across nationalities, from 10.8% for Peru to 40.7% for Mexico. This means nationality is associated not only with outcomes, but also with the administrative pipeline through which people are processed.
This matters because pipeline assignment is upstream of classification and outcome. If nationality predicts pipeline, and pipeline predicts outcome, then nationality can influence deportation risk indirectly through administrative routing. This is a mechanism of structural bias, not necessarily intentional discrimination. The system can produce unequal outcomes through patterned exposure to different data infrastructures.
Interpretation: Pipeline assignment is not evenly distributed. Nationality is associated with different exposure to CAP, community enforcement, and 287(g). This supports a structural rather than random interpretation of pipeline bias.
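A back-of-the-envelope calculation shows how routing alone could translate into different expected outcomes. The sketch makes two strong simplifying assumptions that are not in the paper: everyone outside CAP is treated as a community case, and the 82% and 63% predicted probabilities apply uniformly regardless of nationality. It is an illustration of the mechanism, not an estimate.

```python
def blended_rate(cap_share: float, cap_rate: float, community_rate: float) -> float:
    """Expected deportation rate for a group given its share of CAP cases."""
    return cap_share * cap_rate + (1 - cap_share) * community_rate


# Predicted probabilities at the same threat level, as reported earlier.
CAP_RATE = 0.82
COMMUNITY_RATE = 0.63

# CAP shares reported above; treating all non-CAP cases as community
# enforcement is a simplifying assumption.
mexico = blended_rate(0.407, CAP_RATE, COMMUNITY_RATE)
peru = blended_rate(0.108, CAP_RATE, COMMUNITY_RATE)
routing_gap = mexico - peru
print(f"gap from routing alone: {routing_gap * 100:.1f} points")
```

Under these assumptions the two groups differ by several percentage points in expected deportation rate even though every individual is treated identically within each pipeline.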
Geographic Robustness: The Gap Persists Within Field Offices
A major alternative explanation is geography. Perhaps CAP cases are concentrated in field offices with higher deportation rates, while community cases are concentrated elsewhere. If so, the pipeline gap could simply reflect regional enforcement differences rather than pipeline effects.
The within-AOR analysis addresses this. The figure compares CAP and community deportation rates within major ICE field offices. The gap persists in every major AOR shown. The magnitude varies, but the direction does not. CAP has higher deportation rates than community enforcement within each field office.
This is an important robustness check because it rules out a simple geographic confound. The pipeline effect is not merely a Texas effect, a border effect, or a regional enforcement artifact. It appears within jurisdictions.
Interpretation: Geographic enforcement context matters, but it does not explain away the pipeline gap. The pattern survives within field offices.
Pipeline Locks Cases Into Institutional Tracks
The next figure examines not just final deportation outcomes, but intermediate institutional routing. CAP does not only identify people; it appears to place them into a different enforcement track.
The figure shows that 90.4% of CAP arrests are assigned to the ERO Criminal Alien Program, compared with only 32.9% of community arrests. This means that once a person enters through CAP, they are much more likely to be routed into a deportation-oriented institutional program.
This is critical for the theoretical argument. The pipeline is not merely a label in the dataset. It is an institutional channel that structures what happens next. The pathway itself creates enforcement momentum.
Interpretation: Pipeline assignment is consequential because it routes cases into different institutional programs. CAP is not just detection; it is a deportation track.
Threat Score Contamination: The Score Measures the Pipeline
If threat score were a neutral measure of legal severity, then within the same criminality label, threat-level assignment should be similar across pipelines. However, the data show otherwise.
Within the same criminality labels, CAP cases are assigned the highest threat level at much higher rates. Among convicted criminals, CAP cases receive Level 1 in approximately 51% of cases, compared with 26% for 287(g) cases. This is nearly a twofold difference within the same broad criminality label.
This does not automatically mean that the score is invalid; there may be unobserved severity differences inside each broad label. But it does show that the official score is not independent of pipeline. The score partly reflects how the person was found.
This is one of the strongest pieces of evidence for measurement bias. The official variable used to represent threat is itself patterned by administrative origin. Therefore, controlling for threat may not fully control for legal severity, and it may also control for part of the pipeline mechanism.
Interpretation: The threat score is not a clean independent control. It is partly endogenous to pipeline. This is precisely the kind of construct-validity problem that algorithmic bias research warns about.
Robustness Across Model Specifications
The robustness forest plot summarizes seven model variants. Across all specifications, community enforcement has lower odds of deportation than CAP. The odds ratio ranges from 0.357 to 0.592, and all p-values remain extremely small.
The models include:
- threat + pipeline,
- threat + pipeline + gender,
- threat + pipeline + criminality + year fixed effects,
- the same with gender,
- AOR fixed effects,
- resolved-only cases,
- expanded outcome including voluntary departure.
The most important point is not the exact odds ratio in any one model. The important point is stability. The effect attenuates under some specifications, especially when the sample is restricted to resolved cases, but it never disappears and never reverses.
Interpretation: The core finding is not model-dependent. It survives changes in specification, sample, geography, and outcome definition.
Policy-Responsive Pipeline Bias
The pipeline gap is not a fixed natural property of cases. It responds to enforcement priorities. From 2022 to 2025, CAP deportation rates stay high (61% rising to 74%), while community-enforcement deportation rates climb sharply (8% to 43%) once CAP-style criteria are expanded across the system.
The CAP-to-community ratio collapses from 7.49x in 2022 to 1.72x in 2025. If the gap were an intrinsic property of more "deportable" cases, we would expect a stable ratio. Instead the ratio moves with the policy regime, which is consistent with the gap being institutional rather than purely individual.
Interpretation: The pipeline gap fluctuates with enforcement priorities, suggesting that the disparity is produced by institutional policy and practice, not only by case composition.
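The ratio can be recomputed from the yearly rates quoted above. The sketch uses the rounded percentages from the text, so the 2022 ratio will not match the reported 7.49x exactly: a community rate of 8% rounded from a slightly higher value accounts for the difference.

```python
# Rounded yearly deportation rates quoted above, in percent.
RATES = {
    2022: {"CAP": 61, "Community": 8},
    2025: {"CAP": 74, "Community": 43},
}


def cap_to_community_ratio(year: int) -> float:
    """Ratio of the CAP deportation rate to the community rate for one year."""
    return RATES[year]["CAP"] / RATES[year]["Community"]


def percentage_point_gap(year: int) -> int:
    """Absolute gap between CAP and community deportation rates, in points."""
    return RATES[year]["CAP"] - RATES[year]["Community"]


for year in sorted(RATES):
    print(year, round(cap_to_community_ratio(year), 2), percentage_point_gap(year))
```

Reporting the absolute gap next to the ratio is a useful check, because a ratio is very sensitive to a small denominator such as the 2022 community rate.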
Data Integrity: Criminality Reclassification Is Almost Absent
The data-integrity analysis examines whether criminality classifications change between arrest and case resolution. In theory, some reclassification should occur as cases are reviewed, corrected, or updated. Instead, 98.3% of records show identical classification at both stages.
This could mean that initial classifications are highly accurate, but it could also indicate that labels are carried forward with little independent review. In either case, it raises a data-quality issue. If the same classification is propagated through the case lifecycle, then early pipeline-based classification errors or biases may persist into final records.
This figure should be framed carefully. It is not proof of falsification. It is a data-integrity warning. The unusual stability of labels strengthens the concern that the dataset may preserve administrative classifications rather than independently verified legal truth.
Interpretation: The dataset shows unusually high label stability. This is relevant because pipeline bias introduced at arrest may be preserved through resolution.
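The stability check amounts to comparing two label columns record by record. A minimal sketch follows, with made-up records and hypothetical function names. Listing the transitions that do occur is a natural companion to the headline share.

```python
from collections import Counter


def label_stability(records) -> float:
    """Share of records whose criminality label is identical at arrest and resolution."""
    same = sum(1 for at_arrest, at_resolution in records if at_arrest == at_resolution)
    return same / len(records)


def label_transitions(records) -> Counter:
    """Count each (arrest label, resolution label) pair where the label changed."""
    return Counter(pair for pair in records if pair[0] != pair[1])


# Made-up (label at arrest, label at resolution) pairs for illustration only.
records = [
    ("Convicted Criminal", "Convicted Criminal"),
    ("Pending Charges", "Convicted Criminal"),
    ("Immigration Violator", "Immigration Violator"),
    ("Pending Charges", "Pending Charges"),
]
print(label_stability(records))
print(label_transitions(records))
```

If pending charges rarely become convictions or dismissals in the recorded labels, that would point toward labels being carried forward rather than updated.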
Outcome Sensitivity: Including Voluntary Departure
A final concern is outcome coding. If formal deportation is too narrow, the analysis might undercount enforcement consequences. The extended outcome figure adds voluntary departure to formal deportation.
The result is that the gap persists. Community enforcement still shows lower enforcement intensity even when voluntary departure is included. This means the finding is not an artifact of a narrow outcome definition.
Interpretation: The pipeline gap is robust to a broader definition of enforcement outcome.
What the Evidence Shows
The evidence strongly supports the claim that ICE enforcement data contain algorithmic/data bias in the form of pipeline bias. The core empirical facts are:
- Pipeline predicts deportation beyond threat score.
- Pipeline alone predicts outcomes better than threat alone.
- Pipeline effects persist after criminality, year, gender, and geography controls.
- Same legal labels produce different outcomes across pipelines.
- Threat score assignment itself varies by pipeline.
- Pipeline assignment varies by nationality.
- Pipeline routes people into different institutional programs.
- The effect is robust across specifications and outcome definitions.
Together, these facts show that the apprehension_method is not a neutral procedural variable. It is part of the system that structures measurement and outcome.
However, the evidence does not prove a clean causal claim. That would require random assignment, a valid natural experiment, an instrumental variable, or stronger quasi-experimental identification. Pipeline assignment is not random. CAP may capture cases with unobserved severity that broad criminality labels and threat scores do not fully measure. Prior deportation orders, detailed criminal histories, legal representation, detention access, and court backlog are not fully observed.
The resulting claim is therefore more precise and more defensible:
The data show robust evidence of pipeline-based measurement and outcome disparity. The enforcement pipeline independently predicts deportation and appears to contaminate official risk classifications. This is evidence of algorithmic/data bias in the administrative data-generating process, though not definitive proof of individual-level causal discrimination.
For an algorithmic bias audit, causal proof is not the only relevant standard. A biased dataset is one where measured variables systematically encode institutional processes that are not equivalent to the constructs they claim to measure.
Why This Is Algorithmic and Data Bias
The strongest framing is not that ICE used a biased machine-learning model. The stronger and more accurate argument is that the dataset itself is biased before any model is trained.
The pipeline behaves algorithmically because it performs a sequence of sorting operations:
- A person becomes visible through a database or enforcement channel.
- The person receives a criminality classification.
- The person receives a threat score.
- The person is assigned to an enforcement program.
- The case ends in deportation, voluntary departure, unresolved status, or another outcome.
Each stage depends partly on the previous stage. CAP does not simply observe risk; it creates a different administrative trajectory. Once a person enters through CAP, they are more likely to receive high threat classification, more likely to be routed into a deportation-oriented program, and more likely to be formally deported.
This is a form of bias because the measured variables do not cleanly represent the underlying constructs. "Threat" partly measures pipeline. "Criminality" may partly preserve initial administrative labeling. "Outcome" partly reflects institutional track. The dataset therefore converts pipeline exposure into apparent individual risk.
This is why the project belongs in a study on bias in AI. Many AI systems are trained on administrative data. If the training data already encode pipeline effects, then a downstream model can reproduce and legitimize those effects while appearing objective. The problem is not only biased prediction but also biased measurement.
Limitations
17.1 No causal identification
The pipeline assignment is not random, so causal inference is limited. CAP may identify cases that differ in unobserved ways from community enforcement cases. The analysis reduces this concern through controls and within-label comparisons, but it cannot eliminate it.
17.2 Unobserved case characteristics
The dataset that was accessed does not fully capture prior deportation orders, detailed criminal history, legal representation, court backlog, facility practices, detention status, or officer discretion. These factors could influence both pipeline assignment and deportation outcome.
17.3 Threat score endogeneity
Threat score is both a control variable and a potentially contaminated outcome of the pipeline. Controlling for it is useful for showing that pipeline adds predictive power beyond official scoring, but it may also control for part of the pipeline mechanism. This is why the mediation analysis is important but should be interpreted carefully.
17.4 Data reliability
The extremely high stability of criminality labels raises concerns about whether classifications are independently reviewed or simply carried forward. This does not invalidate the analysis, but it supports the broader claim that administrative data should not be treated as neutral ground truth.
17.5 Selection into arrest
The dataset contains only people arrested by ICE. It does not include people considered but not arrested, people never made visible to ICE, or people exposed to enforcement but not recorded in these data. Therefore, the analysis describes bias within recorded enforcement data, not the full population at risk.
17.6 Policy confounding
The period includes policy changes that affect CAP and community enforcement intensity. Year fixed effects address this partially, and temporal analysis makes the policy responsiveness visible. But the study cannot fully separate structural pipeline effects from shifting political enforcement priorities.
Conclusion
This project began by asking whether administrative visibility predicts immigration enforcement. The first answer was no: public health infrastructure does not explain ICE enforcement intensity. That initial result led to a stronger finding: the relevant infrastructure is not general administration; it is law-enforcement data infrastructure.
At the individual level, the enforcement pipeline strongly predicts deportation outcomes. CAP cases are more likely to be deported than community enforcement cases even at the same threat level, within the same criminality categories, and within the same field offices. The pipeline also predicts institutional program assignment and appears to shape the official threat score itself. Across seven specifications and multiple robustness checks, the result persists.
The project's main contribution is therefore not a simple claim that one group has higher deportation rates than another. The contribution is to show that ICE administrative data contain a pipeline effect: the pathway by which a person becomes visible to the state is embedded into the variables that later appear to measure individual risk.
The ICEberg Effect names this structure. The deportation outcome is only the visible tip. Beneath it is a layered system of databases, enforcement pathways, classifications, scores, and institutional tracks. The dataset records the final surface of this system while presenting it as individual legal status.
The central conclusion is:
ICE enforcement data are not a neutral record of legal severity. They are a record shaped by the administrative pipeline that produced them. The pipeline does not merely observe risk; it helps construct the measurable form that risk takes.
Bias does not begin when a model is trained. Bias can already be present in the dataset, in the labels, in the proxies, and in the institutional pipeline that decides who appears in the data and how they are classified. This project demonstrates that structure in ICE enforcement data.