Hutchins Lab



Identifying Risks and Opportunities in the Department of Defense Research Portfolio


United States military superiority relies on harnessing the extensive technological advantages generated by our thriving innovation economy, which is stimulated by substantial federal investment. Effectively managing the Department of Defense (DoD) research portfolio is one way to ensure that the United States maintains its technological advantage while achieving specific applied research goals. There is broad agreement that evidence-based policymaking improves organizational management, yet effectively promoting the use of evidence in policymaking has been a challenge for decades. One barrier is that policymakers struggle to find policy research that is high-quality, relevant, and timely.

To advance the frontier of evidence-based policymaking, some federal science funding agencies have, over the last decade, developed an evidence-building approach called scientific portfolio analysis to promote data-driven decision-making. This is the quantitative analysis of linked data describing policy, grant text and metadata, the scientific workforce, their publications, and the flow of knowledge into basic or applied outcomes as measured by citation analysis. In short, it applies scientific approaches to the study of the scientific enterprise itself. This methodology provides timely, relevant, and trusted evidence for agency leaders and enables funding agencies to characterize the scientific portfolio from both global and local perspectives. Linked data are a prerequisite for this kind of analysis; unfortunately, the DoD does not currently have a centralized repository with this kind of information.
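
To make the linked-data idea concrete, here is a minimal sketch of how such records could be connected in a graph, with grants, publications, and patents as nodes and funding or citation relationships as edges. The use of networkx and all identifiers, attributes, and relationship names are invented for illustration; they do not reflect an actual DoD schema.

```python
from collections import Counter
import networkx as nx

G = nx.DiGraph()

# Record types along the chain from funding to applied outcomes.
# All identifiers and attributes here are invented placeholders.
G.add_node("grant:0001", kind="grant", text="ML for materials discovery")
G.add_node("pub:12345", kind="publication", year=2021)
G.add_node("patent:US0000000", kind="patent")

# Edges point "downstream" along the flow of knowledge.
G.add_edge("grant:0001", "pub:12345", rel="produced")
G.add_edge("pub:12345", "patent:US0000000", rel="cited_by")

# Local view: trace applied outcomes downstream of a single grant.
print(nx.descendants(G, "grant:0001"))  # {'pub:12345', 'patent:US0000000'}

# Global view: summarize the portfolio across the whole graph.
print(Counter(kind for _, kind in G.nodes(data="kind")))
```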

Our team, including Dr. Hutchins and The ARI, has helped build scientific portfolio analysis capacity at other agencies and proposes to adapt and extend these approaches to the defense environment. In this project, we will address these gaps through three Aims. First, we will construct an end-to-end database of linked DoD data, from research funding to applied outcomes. This work will be guided by structured interactions with Defense Program staff to identify key barriers and opportunities for incorporating data into policymaking. Second, we will use these data to identify risks and opportunities in the research portfolio, such as potentially problematic overlap and measures of (mis)alignment with strategic priorities. Third, using machine learning and analysis of knowledge flow, we will quantify how the resulting discoveries feed into downstream applied research goals, such as improvements in human health or invention and patenting activity.
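
As a hedged illustration of the kind of analysis Aim 2 involves, the sketch below flags pairs of grants whose abstracts are unusually similar, using TF-IDF cosine similarity. The abstracts, the threshold, and the use of scikit-learn are assumptions for illustration, not the project's actual method; flagged pairs would be candidates for human review, since textual similarity alone does not establish problematic duplication.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy grant abstracts; real inputs would come from the linked database.
abstracts = {
    "grant:A": "Autonomous navigation for unmanned ground vehicles.",
    "grant:B": "Navigation algorithms for autonomous ground robots.",
    "grant:C": "Wound healing biomarkers in combat casualty care.",
}

ids = list(abstracts)
tfidf = TfidfVectorizer(stop_words="english").fit_transform(abstracts.values())
sim = cosine_similarity(tfidf)

# Report pairs above an (illustrative) similarity threshold as
# candidates for human review.
THRESHOLD = 0.3
for i in range(len(ids)):
    for j in range(i + 1, len(ids)):
        if sim[i, j] > THRESHOLD:
            print(f"potential overlap: {ids[i]} <-> {ids[j]} ({sim[i, j]:.2f})")
```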

This work is anticipated to lead to three outcomes that can impact Defense capabilities. First, it will yield novel computational and artificial intelligence approaches for identifying research portfolio risks and opportunities. Second, it will provide fundamental insights into the process of knowledge transfer from research discoveries to applied goals like clinical and technological impact. Third, the broad approach of integrating data into a centralized database and conducting global portfolio analysis can serve the DoD as a template for scaling from a research project into an in-house centralized information infrastructure and analytics platform.

Predicting causal citations without full text

Citation analysis generally assumes that each citation documents causal knowledge transfer that informed the conception, design, or execution of the main experiments. Citations may exist for other reasons, however. In this paper, we identify a subset of citations that are unlikely to represent causal knowledge flow. Using a large, comprehensive feature set of open access data, we train a predictive model to identify such citations. The model relies only on the title, abstract, and reference set, not on the full text or future citation patterns, making it suitable for publications as soon as they are released, or for those behind a paywall. We find that the model assigns high prediction scores to citations that were likely added during the peer review process and, conversely, assigns low prediction scores to citations known to represent causal knowledge transfer.
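
The sketch below illustrates the general shape of such a model: a supervised classifier trained on per-citation features computable from the title, abstract, and reference list alone. The feature names, synthetic data, and gradient-boosting model are assumptions for illustration; the paper's actual feature set and architecture may differ.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Hypothetical per-citation features derivable without full text,
# e.g. similarity between citing abstract and cited title, and
# position within the reference list. Values here are synthetic.
X = np.column_stack([
    rng.random(n),           # title/abstract text similarity
    rng.random(n),           # share of references co-cited with this one
    rng.integers(1, 80, n),  # reference-list position
])
y = rng.integers(0, 2, n)    # 1 = unlikely to be causal (synthetic labels)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier().fit(X_tr, y_tr)

# Higher scores flag citations unlikely to reflect causal knowledge flow.
scores = model.predict_proba(X_te)[:, 1]
print("mean predicted non-causal probability:", scores.mean().round(3))
```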

Robustness of evidence reported in preprints during peer review

Adoption of preprints expanded dramatically during the COVID-19 pandemic. Many have expressed concern that relying on preprint data that would not survive peer review increases the risk of flawed decision-making. We therefore asked how much the information presented in preprints is expected to change after review. We quantify the attrition dynamics of over 1,000 epidemiological estimates first reported in 100 matched preprints studying COVID-19. We find that 89% of point estimates persist through peer review. Of these, the correlation between preprint and published estimate values is extremely high at 0.99, and there is no systematic trend toward estimate inflation or deflation during review. Uncertainty is reduced somewhat, however: confidence interval ranges decrease by a small but statistically significant 7%. A higher degree of data alteration during peer review, whether in magnitude or by deletion, might be expected in papers that are never published because of their lower quality, which could limit the generalizability of our results. Importantly, we find that expert peer review scores of preprint quality are not related to eventual publication in a peer-reviewed journal, mitigating this concern. The evidence base presented in preprints is therefore highly stable, and where data do change during review, uncertainty is expected to decrease by a small amount on average. This lends credence to the use of preprints, as one component of the biomedical research literature, in decision-making. These results can help inform the use of preprints during the ongoing pandemic as well as future disease outbreaks. Future work will extend this analysis to other fields of research and time periods to test the generalizability of these findings.
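
For readers who want to reproduce this style of comparison, the sketch below matches preprint estimates to their published counterparts on toy data and computes the three headline quantities: the persistence rate, the preprint-versus-published correlation, and the average change in confidence interval width. All values and column names are illustrative placeholders, not the study's data or code.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr

# One row per preprint estimate; NaN marks an estimate that was
# dropped during peer review. Values are toy data for illustration.
df = pd.DataFrame({
    "preprint_est":        [1.8, 2.4, 0.7, 3.1, 1.2],
    "published_est":       [1.8, 2.3, 0.7, np.nan, 1.3],
    "preprint_ci_width":   [0.9, 1.2, 0.5, 0.8, 1.0],
    "published_ci_width":  [0.8, 1.1, 0.5, np.nan, 0.9],
})

# Persistence: fraction of estimates that survive review.
persisted = df.dropna(subset=["published_est"])
print("persistence:", f"{len(persisted) / len(df):.0%}")

# Agreement between preprint and published values.
r, _ = pearsonr(persisted["preprint_est"], persisted["published_est"])
print("correlation:", round(r, 3))

# Average relative change in confidence interval width.
ci_change = (persisted["published_ci_width"]
             / persisted["preprint_ci_width"] - 1).mean()
print("mean CI width change:", f"{ci_change:.1%}")
```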