Unveiling scientific articles from paper mills with provenance analysis

João Phillipe Cardenuto (1,2), Daniel Moreira (2), Anderson Rocha (1)

(1) Universidade Estadual de Campinas, Campinas, São Paulo, Brazil; (2) Loyola University Chicago, Chicago, Illinois, United States of America

DOI: 10.1371/journal.pone.0312666
Image Provenance Analysis

Abstract

The increasing prevalence of fake publications created by paper mills poses a significant challenge to maintaining scientific integrity. While integrity analysts typically rely on textual and visual clues to identify fake articles, determining which papers merit further investigation can be akin to searching for a needle in a haystack. To address this challenge, we developed a new methodology for provenance analysis, which automatically tracks and groups suspicious figures and documents. Our approach groups manuscripts from the same paper mill by analyzing their figures and identifying duplicated and manipulated regions. These regions are linked and organized in a provenance graph, providing evidence of systematic production. We tested our solution on a paper mill dataset of hundreds of documents and on an extended version containing thousands of distractor articles. Our approach successfully identified and linked systematically produced articles, offering a promising tool to support scientific integrity efforts.

For further details, please refer to the full publication in PLOS ONE.


Method Overview

Our method consists of two main stages:

  1. Filtering & Evidence Collection: Figures are extracted from PDF documents, split into panels, and described using deep-learning features.
  2. Provenance Analysis: The method calculates content-sharing scores between panels and builds provenance graphs that reveal systematic image reuse.
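
The second stage can be illustrated with a minimal sketch. It is not the authors' implementation: cosine similarity over precomputed panel descriptors stands in for the paper's content-sharing score, and the names `cosine`, `build_provenance_graph`, `group_documents`, the `(doc_id, panel_id)` key layout, and the `threshold` value are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity, used here as a stand-in content-sharing score."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_provenance_graph(panels, threshold=0.9):
    """Link every pair of panels whose score reaches the threshold.

    panels: dict mapping (doc_id, panel_id) -> feature vector
            (hypothetical layout; descriptors are assumed precomputed).
    Returns a dict mapping (panel_key_a, panel_key_b) -> score.
    """
    edges = {}
    for (a, fa), (b, fb) in combinations(sorted(panels.items()), 2):
        score = cosine(fa, fb)
        if score >= threshold:  # threshold is an illustrative value
            edges[(a, b)] = score
    return edges


def group_documents(panels, edges):
    """Group documents connected by at least one panel-level edge (union-find)."""
    parent = {doc: doc for doc, _ in panels}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a[0])] = find(b[0])

    groups = {}
    for doc in parent:
        groups.setdefault(find(doc), set()).add(doc)
    return sorted(sorted(g) for g in groups.values())


# Toy descriptors: docA and docB share a near-identical panel, docC does not.
panels = {
    ("docA", 0): [1.0, 0.0, 0.0],
    ("docB", 0): [0.99, 0.05, 0.0],
    ("docC", 0): [0.0, 1.0, 0.0],
}
edges = build_provenance_graph(panels)
print(group_documents(panels, edges))  # [['docA', 'docB'], ['docC']]
```

In this toy setting, documents that end up in the same group are the candidates an analyst would inspect together. The edges themselves serve as the evidence trail back to the specific panels involved.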

Citation

Cardenuto JP, Moreira D, Rocha A (2024) Unveiling scientific articles from paper mills with provenance analysis. PLOS ONE, 19(10), e0312666. DOI: 10.1371/journal.pone.0312666

@article{Cardenuto2024,
  title = {Unveiling scientific articles from paper mills with provenance analysis},
  volume = {19},
  ISSN = {1932-6203},
  url = {http://dx.doi.org/10.1371/journal.pone.0312666},
  DOI = {10.1371/journal.pone.0312666},
  number = {10},
  journal = {PLOS ONE},
  publisher = {Public Library of Science (PLoS)},
  author = {Cardenuto, João Phillipe and Moreira, Daniel and Rocha, Anderson},
  year = {2024},
  month = oct,
  pages = {e0312666}
}