In our earlier posts, we explored how useful existing Custom GPTs are for academic tasks and explained how to create your own GPT from scratch. This follow-up post puts those insights into practice by introducing QuantiCheck—a Custom GPT we developed specifically to assess the methodological rigour and reproducibility of quantitative research.
Unlike general-purpose assistants, QuantiCheck follows a transparent, structured evaluation framework. It reviews manuscripts along key dimensions such as research design, measurement validity, statistical justification, and open science practices. In the following sections, we explain how the tool was built, how its evaluation logic works, and what it can (and cannot) do to support consistent, high-quality research assessment.
How to Access QuantiCheck
QuantiCheck is publicly available in the GPT Store and can be tested directly. To use it, simply upload a manuscript (PDF format recommended), and the assistant will provide a structured evaluation based on its predefined review criteria.


How QuantiCheck Works
QuantiCheck functions as a domain-adapted reviewer assistant. When a manuscript is uploaded—typically in PDF format—it reads the full content, including footnotes, appendices, and supplementary materials, and produces a formal, structured assessment of its methodological rigour. Its language is tailored to academic standards, with a tone that is critical yet constructive.
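QuantiCheck relies on ChatGPT's built-in file handling to read uploaded PDFs, but readers who want to reproduce a comparable workflow in their own scripts would first need to extract the manuscript text themselves. Below is a minimal sketch assuming the pypdf library and a local file named manuscript.pdf; both are illustrative choices, not part of QuantiCheck itself.

```python
# Minimal sketch: extract manuscript text from a PDF before evaluation.
# Assumes the pypdf library; QuantiCheck itself reads uploads via ChatGPT's
# built-in file handling, so this only illustrates a comparable workflow.
from pypdf import PdfReader

def load_manuscript(path: str) -> str:
    """Return the concatenated text of all pages, including appendices."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

manuscript_text = load_manuscript("manuscript.pdf")
print(f"Loaded {len(manuscript_text)} characters of manuscript text.")
```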
The assistant follows a predefined seven-part evaluation framework, which includes the dimensions below (a code sketch of these criteria follows the list):
- Research Design and Methodology – clarity of the research question, appropriateness of the methodological approach, and justification of sample size.
- Measurement Quality – whether the manuscript demonstrates reliability and validity of its instruments.
- Statistical Rigour – appropriateness of statistical tests, assumption checks, and corrections for multiple comparisons.
- Reporting Transparency – whether key information, such as null results, effect sizes, and replication materials, is clearly reported and accessible.
- Ethical Considerations – including preregistration, bias disclosure, and ethical approvals.
- Contextual and Methodological Contribution – the novelty and relevance of the study, discussion of limitations, and methodological safeguards.
- Replicability and Open Science Practices – the presence and accessibility of shared data, code, or supplementary documentation.
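For readers who want to adapt or extend these criteria, one way to picture the framework is as a simple mapping from each dimension to a guiding question. The sketch below merely paraphrases the list above; the variable names (EVALUATION_FRAMEWORK, RATING_SCALE) are illustrative, and QuantiCheck itself is configured through natural-language instructions in the GPT builder rather than code.

```python
# Illustrative encoding of the seven evaluation dimensions described above.
# This is not QuantiCheck's actual configuration (Custom GPTs are configured
# with natural-language instructions); it is a compact way to reuse the criteria.
EVALUATION_FRAMEWORK = {
    "Research Design and Methodology": "Is the research question clear, the approach appropriate, and the sample size justified?",
    "Measurement Quality": "Does the manuscript demonstrate the reliability and validity of its instruments?",
    "Statistical Rigour": "Are the statistical tests appropriate, assumptions checked, and multiple comparisons corrected for?",
    "Reporting Transparency": "Are null results, effect sizes, and replication materials clearly reported and accessible?",
    "Ethical Considerations": "Are preregistration, bias disclosure, and ethical approvals addressed?",
    "Contextual and Methodological Contribution": "Are novelty, relevance, limitations, and methodological safeguards discussed?",
    "Replicability and Open Science Practices": "Are data, code, or supplementary documentation shared and accessible?",
}

RATING_SCALE = ("High", "Moderate", "Low")
```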
At the end of the review, QuantiCheck generates a summary of the manuscript’s strengths and weaknesses, followed by an overall rating of methodological rigour on a three-level scale: High, Moderate, or Low. These ratings are based not on surface features or section headings, but on the actual presence and clarity of procedures and materials necessary for replication.
The system prompt ensures that the assistant does not rely on structural cues alone, but actively checks whether the information provided is sufficient for verification or reconstruction. If no manuscript is provided, it does not proceed with the evaluation.
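To prototype similar behaviour outside the GPT Store, both rules (refusing to evaluate without a manuscript, and closing with a three-level rating) can be expressed in a few lines. The sketch below assumes the OpenAI Python SDK; the prompt wording and the review_manuscript helper are hypothetical illustrations, not QuantiCheck's actual system prompt.

```python
# Minimal sketch of the two behaviours described above: a fixed three-level
# rating scale and a refusal to evaluate when no manuscript is supplied.
# Assumes the OpenAI Python SDK; the prompt wording and function name are
# hypothetical, not QuantiCheck's actual configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SYSTEM_PROMPT = (
    "You are a reviewer assistant for quantitative research. Evaluate the "
    "manuscript along seven dimensions (research design, measurement quality, "
    "statistical rigour, reporting transparency, ethics, contribution, and "
    "open science practices). Do not rely on section headings alone; check "
    "whether the reported procedures and materials would allow replication. "
    "Conclude with an overall rating of High, Moderate, or Low."
)

def review_manuscript(manuscript_text: str) -> str:
    """Return a structured review, or refuse if no manuscript is provided."""
    if not manuscript_text.strip():
        return "No manuscript provided; the evaluation cannot proceed."
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": manuscript_text},
        ],
    )
    return response.choices[0].message.content
```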
Testing QuantiCheck on a Peer-Reviewed Study
To demonstrate how QuantiCheck performs in real academic settings, we tested it on a peer-reviewed, D1-ranked publication from Social Science Computer Review (Sebők et al., 2025). The study presents the Babel Machine, a multilingual natural language processing pipeline developed for comparative policy topic classification. It includes a clearly defined research objective, open-source code, replication data, and detailed methodological documentation.
QuantiCheck assigned a High rigour rating. The assistant recognised strengths across nearly all dimensions of the framework: a robust experimental design, transparent sourcing of training data, appropriate use of evaluation metrics, and public availability of tools and replication materials. Limitations—such as the absence of inter-rater reliability tests for newly annotated data—were acknowledged, but did not compromise the overall rating. In sum, QuantiCheck confirmed that the study met high standards of methodological rigour and reproducibility, offering a balanced review that highlighted both strengths and minor gaps. Selected excerpts from the assistant’s structured output are shown below.
In its concluding section, QuantiCheck provided a concise yet well-structured summary of the manuscript’s strengths and areas for improvement. The assistant highlighted the paper’s clear research design, public access to both data and code, and detailed methodological documentation as key strengths, justifying the High rating. It also acknowledged the authors’ commitment to open science practices through platforms like HuggingFace and OSF.
As for potential limitations, the assistant noted the lack of inter-rater reliability tests for newly annotated data and the absence of alternative model comparisons. These points were not treated as disqualifying, but rather as areas for possible future enhancement.
Overall, the review demonstrated how a domain-adapted Custom GPT can deliver balanced, transparent feedback that aligns with scholarly expectations.
To try it yourself, open QuantiCheck in the GPT Store and upload a PDF for review.
Recommendations
QuantiCheck shows how Custom GPTs can move beyond generic use and provide structured, transparent support for academic assessment—when carefully designed and configured. Its predefined evaluation logic helps standardise expectations and highlight overlooked aspects of methodological rigour.
While no AI can replace expert judgement, tools like QuantiCheck can complement human reviewers by offering consistency, clarity, and reproducibility in feedback. We recommend trying it in manuscript development, internal reviews, or teaching contexts—and adapting it further if more domain-specific criteria are needed.
The authors used GPT-4o [OpenAI (2025) GPT-4o (accessed on 15 May 2025), Large language model (LLM), available at: https://openai.com] to generate the output.