Testing a GPT Academic Reviewer for Pre-Submission Manuscript Improvement

Source: Unsplash - Markus Winkler

With the rapid advancement of generative AI (GenAI) tools, researchers now have access to innovative ways to improve their manuscripts before journal submission. One such method involves using custom GPT-based reviewers to obtain preliminary feedback.

To practically evaluate the usefulness of these GPT-driven reviewing tools, we tested the Scientific Paper Reviewer – ChatGPT using one of our manuscripts currently in the pre-submission stage. The paper, titled "The (worst) laws of the land: The concept of legislative basket cases and legislative backsliding in Central-Eastern Europe" (authored by Rebeka Kiss and Miklós Sebők), introduces the concept of legislative basket cases—laws that fall short of widely accepted normative standards of legislative quality across multiple dimensions.

Drawing on a multidimensional framework, the article examines emblematic legislative examples from Czechia, Hungary, and Poland, all adopted during periods of illiberal governance. These cases, including the Czech church restitution tax, the Hungarian Medical Chamber Law, and the Polish Forest Law, are analysed through four core dimensions: public policy coherence, legal-constitutional compliance, procedural integrity, and legislative stability.

The manuscript argues that such laws systematically bypass procedural safeguards, violate constitutional norms, and undermine democratic standards, contributing to democratic backsliding. While similar legislative flaws exist in liberal democracies (e.g. Spain’s “Gag Law”, Austria’s smoking ban reversal), their recurrence and severity in Central-Eastern Europe suggest a deeper, regime-related dynamic. The proposed framework offers a tool for identifying extreme cases of legislative failure and understanding their role in broader autocratisation processes.

Although the generated feedback was not exceptionally deep or detailed in every aspect, it nonetheless provided valuable and actionable suggestions. During the testing process, we also applied the custom prompts offered by the GPT instance.

Prompts

Comprehensively read and understand this document.

This initial prompt was designed to have the model build a contextual understanding of the manuscript as a whole. It enabled more coherent and relevant responses in subsequent evaluations, as the model could reference the content holistically rather than reacting only to isolated excerpts.

Act as a reviewer for this scientific manuscript I just uploaded. Write a comprehensive review with suggestions for specific experiments to improve it.

This instruction led to a relatively detailed and structured peer-review-style response. Compared to more general reviewing prompts, this one explicitly emphasized the inclusion of experimental recommendations, which significantly enhanced the practical value of the feedback.

Output

One of the most striking strengths of the ChatGPT-generated review was its ability to identify key areas for improvement that aligned closely with the feedback we received during the actual peer review process. The GPT-generated review provided both major and minor points of critique, reflecting the structure and depth typically found in formal peer review.

The custom GPT identified the following substantive concerns as key areas for improvement (major points) aimed at enhancing the overall rigour and impact of the manuscript.

1. Need for Quantitative and Comparative Expansion

Issue: The manuscript presents a strong framework and uses illustrative case studies, but it lacks empirical generalizability.

Suggestion: To strengthen the manuscript’s impact and external validity, the authors should supplement the qualitative approach with quantitative operationalization.

Suggested Experiments / Analyses:

  • Develop and apply a legislative basket case index using measurable indicators (e.g., time from introduction to passage, number of amendments within 12 months, absence of consultation).
  • Collect and analyze parliamentary roll-call data to evaluate the degree of consensus/conflict per bill.
  • Utilize text-as-data methods (e.g., natural language processing) to measure clarity and coherence in legal drafting across bills.
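To make the first of these suggestions concrete, the proposed legislative basket case index could be sketched as a simple additive score over procedural indicators. The indicators, thresholds, and weighting below are purely illustrative assumptions, not taken from the manuscript or the GPT output:

```python
from dataclasses import dataclass

@dataclass
class Bill:
    """Hypothetical record of one bill's procedural indicators."""
    days_to_passage: int        # time from introduction to final vote
    amendments_12m: int         # amendments within 12 months of adoption
    public_consultation: bool   # whether a public consultation was held

def basket_case_score(bill: Bill) -> float:
    """Toy additive index in [0, 1]; higher means more 'basket case'-like.
    Thresholds are illustrative placeholders."""
    score = 0
    if bill.days_to_passage < 7:       # rushed passage
        score += 1
    if bill.amendments_12m >= 3:       # post-adoption instability
        score += 1
    if not bill.public_consultation:   # procedural integrity violated
        score += 1
    return score / 3

# Example: a bill passed in 2 days, amended 5 times, with no consultation
rushed = Bill(days_to_passage=2, amendments_12m=5, public_consultation=False)
print(basket_case_score(rushed))  # → 1.0
```

In practice, each indicator would need validation against expert coding, and a weighted or scaled version may be preferable to this equal-weight toy version.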

2. Ambiguity in Causal Claims

Issue: The manuscript implies that illiberal regimes are more likely to produce legislative basket cases, but this causal relationship is not empirically demonstrated.

Suggestion: Frame these claims more clearly as hypotheses and specify causal mechanisms (e.g., weakened oversight, reduced legislative capacity, executive aggrandizement).

Suggested Experiments:

  • A cross-national time-series analysis comparing the rate of legislative basket cases before and after key regime changes (e.g., Orbán in Hungary post-2010).
  • Comparative process tracing in liberal vs. illiberal regimes to establish the causal chains leading to basket-case outcomes.

3. Case Selection Logic Needs Refinement

Issue: While the V4 countries are logical choices, the rationale for these particular laws (e.g., why these and not others) is underdeveloped.

Suggestion: Clarify selection criteria. Were these the worst offenders? The most visible? The most contested?

Suggested Improvement:

  • Include a brief typology of legislative failures within each country and justify the selected case as a “most likely” or “most illustrative” case.
  • Consider including counterexamples — well-crafted legislation from the same contexts — to test the robustness of the basket case concept.

External validity was indeed a weaker aspect of the manuscript, which we subsequently strengthened by integrating additional empirical elements. The comment on case selection was also well taken, and we addressed it by elaborating the selection rationale and incorporating a more detailed research design.

Minor points were also provided, though these were less directly applicable or useful for developing the manuscript.

1. Terminological Suggestion:

“Basket case” may carry pejorative connotations; consider whether this term should be softened in scholarly discourse (e.g., “systemically flawed laws”).

2. Ethical Statement:

The "Not applicable" designation could be elaborated: if no human subjects were involved, say so explicitly.

In sum, custom GPT-based reviewers can serve as a useful preliminary step in the manuscript development process. While their feedback is not a substitute for expert peer review and can, at times, remain superficial, these tools offer an accessible way to identify structural and conceptual weaknesses early on. In our case, the 'Scientific Paper Reviewer – ChatGPT' provided several actionable suggestions. As such, these models can support researchers in refining their work prior to formal submission.

The authors used GPT-4.5 [OpenAI (2025), GPT-4.5 (accessed on 24 March 2025), Large language model (LLM), available at: https://openai.com] to generate the output.