Exploring Temperature Settings in Creative Writing: A Haiku Generation Case Study with OpenAI API


Every time you prompt a large language model, you're witnessing the result of thousands of sequential sampling decisions—each token drawn from a probability distribution over the model's vocabulary. In standard chat interfaces, these distributions are sampled with fixed settings that users cannot adjust. But API access changes this: as we introduced in an earlier post, the temperature parameter gives direct control over how the model samples from these distributions, determining whether it selects the highest-probability token or explores lower-ranked alternatives.

In this post, we demonstrate how this control translates into meaningful differences for creative writing. Using haiku generation, we systematically vary temperature from 0 (fully deterministic) to 2 (highly exploratory) and examine not just the resulting poems, but the token-level probability data that reveals why the model made each choice. The haiku form is ideal for this experiment: its rigid 5-7-5 structure provides a constant against which we can observe how temperature affects word selection, imagery, and creative risk-taking. The underlying log probability data – explored in more detail in a recent post – shows us exactly which alternatives the model considered and rejected at each step.

Prerequisites and Setup

To replicate this experiment, you will need:

  • An OpenAI API key (stored securely in your environment)
  • Python 3.7 or higher
  • The OpenAI Python library (pip install openai)
  • Basic familiarity with API calls and token-level model behavior

For this demonstration, we used Google Colab to store the API key securely via userdata, though any secure credential management approach will work.

Understanding Temperature: A Brief Overview

For a comprehensive discussion of temperature and other LLM parameters, see this earlier post on the topic, but to give a brief overview:

Temperature controls the randomness of token selection during text generation. At temperature 0, the model always selects the most probable next token, resulting in deterministic output. As temperature increases, the model samples from a broader distribution of possible tokens, weighted by their probabilities. At temperature 2, even relatively unlikely tokens have a chance of being selected, leading to more creative—but potentially less coherent—results.
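Concretely, temperature rescales the model's raw logits before the softmax: each logit is divided by the temperature, so low values sharpen the distribution toward the top token and high values flatten it. A minimal sketch of this mechanism (the logit values below are illustrative, not taken from the API):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide each logit by the temperature before normalizing.
    # Low temperature sharpens the distribution toward the top token;
    # high temperature flattens it toward uniform. Temperature 0 is
    # handled separately in practice (pure argmax), since dividing
    # by zero is undefined.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # three hypothetical vocabulary tokens
sharp = softmax_with_temperature(logits, 0.2)  # top token dominates
flat = softmax_with_temperature(logits, 2.0)   # probabilities even out
```

Running this with a low and a high temperature on the same logits shows the top token's probability shrinking as temperature rises, which is exactly the behavior the haiku experiments below make visible.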

Code Implementation

The code for this experiment generates haikus at different temperature settings while capturing token-level probabilities (logprobs) to reveal the model's decision-making process.

The code structure is straightforward: after initializing the OpenAI client and defining a stable prompt ("Write a haiku about cows"), we make an API call with logprobs=True to retrieve not only the generated text but also the probability distributions that informed each token selection. The top_logprobs=5 parameter requests the top five alternative tokens at each step, filtered by a minimum probability threshold of 0.05 to focus on meaningful alternatives.
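A minimal sketch of such a call is shown below; the model identifier and the exact shape of the client-side 0.05 filter are our assumptions, since the post does not reproduce the original script verbatim:

```python
import math

def top_alternatives(entries, min_p=0.05):
    """Filter (token, logprob) pairs to alternatives with probability >= min_p."""
    return [(tok, math.exp(lp)) for tok, lp in entries if math.exp(lp) >= min_p]

def generate_haiku(temperature, model="gpt-4o"):
    """Request one haiku plus the top-5 logprobs per token.

    Requires OPENAI_API_KEY in the environment. The model name is a
    placeholder; substitute whichever model you have access to.
    """
    from openai import OpenAI  # imported here so the helper above stays dependency-free
    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Write a haiku about cows"}],
        temperature=temperature,
        logprobs=True,
        top_logprobs=5,
    )
    choice = response.choices[0]
    # For each generated token, keep only alternatives with p >= 0.05.
    per_token = [
        (pos.token,
         top_alternatives([(alt.token, alt.logprob) for alt in pos.top_logprobs]))
        for pos in choice.logprobs.content
    ]
    return choice.message.content, per_token
```

The `logprobs=True` and `top_logprobs=5` parameters are the standard Chat Completions options; the threshold filtering happens entirely on the client side.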

Experimental Design

We tested four temperature settings—0, 0.2, 1, and 2—generating five haikus at each level to observe both consistency and variation. The same prompt was used across all experiments to ensure comparability.
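The sweep itself is a simple nested loop. In this sketch, `generate_haiku` is a placeholder standing in for the logprobs-enabled API call described in the previous section:

```python
# Experiment driver: 4 temperature settings x 5 samples each.
def generate_haiku(temperature):
    # Placeholder: in the real experiment this wraps the chat.completions
    # call with logprobs enabled and the fixed prompt "Write a haiku about cows".
    return f"<haiku sampled at temperature={temperature}>"

temperatures = [0, 0.2, 1, 2]
samples_per_setting = 5
results = {
    t: [generate_haiku(t) for _ in range(samples_per_setting)]
    for t in temperatures
}
```

Keeping the prompt fixed means any variation across the twenty generations is attributable to the sampling temperature alone.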

Results: Temperature 0 (Deterministic Output)

At temperature 0, the model produces identical output across all five samples:

Cows graze in still fields, Warm breath fogs the morning air— Bells hum, day begins.

Token-Level Probability Analysis (Temperature 0)

The logprobs data reveals how the model selects tokens at temperature 0; representative excerpts from the complete output follow.

Key observations from the probability distributions:

Opening token selection:

000 token='C'
  logp=-0.9520 p=0.385973
  top alternatives (p > 0.05):
   01.     'C' logp= -0.9520 p=0.385973
   02.  'Morning' logp= -1.2020 p=0.300596
   03.     'D' logp= -2.4520 p=0.086122
   04.   'Quiet' logp= -2.4520 p=0.086122
   05.    'Gent' logp= -2.7020 p=0.067072

The first token 'C' (starting "Cows") had a probability of only 38.6%, yet with temperature set to 0 the model consistently chose it because it was the highest-probability option. Alternative openings included 'Morning' (30.1%), 'D' (8.6%), and 'Quiet' (8.6%).
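The p values in these dumps are simply the exponentials of the reported natural-log probabilities, which is easy to verify:

```python
import math

# The API reports natural-log probabilities; exponentiating recovers p.
p_c = math.exp(-0.9520)        # token 'C'       -> ~0.386
p_morning = math.exp(-1.2020)  # token 'Morning' -> ~0.301
```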

Critical word choice:

005 token=' still'
  logp=-0.6909 p=0.501127
  top alternatives (p > 0.05):
   01.   ' still' logp= -0.6909 p=0.501127
   02.   ' soft' logp= -0.9409 p=0.390278

As with the opening token, several tokens had close competing alternatives. The word 'still' in "still fields" had a probability of just 50.1%—barely preferred over 'soft' (39.0%). Other marginal choices included 'fog' (58.4%, main alternative 'clouds') and 'day' (54.6%, main alternative 'grass'). At temperature 0, even these marginal probability differences result in consistent selection.

High-certainty selections:

003 token='ze'
  logp=0.0000 p=1.000000
  top alternatives (p > 0.05):
   01.     'ze' logp= 0.0000 p=1.000000

Some tokens show near-perfect certainty. Once the model has committed to "gra" (from "graze"), the completion "ze" is virtually guaranteed (p=1.000000).

The full logprobs reveal that while multiple plausible paths existed, the model always followed the single highest-probability trajectory at temperature 0.

Results: Temperature 0.2 (Slight Variation)

Increasing temperature to 0.2 introduces modest variation while maintaining structural coherence. The five generations were:

  1. Cows graze in soft light, / Warm breath fogs the morning air— / Meadow bells, slow hearts.
  2. Cows graze in still fields, / Warm breath fogs the morning air— / Bells hum, day begins.
  3. Cows graze in still fields, / Warm breath fogs the morning air— / Bells hum, day begins.
  4. Cows graze in still fields, / Warm breath clouds the morning air— / Bells hum, grass whispers.
  5. Cows graze in soft light, / Warm breath clouds the morning air— / Bells hum, fields at peace.

Samples 2 and 3 match the temperature 0 output exactly; in the other samples, the first deviations from the most likely output are 'soft' (samples 1 and 5) and 'clouds' (sample 4).

Here, the model occasionally selects alternatives that were visible in the temperature 0 logprobs—such as ' soft' instead of ' still', or ' clouds' instead of ' fogs'. These substitutions demonstrate that even a slight increase in temperature allows the model to explore near-probable alternatives without straying far from the original structure.

Results: Temperature 1 (Creative Exploration)

At temperature 1 – the default used in OpenAI's chat environment – the outputs become noticeably more diverse:

  1. Slow breaths in green fields, / Cows hum under summer light— / Milk warms the dusk air.
  2. Cows graze in soft light, / Meadow breath and steady bells— / Milk-sweet dusk settles.
  3. Cows graze in still fields, / Warm breath fogs the morning air— / Bells hum, day begins.
  4. Cows graze in soft light, / Warm breath clouds the morning air— / Bells hum, fields reply.
  5. Cows graze in soft light, / Warm breath rises with the mist— / Meadow hums in peace.

The structural integrity of the haiku form remains intact, but the content shifts substantially. Interestingly, one sample still matches the temperature 0 output exactly, showing that even at moderate temperatures, the most probable path remains accessible. This balance between coherence and variety likely explains why temperature 1 is the default for chat-based access.

Results: Temperature 2 (High Variability)

At temperature 2, the model samples from a much broader probability distribution, resulting in the most varied outputs:

  1. Quiet cows in dusk, / Warm breath rises like soft mist— / Meadow holds their peace.
  2. Morning mist drifts low, / Cows breathe warm into cool air— / Dew clings to their hides.
  3. Cows graze in green hush, / bells stitched into summer breeze— / warm milk moons at dawn.
  4. Morning mist drifts low, / Cows chew slow, bells softly swing— / Meadow breathes in peace.
  5. Quiet pasture sighs, / Cows drift like clouds through green dawn, / Milk warms morning hands.

Token-Level Probability Analysis (Temperature 2)

We demonstrate how temperature 2 enables selection of highly improbable tokens using the last of these results:

Quiet pasture sighs, / Cows drift like clouds through green dawn, / Milk warms morning hands.

Key examples from the complete token-level logprobs:

Opening divergence:

000 token='Quiet'
  logp=-2.4520 p=0.086122
  top alternatives (p > 0.05):
   01.     'C' logp= -0.9520 p=0.385973
   02.  'Morning' logp= -1.2020 p=0.300596
   03.     'D' logp= -2.4520 p=0.086122
   04.   'Quiet' logp= -2.4520 p=0.086122
   05.    'Gent' logp= -2.7020 p=0.067072

Starting with 'Quiet' (8.6% probability) instead of the expected 'C' (38.6%) immediately creates a different trajectory.

Low-probability word selection:

002 token=' sigh'
  logp=-5.3784 p=0.004615
  top alternatives (p > 0.05):
   01.    ' hum' logp= -0.8784 p=0.415455
   02.  ' breath' logp= -1.1284 p=0.323557
   03.   ' dawn' logp= -2.1284 p=0.119030

The verb ' sigh' in "pasture sighs" had a probability of only 0.46%, yet it was selected over far more likely alternatives: ' hum' (41.5%) and ' breath' (32.4%).

Creative noun choice:

013 token=' dawn'
  logp=-2.1571 p=0.115663
  top alternatives (p > 0.05):
   01.   ' grass' logp= -0.4071 p=0.665592
   02.   ' dawn' logp= -2.1571 p=0.115663
   03.  ' fields' logp= -2.4071 p=0.090078
   04.   ' light' logp= -2.9071 p=0.054635

Choosing ' dawn' (11.6%) over ' grass' (66.6%) creates unexpected imagery while maintaining coherence.

Unusual verb selection:

008 token=' drift'
  logp=-2.0283 p=0.131557
  top alternatives (p > 0.05):
   01.   ' chew' logp= -0.2783 p=0.757061
   02.   ' drift' logp= -2.0283 p=0.131557
   03.  ' breathe' logp= -2.7783 p=0.062143

The model selected ' drift' (13.2%) instead of the much more probable ' chew' (75.7%), producing the poetic phrase "Cows drift like clouds through green dawn."

Despite these low-probability selections, the haikus remain structurally valid and often evocative. The phrase "bells stitched into summer breeze" in sample 3 demonstrates how creative token combinations can emerge from broader sampling, even if they deviate significantly from the model's typical output.

Recommendations

The logprobs data collected in this experiment confirmed the expected pattern: as temperature increases, the model samples from progressively wider probability distributions. At temperature 0, only the highest-probability token is ever selected. At temperature 0.2, near-probable alternatives occasionally appear. At temperature 1, the model explores a broader range of plausible tokens while maintaining coherence. At temperature 2, even tokens with probabilities below 1% can be selected, leading to more surprising—and sometimes more evocative—results.

Critically, in this experiment, higher temperatures did not eliminate structure. Even at temperature 2, the haiku format (5-7-5 syllable structure) was consistently preserved, and the thematic coherence around pastoral imagery remained intact. This suggests that the underlying training of the model enforces certain constraints that persist even when sampling becomes highly stochastic. Note, however, that the temperature 2 setting can also produce meaningless output, as shown in our earlier experiment.

Finally, while high temperature values can be useful for creative tasks, for factual or analytical work, where consistency and accuracy are paramount, temperature 0 or very low values (0.1–0.3) are recommended to improve output reliability.


The authors used GPT-5.2 [OpenAI (2025) GPT-5.2, Large language model (LLM), available at: https://openai.com] to generate the haiku outputs and logprobs data demonstrated in this post.