Synthetic data plays a critical role in numerous fields, enabling researchers and practitioners to emulate realistic scenarios while safeguarding personal privacy. This blog presents a case study in which we used several AI models to generate synthetic customer feedback for a smartwatch product. The exercise explored the general capabilities of these models for synthetic data production, offering insights into their utility and limitations in a broader research context.
For this task, each AI model was given a persona and specific instructions for generating synthetic customer feedback for a smartwatch product. Each model was required to create 50 unique entries with realistic details: a customer's name, age, city, detailed feedback, and date of purchase. The feedback had to reflect a range of experiences (positive, negative, and mixed), and every entry had to be unique in content and structure. The outputs were then formatted as CSV files for analysis.
Prompt
Smartwatch Feedback Generation Task
You are a text generator whose job is to generate sentences based on the description you are given. I want you to create customer feedback for a smartwatch product.
Each feedback should include the following fields:
- Customer Name: A realistic, but fictitious name for the customer.
- Age: An integer representing the age of the customer (between 18 and 65).
- City: The name of a city from anywhere in the world.
- Feedback: A short paragraph (3–4 sentences) describing the customer's experience with the smartwatch. The tone of the feedback must vary: some reviews should be positive, others negative, and a few may be mixed. Ensure a realistic distribution of opinions.
- Date of Purchase: A date in the format of YYYY-MM-DD, randomly chosen between 2023-09-01 and 2024-08-31.
Instructions:
- Generate 50 unique feedback entries based on the above criteria.
- Each feedback and sentence must be unique, both in content and structure, ensuring no repetition of phrases or ideas.
- Format the data into a CSV file with appropriate column headers.
- Provide the exportable CSV file as the output.
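The field and instruction list above is concrete enough to check mechanically. The following is a minimal Python sketch of such a check, not the procedure the authors used. It assumes the CSV uses exactly the headers in `EXPECTED_HEADERS` (the prompt only asks for "appropriate column headers"), and the function name `validate_feedback_csv` is hypothetical. It verifies the entry count, the age range, the date format and purchase window, a rough 3 to 4 sentence count, and exact duplicate feedback.

```python
import csv
import io
import re
from datetime import date

# Assumed header names: the prompt only asks for "appropriate column headers".
EXPECTED_HEADERS = ["Customer Name", "Age", "City", "Feedback", "Date of Purchase"]
EXPECTED_ROWS = 50
START, END = date(2023, 9, 1), date(2024, 8, 31)
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")


def validate_feedback_csv(text: str) -> list[str]:
    """Check a model's CSV output against the prompt; return a list of problems."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_HEADERS:
        return [f"unexpected headers: {reader.fieldnames}"]
    rows = list(reader)
    problems: list[str] = []
    if len(rows) != EXPECTED_ROWS:
        problems.append(f"expected {EXPECTED_ROWS} rows, got {len(rows)}")

    seen_feedback: set[str] = set()
    for n, row in enumerate(rows, start=1):
        # Short rows come back with None for missing fields, so coerce to "".
        age_text = (row["Age"] or "").strip()
        date_text = (row["Date of Purchase"] or "").strip()
        feedback = (row["Feedback"] or "").strip()

        # Age: an integer between 18 and 65 inclusive.
        if not age_text.isdigit() or not 18 <= int(age_text) <= 65:
            problems.append(f"row {n}: invalid age {age_text!r}")

        # Date: strictly YYYY-MM-DD and inside the purchase window.
        if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", date_text):
            problems.append(f"row {n}: bad date format {date_text!r}")
        else:
            try:
                bought = date.fromisoformat(date_text)
            except ValueError:  # well-formed but impossible, e.g. 2024-02-30
                problems.append(f"row {n}: impossible date {date_text!r}")
            else:
                if not START <= bought <= END:
                    problems.append(f"row {n}: date {bought} outside window")

        # Feedback: 3-4 sentences (rough heuristic; abbreviations like "St." miscount).
        sentences = [s for s in SENTENCE_SPLIT.split(feedback) if s]
        if not 3 <= len(sentences) <= 4:
            problems.append(f"row {n}: {len(sentences)} sentences in feedback")

        # Feedback: no exact duplicates (case- and whitespace-insensitive).
        key = " ".join(feedback.lower().split())
        if key in seen_feedback:
            problems.append(f"row {n}: duplicate feedback")
        seen_feedback.add(key)

    return problems
```

Checks like these cover only the mechanical criteria (count, ranges, formats, exact duplicates); judging tone distribution, near-duplicate phrasing, or realism still requires additional analysis or a human reader.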
Performance Comparison
Model | Content Quality | File Output | Overall Assessment |
---|---|---|---|
GPT-4.5, GPT-4.0, GPT-4 | Sentences were not unique, cities were fictitious, and purchase dates fell outside the specified timeframe. | Successfully generated a downloadable CSV file. | None were suitable for the task: sentences were not unique and other criteria were not met. |
Mistral | Followed the specified timeframe, but feedback was not unique and contained repeated phrases. | Successfully generated a downloadable CSV file. | Effective at file generation but failed to produce unique feedback. |
Grok 3 | Initially generated only 10 entries, but after multiple prompts produced the requested 50 unique sentences. | Did not generate the requested file. The text output could be copied, but pasting it directly into Excel was problematic because the column contents were not properly separated. | Reached the requested quantity after multiple prompts, but failed to produce purely positive or purely negative reviews, contrary to the prompt's requirements. |
DeepSeek | Successfully produced the requested sentences and met the content criteria. | Struggled to output a CSV file directly, but copying to Excel worked well. | Produced correct results but required manual conversion to obtain a usable file. |
Qwen2.5-Max | Generated nearly the correct number of unique sentences, but never met the exact number requested in the prompt. | Struggled to output a CSV file directly, but copying to Excel worked well. | Nearly met the task requirements but consistently underproduced. |
Gemini 2.0 Flash | Initially failed to generate the correct number of sentences, stopping after about 7,000 characters of output. | Struggled to output a CSV file directly, but copying to Excel worked well. | Required multiple iterations but eventually met the task requirements. |
Claude 3.7 Sonnet | Efficiently produced varied sentences without issues, adjusting the feedback appropriately to each customer's age. | Generated a downloadable .txt file that could easily be imported into Excel or saved as CSV. | Met all task requirements with varied, age-appropriate feedback. |
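Several rows in the table report non-unique sentences or repeated phrases. One way to put a number on that, sketched here in Python under our own assumptions rather than as the evaluation the authors ran, is to count word n-grams (runs of `n` consecutive words) that occur in more than one entry. The function names, the default `n = 4`, and the ASCII-only tokenizer are illustrative choices.

```python
import re
from collections import Counter


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Distinct lower-cased word n-grams in one entry (ASCII letters/digits only)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def phrase_counts(entries: list[str], n: int = 4) -> Counter:
    """Count, for each n-gram, how many entries contain it (at most once per entry)."""
    counts: Counter = Counter()
    for entry in entries:
        counts.update(word_ngrams(entry, n))
    return counts


def repeated_phrases(entries: list[str], n: int = 4) -> dict[str, int]:
    """Phrases of n words that appear in two or more entries, with their entry counts."""
    return {" ".join(gram): c for gram, c in phrase_counts(entries, n).items() if c > 1}


def repetition_rate(entries: list[str], n: int = 4) -> float:
    """Fraction of entries sharing at least one n-word phrase with another entry."""
    if not entries:
        return 0.0
    counts = phrase_counts(entries, n)
    hits = sum(1 for e in entries if any(counts[g] > 1 for g in word_ngrams(e, n)))
    return hits / len(entries)


if __name__ == "__main__":
    sample = [
        "The battery life is great and the strap feels comfortable.",
        "Honestly the battery life is great but the app crashes.",
        "Sleep tracking was inaccurate for me every single night.",
    ]
    print(repeated_phrases(sample))
    print(round(repetition_rate(sample), 2))
```

On the three sample sentences, the shared phrases are "the battery life is" and "battery life is great", so two of the three entries are flagged. An exact-duplicate check would miss this kind of recycled phrasing because the full sentences differ; smaller values of `n` flag more common collocations, so the setting would need tuning on real model outputs.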
Output
Among all tested models, Claude 3.7 Sonnet (https://claude.ai/) stood out as the most successful at executing the task exactly as specified in the prompt. It generated 50 unique, varied feedback entries without any issues, and the tone, structure, and content of each entry adhered to the given instructions.
What made Claude’s output particularly impressive was its ability to adapt the feedback to each customer's demographic attributes. One especially striking example is the feedback for Hans Schmidt, a 61-year-old user, which reflects age-specific concerns and language with remarkable subtlety: “At my age, the fall detection feature gives both me and my family significant peace of mind during my solo hiking trips. The large, easy-to-read display is perfect for my aging eyes even without my reading glasses. Although it took some time to learn all the features, the health insights have motivated me to maintain a more active lifestyle than I've had in years.”
In addition to tailoring content by age, the model demonstrated cultural and geographic sensitivity by pairing names with plausible cities: for instance, Liu Wei in Beijing, Ivan Petrov in St. Petersburg, Khalid Al-Farsi in Dubai, Sophia Rodriguez in Mexico City, and Pierre Dubois in Paris. These subtle details contributed to a strong sense of realism throughout the dataset.
This case study highlights how much model performance in synthetic data generation can vary. It also shows that, when carefully evaluated, models like Claude 3.7 Sonnet can already deliver outputs that are not only technically accurate but also contextually meaningful and human-like.
The authors used Claude 3.7 Sonnet [Anthropic (2025) Claude 3.7 Sonnet (accessed on 21 March 2025), Large language model (LLM), available at: https://www.anthropic.com] to generate the output.