## Experiment Design

### The 6-Experiment Matrix
Each document was tested with a baseline strategy and a more structured one, isolating what changes as prompting complexity increases.
| Exp | Document | Strategy | Temp | Role | Result |
|---|---|---|---|---|---|
| E1 | NASA Blog | Direct | 0.4 | None | Surface-level positive |
| E2 | NASA Blog | Role-Based | 0.1 | Tech Analyst | Scientific, precise. **BEST** |
| E3 | BBC Article | Direct | 0.2 | None | Over-confident positive |
| E4 | BBC Article | Combined CoT + Role | 0.4 | Tech Analyst | Framing detected. **BEST** |
| E5 | Social Media Post | Chain-of-Thought | 0.4 | None | Sarcasm caught. **BEST** |
| E6 | Social Media Post | Role-Based | 0.7 | Creative Asst. | Expressive, hallucination risk |
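The matrix above can be sketched as data plus a small prompt builder. A minimal sketch: the document labels, strategy names, and the `build_prompt` helper are illustrative assumptions, not the actual identifiers or prompts used in the experiments.

```python
# Illustrative encoding of the 6-experiment matrix; names are assumptions.
EXPERIMENTS = [
    {"id": "E1", "document": "nasa_blog",   "strategy": "direct",   "temp": 0.4, "role": None},
    {"id": "E2", "document": "nasa_blog",   "strategy": "role",     "temp": 0.1, "role": "Tech Analyst"},
    {"id": "E3", "document": "bbc_article", "strategy": "direct",   "temp": 0.2, "role": None},
    {"id": "E4", "document": "bbc_article", "strategy": "cot_role", "temp": 0.4, "role": "Tech Analyst"},
    {"id": "E5", "document": "social_post", "strategy": "cot",      "temp": 0.4, "role": None},
    {"id": "E6", "document": "social_post", "strategy": "role",     "temp": 0.7, "role": "Creative Assistant"},
]

def build_prompt(strategy, role, text):
    """Assemble a sentiment prompt for the given strategy (hypothetical wording)."""
    parts = []
    if role:
        # Role-Based and Combined strategies prepend a persona.
        parts.append(f"You are a {role}.")
    if strategy.startswith("cot"):
        # Chain-of-Thought variants ask for explicit reasoning first.
        parts.append("Reason step by step before giving a final label.")
    parts.append(f"Classify the sentiment of the following text:\n{text}")
    return "\n".join(parts)
```

Encoding the matrix as data makes it easy to iterate over all six runs with whatever model client is in use.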
## Performance Comparison

### Strategy Effectiveness by Content Type

*Relative effectiveness is conceptual, based on output quality.*
**Key insight:** Role-Based excels on technical texts; Chain-of-Thought wins on social media; Combined (CoT + Role) is the best all-rounder for complex, mixed-sentiment content.
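One way to phrase the winning Combined (CoT + Role) prompt is sketched below. The wording is an assumption for illustration, not the exact prompt used in E4.

```python
# Hypothetical Combined (CoT + Role) prompt template.
COMBINED_TEMPLATE = (
    "You are a Tech Analyst reviewing press coverage.\n"
    "Think step by step: (1) list the factual claims, (2) note any loaded "
    "or framing language, (3) weigh both before deciding.\n"
    "Then give one sentiment label (positive / negative / neutral / mixed) "
    "with a one-sentence justification.\n\n"
    "Text:\n{text}"
)

# Fill the template with the text under analysis.
prompt = COMBINED_TEMPLATE.format(
    text="The agency hailed the launch; critics called it wasteful."
)
```

Putting the role first and the reasoning steps before the label request mirrors what made the combined strategy detect framing in E4.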
## Variable Analysis

### Temperature Effect

*Temperature vs. output characteristics*
## Failure Analysis

### Failure Modes Identified
**High Risk: High-Temp Hallucination**
At temperature 0.7, the Creative Assistant invented details not present in the source text.
↳ Lower the temperature for factual tasks.
**Medium Risk: Over-Confident Labels**
Direct prompting labelled ambiguous journalistic text as confidently "positive".
↳ Use CoT + Role for nuanced texts.
**Low Risk: Strategy-Content Mismatch**
Direct prompting on social media missed sarcasm entirely, producing a wrong but confident label.
↳ Match the strategy to the content type.
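The matching rule follows directly from the results table. A minimal sketch, with content-type labels and strategy names assumed for illustration:

```python
def pick_strategy(content_type):
    """Route each content type to the strategy that performed best above."""
    routes = {
        "technical": "role",       # Role-Based excelled on technical texts (E2)
        "social_media": "cot",     # Chain-of-Thought caught sarcasm (E5)
        "news": "cot_role",        # Combined detected framing (E4)
    }
    # Combined (CoT + Role) is the safest default for unknown or mixed content.
    return routes.get(content_type, "cot_role")
```

A router like this turns the "match strategy to content type" recommendation into a one-line lookup at inference time.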