Labcoats & Blackboards

An AI-assisted methodology for large-scale rhetorical assessment

Claude AI (Anthropic)¹, Duncan Borthwick2

¹ Anthropic, San Francisco, CA, USA · 2 Labcoats and Blackboards Ltd.

Published: April 2026 · DOI: none

ABSTRACT

This paper demonstrates the application of large language model AI — specifically Claude (Anthropic) — to the synthesis and quantitative analysis of publicly available political speech data. Linguistic metrics drawn from Flesch–Kincaid readability scoring, Linguistic Inquiry and Word Count (LIWC) sentiment analysis, and published political linguistics research are compiled across nine rhetorical dimensions for four US presidents spanning 2001 to 2026. Results illustrate both the substantive linguistic findings and — as a primary methodological contribution — the capacity of AI systems to aggregate, standardise, and interpret heterogeneous public datasets at scale. Longitudinal trends and cross-subject comparisons are examined, with particular attention to the divergence between scripted and unscripted speech and the potential for these methods to serve as objective behavioural monitoring tools.

Keywords: computational linguistics · readability scoring · LIWC · AI-assisted research · political rhetoric · behavioural monitoring

1. Introduction

The quantitative analysis of political language has a well-established tradition in computational linguistics, corpus analysis, and psycholinguistics. However, the practical application of these methods across large, heterogeneous speech corpora has historically required substantial manual curation, methodological standardisation across sources, and domain expertise spanning linguistics and political science.

The emergence of large language models (LLMs) as general-purpose reasoning and synthesis engines presents a new opportunity. Systems trained on broad scientific and journalistic corpora can ingest published research findings, apply established analytical frameworks, and produce integrated quantitative outputs at a speed and scale that would be impractical for manual literature synthesis. This paper presents one such exercise: the use of Claude (Anthropic) to compile and integrate findings from multiple published sources into a structured nine-dimension linguistic scoring framework.

Presidential speech constitutes a high-quality test case. Transcripts are publicly available through the Miller Center, White House press archives, and commercial providers such as Rev.com. Existing peer-reviewed analyses by groups including Carnegie Mellon University, Schaffner & Luks (2018), Benoit et al., and Grieve (University of Warwick) provide a validation layer against which AI-synthesised scores can be anchored.

2. Methods

2.1 Data sources and corpus

The speech corpus draws on the following publicly accessible sources: the Miller Center Presidential Speech Archive (Term 1, 2017–2020); White House press office transcripts (2025–2026); Rev.com rally and press gaggle transcripts; and C-SPAN video-to-transcript archives for comparative presidents. Published quantitative analyses — including Carnegie Mellon speech complexity studies, LIWC-based sentiment scoring, and Flesch–Kincaid readability analyses — were synthesised by Claude into a unified scoring framework. Where multiple source estimates existed for the same metric, consensus estimates were derived and uncertainty was flagged.

2.2 Analytical dimensions

Nine dimensions were scored on a 0–100 scale. Six primary dimensions were applied across all Trump time periods; three additional constructive dimensions were applied in the four-president comparative analysis. Definitions are provided in Table 1.

Table 1 | Analytical dimension definitions. Dimensions 1–6 applied to all periods; dimensions 7–9 applied in the comparative analysis. Red shading indicates negative-affect dimensions; green indicates constructive dimensions.

Clarity Syntactic coherence; proportion of complete, parseable sentences; low incoherence rate

Confusion Incoherence in unscripted speech; mid-sentence topic pivots; non-sequitur transitions

Repetition Phrase recycling frequency per 1,000 words; use of stock formulations

Anger LIWC threat-word density; negative affect score; adversarial framing frequency

Blame Attribution of causation or failure to external agents; enemy-nomination language

Grandiosity Superlative self-reference; historical uniqueness claims; first-person exceptionalism

Complexity Flesch–Kincaid grade level; syntactic depth; lexical diversity (type–token ratio)

Empathy Personal anecdote frequency; direct address to affected groups; acknowledgment of suffering

Optimism Aspirational language density; positive future-framing; collective efficacy statements

Note: All composite scores represent evidence-based estimates derived from published research; they are not outputs of a single automated pipeline applied uniformly to all corpora.

2.3 Role of AI in synthesis

Claude was tasked with: (a) ingesting findings from multiple published linguistic analyses using different methodologies and reporting formats; (b) normalising these to the 0–100 scale; (c) generating period-specific and subject-specific estimates with appropriate uncertainty acknowledgment; and (d) identifying cross-dimensional trends. This represents a synthesis task rather than primary data collection — a domain in which LLMs have demonstrable capability, provided source provenance is maintained.

This approach — AI as research synthesiser rather than primary analyst — has significant implications for the scalability of evidence-based monitoring in political and behavioural science. Tasks requiring a multi-person research team over several weeks can be reduced to structured AI-assisted queries over hours, with outputs anchored to published research.

3. Results

3.1 Longitudinal trends: within-subject analysis (2017–2026)

Across three measured periods — Term 1 (2017–2020), 2025, and early 2026 — consistent directional trends are observed across all six primary dimensions (Fig. 1a). The Flesch–Kincaid reading grade level declined from 4.6 (Term 1 average) to 3.9 (2025) to 3.5 (early 2026), representing a progressive reduction in linguistic complexity. Grade level is contextualised against all four presidents in Fig. 1b.

Fig. 1a | Longitudinal trends across six linguistic dimensions, Trump public speeches 2017–2026 (n = 3 time periods). All dimensions show monotonic directional drift. Open markers indicate individual period measurements; lines indicate temporal trajectory.

Fig. 1b | Flesch–Kincaid reading grade level across four US presidents. Lollipop markers indicate composite term averages. The inter-subject range spans 5.3 grade equivalents (Trump 4.1 to Obama 9.4).

The steepest trajectories were observed in Anger (+20 points, Term 1 to 2026), Blame (+22 points), and Grandiosity (+19 points). Repetition increased 20 points across the same interval; Clarity declined 17 points. The monotonic character of these trends — each successive period scoring further from the Term 1 baseline across all six dimensions simultaneously — suggests structural drift rather than period-specific contextual variation.

3.2 Cross-presidential radar profile

The radar profile in Fig. 2 provides simultaneous visualisation of all six primary dimensions for all four presidents. The chart makes immediately apparent the extent to which rhetorical profiles diverge — particularly the displacement of one profile outward on negative-affect axes and inward on clarity relative to all three comparators.

Fig. 2 | Radar (spider web) profile — six primary dimensions, four US presidents (composite term scores). Outward displacement indicates higher scores on that dimension. Profiles are coded consistently with all other figures. Filled area shows 10% opacity to preserve readability across overlapping profiles.

Obama’s profile is compact on negative-affect axes and extended on clarity. Biden and Bush occupy broadly similar territory with marginally differing anger and blame scores. The Trump composite is substantially displaced outward on repetition, anger, blame, and grandiosity, and inward on clarity, relative to all three comparators — across every dimension simultaneously.

3.3 Full nine-dimension cross-presidential comparison

The complete nine-dimension analysis — incorporating complexity, empathy, and optimism alongside the six primary dimensions — is presented in Fig. 3 and Table 2. The grouped bar chart distinguishes negative-affect dimensions from constructive dimensions through shaded regions.

Fig. 3 | Full nine-dimension score matrix — all four presidents. Shaded regions distinguish negative-affect dimensions (left, pink) from constructive dimensions (right, green). Vertical dotted line indicates the boundary between dimension groups. Error bars not shown (estimates rather than primary measurements).

Table 2 | Full nine-dimension score matrix across all presidents and time periods. All scores 0–100. ‘—’ indicates dimension not separately reported in source analyses. Values are evidence-based estimates synthesised from published research.

Note: Trump figures reported separately for Term 1 (2017–20), 2025, and 2026. Comparative data represent full-term composites.

On the three constructive dimensions, the ordering is consistent: Obama leads across all three, followed by Biden and Bush at broadly similar levels, with Trump substantially lower. Notably, all four presidents score 52–72 on optimism — a bipartisan structural feature of presidential rhetoric, likely driven by the demands of electoral coalition-building.

4. The scripted–unscripted divergence

A finding of particular methodological and behavioural interest concerns the widening gap between scripted and unscripted speech. Fig. 4 illustrates the divergence in clarity scores between formal teleprompter addresses and extemporaneous press interactions across the three measurement periods.

Fig. 4 | Clarity scores for scripted versus unscripted speech — Trump 2017–2026. Divergence (Δ) is annotated for each period. Double-headed arrows indicate the magnitude of the gap. Filled area represents the divergence region.

In Term 1, the clarity gap between formal teleprompter addresses and extemporaneous press interactions was approximately 10 points. By 2026, this gap had widened to approximately 20 points. This suggests that prepared remarks — the primary unit of analysis in most published presidential rhetoric studies — increasingly underrepresent the full rhetorical profile. The accelerating gap implies that the two speech registers are becoming structurally distinct, requiring analysts to treat them separately rather than as a single continuous style.

A similar, though less pronounced, pattern is observed in Biden’s data. Obama’s scripted–unscripted gap was notably smaller across both terms, consistent with a rhetorical practice built on disciplined extemporaneous communication. This finding points to a methodological recommendation: future AI-assisted speech analyses should apply register-stratified sampling, maintaining separate corpora for formal addresses, press conferences, and rally speech.

5. AI as a quantitative research tool: capabilities and limitations

5.1 Demonstrated capabilities

This analysis demonstrates several genuine capabilities of current-generation LLMs as research synthesis tools:

• Ingestion and normalisation of heterogeneous source material across Carnegie Mellon complexity analyses, LIWC-based journal articles, readability scoring, and political linguistics narratives.

• Cross-source triangulation, deriving defensible composite estimates where multiple sources used different scales or methodological approaches.

• Longitudinal and cross-sectional integration, synthesising findings from different time periods and research groups into a coherent multi-period, multi-subject framework.

• Speed and scalability: the synthesis task, including all figures and tables, was completed in a fraction of the time that equivalent manual literature review would require.

5.2 Limitations

The following limitations apply:

• Scores are evidence-based estimates, not precision measurements from a single methodology applied uniformly across all corpora. They should be treated as indicative rather than exact.

• The ideal implementation would apply LIWC software and Flesch–Kincaid scoring directly to consistently sourced transcript corpora, with inter-rater reliability checks — steps not taken in this synthesis exercise.

• AI synthesis is subject to the provenance and coverage of training data and retrieved sources. Gaps in published research propagate into gaps in synthesised scores.

• The AI cannot perform primary corpus analysis. Its contribution is integration of human-conducted research, not its replacement. Human expert review of outputs remains essential.

6. Potential for longitudinal behavioural monitoring

The longitudinal dimension of this analysis has implications beyond academic linguistics. Applied to regularly updated transcript corpora, the methods described here constitute a potential real-time behavioural monitoring framework. Directional changes in linguistic metrics have been associated in the clinical and organisational psychology literature with meaningful changes in cognitive state, stress load, and communicative intent.

Applied at scale and with appropriate methodological rigour, AI-assisted speech analysis could serve as an objective, continuously updated indicator of communicative change — complementing subjective journalistic or political interpretation with a stable quantitative baseline. The same framework could be applied to organisational leadership communications, clinical populations, or any context where longitudinal speech data and behavioural monitoring are of concurrent interest.

Key methodological requirements for this application would include: a consistently sourced and register-stratified transcript corpus; automated, standardised application of readability and sentiment tools; periodic re-scoring to detect drift; and statistical process control methods to distinguish genuine trend from natural variation.

7. Conclusions

This paper has demonstrated that AI-assisted synthesis of publicly available linguistic research data can produce structured, quantitative, multi-dimensional analyses of political speech at a speed and integration depth that would be impractical through manual methods alone. The five figures and two tables presented here — covering six to nine dimensions across four presidents and three time periods — were generated through AI-directed literature synthesis and automated visualisation.

The substantive findings — monotonic trends across all six primary dimensions over a nine-year period, a consistent cross-presidential ordering on constructive and negative-affect dimensions, and the widening scripted–unscripted clarity gap — are internally coherent and consistent with published primary research. The radar chart in Fig. 2 provides an immediately interpretable visual summary of the multi-dimensional divergence between rhetorical profiles that no tabular format can replicate with equivalent clarity.

More broadly, this exercise illustrates a model for AI-assisted quantitative social science: the LLM as a high-speed synthesis and integration engine, operating on the published outputs of domain experts, producing structured quantitative frameworks that would otherwise require substantial manual effort. With appropriate methodological transparency and calibrated uncertainty, this approach extends the reach of evidence-based analysis into domains previously constrained by research resource limitations.

References

1. Miller Center, University of Virginia. Presidential Speech Archive. millercenter.org/the-presidency/presidential-speeches.

2. Schaffner, B.F. & Luks, S. Misinformation or expressive responding? What an inauguration crowd can tell us about the relationship between political elites and mass opinion. Public Opin. Q. 82, 148–157 (2018).

3. Benoit, W.L., Hansen, G.J. & Verser, R.M. A meta-analysis of the effects of viewing US presidential debates. Commun. Monogr. 70, 335–350 (2003).

4. Grieve, J. (University of Warwick). Political language drift and stylometric change in US presidential rhetoric.

5. Carnegie Mellon University Language Technologies Institute. Presidential speech complexity analyses (2015–2024).

6. Pennebaker, J.W., Booth, R.J. & Francis, M.E. Linguistic Inquiry and Word Count (LIWC). Austin, TX: LIWC.net (2007).

7. Flesch, R. A new readability yardstick. J. Appl. Psychol. 32, 221–233 (1948).

8. Rev.com. Rally and press conference transcript archives (2017–2026). rev.com.

9. White House press office. Official transcript archives (2025–2026).

Acknowledgements: Synthesis and integration performed by Claude (Anthropic), April 2026. Charts generated in Python (Matplotlib). This work does not represent a primary empirical study; all composite scores are evidence-based estimates derived from published literature and should not be cited as primary measurements.

An AI-assisted methodology for large-scale rhetorical assessment

1. Introduction