How to Produce Content Optimised for LLM Citation: Evidence-Based Recommendations

A follow-up to "How People Actually Use ChatGPT." Practical advice for creators and publishers who want to make their work more likely to be retrieved, summarised, or cited by ChatGPT and similar LLMs.


If you've read what the data says about how people use ChatGPT, you know that nearly 80% of usage falls into three buckets: Practical Guidance, Seeking Information, and Writing. Users are either asking for facts, asking for advice they can adapt, or asking for help producing or refining text. When ChatGPT (or similar systems) answer those queries, they draw on training data and, when available, web search and external retrieval. Content that fits how people actually ask—and that is easy for models to parse and reuse—is more likely to show up in answers and get cited.

This post turns those findings into concrete recommendations for producing content that is optimised for LLM retrieval and citation.


1. Prioritise the two use cases where "sources" matter most

The NBER paper separates Seeking Information (factual lookups: "same for all users") from Practical Guidance (customised advice: tutoring, how-to, ideation). Both drive a large share of queries; both are where your content can be used as a source.

  • Seeking Information (~14–24% of messages and growing)
    People ask for facts about people, events, products, recipes, definitions, and current affairs. The paper describes this as "a very close substitute for web search." When ChatGPT has search or retrieval, it's this kind of query that pulls in and cites external pages.
    Implication: Invest in reference-style content: clear, factual, canonical answers to questions that many people ask the same way.

  • Practical Guidance (~29% of messages)
    How-to advice, tutoring, creative ideation, health/fitness/beauty. The model often synthesises multiple sources into custom advice. Content that states principles, steps, or frameworks in a reusable way is more likely to be reflected in those answers (and, where systems support it, cited).
    Implication: Produce structured guidance: how-tos, tutorials, checklists, and decision frameworks that can be summarised or adapted.

Recommendation: Focus your "citation optimisation" efforts on (a) factual, search-style content and (b) practical guidance (how-to, teaching, frameworks). These align with the two biggest non-writing use cases in the data.


2. Write for the way people ask: direct questions and clear intents

Roughly 49% of messages are Asking—users want information or advice to inform a decision. Asking is growing faster than Doing and gets higher satisfaction. So a lot of value comes from answering questions and supporting decisions.

  • Front-load answers. Put the direct answer or definition near the top (e.g. in the first paragraph or under a clear subheading). Models (and users) tend to latch onto the first coherent, on-topic block.
  • Use question-shaped headings. Headings that mirror real queries ("What is X?", "How do I…?", "When did…?") match how people phrase prompts and improve the chance your section is retrieved for that intent.
  • Provide decision support. Comparisons, criteria, pros/cons, and "when to use X vs Y" are exactly the kind of content that supports Asking. Structure them so they can be extracted as lists or short paragraphs.

Recommendation: Structure each piece around one or more explicit questions, with concise, quotable answers early in the section. This matches Asking-dominated usage and makes your content easier to cite.
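
To make the pattern concrete, here is a minimal, illustrative Python sketch (a toy, not any real retrieval pipeline). It splits a page into heading-anchored chunks and takes the first paragraph under each question-shaped heading; the structure recommended above is exactly what lets such a naive extractor recover a complete, quotable answer.

```python
import re

# A toy page in the recommended shape: question-shaped headings
# with the direct answer front-loaded in the first paragraph.
PAGE = """\
## What is retrieval-augmented generation?
Retrieval-augmented generation (RAG) is a technique in which a
language model fetches relevant documents at query time and uses
them as context for its answer.

It is now common in search-backed assistants.

## How do I make content easy to retrieve?
Use one clear question per section and state the answer in the
first paragraph under the heading.
"""

def first_answers(page: str) -> dict[str, str]:
    """Map each question-shaped heading to the first paragraph below it."""
    answers = {}
    for chunk in re.split(r"(?m)^## ", page)[1:]:
        heading, _, body = chunk.partition("\n")
        if heading.rstrip().endswith("?"):  # question-shaped heading
            first_para = body.strip().split("\n\n")[0].replace("\n", " ")
            answers[heading.strip()] = first_para
    return answers

for question, answer in first_answers(PAGE).items():
    print(f"Q: {question}\nA: {answer}\n")
```

If the direct answer were buried three paragraphs down, the same extractor (and, plausibly, a chunk-by-heading retrieval pipeline) would surface preamble instead of the answer.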


3. Make factual content "one right answer" friendly

The paper defines Seeking Information as "factual information that should be the same for all users" (e.g. Boston Marathon qualifying times by age and gender). Content that states facts unambiguously and in one place is easier for retrieval systems to surface and for the model to attribute.

  • One canonical formulation per concept or fact. Avoid scattering the same fact across many pages or wording it differently each time. Prefer a single "source of truth" page or section per topic.
  • Use lists, tables, and definitions. Structured data (dates, numbers, criteria, steps) is easier for models to extract and quote. Tables and numbered lists also improve scannability for both humans and systems.
  • Name entities and events clearly. Proper nouns, standardised terms, and consistent phrasing ("X is…", "Y refers to…") help retrieval match user queries and help the model point back to you.

Recommendation: For reference and factual content, aim for one clear, structured answer per question, with consistent terminology and machine-friendly formatting (headings, lists, tables).
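
Where your platform allows it, structured data markup is one optional way to make "one answer per question" explicit to machines. The sketch below is a minimal example, not a guarantee that any given LLM pipeline reads it: it emits a schema.org FAQPage block as JSON-LD, markup that also serves traditional search.

```python
import json

def faq_jsonld(qa_pairs: list[tuple[str, str]]) -> str:
    """Render question/answer pairs as a schema.org FAQPage JSON-LD block."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in qa_pairs
        ],
    }
    # Wrapped in the script tag a page template would embed in the page head.
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")

print(faq_jsonld([
    ("What makes factual content easy to cite?",
     "One clear, canonical answer per question, stated with consistent "
     "terminology and machine-friendly formatting."),
]))
```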


4. Support "Writing" use cases with reusable patterns and examples

Writing is the single largest work use (~40% of work-related messages). About two-thirds of Writing messages ask the model to modify user text (edit, critique, translate, summarise) rather than create it from scratch. So people mostly use ChatGPT to improve or adapt something they already have.

  • Provide templates, examples, and principles. Content that describes formats, tone, structure, or "how to write X" gives the model patterns to suggest. When the model helps with emails, reports, or summaries, it's drawing on such patterns—often from training or retrieval.
  • Explain conventions and criteria. "What makes a good X", "how to structure a Y", "common mistakes in Z" are decision-support content that feeds into both Asking and Doing (e.g. "improve this email").

Recommendation: If your audience is professionals or students, create writing guidance: templates, before/after examples, and clear criteria. That aligns with the dominant work use and increases the chance your conventions and examples shape (and, where possible, get cited in) writing-related answers.


5. Align topics with high-volume and high-satisfaction use

The data shows where usage and satisfaction are concentrated:

  • Practical Guidance and Seeking Information together account for the majority of non-writing usage; Asking is growing and rated higher quality.
  • Tutoring/teaching is a major slice of Practical Guidance (~10% of all messages).
  • Work use is heavily about obtaining/documenting/interpreting information and making decisions, giving advice, solving problems, and thinking creatively.

So content that serves learning, decision-making, and professional tasks (especially information and writing) is well aligned with how people use ChatGPT.

Recommendation: Prioritise topics that support education (explanations, tutorials, concept breakdowns), decision support (comparisons, criteria, frameworks), and professional writing/information (how-tos, reference, templates). These match the highest-volume and highest-satisfaction intents.


6. Optimise for retrieval as well as "reading"

When ChatGPT uses web search or retrieval-augmented generation (RAG), it's matching user queries to documents or passages. Content that is discoverable and parseable is more likely to be retrieved and then cited.

  • Semantic clarity. Use full sentences and clear paragraphs; avoid jargon without explanation. Models (and search) do better when the meaning of a passage is self-contained.
  • Stable, descriptive structure. Consistent heading hierarchy (H1 → H2 → H3) and predictable sections (e.g. "Overview", "Steps", "Examples") help retrieval systems and models identify "this block answers that question."
  • Technical basics. Fast loading, crawlable URLs, and valid markup improve the chance your content is indexed and available to retrieval. These are the same basics that help traditional search.

Recommendation: Treat "LLM citation" as an extension of search and structure: write for clarity and intent, use consistent structure, and keep technical SEO and crawlability in order so your content can be retrieved when users ask.
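
As a toy illustration of the matching step (real systems use embeddings or ranking functions such as BM25, not raw word overlap), the sketch below scores two passages against a query. The self-contained passage, which names its entities instead of leaning on pronouns, is the one a retriever can match.

```python
import re

STOPWORDS = {"the", "a", "an", "is", "it", "of", "for", "in", "and", "to", "on"}

def tokens(text: str) -> set[str]:
    """Lowercased word set with common stopwords removed."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

# Two passages making the same point; only one is self-contained.
chunks = {
    "self-contained": (
        "The Boston Marathon qualifying time depends on age and gender; "
        "the Boston Athletic Association publishes the standards each year."
    ),
    "context-dependent": (
        "It depends on how old you are, and they publish the numbers "
        "every year on their site."
    ),
}

query = "What is the Boston Marathon qualifying time?"
q = tokens(query)

for name, passage in chunks.items():
    print(f"{name}: shared query terms = {len(q & tokens(passage))}")
# Prints 4 shared terms for the self-contained passage and 0 for the
# vague one: pronouns give a lexical retriever nothing to match.
```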


7. Be the canonical source for a narrow slice

The paper shows that Getting Information, Documenting/Recording Information, and Making Decisions and Solving Problems are among the top activities across many occupations. So there's broad demand for authoritative, well-structured information in many domains.

  • Own a clear niche. One definitive guide, one well-maintained reference, or one clearly explained framework is more useful than many shallow pages. Models and retrieval prefer a single strong match over many weak ones.
  • Update and maintain. Stale or contradictory content is harder to trust and cite. Regular updates and a clear "last updated" or version signal help both users and systems.

Recommendation: Aim to be the (or a) go-to source for a specific set of questions or tasks. Depth and accuracy in a narrow area beat breadth and vagueness for retrieval and citation.


Summary: a short checklist

  • Use case: Focus on Seeking Information (factual reference) and Practical Guidance (how-to, teaching, frameworks).
  • Intent: Write for Asking, with direct answers, question-shaped headings, and decision support (comparisons, criteria, pros/cons).
  • Factual content: One clear answer per question; lists, tables, definitions; consistent terminology.
  • Writing support: Provide templates, examples, and criteria for professional and educational writing.
  • Topics: Emphasise education, decision support, and professional information/writing.
  • Structure: Front-load answers; clear headings and hierarchy; semantic clarity; crawlable, indexable pages.
  • Authority: Be the canonical source for a defined slice; keep content updated and consistent.

The goal isn't to "game" algorithms—it's to produce content that matches how hundreds of millions of people already use ChatGPT: asking for facts, asking for advice, and asking for help with writing and decisions. Content that answers those intents clearly, in a form that retrieval systems and LLMs can find and reuse, is content that is more likely to be cited when it matters.

Based on: Chatterji, A., Cunningham, T., Deming, D. J., Hitzig, Z., Ong, C., Shan, C. Y., & Wadman, K. (2025). "How People Use ChatGPT." NBER Working Paper 34255.