Research summary: How citations impact user trust in LLM chat responses
A new paper shows that user trust in LLM output is meaningfully influenced by citations, and that while the quality of those citations matters, the quantity doesn't
The massive shift we're undergoing in how people seek and obtain information is profound, maybe as profound as the advent of the internet itself. Large language models' ability to simply answer a question rather than deliver dozens or hundreds of web results is an obvious boon to users in terms of time and effort. Why sift through pages of links on a search engine results page, hoping for a clear answer to your question, when you can simply ask an LLM and get a concise one?
But, of course, doing that is a different transaction altogether, and one that comes with tradeoffs. Poring over several links from search results can be laborious, but it comes with all sorts of attendant signals about how much you can believe what you're reading (the rank on the results page itself, the publisher, the author, the sources cited on the page, etc.). Just how and to what extent these signals are used and made sense of varies greatly from person to person, of course, but the signals are there nonetheless. When you ask a question of an LLM, you often get a clear answer but no attendant signals about the quality of that answer.
A new paper explores one tactic LLMs use to address that gap: citations. And I love the study design used here: given the same prompt, the researchers returned the same answer to participants with either no citations, one citation, or five citations, assigned at random. Participants who received citations got either valid or random ones. All participants were then asked how much they trusted the answer.
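For concreteness, here is a minimal sketch of what that kind of between-subjects assignment might look like. The condition names and structure are my own reconstruction for illustration, not the paper's actual code:

```python
import random

# Hypothetical reconstruction of the study's conditions: everyone sees the
# same answer, with the citation count and validity randomly assigned.
CONDITIONS = [
    {"n_citations": 0, "validity": None},      # no citations
    {"n_citations": 1, "validity": "valid"},
    {"n_citations": 1, "validity": "random"},
    {"n_citations": 5, "validity": "valid"},
    {"n_citations": 5, "validity": "random"},
]

def assign_condition(participant_id: int) -> dict:
    """Randomly assign a participant to one citation condition.

    Seeding on the participant ID keeps the assignment stable if the
    same participant is looked up more than once.
    """
    rng = random.Random(participant_id)
    return rng.choice(CONDITIONS)
```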
There were three big takeaways in the results:
Trust in responses increased when citations were present
When the citations were random, however, trust in responses was lower
Trust was not meaningfully different whether a response included one citation or five
So, put simply, citations improve trust in responses, but only when they're valid, and more is not necessarily better.
The implications here for product builders are interesting and largely rest on the idea of "validity." The sources shown for the random citations in the study are clearly not relevant to the question at hand: it's easy for someone to say they don't trust the NBA as a source of information about space. The validity of source data for less general-knowledge queries is presumably much harder for an end user to parse, and much easier for a model to get wrong.
If you're building an enterprise AI layer that seeks to deliver accurate information to an employee about the state of a specific project or deal, for example, how do you identify the best sources of truth and ensure that those are what 1) get referenced by the model in its answer and 2) get presented as reliable, if not canonical, answers to the query? Doing both exceedingly well is a prerequisite for using citations as a meaningful signal of trust, and citations should probably be reserved for responses where that level of confidence is very high; otherwise, they'll have the opposite of their intended effect.
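Here is a minimal sketch of what that confidence gating might look like, under assumptions of my own: a retriever that exposes a per-source relevance score, and a hypothetical threshold. Neither comes from the paper.

```python
from dataclasses import dataclass

@dataclass
class RetrievedSource:
    title: str
    url: str
    relevance: float  # hypothetical retriever confidence score in [0, 1]

# Hypothetical bar: only attach a citation when we are highly confident
# the source actually grounds the answer.
CITATION_CONFIDENCE_THRESHOLD = 0.9

def citations_to_display(sources: list[RetrievedSource]) -> list[RetrievedSource]:
    """Pick at most one high-confidence citation to show.

    Two of the study's findings motivate this gating: random (irrelevant)
    citations lowered trust, so a shaky citation is worse than none; and
    one citation built as much trust as five, so a single strong source
    is enough.
    """
    if not sources:
        return []
    best = max(sources, key=lambda s: s.relevance)
    return [best] if best.relevance >= CITATION_CONFIDENCE_THRESHOLD else []
```

A caller would pass the retriever's candidate sources and render whatever comes back; an empty list simply means the response ships without citations rather than with unconvincing ones.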