I’m Gabriel Recchia, a cognitive scientist interested in methods for aligning modern language models. I’m particularly interested in cases where current approaches to eliciting human preferences might not yield a training signal that results in desired outcomes. As machine learning models become increasingly capable and are presented with more complex and hard-to-evaluate tasks, it will become increasingly difficult to determine if the results are trustworthy. My research currently focuses on extending our ability to successfully evaluate whether models are behaving in ways we prefer, even in cases where doing so is difficult or costly. As such, I’m interested in a range of topics related to scalable oversight and evaluating the faithfulness/trustworthiness of explanations generated by language models, such as metrics and benchmarks for such evaluation, process-oriented learning, debate, sandwiching, evaluating risks and benefits of approaches driven by AI feedback, externalized reasoning oversight, and some approaches to explainable AI.
Previously, I was at the University of Cambridge’s Winton Centre for Risk and Evidence Communication, where I worked on how to communicate information in ways that support comprehension and informed decision-making. I also led on user testing research and evaluation of patient-friendly genetic reports and the NHS: Predict family of prognostic tools. I have spent much of my career involved in research investigating the capabilities, properties, and applications of distributional models trained on large volumes of text, and I continued this work while at the Winton Centre, exploring the models’ applications in characterizing how risk is communicated and perceived. See my Google Scholar profile for a list of my most cited papers.
Before this, I was affiliated with the Centre for Research in the Arts, Social Sciences and Humanities, where I worked with distributional approaches to the analysis of large corpora of historical texts, and with the University of Memphis Institute for Intelligent Systems, where I investigated what geographical information was latent in simple co-occurrence-based models.
I received my bachelor’s degree in Symbolic Systems from Stanford University in 2007 and my doctorate in Cognitive Science from Indiana University, with a minor in computational linguistics and with language modelling as my content specialization.