{"id":2,"date":"2012-05-28T19:14:51","date_gmt":"2012-05-28T19:14:51","guid":{"rendered":"http:\/\/www.twonewthings.com\/gabrielrecchia\/?page_id=2"},"modified":"2025-04-01T04:58:01","modified_gmt":"2025-04-01T04:58:01","slug":"sample-page","status":"publish","type":"page","link":"http:\/\/www.twonewthings.com\/gabrielrecchia\/","title":{"rendered":""},"content":{"rendered":"<div class=\"header\"><a href=\"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2019\/07\/Gabriel_Recchia.jpg\"><img loading=\"lazy\" decoding=\"async\" class=\"alignleft size-full wp-image-580\" style=\"margin: 40px;\" src=\"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2019\/07\/Gabriel_Recchia.jpg\" alt=\"The face of Gabriel Recchia.\" width=\"256\" height=\"384\" srcset=\"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2019\/07\/Gabriel_Recchia.jpg 256w, http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-content\/uploads\/2019\/07\/Gabriel_Recchia-200x300.jpg 200w\" sizes=\"auto, (max-width: 256px) 100vw, 256px\" \/><\/a>\n<div class=\"details\">\n<h1>Gabriel Recchia<\/h1>\n<h2>Director, Modulo Research<\/h2>\n<\/div>\n<\/div>\n<p>I&#8217;m a cognitive scientist working on the evaluation and alignment of large language models as the director of Modulo Research. We recently released a <a href=\"https:\/\/github.com\/modulo-research\/findtheflaws\/\">dataset<\/a> of expert-annotated valid and invalid solutions involving long-form reasoning intended to facilitate scalable oversight research (preprint <a href=\"https:\/\/arxiv.org\/abs\/2503.22989\">here<\/a>). We&#8217;re now finalizing a dataset of textual representations of the research processes followed by high-performing participants in an experiment involving an online research task \u2014 for use in improving LLM capability elicitations \u2014 and writing up the results of our associated experiments. 
You can <a href=\"https:\/\/forms.gle\/xBSiN7buHoA7Cw4K8\">sign up<\/a> to be notified when we release future datasets.<\/p>\n<p>I&#8217;m also grateful to have had the opportunity to contribute to Anwar et al.&#8217;s monumental agenda paper, <a href=\"https:\/\/arxiv.org\/abs\/2404.09932\">Foundational Challenges in Assuring Alignment and Safety of Large Language Models<\/a>, and to contribute to some of Anthropic&#8217;s Frontier Red Team evaluation\/demo projects as part of collaborations with Hidden Variable Limited.<\/p>\n<p>See my <a href=\"https:\/\/scholar.google.com\/citations?user=XJxGdu8AAAAJ&amp;hl=en&amp;oi=ao\" target=\"_blank\" rel=\"noopener\">Google Scholar profile<\/a> for a list of my most cited works, and <a href=\"#recent-papers\">the bottom of this page<\/a> for recent updates that may not be reflected there.<\/p>\n<h3>Recognition<\/h3>\n<ul>\n<li>My sole-authored preprint &#8220;<a href=\"https:\/\/arxiv.org\/abs\/2109.02102\" target=\"_blank\" rel=\"noopener\">Teaching autoregressive language models complex tasks by demonstration<\/a>&#8221; has been <a href=\"https:\/\/scholar.google.com\/scholar?cites=5768770486195341372&amp;as_sdt=2005&amp;sciodt=0,5&amp;hl=en\">cited<\/a> by papers out of Google Brain and DeepMind and was discussed on <a href=\"https:\/\/www.youtube.com\/watch?v=yPMtSXXn4OY&amp;t=58m\">Machine Learning Street Talk<\/a><\/li>\n<li>One of four winners of the <a href=\"https:\/\/blog.aiimpacts.org\/p\/winners-of-the-essay-competition\">AI Impacts essay competition on the Automation of Wisdom and Philosophy<\/a> (out of 90 entries)<\/li>\n<li>Third Prize recipient in the <a href=\"https:\/\/irmckenzie.co.uk\/round2\">Inverse Scaling Prize competition<\/a>, which focused on identifying tasks where larger language models exhibit decreased performance<\/li>\n<li>Co-authored &#8220;<a href=\"https:\/\/www.tandfonline.com\/doi\/full\/10.1080\/13669877.2020.1758193\">Risk perceptions of COVID-19 around the world<\/a>&#8221;, 
referenced by U.S. News, The Telegraph, The Daily Mail, BBC Future and 130 other outlets<\/li>\n<\/ul>\n<h3>&#8220;Are you the same Gabriel Recchia who&#8230;?&#8221;<\/h3>\n<p>In a former life, I did things like:<\/p>\n<ul>\n<li>leading on user testing research\/evaluation of <a href=\"https:\/\/wintoncentre.maths.cam.ac.uk\/projects\/communicating-results-genetic-testing\/\">patient-friendly genetic reports<\/a> and the widely used prognostic tool <a href=\"https:\/\/breast.predict.cam\/\">Predict: Breast Cancer<\/a> at the University of Cambridge&#8217;s Winton Centre for Risk and Evidence Communication<\/li>\n<li><a href=\"https:\/\/scholar.google.com\/citations?view_op=view_citation&amp;hl=en&amp;user=XJxGdu8AAAAJ&amp;cstart=20&amp;pagesize=80&amp;sortby=pubdate&amp;citation_for_view=XJxGdu8AAAAJ:qjMakFHDy7sC\">investigating<\/a> <a href=\"https:\/\/scholar.google.com\/citations?view_op=view_citation&amp;hl=en&amp;user=XJxGdu8AAAAJ&amp;cstart=20&amp;pagesize=80&amp;sortby=pubdate&amp;citation_for_view=XJxGdu8AAAAJ:_FxGoFyzp5QC\">capabilities<\/a>, <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/abs\/pii\/S0732118X17300065\">properties<\/a>, and <a href=\"https:\/\/scholar.google.com\/citations?view_op=view_citation&amp;hl=en&amp;user=XJxGdu8AAAAJ&amp;cstart=20&amp;pagesize=80&amp;sortby=pubdate&amp;citation_for_view=XJxGdu8AAAAJ:LkGwnXOMwfcC\">applications<\/a> of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Distributional_semantics\">distributional models<\/a> trained on lots of text<\/li>\n<li>conducting various studies of <a href=\"https:\/\/www.frontiersin.org\/journals\/human-neuroscience\/articles\/10.3389\/fnhum.2012.00315\/full\">human<\/a> <a href=\"https:\/\/escholarship.org\/content\/qt8mb76610\/qt8mb76610.pdf\">semantic<\/a> memory and how risk is <a href=\"https:\/\/www.sciencedirect.com\/science\/article\/pii\/S2772628221000030\">communicated<\/a>, <a 
href=\"https:\/\/onlinelibrary.wiley.com\/doi\/full\/10.1111\/risa.14091\">perceived<\/a>, and <a href=\"https:\/\/journals.plos.org\/plosone\/article?id=10.1371\/journal.pone.0250935\">predicted<\/a><\/li>\n<li>writing an <a href=\"https:\/\/www.goodreads.com\/book\/show\/41956001-exoplanets-a-to-z\">alphabet book about exoplanets<\/a> (sadly uncalibrated to the reading level of any child young enough to still be interested in alphabet books)<\/li>\n<\/ul>\n<h3 id=\"recent-papers\">Recent papers, preprints, and work in progress<\/h3>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li><strong>Recchia, G., Mangat, C., Nyachhyon, J., Sharma, M., Canavan, C., Epstein-Gross, D., and Abdulbari, M.<\/strong> (in prep.) Automation bias: A challenge for scalable oversight. Presents results of two <a href=\"https:\/\/www.alignmentforum.org\/posts\/PZtsoaoSLpKjjbMqM\/the-case-for-aligning-narrowly-superhuman-models\">sandwiching<\/a>-like experiments intended to establish baselines for simple approaches.<\/li>\n<li><strong>Recchia, G., Mangat, C. S., Li, I., &amp; Krishnakumar, G.<\/strong> (2025). FindTheFlaws: Annotated errors for use in scalable oversight research. <a href=\"https:\/\/arxiv.org\/abs\/2503.22989\">Link<\/a><\/li>\n<li><strong>Phan, L., Gatti, A., Han, Z., Li, N., Hu, J., Zhang, H., &#8230; &amp; Verbeken, B.<\/strong> (2025). Humanity&#8217;s Last Exam. <a href=\"https:\/\/arxiv.org\/abs\/2501.14249\">Link<\/a>. <span class=\"note\">Co-author on account of contributing question(s) that were selected for the dataset.<\/span><\/li>\n<li><strong>Anwar, U., Saparov, A., Rando, J., Paleka, D., Turpin, M., Hase, P., &#8230; &amp; Krueger, D.<\/strong> (2024). Foundational challenges in assuring alignment and safety of large language models. <em>Transactions on Machine Learning Research<\/em>, 2835-8856. <a href=\"https:\/\/openreview.net\/forum?id=oVTkOs8Pka\">Link<\/a><\/li>\n<li><strong>McKenzie, I. 
R., Lyzhov, A., Pieler, M., Parrish, A., Mueller, A., Prabhu, A., &#8230; &amp; Perez, E.<\/strong> (2023). Inverse scaling: When bigger isn&#8217;t better. <em>Transactions on Machine Learning Research<\/em>. <a href=\"https:\/\/arxiv.org\/abs\/2306.09479\">Link<\/a> <span class=\"note\">Co-author on account of submitting a winning task (i.e., identifying a task on which language model performance decreases with scale).<\/span><\/li>\n<li><strong>Proto, R., Recchia, G., Dryhurst, S., Freeman, A.L.<\/strong> (2023). Do colored cells in risk matrices affect decision\u2010making and risk perception? Insights from randomized controlled studies. <em>Risk Analysis<\/em>. <a href=\"https:\/\/onlinelibrary.wiley.com\/doi\/full\/10.1111\/risa.14091\">Link<\/a><\/li>\n<li><strong>Recchia, G., Lawrence, A. C. E., Capacchione, L., &amp; Freeman, A.L.J.<\/strong> (2022). Making BRCA1 genetic test reports easier to understand through user-centered design: A randomized trial. <em>Genetics in Medicine<\/em>. <a href=\"https:\/\/doi.org\/10.1016\/j.gim.2022.04.016\">Link<\/a><\/li>\n<li><strong>Recchia, G.<\/strong> (2021). Teaching autoregressive language models complex tasks by demonstration. <a href=\"https:\/\/arxiv.org\/pdf\/2109.02102\">Link<\/a>. <span class=\"note\">Early preprint demonstrating an example of capability elicitation via fine-tuning. 
Cited by papers out of DeepMind and Google Research.<\/span><\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul>\n<li style=\"list-style-type: none;\">\n<ul>\n<li style=\"list-style-type: none;\">\u00a0<\/li>\n<\/ul>\n<p>More at <a href=\"https:\/\/scholar.google.com\/citations?user=XJxGdu8AAAAJ&amp;hl=en&amp;oi=ao\" target=\"_blank\" rel=\"noopener\">Google Scholar<\/a><\/p>\n<\/li>\n<\/ul>\n\n\n<p class=\"wp-block-paragraph\"><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Gabriel Recchia Director, Modulo Research I&#8217;m a cognitive scientist working on the evaluation and alignment of large language models as the director of Modulo Research. We recently released a dataset of expert-annotated valid and invalid solutions involving long-form reasoning intended to facilitate scalable oversight research (preprint here). We&#8217;re now finalizing a dataset of textual representations [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":2,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-2","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/pages\/2","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/comments?post=2"}],"version-history":[{"count":10,"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/pages\/2\/revisions"}],"predecessor-version":[{"id":772,"href":"http:\/\/www.twonewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/pages\/2\/revisions\/772"}],"wp:attachment":[{"href":"http:\/\/www.twon
ewthings.com\/gabrielrecchia\/wp-json\/wp\/v2\/media?parent=2"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}