Tools

This page describes several applications that I have developed, intended for use as research tools.

Concept Lab Tools
University of Cambridge

The Concept Lab tools are not yet ready for public distribution, but many thanks to Alan Liu for his kind tweet and for taking the above action shot. 🙂

Lightweight Metrics of Semantic Similarity
Indiana University
Citations for this tool and the associated paper on Google Scholar

LMOSS (available here) calculates a variant of pointwise mutual information (PMI) scores between any pair of words in a corpus. When LMOSS is trained on a large corpus, its rank-ordered PMI scores correlate quite well with rank-ordered human judgments of word-pair similarity (e.g., the WordSimilarity-353 Test Collection). If the corpus is large enough, PMI handily outperforms implementations of LSA and other algorithms trained on traditional (smaller) corpora.

LMOSS offers an interface for PMI scores similar to the official web interface for Latent Semantic Analysis, allowing matrix comparison, one-to-many comparison, and pairwise comparison. You’ll need your own corpus to train it on; one of the advantages of PMI is that it’s fast and easy to train on your own domain-specific corpora.
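
For readers unfamiliar with PMI, here is a minimal Python sketch of the textbook version of the score, computed from document-level co-occurrence counts. It is not the LMOSS code, and LMOSS uses a variant of PMI rather than this exact formula. The function names (train_pmi, pmi, most_similar) and the whitespace tokenization are assumptions made purely for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def train_pmi(documents):
    """Count, per document, which words and which word pairs occur.

    Hypothetical helper: tokenization is a plain lowercase whitespace split.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        vocab = set(doc.lower().split())
        word_counts.update(vocab)
        pair_counts.update(frozenset(p) for p in combinations(sorted(vocab), 2))
    return word_counts, pair_counts, len(documents)


def pmi(w1, w2, word_counts, pair_counts, n_docs):
    """Standard PMI: log2( P(w1, w2) / (P(w1) * P(w2)) )."""
    if w1 == w2:
        joint = word_counts[w1]
    else:
        joint = pair_counts[frozenset((w1, w2))]
    if joint == 0:
        return float("-inf")  # the words never co-occur (or are unseen)
    p_joint = joint / n_docs
    p1 = word_counts[w1] / n_docs
    p2 = word_counts[w2] / n_docs
    return math.log2(p_joint / (p1 * p2))


def most_similar(word, word_counts, pair_counts, n_docs, top_n=5):
    """One-to-many comparison: rank every other word by PMI with `word`."""
    scores = [
        (other, pmi(word, other, word_counts, pair_counts, n_docs))
        for other in word_counts
        if other != word
    ]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_n]


if __name__ == "__main__":
    toy_corpus = [
        "the cat chased the mouse",
        "the dog chased the cat",
        "the mouse ate cheese",
        "a dog barked",
    ]
    counts = train_pmi(toy_corpus)
    print(pmi("cat", "chased", *counts))
    print(most_similar("cat", *counts, top_n=3))
```

Training is just a single counting pass over the corpus, which is why this family of measures is quick to retrain on a domain-specific corpus. A toy corpus like the one above is far too small to give meaningful similarity scores; it only shows the mechanics.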

STRATA: A Search Tool for Richly Annotated and Time-Aligned Data
Stanford University
Citations for this tool on Google Scholar

Recently, there has been increased attention to psycholinguistic research that considers both the syntactic and temporal characteristics of naturally occurring speech, such as corpus-based studies of syntactic priming. However, few tools exist to facilitate corpus research on time-aligned speech. STRATA is a general-purpose tool for searching and extracting data from the Switchboard corpus of recorded telephone conversations, a speech corpus that has been annotated on many different levels.

STRATA was developed as part of my undergraduate thesis. To my surprise, it has a handful of citations despite being an unpublished document, so it looks like it’s getting some use! Due to license restrictions on the Switchboard corpus, it is available only at Stanford University.


Property Pictionary – Features
Citations for this tool and the associated paper on Google Scholar


Property Pictionary – Features is a simplified version of the program described on the Games page, intended for use in controlled experiments. Brent Kievit-Kylar, who is responsible for most of the other applets on the website, deserves credit for many of the recent changes and updates to the version of Property Pictionary – Features currently online at the Cognitive Computing Lab Playground.

Support for BeeSign implementations

BeeSign is a Flash computer simulation created by Joshua Danish that teaches children about honeybees from a complex systems perspective. I wrote JavaScript code to embed BeeSign in a webpage and make it easy to customize and navigate interactive discussion questions, simplifying the process of implementing BeeSign in classroom activities.