Publications

Disaggregated evaluation is a central task in AI fairness assessment, with the goal of measuring an AI system’s performance across …

Recent claims about the impressive abilities of large language models (LLMs) are often supported by evaluating publicly available …

Restless multi-armed bandits are often used to model budget-constrained resource allocation tasks where receipt of the resource is …

Restless and collapsing bandits are often used to model budget-constrained resource allocation in settings where arms have …

Crowdworker-constructed natural language inference (NLI) datasets have been found to contain statistical artifacts associated with the …

Scientists construct and analyze computational models to understand the world. That understanding comes from efforts to augment, …