arxiv:1903.00161

DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs

Published on Mar 1, 2019
Abstract

A new English reading comprehension benchmark, DROP, requires systems to perform discrete reasoning over paragraphs, emphasizing numerical operations, and state-of-the-art models fall far short of human performance.

AI-generated summary

Reading comprehension has recently seen rapid progress, with systems matching humans on the most popular datasets for the task. However, a large body of work has highlighted the brittleness of these systems, showing that there is much work left to be done. We introduce a new English reading comprehension benchmark, DROP, which requires Discrete Reasoning Over the content of Paragraphs. In this crowdsourced, adversarially-created, 96k-question benchmark, a system must resolve references in a question, perhaps to multiple input positions, and perform discrete operations over them (such as addition, counting, or sorting). These operations require a much more comprehensive understanding of the content of paragraphs than what was necessary for prior datasets. We apply state-of-the-art methods from both the reading comprehension and semantic parsing literature on this dataset and show that the best systems only achieve 32.7% F1 on our generalized accuracy metric, while expert human performance is 96.0%. We additionally present a new model that combines reading comprehension methods with simple numerical reasoning to achieve 47.0% F1.
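The generalized accuracy metric above is based on overlap between predicted and gold answers. A minimal sketch of token-level F1 scoring, a simplification of DROP's official metric (which additionally handles number normalization and multi-span answers), together with a hypothetical DROP-style question whose answer must be computed rather than extracted:

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Bag-of-tokens F1 between a predicted and a gold answer string."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Illustrative DROP-style item (invented, not from the dataset):
# Passage: "... Team A scored 24 points while Team B scored 17 ..."
# Question: "How many more points did Team A score than Team B?"
# A system must resolve both references and perform subtraction.
prediction = str(24 - 17)
print(token_f1(prediction, "7"))  # 1.0 for an exact match
```

Partial overlap is rewarded proportionally: predicting "24 points" against gold "24" yields precision 0.5 and recall 1.0, so F1 is 2/3.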


Models citing this paper: 531
Datasets citing this paper: 10
Spaces citing this paper: 2,346
