Abstract
Silviu Paun, 5 Feb 2018
Computational Linguistics practice in the analysis of crowdsourced data is moving away from the traditional methods based on majority vote and coefficients of agreement towards the use of models of annotation. But although there has been substantial effort to develop new models, there has been much less work comparing such models on the same datasets. The aim of this paper is to fill this gap. We analyse six of the best-known models of annotation, with distinct structures (pooled, unpooled, and partially pooled) and a diverse set of assumptions (annotator ability, item difficulty, or both). We carry out this evaluation on four datasets with different degrees of spamming and annotator quality, and provide guidelines for both model selection and implementation.
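As a point of reference, the majority-vote baseline that the abstract contrasts with models of annotation can be sketched as follows; the function name and the toy annotations are illustrative, not from the paper:

```python
from collections import Counter

def majority_vote(labels):
    """Return the most frequent label among the annotations for one item.

    Ties are broken arbitrarily (by insertion order in the Counter), which is
    one weakness of this baseline compared to probabilistic models that weight
    annotators by estimated ability.
    """
    return Counter(labels).most_common(1)[0][0]

# Hypothetical crowdsourced annotations: item id -> labels from annotators.
annotations = {
    "item1": ["pos", "pos", "neg"],
    "item2": ["neg", "neg", "neg"],
}

# Aggregate each item's labels by simple majority.
aggregated = {item: majority_vote(labs) for item, labs in annotations.items()}
```

Models of annotation replace this per-item vote with a joint probabilistic model whose latent variables (annotator ability, item difficulty, or both, depending on the model) are estimated from the full dataset.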