Human evaluation of automatically generated text: Current trends and best practice guidelines
| Published in | Computer Speech & Language Vol. 67; p. 101151 | 
|---|---|
| Main Authors | , , , | 
| Format | Journal Article | 
| Language | English | 
| Published | Elsevier Ltd, 01.05.2021 | 
| ISSN | 0885-2308, 1095-8363 | 
| DOI | 10.1016/j.csl.2020.101151 | 
Summary:

- The current paper provides an overview of human evaluation practices in NLG.
- The current paper gives an overview of the steps necessary to undertake a human evaluation study.
- Building on findings from NLG, but also statistics and the behavioral sciences, the current paper provides a set of recommendations and best practices for human evaluation in NLG.

Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human evaluation is carried out. This paper provides an overview of how (mostly intrinsic) human evaluation is currently conducted and presents a set of best practices, grounded in the literature. These best practices are also linked to the stages that researchers go through when conducting evaluation research (planning stage; execution and release stage), and to the specific steps within these stages. With this paper, we hope to contribute to the quality and consistency of human evaluations in NLG.