Neural-Machine-Translation-Based Commit Message Generation: How Far Are We?

Bibliographic Details
Published in 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE), pp. 373-384
Main Authors Liu, Zhongxin; Xia, Xin; Hassan, Ahmed E.; Lo, David; Xing, Zhenchang; Wang, Xinyu
Format Conference Proceeding
Language English
Published ACM, 2018-09-03
ISSN 2643-1572
DOI 10.1145/3238147.3238190

Abstract Commit messages can be regarded as the documentation of software changes. These messages describe the content and purposes of changes, and are therefore useful for program comprehension and software maintenance. However, due to the lack of time and direct motivation, commit messages are sometimes neglected by developers. To address this problem, Jiang et al. proposed an approach (we refer to it as NMT), which leverages a neural machine translation algorithm to automatically generate short commit messages from code. The reported performance of their approach is promising; however, they did not explore why their approach performs well. Thus, in this paper, we first perform an in-depth analysis of their experimental results. We find that (1) most of the test diffs from which NMT can generate high-quality messages are similar to one or more training diffs at the token level; (2) about 16% of the commit messages in Jiang et al.'s dataset are noisy, either because they are automatically generated or because they describe repetitive trivial changes; and (3) the performance of NMT declines by a large amount after removing such noisy commit messages. In addition, NMT is complicated and time-consuming. Inspired by our first finding, we propose a simpler and faster approach, named NNGen (Nearest Neighbor Generator), to generate concise commit messages using the nearest neighbor algorithm. Our experimental results show that NNGen is over 2,600 times faster than NMT, and outperforms NMT in terms of BLEU (an accuracy measure that is widely used to evaluate machine translation systems) by 21%. Finally, we also discuss some observations for the road ahead for automated commit message generation to inspire other researchers.
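The nearest-neighbor idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that NNGen uses the nearest neighbor algorithm and that well-handled test diffs resemble training diffs at the token level; the whitespace tokenization, the bag-of-words cosine similarity, the function names, and the toy training pairs below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization: a simplifying assumption for this sketch.
    return text.split()


def cosine(a, b):
    # Cosine similarity between two token-count vectors (Counters).
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbor_message(new_diff, training_pairs):
    # Return the commit message of the training diff most similar to new_diff.
    # training_pairs is a list of (diff_text, commit_message) tuples.
    query = Counter(tokenize(new_diff))
    best_message, best_score = None, -1.0
    for diff, message in training_pairs:
        score = cosine(query, Counter(tokenize(diff)))
        if score > best_score:
            best_message, best_score = message, score
    return best_message


# Hypothetical toy data, invented for this example.
TRAINING_PAIRS = [
    ("- version = 1.0.0 + version = 1.0.1", "Bump version to 1.0.1"),
    ("+ import logging + logging . info ( msg )", "Add logging"),
    ("- return a + return a . strip ( )", "Strip whitespace from result"),
]

if __name__ == "__main__":
    new_diff = "- version = 1.0.1 + version = 1.0.2"
    print(nearest_neighbor_message(new_diff, TRAINING_PAIRS))
```

In this toy run the retrieved message is reused verbatim, so a version number copied from the neighbor may not match the new diff. A retrieval approach of this kind needs no model training, which is consistent with the speed advantage the abstract reports, although the sketch itself makes no performance claim.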
Author_xml – sequence: 1
  givenname: Zhongxin
  surname: Liu
  fullname: Liu, Zhongxin
  organization: Zhejiang University, China
– sequence: 2
  givenname: Xin
  surname: Xia
  fullname: Xia, Xin
  organization: Monash University, Australia
– sequence: 3
  givenname: Ahmed E.
  surname: Hassan
  fullname: Hassan, Ahmed E.
  organization: Queen's University, Canada
– sequence: 4
  givenname: David
  surname: Lo
  fullname: Lo, David
  organization: Singapore Management University, Singapore
– sequence: 5
  givenname: Zhenchang
  surname: Xing
  fullname: Xing, Zhenchang
  organization: Australian National University, Australia
– sequence: 6
  givenname: Xinyu
  surname: Wang
  fullname: Wang, Xinyu
  organization: Zhejiang University, China
Discipline Computer Science
EISBN 145035937X, 9781450359375
PageCount 12
SubjectTerms Artificial neural networks
Commit message generation
Documentation
Generators
Nearest neighbor algorithm
Neural machine translation
Noise measurement
Roads
Software algorithms
Software engineering
Software maintenance
Training
URI https://ieeexplore.ieee.org/document/9000000