Neural-Machine-Translation-Based Commit Message Generation: How Far Are We?
| Published in | 2018 33rd IEEE/ACM International Conference on Automated Software Engineering (ASE) pp. 373 - 384 |
|---|---|
| Main Authors | Liu, Zhongxin; Xia, Xin; Hassan, Ahmed E.; Lo, David; Xing, Zhenchang; Wang, Xinyu |
| Format | Conference Proceeding |
| Language | English |
| Published | ACM, 03.09.2018 |
| ISSN | 2643-1572 |
| DOI | 10.1145/3238147.3238190 |
| Abstract | Commit messages can be regarded as the documentation of software changes. These messages describe the content and purposes of changes, and hence are useful for program comprehension and software maintenance. However, due to lack of time and direct motivation, developers sometimes neglect commit messages. To address this problem, Jiang et al. proposed an approach (we refer to it as NMT) that leverages a neural machine translation algorithm to automatically generate short commit messages from code. The reported performance of their approach is promising; however, they did not explore why their approach performs well. Thus, in this paper, we first perform an in-depth analysis of their experimental results. We find that (1) most of the test diffs from which NMT can generate high-quality messages are similar to one or more training diffs at the token level; (2) about 16% of the commit messages in Jiang et al.'s dataset are noisy, either because they were automatically generated or because they describe repetitive trivial changes; and (3) the performance of NMT declines substantially after such noisy commit messages are removed. In addition, NMT is complicated and time-consuming. Inspired by our first finding, we propose a simpler and faster approach, named NNGen (Nearest Neighbor Generator), that generates concise commit messages using the nearest neighbor algorithm. Our experimental results show that NNGen is over 2,600 times faster than NMT and outperforms NMT by 21% in terms of BLEU (an accuracy measure widely used to evaluate machine translation systems). Finally, we discuss some observations about the road ahead for automated commit message generation to inspire other researchers. |
|---|---|
| Authors | Liu, Zhongxin (Zhejiang University, China); Xia, Xin (Monash University, Australia); Hassan, Ahmed E. (Queen's University, Canada); Lo, David (Singapore Management University, Singapore); Xing, Zhenchang (Australian National University, Australia); Wang, Xinyu (Zhejiang University, China) |
| Discipline | Computer Science |
| EISBN | 145035937X; 9781450359375 |
| PageCount | 12 |
| PublicationTitleAbbrev | ASE |
| Subject terms | Artificial neural networks; Commit message generation; Documentation; Generators; Nearest neighbor algorithm; Neural machine translation; Noise measurement; Roads; Software algorithms; Software engineering; Software maintenance; Training |
| URI | https://ieeexplore.ieee.org/document/9000000 |
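The abstract states only that NNGen generates commit messages "using the nearest neighbor algorithm" and that the approaches are compared with BLEU. The sketch below illustrates that idea under assumptions not stated in this record: diffs are compared as whitespace-tokenized bag-of-words vectors with cosine similarity (echoing finding (1) about token-level similarity), the message paired with the single most similar training diff is returned, and BLEU is computed at sentence level with add-one smoothing for n > 1. The function names (`nearest_neighbor_message`, `sentence_bleu`) and the toy training pairs are hypothetical; this is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words count vectors.
    dot = sum(count * b[tok] for tok, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_neighbor_message(test_diff: str, train: list[tuple[str, str]]) -> str:
    # Return the commit message paired with the most similar training diff.
    query = Counter(tokenize(test_diff))
    _, best_msg = max(train, key=lambda pair: cosine(query, Counter(tokenize(pair[0]))))
    return best_msg


def ngram_counts(tokens: list[str], n: int) -> Counter:
    # Count every contiguous n-gram in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    # BLEU = brevity penalty * geometric mean of clipped n-gram precisions.
    cand, ref = tokenize(candidate), tokenize(reference)
    if not cand:
        return 0.0
    log_sum = 0.0
    for n in range(1, max_n + 1):
        cand_ngrams, ref_ngrams = ngram_counts(cand, n), ngram_counts(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often as it occurs in the reference.
        matches = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(len(cand) - n + 1, 0)
        if n == 1:
            if matches == 0:
                return 0.0
            precision = matches / total
        else:
            # Add-one smoothing for n > 1 (assumed variant, not taken from the paper).
            precision = (matches + 1) / (total + 1)
        log_sum += math.log(precision) / max_n
    # Brevity penalty applies when the candidate is not longer than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(log_sum)


# Toy (diff, message) training pairs, invented for illustration.
train = [
    ("+ import os\n+ import sys", "add missing imports"),
    ("- print(x)\n+ logger.info(x)", "replace print with logging"),
]
msg = nearest_neighbor_message("- print(y)\n+ logger.info(y)", train)
print(msg)  # replace print with logging
print(sentence_bleu(msg, "replace print calls with logging"))
```

Prediction here is a similarity scan over the stored training pairs with no model to train, which hints at why a retrieval approach can be much cheaper than an encoder-decoder model; the "over 2,600 times faster" figure above comes from the paper's own experiments, not from this sketch.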