A grammar for spreadsheet formulas evaluated on two large datasets

Bibliographic Details
Published in 2015 IEEE 15th International Working Conference on Source Code Analysis and Manipulation (SCAM), pp. 121-130
Main Authors Aivaloglou, Efthimia; Hoepelman, David; Hermans, Felienne
Format Conference Proceeding
Language English
Published IEEE 01.09.2015
DOI 10.1109/SCAM.2015.7335408

Abstract Spreadsheets are ubiquitous in the industrial world and often perform a role similar to other computer programs, which makes them interesting research targets. However, there does not exist a reliable grammar that is concise enough to facilitate formula parsing and analysis and to support research on spreadsheet codebases. This paper presents a grammar for spreadsheet formulas that is compatible with the spreadsheet formula language, is compact enough to feasibly implement with a parser generator, and produces parse trees aimed at further manipulation and analysis. We evaluate the grammar against more than one million unique formulas extracted from the well-known EUSES and Enron spreadsheet datasets, successfully parsing 99.99% of them. Additionally, we use the grammar to analyze these datasets and measure how frequently language features are used in spreadsheet formulas. Finally, we identify smelly constructs and uncommon cases in the syntax of formulas.
Author Affiliations
Aivaloglou, Efthimia (e.aivaloglou@tudelft.nl): Software Eng. Res. Group, Delft Univ. of Technol., Delft, Netherlands
Hoepelman, David (d.j.hoepelman@student.tudelft.nl): Software Eng. Res. Group, Delft Univ. of Technol., Delft, Netherlands
Hermans, Felienne (f.f.j.hermans@tudelft.nl): Software Eng. Res. Group, Delft Univ. of Technol., Delft, Netherlands
Subjects Arrays; Generators; Grammar; Indexes; Production; Spreadsheet programs; Syntactics