TAP: A static analysis model for PHP vulnerabilities based on token and deep learning technology

Bibliographic Details
Published in	PloS one Vol. 14; no. 11; p. e0225196
Main Authors	Fang, Yong; Han, Shengjun; Huang, Cheng; Wu, Runpu
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 18.11.2019
Subjects	Algorithms; Analysis; Applications programs; Artificial intelligence; Biology and Life Sciences; Cable television broadcasting industry; Codes; Computer and Information Sciences; Computer security; Content management software; Cybersecurity; Data flow analysis; Databases, Factual; Datasets; Deep Learning; Engineering and Technology; Hypertext; International conferences; International economic relations; Internet; Internet software; Iterative methods; Language preprocessors; Long short-term memory; Machine learning; Neural networks; Principal components analysis; ROC Curve; Security; Social Sciences; Software; Software upgrading; Source code; Structured Query Language-SQL; Technology; Web applications; Web Browser; Websites; United States; Taiwan; China
ISSN	1932-6203
DOI	10.1371/journal.pone.0225196

Summary:	With the widespread use of Web applications, security issues in source code are increasing, and exposed vulnerabilities seriously endanger the interests of service providers and customers. Several models address this problem, but most of them rely on complex graphs generated from source code or on regex patterns based on expert experience. This paper proposes TAP, an analysis model based on a token mechanism and deep learning technology, to discover vulnerabilities in PHP: Hypertext Preprocessor (PHP) Web programs conveniently. Based on the token mechanism of the PHP language, a custom tokenizer was designed that unifies tokens, supports some features of PHP, and optimizes parsing. The tokenizer also implements parameter iteration to achieve data flow analysis. The deep learning model of TAP was trained on the Software Assurance Reference Dataset (SARD) and the SQLI-LABS dataset by combining the word2vec model with a Long Short-Term Memory (LSTM) network. In the experiment on the CWE-89 dataset, TAP achieves an Area Under the Curve (AUC) of 0.9941, which is better than the other models, and the highest accuracy, 0.9787. Further, compared with RIPS, TAP performs much better in multiclass classification, with a Kappa of 0.8319 and a Hamming distance of 0.0840.
Bibliography:	Competing Interests: The authors have declared that no competing interests exist.
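
The Summary reports a Kappa value and a Hamming distance for multiclass classification but does not define them in this record. The following is a minimal sketch of how such metrics are conventionally computed, assuming "Kappa" means Cohen's kappa and "Hamming distance" means the fraction of samples whose predicted label differs from the true label. The function names and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def hamming_distance(y_true, y_pred):
    """Fraction of samples whose predicted label differs from the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    mismatches = sum(t != p for t, p in zip(y_true, y_pred))
    return mismatches / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement between two labelings, corrected for chance."""
    n = len(y_true)
    # Observed agreement is the complement of the Hamming distance.
    observed = 1.0 - hamming_distance(y_true, y_pred)
    # Chance agreement: probability that both labelings pick the same class
    # if each drew labels independently from its own class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        # Degenerate case (a single class everywhere): kappa is formally
        # undefined; this sketch treats it as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels for illustration only.
    truth = ["safe", "sqli", "xss", "safe"]
    guess = ["safe", "sqli", "safe", "safe"]
    print("Hamming distance:", hamming_distance(truth, guess))
    print("Cohen's kappa:", cohen_kappa(truth, guess))
```

A lower Hamming distance and a kappa closer to 1 both indicate better agreement with the ground truth. Kappa also discounts the agreement that would be expected by chance, which matters when the classes are imbalanced.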