ARVO-CL: The OpenCL version of the ARVO package — An efficient tool for computing the accessible surface area and the excluded volume of proteins via analytical equations
| Published in | Computer physics communications Vol. 183; no. 11; pp. 2494 - 2497 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier B.V., 01.11.2012 |
| ISSN | 0010-4655 1879-2944 |
| DOI | 10.1016/j.cpc.2012.04.019 |
| Summary: | The introduction of Graphics Processing Units (GPUs) and GPU computing in recent years has opened possibilities for simple parallelization of programs. In this update, we present a modernized version of the program ARVO [J. Buša, J. Dzurina, E. Hayryan, S. Hayryan, C.-K. Hu, J. Plavka, I. Pokorný, J. Skřivánek, M.-C. Wu, Comput. Phys. Comm. 165 (2005) 59]. The whole package has been rewritten in C and parallelized using OpenCL. Several new tricks have been added to the algorithm to save memory, which is essential for efficient use of graphics cards. A new tool, ‘input_structure’, has been added for converting PDB files into input files for the C and OpenCL versions of ARVO.
Program title: ARVO-CL
Catalog identifier: ADUL_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUL_v2_0.html
Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 11834
No. of bytes in distributed program, including test data, etc.: 182528
Distribution format: tar.gz
Programming language: C, OpenCL.
Computer: PC Pentium; SPP’2000.
Operating system: All OpenCL capable systems.
Has the code been vectorized or parallelized?: Parallelized using GPUs. A serial (non-GPU) version is also included in the package.
Classification: 3.
External routines: cl.hpp (http://www.khronos.org/registry/cl/api/1.1/cl.hpp)
Catalog identifier of previous version: ADUL_v1_0
Journal reference of previous version: Comput. Phys. Comm. 165(2005)59
Does the new version supersede the previous version?: Yes
Nature of problem: Molecular mechanics computations, continuum percolation
Solution method: Numerical algorithm based on analytical formulas obtained by means of a stereographic transformation.
Reasons for new version: During the past decade we have published a number of algorithms and software packages related to protein structure [1,2,3,4,5,6]. They have received considerable attention from researchers, who have found interesting applications for them. For example, ARVO [4] was used to show that the ratio of volume V to surface area A of proteins in the Protein Data Bank (PDB) lies within a narrow range [7]. Such a result is useful for finding native structures of proteins.
We therefore believe there is demand to revise and modernize these tools and to make them more efficient. Here we present the new version of the ARVO package. The original ARVO package was written in FORTRAN. One reason for the new version is to rewrite it in C, making it more accessible to young researchers who are not familiar with FORTRAN. Another, more important reason is to exploit the speed-up offered by modern graphics cards. We also wanted to eliminate the need to recompile the program for every molecule. For this purpose, we have added the possibility of using general PDB [8] files as input. Once compiled, the program can process any number of input files in succession. Finally, we found it necessary to revisit the algorithm and introduce some tricks that avoid unnecessary memory usage, making the package more efficient.
Summary of revisions: 1. New tool. ARVO is designed to calculate the volume and accessible surface area of an arbitrary system of overlapping spheres (representing atoms), biomolecules being just one, albeit important, application. The user provides the coordinates and radii of the spheres as well as the radius of the probe sphere (a water molecule in the case of biomolecules). In the old version of ARVO, the input data were embedded directly in the source code, so the program had to be recompiled after every change in the input. In the current version, a module called ‘input_structure’ reads the data from an independent external file. The coordinates and radii are stored in a file with extension *.ats (see the directory ‘input’ in the package). Each line of the file corresponds to one sphere (atom) and has the format 24.733 -4.992 -13.256 2.800. The first three numbers are the (x, y, z) coordinates of the atom and the last one is its radius. Note that the radius of the probe sphere must already be included in this number: in the above example, the value 2.800 is obtained as “sphere radius + probe sphere radius”. For an arbitrary system of spheres, the user creates the *.ats file. For proteins, ‘input_structure’ takes as input a file in a format compatible with the Protein Data Bank (PDB) format [8] and creates the corresponding *.ats file. It also automatically assigns radii to individual spheres and (optionally) adds the probe sphere (water molecule) radius to all radii. The resulting file, containing the coordinates of the spheres together with their radii, serves directly as input for ARVO. Using an external tool allows users to create their own mappings of atoms to radii without recompiling either ‘input_structure’ or ARVO. It is again the user’s responsibility to assign proper radii to each atom type.
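The conversion performed by ‘input_structure’ can be sketched for a single record. The C fragment below is a minimal sketch, not the package's code: the function names pdb_field and pdb_to_ats are hypothetical, only ATOM records are handled, and the atom's radius is assumed to have been looked up already. It reads x, y and z from the fixed columns 31-38, 39-46 and 47-54 of a PDB ATOM record and writes an *.ats line with the probe radius added to the atom radius.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copy the fixed-width field starting at 0-based column `start`,
   `len` characters wide, into buf, dropping blanks. */
static void pdb_field(const char *line, size_t start, size_t len,
                      char *buf, size_t bufsize)
{
    size_t n = strlen(line), j = 0;
    for (size_t i = start; i < start + len && i < n && j + 1 < bufsize; i++)
        if (!isspace((unsigned char)line[i]))
            buf[j++] = line[i];
    buf[j] = '\0';
}

/* Hypothetical helper: convert one PDB ATOM record into an .ats line
   "x y z R" with R = atom_radius + probe_radius, since the .ats format
   expects the probe radius to be included already.
   Returns 0 on success, -1 for non-ATOM or too-short lines. */
static int pdb_to_ats(const char *line, double atom_radius,
                      double probe_radius, char *out, size_t outsize)
{
    char field[16];
    double x, y, z;

    assert(line != NULL && out != NULL);
    if (strncmp(line, "ATOM", 4) != 0 || strlen(line) < 54)
        return -1;

    /* PDB columns 31-38, 39-46 and 47-54 (1-based) hold x, y and z. */
    pdb_field(line, 30, 8, field, sizeof field); x = atof(field);
    pdb_field(line, 38, 8, field, sizeof field); y = atof(field);
    pdb_field(line, 46, 8, field, sizeof field); z = atof(field);

    snprintf(out, outsize, "%.3f %.3f %.3f %.3f",
             x, y, z, atom_radius + probe_radius);
    return 0;
}
```

For instance, with an atom radius of 1.4 and a probe radius of 1.4 (values chosen only for illustration), a record with coordinates 24.733, -4.992, -13.256 becomes the line 24.733 -4.992 -13.256 2.800.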
One can use any of the published standard sets of radii (see, for example, [9,10,11,12,13]). Alternatively, users can assign their own radii directly in the module input_structure. The radii are assigned in a special file with extension *.pds (see the documentation), which consists of lines such as ATOM CA ALA 2.0, read as “the Cα atom of alanine has radius 2.0 Å”. For testing, we provide the file rashin.pds, in which the radii are assigned according to [12].
The output file contains only recognized atoms. Atoms that are not recognized (i.e., are not part of the mapping) are written to a separate log file, allowing the user to review and correct the mapping files later.
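A radius mapping built from *.pds lines, together with logging of unrecognized atoms, could look roughly as follows. This is a hypothetical sketch: the type radius_entry, the functions parse_pds_line and lookup_radius, the linear search and the log message format are illustrative assumptions, not taken from the package.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One mapping entry, e.g. from the line "ATOM CA ALA 2.0". */
typedef struct {
    char atom[8];     /* atom name, e.g. "CA"     */
    char residue[8];  /* residue name, e.g. "ALA" */
    double radius;    /* radius in Angstroms      */
} radius_entry;

/* Parse one mapping line; returns 1 on success, 0 otherwise. */
static int parse_pds_line(const char *line, radius_entry *e)
{
    char tag[8];
    assert(line != NULL && e != NULL);
    if (sscanf(line, "%7s %7s %7s %lf",
               tag, e->atom, e->residue, &e->radius) != 4)
        return 0;
    return strcmp(tag, "ATOM") == 0;
}

/* Return the radius of (atom, residue). An unrecognized atom is reported
   to `log` (if non-NULL) and a negative value is returned, so the caller
   can leave it out of the .ats output. */
static double lookup_radius(const radius_entry *map, size_t n,
                            const char *atom, const char *residue, FILE *log)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(map[i].atom, atom) == 0 &&
            strcmp(map[i].residue, residue) == 0)
            return map[i].radius;
    if (log != NULL)
        fprintf(log, "unrecognized atom: %s %s\n", atom, residue);
    return -1.0;
}
```

A linear scan is adequate for mappings of a few hundred entries; a hash table keyed on atom and residue name would be the natural replacement for much larger ones.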
2. The language. Implementing the program in C is a natural first step towards OpenCL, whose kernel language is based on C. The C implementation is a line-by-line rewrite of the original FORTRAN version of ARVO.
3. OpenCL implementation. OpenCL [14] is an open standard for parallel programming of heterogeneous systems. Unlike other parallelization technologies such as CUDA [15] or ATI Stream [16], which are tied to specific hardware (produced by NVIDIA or ATI, respectively), OpenCL is vendor-independent: programs written in OpenCL can run on hardware from any company supporting the standard, including AMD, Intel, and NVIDIA. OpenCL programs can run on both CPUs and GPUs without major changes.
Improvements compared with the original version: support for files in the format created by ‘input_structure’; input of parameters (the name of the input file) via the command line; dynamic array sizes, removing the need to recompile the program after any change in the size of the structures; memory allocation according to the actual demands of the application; replacement of the north pole test by a slight reduction of the radius (see below).
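Dynamic array sizes of this kind can be obtained with ordinary heap allocation. The sketch below, using the hypothetical names sphere and read_ats, reads every "x y z r" line of an *.ats stream into an array that grows geometrically with realloc, so the number of atoms is limited only by available memory rather than by a compile-time constant.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct { double x, y, z, r; } sphere;

/* Read all "x y z r" records from an open .ats stream into a heap array
   that doubles its capacity when full. Returns the array (caller frees)
   and stores the number of spheres in *n. An empty stream or an
   allocation failure yields NULL with *n == 0. */
static sphere *read_ats(FILE *in, size_t *n)
{
    size_t cap = 0, len = 0;
    sphere *s = NULL, tmp;

    assert(in != NULL && n != NULL);
    while (fscanf(in, "%lf %lf %lf %lf",
                  &tmp.x, &tmp.y, &tmp.z, &tmp.r) == 4) {
        if (len == cap) {
            size_t newcap = cap ? 2 * cap : 64;
            sphere *grown = realloc(s, newcap * sizeof *grown);
            if (grown == NULL) {         /* out of memory: clean up */
                free(s);
                *n = 0;
                return NULL;
            }
            s = grown;
            cap = newcap;
        }
        s[len++] = tmp;
    }
    *n = len;
    return s;
}
```

A driver would open the file named on the command line (argv[1]) with fopen, pass the stream to read_ats, and free the array when done.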
To compile an OpenCL program, one needs to download and install the appropriate driver and software development kit (SDK). The program itself consists of two parts: one running on the CPU and one running on the GPU. The CPU part initializes communication between the computer and the GPU, loads the data, and processes and exports the results. The GPU part performs the parallel part of the calculation: the search for neighboring atoms and the calculation of each atom's contribution to the total area and volume of the molecule. For details of the algorithm, see Refs. [3,4].
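The per-atom structure of the GPU part can be illustrated with a serial C emulation, in which each call of count_neighbours plays the role of one OpenCL work-item and the loop in all_neighbour_counts plays the role of the index space (NDRange). This is a sketch under stated assumptions: all names are hypothetical, radii are taken to already include the probe radius, two spheres count as neighbors when they overlap, and ARVO's analytical area and volume formulas (Refs. [3,4]) are replaced by a simple neighbor count. Neighbors are found by scanning all spheres, without storing any neighbor list.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double x, y, z, r; } sphere;

/* Two spheres are neighbours when they overlap, i.e. the distance
   between their centres is smaller than the sum of their radii
   (compared in squared form to avoid a square root). */
static int overlaps(const sphere *a, const sphere *b)
{
    double dx = a->x - b->x, dy = a->y - b->y, dz = a->z - b->z;
    double rs = a->r + b->r;
    return dx * dx + dy * dy + dz * dz < rs * rs;
}

/* Work for one atom i (one work-item in a GPU version): neighbours are
   found on the fly by scanning all spheres, so no neighbour list is kept.
   This sketch only counts them; real code would use them to compute
   atom i's contribution to the area and volume. */
static int count_neighbours(const sphere *s, size_t n, size_t i)
{
    int count = 0;
    assert(i < n);
    for (size_t j = 0; j < n; j++)
        if (j != i && overlaps(&s[i], &s[j]))
            count++;
    return count;
}

/* Serial stand-in for launching the kernel: one call per global id. */
static void all_neighbour_counts(const sphere *s, size_t n, int *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = count_neighbours(s, n, i);
}
```

The brute-force scan costs O(N) work per atom; it trades extra arithmetic for not having to keep neighbor lists in the limited device memory.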
In OpenCL programming, more attention must be paid to memory usage than in the classical approach. Device memory is usually limited, and therefore some changes to the original algorithm are necessary. First, unlike in the FORTRAN version of the program, no structures containing the list of neighbor atoms are created. Instead, neighbors are searched for on the fly, while the contribution of each individual atom is being calculated.

Table 1. Comparison of volumes and surface areas of different proteins obtained by the original ARVO and by the new version, which use different strategies for dealing with the “north pole”. For each protein (PDB ID), the table gives the number of atoms; the volume and surface area obtained with the original ARVO and their differences from the new approach; the number of rotations of the molecule in the original ARVO and the number of atoms whose radii were reduced in the new version; and the relative errors for the volume and the area.

| Protein | Atoms | Volume (Å³) | Volume diff | Area (Å²) | Area diff | Rotations (original) | Reduced radii (new) | δvolume (%) | δarea (%) |
|---|---|---|---|---|---|---|---|---|---|
| 3rn3 | 957 | 23,951.180469 | −0.000025 | 6858.322636 | −0.000007 | 3 | 1 | −1.04⋅10⁻⁷ | −1.02⋅10⁻⁷ |
| 3cyt | 1600 | 40,875.867395 | −0.001575 | 11,455.474832 | 0.001415 | 3 | 4 | −3.85⋅10⁻⁶ | 1.24⋅10⁻⁴ |
| 2act | 1657 | 38,608.243038 | 0.049480 | 9054.007350 | 0.001733 | 4 | 2 | 1.28⋅10⁻⁴ | 1.91⋅10⁻⁵ |
| 2brd | 1738 | 43,882.735479 | −0.000344 | 10,918.203529 | −0.000097 | 21 | 1 | −7.84⋅10⁻⁷ | −8.88⋅10⁻⁷ |
| 8tln | 2455 | 56,698.988883 | −0.000966 | 12,496.978064 | 0.000459 | 15 | 4 | −1.70⋅10⁻⁶ | 3.67⋅10⁻⁶ |
| 1rr8 | 4108 | 105,841.502192 | −0.000699 | 27,983.159772 | −0.000214 | 18 | 4 | −6.60⋅10⁻⁷ | −7.65⋅10⁻⁷ |
| 1xi5 | 15,696 | 1,743,445.092001 | 0.007709 | 863,139.882703 | 0.000070 | 1 | 1 | 4.42⋅10⁻⁷ | 8.11⋅10⁻⁹ |
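The relative errors in the last two columns of Table 1 appear to be the differences expressed as a percentage of the values obtained with the original ARVO; this reading is inferred from the tabulated numbers rather than stated in the caption. For 3rn3, for example:

$$
\delta_{\mathrm{volume}} = 100\,\frac{\Delta V}{V} = 100\cdot\frac{-0.000025}{23\,951.180469} \approx -1.04\cdot 10^{-7}\,\%,
\qquad
\delta_{\mathrm{area}} = 100\,\frac{\Delta A}{A} = 100\cdot\frac{-0.000007}{6858.322636} \approx -1.02\cdot 10^{-7}\,\%.
$$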
The strategy behind the north pole check and molecule rotation [4, Sec. 4.7] has been changed. If, during the north pole test, the north pole of the active sphere lies close to the surface of a neighboring sphere, the radius of that neighboring sphere is multiplied by 0.9999 instead of rotating the whole molecule. This allows the algorithm to continue normally. Changing the radius of one atom changes the area and the volume of this atom by 0.02% and 0 |
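The replacement of the rotation by a radius reduction can be sketched as follows. In this hypothetical C fragment, the north pole of the active sphere a is taken to be the point (x_a, y_a, z_a + r_a), and “close to the surface” of sphere b is interpreted as lying within a tolerance eps of it; eps is an assumption of the sketch, not a value from the package. The squared-distance form of the test avoids a square root. Since the area of a sphere scales with r², the factor 0.9999 changes the area of an isolated sphere by 1 − 0.9999² ≈ 0.02%.

```c
#include <assert.h>

typedef struct { double x, y, z, r; } sphere;

/* Factor by which an offending neighbour's radius is shrunk. */
#define RADIUS_REDUCTION 0.9999

/* Returns 1 if the north pole of the active sphere a, i.e. the point
   (a.x, a.y, a.z + a.r), lies within eps of the surface of sphere b.
   eps is a hypothetical tolerance chosen by the caller. */
static int north_pole_near_surface(const sphere *a, const sphere *b, double eps)
{
    double dx = a->x - b->x, dy = a->y - b->y, dz = (a->z + a->r) - b->z;
    double d2 = dx * dx + dy * dy + dz * dz;
    double lo = b->r - eps, hi = b->r + eps;

    assert(eps > 0.0 && lo > 0.0);
    /* |d - r_b| < eps  <=>  (r_b - eps)^2 < d^2 < (r_b + eps)^2 */
    return lo * lo < d2 && d2 < hi * hi;
}

/* Instead of rotating the whole molecule, shrink the neighbour's radius
   slightly so the algorithm can continue. Returns 1 if b was changed. */
static int fix_north_pole(const sphere *a, sphere *b, double eps)
{
    if (!north_pole_near_surface(a, b, eps))
        return 0;
    b->r *= RADIUS_REDUCTION;
    return 1;
}
```

After one reduction, the north pole in the test configuration lies about 2⋅10⁻⁴ away from the neighbor's surface, well outside a tolerance of 10⁻⁶, so the check no longer fires.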