Deep Learning in Bioinformatics: Techniques and Applications in Practice

Deep Learning in Bioinformatics: Techniques and Applications in Practice introduces the topic in an easy-to-understand way, exploring how it can be utilized for addressing important problems in bioinformatics, including drug discovery, de novo molecular design, sequence analysis, and protein structure prediction.

Bibliographic Details
Main Author: Izadkhah, Habib
Format: eBook
Language: English
Published: Chantilly: Elsevier Science & Technology, 2022 (Academic Press)
Edition: 1
ISBN: 0128238224; 9780128238226
DOI: 10.1016/C2020-0-00432-9

Table of Contents:
  • Front Cover -- Deep Learning in Bioinformatics -- Copyright -- Contents -- Acknowledgments -- Preface -- 1 Why life science? -- 1.1 Introduction -- 1.2 Why deep learning? -- 1.3 Contemporary life science is about data -- 1.4 Deep learning and bioinformatics -- 1.5 What will you learn? -- 2 A review of machine learning -- 2.1 Introduction -- 2.2 What is machine learning? -- 2.3 Challenge with machine learning -- 2.4 Overfitting and underfitting -- 2.4.1 Mitigating overfitting -- 2.4.2 Adjusting parameters using cross-validation -- 2.4.3 Cross-validation methods -- 2.5 Types of machine learning -- 2.5.1 Supervised learning -- 2.5.2 Unsupervised learning -- 2.5.3 Reinforcement learning -- 2.6 The math behind deep learning -- 2.6.1 Tensors -- 2.6.2 Relevant mathematical operations -- 2.6.3 The math behind machine learning: statistics -- 2.7 TensorFlow and Keras -- 2.8 Real-world tensors -- 2.9 Summary -- 3 An introduction of Python ecosystem for deep learning -- 3.1 Basic setup -- 3.2 SciPy (scientific Python) ecosystem -- 3.3 Scikit-learn -- 3.4 A quick refresher in Python -- 3.4.1 Identifier -- 3.4.2 Comments -- 3.4.3 Data type -- 3.4.4 Control flow statements -- 3.4.5 Data structure -- 3.4.6 Functions -- 3.5 NumPy -- 3.6 Matplotlib crash course -- 3.7 Pandas -- 3.8 How to load dataset -- 3.8.1 Considerations when loading CSV data -- 3.8.2 Pima Indians diabetes dataset -- 3.8.3 Loading CSV files in NumPy -- 3.8.4 Loading CSV files in Pandas -- 3.9 Dimensions of your data -- 3.10 Correlations between features -- 3.11 Techniques to understand each feature in the dataset -- 3.11.1 Histograms -- 3.11.2 Box-and-whisker plots -- 3.11.3 Correlation matrix plot -- 3.12 Prepare your data for deep learning -- 3.12.1 Scaling features to a range -- 3.12.2 Data normalizing -- 3.12.3 Binarize data (make binary) -- 3.13 Feature selection for machine learning
  • 3.13.1 Univariate selection -- 3.13.2 Recursive feature elimination -- 3.13.3 Principal component analysis -- 3.13.4 Feature importance -- 3.14 Split dataset into training and testing sets -- 3.15 Summary -- 4 Basic structure of neural networks -- 4.1 Introduction -- 4.2 The neuron -- 4.3 Layers of neural networks -- 4.4 How a neural network is trained? -- 4.5 Delta learning rule -- 4.6 Generalized delta rule -- 4.7 Gradient descent -- 4.7.1 Stochastic gradient descent -- 4.7.2 Batch gradient descent -- 4.7.3 Mini-batch gradient descent -- 4.8 Example: delta rule -- 4.8.1 Implementation of the SGD method -- 4.8.2 Implementation of the batch method -- 4.9 Limitations of single-layer neural networks -- 4.10 Summary -- 5 Training multilayer neural networks -- 5.1 Introduction -- 5.2 Backpropagation algorithm -- 5.3 Momentum -- 5.4 Neural network models in keras -- 5.5 `Hello world!' of deep learning -- 5.6 Tuning hyperparameters -- 5.7 Data preprocessing -- 5.7.1 Vectorization -- 5.7.2 Value normalization -- 5.8 Summary -- 6 Classification in bioinformatics -- 6.1 Introduction -- 6.1.1 Binary classification -- 6.1.2 Pima indians onset of diabetes dataset -- 6.1.2.1 Import libraries -- 6.1.2.2 Load data -- 6.1.2.3 Keras model -- 6.1.2.4 Compile the model -- 6.1.2.5 Fit the model -- 6.1.2.6 Evaluate the model -- 6.1.2.7 Tie it all together -- 6.1.2.8 Make predictions -- 6.1.3 Label encoding -- 6.2 Multiclass classification -- 6.2.1 Sigmoid and softmax activation functions -- 6.2.2 Types of classification -- 6.3 Summary -- 7 Introduction to deep learning -- 7.1 Introduction -- 7.2 Improving the performance of deep neural networks -- 7.2.1 Vanishing gradient -- 7.2.2 Overfitting -- 7.2.2.1 Reducing the network's size -- 7.2.2.2 Dropout -- 7.2.2.3 Weight regularization -- 7.2.3 Computational load -- 7.3 Configuring the learning rate in keras
  • 7.3.1 Adaptive learning rate -- 7.3.2 Layer weight initializers -- 7.4 Imbalanced dataset -- 7.5 Breast cancer detection -- 7.5.1 Goals -- 7.5.2 Introduction and task definition -- 7.5.3 Implementation -- 7.5.3.1 Loading, preprocessing, preparations for modeling -- 7.5.3.2 Fully connected neural network (FCNN) -- 7.5.3.3 Adding dropout to the network (FCNN + dropout) -- 7.5.3.4 Adding L2 weight regularization (FCNN + L2) -- 7.5.3.5 Adding L2 weight regularization and dropout (FCNN + L2 + dropout) -- 7.5.3.6 Adding L1_L2 weight regularization (FCNN + L1_L2) -- 7.5.3.7 Reducing the size of the network -- 7.5.3.8 Summary -- 7.6 Molecular classification of cancer by gene expression -- 7.6.1 Goals -- 7.6.2 Introduction and task definition -- 7.6.3 Implementation -- 7.6.3.1 Loading, preprocessing, preparations for modeling -- 7.6.3.2 Dimension reduction using principal component analysis (PCA) -- 7.6.3.3 Model -- 7.7 Summary -- 8 Medical image processing: an insight to convolutional neural networks -- 8.1 Convolutional neural network architecture -- 8.2 Convolution layer -- 8.3 Pooling layer -- 8.4 Stride and padding -- 8.5 Convolutional layer in keras -- 8.6 Coronavirus (COVID-19) disease diagnosis -- 8.6.1 Goals -- 8.6.2 Introduction and task definition -- 8.6.3 Implementation -- 8.6.3.1 Importing required libraries -- 8.6.3.2 Plotting some instances of the dataset -- 8.6.3.3 Defining the model -- 8.6.3.4 Discussing the relevance of deep learning for small-data problems -- 8.6.3.5 Predicting covid-19 -- 8.6.4 Conclusion -- 8.7 Predicting breast cancer -- 8.7.1 Goals -- 8.7.2 Introduction and task definition -- 8.7.3 Implementation -- 8.7.3.1 Importing required libraries -- 8.7.3.2 Looking for all available directories in Kaggle account -- 8.7.3.3 Plotting images using cv2 module -- 8.7.3.4 Finding specific pattern in the name of images
  • 8.7.3.5 Preprocessing data -- 8.7.3.6 Dealing with imbalanced data -- 8.7.3.7 Defining the sequential model -- 8.7.4 Conclusion -- 8.8 Diabetic retinopathy detection -- 8.8.1 Goals -- 8.8.2 Introduction and task definition -- 8.8.3 Implementation -- 8.8.3.1 Importing required libraries and reading the data -- 8.8.3.2 Preprocessing data -- 8.8.3.3 Defining model based on functional API -- 8.8.3.4 Defining another model using ResNet50 model -- 8.8.4 Conclusion -- 8.9 Summary -- 9 Popular deep learning image classifiers -- 9.1 Introduction -- 9.2 LeNet-5 -- 9.3 AlexNet -- 9.4 ZFNet -- 9.5 VGGNet -- 9.6 GoogLeNet/inception -- 9.7 ResNet -- 9.8 DenseNet -- 9.9 SE-Net -- 9.10 Summary -- 10 Electrocardiogram (ECG) arrhythmia classification -- 10.1 Introduction -- 10.2 MIT-BIH arrhythmia database -- 10.3 Preprocessing -- 10.4 Data augmentation -- 10.5 Architecture of the CNN model -- 10.6 Summary -- 11 Autoencoders and deep generative models in bioinformatics -- 11.1 Introduction -- 11.2 Autoencoders -- 11.2.1 Encoder -- 11.2.2 Decoder -- 11.2.3 Distance function -- 11.3 Variant types of autoencoders -- 11.3.1 Undercomplete autoencoders -- 11.3.2 Deep autoencoders -- 11.3.3 Convolutional autoencoders -- 11.3.4 Sparse autoencoders -- 11.3.5 Denoising autoencoders -- 11.3.6 Variational autoencoders -- Intuition -- VAE is a generative model -- How does a variational autoencoder work? -- Creating decoder -- Building the architecture of the VAE: connecting the encoder and decoder -- Defining loss function and compiling model -- 11.3.7 Contractive autoencoders -- 11.4 An example of denoising autoencoders - bone suppression in chest radiographs -- 11.4.1 Architecture -- 11.5 Implementation of autoencoders for chest X-ray images (pneumonia) -- 11.5.1 Undercompleted autoencoder -- 11.5.2 Sparse autoencoder -- 11.5.3 Denoising autoencoder
  • 11.5.4 Variational autoencoder -- 11.5.5 Contractive autoencoder -- 11.6 Generative adversarial network -- 11.6.1 GAN network architecture -- 11.6.2 GAN network cost function -- 11.6.3 Cost function optimization process in GAN -- 11.6.4 General GAN training process -- 11.7 Convolutional generative adversarial network -- 11.7.1 Deconvolution layer -- 11.7.2 DCGAN network structure -- 11.8 Summary -- 12 Recurrent neural networks: generating new molecules and proteins sequence classification -- 12.1 Introduction -- 12.2 Types of recurrent neural network -- 12.3 The problem, short-term memory -- 12.4 Bidirectional LSTM -- 12.5 Generating new molecules -- 12.5.1 Simplified molecular-input line-entry system -- 12.5.2 A generative model for molecules -- 12.5.3 Generating new SMILES -- 12.5.4 Analyzing the generative model's output -- 12.6 Protein sequence classification -- 12.6.1 Protein structure -- 12.6.2 Protein function -- 12.6.3 Prediction of protein function -- 12.6.4 LSTM with dropout -- 12.6.5 LSTM with bidirectional and CNN -- 12.7 Summary -- 13 Application, challenge, and suggestion -- 13.1 Introduction -- 13.2 Legendary deep learning architectures, CNN, and RNN -- 13.3 Deep learning applications in bioinformatics -- 13.4 Biological networks -- 13.4.1 Learning tasks on graphs -- 13.4.2 Graph neural networks -- 13.5 Perspectives, limitations, and suggestions -- 13.6 DeepChem, a powerful library for bioinformatics -- 13.7 Summary -- Index -- Back Cover