Reusability report: Learning the transcriptional grammar in single-cell RNA-sequencing data using transformers
| Published in | Nature Machine Intelligence, Vol. 5, no. 12, pp. 1437-1446 |
|---|---|
| Format | Journal Article |
| Language | English |
| Published | London: Nature Publishing Group UK, 01.12.2023 |
| ISSN | 2522-5839 |
| DOI | 10.1038/s42256-023-00757-8 |
| Summary: | The rise of single-cell genomics is an attractive opportunity for data-hungry machine learning algorithms. The scBERT method, inspired by the success of BERT (‘bidirectional encoder representations from transformers’) in natural language processing, was recently introduced by Yang et al. as a data-driven tool to annotate cell types in single-cell genomics data. Analogous to contextual embedding in BERT, scBERT leverages pretraining and self-attention mechanisms to learn the ‘transcriptional grammar’ of cells. Here we investigate the reusability beyond the original datasets, assessing the generalizability of natural language techniques in single-cell genomics. The degree of imbalance in the cell-type distribution substantially influences the performance of scBERT. Anticipating an increased utilization of transformers, we highlight the necessity to consider data distribution carefully and introduce a subsampling technique to mitigate the influence of an imbalanced distribution. Our analysis serves as a stepping stone towards understanding and optimizing the use of transformers in single-cell genomics. scBERT, a pretrained neural network for single-cell sequencing tasks, was published last year in Nature Machine Intelligence. To test the reusability of the method, Khan et al. use the code to assess the generalizability of transformer architectures on single-cell genomics tasks. |
|---|---|