A multi-tasking model of speaker-keyword classification for keeping human in the loop of drone-assisted inspection
| Published in | Engineering Applications of Artificial Intelligence, Vol. 117, p. 105597 |
|---|---|
| Main Authors | , , , , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier Ltd, 01.01.2023 |
| ISSN | 0952-1976, 1873-6769 |
| DOI | 10.1016/j.engappai.2022.105597 |
| Summary: | Audio commands are a preferred communication medium to keep inspectors in the loop of civil infrastructure inspection performed by a semi-autonomous drone. To understand job-specific commands from a group of heterogeneous and dynamic inspectors, a model must be developed cost-effectively for the group and easily adapted when the group changes. This paper is motivated to build a multi-tasking deep learning model that possesses a Share–Split–Collaborate architecture. This architecture allows the two classification tasks to share the feature extractor and then split the subject-specific and keyword-specific features intertwined in the extracted features through feature projection and collaborative training. A base model for a group of five authorized subjects is trained and tested on the inspection keyword dataset collected by this study. The model achieved a mean accuracy of 95.3% or higher in classifying the keywords of any authorized inspector. Its mean accuracy in speaker classification is 99.2%. Due to the richer keyword representations that the model learns from the pooled training data, adapting the base model to a new inspector requires only a little training data from that inspector, such as five utterances per keyword. Using the speaker classification scores for inspector verification achieves a success rate of at least 93.9% in verifying authorized inspectors and 76.1% in detecting unauthorized ones. Further, the paper demonstrates the applicability of the proposed model to larger groups on a public dataset. This paper provides a solution to the challenges facing AI-assisted human–robot interaction, including worker heterogeneity, worker dynamics, and job heterogeneity. |
|---|---|
| Highlights: | •The Share–Split–Collaborate multitask learning architecture is suitable for speaker-keyword classification. •Subject-specific and phonetic-specific features intertwined in audio data can be disentangled. •Rich keyword representations are learned from multi-subject spoken command data. •A small amount of data from new speakers is sufficient for adding new classes to the speaker classifier. •Speaker classification scores are also effective for speaker verification. |
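As a rough illustration of the Share–Split–Collaborate idea described in the summary, the sketch below shows a forward pass in which a shared feature extractor feeds two projection matrices that split the representation into a speaker-specific and a keyword-specific subspace, each with its own classification head. All layer choices, dimensions, and weights here are illustrative assumptions for a five-speaker, ten-keyword setup, not the paper's actual implementation or training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Illustrative dimensions (assumptions, not from the paper)
d_in, d_shared, d_split = 40, 64, 32   # input features, shared width, split width
n_speakers, n_keywords = 5, 10

# Shared feature extractor (a single dense layer as a stand-in)
W_shared = rng.standard_normal((d_in, d_shared)) * 0.1

# Split step: two projections carve the shared features into
# a speaker-specific and a keyword-specific subspace
P_speaker = rng.standard_normal((d_shared, d_split)) * 0.1
P_keyword = rng.standard_normal((d_shared, d_split)) * 0.1

# One classification head per task
W_speaker = rng.standard_normal((d_split, n_speakers)) * 0.1
W_keyword = rng.standard_normal((d_split, n_keywords)) * 0.1

def forward(x):
    """Return (speaker probabilities, keyword probabilities) for a batch x."""
    h = relu(x @ W_shared)                 # shared features
    z_speaker = relu(h @ P_speaker)        # speaker-specific subspace
    z_keyword = relu(h @ P_keyword)        # keyword-specific subspace
    return softmax(z_speaker @ W_speaker), softmax(z_keyword @ W_keyword)

x = rng.standard_normal((3, d_in))         # batch of 3 acoustic feature vectors
p_speaker, p_keyword = forward(x)
print(p_speaker.shape, p_keyword.shape)
```

In this framing, the speaker verification use mentioned in the summary would amount to thresholding the maximum speaker-class score: an utterance whose top score falls below a tuned threshold is flagged as coming from an unauthorized speaker.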