A2: Extracting cyclic switchings from DOB-nets for rejecting excessive disturbances
| Published in | Neurocomputing (Amsterdam), Vol. 400, pp. 161-172 |
|---|---|
| Main Authors | , | 
| Format | Journal Article | 
| Language | English | 
| Published | Elsevier B.V., 04.08.2020 |
| ISSN | 0925-2312 1872-8286  | 
| DOI | 10.1016/j.neucom.2020.03.014 | 
| Summary: | • Proposed an Attention-based Abstraction (A2) approach to analyze a Disturbance OBserver network (DOB-net) that actively rejects excessive external disturbances. • Quantized and abstracted the learned DOB-net via A2, obtaining a key Moore machine network that partially reveals the interplay between the learned control strategy and disturbances. • Found switching mechanisms in the resulting control for rejecting various disturbances that are unobservable in a statistical sense. • Analyzed the captured switching mechanisms via an analogy to hybrid approaches for often-saturated systems and found that the discrete-event subsystem can be obtained by the proposed A2. Reinforcement Learning (RL) is limited in practice by its poor explainability, which undermines user trust, hampers the interpretation needed for human intervention, and leaves too little analysis to guide future improvement. This paper seeks to partially characterize the interplay between dynamical environments and a previously proposed Disturbance OBserver net (DOB-net). The DOB-net is trained via RL and offers optimal control for a set of Partially Observable Markovian Decision Processes (POMDPs). The transition function of each POMDP is largely determined by its environment, that is, by excessive external disturbances. This paper proposes an Attention-based Abstraction (A2) approach to extract a finite-state automaton, referred to as a Key Moore Machine Network (KMMN), that captures the switching mechanisms exhibited by the DOB-net in dealing with multiple such POMDPs. A2 first quantizes the controlled platform by learning continuous-discrete interfaces. It then extracts the KMMN by finding the key hidden states and transitions that attract sufficient attention from the DOB-net. Within the resulting KMMN, three patterns of cyclic switching between key hidden states are found, and saturated controls are shown to be synchronized with unknown disturbances. Interestingly, the switchings found here have appeared before in control design for often-saturated systems. They are interpreted via an analogy to the discrete-event subsystem of hybrid control. |
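
The summary describes abstracting a learned controller into a Moore machine and then looking for cyclic switching between its key states. The Python sketch below illustrates that general idea on a toy trace only; it is not the paper's A2 procedure. The function names, the state labels, the control labels, and the `min_share` frequency threshold (a crude stand-in for the attention criterion mentioned in the summary) are all assumptions made for illustration.

```python
from collections import Counter


def extract_key_machine(states, outputs, min_share=0.2):
    """Build a toy Moore machine from an already-quantized trace.

    states:    discrete state labels, one per time step (hypothetical)
    outputs:   discrete control labels, one per time step (hypothetical)
    min_share: keep a transition only if it accounts for at least this
               share of all state changes (assumed stand-in for "attention")
    """
    # Moore machine: the output depends on the state alone, so assign each
    # state the output most frequently observed while in that state.
    per_state = {}
    for s, o in zip(states, outputs):
        per_state.setdefault(s, Counter())[o] += 1
    output_of = {s: c.most_common(1)[0][0] for s, c in per_state.items()}

    # Count state changes, ignoring self-loops.
    changes = Counter((a, b) for a, b in zip(states, states[1:]) if a != b)
    total = sum(changes.values())
    key_transitions = set()
    if total:
        key_transitions = {t for t, n in changes.items() if n / total >= min_share}
    return output_of, key_transitions


def find_cycle(key_transitions, start):
    """Follow key transitions from `start`; return the cycle as a list of
    states if the walk returns to `start`, otherwise None."""
    succ = {}
    for a, b in key_transitions:
        succ.setdefault(a, []).append(b)
    path, node = [start], start
    while True:
        candidates = sorted(succ.get(node, []))
        if not candidates:
            return None          # dead end: no key transition leaves this state
        node = candidates[0]     # deterministic choice for the sketch
        if node == start:
            return path          # closed the loop
        if node in path:
            return None          # looped without passing through `start`
        path.append(node)


# Toy trace: the controller mostly cycles A -> B -> C, with one rare detour to D.
control = {"A": "+u_max", "B": "0", "C": "-u_max", "D": "0"}
states = ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A", "D", "A"]
outputs = [control[s] for s in states]

output_of, key_transitions = extract_key_machine(states, outputs)
cycle = find_cycle(key_transitions, "A")
print(cycle)  # ['A', 'B', 'C']
```

In this toy trace the rare detour through `D` falls below the threshold and is dropped, leaving a single three-state cycle whose outputs alternate between the two saturated controls and zero.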