Decentralized Åström-Wittenmark Self-Tuning Regulator of a Multi-agent Uncertain Coupled ARMAX System

Hongbin Ma, Temasek Laboratories, National University of Singapore, Singapore 117508, [email protected]

Kai-Yew Lum Temasek Laboratories, National University of Singapore, Singapore 117508 [email protected]

Abstract— Adaptive control for a multi-agent uncertain dynamical system is studied in this paper. The system studied has the following characteristics: (i) there are many agents in the system and the state of each agent evolves dynamically with time; (ii) for each agent, the state evolves like an ARMAX model with unknown coefficients; (iii) each agent is locally intervened by its neighborhood agents with unknown linear reactions; (iv) each agent can only use its history information and local information on its neighborhood agents to design a control law aimed at achieving its own local goal, i.e. tracking a local signal sequence. In this paper, the Åström-Wittenmark self-tuning regulator, which is a special case (with known high-frequency gain) of the extended least squares (ELS) algorithm, is adopted by each agent to estimate the local unknown parameters (including internal parameters and coupling coefficients) and to control the local states based on the "certainty equivalence" principle. For the decentralized Åström-Wittenmark self-tuning regulator discussed here, stability and optimality are established rigorously in this paper. Simulation studies demonstrate the effectiveness of the local ELS learning and control algorithm.

Index Terms— Multi-agent system, decentralized adaptive control, ARMAX, ELS, Åström-Wittenmark self-tuning regulator, uncertainty.

I. INTRODUCTION

In the last two decades, complex systems (especially complex networks) have attracted the attention of many researchers from different disciplines ([1], [2], [3], [4], [5], [6]), partly because of their wide range of applications. Complex systems usually have characteristics such as nonlinearity, multi-hierarchy and uncertainty, which in some sense can be classified into architectural complexity and informational uncertainty. Due to these characteristics, the control of complex systems has become a challenging research direction, and few efforts have been devoted to this area. In this paper, we shall consider adaptive control of a particular discrete-time dynamic network. Previous studies of traditional adaptive control ([7], [8], [9], [10], [11]) focus on centralized control strategies, which may not meet emerging demands for the control of complex systems. In fact, complex control systems with uncertainties are very common in practice, such as power control of cellphones, which has motivated some work on decentralized control by the robust control approach ([12], [13], [14], [15], [16]). The methodology of robust control is essentially a worst-case analysis, and thus can only deal with a relatively small range of uncertainties compared with the adaptive control approach; the latter can usually deal with a larger range of uncertainties because a process of estimation or learning is inevitably involved in adaptive controllers. To the best knowledge of the authors, few efforts have been made

S. S. Ge Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576 [email protected]

on adaptive control of complex systems, except for some related results on decentralized adaptive control of large-scale uncertain systems (e.g. [17]), which usually focus only on weakly interconnected dynamical systems. Besides the lack of basic theory on adaptive control of complex systems, a wide practical background also calls for study of this topic, in which smart agents with local control goals are required. A simple example makes the idea clear: the drivers of cars on a crowded road must control their cars to follow their own paths while avoiding traffic accidents. In this example, we cannot design control laws for all drivers in a centralized way, and each driver must take actions to adapt to his/her car and the local environment. Obviously such problems cannot be resolved within the framework of traditional adaptive control or decentralized robust control. To facilitate mathematical study of these problems, we give the following simple yet non-trivial problem framework: (i) There are several (maybe many) agents in the whole system, each of whose evolution is governed by a corresponding dynamic equation, or state equation; different agents may have different structures or parameters. (ii) The evolution of each agent may be influenced by other agents, which means that the dynamic equations of the agents are coupled in general; such influence between agents is usually restricted to a local range, and the extent or intensity of the reaction can be parameterized. (iii) There exist communication limits (or information limits) for all the agents: (a) each agent knows its internal structure and the values of its internal parameters, but has no access to the internal structure or parameters of the other agents; (b) each agent does not know the influence from others; and (c) every agent can observe the states of its neighborhood agents as well as its own state.
(iv) Under the above-mentioned information limits, each agent may utilize all the information at hand to estimate the intensity of influence and to design a local control to change its own state, and consequently to influence its neighborhood agents. In this framework, how to design the local controllers, and what the relationship between the local goals and the global goal is, are fundamental problems. The framework above provides a basis for the study of adaptive control of complex networks with uncertainties. It can be further extended; for example, an agent may not know some of its internal parameters and must design its control law by estimating them, which is the main task in traditional adaptive

control. This framework is different from most existing research on complex network dynamics, in which the dynamics of the agents are usually fixed and the agents have no ability to adapt. It also distinguishes itself from the theory of complex adaptive systems (CAS), proposed by John Holland [6] and widely used in research on complex systems, because our framework puts more emphasis on adaptive control technologies with solid mathematical foundations rather than on the methodology of multi-agent modeling and computer simulation by rule-based generative systems. In this paper, we shall study a basic adaptive control problem in this framework. The remainder of this paper is organized as follows: in Section II we formulate the problem of adaptive control in our framework, and the local ELS algorithm for each agent is presented in Section III. In Section IV, we present the assumptions used and some preliminary lemmas. For the decentralized Åström-Wittenmark self-tuning regulator, i.e. the local ELS algorithm with the high-frequency gains known a priori, we rigorously establish closed-loop stability and optimality in Section V. A simulation example is illustrated in Section VI, and concluding remarks are made in Section VII.

II. PROBLEM FORMULATION

We consider a system with N agents, each of which has the following state equation:

x_i(t+1) + Σ_{k=1}^{p_i} a_{ik} x_i(t−k+1) = Σ_{k=1}^{q_i} b_{ik} u_i(t−k+1) + Σ_{j∈N_i} g_{ij} x_j(t) + [w_i(t+1) + Σ_{k=1}^{l_i} c_{ik} w_i(t−k+1)]   (II.1)

where x_i(t) is the state of agent i, u_i(t) is the control signal of agent i at time t, {w_i(t)} is the driving noise in the local environment of agent i, and N_i = {n_{i1}, n_{i2}, ..., n_{i,m_i}} is the set of agent i's neighborhood agents. From (II.1), we can see that each agent is influenced by its neighborhood agents. By definition, the coupling coefficient g_{ij} is the intensity of the influence of agent j on agent i. Naturally we assume that agent i can observe the states of the agents in N_i.

By introducing the following polynomials in the backward shift operator z,

A_i(z) = 1 + a_{i1} z + a_{i2} z² + ··· + a_{i,p_i} z^{p_i}
B_i(z) = b_{i1} + b_{i2} z + ··· + b_{i,q_i} z^{q_i−1}
C_i(z) = 1 + c_{i1} z + c_{i2} z² + ··· + c_{i,l_i} z^{l_i}

we can rewrite the dynamic model of agent i as

A_i(z) x_i(t+1) = B_i(z) u_i(t) + C_i(z) w_i(t+1) + Σ_{j∈N_i} g_{ij} x_j(t).   (II.2)

With the exception of the coupling term Σ_{j∈N_i} g_{ij} x_j(t), Eq. (II.2) is the well-known ARMAX model, and the integers p_i ≥ 0, q_i ≥ 1, l_i ≥ 0 are its orders. Because of its broad background and wide applications, the ARMAX model has been extensively studied in traditional adaptive control theory ([18], [8], [19], [11], [20]). However, to the best knowledge of the authors, few research efforts have been devoted to the coupled ARMAX model above, which can be viewed as a first step toward understanding adaptive control of general complex systems.

Assume that the topological structure of the dynamic network is time-invariant, i.e. the sets N_1, N_2, ..., N_N do not change with time, and that {a_{ik}, 1 ≤ k ≤ p_i}, {b_{ik}, 1 ≤ k ≤ q_i}, {c_{ik}, 1 ≤ k ≤ l_i}, {g_{ij}, j ∈ N_i} are unknown parameters for agent i. The objective of agent i is, at any time t, to design a local feedback control u_i(t) based on the history information {x_i(0), ..., x_i(t), u_i(0), ..., u_i(t−1)} and its neighbors' states {x_j(t), j ∈ N_i}, so that the average tracking error

J_i(t) = (1/t) Σ_{k=1}^{t} |x_i(k) − x_i^*(k)|²

is asymptotically minimized, where the deterministic signal {x_i^*(t)} is the local tracking goal of agent i. We aim to provide an algorithm for designing the local control laws u_i(t) and then to analyze its stability and optimality.

Remark 1: In this paper, for simplicity, each subsystem is assumed to be a single-input-single-output (SISO) ARMAX model. However, the ideas in the proof can be generalized to multiple-input-multiple-output (MIMO) ARMAX subsystems without essential difficulties.

III. LOCAL ELS ALGORITHM

For agent i, we can rewrite its dynamic model as the following regression model:

x_i(t+1) = θ_i^τ φ_i^0(t) + w_i(t+1),

with

φ_i^0(t) = [x_i(t), ..., x_i(t−p_i+1), u_i(t), ..., u_i(t−q_i+1), w_i(t), ..., w_i(t−l_i+1), X̄_i^τ(t)]^τ
θ_i = [−a_i^τ, b_i^τ, c_i^τ, g_i^τ]^τ,

where

a_i^τ = [a_{i1}, ..., a_{i,p_i}]
b_i^τ = [b_{i1}, ..., b_{i,q_i}]
c_i^τ = [c_{i1}, ..., c_{i,l_i}]
g_i^τ = [g_{i,n_{i1}}, g_{i,n_{i2}}, ..., g_{i,n_{i,m_i}}]
X̄_i^τ(t) = [x_{n_{i1}}(t), x_{n_{i2}}(t), ..., x_{n_{i,m_i}}(t)].
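To make the coupled dynamics (II.1) concrete, the following minimal open-loop simulation evolves N agents under (II.1). This is only an illustrative sketch: the orders, coefficient ranges, and the all-to-all coupling pattern are made-up values, not taken from the paper.

```python
import numpy as np

# Hypothetical orders and coefficients for illustration only.
rng = np.random.default_rng(0)
N = 3                       # number of agents
p, q, l = 2, 1, 1           # orders p_i, q_i, l_i (same for all agents here)
a = rng.uniform(-0.3, 0.3, (N, p))        # a_ik
b = np.ones((N, q))                       # b_ik, with b_i1 = 1
c = rng.uniform(-0.3, 0.3, (N, l))       # c_ik
G = 0.1 * (np.ones((N, N)) - np.eye(N))  # couplings g_ij (all-to-all, for brevity)

T = 50
x = np.zeros((N, T + 1))
u = np.zeros((N, T + 1))                  # open loop: controls held at zero
w = rng.normal(0.0, 1.0, (N, T + 1))     # driving noise

for t in range(p, T):
    for i in range(N):
        # (II.1): x_i(t+1) = -sum_k a_ik x_i(t-k+1) + sum_k b_ik u_i(t-k+1)
        #         + sum_{j in N_i} g_ij x_j(t) + w_i(t+1) + sum_k c_ik w_i(t-k+1)
        ar = -sum(a[i, k] * x[i, t - k] for k in range(p))
        ex = sum(b[i, k] * u[i, t - k] for k in range(q))
        ma = w[i, t + 1] + sum(c[i, k] * w[i, t - k] for k in range(l))
        cp = sum(G[i, j] * x[j, t] for j in range(N) if j != i)
        x[i, t + 1] = ar + ex + cp + ma

print(bool(np.isfinite(x).all()))  # True: trajectories stay finite here
```

Each agent's next state mixes its own AR and MA terms with a weighted sum of neighbor states, which is exactly the coupling that later complicates the decentralized analysis.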

For the above regression model, the least-squares (LS) algorithm is the most commonly used identification method to estimate the parameter vector θ_i; however, since φ_i^0(t) includes the unobservable noise w_i(t), in general we cannot obtain φ_i^0(t) at time t. To resolve this problem, instead of using w_i(t) directly, we use the following posterior estimate:

ŵ_i(t+1) = x_i(t+1) − θ̂_i^τ(t+1) φ_i(t)   (III.1)

and, consequently, we obtain an observable vector φ_i(t) by replacing w_i(k) in φ_i^0(t) with ŵ_i(k):

φ_i(t) = [x_i(t), ..., x_i(t−p_i+1), u_i(t), ..., u_i(t−q_i+1), ŵ_i(t), ..., ŵ_i(t−l_i+1), X̄_i^τ(t)]^τ
θ_i = [−a_i^τ, b_i^τ, c_i^τ, g_i^τ]^τ.   (III.2)

Then, by the following LS algorithm

θ̂_i(t+1) = θ̂_i(t) + a_i(t) P_i(t) φ_i(t) [x_i(t+1) − φ_i^τ(t) θ̂_i(t)]
P_i(t+1) = P_i(t) − a_i(t) P_i(t) φ_i(t) φ_i^τ(t) P_i(t)
a_i(t) = [1 + φ_i^τ(t) P_i(t) φ_i(t)]^{−1}   (III.3)

we obtain the estimate θ̂_i(t) of θ_i at time t. The initial values can be taken as P_i(0) = α_0 I (0 < α_0 < 1/e), with θ̂_i(0) chosen arbitrarily. Algorithm (III.3) together with the posterior estimate (III.1) is called the extended least-squares (ELS) algorithm [11]. Agent i can then design its control law u_i(t) by the "certainty equivalence" principle, that is, it chooses u_i(t) such that

θ̂_i^τ(t) φ_i(t) = x_i^*(t+1),   (III.4)

where the estimate θ̂_i(t) is given by the ELS algorithm. Consequently we obtain

u_i(t) = (1/b̂_{i1}(t)) { x_i^*(t+1) + [â_{i1}(t) x_i(t) + ··· + â_{i,p_i}(t) x_i(t−p_i+1)]
         − [b̂_{i2}(t) u_i(t−1) + ··· + b̂_{i,q_i}(t) u_i(t−q_i+1)]
         − [ĉ_{i1}(t) ŵ_i(t) + ··· + ĉ_{i,l_i}(t) ŵ_i(t−l_i+1)]
         − ĝ_i^τ(t) X̄_i(t) }.   (III.5)

In particular, when the high-frequency gain b_{i1} is known a priori, let θ̄_i denote the parameter vector θ_i without the component b_{i1}, φ̄_i(t) the regression vector φ_i(t) without the component u_i(t), and similarly introduce ā_i(t), P̄_i(t) corresponding to a_i(t) and P_i(t), respectively. Then the estimate θ̄_i(t) of θ̄_i at time t can be updated by the following algorithm:

θ̄_i(t+1) = θ̄_i(t) + ā_i(t) P̄_i(t) φ̄_i(t) [x_i(t+1) − b_{i1} u_i(t) − φ̄_i^τ(t) θ̄_i(t)]
P̄_i(t+1) = P̄_i(t) − ā_i(t) P̄_i(t) φ̄_i(t) φ̄_i^τ(t) P̄_i(t)
ā_i(t) = [1 + φ̄_i^τ(t) P̄_i(t) φ̄_i(t)]^{−1}   (III.6)

which is called the decentralized Åström-Wittenmark self-tuning regulator.

Remark 2: In regulation problems, the reference signal is always taken as zero, which is also required by the traditional Åström-Wittenmark self-tuning regulator [21]. In this paper, however, we use this terminology only to refer to the special case where the high-frequency gains are known, and no limitations are imposed on the reference signal sequences except that they must be bounded and deterministic. In the subsequent parts, for simplicity, we discuss only the decentralized Åström-Wittenmark self-tuning regulator, but we shall write θ_i, θ̂_i(t), φ_i(t), φ_i^0(t), P_i(t), a_i(t) instead of θ̄_i, θ̄_i(t), φ̄_i(t), φ̄_i^0(t), P̄_i(t) and ā_i(t).

IV. PRELIMINARY ASSUMPTIONS AND LEMMAS

Suppose that the following are satisfied for each agent i:

Assumption 1: (noise condition) {w_i(t), F_t} is a martingale difference sequence, with {F_t} a sequence of nondecreasing σ-algebras, such that

sup_{t≥0} E[|w_i(t+1)|^β | F_t] < ∞, a.s.

for some β > 2, and

lim_{t→∞} (1/t) Σ_{k=1}^{t} |w_i(k)|² = R_i > 0, a.s.

Assumption 2: (minimum phase condition) B_i(z) ≠ 0, ∀z ∈ C: |z| ≤ 1.

Assumption 3: (reference signal) {x_i^*(t)} is a bounded deterministic signal.

Assumption 4: (strict-positive-real condition) |C_i(z) − 1| < 1, ∀z ∈ C: |z| = 1.

Remark 3: Assumptions 1-4 are standard in stochastic control, and they are met in most cases of interest. Assumption 1 allows unbounded random driving noise; Assumption 2 roughly means causality and stability when the inverse system is unique; Assumption 4 is a standard technical condition on coloured noise.

To prove the main result of this paper, we need the following lemmas.

Lemma 1: Under Assumption 1, we have |w_i(t)| = O(d_i(t)), where {d_i(t)} is an increasing sequence that can be taken as t^δ (δ any positive number).

Proof: By the Markov inequality,

Σ_{t=1}^{∞} P(|w_i(t+1)|² ≥ t^{2δ} | F_t) ≤ Σ_{t=1}^{∞} E[|w_i(t+1)|^β | F_t] / t^{βδ} < ∞

holds almost surely. Applying the Borel-Cantelli-Levy lemma immediately yields |w_i(t+1)| = O(t^δ), a.s. □

Lemma 2: If ξ(t+1) = B(z) u(t), ∀t > 0, where the polynomial (q ≥ 1)

B(z) = b_1 + b_2 z + ··· + b_q z^{q−1}

satisfies

B(z) ≠ 0, ∀z: |z| ≤ 1,   (IV.1)

then there exists a constant λ ∈ (0,1) such that

|u(t)|² = O( Σ_{k=0}^{t+1} λ^{t+1−k} |ξ(k)|² ).   (IV.2)

Proof: Let U(t) = [u(t), u(t−1), ..., u(t−q+2)]^τ; then by ξ(t+1) = B(z) u(t), we can rewrite u(t) in the form U(t) = A U(t−1) + Ψ(t+1), where the expressions of A and Ψ(t+1) are omitted to save space. The conclusion follows because ρ(A) < 1 is implied by (IV.1). □

Lemma 3: (Martingale estimation theorem) Let {w_{n+1}, F_n} be a martingale difference sequence and {f_n, F_n} an adapted sequence. Denote s_n = (Σ_{i=1}^{n} f_i²)^{1/2}. If sup_n E[|w_{n+1}|^α | F_n] < ∞ a.s. for a constant α > 2, then

Σ_{i=1}^{n} f_i w_{i+1} = O(s_n (log(s_n² + e))^δ), a.s. for any δ > 1/2.

Proof: See [22, Theorem 1.3.10]. □

Lemma 4: Under Assumption 1, for i = 1, 2, ..., N, we have

Σ_{k=1}^{t} |x_i(k)|² → ∞,  liminf_{t→∞} (1/t) Σ_{k=1}^{t} |x_i(k)|² ≥ R_i > 0, a.s.   (IV.3)

Proof: By (III.5) and (II.1), we have

x_i(t+1) = g_i(t) + w_i(t+1),   (IV.4)

where g_i(t) ∈ F_t. The lemma is then obtained by estimating the lower bound of Σ_{k=1}^{t} [x_i(k+1)]² with the help of Assumption 1 and Lemma 3. □

Lemma 5: Under Assumption 1, for i = 1, 2, ..., N, the ELS algorithm has the following properties almost surely:

(a) θ̃_i^τ(t+1) P_i^{−1}(t+1) θ̃_i(t+1) = O(log r_i(t));
(b) Σ_{k=1}^{t+1} |ŵ_i(k) − w_i(k)|² = O(log r_i(t));
(c) Σ_{k=1}^{t} α_i(k) = O(log r_i(t));

where θ̃_i(t) ≜ θ_i − θ̂_i(t) is the estimation error and

δ_i(t) ≜ tr(P_i(t) − P_i(t+1)),
a_i(k) ≜ [1 + φ_i^τ(k) P_i(k) φ_i(k)]^{−1},
α_i(k) ≜ a_i(k) |θ̃_i^τ(k) φ_i(k)|²,
r_i(t) ≜ 1 + Σ_{k=1}^{t} φ_i^τ(k) φ_i(k).   (IV.5)

Proof: See [23, Lemma 2.5]. □

V. STABILITY AND OPTIMALITY

Our main result is stated in the following theorem.

Theorem 1: Under Assumptions 1-4, the closed-loop system is stable and optimal; that is, for i = 1, 2, ..., N, we have

limsup_{t→∞} (1/t) Σ_{k=0}^{t} [|x_i(k+1)|² + |u_i(k)|²] < ∞, a.s.

and

lim_{t→∞} (1/t) Σ_{k=0}^{t} |x_i(k+1) − x_i^*(k+1)|² = R_i, a.s.

Remark 4: Theorem 1 states that although each agent only aims to track a local reference signal by a local adaptive controller based on the ELS algorithm, the whole system achieves global stability. The optimality can also be understood intuitively: in the presence of noise, even when all the parameters are known, the limit of J_i(t) cannot be smaller than R_i.

Proof: To prove Theorem 1, we apply the main idea of [11], [22]: estimate the bounds of the signals by analyzing certain linear inequalities. However, there are extra difficulties in analyzing the closed-loop system of the decentralized Åström-Wittenmark self-tuning regulator. Each agent uses only a local estimation algorithm and control law, but the agents are coupled; therefore, for a fixed agent i, we cannot estimate the bounds of the state x_i(t) and the control u_i(t) without knowing the corresponding bounds for its neighborhood agents. This is the main difficulty of the problem. To resolve it, we first analyze every agent, then consider their relationship globally; finally, the state bounds for each agent are obtained through both the local and the global analyses. In the following, δ_i(t), a_i(k), α_i(k) and r_i(t) are defined as in (IV.5).

Step 1: In this step, we analyze the dynamics of each agent. Consider agent i for i = 1, 2, ..., N. Substituting the control law into (II.1), we have

x_i(t+1) = b_{i1} u_i(t) + θ_i^τ φ_i(t) + θ_i^τ (φ_i^0(t) − φ_i(t)) + w_i(t+1)
         = x_i^*(t+1) − θ̂_i^τ(t) φ_i(t) + θ_i^τ φ_i(t) + θ_i^τ (φ_i^0(t) − φ_i(t)) + w_i(t+1)
         = x_i^*(t+1) + θ̃_i^τ(t) φ_i(t) + θ_i^τ (φ_i^0(t) − φ_i(t)) + w_i(t+1).

By Lemma 1, we have |w_i(t)|² = O(d_i(t)). Noticing also

||φ_i^0(t) − φ_i(t)||² = O( Σ_{k=1}^{t} |ŵ_i(k) − w_i(k)|² ) = O(log r_i(t)),   (V.1)

|θ̃_i^τ(t) φ_i(t)|² = α_i(t)[1 + φ_i^τ(t) P_i(t) φ_i(t)]
  = α_i(t)[1 + φ_i^τ(t) P_i(t+1) φ_i(t)] + α_i(t) φ_i^τ(t)[P_i(t) − P_i(t+1)] φ_i(t)
  ≤ α_i(t)[2 + δ_i(t) ||φ_i(t)||²],

and the boundedness of x_i^*(t+1), we can obtain

|x_i(t+1)|² ≤ 2 α_i(t) δ_i(t) ||φ_i(t)||² + O(d_i(t)) + O(log r_i(t)).   (V.2)

Now let us estimate ||φ_i(t)||². By Lemma 2, there exists λ_i ∈ (0,1) such that

|u_i(t)|² = O( Σ_{k=0}^{t+1} λ_i^{t+1−k} (|x_i(k)|² + ||X̄_i(k)||² + |w_i(k+1)|²) ).

This holds for all i = 1, 2, ..., N, but we cannot estimate |u_i(t)|² directly because it involves {x_j(k), j ∈ N_i} through X̄_i(k). Let

ρ = max(λ_1, ..., λ_N) ∈ (0,1),
X(k) = [x_1(k), ..., x_N(k)]^τ,
d̄(k) = max(d_1(k), ..., d_N(k)).

Obviously we have |x_i(k)|² = O(||X(k)||²) and ||X̄_i(k)||² = O(||X(k)||²). Now define

L_t ≜ Σ_{k=0}^{t} ρ^{t−k} ||X(k)||².

Then, for i = 1, 2, ..., N,

|u_i(t)|² = O(L_{t+1}) + O( Σ_{k=0}^{t+1} ρ^{t+1−k} d̄(k) ) = O(L_{t+1}) + O(d̄(t+1)).

By

φ_i(t) = [x_i(t), ..., x_i(t−p_i+1), u_i(t−1), ..., u_i(t−q_i+1), ŵ_i(t), ..., ŵ_i(t−l_i+1), X̄_i^τ(t)]^τ

and ŵ_i(k) = (ŵ_i(k) − w_i(k)) + w_i(k), we can obtain

||φ_i(t)||² = O(||X(t)||²) + O(L_t) + O(d̄(t)) + O(log r_i(t) + d_i(t)) = O(L_t + log r̄(t) + d̄(t)),

where r̄(t) = max(r_1(t), r_2(t), ..., r_N(t)). Hence, by (V.2), for agent i there exists C_i > 0 such that

|x_i(t+1)|² ≤ C_i α_i(t) δ_i(t) L_t + O(α_i(t) δ_i(t)[log r̄(t) + d̄(t)]) + O(d_i(t) + log r_i(t)).

Then, noticing α_i(t) δ_i(t) = O(log r_i(t)), we obtain

|x_i(t+1)|² ≤ C_i α_i(t) δ_i(t) L_t + O(log r_i(t)[log r̄(t) + d̄(t)]).   (V.3)

Step 2: Because (V.3) holds for i = 1, 2, ..., N, we have

||X(t+1)||² = Σ_{i=1}^{N} |x_i(t+1)|² ≤ [Σ_{i=1}^{N} C_i α_i(t) δ_i(t)] L_t + O(N d̄(t) log r̄(t)) + O(N log² r̄(t)).

Thus, by the definition of L_t,

L_{t+1} = ρ L_t + ||X(t+1)||² ≤ [ρ + C Σ_{i=1}^{N} α_i(t) δ_i(t)] L_t + O(N d̄(t) log r̄(t)) + O(N log² r̄(t)),   (V.4)

where C = max(C_1, C_2, ..., C_N). Let η(t) = Σ_{i=1}^{N} α_i(t) δ_i(t); then

L_{t+1} = O(N d̄(t) log r̄(t) + N log² r̄(t)) + O( N Σ_{k=0}^{t} ( Π_{l=k+1}^{t−1} [ρ + C η(l)] ) [d̄(k) log r̄(k) + log² r̄(k)] ).

Since

Σ_{k=0}^{∞} δ_i(k) = Σ_{k=0}^{∞} [tr P_i(k) − tr P_i(k+1)] ≤ tr P_i(0) < ∞,

we have δ_i(k) → 0 as k → ∞. By Lemma 5,

Σ_{k=0}^{t} α_i(k) = O(log r_i(t)) = O(log r̄(t)).

Then, for i = 1, 2, ..., N and arbitrary ε > 0, there exists k_0 > 0 such that

Σ_{k=t_0}^{t} α_i(k) δ_i(k) ≤ (1/N) ε log r̄(t)

for all t ≥ t_0 ≥ k_0, and therefore

Σ_{k=t_0}^{t} η(k) ≤ ε log r̄(t).

Then, by the inequality 1 + x ≤ e^x, ∀x ≥ 0, we have

Π_{k=t_0}^{t} [1 + ρ^{−1} C η(k)] ≤ exp{ ρ^{−1} C Σ_{k=t_0}^{t} η(k) } ≤ exp{ ε log r̄(t) } = r̄^ε(t),

where the constant ρ^{−1}C is absorbed by the arbitrariness of ε. Putting this into (V.4), we can obtain

L_{t+1} = O( log r_i(t)[log r̄(t) + d̄(t)] r̄^ε(t) );

then, by the arbitrariness of ε, we have

L_{t+1} = O(d̄(t) r̄^ε(t)), ∀ε > 0.

Then for i = 1, 2, ..., N, we obtain

||X(t+1)||² ≤ L_{t+1} = O(d̄(t) r̄^ε(t)),
|u_i(t)|² = O(L_{t+1} + d̄(t+1)) = O(d̄(t) r̄^ε(t)),
||φ_i(t)||² = O(L_t + log r̄(t) + d̄(t)) = O(d̄(t) r̄^ε(t)).   (V.5)

Step 3: By Lemma 4, we have

liminf_{t→∞} r_i(t)/t ≥ R_i > 0, a.s.

Thus t = O(r_i(t)) = O(r̄(t)); together with d̄(t) = O(t^δ), ∀δ ∈ (2/β, 1), we conclude that d̄(t) = O(r̄^δ(t)). Putting this into (V.5), and by the arbitrariness of ε, we obtain

||φ_i(t)||² = O(r̄^δ(t)), ∀δ ∈ (2/β, 1).

Therefore

Σ_{k=0}^{t} |θ̃_i^τ(k) φ_i(k)|² = Σ_{k=0}^{t} α_i(k)[1 + φ_i^τ(k) P_i(k) φ_i(k)]
  = O(log r_i(t)) + O( Σ_{k=0}^{t} α_i(k) ||φ_i(k)||² )
  = O(log r̄(t)) + O( r̄^δ(t) Σ_{k=0}^{t} α_i(k) )
  = O(r̄^δ(t) log r̄(t)), ∀δ ∈ (2/β, 1),

and then, by the arbitrariness of δ,

Σ_{k=0}^{t} |θ̃_i^τ(k) φ_i(k)|² = O(r̄^δ(t)), ∀δ ∈ (2/β, 1).   (V.6)

Since

x_i(t+1) = θ̃_i^τ(t) φ_i(t) + x_i^*(t+1) + w_i(t+1) + θ_i^τ(φ_i^0(t) − φ_i(t)),

we have

Σ_{k=0}^{t} |x_i(k+1)|² = O(r̄^δ(t)) + O(t) + O(log r̄(t)) = O(r̄^δ(t)) + O(t),
Σ_{k=0}^{t} |u_i(k−1)|² = O(r̄^δ(t)) + O(t),
Σ_{k=0}^{t} |ŵ_i(k)|² = O(log r̄(t)) + O(t).

From the above, we know that for i = 1, 2, ..., N,

r_i(t) = 1 + Σ_{k=0}^{t} ||φ_i(k)||² = O(r̄^δ(t)) + O(t).

Hence

r̄(t) = max{r_i(t), 1 ≤ i ≤ N} = O(r̄^δ(t)) + O(t), ∀δ ∈ (2/β, 1).

Furthermore, we can obtain

r̄(t) = O(t),

which means that the closed-loop system is stable.

Step 4: Now we give the proof of the optimality. We have

Σ_{k=0}^{t} |x_i(k+1) − x_i^*(k+1)|² = Σ_{k=0}^{t} [w_i(k+1)]² + Σ_{k=0}^{t} [ψ_i(k)]² + 2 Σ_{k=0}^{t} ψ_i(k) w_i(k+1),   (V.7)

where

ψ_i(k) ≜ θ̃_i^τ(k) φ_i(k) + θ_i^τ(φ_i^0(k) − φ_i(k)).

By (V.6), (V.1) and the martingale estimation theorem (Lemma 3), the orders of the last two terms in (V.7) are both O(r̄^δ(t)), ∀δ ∈ (2/β, 1). Then, by Assumption 1, we can obtain

lim_{t→∞} (1/t) Σ_{k=0}^{t} |x_i(k+1) − x_i^*(k+1)|² = R_i, a.s.

Furthermore,

Σ_{k=0}^{t} |x_i(k+1) − x_i^*(k+1) − w_i(k+1)|² = Σ_{k=0}^{t} |ψ_i(k)|² = O(r̄^δ(t)) = o(t), a.s. □
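Steps 1-2 of the proof lean on the fact that the covariance update in (III.3) makes tr P_i(t) nonincreasing, so that δ_i(t) = tr(P_i(t) − P_i(t+1)) is nonnegative and Σ_k δ_i(k) ≤ tr P_i(0). A quick numerical check of this fact, using synthetic random regressors purely for illustration:

```python
import numpy as np

# The trace decrement of the LS covariance update equals
# a(t) * ||P(t) phi(t)||^2 >= 0, and the decrements telescope,
# so their sum is bounded by tr P(0).
rng = np.random.default_rng(2)
dim = 4
P = 0.3 * np.eye(dim)                 # P(0) = alpha_0 I
trace0 = np.trace(P)
deltas = []
for _ in range(500):
    phi = rng.normal(size=dim)        # synthetic regressor
    ai = 1.0 / (1.0 + phi @ P @ phi)
    P_next = P - ai * np.outer(P @ phi, phi @ P)
    deltas.append(np.trace(P) - np.trace(P_next))
    P = P_next

print(min(deltas) >= -1e-12, sum(deltas) <= trace0 + 1e-9)  # both True
```

This summability is exactly what forces δ_i(k) → 0 in Step 2 and lets the product Π[ρ + Cη(l)] be bounded by r̄^ε(t).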

VI. SIMULATION STUDY

In this simulation, there are 5 agents in the system. For each agent, the driving noise sequence is drawn from an independent and identically distributed normal distribution N(0,1), and the parameters are randomly generated such that Assumptions 2 and 4 are satisfied. The reference signal sequence for each agent is taken as x_i^*(t) = 10 sin(t/10).

To save space, only the results for agent 1 are illustrated in Fig. 1, where the state x_1(t) and the reference signal x_1^*(t) are plotted in the top-left subfigure; the control u_1(t) in the top-right subfigure; the tracking error e_1(t) in the bottom-left subfigure; and the averaged tracking error J_1(t) = (1/t) Σ_{k=1}^{t} [e_1(k)]² in the bottom-right subfigure.

[Fig. 1. Simulation results for agent 1: x_1 and x_1^* (top left), u_1 (top right), e_1 (bottom left), J_1 (bottom right).]

From the simulation results, we can see that by using the local adaptive controllers, all agents track their local reference signals successfully and the average tracking errors are bounded. These simulations verify our theoretical results.
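A closed-loop sketch in the spirit of this experiment follows. Since the exact model orders and random parameter draws are not listed in the paper, the setup below is an assumption chosen for brevity: 5 first-order agents (p_i = 1, q_i = 1, l_i = 0), known high-frequency gains b_i1 = 1 (the decentralized Åström-Wittenmark case), and weak random couplings.

```python
import numpy as np

# Five coupled first-order agents; each runs the known-gain self-tuner
# and estimates only its own [-a_i1, g_ij] by local least squares.
# All numerical values are illustrative, not taken from the paper.
rng = np.random.default_rng(3)
N, T = 5, 3000
a = rng.uniform(-0.5, 0.5, N)               # unknown internal parameters a_i1
G = 0.05 * rng.uniform(-1.0, 1.0, (N, N))   # unknown coupling coefficients g_ij
np.fill_diagonal(G, 0.0)

x = np.zeros((N, T + 1))
u = np.zeros((N, T))
w = rng.normal(size=(N, T + 1))             # i.i.d. N(0, 1) driving noise
x_ref = 10.0 * np.sin(np.arange(T + 1) / 10.0)

theta = np.zeros((N, N))                    # row i: estimate of [-a_i1, g_i,(j != i)]
P = np.stack([0.3 * np.eye(N) for _ in range(N)])

def phi_vec(i, t):
    """Local regression vector: own state followed by neighbor states."""
    return np.concatenate(([x[i, t]], x[np.arange(N) != i, t]))

for t in range(T):
    for i in range(N):                      # known b_i1 = 1: u_i solves (III.4)
        u[i, t] = x_ref[t + 1] - theta[i] @ phi_vec(i, t)
    for i in range(N):                      # coupled plant (II.1)
        x[i, t + 1] = -a[i] * x[i, t] + u[i, t] + G[i] @ x[:, t] + w[i, t + 1]
    for i in range(N):                      # local LS update (III.6)
        ph = phi_vec(i, t)
        innov = x[i, t + 1] - u[i, t] - theta[i] @ ph   # b_i1 u_i(t) subtracted out
        gain = 1.0 / (1.0 + ph @ P[i] @ ph)
        theta[i] = theta[i] + gain * (P[i] @ ph) * innov
        P[i] = P[i] - gain * np.outer(P[i] @ ph, ph @ P[i])

# Averaged squared tracking error per agent after a burn-in period;
# Theorem 1 predicts convergence toward the noise variance R_i = 1.
J = np.mean((x[:, 200:] - x_ref[200:]) ** 2, axis=1)
print(bool(np.all(J < 2.5)))
```

Despite each agent estimating only its own local parameters, all five tracking errors settle near the noise floor, mirroring the qualitative behavior reported above.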

VII. CONCLUSION

In this paper, a framework of adaptive control problems for multi-agent complex dynamical systems with various uncertainties was presented. For problems in this framework, theoretical study of the closed-loop system is usually difficult due to the complexities resulting from the couplings between agents and the nonlinearity of the local adaptive controllers. To explore this new direction, an uncertain coupled ARMAX system was studied in this paper. Each agent utilizes its own history information and local information on its neighborhood agents to design its local controller, the so-called Åström-Wittenmark self-tuning regulator, to track a local signal sequence. The main contribution of this paper is to show that the local controllers guarantee closed-loop stability and optimality under some local conditions; these conditions are necessary even for a single subsystem. We have given a rigorous proof of our results using the techniques initiated by Guo and Chen. For the general case in which the high-frequency gains b_i1 (i = 1, 2, ..., N) are unknown, more difficulties will be encountered; however, the ideas of the stability analysis adopted in this paper can still be applied.

REFERENCES

[1] R. Albert, H. Jeong, and A.-L. Barabási, "Error and attack tolerance of complex networks," Nature, vol. 406, pp. 378–382, 2000.
[2] S. Dorogovtsev and J. Mendes, "Evolution of networks," Adv. Phys., vol. 51, no. 4, pp. 1079–1187, 2002.
[3] M. E. J. Newman, "The structure and function of complex networks," SIAM Rev., vol. 45, pp. 167–256, 2003.
[4] E. Ravasz and A.-L. Barabási, "Hierarchical organization in complex networks," Phys. Rev. E, vol. 67, p. 026112, 2003.
[5] S. H. Strogatz, "Exploring complex networks," Nature, vol. 410, pp. 268–276, 2001.
[6] J. H. Holland, Hidden Order: How Adaptation Builds Complexity. New York: Addison-Wesley, 1996.
[7] Y. D. Landau, Adaptive Control: The Model Reference Approach. New York: Dekker, 1979.
[8] G. Goodwin and K. Sin, Adaptive Filtering, Prediction and Control. Englewood Cliffs, NJ: Prentice-Hall, 1984.
[9] K. Åström and B. Wittenmark, Adaptive Control. Addison-Wesley, 1989.
[10] P. A. Ioannou and J. Sun, Robust Adaptive Control. Englewood Cliffs, NJ: Prentice Hall, 1996.
[11] H. F. Chen and L. Guo, Identification and Stochastic Adaptive Control. Boston, MA: Birkhäuser, 1991.
[12] C. Y. Wen, "Decentralized robust control of a class of unknown interconnected systems," Automatica, vol. 30, no. 3, pp. 543–544, 1994.
[13] G. H. Yang and S. Y. Zhang, "Decentralized robust control for interconnected systems with time-varying uncertainties," Automatica, vol. 31, no. 11, pp. 1603–1608, 1996.
[14] Y. Y. Wang, G. X. Guo, and D. J. Hill, "Robust decentralized nonlinear controller design for multimachine power systems," Automatica, vol. 33, no. 9, pp. 1725–1733, 1997.
[15] M. S. Mahmoud and S. Bingulac, "Robust design of stabilizing controllers for interconnected time-delay systems," Automatica, vol. 34, no. 6, pp. 795–800, 1998.
[16] Z. H. Guan, G. R. Chen, X. H. Yu, and Y. Qin, "Robust decentralized stabilization for a class of large-scale time-delay uncertain impulsive dynamical systems," Automatica, vol. 38, pp. 2075–2084, 2002.
[17] J. S. Reed and P. A. Ioannou, "Discrete decentralized adaptive control," Automatica, vol. 24, no. 3, pp. 419–421, 1988.
[18] G. Goodwin, P. Ramadge, and P. E. Caines, "Discrete time multivariable adaptive control," IEEE Trans. Automatic Control, vol. 25, pp. 449–456, 1980.
[19] L. Guo and H. F. Chen, "The Åström-Wittenmark self-tuning regulator revisited and ELS-based adaptive trackers," IEEE Transactions on Automatic Control, vol. 36, no. 7, pp. 802–812, 1991.
[20] L. Guo, "Convergence and logarithm laws of self-tuning regulators," Automatica, vol. 31, no. 3, pp. 435–450, 1995.
[21] K. Åström and B. Wittenmark, "On self-tuning regulators," Automatica, vol. 9, pp. 185–199, 1973.
[22] L. Guo, Time-Varying Stochastic Systems. Jilin Science and Technology Press, 1993 (in Chinese).
[23] L. Guo, "Further results on least squares based adaptive minimum variance control," SIAM J. Control and Optimization, vol. 32, no. 1, Jan. 1994.
