Paper “Distributed Value Function Approximation for Collaborative Multi-Agent Reinforcement Learning”, by M.S. Stanković, M. Beko and S.S. Stanković, has been published in IEEE Transactions on Control of Network Systems (IEEE TCNS)!
The paper proposes several new distributed gradient-based temporal difference (TD) algorithms for decentralized multi-agent off-policy learning of the value function in Markov decision processes. The algorithms are analyzed with full theoretical rigor and verified through extensive simulations.