Reinforcement learning-based adaptive transmission for the underwater full-duplex relay network with energy-harvesting
Abstract
Acoustic waves are the only known effective medium for long-haul underwater wireless communication, compared with radio-frequency and optical waves. Demands for oceanic environment monitoring, disaster surveillance, and business applications have propelled the growth of the underwater acoustic communication market. However, underwater acoustic communication is still in its infancy due to a challenging characteristic: narrow effective bandwidth. To address this challenge, underwater cooperative communication, which introduces relay nodes to forward messages from the source node to the destination node, can increase the effective bandwidth. Long-term operational communication networks are dynamic over time. For example, energy arrivals in energy-harvesting communication are stochastic, and the channel conditions in wireless communication are time-varying. In this thesis, we focus on system optimization for the long-term operational communication network. To this end, the optimization problem is formulated as maximizing or minimizing the accumulated utility function from the current time to a future time instant. Given that only the causal information of the system is available, this type of problem is known as a stochastic optimization problem, in which some parameters are random variables; thus, traditional optimization tools cannot be directly applied to solve it. Instead, the solution is provided by the reinforcement learning technique, which describes how an agent interacts with the environment over time to maximize the cumulative reward. In this thesis, the long-term operational underwater relay network is investigated, which consists of one source node, one relay node, and one destination node. The relay node operates in full-duplex mode and can transmit and receive signals at the same frequency and time.
Also, the relay node relies on energy harvested from the ambient environment, whereas the source and destination nodes have fixed power supplies. We evaluate the network performance with respect to the end-to-end spectral efficiency and average energy efficiency and aim to improve these performance metrics over the long term. Due to the stochastic characteristics of the harvested energy and channel state information, we develop adaptive transmission policies for the considered system to optimize system performance. Considering the practical condition in which only causal knowledge of the system is available, the problem is formulated as an online sequential decision-making problem, and the reinforcement learning technique is used to obtain the transmission policies. Two major benefits of the reinforcement learning framework are: 1) it obtains an optimal solution, and 2) it does not require knowledge of future information. Alternatively, one can apply a conventional optimization approach; however, such an approach maximizes only the current reward, not the future reward, and hence is not optimal. Simulation results show that the proposed transmission policies improve the system performance when compared with the benchmark policy.
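The online sequential decision-making formulation described above can be illustrated with a minimal sketch. The code below is a hypothetical toy model, not the thesis's actual system: a relay with a small quantized battery harvests energy stochastically each slot and chooses a transmit power level, earning a spectral-efficiency reward; a tabular Q-learning agent learns a transmission policy from causal information only. All numbers (battery size, harvest probability, power-to-SNR map, learning rates) are illustrative assumptions.

```python
import math
import random

# Toy energy-harvesting relay model (illustrative assumptions throughout).
BATTERY_MAX = 4          # battery quantized into levels 0..4
ACTIONS = [0, 1, 2]      # transmit power levels (0 = stay silent)
HARVEST_PROB = 0.6       # chance of harvesting one energy unit per slot

def step(battery, action, rng):
    """One time slot: spend energy if available, earn a spectral-efficiency
    reward, then harvest energy stochastically."""
    if action > battery:           # insufficient stored energy: forced silence
        action = 0
    snr = 2.0 * action             # toy linear power-to-SNR map (assumption)
    reward = math.log2(1.0 + snr)  # spectral efficiency in bits/s/Hz
    battery -= action
    if rng.random() < HARVEST_PROB:
        battery = min(BATTERY_MAX, battery + 1)
    return battery, reward

def q_learning(episodes=2000, horizon=50, alpha=0.1, gamma=0.9, eps=0.1):
    """Tabular Q-learning with an epsilon-greedy exploration policy."""
    rng = random.Random(0)
    q = [[0.0] * len(ACTIONS) for _ in range(BATTERY_MAX + 1)]
    for _ in range(episodes):
        battery = BATTERY_MAX
        for _ in range(horizon):
            if rng.random() < eps:                       # explore
                a = rng.randrange(len(ACTIONS))
            else:                                        # exploit
                a = max(ACTIONS, key=lambda x: q[battery][x])
            nxt, r = step(battery, a, rng)
            # Update toward reward + discounted best value of the next state.
            q[battery][a] += alpha * (r + gamma * max(q[nxt]) - q[battery][a])
            battery = nxt
    return q

q = q_learning()
# Greedy policy: best transmit power for each battery level.
policy = [max(ACTIONS, key=lambda a: q[b][a]) for b in range(BATTERY_MAX + 1)]
print(policy)
```

Because the update bootstraps on the discounted value of the next battery state, the learned policy trades current spectral efficiency against saving energy for future slots, which is precisely what a myopic (current-reward-only) optimizer cannot do.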
