Date Approved
6-16-2026
Embargo Period
6-16-2026
Document Type
Thesis
Degree Name
M.S. Computer Science
Department
Computer Science
College
College of Science & Mathematics
Advisor
Shen-Shyang Ho, Ph.D.
Committee Member 1
Vasil Hnatyshin, Ph.D.
Committee Member 2
Charalampos (Babis) Papachristou, Ph.D.
Keywords
Imitation Learning; Inverse Reinforcement Learning; Multi-Agent Reinforcement Learning; Multi-Agent Systems; Policy Optimization; Reinforcement Learning
Abstract
Increasingly, automated agents operating in the physical world are expected to perform previously unseen tasks at levels comparable to human experts. Solutions to this challenge typically fall along a spectrum between model-based and model-free reinforcement learning (RL) approaches. Encouraging expert-like behavior in RL is often difficult because reward functions can be challenging to specify accurately. Inverse reinforcement learning addresses this problem by inferring reward functions directly from expert demonstrations rather than relying on manually engineered objectives. This thesis examines the performance of both on-policy and off-policy model-free RL algorithms, including the on-policy Proximal Policy Optimization (PPO) and the off-policy Deep Deterministic Policy Gradient (DDPG), within a cooperative environment. These approaches are compared against Multi-Agent Deep Deterministic Policy Gradient (MADDPG) and Multi-Agent Adversarial Inverse Reinforcement Learning (MA-AIRL). The work presents an empirical study based on a smart building and an electric vehicle charging station operating under both cooperative and competitive conditions. The results demonstrate that MA-AIRL can recover reward functions that correlate strongly with expert behavior in high-dimensional cooperative environments, and they identify the conditions under which reward recovery remains tractable and those under which it degrades. Furthermore, the findings suggest that when an agent applies a learned cooperative policy within an environment that does not support cooperative behavior, the resulting actions may be interpreted as a form of “coping mechanism”: constrained cooperation arising from environmental mismatch rather than a failure of optimization.
Recommended Citation
Jiosi, Jordan, "Empirical Analysis On Multi-Agent Reinforcement Learning Frameworks" (2026). Theses and Dissertations. 3534.
https://rdw.rowan.edu/etd/3534