A reinforcement learning system is provided, including at least one agent and a server. According to a set condition, the at least one agent transmits, through a network, a plurality of state sets related to a state of an environment, receives a plurality of action sets for performing an action, and transmits, to the server, a plurality of feedback messages generated after interacting with the environment. The server configures a predetermined ratio of a memory space as at least one workstation according to the set condition, and selects an untrained model to be temporarily stored in the at least one workstation. The at least one workstation imports a current state set, a current action set, and a current feedback message, as parameters, into the untrained model for reinforcement learning, and repeatedly generates a next action set until a goal is achieved.
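
The agent, server and workstation loop described above can be illustrated with a minimal sketch. It is not the claimed implementation. Several assumptions are made that do not come from the text: tabular Q-learning stands in for the unspecified reinforcement learning method, an empty Q-table stands in for the untrained model, the environment is a hypothetical five-cell corridor, and direct method calls stand in for the network. The names `Server`, `Workstation`, `Agent`, `run_episode`, `ACTIONS` and `GOAL` are illustrative only.

```python
import random
from dataclasses import dataclass, field

ACTIONS = (-1, +1)  # hypothetical action set: step left or step right
GOAL = 4            # hypothetical goal: reach cell 4 of a 5-cell corridor


@dataclass
class Workstation:
    """A slice of server memory holding one model while it is trained."""
    memory_bytes: int
    model: dict = field(default_factory=dict)  # "untrained model": empty Q-table
    alpha: float = 0.5    # learning rate (assumed value)
    gamma: float = 0.9    # discount factor (assumed value)
    epsilon: float = 0.1  # exploration rate (assumed value)

    def choose(self, state, rng):
        """Epsilon-greedy choice; ties between equal values are broken at random."""
        if rng.random() < self.epsilon:
            return rng.choice(ACTIONS)
        best = max(self.model.get((state, a), 0.0) for a in ACTIONS)
        return rng.choice(
            [a for a in ACTIONS if self.model.get((state, a), 0.0) == best])

    def learn_and_choose(self, state, action, feedback, next_state, rng):
        """Import the current state, action and feedback; return the next action."""
        old = self.model.get((state, action), 0.0)
        future = 0.0 if feedback["done"] else max(
            self.model.get((next_state, a), 0.0) for a in ACTIONS)
        target = feedback["reward"] + self.gamma * future
        self.model[(state, action)] = old + self.alpha * (target - old)
        return self.choose(next_state, rng)


class Server:
    def __init__(self, total_memory_bytes, ratio):
        # Reserve a predetermined ratio of the memory space as one workstation.
        self.workstation = Workstation(
            memory_bytes=int(total_memory_bytes * ratio))


class Agent:
    """Interacts with the corridor environment and reports feedback."""
    def __init__(self):
        self.position = 0

    def act(self, action):
        # Move within the corridor bounds, then build the feedback message.
        self.position = min(max(self.position + action, 0), GOAL)
        done = self.position == GOAL
        return {"reward": 1.0 if done else 0.0, "done": done}


def run_episode(server, rng, max_steps=1000):
    """Exchange states, actions and feedback until the goal is achieved."""
    agent = Agent()
    state = agent.position
    action = server.workstation.choose(state, rng)
    for step in range(1, max_steps + 1):
        feedback = agent.act(action)          # agent interacts with environment
        next_state = agent.position           # new state sent to the server
        next_action = server.workstation.learn_and_choose(
            state, action, feedback, next_state, rng)
        if feedback["done"]:
            return step, True
        state, action = next_state, next_action
    return max_steps, False


if __name__ == "__main__":
    server = Server(total_memory_bytes=1_000_000, ratio=0.25)
    steps, reached = run_episode(server, random.Random(0))
    print(f"goal reached: {reached} after {steps} steps")
```

In this sketch the workstation size is derived from the ratio alone. A fuller version would presumably derive both the ratio and the model selection from the set condition, which the text does not specify.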