We apply modern machine learning methods to suppress self-sustained collective oscillations of the kind typically exhibited by ensembles of degenerative neurons in the brain. The proposed hybrid model has two major components: an environment of oscillators and a policy-based reinforcement learning block. We report model-agnostic synchrony control based on proximal policy optimization and two artificial neural networks in an Actor-Critic configuration. We propose a class of physically meaningful reward functions that enables suppression of the collective oscillatory mode. Synchrony suppression is demonstrated for two models of neuronal populations: ensembles of globally coupled limit-cycle Bonhoeffer-van der Pol oscillators and ensembles of bursting Hindmarsh-Rose neurons, using rectangular and charge-balanced stimuli.
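To make two of the ingredients named above concrete, the following is a minimal sketch, not the implementation used in this work. It assumes one possible charge-balanced stimulus (a rectangular phase followed by a longer, weaker phase of opposite sign) and one possible reward that penalizes both the deviation of the mean field from its recent average and the stimulation amplitude. The function names, the `ratio` and `beta` parameters, and the specific reward form are all illustrative assumptions.

```python
from statistics import fmean


def charge_balanced_pulse(amplitude, width, ratio, dt):
    """Sample a biphasic rectangular pulse whose net charge is zero.

    Assumed shape (hypothetical): a phase of height `amplitude` lasting
    `width`, followed by an opposite phase of height -amplitude / ratio
    lasting ratio * width, so the two areas cancel.
    """
    n_first = round(width / dt)
    n_second = round(ratio * width / dt)
    return [amplitude] * n_first + [-amplitude / ratio] * n_second


def reward(mean_field_window, action, beta=0.1):
    """Illustrative reward (an assumption, not the paper's definition).

    Penalizes the squared deviation of the latest mean-field sample from
    its average over the window, plus a cost proportional to |action|.
    """
    baseline = fmean(mean_field_window)
    deviation = mean_field_window[-1] - baseline
    return -(deviation ** 2) - beta * abs(action)


# Example: a pulse sampled at dt = 0.05 with a 4x longer balancing phase.
dt = 0.05
pulse = charge_balanced_pulse(amplitude=1.0, width=0.2, ratio=4, dt=dt)
net_charge = sum(pulse) * dt  # time integral of the stimulus
```

Under these assumptions, a quiescent mean field with no stimulation earns the highest reward, while residual oscillations and stronger stimulation are both penalized.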