Model transformations (MTs) are a key technology of model-driven engineering (MDE), where models are at the center of engineering processes. They are used for various tasks throughout the development lifecycle, such as verifying, debugging, and simulating systems, or generating artifacts for documentation and deployment purposes. In-place transformations in particular are characterized by the direct modification of a model's composition and features. Given a set of possible modification options and a means to assess a model's quality, determining the right transformations in the right order amounts to an optimization problem over models. Employing search techniques to find quality-improving changes unites search-based optimization and MDE, where concepts from the latter can be used to model optimization problems. To solve such problems, existing approaches rely primarily on meta-heuristic search. In this work, we apply reinforcement learning (RL) to in-place MTs for the first time. We identify the prerequisites for employing different RL approaches, such as the need for a model encoding in policy gradient methods. Furthermore, we provide a selection of algorithms for single- and multi-objective scenarios and evaluate them on several case studies. To this end, a framework for model-driven optimization was extended to support value-based and policy-based methods. Evaluation results suggest that RL algorithms can compete with existing approaches in terms of performance, and they motivate further research lines that embrace the benefits of machine learning, such as transfer learning and generalization.