Practitioners and researchers have been developing expertise in generating train schedules since railways came into existence. Here we motivate an approach that requires no human-expert knowledge and can adapt to new scheduling requirements out of the box. The requirements specify, for each of multiple trains, the initial and/or target station and the earliest departure and/or latest arrival time. Our multi-agent reinforcement learning model uses self-play to learn how to dynamically generate train trajectories, each consisting of a departure time from the initial station and a path, generated online, that leads to the target station within a railway network. We train a deep neural network that, at each timestep, predicts train actions and a measure of schedule feasibility. This neural network improves the quality of a Monte Carlo tree search, which in turn yields better train actions during the next self-play iteration. We show that our model can reason about shortest paths whenever they lead to a feasible schedule. Furthermore, in instances where the shortest-path solution causes train collisions, our model is able to avoid them and steer the trains to a feasible solution.
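To make the notions of a scheduling requirement, a train trajectory, and schedule feasibility concrete, the following is a minimal illustrative sketch, not the model described above. All names (`Requirement`, `Trajectory`, `is_feasible`) are hypothetical, and the sketch assumes that every hop between adjacent stations takes exactly one timestep, that a collision means two trains occupying the same station at the same timestep, and that a train occupies the network only between its departure and arrival. It does not check that consecutive stations on a path are actually connected in the railway network.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Requirement:
    """Scheduling requirement for one train (hypothetical field names)."""
    initial: str
    target: str
    earliest_departure: int = 0
    latest_arrival: Optional[int] = None  # None means no deadline


@dataclass(frozen=True)
class Trajectory:
    """A departure time plus the ordered stations the train visits."""
    departure: int
    path: Tuple[str, ...]  # path[0] is the initial station

    @property
    def arrival(self) -> int:
        # Assumption: each hop takes exactly one timestep.
        return self.departure + len(self.path) - 1

    def occupancy(self) -> Dict[int, str]:
        """Map each timestep to the station occupied at that timestep."""
        return {self.departure + i: node for i, node in enumerate(self.path)}


def meets_requirement(req: Requirement, traj: Trajectory) -> bool:
    """Check endpoints and the departure/arrival time window for one train."""
    if not traj.path:
        return False
    if traj.path[0] != req.initial or traj.path[-1] != req.target:
        return False
    if traj.departure < req.earliest_departure:
        return False
    return req.latest_arrival is None or traj.arrival <= req.latest_arrival


def is_feasible(reqs: Sequence[Requirement], trajs: Sequence[Trajectory]) -> bool:
    """A schedule is feasible here if every train meets its requirement
    and no two trains share a station at the same timestep."""
    if len(reqs) != len(trajs):
        return False
    if not all(meets_requirement(r, t) for r, t in zip(reqs, trajs)):
        return False
    seen = set()
    for traj in trajs:
        for slot in traj.occupancy().items():  # slot = (timestep, station)
            if slot in seen:
                return False
            seen.add(slot)
    return True


# Two trains whose shortest paths cross at junction "X".
reqs = [Requirement("A", "B"), Requirement("C", "D", latest_arrival=3)]
clash = [Trajectory(0, ("A", "X", "B")), Trajectory(0, ("C", "X", "D"))]
delayed = [Trajectory(0, ("A", "X", "B")), Trajectory(1, ("C", "X", "D"))]
print(is_feasible(reqs, clash))    # False: both trains are at "X" at timestep 1
print(is_feasible(reqs, delayed))  # True: the second train departs one step later
```

In this toy example both trains follow their shortest paths, and the collision is resolved purely by shifting a departure time. In other instances a detour may be needed instead, which is the kind of decision the learned model is meant to make.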