This thesis addresses the problem of constructing a robust test suite through mutation testing and machine learning. The goal is to identify a test suite that detects most of the mutants that can occur in a program, while simultaneously learning which mutants are least likely to be detected and are therefore the strongest. Learning to select the most reliable tests saves time in future test iterations compared to a retest-all approach. The project is framed as a two-player game: an attacker, which selects mutants of a program, plays against a defender, which selects test cases to detect those mutants; each round is decided by whether the selected test kills the selected mutant. Each player is implemented as a contextual bandit that learns patterns of successful selection. In our experiments, we evaluate different parameter settings, including algorithms for balancing exploration and exploitation. We identify reliable settings on a set of four Java programs; however, both the choice of algorithm and the program size matter, so parameters must be tuned per program. The results show that test and mutant selection can be learned, although the effect is parameter-dependent and not equally strong across all program sizes in our experiments. Our findings demonstrate the ability to learn test selection in a game-play setting. We discuss the results and position them in the context of future work on learning to select software test cases efficiently.
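The attacker/defender dynamic described above can be sketched as two bandit learners playing over a kill matrix. The following is a minimal illustration, not the thesis implementation: the epsilon-greedy rule, the single shared context, and the toy kill matrix (tests `t1`–`t3`, mutants `m1`–`m3`) are all assumptions made for the example. The defender is rewarded when its test kills the attacker's mutant; the attacker is rewarded when its mutant survives.

```python
import random

class EpsilonGreedyBandit:
    """Per-context epsilon-greedy action selection (a minimal sketch)."""

    def __init__(self, actions, epsilon=0.1):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.counts = {}   # (context, action) -> number of pulls
        self.values = {}   # (context, action) -> running mean reward

    def select(self, context):
        if random.random() < self.epsilon:
            return random.choice(self.actions)  # explore
        # exploit: action with the highest estimated reward in this context
        return max(self.actions,
                   key=lambda a: self.values.get((context, a), 0.0))

    def update(self, context, action, reward):
        key = (context, action)
        n = self.counts.get(key, 0) + 1
        self.counts[key] = n
        v = self.values.get(key, 0.0)
        self.values[key] = v + (reward - v) / n  # incremental mean

# Hypothetical kill matrix: kill_matrix[mutant][test] is True if the
# test kills the mutant. t3 happens to kill every mutant here.
kill_matrix = {
    "m1": {"t1": True,  "t2": False, "t3": True},
    "m2": {"t1": False, "t2": True,  "t3": True},
    "m3": {"t1": False, "t2": False, "t3": True},
}

random.seed(0)
attacker = EpsilonGreedyBandit(actions=list(kill_matrix))   # picks mutants
defender = EpsilonGreedyBandit(actions=["t1", "t2", "t3"])  # picks tests

for _ in range(2000):
    mutant = attacker.select("prog")
    test = defender.select("prog")
    killed = kill_matrix[mutant][test]
    # Zero-sum rewards: the defender wants kills, the attacker wants survivors.
    defender.update("prog", test, 1.0 if killed else 0.0)
    attacker.update("prog", mutant, 0.0 if killed else 1.0)

best_test = max(defender.actions,
                key=lambda a: defender.values.get(("prog", a), 0.0))
print("defender's preferred test:", best_test)
```

In this toy setting the defender's estimated value for `t3` converges to 1.0, since `t3` kills every mutant regardless of the attacker's play; the exploration rate `epsilon` is the kind of parameter the abstract refers to when noting that results depend on how exploration and exploitation are balanced.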