The multi-armed bandit problem is an important sequential optimization problem that requires an exploration-exploitation tradeoff to achieve an optimal total reward. We consider a setting in which the rewards of the bandit arms depend on covariates, and we focus on an approach that combines nonparametric estimation of the reward functions with a randomized allocation scheme to balance exploration and exploitation. To overcome the curse of dimensionality in nonparametric learning, we propose using dimension reduction methods, such as sliced inverse regression (SIR) and likelihood acquired directions (LAD), to reduce the dimension of the covariates. To achieve variable selection and dimension reduction simultaneously, we employ coordinate-independent sparse estimation (CISE) in the dimension reduction step. Since it is not known in advance which individual dimension reduction method is best, we show that adaptively combining these methods performs well.
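To make the overall idea concrete, the following is a minimal sketch, not the paper's actual algorithm: a basic implementation of SIR for the dimension reduction step, paired with a simple epsilon-greedy randomized allocation that estimates each arm's reward function by k-nearest-neighbour averaging on the SIR-reduced covariate. All function names, the choice of kNN as the nonparametric estimator, and the simulation parameters are illustrative assumptions.

```python
import numpy as np

def sir_directions(X, y, n_slices=10, n_dirs=1):
    """Sliced inverse regression (SIR), in its basic form: whiten X,
    slice the observations by the order of y, average the whitened
    covariates within each slice, and take the leading eigenvectors
    of the covariance of those slice means (a sketch, not the paper's
    exact estimator)."""
    cov = np.cov(X, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T  # Sigma^{-1/2}
    Z = (X - X.mean(axis=0)) @ inv_sqrt                  # whitened covariates
    order = np.argsort(y)
    slice_means = np.array(
        [Z[idx].mean(axis=0) for idx in np.array_split(order, n_slices)])
    M = slice_means.T @ slice_means / n_slices           # between-slice covariance
    w, v = np.linalg.eigh(M)
    top = v[:, np.argsort(w)[::-1][:n_dirs]]
    # Map the directions back to the original covariate scale.
    return inv_sqrt @ top

def eps_greedy_bandit(reward_fns, B, n_rounds=400, eps=0.1, k=10, seed=1):
    """Epsilon-greedy randomized allocation with kNN reward estimates
    computed on the one-dimensional SIR-reduced covariate (assumed
    setup for illustration)."""
    rng = np.random.default_rng(seed)
    K = len(reward_fns)
    xs = [[] for _ in range(K)]  # reduced covariates observed per arm
    rs = [[] for _ in range(K)]  # rewards observed per arm
    total = 0.0
    for _ in range(n_rounds):
        x = rng.standard_normal(B.shape[0])
        xr = float(x @ B[:, 0])  # project onto the leading direction
        if rng.random() < eps or min(len(r) for r in rs) == 0:
            a = int(rng.integers(K))  # explore: random allocation
        else:
            est = [np.mean(np.asarray(rs[j])[
                       np.argsort(np.abs(np.asarray(xs[j]) - xr))[:k]])
                   for j in range(K)]
            a = int(np.argmax(est))   # exploit the current kNN estimates
        r = reward_fns[a](x) + 0.1 * rng.standard_normal()
        xs[a].append(xr)
        rs[a].append(r)
        total += r
    return total / n_rounds
```

On simulated single-index data, `sir_directions` recovers the true index direction from the slice means, and the projected covariate then serves as a one-dimensional input to the per-arm kNN estimates, sidestepping the curse of dimensionality that a fully nonparametric fit in the original covariate space would face.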