This paper studies the sensor tasking and management problem for optical Space Object (SO) tracking. The tasking problem is formulated as a Markov Decision Process (MDP) and solved using Reinforcement Learning (RL), specifically an actor-critic policy gradient approach. This approach is used to find the optimal policy for tasking optical sensors to estimate SO orbits. The reward function is based on reducing the uncertainty of the overall catalog to a given upper bound: the reward is negative as long as any SO exists whose uncertainty is above the desired catalog uncertainty. The approach is tested in simulation, where the actor-critic policy gradient method is found to perform well.
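The reward structure described above can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `catalog_reward`, the use of a single scalar uncertainty per SO (for example, the trace of a position covariance), the penalty value of -1, and the zero reward once the bound is met are all assumptions made for illustration.

```python
from typing import Sequence


def catalog_reward(uncertainties: Sequence[float], upper_bound: float) -> float:
    """Hypothetical reward: negative while any SO exceeds the uncertainty bound.

    uncertainties: one scalar uncertainty measure per catalogued SO
                   (assumed here; the paper's actual metric may differ).
    upper_bound:   desired upper bound on catalog uncertainty.
    """
    # Penalize the agent as long as at least one object is above the bound.
    if any(u > upper_bound for u in uncertainties):
        return -1.0  # assumed penalty magnitude
    # Assumed: no penalty once every object is at or below the bound.
    return 0.0


# One object (5.0) is above the bound of 2.0, so the reward is negative.
print(catalog_reward([1.0, 5.0], upper_bound=2.0))
# All objects are within the bound, so the penalty vanishes.
print(catalog_reward([1.0, 1.5], upper_bound=2.0))
```

Under these assumptions, a constant per-step penalty encourages the policy to bring the whole catalog under the bound in as few tasking steps as possible, since the accumulated return is lower the longer any SO remains above it.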