This paper proposes and analyzes an asynchronous, communication-efficient distributed optimization framework for a general class of machine learning and signal processing problems. At each iteration, worker machines compute gradients of a known empirical loss function using their own local data, and a master machine solves a related minimization problem to update the current estimate. We establish that the proposed algorithm converges at a sublinear rate in the number of communication rounds, matching the best theoretical rate achievable for nonconvex nonsmooth problems. Moreover, under a strong convexity assumption on the smooth part of the loss function, linear convergence is established. Extensive numerical experiments show that the proposed approach improves, sometimes significantly, over other state-of-the-art algorithms in terms of total communication efficiency.
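To make the division of labor between workers and master concrete, the following is a minimal sketch and not the proposed algorithm itself. It assumes a least-squares loss as the smooth part and an L1 penalty as the nonsmooth part, and it runs synchronously, omitting the asynchrony and communication-saving mechanisms that are the focus of this paper. All names (`local_gradient`, `master_update`, `run`, `step`, `lam`) are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: a synchronous proximal-gradient loop in which
# workers compute local gradients and the master solves a small minimization
# problem. The loss, the L1 penalty, and every name here are assumptions.

def local_gradient(x, shard):
    """Worker step: gradient of 0.5 * sum((a . x - b)^2) over the local shard."""
    grad = [0.0] * len(x)
    for a, b in shard:
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        for j, aj in enumerate(a):
            grad[j] += residual * aj
    return grad


def soft_threshold(v, t):
    """Closed-form minimizer of 0.5 * (z - v)^2 + t * |z| over z."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def master_update(x, grads, step, lam):
    """Master step: minimize <g, z> + ||z - x||^2 / (2 * step) + lam * ||z||_1,
    where g is the sum of the workers' gradients. For an L1 penalty this
    subproblem has the coordinate-wise soft-thresholding solution below."""
    total = [sum(g[j] for g in grads) for j in range(len(x))]
    return [soft_threshold(xj - step * gj, step * lam)
            for xj, gj in zip(x, total)]


def run(shards, dim, step, lam, rounds):
    """One communication round = every worker sends a gradient, master updates."""
    x = [0.0] * dim
    for _ in range(rounds):
        grads = [local_gradient(x, shard) for shard in shards]
        x = master_update(x, grads, step, lam)
    return x


# Two workers, each holding a single (feature vector, target) pair.
shards = [
    [([1.0, 0.0], 2.0)],
    [([0.0, 1.0], 0.05)],
]
x_hat = run(shards, dim=2, step=0.5, lam=0.1, rounds=60)
```

In this toy instance the smooth part is strongly convex, so the iterates contract geometrically toward the minimizer, which illustrates the linear-convergence regime mentioned above; the second coordinate is driven exactly to zero by the L1 term.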