We study the joint feature selection problem when learning multiple related classification or regression tasks. By imposing an automatic relevance determination prior on the hypothesis classes associated with each of the tasks and regularizing the variance of the hypothesis parameters, similar feature patterns across different tasks are encouraged and features that are relevant to all (or most) of the tasks are identified. Our analysis shows that the proposed probabilistic framework can be seen as a generalization of previous result from adaptive ridge regression to the multi-task learning setting. We provide a detailed description of the proposed algorithms for simultaneous model construction and justify the proposed algorithms in several aspects. Our experimental results show that this approach outperforms a regularized multi-task learning approach and the traditional methods where individual tasks are solved independently on synthetic data and the real-world data sets for lung cancer prognosis.