Abstract
Housekeeping genes are expressed across a wide variety of tissues. Since repetitive sequences have been reported to influence the expression of individual genes, we employed a novel approach to determine whether housekeeping genes can be distinguished from tissue-specific genes by their repetitive sequence context. We show that Alu elements are more highly concentrated around housekeeping genes while various longer (> 400-bp) repetitive sequences ("repeats"), including Long Interspersed Nuclear Element-1 (LINE-1) elements, are excluded from these regions. We further show that isochore membership does not distinguish housekeeping genes from tissue-specific genes and that repetitive sequence environment distinguishes housekeeping genes from tissue-specific genes in every isochore. The distinct repetitive sequence environment, in combination with other previously published sequence properties of housekeeping genes, was used to develop a method of predicting housekeeping genes on the basis of DNA sequence alone. Using expression across tissue types as a measure of success, we demonstrate that repetitive sequence environment is by far the most important sequence feature identified to date for distinguishing housekeeping genes.
Original language | English (US) |
---|---|
Pages (from-to) | 153-165 |
Number of pages | 13 |
Journal | Gene |
Volume | 390 |
Issue number | 1-2 |
State | Published - Apr 1 2007 |
Bibliographical note
Funding Information: C.D.E. was supported by a UCLA-IGERT bioinformatics traineeship (NSF DGE-9987641). M.R. was supported by a Tumor Cell Biology Fellowship (USHHS Institutional National Research Service Award #T32 CA09056). Y.M. was supported in part by National Institutes of Health Grants GM6100701 and HD041451-02.
Keywords
- Alu
- Isochores
- LINE
- Random forest
- Repeat
- SINE
- Tissue-specific genes