Researchers face many challenges in representing biological data, including (1) the inherent complexity of biological data, (2) the domain knowledge barrier, (3) constantly evolving knowledge, and (4) a lack of expert data-modeling skills. We have studied how to represent biological sequences and sequence-related genomics concepts using logical data structures. Drawing on our experience across multiple genomic data modeling efforts, we present results in three areas: genomic schema elements, genomic schema fragments, and genomic data modeling lessons. A genomic schema element is a data model that contains only one basic biological sequence notion. Genomic schema elements give biology data modelers a baseline way of thinking about genomic data modeling. A genomic schema fragment is a data model that covers only one genomic topic area. Genomic schema fragments provide biology data modelers with successful design solutions that they can adapt to their own problems. Genomic data modeling lessons address issues particularly important to genomic data modeling, such as modeling contextual information, intermediate and derived data, inconsistent data, and categorical rules. These lessons provide novice biology data modelers with principles enriched from content-neutral data modeling techniques. In all, we demonstrate how to manage evolving genomic knowledge and discovery results using data modeling techniques extended into the genomics domain.
- Data modeling
- Knowledge representation