Infrastructure
The DiaComp web portal was developed using an n-tier architecture (illustrated above). The user interface is an ASP.NET application written in C# on the .NET Framework. The middle object layer is a .NET dynamic link library (DLL) that models a number of biological domains and provides object-oriented access to the data. The data layer is a relational database (SQL Server 2000) that stores all the data generated by the consortium. These systems were designed to minimize maintenance while providing a robust, scalable model for future growth and for interactions at the national level with other organism databases.
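The separation of tiers described above can be illustrated with a minimal sketch. It is written in Python for brevity rather than in the portal's C#, an in-memory SQLite database stands in for SQL Server, and every class, table, and function name here (AnimalModel, DataLayer, ObjectLayer, render_model_list) is hypothetical and not taken from the DiaComp code base.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class AnimalModel:
    """Hypothetical domain object exposed by the middle layer."""
    model_id: int
    name: str
    species: str


class DataLayer:
    """Data tier: owns the connection and all SQL (SQLite stands in for SQL Server)."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE animal_model (id INTEGER PRIMARY KEY, name TEXT, species TEXT)"
        )

    def insert_model(self, name, species):
        cur = self.conn.execute(
            "INSERT INTO animal_model (name, species) VALUES (?, ?)", (name, species)
        )
        self.conn.commit()
        return cur.lastrowid

    def fetch_models(self, species):
        return self.conn.execute(
            "SELECT id, name, species FROM animal_model WHERE species = ? ORDER BY id",
            (species,),
        ).fetchall()


class ObjectLayer:
    """Middle tier: turns rows into objects so callers never see SQL."""

    def __init__(self, data):
        self.data = data

    def add_model(self, name, species):
        return AnimalModel(self.data.insert_model(name, species), name, species)

    def models_for_species(self, species):
        return [AnimalModel(*row) for row in self.data.fetch_models(species)]


def render_model_list(objects, species):
    """Presentation tier: formats objects for display; talks only to the object layer."""
    return [
        f"{m.model_id}: {m.name} ({m.species})"
        for m in objects.models_for_species(species)
    ]
```

In this arrangement the presentation function never issues SQL, so the storage engine could in principle be replaced without touching the user interface, which is the maintenance benefit an n-tier design is generally meant to provide.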
Requirements Analysis
The requirements analysis for the DiaComp consortium was carried out over several months through conversations with the investigators, review of the documentation produced by the validation committees, and face-to-face meetings during the biannual steering committee meetings. Use-case diagrams were produced for the interactions identified in the analysis, and the data model requirements were carefully assessed. Because this portal is membership driven, there are two broad categories of data that we have to persist and use: the administrative data describing the consortium membership, and the animal model data generated by the DiaComp membership. In addition to the data requirements, we developed the requirements for the object model (API) needed to support the data model in a production environment. The object model was designed not only to support the local programming environment but also to be deployable as a SOAP or web service for machine-to-machine interactions. This is important so that outside institutions (e.g., NCBI, NCI) can download the animal model data from the DiaComp in an automated fashion.
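As a rough illustration of the two data categories and of machine-readable export, the following Python sketch serializes animal model records to plain XML. It is only a sketch under stated assumptions: the record types, field names, and XML element names are hypothetical, and plain XML stands in for a full SOAP envelope; it does not reproduce the actual DiaComp service.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Member:
    """Administrative data: one consortium member (hypothetical fields)."""
    member_id: int
    name: str
    institution: str


@dataclass
class AnimalModelRecord:
    """Animal model data submitted by a member (hypothetical fields)."""
    record_id: int
    title: str
    submitted_by: int  # refers to Member.member_id


def export_models_xml(records, members):
    """Serialize records to an XML string that another system could parse."""
    by_id = {m.member_id: m for m in members}
    root = ET.Element("animalModels")
    for rec in records:
        node = ET.SubElement(root, "animalModel", id=str(rec.record_id))
        ET.SubElement(node, "title").text = rec.title
        # Join in the administrative data so the export is self-describing.
        ET.SubElement(node, "institution").text = by_id[rec.submitted_by].institution
    return ET.tostring(root, encoding="unicode")
```

A consumer at another institution would parse the returned string with any XML library; keeping the export behind a single function is one way an object model can serve both local code and remote callers.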
Ontologies/Controlled Vocabulary
We use controlled vocabularies extensively throughout the website. All the drop-down boxes used for data entry or data query draw on either existing controlled vocabularies or ones developed by the consortium. We have populated our ontology “terms” table with Gene Ontology (GO) terms (Stanford) and anatomy terms from the Computational Biology and Informatics Laboratory (University of Pennsylvania). For mice, we have used the Standard Anatomical Nomenclature Database (University of Edinburgh/The Jackson Laboratory) to relate these terms to developmental (Theiler) stages. In addition, we have imported mouse gene information from the Jackson Laboratory for those genes currently associated with GO terms. This enables searches by gene name, symbol, or GO concept. We are currently completing programs for automatic, scheduled retrieval and import of externally maintained data (such as GO). Note that we do not reproduce these databases; rather, we import enough information to permit structured data entry and searches. Should users require further details about a term or gene, we will provide hyperlinks to the appropriate web resource. The ontologies required for microarray data exchange were imported from Chris Stoeckert’s work in the MGED Ontology Workgroup. The DiaComp CBU has been careful to align our vocabulary with that of existing groups whenever possible. This alignment will be important for automated data exchange between the DiaComp and other biological databases.
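The search by gene name, symbol, or ontology concept described above might be sketched as follows in Python, with in-memory SQLite standing in for the production database. The table layout (terms, genes, gene_terms), the function names, and all identifiers and rows are placeholders invented for illustration; they are not the DiaComp schema and not real GO or gene records.

```python
import sqlite3


def build_demo_db():
    """Create a tiny in-memory database with placeholder terms and genes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE terms (term_id TEXT PRIMARY KEY, name TEXT, source TEXT);
        CREATE TABLE genes (gene_id INTEGER PRIMARY KEY, symbol TEXT, name TEXT);
        CREATE TABLE gene_terms (gene_id INTEGER, term_id TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO terms VALUES (?, ?, ?)",
        [
            ("TERM:1", "example process alpha", "placeholder"),
            ("TERM:2", "example process beta", "placeholder"),
        ],
    )
    conn.executemany(
        "INSERT INTO genes VALUES (?, ?, ?)",
        [(1, "Gene1", "example gene one"), (2, "Gene2", "example gene two")],
    )
    conn.executemany(
        "INSERT INTO gene_terms VALUES (?, ?)",
        [(1, "TERM:1"), (2, "TERM:1"), (2, "TERM:2")],
    )
    conn.commit()
    return conn


def search_genes(conn, query):
    """Return gene symbols whose symbol, name, or linked term name contains query."""
    pattern = f"%{query.lower()}%"
    rows = conn.execute(
        """
        SELECT DISTINCT g.symbol
        FROM genes g
        LEFT JOIN gene_terms gt ON gt.gene_id = g.gene_id
        LEFT JOIN terms t ON t.term_id = gt.term_id
        WHERE lower(g.symbol) LIKE ?
           OR lower(g.name) LIKE ?
           OR lower(t.name) LIKE ?
        ORDER BY g.symbol
        """,
        (pattern, pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]
```

Because only identifiers, names, and gene-to-term links are stored locally, a lookup like this supports structured entry and search while leaving the full records in the externally maintained databases.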