Whole-genome sequencing (WGS), the sequencing of an organism's entire genome, is a comprehensive approach to examining genomes. Genomic data have been crucial for detecting hereditary disorders, identifying mutations that drive cancer progression, and tracing the spread of disease during outbreaks. As sequencing costs continue to fall and modern sequencers generate ever larger volumes of data, whole-genome sequencing has become an effective technique for genomic research.
The genome is a complex entity containing all the hereditary information of an organism. The human genome comprises 23 pairs of chromosomes, which range from roughly 47 million to 249 million base pairs in length and together total about 3 billion base pairs per haploid set. Sequencing an individual's entire genome is therefore a daunting task. Whole-genome sequencing has been performed on many organisms, including marine species, plants, and humans. A whole-genome sequencing project usually consists of three steps: generation and assembly of sequence data, comparative genomics, and functional analysis of the identified genes.
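To give a feel for the assembly step, the sketch below merges short overlapping reads into longer contigs using a simple greedy strategy: repeatedly join the pair of sequences with the longest suffix-to-prefix overlap. This is a toy illustration only, not how production assemblers work (they typically rely on de Bruijn graphs or overlap-layout-consensus methods), and the function names, the `min_len` threshold, and the example reads are hypothetical.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`
    (at least `min_len` long); 0 if there is no such overlap."""
    start = 0
    while True:
        # Look for the first `min_len` bases of b somewhere in a.
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # The earliest matching start gives the longest overlap.
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Greedily merge reads by maximal overlap; returns the remaining contigs."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads fully contained in another read; they add no new sequence.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_pair = 0, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_pair = olen, (i, j)
        if best_pair is None:  # no overlaps left to merge
            return contigs
        i, j = best_pair
        merged = contigs[i] + contigs[j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (i, j)] + [merged]


# Hypothetical reads, deliberately given out of order.
reads = ["GGATCCTA", "ATGCGTAC", "GTACGGAT"]
print(greedy_assemble(reads))
```

Greedy merging is easy to follow but can mis-assemble repetitive regions, which is one reason real genome assemblers use graph-based approaches instead.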
In the comparative genomics step, the assembled sequences are aligned with the genomic sequences of other species. This makes it possible to identify genes that are conserved across species; genes unique to a particular genome can then be analyzed further and classified by function.
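The idea of comparing sequences across species can be illustrated with a classic global alignment algorithm, Needleman-Wunsch, followed by a percent-identity score: highly similar sequences suggest conservation. This is a swapped-in teaching example, not the method any particular comparative genomics pipeline uses (large-scale comparisons rely on tools such as BLAST or dedicated whole-genome aligners), and the scoring values (match +1, mismatch -1, gap -1) are assumptions.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment of a and b; returns (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Share of alignment columns where both sequences carry the same base."""
    matches = sum(1 for p, q in zip(aligned_a, aligned_b) if p == q and p != "-")
    return 100.0 * matches / len(aligned_a)


# Hypothetical short sequences from two "species".
x, y, s = needleman_wunsch("ACGT", "AGT")
print(x, y, s, percent_identity(x, y))
```

Real genomes are far too long for this quadratic-time table, which is why practical tools use heuristics or seeded alignment.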
For a gene database to serve as an effective research tool, it must include all available information about each gene, such as its chromosomal location, sequence variation among individuals of the same species, and gene annotation. It must also give researchers an easy way to access and retrieve this information. Such databases must be updated regularly and should state the date of their most recent release. Many widely used gene databases are hosted by the National Center for Biotechnology Information (NCBI), although independent institutions have also released gene databases into the public domain.
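The requirements above (location, variation, annotation, easy retrieval, and a stated release date) map naturally onto a simple record structure. The sketch below is a hypothetical in-memory model, not the schema of NCBI or any real gene database; the class names, fields, coordinate convention, and all example data are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class GeneRecord:
    """One entry in a hypothetical gene database (illustrative fields only)."""
    symbol: str
    chromosome: str
    start: int            # assumed 1-based, inclusive
    end: int              # assumed 1-based, inclusive
    annotation: str
    variants: list[str] = field(default_factory=list)  # e.g. "c.100A>G"


@dataclass
class GeneDatabase:
    """A toy store that records its release date and indexes genes by symbol."""
    release_date: date
    records: dict[str, GeneRecord] = field(default_factory=dict)

    def add(self, record: GeneRecord) -> None:
        # Index case-insensitively so "gene1" and "GENE1" find the same entry.
        self.records[record.symbol.upper()] = record

    def lookup(self, symbol: str) -> Optional[GeneRecord]:
        return self.records.get(symbol.upper())

    def genes_on(self, chromosome: str) -> list[GeneRecord]:
        # All genes on one chromosome, ordered by start coordinate.
        hits = [r for r in self.records.values() if r.chromosome == chromosome]
        return sorted(hits, key=lambda r: r.start)


# Placeholder data: symbols, coordinates, variants, and release date are made up.
db = GeneDatabase(release_date=date(2000, 1, 1))
db.add(GeneRecord("GENE1", "chr1", 1_000, 5_000, "hypothetical example gene", ["c.100A>G"]))
db.add(GeneRecord("GENE2", "chr1", 200, 800, "second placeholder gene"))
print(db.lookup("gene1"))
print([g.symbol for g in db.genes_on("chr1")])
```

Keeping the release date on the database object, rather than on each record, mirrors the idea that a database is versioned as a whole release.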
Researchers have developed several bioinformatics resources to help distinguish genes from non-coding DNA. One such resource is the Eukaryotic Promoter Database (EPD), which contains promoter sequences of eukaryotic genes located between positions -200 and +50 relative to the transcription start site (TSS).
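The -200 to +50 window is a small piece of coordinate arithmetic that is easy to get wrong, especially for genes on the reverse strand. The sketch below extracts such a window from a genome string. It assumes the common promoter-numbering convention in which the TSS is +1 and there is no position 0 (so -1 is the base immediately upstream), and that `tss` is given as a 0-based index on the forward strand; the function names and test sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def promoter_window(genome: str, tss: int, strand: str,
                    upstream: int = 200, downstream: int = 50) -> str:
    """Return positions -upstream..+downstream around the TSS, read 5'->3'
    on the gene's strand. `tss` is the 0-based forward-strand index of +1.
    Assumes no position 0, so the window is upstream + downstream bases long."""
    if strand == "+":
        # -upstream sits at tss - upstream; +downstream sits at tss + downstream - 1.
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the reverse strand "upstream" means higher forward coordinates.
        start, end = tss - downstream + 1, tss + upstream + 1
    else:
        raise ValueError("strand must be '+' or '-'")
    if start < 0 or end > len(genome):
        raise ValueError("promoter window extends past the sequence ends")
    window = genome[start:end]
    return window if strand == "+" else reverse_complement(window)


# Small hypothetical sequence with a short window for readability.
genome = "AAAACCCCGGGGTTTT"
print(promoter_window(genome, tss=8, strand="+", upstream=3, downstream=2))
print(promoter_window(genome, tss=8, strand="-", upstream=3, downstream=2))
```

In both cases the returned string has the +1 base at index `upstream`, which keeps downstream analyses strand-agnostic.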
NCBI hosts many databases that provide information on human genes, but other organizations also work towards the same goal. The table below provides information on databases that store information regarding human genes.
Many online resources provide WGS data for different organisms. For example, NCBI's website links to reference materials for various species, including mammals, birds, invertebrates, and plants. Databases such as GenBank also offer complete genome sequences for many species besides humans. The table below lists databases that store complete genome sequences or assemblies.