The Human Genome: Unlocking The Gene Count Mystery

how many genes are believed to constitute the human genome

The human genome is a complete set of nucleic acid sequences, encoded as DNA within each of the 23 distinct chromosomes in the cell nucleus. The Human Genome Project (HGP), which lasted from 1990 to 2003, aimed to determine the DNA sequence and the location of the estimated 100,000 human genes. However, the number of genes in the human genome remains a subject of debate among researchers due to differing definitions of what constitutes a gene. While the HGP's original paper in 2001 acknowledged the presence of thousands of non-coding genes, the exact number of genes is still uncertain.

Characteristics Values
Number of genes in the human genome Between 20,000 and 25,000
Number of protein-coding genes 20,000
Number of noncoding genes 15,000 to 20,000
Number of noncoding RNA genes 706
Number of genes with unknown functions Thousands
Year the human genome was sequenced 2022
Year the Human Genome Project was completed 2003
Year the first human reference genome was released 2000
Year the current human reference genome was released 2013
Year the Y chromosome was sequenced 2022
Number of base pairs in the human genome 3.055 billion
Number of mega-base pairs in the GRCh38 reference assembly 151

cycivic

The Human Genome Project

The human genome is a complete set of nucleic acid sequences encoded as DNA within each of the 23 distinct chromosomes in the cell nucleus. The genome includes protein-coding DNA sequences and various types of non-coding DNA, such as DNA coding for non-translated RNA, promoters, and their associated gene-regulatory elements.

One of the early goals of the physical mapping component of the Human Genome Project was to isolate contiguous DNA fragments spanning at least 2 million nucleotides. Scientists have made significant progress in this area, with sets of contiguous DNA fragments now ranging from 20 to 50 million nucleotides in length. The project also developed detailed genetic, physical, and sequence maps, which are critical for understanding the biological basis of complex disorders influenced by both genetic and environmental factors, such as diabetes, heart disease, cancer, and psychiatric illnesses.

cycivic

Protein-coding genes

The human genome is a complete set of nucleic acid sequences encoded as DNA within each of the 23 distinct chromosomes in the cell nucleus. The human genome includes both protein-coding and non-protein-coding DNA sequences.

The HGP's original paper, published in 2001, acknowledged that thousands of human genes produce noncoding RNAs, although the paper itself reported just 706 noncoding RNA genes. The invention of high-throughput RNA sequencing and other technological breakthroughs have led to an explosion in the number of reported non-coding RNA genes, although most of them do not yet have any known function.

The number of protein-coding genes has gradually converged, with estimates shrinking to fewer than 20,000. When the first draft of the human genome sequence was published in 2001, there were approximately 30,000-40,000 protein-coding sequences.

The concept of gene therapy refers to a vast series of applications, both in vitro and in vivo, with the utilization of nucleic acids for therapeutic intentions. The proteins encoded by therapeutic genes have very different functions and activities, ranging from the substitution of a missing cellular protein to the modulation of immune systems.

cycivic

Non-coding genes

The human genome is a complete set of nucleic acid sequences encoded as DNA within each of the 23 distinct chromosomes in the cell nucleus. The human genome includes protein-coding DNA sequences and various types of DNA that do not encode proteins. The latter is a diverse category that includes DNA coding for non-translated RNA, such as ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and several types of regulatory RNAs. Noncoding genes are an important part of non-coding DNA and they include genes for transfer RNA and ribosomal RNA.

Non-coding DNA (ncDNA) sequences are components of an organism's DNA that do not encode protein sequences. Some non-coding DNA is transcribed into functional non-coding RNA molecules (e.g. transfer RNA, microRNA, piRNA, ribosomal RNA, and regulatory RNAs). Other functional regions of the non-coding DNA fraction include regulatory sequences that control gene expression; scaffold attachment regions; origins of DNA replication; centromeres; and telomeres. Some non-coding regions appear to be mostly nonfunctional, such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses. Regions that are completely nonfunctional are called junk DNA.

In bacteria, the coding regions typically take up 88% of the genome. The remaining 12% does not encode proteins, but much of it still has biological functions through genes where the RNA transcript is functional (non-coding genes) and regulatory sequences, which means that almost all of the bacterial genome has a function. In the human genome, somewhere between 1-2% is coding DNA, with 98-99% consisting of non-coding DNA. This includes many functional elements such as non-coding genes and regulatory sequences.

The number of genes does not seem to correlate with perceived notions of complexity. For example, the genome of the unicellular Polychaos dubium contains more than 200 times the amount of DNA in humans. The pufferfish Takifugu rubripes genome is only about one-eighth the size of the human genome, yet it seems to have a comparable number of genes. Genes take up about 30% of the pufferfish genome and the coding DNA is about 10%. The reduced size of the pufferfish genome is due to a reduction in the length of introns and less repetitive DNA.

The human genome has somewhere between 100,000 and more than 100,000 genes, according to different sources. The Human Genome Project (HGP) acknowledged in 2001 that "thousands of human genes produce noncoding RNAs as their ultimate product," although the paper itself reported just 706 noncoding RNA genes. Databases of lncRNAs (and other RNA genes such as microRNAs) have grown dramatically in the decade since, and current human gene catalogs now contain more RNA genes than protein-coding genes.

cycivic

Genetic disorders

There are numerous genetic disorders, and they can manifest in various ways. Some examples include:

  • Tay-Sachs disease: an inherited blood disorder that affects development, vision, and speech.
  • Turner syndrome: caused by a missing or partially missing X chromosome.
  • Von Willebrand disease: a genetic bleeding disorder where the blood clotting factor is absent, low, or defective.
  • Haemophilia: another bleeding disorder where blood has difficulty clotting.
  • Fragile X syndrome: the most common inherited cause of intellectual disability, also causing physical, behavioural, and emotional challenges.
  • Gilbert's syndrome: an inherited liver disorder that causes bilirubin buildup and jaundice but is generally harmless.
  • Haemochromatosis: a disorder where the body absorbs too much iron, leading to potential organ damage.
  • Apert syndrome: a rare disorder that causes changes to the shape of the skull.
  • Charcot-Marie-Tooth (CMT) disease: a progressive nervous system disorder.
  • Congenital adrenal hyperplasia: affects the adrenal glands, causing imbalances in hormone levels.

The human genome, which is the complete set of human genes, consists of 23 distinct chromosomes in the cell nucleus. The human genome includes protein-coding DNA sequences and DNA that does not encode proteins. While the sequence of the human genome has been determined, it is not yet fully understood, and the biological functions of their protein and RNA products are still being studied.

The Human Genome Project (HGP), which lasted from 1990 to 2003, aimed to determine the DNA sequence and location of the estimated 100,000 human genes. However, the number of human genes remains a subject of debate, with various human gene databases containing thousands of differences. The definition of a "gene" has evolved, and the focus has shifted from protein-coding genes to include noncoding RNA genes as well.

cycivic

Human gene databases

The human genome is a complete set of nucleic acid sequences, encoded as DNA within each of the 23 distinct chromosomes in the cell nucleus. The human genome includes protein-coding DNA sequences and various types of non-protein-coding DNA sequences. The latter includes DNA coding for non-translated RNA, such as ribosomal RNA, transfer RNA, ribozymes, small nuclear RNAs, and regulatory RNAs. It also comprises promoters, gene-regulatory elements, and DNA with structural and replicatory roles, such as scaffolding regions, telomeres, centromeres, and origins of replication. The human genome also contains mobile elements, endogenous retroviruses, and integrated viral DNA sequences.

The Human Genome Project (HGP), which lasted from 1990 to 2003, aimed to determine the DNA sequence and the location of the estimated 100,000 human genes. However, the problem of finding all human genes is complex, and there is a lack of agreement among current databases. The definition of a "gene" has evolved, and the focus has shifted from protein-coding genes to include noncoding genes as well. Many genes are noncoding, producing noncoding RNAs, and databases of long non-coding RNAs (lncRNAs) have grown significantly.

To address the challenges in human gene identification, various human gene databases have been developed. GeneCards is a searchable, integrative database that provides comprehensive information on annotated and predicted human genes. It integrates gene-centric data from approximately 200 web sources, covering genomic, transcriptomic, proteomic, genetic, clinical, and functional aspects. Another database, CHESS, was created in 2017 and uses a massive RNA-seq collection to assemble transcripts from a broad survey of human tissues. The CHESS gene set adds over 100,000 new gene isoforms and new genes to existing databases, including all the protein-coding genes from Gencode and RefSeq.

The effort to understand the human genome has led to the exploration of genetic influences on common diseases such as diabetes, asthma, migraine, and schizophrenia. Genetic variations can influence medical conditions, such as genetic disorders, and non-medical aspects, such as height, eye colour, and sensory abilities. While the human genome sequence has been determined, ongoing work is focused on elucidating the biological functions of their protein and RNA products. The dynamic nature of gene discovery and the evolution of gene definitions contribute to the ongoing development and refinement of human gene databases.

Frequently asked questions

It is estimated that the human genome contains between 20,000 and 25,000 genes, with around 20,000 of these being protein-coding genes.

There are a few reasons for this uncertainty. Firstly, the definition of a "gene" is the subject of scientific debate, and different definitions will include or exclude certain segments of DNA. Secondly, there are competing human gene databases, and these databases have many thousands of differences among them.

The Human Genome Project was a publicly funded initiative that took place between 1990 and 2003. Its primary goals were to determine the DNA sequence of the human genome and the location of its estimated 100,000 genes.

Written by
Reviewed by
Share this post
Print
Did this article help you?

Leave a comment