HIGDB - Haemophilus influenzae Genome Database

General characteristics

History

In 1892, Richard Pfeiffer identified a bacillus in the sputum of patients with influenza during the 1889-92 pandemic and claimed that it was the etiologic agent; one year later he isolated the organism on blood-containing media. Serious doubts about its importance developed during the 1918-19 pandemic, when the bacillus could not be found in the early victims, but Pfeiffer's claims were nonetheless reflected in the nomenclature chosen in 1920: Haemophilus (blood-lover) influenzae (Hirschmann et al., 1979).

Haemophilus influenzae is a fastidious, Gram-negative, non-motile, pleomorphic and non-encapsulated bacillus. The organism is frequently found in the upper respiratory tract (URT) of healthy humans. In addition to causing various life-threatening systemic infections, it is a common cause of serious diseases of the upper and lower respiratory tract. Examples of Haemophilus respiratory tract infections include acute otitis media, acute maxillary sinusitis, epiglottitis, acute exacerbations of bronchitis in patients with underlying chronic obstructive pulmonary disease, and pneumonia in children and adults (Chapin et al., 1983; Kilian et al., 1972). The organism is thus a major community-acquired pathogen that causes significant morbidity and mortality worldwide.

Genetics

The 1.83 Mbp genome of Haemophilus influenzae was the first genome of a free-living organism to be completely sequenced. From the sequence, 1703 predicted protein-coding regions were identified. Of these, 58% were assigned to one of 102 biological categories (Fleischmann et al., 1995). The remaining 42% were novel proteins that either matched other hypothetical proteins in the databases or showed no significant similarity to other known genes. A revised analysis of the genome and its encoded proteins found that a general function could be predicted for 83% of the H. influenzae proteins (Tatusov et al., 1996). H. influenzae DNA has a base composition of 37 mol% G+C. Because G and C are the less common bases, restriction enzymes whose recognition sequences contain only C and G cut the DNA less frequently than enzymes whose sites contain all four bases (Butler et al., 1990).
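
The effect of base composition on cutting frequency can be illustrated with a rough calculation. The Python sketch below assumes a simple model in which each base occurs independently with a probability set only by the G+C content; this model is an assumption made here for illustration and is not the method of Butler et al. (1990). The two enzymes (SmaI, site CCCGGG, and EcoRI, site GAATTC) are chosen only as familiar examples of a G+C-only site and a four-base site. The function names are hypothetical, and real site counts in the genome will differ from these estimates.

```python
# Rough estimate of restriction-site frequency from G+C content.
# Assumption (not from the source): bases are independent and
# G = C = gc/2, A = T = (1 - gc)/2.

GENOME_LENGTH = 1_830_000  # 1.83 Mbp, as stated in the text
GC_FRACTION = 0.37         # 37 mol% G+C, as stated in the text


def base_probabilities(gc_fraction):
    """Return the per-base probabilities for a given G+C fraction."""
    gc = gc_fraction / 2.0
    at = (1.0 - gc_fraction) / 2.0
    return {"G": gc, "C": gc, "A": at, "T": at}


def site_probability(site, gc_fraction):
    """Probability that a given position starts the recognition site."""
    probs = base_probabilities(gc_fraction)
    p = 1.0
    for base in site.upper():
        p *= probs[base]
    return p


def expected_sites(site, genome_length, gc_fraction):
    """Expected number of sites, treating the chromosome as circular.

    Both example sites are palindromic, so a match on one strand is
    also a match on the other and is counted once.
    """
    return genome_length * site_probability(site, gc_fraction)


# Illustrative enzymes: one G+C-only site and one site with all four bases.
for name, site in [("SmaI", "CCCGGG"), ("EcoRI", "GAATTC")]:
    count = expected_sites(site, GENOME_LENGTH, GC_FRACTION)
    print(f"{name} ({site}): about {count:.0f} expected sites")
```

Under this model the G+C-only hexamer is expected several times less often than the mixed hexamer, which matches the direction of the effect described above. At 50% G+C the two estimates would be equal.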