"Inflammation and Intestinal Metaplasia of the Distal Esophagus Are Associated With Alterations in the Microbiome" (Gastroenterology 2009;137:588-597)

Gastric acid reflux causes esophagitis, Barrett’s esophagus (BE), and adenocarcinoma, but it is unknown how this disease sequence is initiated. Based on analyses of 16S rRNA gene clones in 34 subjects with normal esophagus, esophagitis, or BE, here we show that microbiome samples from the distal esophagus can be classified into two distinct types. The type I microbiome was dominated by the genus Streptococcus and concentrated in the phenotypically normal esophagus. Conversely, the type II microbiome was enriched in Gram-negative anaerobes/microaerophiles and was primarily associated with esophagitis (OR: 15.4) and BE (OR: 16.5). These findings raise the possibility that dysbiosis plays a role in the pathogenesis of reflux-related disorders.
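An odds ratio compares the odds of disease among subjects carrying one microbiome type with the odds among subjects carrying the other. As a minimal sketch, the function below computes the standard crude cross-product odds ratio from a 2×2 table. The counts in the usage line are hypothetical and are not the study's data, and the abstract does not state whether the reported ORs were crude or adjusted.

```python
def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Crude cross-product odds ratio for a 2x2 table.

    a: cases with the exposure       (e.g. esophagitis, type II microbiome)
    b: controls with the exposure    (e.g. normal, type II microbiome)
    c: cases without the exposure    (e.g. esophagitis, type I microbiome)
    d: controls without the exposure (e.g. normal, type I microbiome)
    """
    if b == 0 or c == 0:
        # The crude estimate is undefined; a continuity correction would be needed.
        raise ValueError("zero count in b or c")
    # Odds of being a case among the exposed (a/b) divided by
    # odds of being a case among the unexposed (c/d), i.e. (a*d)/(b*c).
    return (a * d) / (b * c)


# Hypothetical counts for illustration only (not taken from the study).
print(odds_ratio(8, 4, 2, 12))
```

With these made-up counts the result is (8 × 12) / (4 × 2) = 12.0.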

-Supported by grants from the National Cancer Institute and the National Institute for Allergy and Infectious Diseases (UH2CA140233, R01CA97936, R01AI063477)

Editorial Comments:

  - "Microbiome Analysis in the Esophagus" (Suerbaum S. Gastroenterology. 2009;137:419-21)

It seems likely that the profound change in the composition of the microbial community in the esophagus influences epithelial function. The change from microbiome type I to type II might thus prove to be an important step in esophageal tumorigenesis, and would represent a biologically more plausible microbial component in this disease than the absence of H. pylori from the stomach, which has been reported to be associated with an increased risk of esophageal adenocarcinoma.

  - "Different Microbiome Patterns in Normal, Inflamed, and Barrett's Esophagus" (Tack J and Carethers JM. Gastroenterology. 2009;137:398-399)

The study indicates a shift from a Gram-positive microbiome in the normal esophagus to a Gram-negative anaerobic microbiome in the inflamed/Barrett's esophagus. It is not known whether the type II microbiome plays a causative role in reflux, but lipopolysaccharide from Gram-negative bacteria, if present before pathologic reflux, might induce lower esophageal sphincter relaxation via activation of inducible nitric oxide synthase. Alternatively, the type II microbiome may develop as a result of reflux. Either way, the study demonstrates a complex microbiome that rivals those of the skin and mouth, and opens up a new avenue for investigation of the distal esophagus and reflux.

"Diversity of 23S rRNA Genes within Individual Prokaryotic Genomes" (PLoS ONE May 5, 2009;4:e5437)

There has been renewed interest in the use of the 23S rRNA gene in taxonomic classification, driven by the decrease in sequencing costs with next-generation DNA sequencing technology (454 sequencing) and by demand from the new Roadmap Initiative in the Human Microbiome Project. Compared to 16S rRNA genes, 23S rRNA genes contain more characteristic sequence stretches owing to their greater length and unique insertions and/or deletions, and may offer better phylogenetic resolution because of higher sequence variation. In this study, we analyzed the diversity among individual rRNA genes within a genome. Of 184 prokaryotic species containing multiple 23S rRNA genes, diversity was observed in 113 (61.4%) genomes (mean 0.40%, range 0.01%–4.04%). Intervening sequences (IVS), ranging from 9 to 1471 nt in size, were found in 7 species. T. tengcongensis was the only species in which intragenomic diversity >3% was observed among 4 paralogous 23S rRNA genes. Although classification using primary 23S rRNA sequences could be erroneous, significant diversity among paralogous 23S rRNA genes was observed in only 1 of the 184 species analyzed, indicating little overall impact on mainstream 23S rRNA gene-based prokaryotic taxonomy.
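The abstract reports intragenomic diversity as a percentage but does not define the metric. A minimal sketch, assuming the paralogous copies are already aligned to equal length, that diversity between two copies is the uncorrected p-distance over columns where neither copy has a gap, and that a genome's diversity is the maximum pairwise distance among its copies. The function names are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of comparable aligned columns that differ (gap columns skipped)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # assumption: columns with a gap in either copy are ignored
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differences / compared


def intragenomic_diversity(copies: list[str]) -> float:
    """Maximum pairwise p-distance, as a percentage, among paralogous rRNA copies."""
    if len(copies) < 2:
        return 0.0  # a single-copy genome has no intragenomic diversity
    return 100 * max(p_distance(a, b) for a, b in combinations(copies, 2))


# Three toy 10-nt "copies"; the second differs from the others at one position.
print(intragenomic_diversity(["ACGTACGTAC", "ACGTACGTAA", "ACGTACGTAC"]))
```

Taking the maximum rather than the mean pairwise distance is a design choice here; it flags a genome as soon as any single copy diverges, which matches the abstract's focus on outlier paralogs such as those in T. tengcongensis.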

-Supported by grants from the National Cancer Institute and the National Institute for Allergy and Infectious Diseases (UH2CA140233, R01CA97936, R01AI063477)

"PyNAST: A Flexible Tool for Aligning Sequences to a Template Alignment" (Bioinformatics November 13, 2009;1-2)

The Nearest Alignment Space Termination (NAST) tool is commonly used in sequence-based microbial ecology community analysis, but because of the limited portability of the original implementation, it has not been as widely adopted as it could be. PyNAST is a complete re-implementation of NAST that includes three convenient interfaces: a Mac OS X GUI, a command line interface, and a simple API. Because users can install PyNAST on their own hardware, PyNAST makes the popular NAST algorithm more portable and thereby applicable to data sets orders of magnitude larger. Additionally, because users can align to arbitrary template alignments, a feature not available via the original NAST web interface, the NAST algorithm will be readily applicable to novel tasks outside of microbial community analysis. PyNAST is available at
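Roughly, NAST-style alignment finds the template sequence most similar to a query, aligns the two pairwise, and then fits the result into the template's existing multiple-alignment columns so that every output sequence has the template alignment's width. The sketch below shows only that last projection step and is not PyNAST's API: `project_to_template` is a hypothetical name, and whereas the published algorithm handles query insertions more carefully, this sketch simply drops them.

```python
def project_to_template(template_aligned: str, pair_query: str, pair_template: str) -> str:
    """Place a pairwise-aligned query into the column space of a template alignment.

    template_aligned: the template's row from the multiple alignment ('-' = gap).
    pair_query, pair_template: a pairwise alignment of the query against the
    ungapped template sequence (equal length, '-' = gap).
    """
    if len(pair_query) != len(pair_template):
        raise ValueError("pairwise alignment rows must have equal length")
    if pair_template.replace("-", "") != template_aligned.replace("-", ""):
        raise ValueError("pairwise template does not match the template alignment")
    # Columns of the multiple alignment that hold a template base, in order.
    base_columns = [i for i, c in enumerate(template_aligned) if c != "-"]
    out = ["-"] * len(template_aligned)
    next_base = 0
    for q, t in zip(pair_query, pair_template):
        if t == "-":
            # Query insertion relative to the template: dropped in this sketch,
            # which keeps the output exactly as wide as the template alignment.
            continue
        out[base_columns[next_base]] = q
        next_base += 1
    return "".join(out)


# Toy template row with gaps, and a query that lacks the template's G.
print(project_to_template("A-CG--T", "AC-T", "ACGT"))
```

Because every query is written into the same fixed set of columns, queries aligned this way can be compared directly with one another and with the template alignment.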

- Partially supported by grants from the National Cancer Institute (UH2CA140233)

" Design of 16S rRNA gene primers for 454 pyrosequencing of human foregut microbiome " ( World Journal Gastroenterol September 7, 2010; 16(33):4135-4144)

AIM: To design and validate broad-range 16S rRNA primers for use in high throughput sequencing to classify bacteria isolated from the human foregut microbiome.

METHODS: A foregut microbiome dataset was constructed using 16S rRNA gene sequences, representing 219 bacterial species, that had been obtained from oral, esophageal, and gastric microbiomes by Sanger sequencing in previous studies. Candidate primers were taken from the European rRNA database. To assess the effect of sequence length on classification accuracy, 16S rRNA genes of various lengths were created by trimming the full-length sequences. Sequences spanning various hypervariable regions were selected to simulate the amplicons that would be obtained using possible primer pairs. The sequences were compared with full-length 16S rRNA genes for accuracy of taxonomic classification using online software at the Ribosomal Database Project (RDP). The universality of the primer set was evaluated using the RDP 16S rRNA database, which comprises 433,306 16S rRNA genes representing 36 phyla.
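Trimming full-length sequences to simulate shorter reads, then checking whether each trimmed read is classified the same way as its full-length gene, can be sketched as follows. The study used the RDP online classifier; here `classify` is a hypothetical stand-in for any function that maps a sequence to a taxon label, and string indices are assumed to already correspond to the desired start position (the paper's coordinates follow E. coli numbering, which would require a prior alignment).

```python
from typing import Callable, Sequence


def truncate(seq: str, start: int, length: int) -> str:
    """Simulate a read of `length` nt beginning at index `start`."""
    return seq[start:start + length]


def misclassification_rate(seqs: Sequence[str],
                           classify: Callable[[str], str],
                           start: int, length: int) -> float:
    """Fraction of sequences whose truncated read receives a different label
    from the one assigned to the full-length sequence."""
    if not seqs:
        raise ValueError("need at least one sequence")
    wrong = sum(
        1 for s in seqs
        if classify(truncate(s, start, length)) != classify(s)
    )
    return wrong / len(seqs)


# Hypothetical toy classifier: label "X" if the read contains "GG", else "Y".
def toy_classify(s: str) -> str:
    return "X" if "GG" in s else "Y"


print(misclassification_rate(["AAGG", "GGAA", "AAAA"], toy_classify, start=0, length=2))
```

The same ratio underlies the reported figures: for example, 87 misclassified out of 219 sequences gives 87 / 219 ≈ 0.397, or 39.7%.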

RESULTS: Truncation to 100 nucleotides (nt) downstream from the position corresponding to base 28 in the Escherichia coli 16S rRNA gene caused misclassification of 87 (39.7%) of the 219 sequences, compared with misclassification of only 29 (13.2%) sequences with truncation to 350 nt. Among 350-nt sequence reads within various regions of the 16S rRNA gene, the reverse read of an amplicon generated using the 343F/798R primers had the least effect on classification (8.2% misclassified). In comparison, truncation to 900 nt, mimicking single-pass Sanger reads, misclassified 5.0% of the 219 sequences. The 343F/798R amplicon accurately assigned 91.8% of the 219 sequences at the species level. Weighted by the abundance of each species in the esophageal dataset, the 343F/798R amplicon yielded similar classification accuracy without a significant loss in species coverage (92%). Modification of the 343F/798R primers to 347F/803R increased their universality among foregut species. Assuming that a typical PCR can tolerate 2 mismatches between a primer and a template, the modified 347F and 803R primers should be able to anneal to 98% and 99.6%, respectively, of all 16S rRNA genes in the RDP database.
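The universality estimate rests on counting primer-template mismatches and accepting up to 2. A minimal sketch, assuming each template's binding site has already been extracted in the primer's orientation (for a reverse primer this means the reverse complement of the template region) and that degenerate primer positions use standard IUPAC codes. The primer and sites below are hypothetical; the actual 347F/803R sequences are not given here.

```python
# IUPAC nucleotide codes and the bases each one matches.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def mismatches(primer: str, site: str) -> int:
    """Count positions where the template base is not allowed by the primer code.

    Ambiguous or gapped template characters (e.g. 'N', '-') count as mismatches.
    """
    if len(primer) != len(site):
        raise ValueError("primer and binding site must have equal length")
    return sum(1 for p, s in zip(primer.upper(), site.upper()) if s not in IUPAC[p])


def coverage(primer: str, sites: list[str], max_mismatches: int = 2) -> float:
    """Fraction of binding sites the primer is assumed to anneal to."""
    if not sites:
        raise ValueError("need at least one binding site")
    hits = sum(1 for s in sites if mismatches(primer, s) <= max_mismatches)
    return hits / len(sites)


# Hypothetical degenerate primer and four hypothetical template sites.
sites = ["ACGTAC", "ACGTGT", "TTGTAC", "TTTTAC"]
print([mismatches("ACGTRY", s) for s in sites])
print(coverage("ACGTRY", sites))
```

Here three of the four toy sites carry at most 2 mismatches, so the sketch reports 75% coverage under the 2-mismatch tolerance stated in the abstract.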

CONCLUSION: 347F/803R is the most suitable primer pair for classification of foregut 16S rRNA genes, and its universality also makes it suitable for analyses of other complex microbiomes.

-Supported by grants from the National Cancer Institute (UH2CA140233)

Reviewers' Comments:

This is a very carefully designed approach to selecting and optimizing a primer pair to PCR-amplify fragments of the 16S rRNA gene for deep sequencing, and thereby for high-throughput analysis of the gut microbiome. The method provides an important tool for elucidating the role of the human foregut microbiome in health and disease.

The study design and approach are sound and may help to replace the labor-intensive, time-consuming Sanger sequencing method. More importantly, this may lead to a detection method for foregut disease conditions.

The findings of this study are significant, as the study provides a comparatively cheap and rapid way of identifying foregut bacteria.

"Diversity of 16S rRNA genes within individual prokaryotic genome"(Applied and Environmental Microbiology 2010; 76: no. 6: 3886-3897)

Abstract: Analysis of intragenomic variation of 16S rRNA genes is a unique approach to examining the concept of ribosomal constraints on rRNA genes; the degree of variation is an important parameter to consider when estimating the diversity of a complex microbiome in the recently initiated Human Microbiome Project. The current GenBank database has a collection of 883 prokaryotic genomes representing 568 unique species, of which 425 species contained 2 to 15 copies of 16S rRNA genes per genome (2.22 ± 0.81). Sequence diversity among the 16S rRNA genes in a genome was found in 235 species (range 0.06%–20.38%; mean 0.55% ± 1.46%). Compared with the 16S rRNA-based threshold for the operational definition of species (1%–1.3% diversity), the diversity was borderline (between 1% and 1.3%) in 10 species and >1.3% in 14 species. The diversified 16S rRNA genes in Haloarcula marismortui (diversity 5.63%) and Thermoanaerobacter tengcongensis (6.70%) were highly conserved at the secondary-structure level, while the diversified gene in Borrelia afzelii (20.38%) appears to be a pseudogene. The diversified genes in the remaining 21 species were also conserved, except for a truncated 16S rRNA gene in Candidatus Protochlamydia amoebophila. Thus, this survey of intragenomic diversity of 16S rRNA genes provides strong evidence supporting the theory of ribosomal constraint. Taxonomic classification using the 16S rRNA-based operational threshold could split a number of species into more than one species each, leading to an overestimation of the diversity of a complex microbiome. This is of particular concern for 7 bacterial species associated with the human microbiome or with disease.
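Why intragenomic diversity above the operational threshold inflates diversity estimates can be shown with a simple greedy clustering of one genome's 16S copies into operational taxonomic units (OTUs). This is a sketch, not the paper's method (the paper surveyed diversity rather than clustering): it assumes pre-aligned, equal-length sequences, a plain percent-difference metric, and a 1.3% cutoff, and the function names are hypothetical. Greedy clustering is also order-dependent.

```python
def percent_difference(a: str, b: str) -> float:
    """Percentage of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return 100 * sum(x != y for x, y in zip(a, b)) / len(a)


def greedy_otus(seqs: list[str], threshold_pct: float = 1.3) -> list[int]:
    """Assign each sequence to the first centroid within threshold_pct,
    otherwise start a new OTU. Returns one OTU index per input sequence."""
    centroids: list[str] = []
    assignment: list[int] = []
    for s in seqs:
        for idx, c in enumerate(centroids):
            if percent_difference(s, c) <= threshold_pct:
                assignment.append(idx)
                break
        else:
            # No centroid is close enough: this sequence founds a new OTU.
            centroids.append(s)
            assignment.append(len(centroids) - 1)
    return assignment


# Hypothetical 100-nt copies from a single genome.
copy_a = "A" * 100
copy_b = "CC" + "A" * 98   # 2% different from copy_a
copy_c = "C" + "A" * 99    # 1% different from copy_a
print(greedy_otus([copy_a, copy_b, copy_c]))
```

In this toy case the three copies of one genome fall into two OTUs, because one copy exceeds the 1.3% cutoff; this is the kind of overestimation the abstract warns about.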

-Supported by grants from the National Cancer Institute (UH2CA140233)

Reviewers' Comments:

In Bacteria and Archaea, rRNA genes are often present in multiple copies, and these copies are not necessarily identical. This manuscript presents an in-depth analysis of intragenomic variation of the 16S rRNA gene. Although there are no surprises (in the vast majority of organisms, the variation is minimal), a handful of organisms exhibit significantly more variation.

In this straightforward but useful analysis, the authors examine the set of complete genomes currently available and ask whether the diversity among 16S rRNA sequences within a single genome is likely to cause problems with estimates of microbial diversity. In general the answer is no, although there are a few species in which variability between copies is high, so some caution is warranted. The work is high-quality and interesting, especially given its relevance to the Human Microbiome Project and environmental sequencing projects.

The manuscript by Pei and colleagues is a fascinating analysis of intragenomic 16S rRNA gene heterogeneity. This is a very rich dataset that offers an opportunity to address several interesting questions.

"Simrank: Rapid and sensitive general-purpose k-mer search tool"(Submitted to BMC Ecology July 2010; ID 755231864419022)

Background: Terabyte-scale collections of string-encoded data are expected from consortium efforts such as the Human Microbiome Project. Intra- and inter-project data similarity searches are enabled by rapid k-mer matching strategies. Software applications for sequence database partitioning, guide tree estimation, molecular classification and alignment acceleration have benefited from embedded k-mer searches as subroutines. However, a rapid, general-purpose, open-source, flexible, stand-alone k-mer tool has not been available.

Results: Here we present a stand-alone utility, Simrank, which allows users to rapidly identify the database strings most similar to query strings. Performance testing of Simrank and related tools against DNA, RNA, protein and human-language datasets found Simrank to be 10X to 928X faster, depending on the dataset.

Conclusions: Simrank provides molecular ecologists with a high-throughput choice for comparing large sequence sets to find similarity.
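The general idea of a k-mer similarity search can be sketched with an inverted index from each k-mer to the database strings containing it. This is not Simrank's implementation or interface, which are not described here: the scoring (fraction of the query's distinct k-mers found in a database string) is one common choice assumed for illustration, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def kmers(seq: str, k: int) -> set[str]:
    """Distinct substrings of length k (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_index(database: dict[str, str], k: int) -> dict[str, set[str]]:
    """Inverted index: k-mer -> names of database strings containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in database.items():
        for km in kmers(seq, k):
            index[km].add(name)
    return index


def search(query: str, index: dict[str, set[str]], k: int, top: int = 5) -> list[tuple[str, float]]:
    """Rank database strings by the fraction of the query's k-mers they share."""
    q = kmers(query, k)
    if not q:
        return []
    shared: Counter = Counter()
    for km in q:
        for name in index.get(km, ()):
            shared[name] += 1
    # Highest score first; ties broken alphabetically by name for determinism.
    ranked = sorted(shared.items(), key=lambda item: (-item[1], item[0]))
    return [(name, count / len(q)) for name, count in ranked[:top]]


db = {"s1": "ACGTACGT", "s2": "TTTTTTTT", "s3": "ACGTTTTT"}
idx = build_index(db, k=4)
print(search("ACGTACGT", idx, k=4))
```

Only database strings sharing at least one k-mer with the query are ever scored, which is what makes index-based k-mer search fast compared with aligning the query against every database entry.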