DiASPora

Digital Approaches for the Synthesis of Poorly Accessible Biodiversity Information

Genomes for BacDive strains

As part of the DiASPora project, genomes from various resources were associated with BacDive strains. These associations provide the basis for the genome-based predictions that will be developed in WP 3 and WP 4.

The matching is based on culture collection numbers and NCBI Taxonomy-IDs. A validation step uses species names and all of their known basonyms to verify the correctness of the matching procedure.
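The two-step procedure described above (matching, then validation) can be illustrated with a minimal sketch. This is not the project's actual pipeline: the `Strain` and `Genome` records, their field names, and the rules used here are assumptions made for illustration. In particular, the sketch assumes that either a shared culture collection number or an identical Taxonomy-ID is enough to propose a match, and that validation passes when the genome's organism name starts with the strain's species name or one of its basonyms.

```python
from dataclasses import dataclass, field


@dataclass
class Strain:
    # Hypothetical record layout, not the BacDive schema.
    bacdive_id: int
    species: str
    taxon_id: int
    culture_numbers: set
    basonyms: set = field(default_factory=set)


@dataclass
class Genome:
    # Hypothetical record layout for a genome from any resource.
    accession: str
    organism: str
    taxon_id: int
    culture_numbers: set


def normalize(number):
    """Uppercase and keep only letters and digits, so 'DSM 20174' equals 'dsm-20174'."""
    return "".join(ch for ch in number.upper() if ch.isalnum())


def is_candidate(strain, genome):
    """Assumed matching rule: shared culture collection number or same Taxonomy-ID."""
    strain_numbers = {normalize(n) for n in strain.culture_numbers}
    genome_numbers = {normalize(n) for n in genome.culture_numbers}
    return bool(strain_numbers & genome_numbers) or strain.taxon_id == genome.taxon_id


def is_validated(strain, genome):
    """Assumed validation rule: organism name starts with the species name or a basonym."""
    organism = genome.organism.lower()
    names = {strain.species, *strain.basonyms}
    return any(organism.startswith(name.lower()) for name in names)


def match_genomes(strains, genomes):
    """Return {bacdive_id: [accession, ...]} for all validated candidate matches."""
    matches = {}
    for strain in strains:
        hits = [
            g.accession
            for g in genomes
            if is_candidate(strain, g) and is_validated(strain, g)
        ]
        if hits:
            matches[strain.bacdive_id] = hits
    return matches
```

In this sketch, a genome deposited under an older name such as "Lactobacillus plantarum" would still be accepted for a strain now classified as "Lactiplantibacillus plantarum", provided the older name is listed among the strain's basonyms. A genome that shares only the Taxonomy-ID but carries an unrelated organism name would be rejected by the validation step.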

Result

We were able to associate almost 255,000 genomes from NCBI, PATRIC, and IMG with almost 10,000 BacDive strains. The following table gives an overview of the data gained in this process. Beforehand, only 1,333 BacDive strains had genome associations, so this represents a 7.5-fold increase. Notably, almost two-thirds of all bacterial type strains have a sequenced genome (if you do not know what a type strain is, read more here).

Data type                       Number of entries
Genomes from all resources      254,966
Genomes from NCBI               112,799
Genomes from PATRIC             136,863
Genomes from IMG                5,304
Strains with genomes            9,970
Complete genomes                18,477
Strains with complete genome    2,738
Type strains in BacDive         14,091
Type strains with genome        8,823
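
The derived figures quoted in the text can be checked against the table with a few lines of arithmetic. The short script below is only a convenience check; the variable names are chosen for illustration, and all numbers are taken from the text and table above.

```python
# Counts taken from the table and the paragraph above it.
genomes_by_source = {"NCBI": 112_799, "PATRIC": 136_863, "IMG": 5_304}
strains_with_genomes_before = 1_333
strains_with_genomes_after = 9_970
type_strains_total = 14_091
type_strains_with_genome = 8_823

# The per-resource counts should add up to the overall genome count.
total_genomes = sum(genomes_by_source.values())

# Growth in the number of strains with at least one genome.
fold_increase = strains_with_genomes_after / strains_with_genomes_before

# Share of type strains in BacDive that have a genome.
type_strain_share = type_strains_with_genome / type_strains_total

print(f"Total genomes: {total_genomes:,}")
print(f"Fold increase: {fold_increase:.1f}")
print(f"Type strains with genome: {type_strain_share:.1%}")
```

The per-resource counts sum to the reported total of 254,966, the fold increase rounds to 7.5, and the share of type strains with a genome is about 62.6%, which matches the "almost two-thirds" stated in the text.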