Digital Approaches for the Synthesis of Poorly Accessible Biodiversity Information
Genomes for BacDive strains
As part of the DiASPora project, genomes from various resources were associated with BacDive strains. This provides the basis for the genome-based predictions that will be developed in WP 3 and WP 4.
The matching is based on culture collection numbers and NCBI Taxonomy IDs. A subsequent validation step uses species names and all of their known basonyms to verify that the matches are correct.
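The matching and validation steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the project's actual implementation: the `Strain` and `Genome` records, the `normalize` rule for culture collection numbers, the exact comparison of NCBI Taxonomy IDs, and the example names (`Examplea alpha`, `Oldgenus alpha`, `genome-A`, and so on) are all hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Strain:
    # Hypothetical, simplified BacDive strain record (not the real BacDive schema)
    bacdive_id: int
    collection_numbers: list[str]
    taxid: int
    species: str
    basonyms: list[str] = field(default_factory=list)


@dataclass
class Genome:
    # Hypothetical, simplified genome record from NCBI, PATRIC or IMG
    accession: str
    strain_designations: list[str]
    taxid: int
    species: str


def normalize(designation: str) -> str:
    """Assumed normalization: drop spaces, hyphens, underscores and colons, then upper-case.

    'DSM 1234', 'dsm-1234' and 'DSM:1234' all become 'DSM1234'.
    """
    return re.sub(r"[\s\-_:]", "", designation).upper()


def match_genomes(strains: list[Strain], genomes: list[Genome]) -> list[tuple[int, str]]:
    # Index strains by (normalized culture collection number, NCBI Taxonomy ID)
    index: dict[tuple[str, int], list[Strain]] = {}
    for strain in strains:
        for number in strain.collection_numbers:
            index.setdefault((normalize(number), strain.taxid), []).append(strain)

    matches = []
    for genome in genomes:
        # Candidate strains share a collection number and the Taxonomy ID with the genome
        candidates: dict[int, Strain] = {}
        for designation in genome.strain_designations:
            for strain in index.get((normalize(designation), genome.taxid), []):
                candidates[strain.bacdive_id] = strain
        # Validation: the genome's species name must be the strain's current name or a known basonym
        for strain in candidates.values():
            accepted = {name.lower() for name in [strain.species, *strain.basonyms]}
            if genome.species.lower() in accepted:
                matches.append((strain.bacdive_id, genome.accession))
    return matches


# Hypothetical example data
strains = [Strain(1, ["DSM 1234"], 12345, "Examplea alpha", ["Oldgenus alpha"])]
genomes = [
    Genome("genome-A", ["DSM-1234"], 12345, "Oldgenus alpha"),  # accepted via basonym
    Genome("genome-B", ["DSM 1234"], 12345, "Otherus beta"),    # rejected by validation
    Genome("genome-C", ["DSM 1234"], 99999, "Examplea alpha"),  # different Taxonomy ID
]
print(match_genomes(strains, genomes))  # [(1, 'genome-A')]
```

Keying the index on both the collection number and the Taxonomy ID mirrors the description above. Real genome records may use Taxonomy IDs at a different rank than the BacDive strain, which would require an extra mapping step not shown in this sketch.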
Result
We were able to associate almost 255,000 genomes from NCBI, PATRIC, and IMG with almost 10,000 BacDive strains. The following table gives an overview of all data obtained in this process. Before this work, only 1,333 BacDive strains had genome associations, so the number of strains with genomes has increased about 7.5-fold. Notably, nearly two-thirds of the type strains in BacDive (8,823 of 14,091, about 63%) now have a sequenced genome.
Data type | Number of entries |
---|---|
Genomes from all resources | 254,966 |
Genomes from NCBI | 112,799 |
Genomes from PATRIC | 136,863 |
Genomes from IMG | 5,304 |
Strains with genomes | 9,970 |
Complete genomes | 18,477 |
Strains with complete genomes | 2,738 |
Type strains in BacDive | 14,091 |
Type strains with genomes | 8,823 |
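The derived figures quoted in the text can be re-checked directly against the table. The short Python snippet below only restates the published counts (the variable names are illustrative) and confirms that the per-source genome counts add up to the overall total.

```python
# Counts copied from the table above
genomes_by_source = {"NCBI": 112_799, "PATRIC": 136_863, "IMG": 5_304}
total_genomes = 254_966
strains_with_genomes_before = 1_333  # from the text: strains with genomes before the matching
strains_with_genomes = 9_970
type_strains = 14_091
type_strains_with_genome = 8_823

# The per-source counts add up to the overall total
assert sum(genomes_by_source.values()) == total_genomes

fold_increase = strains_with_genomes / strains_with_genomes_before
type_strain_coverage = type_strains_with_genome / type_strains

print(f"Fold increase in strains with genomes: {fold_increase:.2f}")  # 7.48
print(f"Type strains with a genome: {type_strain_coverage:.1%}")      # 62.6%
```

The first ratio is the roughly 7.5-fold increase mentioned in the text; the second is the share of BacDive type strains with a sequenced genome.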