A recent article reported the recovery of close to 8,000 genomes from metagenomes. These can be found in NCBI’s RefSeq genome database. Here’s how I’ve been checking the number of genomes available.

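The pieces of information needed are the BioProject accession for the study (PRJNA348753) and the prokaryotes.txt genome report from NCBI’s FTP site.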

So here’s how I do this. I run these commands, and this is what they gave me today:

% wget ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
--2017-10-18 09:12:05--  ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
           => ‘prokaryotes.txt’
Resolving ftp.ncbi.nlm.nih.gov... 130.14.250.12, 2607:f220:41e:250::7
Connecting to ftp.ncbi.nlm.nih.gov|130.14.250.12|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /genomes/GENOME_REPORTS ... done.
==> SIZE prokaryotes.txt ... 36361516
==> PASV ... done.    ==> RETR prokaryotes.txt ... done.
Length: 36361516 (35M) (unauthoritative)

prokaryotes.txt        100%[==========================>]  34.68M  2.94MB/s    in 13s     

2017-10-18 09:12:18 (2.71 MB/s) - ‘prokaryotes.txt’ saved [36361516]

% ls -l prokaryotes.txt 
-rw-r-----  1 xxx  staff  36361516 Oct 18 09:12 prokaryotes.txt
% grep PRJNA348753 prokaryotes.txt | wc -l
    7898
%

So there you have it! Today, October 18th, 2017, 7,898 of the 7,903 reported genomes are there. Almost all of them! When I started counting last week, I found around 5,000. On October 5th there were only 2,800. Things move fast.
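
Since the count keeps changing, it can be handy to wrap the check in a couple of small shell functions. What follows is only a sketch, not necessarily how the count above was produced: the function names `count_project` and `log_count` are made up for illustration, and it assumes prokaryotes.txt has already been downloaded as shown above. It uses `grep -c` instead of `grep | wc -l`, plus `-w` so that a longer accession that merely starts with the same characters is not counted.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not part of any NCBI tool).

# Print the number of lines in FILE that mention ACCESSION.
# Usage: count_project ACCESSION FILE
count_project() {
    # -c prints the count of matching lines;
    # -w matches the accession only as a whole word.
    grep -cw -- "$1" "$2"
}

# Append today's date and the current count to a tab-separated log file.
# Usage: log_count ACCESSION FILE LOGFILE
log_count() {
    printf '%s\t%s\n' "$(date +%Y-%m-%d)" "$(count_project "$1" "$2")" >> "$3"
}
```

After a fresh download, `count_project PRJNA348753 prokaryotes.txt` prints the count, and `log_count PRJNA348753 prokaryotes.txt counts.tsv` (the log file name is arbitrary) adds one dated line per run, which makes it easy to see how fast the number grows.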

Best!
-SuperGabo
