A recent article reported the recovery of close to 8000 genomes from metagenomes. These can be found in NCBI’s RefSeq genome database. Here’s how I’ve been checking how many of those genomes are available so far.
The pieces of information needed are:
- The project number for these genomes (found in the article): PRJNA348753.
- The address for NCBI’s RefSeq genome database: ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
- A UNIX terminal. 🙂
OK, so here’s how I do this. I ran these commands, and here’s what they gave me today:
    % wget ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
    --2017-10-18 09:12:05-- ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt
               => ‘prokaryotes.txt’
    Resolving ftp.ncbi.nlm.nih.gov... 130.14.250.12, 2607:f220:41e:250::7
    Connecting to ftp.ncbi.nlm.nih.gov|130.14.250.12|:21... connected.
    Logging in as anonymous ... Logged in!
    ==> SYST ... done. ==> PWD ... done.
    ==> TYPE I ... done. ==> CWD (1) /genomes/GENOME_REPORTS ... done.
    ==> SIZE prokaryotes.txt ... 36361516
    ==> PASV ... done. ==> RETR prokaryotes.txt ... done.
    Length: 36361516 (35M) (unauthoritative)
    prokaryotes.txt 100%[==========================>] 34.68M 2.94MB/s in 13s
    2017-10-18 09:12:18 (2.71 MB/s) - ‘prokaryotes.txt’ saved [36361516]
    % ls -l prokaryotes.txt
    -rw-r----- 1 xxx staff 36361516 Oct 18 09:12 prokaryotes.txt
    % grep PRJNA348753 prokaryotes.txt | wc -l
    7898
    %
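If you want to repeat the check without retyping everything, the same steps can be wrapped in two small shell functions. This is only a sketch: the names fetch_report and count_project are my own (hypothetical), and it assumes wget and grep are installed. It uses grep -c, which prints the number of matching lines, so it should give the same number as the grep | wc -l pipeline.

```shell
#!/bin/sh
# Hypothetical helper names; they are not part of any NCBI tool.

# Download a report file quietly.
# Usage: fetch_report URL OUTPUT_FILE
fetch_report() {
    # -q suppresses the progress log, -O sets the output file name
    wget -q -O "$2" "$1"
}

# Print how many lines of a report mention a BioProject accession.
# Usage: count_project ACCESSION REPORT_FILE
count_project() {
    # -c counts matching lines, -F treats the accession as a fixed string
    grep -c -F -- "$1" "$2"
}

# Example (needs network access, so it is left commented out):
#   fetch_report ftp://ftp.ncbi.nlm.nih.gov/genomes/GENOME_REPORTS/prokaryotes.txt prokaryotes.txt
#   count_project PRJNA348753 prokaryotes.txt
```

Note that grep exits with status 1 when nothing matches, even though it still prints 0, so check the printed number rather than the exit status.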
So there you have it! Today, October 18th, 2017, there are 7898 of the 7903 genomes reported. Almost all of them! When I started counting last week, I found around 5000. On October 5th there were only 2800. Things move fast.
Best!
-SuperGabo