Task: Download all Technorati politics blog listings, import them into FileMaker Pro, and parse each blog.
First, figure out how many pages actually hold blogs. Not every page in the listing contains blogs: the politics set lists 697 pages, but the last blog is on page 691.
In Terminal, create a folder, change into it, and download the pages using wget:
wget http://www.technorati.com/blogs/directory/politics/page-{1..691}
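With 691 requests, a few may fail or come back empty, so it can be worth checking for gaps before moving on. The following is a minimal sketch, not part of the original workflow: `missing_pages` is a hypothetical helper name, and it assumes the files are saved as `page-N` in the target folder, which is how wget names them by default for the URLs above.

```shell
# Print the page numbers whose downloaded file is missing or empty.
# Usage: missing_pages LAST_PAGE [DIRECTORY]
# Assumption: files are named page-1, page-2, ... (wget's default for these URLs).
# The body runs in a subshell, so its variables do not leak into the caller.
missing_pages() (
    last=$1
    dir=${2:-.}
    n=1
    while [ "$n" -le "$last" ]; do
        # -s is true only when the file exists and is larger than zero bytes
        [ -s "$dir/page-$n" ] || echo "$n"
        n=$((n + 1))
    done
)
```

Running `missing_pages 691` in the download folder prints nothing when every page is present. Any number it does print can be fetched again with `wget -O "page-$n"` and the same URL, where `-O` overwrites the empty file instead of creating a second copy.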
FileMaker won't import a folder unless the files have a file extension it recognizes. A little bash work takes care of that:
for i in *; do mv "$i" "$i.txt"; done
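The one-liner above renames everything in the folder, so running it a second time would produce names ending in `.txt.txt`. A slightly more defensive variant, offered as a sketch only, skips files that already end in `.txt` and anything that is not a regular file. The function name `add_txt_extension` is made up for this example.

```shell
# Append .txt to every regular file in a directory that lacks the extension.
# Usage: add_txt_extension [DIRECTORY]
# Safe to run more than once: files already ending in .txt are left alone.
# The body runs in a subshell, so its variables do not leak into the caller.
add_txt_extension() (
    dir=${1:-.}
    for f in "$dir"/*; do
        # Skip subdirectories, and the literal pattern if the folder is empty
        [ -f "$f" ] || continue
        case "$f" in
            *.txt) ;;                    # already has the extension
            *) mv -- "$f" "$f.txt" ;;    # add it
        esac
    done
)
```

Calling `add_txt_extension` with no argument works on the current folder, the same as the original loop.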
Now off to FileMaker. I create a new database and import the folder, choosing text files as the file type and telling it to create a new table. The text content of each file is loaded into the new database, along with the file's name and the path it was imported from. Once that's done, all that's left is to parse out the individual blogs.
But that's a tale for Part II.