The Article Scraper (Part 1 of the BH MMS)
Here’s the first part of a four-part series about building your own blackhat money-making system (BH MMS).
The Article Scraper scrapes articles from sites and outputs them in SQL format, ready to be imported into your Engine database. Let’s cover the Scraper’s user interface first, then the usage details.

1. Table Name: The name of the table under which your articles will be stored. You’ll create a new table for every niche you target. For instance, you could create tables for: golf, playstation, mortgage, insurance, internet_marketing, furniture, jewellery, etc.
2. SQL Output File Name: The name of the output file, with a .sql extension.
3. Maximum Pages Per URL: This limits the number of pages scraped from a specific site. Some sites contain forums, so limiting the number of pages spidered prevents the Scraper from scraping an entire forum, which could take days!
4. Maximum Page Size: Some sites have very large pages which would certainly slow down the Scraper, so this option limits the size of scraped pages.
5. URLs: A list of sites to be scraped. Enter one URL per line, including the ‘http’ prefix.
6. Start (button): After entering the above settings, the Start button starts the process of scraping. When the scraping process is complete and the SQL file created, a ‘finished’ dialog box is displayed.
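To make the settings above more concrete, here is a minimal Python sketch of how they might fit together. It is not taken from the Scraper’s source (which is written in VB.NET): the names ScraperSettings, parse_url_list and within_limits are hypothetical, the default values are made up, and treating the page size as a byte count is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ScraperSettings:
    """Hypothetical container mirroring the fields of the Scraper UI."""
    table_name: str
    sql_output_file: str
    max_pages_per_url: int = 50     # assumed default, not from the Scraper
    max_page_size: int = 200_000    # assumed unit: bytes
    urls: list = field(default_factory=list)


def parse_url_list(text):
    """Split the URLs box into one URL per line, keeping only http(s) lines."""
    urls = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(("http://", "https://")):
            urls.append(line)
    return urls


def within_limits(settings, pages_done, page_size):
    """Apply both limits: pages already scraped for this site, and page size."""
    if pages_done >= settings.max_pages_per_url:
        return False
    return page_size <= settings.max_page_size
```

A crawler loop would call within_limits before keeping each page, and stop following links on a site once it returns False because the page count has been reached.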
The Scraper can be downloaded here. Unzip it, then click Setup to install. If you have problems with the SpiderXlib ActiveX control, download it from here and install it.
Your firewall may block the Scraper from accessing the Web, in which case you might get an ‘Invalid URL’ message. This message means either that a specified URL is invalid or that the Scraper cannot access the Web. If this happens and your URLs are correct, modify your firewall settings.
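Since the same ‘Invalid URL’ message covers two different problems, it can help to check them separately. The following Python sketch is one way to do that outside the Scraper; the function names and the returned labels are hypothetical, and it uses only the Python standard library, not anything from the Scraper or SpiderXlib.

```python
from urllib.error import HTTPError
from urllib.parse import urlparse
from urllib.request import urlopen


def is_well_formed(url):
    """True if the URL has an http(s) scheme and a host part."""
    try:
        parts = urlparse(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def diagnose(url, fetch=urlopen, timeout=10):
    """Return 'malformed', 'unreachable', 'http-error' or 'ok' for a URL.

    `fetch` is injectable so the function can be tried without a network.
    """
    if not is_well_formed(url):
        return "malformed"          # fix the URL itself
    try:
        response = fetch(url, timeout=timeout)
    except HTTPError:
        return "http-error"         # the server answered, so the Web is reachable
    except OSError:
        return "unreachable"        # possibly a firewall or connection problem
    response.close()
    return "ok"
```

If diagnose reports ‘unreachable’ for URLs that open fine in your browser, the firewall is the likely suspect; ‘malformed’ usually means a missing ‘http’ prefix.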
Here’s the Article Scraper source code (written in VB.NET). Note that there is little error checking, and I’ve kept it as simple as possible.
If you start playing with the Scraper now, you’ll be able to use it properly when the other parts of the BH MMS are published. Simply create a MySQL database and add a table with two fields, ID and articles: ‘ID’ is auto-increment and the primary key, while ‘articles’ is a text field to hold the articles. Then scrape a few sites and import the articles into the database.
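As an illustration of the table layout just described, here is a short Python sketch that builds a CREATE TABLE statement and matching INSERT statements as text. It is only a guess at what an importable SQL file could look like: the function names are hypothetical, the exact column types are assumptions, the escaping assumes MySQL’s default SQL mode, and the Scraper’s real output format may differ (for example, it may not include a CREATE TABLE statement at all).

```python
import re


def check_table_name(table_name):
    """Allow only letters, digits and underscores, e.g. internet_marketing."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", table_name):
        raise ValueError("table name may contain only letters, digits and underscores")
    return table_name


def create_table_sql(table_name):
    """Two fields: auto-increment primary key `ID` and a text field `articles`."""
    name = check_table_name(table_name)
    return (
        f"CREATE TABLE IF NOT EXISTS `{name}` (\n"
        "  `ID` INT UNSIGNED NOT NULL AUTO_INCREMENT,\n"
        "  `articles` TEXT NOT NULL,\n"
        "  PRIMARY KEY (`ID`)\n"
        ");"
    )


def escape_sql_string(text):
    """Escape backslashes and single quotes for a MySQL string literal."""
    return text.replace("\\", "\\\\").replace("'", "\\'")


def insert_sql(table_name, article):
    """One INSERT per article; `ID` is left to auto-increment."""
    name = check_table_name(table_name)
    escaped = escape_sql_string(article)
    return f"INSERT INTO `{name}` (`articles`) VALUES ('{escaped}');"


def build_sql_dump(table_name, articles):
    """Join the CREATE TABLE and INSERT statements into one importable text."""
    lines = [create_table_sql(table_name)]
    lines.extend(insert_sql(table_name, article) for article in articles)
    return "\n".join(lines) + "\n"
```

Writing the result of build_sql_dump("golf", articles) to a file such as golf.sql would give you something you can import with the mysql command-line client or phpMyAdmin.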
When you’ve got some experience of using the Scraper, your goal is to scrape several thousand articles per niche, then import them into the Engine database. This will provide the Engine algorithm with enough content to generate hundreds of thousands (or even millions) of unique posts. I’ll explain more when I publish Part 2.
Update: I recommend you install the SpiderXlib ActiveX control here before running the Scraper (and obviously you must have .NET 2.0 installed as well). Let me know how you get on.
