The WebGrabber object helps you parse table-structured data into a database. It is useful not only for HTML files that contain data in a table, but also for parsing a CSV file into the database. An early version of this code was used in my BeursMonitor application to grab stock data from the web.

Besides the base methods URL, TextToArray and AddTableToRecordset, there are many functions that help you clean up the result. The sample below shows a selection of these functions. See the samples and the source code for more of them.

Click here to download the complete source code, the compiled DLL and the demo.

set WebGrabber = CreateObject("EVICT_Webgrabber.WebGrabber")
' Setting the URL will also trigger the download from the web.
WebGrabber.URL = "http://www.microsoft.com/Seminar/MMCFeed/MMCDisplay.asp?lang=en&pf=100504&frame=true"
MsgBox "Finished downloading the web page."
' Parse the table inside the string to an array
WebGrabber.TextToArray "<TR", "<TD", True
' This will put the date into an extra column
WebGrabber.SplitColumn 1, "</A></B><BR>"
' This will put the speed into an extra column
WebGrabber.SplitColumn 1, "target=_self>"
' This will extract the text between two markers and add it as an extra column
WebGrabber.ExtractTextBetween 1, "('", "');"
' We only want the lines with the seminars
WebGrabber.RemoveLinesNotContaining 10, "/Seminar/"
' This will replace the relative path with the full URL
WebGrabber.ReplaceText 10, "/Seminar/", "http://www.microsoft.com/Seminar/"
' Just for fun, grab all the domain names and put them in an extra column.
WebGrabber.RegularExpresssion 10, "[a-z0-9_-]{1,64}\.[a-z0-9_-]{1,64}\.[a-z0-9_-]{1,64}", ";"
' This will split the title and the description
WebGrabber.SplitColumn 3, vbCrLf & vbTab
' Remove all the HTML tags from the table
WebGrabber.RemoveHTMLtags
' Use this database connect string
WebGrabber.DSN = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=Seminairs.mdb;Mode=ReadWrite"
' Use this recordset
WebGrabber.Recordset = "tblSeminairs"
' First clear the old data
WebGrabber.RunSQL "Delete * from tblSeminairs"
' Then add the data in the table to the recordset
WebGrabber.AddTableToRecordset 0, 0, "Quality=1;Title=3;Published=5;Category=6;Level=7;Duration=9;URL=10;Description=11"
MsgBox "Finished parsing the web page into the database."
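
To make the idea behind TextToArray, SplitColumn, RemoveHTMLtags and the column mapping of AddTableToRecordset easier to follow, here is a rough conceptual sketch in Python. It is not the code inside the DLL: the function names, the zero-based column numbers and the sample HTML are all assumptions made for illustration, and the real component may behave differently in the details.

```python
import re


def text_to_array(text, row_marker, cell_marker):
    """Cut the text into rows at row_marker and each row into cells at cell_marker."""
    rows = []
    # Whatever comes before the first row marker is not table data, so skip it.
    for row_chunk in text.split(row_marker)[1:]:
        # Likewise, skip whatever precedes the first cell marker in the row.
        cells = row_chunk.split(cell_marker)[1:]
        if cells:
            rows.append(cells)
    return rows


def split_column(table, col, separator):
    """Cut column `col` at the first separator; the remainder becomes a new last column."""
    for row in table:
        if col < len(row):
            head, _, tail = row[col].partition(separator)
            row[col] = head
            row.append(tail)


def strip_tags(cell):
    """Remove HTML tags from one cell."""
    # Splitting on "<TD" leaves the tail of the opening tag (for example ">") behind.
    cell = re.sub(r"^[^<>]*>", "", cell)
    return re.sub(r"<[^>]*>", "", cell).strip()


def map_columns(table, mapping):
    """Turn rows into records using a 'Field=column;Field=column' mapping string."""
    fields = []
    for item in mapping.split(";"):
        name, index = item.split("=")
        fields.append((name, int(index)))
    return [{name: row[index] for name, index in fields} for row in table]


# Hypothetical sample page, made up for this sketch.
html = (
    "<TABLE>"
    "<TR><TD>Intro to COM</TD><TD><A href='/Seminar/com.asp'>Watch</A></TD></TR>"
    "<TR><TD>Scripting basics</TD><TD><A href='/Seminar/vbs.asp'>Watch</A></TD></TR>"
    "</TABLE>"
)

table = text_to_array(html, "<TR", "<TD")
split_column(table, 1, "href='")  # new column 2 starts with the link target
split_column(table, 2, "'>")      # column 2 is now the link target only
table = [[strip_tags(cell) for cell in row] for row in table]

# Only the columns named in the mapping are used; the others are ignored.
records = map_columns(table, "Title=0;URL=2")
for record in records:
    print(record)
```

As in the VBScript sample, the cleanup steps leave some scratch columns behind, and the mapping string simply picks out the columns worth storing.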