Requirements
Windows Vista/7/8/10
64 MB RAM
100 MB Hard Disk Space
Stable Internet Connection
WDE Versions
- v8.3 (Released 27.12.2011)
- v8.2 (Released 06.07.2011)
- v8.1 (Released 27.12.2010)
- v8.0 (Released 01.07.2010)
- v7.3 (Released 17.06.2009)
- v7.2 (Released 10.04.2009)
- v7.1 (Released 15.04.2008)
- v6.0 (Released 17.04.2007)
- v5.0 (Released 06.07.2006)
- v4.3 (Released 20.09.2005)
- v4.0 (Released 11.12.2004)
WDE Pro Versions
- v3.10 (Released 06.01.2020)
- v3.9 (Released 30.12.2018)
- v3.8 (Released 29.12.2017)
- v3.7 (Released 28.02.2017)
- v3.6 (Released 22.08.2016)
- v3.5 (Released 28.10.2015)
- v3.4 (Released 03.09.2015)
- v3.3 (Released 05.05.2015)
- v3.2 (Released 30.12.2014)
- v3.1 (Released 05.09.2014)
- v3.0 (Released 23.06.2014)
- v2.3 (Released 08.01.2014)
- v2.2 (Released 15.05.2013)
- v2.1 (Released 27.12.2012)
- v2.0 (Released 29.08.2012)
- v1.2 (Released 07.06.2012)
- v1.1 (Released 04.04.2012)
- v1.0 (Released 12.03.2012)
What are people saying?
Student License
Recently I was looking for a URL extractor program because I am building a search engine, and I have tried nearly 20 of them, all with some missing features. But WDE seems to be the best one right now. I am just a student and the normal price is way too high for me. Do you have any student license to offer? I would really appreciate it.
Best Regards, Jeffrey Richardson, Sweden
Web Addresses
We downloaded and ran the trial version of your web link extractor. I compared it to another email extractor program and yours kicked its butt. Yours scanned 9000 files while finding over 1500 links, vs. the other, which only scanned 1200 files and found only about 400 links. (This was using the exact same search file.)
Thanks, Mark Jeter
Email List Management
A perfect tool for creating, processing, and managing email marketing mailing lists.
ListMotor
URL, Meta Tag Extractor Module:
WDE's URL, Meta Tag Extractor module is designed to extract URLs and meta tags (title, description, keywords) from web pages, search results, open web directories, or a list of URLs in a local file. It is an industrial-strength, fast, and reliable way to extract specific information from websites and web directories such as Google, DMOZ, and Yahoo.
It offers various limiters on the scanning range - URL filter, page text filter, domain filter - so you can extract only the links or data you actually need from web pages, instead of extracting every link present there. As a result, you build your own custom, targeted database of URLs. Extracted links can be saved directly to a disk file, so there is no limit on the number of links extracted per session. It supports operation through a proxy server and works very fast, since it can load several pages simultaneously while requiring very few resources. A powerful link spider and harvester: extraction tools for responsible website promotion and internet marketing.
You can set up different types of extraction with this UNIQUE spider and link extractor:
Search Keyword | WebSite | Web Directories | List of URLs from File
Keywords:
WDE spiders 18+ search engines for the right websites and gets data from them.
Quick Start:
Select "Search Engines" source - Enter keyword - Click OK
What WDE Does: WDE will query 18+ popular search engines, extract all matching URLs from the search results, remove duplicate URLs, and finally visit those websites and extract data from them.
You can tell WDE how many search engines to use. Click the "Engines" button and uncheck the listings that you do not want to use. You can add other engine sources as well.
WDE sends queries to search engines to get matching website URLs. Next, it visits those matching websites for data extraction. How deep it spiders within the matching websites depends on the "Depth" setting on the "External Site" tab.
DEPTH:
Here you tell WDE how many levels to dig down within the specified website. If you want WDE to stay on the first page, just select "Process First Page Only". A setting of "0" will process and look for data in the whole website. A setting of "1" will process the index or home page plus the associated files under the root dir only.
For example: WDE is going to visit the URL http://www.xyz.com/product/milk/ for data extraction.
Let's say www.xyz.com has the following text/html pages:
- http://www.xyz.com/
- http://www.xyz.com/contact.htm
- http://www.xyz.com/about.htm
- http://www.xyz.com/product/
- http://www.xyz.com/product/support.htm
- http://www.xyz.com/product/milk/
- http://www.xyz.com/product/water/
- http://www.xyz.com/product/milk/baby/
- http://www.xyz.com/product/milk/baby/page1.htm
- http://www.xyz.com/product/milk/baby/page2.htm
- http://www.xyz.com/product/water/mineral/
- http://www.xyz.com/product/water/mineral/news.htm
WDE is a powerful and fully featured spider! You decide how deep you want WDE to look for data.
| WDE can retrieve: | Set options: |
|---|---|
| Only the matching URL page of the search (URL #6) | Select "Process First Page Only" |
| The entire milk dir (URLs #6-10) | Select "Depth=0" and check "Stay within Full URL" |
| The entire www.xyz.com site | Select "Depth=0" |
| Only the www.xyz.com page | Select "Process First Page Only" and check "Spider Base URL Only" |
| Only root dir files (URLs #1-3) | Select "Depth=1" |
| Only URLs #1-5 | Select "Depth=2" |
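The depth rule in the table above can be sketched in Python. This is an illustrative reconstruction of the documented behavior, not WDE's actual code; the `crawl_level` helper and its level-counting convention are assumptions inferred from the table ("/" and root files are level 1, "/product/" is level 2, and so on, with Depth=0 meaning the whole site):

```python
from urllib.parse import urlparse

def crawl_level(url):
    # Count the directory segments in the path, plus one for the page itself:
    # "/" and "/contact.htm" -> 1, "/product/support.htm" -> 2,
    # "/product/milk/" -> 3
    path = urlparse(url).path
    dirs = [seg for seg in path.split("/")[:-1] if seg]
    return len(dirs) + 1

def within_depth(url, depth):
    # Depth=0 means "process the whole website" in WDE's convention.
    return depth == 0 or crawl_level(url) <= depth

pages = [
    "http://www.xyz.com/",
    "http://www.xyz.com/contact.htm",
    "http://www.xyz.com/about.htm",
    "http://www.xyz.com/product/",
    "http://www.xyz.com/product/support.htm",
    "http://www.xyz.com/product/milk/",
]
# Depth=2 keeps URLs #1-5 and drops the deeper milk dir, as in the table.
print([u for u in pages if within_depth(u, 2)])
```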
Spider Base URL Only:
With this option you can tell WDE to always process only the base URLs of external sites. For example, in the case above, if an external site is found like http://www.xyz.com/product/milk/, then WDE will grab only the base www.xyz.com. It will not visit http://www.xyz.com/product/milk/ unless you set a depth that also covers the milk dir.
Ignore Case of URLs:
Set this option to avoid duplicate URLs like
http://www.xyz.com/product/milk/
http://www.xyz.com/Product/Milk/
These two URLs are the same. When you set WDE to ignore URL case, it converts all URLs to lowercase and can remove duplicate URLs like the above. However, some servers are case-sensitive, and you should not use this option on those sites.
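The case-insensitive de-duplication described here amounts to lowercasing URLs before comparing them. A minimal sketch of that rule (not WDE's code; the function name is ours):

```python
def dedupe_ignore_case(urls):
    # Lowercase each URL for the comparison key, but keep the first-seen
    # original spelling. As the docs warn, this is only safe when the
    # target server treats paths case-insensitively.
    seen = set()
    unique = []
    for url in urls:
        key = url.lower()
        if key not in seen:
            seen.add(key)
            unique.append(url)
    return unique

print(dedupe_ignore_case([
    "http://www.xyz.com/product/milk/",
    "http://www.xyz.com/Product/Milk/",
]))
# -> ['http://www.xyz.com/product/milk/']
```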
WebSites:
Enter website URL and extract all data found in that site.
Quick Start:
Select 2nd option "WebSite/Dir/Groups" - Enter website URL - Select Depth - Click OK
What WDE Does: WDE will retrieve html/text pages of the website according to the Depth you specified and extract all data found in those pages.
# By default, WDE stays within the current domain only.
# WDE can also follow external sites! If you want WDE to retrieve files from external sites that are linked from the starting site specified on the "General" tab, you need to set "Follow External URLs" on the "External Site" tab. In this case, by default, WDE will follow external sites only once; that is, (1) WDE will process the starting address and (2) all external sites found at the starting address. It will not follow the external sites found in (2), and so on. WDE is powerful: if you want it to follow external sites in an unlimited loop, select "Unlimited" in the "Spider External URLs Loop" combo box, and remember that you will need to stop the WDE session manually, because this way WDE can travel the entire internet.
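The "follow external sites only once" rule can be pictured as a breadth-first crawl that counts how many times it has crossed to a new domain. This is a hypothetical sketch of that rule, not WDE's implementation; `get_links` is a stand-in callable for "fetch a page and extract its links":

```python
from collections import deque
from urllib.parse import urlparse

def domain(url):
    return urlparse(url).netloc.lower()

def crawl(start_url, get_links, max_hops=1):
    # Each queued entry is (url, hops), where hops counts domain
    # crossings since the starting address. max_hops=1 mimics the
    # default "follow external sites only once"; max_hops=None mimics
    # the "Unlimited" loop setting (stop it manually!).
    seen = {start_url}
    queue = deque([(start_url, 0)])
    visited = []
    while queue:
        url, hops = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if link in seen:
                continue
            hop = hops if domain(link) == domain(url) else hops + 1
            if max_hops is not None and hop > max_hops:
                continue  # external site beyond the allowed loop depth
            seen.add(link)
            queue.append((link, hop))
    return visited
```

With `max_hops=1`, external sites linked from the starting address are visited, but external sites linked from *those* sites are not.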
Directories:
Choose Yahoo, Google, DMOZ, or another directory and get all the data from there.
Quick Start & What WDE Does:
Let's say you want to extract data for all companies listed at
http://directory.google.com/Top/Computers/Software/Freeware/
Action #1A: Select the 2nd option "WebSite/Dir/Groups" - enter this URL in the "Starting Address" box - select "Process First Page Only"
Or, let's say you want to extract data for all companies listed at
http://directory.google.com/Top/Computers/Software/Freeware/
plus all lower-level folders such as
http://directory.google.com/Top/Computers/Software/Freeware/windows
http://directory.google.com/Top/Computers/Software/Freeware/windows/browser
http://directory.google.com/Top/Computers/Software/Freeware/linux
etc....
Action #1B: Select the 2nd option "WebSite/Dir/Groups" - enter the URL http://directory.google.com/Top/Computers/Software/Freeware/ in the "Starting Address" box - select Depth=0 and the "Stay within Full URL" option.
With these actions WDE will download http://directory.google.com/Top/Computers/Software/Freeware/ (and, optionally, all lower-level pages) and build a list of the URLs of the companies listed there.
Now you want WDE to visit all those URLs and extract all data found in those sites.
Action #2: After either action above, move to the "External Site" tab and check the "Follow External URLs" option. (Remember: this setting tells WDE to process/follow/visit all URLs found while processing the "Starting Address" of the "General" tab.)
List of URLs:
Enter hundreds or thousands of URLs to extract data from those sites.
Quick Start:
Select the 3rd option "URLs from File" - Enter the file name that contains the URL list - Select Depth - Click OK
What WDE Does: WDE will scan the contents of the specified file. This file must have one URL per line; other formats are not supported, and WDE will accept only lines that start with the text http://. It also will not accept URLs that point to image/binary files, because those files will not have any data.
After building a unique URL list from the above file, WDE will process the websites one by one according to the depth you specify.
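The file-parsing rules just described (one URL per line, http:// lines only, no image/binary targets, duplicates removed) can be sketched as follows. This is an illustrative reconstruction; the extension list is our assumption, since the docs do not enumerate which binary types are rejected:

```python
# Assumed set of image/binary extensions to reject; WDE's real list
# is not documented here.
IMAGE_BINARY_EXTS = (".gif", ".jpg", ".jpeg", ".png", ".bmp", ".zip", ".exe")

def parse_url_lines(lines):
    # Keep only lines that start with "http://", skip URLs pointing at
    # image/binary files, and drop duplicates while preserving order.
    urls, seen = [], set()
    for line in lines:
        url = line.strip()
        if not url.startswith("http://"):
            continue
        if url.lower().endswith(IMAGE_BINARY_EXTS):
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls

# Reading from a URL list file is then just:
# with open("urls.txt") as f:
#     urls = parse_url_lines(f)
```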
Frequently Asked Questions
Does this extractor require 'Internet Explorer'?
I set up a project with "URLs from File" extraction and entered the filename, but WDE cannot find any links in the file?
When I aim this extractor at http://dmoz.org/Kids_and_Teens/Computers/Internet/ I would expect to see all the links listed there with descriptions. Why don't I?
When I run the WDE link extractor, it consumes all my computer's power and the screen hardly refreshes?
Can I resume an interrupted session in WDE?
How can I add search engine listings other than those specified in the Engine Listing dialog?
Does this extractor require 'Internet Explorer'?
No. It doesn't require any third party software/library.
I set up a project with "URLs from File" extraction and entered the filename, but WDE cannot find any links in the file?
Make sure the file exists on disk. The file must have one URL per line; other formats are not supported, and WDE will accept only lines that start with the text http://. Also, WDE will not accept URLs that point to image/binary files, because those files have no text data to extract.
When I aim this extractor at http://dmoz.org/Kids_and_Teens/Computers/Internet/ I would expect to see all the links listed there with descriptions. Why don't I?
After entering http://dmoz.org/Kids_and_Teens/Computers/Internet/ in the starting address box, move to the "External Site" tab and check the "Follow External URLs" option. This option tells Web Data Extractor to visit all linked sites and extract the title and other info.
When I run the WDE link extractor, it consumes all my computer's power and the screen hardly refreshes?
It seems you are using a high number of threads. Decrease the thread value to "5" on the "New Session - Other" tab. WDE can launch multiple threads simultaneously, but too high a thread setting may be too much for your computer and/or internet connection to handle, and it also puts an unfair load on the host server, which may slow the process down.
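The trade-off behind the thread setting is the standard one for any concurrent downloader: a bounded worker pool caps the load on both your machine and the remote server. A minimal sketch in Python (not WDE's code; the function names are ours):

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request

def fetch(url):
    # Download one page; return None on error instead of raising.
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except OSError:
        return None

def fetch_all(urls, fetch=fetch, threads=5):
    # A modest pool size (the FAQ suggests 5) keeps CPU, bandwidth,
    # and host-server load reasonable; results come back in input order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, urls))

# Usage (network access required):
# pages = fetch_all(["http://example.com/", "http://example.org/"])
```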
Can I resume an interrupted session in WDE?
Yes. Use 'File - Open' command to open a previously stopped session.
How can I add search engine listings other than those specified in the Engine Listing dialog?
It is easy. In the "URL" field, type the search query URL, replacing the search keyword part with the WDE syntax {SEARCH_KEYWORD}.
For Example: an AOL query URL with "Flower Shop" search is:
http://search.aol.com/dirsearch.adp?query=Flower+Shop
You just replace the Flower+Shop part with {SEARCH_KEYWORD}, like the following:
http://search.aol.com/dirsearch.adp?query={SEARCH_KEYWORD}
After adding the new engine listing, click the "Save" button.
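At query time, the placeholder substitution described above is simply a string replace plus URL-encoding of the keyword (spaces become "+"). A sketch of that step, with a function name of our own choosing:

```python
from urllib.parse import quote_plus

def build_query_url(template, keyword):
    # Substitute the {SEARCH_KEYWORD} placeholder, encoding the keyword
    # the way query strings expect (e.g. "Flower Shop" -> "Flower+Shop").
    return template.replace("{SEARCH_KEYWORD}", quote_plus(keyword))

template = "http://search.aol.com/dirsearch.adp?query={SEARCH_KEYWORD}"
print(build_query_url(template, "Flower Shop"))
# -> http://search.aol.com/dirsearch.adp?query=Flower+Shop
```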