Tuesday, September 21, 2010

Get HTML Source Code

The HTML source code of a website can be read with the class WebClient. It provides various methods for sending and receiving internet resources and is located in the namespace System.Net, so the directive using System.Net; is needed.
To get the source code, two methods can be used: DownloadString(string address) and DownloadFile(string address, string fileName).
The first downloads the source code of the given address and returns it as a string; the second saves the source code to the given file.
The following example reads the source code of the RSS feed of this blog:

// Requires: using System.Net;
WebClient Webclient1 = new WebClient();
string SourceCode = Webclient1.DownloadString("http://csharp-tricks.blogspot.com/feeds/posts/default?orderby=updated");
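DownloadFile can be used in much the same way. The following is only a minimal sketch: the class SourceSaver, the method SaveSource and the file name feed.xml are names made up for illustration and are not part of the framework.

```csharp
using System;
using System.IO;
using System.Net;

public static class SourceSaver
{
    // Hypothetical helper: saves the resource at 'url' to 'fileName'
    // and returns the size of the written file in bytes.
    public static long SaveSource(string url, string fileName)
    {
        // WebClient implements IDisposable, so it is released after use.
        using (WebClient client = new WebClient())
        {
            client.DownloadFile(url, fileName);
        }
        return new FileInfo(fileName).Length;
    }
}
```

A call could then look like SourceSaver.SaveSource("http://csharp-tricks.blogspot.com/feeds/posts/default?orderby=updated", "feed.xml"); which should leave the feed in the file feed.xml in the current working directory.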

In some cases the encoding has to be changed for the text to be displayed correctly. The WebClient downloads the requested resource as a byte array, which is then converted into a string using a character encoding (UTF-8 is one of the most common).
To change the encoding, set the Encoding property of the WebClient instance before downloading.
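As a rough sketch of how this might look, the code below sets the Encoding property before calling DownloadString. The class EncodingDemo and its methods DownloadAsUtf8 and Decode are hypothetical names chosen for this example, and UTF-8 is only assumed to be the right choice for the page in question.

```csharp
using System;
using System.Net;
using System.Text;

public static class EncodingDemo
{
    // Hypothetical helper: downloads 'url' and decodes the bytes as UTF-8.
    public static string DownloadAsUtf8(string url)
    {
        using (WebClient client = new WebClient())
        {
            // The encoding has to be set before the download call.
            client.Encoding = Encoding.UTF8;
            return client.DownloadString(url);
        }
    }

    // Offline illustration: the same bytes give different strings
    // depending on the encoding used to decode them.
    public static string Decode(byte[] raw, Encoding encoding)
    {
        return encoding.GetString(raw);
    }
}
```

For example, the two bytes 0xC3 0xA4 decoded with Encoding.UTF8 give the single character "ä", while the same bytes decoded with Encoding.GetEncoding("ISO-8859-1") give the two characters "Ã¤". This is the kind of garbled display that a wrong encoding produces.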
