
Monthly Archives: August 2009

Extracting the HTML source of a website from its URL

I was looking for something short and sweet to try, so I wrote a snippet that extracts the HTML source of the page at a given URL.
Below is the code. I have not declared the namespaces at the top; instead I use fully qualified type names in the code, so it is clear which namespace each object comes from.

The code is self-explanatory, so I won't add any explanation here.

/// <summary>
/// Extracts the HTML source from the URL entered.
/// </summary>
/// <param name="url">URL to fetch the source from.</param>
/// <returns>string: source for the URL entered, or string.Empty if the request fails.</returns>
public static string GetHtmlPageSource(string url)
{
    try
    {
        // make a web request
        System.Net.WebRequest req = System.Net.WebRequest.Create(url);

        // get the response and read from the result stream;
        // the using blocks close the response, stream and reader even when an exception is thrown
        using (System.Net.WebResponse resp = req.GetResponse())
        using (System.IO.Stream st = resp.GetResponseStream())
        using (System.IO.StreamReader sr = new System.IO.StreamReader(st))
        {
            // read all the text in it
            return sr.ReadToEnd();
        }
    }
    catch (System.Exception)
    {
        // any failure (bad URL, network error, HTTP error status) returns an empty string
        return string.Empty;
    }
}
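WebRequest still works, but from .NET 6 onwards it is marked obsolete (compiler warning SYSLIB0014) and the recommended replacement is HttpClient. Below is a minimal sketch of the same snippet rewritten on top of HttpClient, assuming .NET Core 3.0 or later and C# 8. The PageSource class name and the optional client parameter are my own illustrative choices, not part of any library.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class PageSource
{
    // One shared HttpClient for the whole application; creating a new client
    // per request can exhaust sockets under load.
    private static readonly HttpClient SharedClient = new HttpClient();

    /// <summary>
    /// Downloads the HTML source of the URL with HttpClient.
    /// Returns string.Empty if the request fails, like GetHtmlPageSource.
    /// </summary>
    /// <param name="url">Absolute URL to fetch.</param>
    /// <param name="client">Hypothetical optional client to use instead of the shared one (handy for tests).</param>
    public static async Task<string> GetHtmlPageSourceAsync(string url, HttpClient client = null)
    {
        client ??= SharedClient;
        try
        {
            // GetStringAsync throws HttpRequestException for network errors and
            // non-success status codes, TaskCanceledException on timeout, and
            // InvalidOperationException for a relative URL with no BaseAddress.
            return await client.GetStringAsync(url);
        }
        catch (Exception ex) when (ex is HttpRequestException
                                   || ex is TaskCanceledException
                                   || ex is InvalidOperationException
                                   || ex is UriFormatException)
        {
            return string.Empty;
        }
    }
}
```

From an async method you would call it as `string html = await PageSource.GetHtmlPageSourceAsync("https://example.com/");`. Catching only the exceptions listed, rather than every Exception as the original does, lets genuine programming errors surface instead of silently turning into an empty string.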

UPDATE:

If you need to authenticate the request, add the following just before calling req.GetResponse(). Here username and password are two extra string parameters you would add to GetHtmlPageSource:

// authenticate using the credentials passed in, to get access to the page
if (username != null && password != null)
    req.Credentials = new System.Net.NetworkCredential(username, password);

// get the response and read from the result stream
// ... (rest of the method unchanged)
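If you switch to HttpClient, the credentials go on the handler rather than on the request. A minimal sketch, assuming the same username and password inputs as above; AuthenticatedClient and CreateHandler are hypothetical names used only for illustration:

```csharp
using System.Net;
using System.Net.Http;

public static class AuthenticatedClient
{
    // Hypothetical helper: builds a handler that answers the server's
    // authentication challenge with the given credentials.
    public static HttpClientHandler CreateHandler(string username, string password)
    {
        var handler = new HttpClientHandler();

        // same null check as the WebRequest version: only authenticate
        // when both values are supplied
        if (username != null && password != null)
        {
            handler.Credentials = new NetworkCredential(username, password);
        }

        return handler;
    }
}
```

Usage: `using var client = new HttpClient(AuthenticatedClient.CreateHandler(username, password));` followed by `string html = await client.GetStringAsync(url);`. Disposing the client also disposes the handler, since that is the HttpClient(handler) constructor's default.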

 

Posted by on August 16, 2009 in .NET, Code Snippets, Problem Solving

 
