
Advanced Screen Scraping

February 16th, 2009

In the previous article we discussed a simple screen scraping task using C#. Now we move on to advanced screen scraping: scraping data that requires the user to log in first. This is an interesting and challenging task. The following code scrapes data from the Elance site.

// Requires: using System.IO; using System.Net; using System.Text;
// Create the web request object and pass it the URL of the login page.
HttpWebRequest req = (HttpWebRequest)WebRequest.Create("https://secure.elance.com/php/reg/main/signInAHR.php");

// Make the request look like it comes from a browser, e.g. Firefox.
// The property takes only the header value, without a "User-Agent=" prefix.
req.UserAgent = "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.5) Gecko/2008120122 Firefox/3.0.5";

// Attach a new cookie container so the cookies set during login are captured.
req.CookieContainer = new CookieContainer();

// Set values for the request.
req.Method = "POST";
req.ContentType = "application/x-www-form-urlencoded";
req.KeepAlive = true;
req.Accept = "text/javascript, text/html, application/xml, text/xml, */*";
req.Referer = "https://secure.elance.com/php/reg/main/signInIframe.php";

Now the trickiest part: we have to send the user name and password to the requested page. In almost every case the user name and password are sent as POST data, so we have to build the POST data accordingly.

// Build the form body as name=value pairs joined with '&'.
string strNewValue = "referrer=http://www.elance.com/p/landing/provider.html";
strNewValue += "&mode=signin";
strNewValue += "&login_name=elance_User";
strNewValue += "&password=elance_pwd";
strNewValue += "&email1=";
strNewValue += "&email2=";
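Plain concatenation like this only works while the values contain no reserved characters such as `&`, `=` or spaces. A minimal sketch of a helper that URL-encodes each field before joining them is shown here; the `FormEncoder` class and `BuildPostData` method are hypothetical names introduced for illustration and are not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class FormEncoder
{
    // Hypothetical helper (an assumption, not from the original article):
    // joins name/value pairs into an application/x-www-form-urlencoded body,
    // percent-encoding every name and value with Uri.EscapeDataString.
    public static string BuildPostData(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder sb = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            // Separate pairs with '&', but not before the first one.
            if (sb.Length > 0)
            {
                sb.Append('&');
            }
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value));
        }
        return sb.ToString();
    }
}
```

The result could be assigned to strNewValue in place of the concatenated string. Note that Uri.EscapeDataString encodes a space as %20 rather than +; servers generally accept both in form data, but that is worth checking against the target site.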

byte[] byteArray = Encoding.UTF8.GetBytes(strNewValue);

// Set the ContentLength property of the WebRequest.
req.ContentLength = byteArray.Length;

// Get the request stream.
Stream dataStream = req.GetRequestStream();

// Write the data to the request stream.
dataStream.Write(byteArray, 0, byteArray.Length);
dataStream.Close();

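// Send the request and get the response.
HttpWebResponse wr = (HttpWebResponse)req.GetResponse();

// Keep the cookie container; it now holds the login cookies set by the response.
CookieContainer ccTemp = req.CookieContainer;

// Release the connection once the response is no longer needed.
wr.Close();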

Now you can use this cookie container with every further request you send to the site, and the responses will contain the logged-in data.
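As a rough sketch of that follow-up step, the helper below builds a GET request that shares the cookie container from the login and reads the page body. The `LoggedInFetcher` class, its method names, and any URL passed to it are assumptions made for illustration; the original article does not show this part.

```csharp
using System.IO;
using System.Net;

public static class LoggedInFetcher
{
    // Hypothetical helper: creates a GET request that reuses the cookie
    // container filled during login, so the session cookies are sent along.
    public static HttpWebRequest CreateRequest(string url, CookieContainer cookies)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);
        req.Method = "GET";
        req.CookieContainer = cookies;
        return req;
    }

    // Hypothetical helper: sends the request and returns the response body as text.
    public static string Fetch(string url, CookieContainer cookies)
    {
        HttpWebRequest req = CreateRequest(url, cookies);
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        using (StreamReader reader = new StreamReader(resp.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

With the login code above, a call such as LoggedInFetcher.Fetch(pageUrl, ccTemp) would return the HTML of a members-only page, where pageUrl stands for whichever page you want to scrape. The using blocks close the response and stream even if reading fails.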


  1. May 1st, 2009 at 08:56 | #1

    Nice Article! Just what I needed - good to see a screen scraper article go into more detail.
