Automatic login for scraping process

Beau Cowan asked on March 25, 2014 16:58

I'm working on a project where I need to scrape the documents below a selected point in the document tree and export the raw HTML to a tmp directory. I can create the directories just fine, but whenever I try to write the documents themselves, I get the login page instead.

All pages have permissions set for certain users, including the administrator, and this needs to remain in place. I'm not sure whether I can do something with either a Stream or HtmlAgilityPack that lets me log in automatically and get the content I need, or whether I need to find a different method to obtain the results.

Recent Answers

Filip Ligač answered on April 12, 2014 17:49

Hi Beau,

You mentioned that the user permissions need to stay in place, but does this process need to run under an account other than the administrator's? Could the login page be showing up simply because the actions are being executed under a public user account?

Also, if you could post a code snippet showing what you are trying to accomplish, that would be quite helpful.

Thanks.

Beau Cowan answered on April 14, 2014 12:41

I found the solution to my problem a while ago; sorry that I didn't reply to my own question at the time.

I basically issued a WebRequest and injected the cookies from my current administrator session into it, so the request is already authenticated and I don't have to log in to get the pages I need, regardless of permissions. I can then edit the pages as needed for the export using HtmlDocument.
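
For anyone reading later, here is a rough sketch of the cookie-forwarding idea. It is written in Python for brevity and is not the .NET code used in the project; in .NET the corresponding piece would presumably be the CookieContainer on an HttpWebRequest. The function names are hypothetical, and the cookies are assumed to be available as plain name/value pairs copied from the already logged-in session.

```python
import urllib.request


def build_cookie_header(cookies):
    """Join name/value pairs into a single Cookie request header value."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def build_authenticated_request(url, cookies):
    """Create a request that carries the cookies of an already logged-in session."""
    headers = {"Cookie": build_cookie_header(cookies)}
    return urllib.request.Request(url, headers=headers)


def fetch_html(url, cookies, timeout=10.0):
    """Download the raw HTML of a protected page using the supplied session cookies."""
    request = build_authenticated_request(url, cookies)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_html with a page URL and a dictionary of the session's cookies (both placeholders here) should return the page markup rather than the login page, provided the forwarded cookies are still valid on the server. The returned string can then be handed to an HTML parser for editing before it is written to disk.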

Thank you for your help.
