Following Website Redirects


To check the header values returned by an HTTP GET request, you sometimes need to follow a redirect before you can inspect those headers.

This post shows how to follow website redirects:

First, create a web request object with System.Net.WebRequest:

$url = "some url that has redirects"
$request = [System.Net.WebRequest]::Create($url)

With the web request object stored in $request, we can set its properties. Set AllowAutoRedirect to $false so that .NET hands back the redirect response itself instead of silently following it:

$request = [System.Net.WebRequest]::Create($url)
$request.AllowAutoRedirect = $false

Now ask the $request object for a response.

$response = $request.GetResponse()

if ($response.StatusCode -eq [System.Net.HttpStatusCode]::MovedPermanently)
{}

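The response is an HttpWebResponse, so we can use that class's methods and properties to decide what to do next. The first thing to check is the status code. There are many status codes you could check for; in our case we want any status that would cause a redirect. For the sake of demonstration, we'll start with "Redirect" (302). Note that in the System.Net.HttpStatusCode enumeration, Found and Redirect are two names for the same value, 302 (as are Moved and MovedPermanently for 301), so comparing enum values or numeric codes is more reliable than comparing the string returned by ToString().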

elseif ($response.StatusCode -eq [System.Net.HttpStatusCode]::Redirect)
{
    # Read the body sent with the redirect; it links to the new location
    $stream = $response.GetResponseStream()
    $streamReader = [System.IO.StreamReader]::new($stream)
    $html = $streamReader.ReadToEnd()
}
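Because Found and Redirect (and likewise Moved and MovedPermanently) are the same enum value, checking the numeric status code avoids depending on which name ToString() happens to return. Here is a minimal sketch. Test-IsRedirectStatus is a hypothetical helper, not part of the original script, and the set of codes it treats as redirects (301, 302, 303, 307, 308) is an assumption you may want to narrow:

```powershell
# Hypothetical helper (not from the original script): treat these numeric codes as redirects.
# 301 = Moved/MovedPermanently, 302 = Found/Redirect, 303 = SeeOther,
# 307 = TemporaryRedirect, 308 = PermanentRedirect
function Test-IsRedirectStatus
{
    param([int]$StatusCode)
    $StatusCode -in 301, 302, 303, 307, 308
}

# Both enum names convert to the same number
[int][System.Net.HttpStatusCode]::Found      # 302
[int][System.Net.HttpStatusCode]::Redirect   # 302

# In the script you would pass $response.StatusCode; the enum converts to [int]
Test-IsRedirectStatus ([System.Net.HttpStatusCode]::Found)   # True
Test-IsRedirectStatus ([System.Net.HttpStatusCode]::OK)      # False
```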

When a redirect is detected, we read the HTML that comes with it to find out where the redirect is going. (Servers normally also send the target in the Location response header; this post reads it from the HTML instead.) This is done by creating a StreamReader to read the stream on the response object. To find the redirect path, a small function reads the HTML and returns the path of the next redirect to follow:

function Get-RedirectPath
{
    param([string]$html)
    # ConvertFrom-Html (defined in the full script) parses the text into a DOM
    $h = ConvertFrom-Html $html
    $title = $h.all.tags('Title') | Select-Object -First 1 -ExpandProperty text
    # Redirect pages such as IIS's are titled "Object moved" or "Document moved"
    if ($title -eq 'Object moved' -or $title -eq 'Document moved')
    {
        # The first link on the page points at the redirect target
        $h.all.tags('a') | Select-Object -First 1 -ExpandProperty pathname
    }
    else
    {
        $null
    }
}
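Most servers also put the redirect target in the Location response header, which works even when the redirect body is empty or isn't an "Object moved" page. Below is a minimal sketch of turning that header value into a full URL, assuming you read it with $response.Headers['Location']. Resolve-RedirectLocation is a hypothetical name, not part of the original script:

```powershell
# Hypothetical helper: turn a Location header value into an absolute URL.
# A relative Location such as "/new/page" is resolved against the URL that
# was requested; an absolute Location is returned unchanged.
function Resolve-RedirectLocation
{
    param(
        [Parameter(Mandatory)][string]$RequestUrl,
        [Parameter(Mandatory)][string]$Location
    )
    $base = [uri]$RequestUrl
    # The Uri(Uri, string) constructor handles both relative and absolute values
    [uri]::new($base, $Location).AbsoluteUri
}

Resolve-RedirectLocation -RequestUrl 'https://example.com/old/page' -Location '/new/page'
# https://example.com/new/page
Resolve-RedirectLocation -RequestUrl 'https://example.com/old/page' -Location 'https://www.example.org/'
# https://www.example.org/
```

Resolving against the requested URL, rather than rebuilding the scheme and host by hand, also copes with redirects that point to a different host.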

This function takes the raw HTML text and converts it to an HTML document object. (ConvertFrom-Html, defined in the full script at the end of this post, uses the Windows-only HTMLFile COM object.) With the document in hand, we can look through the tags, find the title, check whether it says "Document moved" or "Object moved", and take the path from the first 'a' tag, which is where the redirect points. In addition to following the redirect, the other requirement was to get the headers from each redirect. Get-Headers is a function created for just this purpose.

Function Get-Headers
{
    param([System.Net.HttpWebResponse]$HttpWebResponse)
    $headerHash = @{}
    # Enumerating the Headers collection yields each header name
    foreach ($header in $HttpWebResponse.Headers)
    {
        $headerHash[$header] = $HttpWebResponse.GetResponseHeader($header)
    }
    $headerHash
}

Get-Headers expects an object of type System.Net.HttpWebResponse. Because the parameter is typed, we can call that object's GetResponseHeader method for each header name and add each name and value to a hash table.

If there are no headers, an empty hash table is returned.
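The header loop can be tried without making a request by building a System.Net.WebHeaderCollection by hand, the same collection type that HttpWebResponse.Headers returns. ConvertTo-HeaderHashtable below is a hypothetical variant of Get-Headers that takes the collection directly; the sample header values are made up for illustration:

```powershell
# Hypothetical variant of Get-Headers that works on a WebHeaderCollection,
# so it can be exercised offline.
function ConvertTo-HeaderHashtable
{
    param([System.Net.WebHeaderCollection]$Headers)
    $headerHash = @{}
    # Enumerating a WebHeaderCollection yields the header names
    foreach ($name in $Headers)
    {
        $headerHash[$name] = $Headers[$name]
    }
    $headerHash
}

$sample = [System.Net.WebHeaderCollection]::new()
$sample.Add('Location', '/new/page')
$sample.Add('Content-Type', 'text/html')
$hash = ConvertTo-HeaderHashtable $sample
$hash['Location']                                                              # /new/page
(ConvertTo-HeaderHashtable ([System.Net.WebHeaderCollection]::new())).Count    # 0
```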

Lastly, the object passed back to the caller had to include the HTML, the headers, the redirect path, the status code, and the date and time, all in a custom object, so the caller can decide whether to follow the next redirect. The redirect property is only added for redirect responses; a response with any other status code won't contain it.
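A minimal sketch of assembling that object, assuming a hypothetical New-RedirectResult helper that is not part of the original script. Unlike the full script, it stores the raw HTML string rather than a parsed document, and it treats only 301 and 302 as redirects:

```powershell
# Hypothetical helper: combine the header hash table with the extra fields
# the caller needs. The redirect property is only added for redirects.
function New-RedirectResult
{
    param(
        [hashtable]$Headers = @{},
        [int]$StatusCode,
        [string]$Html,
        [string]$Redirect
    )
    # Clone so the caller's hash table is not modified
    $result = $Headers.Clone()
    $result['statuscode'] = $StatusCode
    $result['html'] = $Html
    $result['datetime'] = Get-Date
    if ($StatusCode -in 301, 302)
    {
        $result['redirect'] = $Redirect
    }
    [pscustomobject]$result
}

$moved = New-RedirectResult -Headers @{ Location = '/new/page' } -StatusCode 302 -Html '<html></html>' -Redirect 'new/page'
$moved.redirect                                       # new/page
$ok = New-RedirectResult -StatusCode 200 -Html ''
$ok.PSObject.Properties.Name -contains 'redirect'     # False
```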

The complete scripts are in this Gist:

Function Get-RedirectedUrl
{
    param([string]$url)

    # Parses raw HTML into a DOM with the Windows-only HTMLFile COM object
    function ConvertFrom-Html
    {
        param([string]$html)
        $h = New-Object -ComObject "HTMLFile"
        $h.IHTMLDocument2_write($html)
        $h
    }

    # Returns the link target of an "Object moved"/"Document moved" page, else $null
    function Get-RedirectPath
    {
        param([string]$html)
        $h = ConvertFrom-Html $html
        $title = $h.all.tags('Title') | Select-Object -First 1 -ExpandProperty text
        if ($title -eq 'Object moved' -or $title -eq 'Document moved')
        {
            $h.all.tags('a') | Select-Object -First 1 -ExpandProperty pathname
        }
        else
        {
            $null
        }
    }

    # Copies every response header into a hash table (empty if there are none)
    function Get-Headers
    {
        param([System.Net.HttpWebResponse]$HttpWebResponse)
        $headerHash = @{}
        foreach ($header in $HttpWebResponse.Headers)
        {
            $headerHash[$header] = $HttpWebResponse.GetResponseHeader($header)
        }
        $headerHash
    }

    $request = [System.Net.WebRequest]::Create($url)
    # Without this, .NET follows the redirect itself and we only see the final page
    $request.AllowAutoRedirect = $false
    $response = $request.GetResponse()
    try
    {
        $streamReader = [System.IO.StreamReader]::new($response.GetResponseStream())
        $html = $streamReader.ReadToEnd()
        # Compare numbers, not names: 302 is both Found and Redirect, 301 both Moved and MovedPermanently
        $statusCode = [int]$response.StatusCode

        if ($statusCode -in 200, 301, 302)
        {
            $headers = Get-Headers $response
            if ($statusCode -ne 200)
            {
                # Only redirect responses get a redirect property
                $headers['redirect'] = Get-RedirectPath $html
            }
            $headers['statuscode'] = $statusCode
            $headers['html'] = ConvertFrom-Html $html
            $headers['datetime'] = Get-Date
            [pscustomobject]$headers
        }
    }
    finally
    {
        if ($streamReader) { $streamReader.Close() }
        $response.Close()
    }
}

function Get-RedirectedUrls
{
    param(
        [string]$url,
        # Guards against redirect loops; 10 is an arbitrary default
        [int]$MaxRedirects = 10
    )
    # A list, not a hash table: each entry is one redirect's custom object
    $urlCheckObject = [System.Collections.Generic.List[object]]::new()
    $uri = [uri]$url
    $url2check = $url
    do
    {
        $value = Get-RedirectedUrl -url $url2check
        if ($value.redirect -and $urlCheckObject.Count -lt $MaxRedirects)
        {
            # pathname may come back with or without a leading slash
            $url2check = "$($uri.Scheme)://$($uri.Host)/$($value.redirect.TrimStart('/'))"
            $value | Add-Member -MemberType NoteProperty -Name url -Value $url2check
            $urlCheckObject.Add($value)
        }
        else
        {
            $url2check = $null
        }
    }
    until ($null -eq $url2check)
    $urlCheckObject
}
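The loop in Get-RedirectedUrls can be exercised offline by swapping the HTTP call for a lookup table. In this sketch the table, the Get-RedirectChain name, and the -Fetch script block are all hypothetical stand-ins, and unlike the script above it records every hop, including the final non-redirect response. The point is the hop limit, which stops a redirect loop (a page that redirects to itself) from running forever:

```powershell
# Hypothetical "server": maps each URL to the status code and Location it would return
$fakeServer = @{
    'https://example.com/a' = @{ statuscode = 301; location = 'https://example.com/b' }
    'https://example.com/b' = @{ statuscode = 302; location = 'https://example.com/c' }
    'https://example.com/c' = @{ statuscode = 200; location = $null }
}

function Get-RedirectChain
{
    param(
        [string]$Url,
        [scriptblock]$Fetch,        # returns @{ statuscode; location } for a URL
        [int]$MaxRedirects = 10     # at most this many redirects are followed
    )
    $chain = [System.Collections.Generic.List[object]]::new()
    $current = $Url
    while ($current)
    {
        $hop = & $Fetch $current
        $chain.Add([pscustomobject]@{ url = $current; statuscode = $hop.statuscode })
        $isRedirect = $hop.statuscode -in 301, 302, 303, 307, 308
        if ($isRedirect -and $hop.location -and $chain.Count -le $MaxRedirects)
        {
            $current = $hop.location
        }
        else
        {
            $current = $null
        }
    }
    $chain
}

$chain = Get-RedirectChain -Url 'https://example.com/a' -Fetch { param($u) $fakeServer[$u] }
$chain.url
# https://example.com/a
# https://example.com/b
# https://example.com/c
```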
