Capture Web Page / HTML to JPG


I’m a member of an American Legion post, and I’ve been helping them display schedules and similar information as images on screens. For a while I used various programs to capture an image of a website and save it as a JPG file. That got me thinking there had to be a way to do this in a script, and this article describes how I did just that.

First, I needed an easy way to bring a web page (HTML) into memory for conversion to a JPG. After much searching I found a handy .NET library called NReco.ImageGenerator. It ships as a DLL rather than a PowerShell module, so I load it into my PowerShell script with Add-Type:

Add-Type -Path ".\nreco\NReco.ImageGenerator.dll"

For simplicity, I put the DLL in a subfolder next to my script. With the DLL imported, it was time to see what it could do for me. According to the article I found, the DLL can convert HTML to a JPG in one line of code. So I chose to use Invoke-WebRequest and point it at the powershell.org forums to see whether it would save the page for me.

$html = Invoke-WebRequest -Uri 'https://powershell.org/forums/'
$h2image = New-Object NReco.ImageGenerator.HtmlToImageConverter
$imageFormat = [NReco.ImageGenerator.ImageFormat]::Jpeg
# Pass the HTML text explicitly rather than the whole response object
$jpeg = $h2image.GenerateImage($html.Content, $imageFormat)
$dataStream = New-Object System.IO.MemoryStream(,$jpeg)
$img = [System.Drawing.Image]::FromStream($dataStream)
$img.Save('c:\temp\image.jpg')

$h2image is an instance of the converter class from the DLL we loaded; it is what converts the web page to a JPG. Depending on the size of the page, the GenerateImage call may take a little while to return.

$h2image = new-object NReco.ImageGenerator.HtmlToImageConverter

The next line of code sets the image format, which tells the DLL what type of file we want. IntelliSense in the ISE shows that there are 3 types included in this enumeration.

For what I needed I chose JPG.
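
If IntelliSense is not handy, the same list can be pulled from the console with Get-Member. This is only a sketch: the helper name Get-StaticMemberName is made up for illustration, and the built-in System.DayOfWeek enum is used as a stand-in so the snippet runs without the NReco DLL.

```powershell
# Hypothetical helper (not part of NReco or PowerShell): list the static
# members of a type, which for an enum are its values.
function Get-StaticMemberName {
    param([Parameter(Mandatory)][type]$Type)
    $Type | Get-Member -Static -MemberType Property |
        Select-Object -ExpandProperty Name
}

# Stand-in type so the sketch runs anywhere
$names = Get-StaticMemberName -Type ([System.DayOfWeek])
$names

# With the NReco DLL loaded, the equivalent call would be:
# Get-StaticMemberName -Type ([NReco.ImageGenerator.ImageFormat])
```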

With the type loaded and the format chosen, GenerateImage returns the rendered page as a byte array ($jpeg). I can now wrap those bytes in a memory stream:

$dataStream = New-Object System.IO.MemoryStream(,$jpeg)

This one took me a while to figure out. If it hadn’t been for this article I might never have figured it out: http://piers7.blogspot.com/2010/03/3-powershell-array-gotchas.html

The solution for getting the byte array into the stream is in this tidbit from that article:

Cup(Of T): 3 PowerShell Array Gotchas

The (somewhat counter-intuitive) solution here is to wrap the array – in an array. This is easily done using the ‘comma’ syntax (a comma before any variable creates a length-1 array containing the variable):

PS > $bytes = 0x1,0x2,0x3,0x4,0x5,0x6
PS > $stream = new-object System.IO.MemoryStream (,$bytes)
PS > $stream.length
6
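
To make the gotcha a little more concrete, here is a small sketch (the variable names are mine, not from the quoted article). Without the comma, New-Object treats each element of the array as a separate constructor argument; the unary comma wraps the array so it arrives as one argument. The last line shows an alternative that, as far as I know, needs PowerShell 5 or later.

```powershell
$bytes = [byte[]](0x1, 0x2, 0x3, 0x4, 0x5, 0x6)

# The unary comma builds a one-element array whose only item is $bytes
$wrapped = ,$bytes
$wrapped.Count      # 1

# New-Object unwraps one level, so the constructor receives the byte array itself
$stream = New-Object System.IO.MemoryStream (,$bytes)
$stream.Length      # 6

# Alternative (PowerShell 5+): call the constructor directly, no wrapping needed
$stream2 = [System.IO.MemoryStream]::new($bytes)
$stream2.Length     # 6
```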

Now that I have the image bytes in a stream, I can load them with another .NET class, System.Drawing.Image, and write the result to a file:

$img = [System.Drawing.Image]::FromStream($dataStream)
$img.save('c:\temp\image.jpg')
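
Since the byte array is already an encoded image, a possible shortcut is to skip System.Drawing and write the bytes straight to disk. This is a sketch under assumptions: the four bytes below are only a stand-in for the real $jpeg array returned by the converter, and the output file name is made up. A full path is used because .NET file methods resolve relative paths against the process working directory, which may differ from PowerShell's current location.

```powershell
# Stand-in for the byte array from the converter (assumption: real data would be a full JPEG)
$jpeg = [byte[]](0xFF, 0xD8, 0xFF, 0xD9)

# Build a full path in the temp folder; the file name is hypothetical
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'image-sketch.jpg'

# Write the bytes as-is, with no decoding or re-encoding
[System.IO.File]::WriteAllBytes($outFile, $jpeg)

(Get-Item $outFile).Length
```

Going through System.Drawing.Image is still the better choice when the image needs to be inspected, resized, or saved in a different format.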

And voilà, my web page is saved as a JPG.

Full script:

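# Load System.Drawing (not always preloaded outside the ISE) and the NReco DLL
Add-Type -AssemblyName System.Drawing
Add-Type -Path ".\nreco\NReco.ImageGenerator.dll"
# Download the page and convert its HTML to JPEG bytes
$html = Invoke-WebRequest -Uri 'https://powershell.org/forums/'
$h2image = New-Object NReco.ImageGenerator.HtmlToImageConverter
$imageFormat = [NReco.ImageGenerator.ImageFormat]::Jpeg
$jpeg = $h2image.GenerateImage($html.Content, $imageFormat)
# Wrap the byte array in a stream (note the comma) and save it as an image file
$dataStream = New-Object System.IO.MemoryStream(,$jpeg)
$img = [System.Drawing.Image]::FromStream($dataStream)
$img.Save('c:\temp\image.jpg')
# Release the image and the stream when done
$img.Dispose(); $dataStream.Dispose()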
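
One thing to watch in the script above: the relative .\nreco path is resolved against the current directory, not the folder the script lives in. A minimal sketch of one way around that, assuming the DLL sits in an nreco subfolder next to the script (the variable names are mine):

```powershell
# $PSScriptRoot is empty when code is run interactively, so fall back to the current location
$baseDir = if ($PSScriptRoot) { $PSScriptRoot } else { (Get-Location).Path }
$dllPath = Join-Path $baseDir 'nreco\NReco.ImageGenerator.dll'

if (Test-Path $dllPath) {
    Add-Type -Path $dllPath
} else {
    Write-Warning "NReco DLL not found at $dllPath"
}
```

This way the script can be started from any folder, and a missing DLL produces a readable warning instead of a failure later on.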

PowerShell Posse // Thom Schumacher – PowerShellPosse / DevOps

I hope this helps someone.

Until next time, keep scripting

Thom

 

4 thoughts on “Capture Web Page / HTML to JPG”

  1. Mark

    Thanks! How do you install the NReco Module? I searched PowerShell with Get-PackageSource but could not find it. I’m assuming there must be another way.

  2. Shoshana Tzi

    Thanks for your post, I failed in the row:
    $jpeg = $h2image.generateImage($html, $imageformat)
    with error that wkhtmltoimage is needed (in path: C:\Windows\System32\WindowsPowerShell\v1.0\wkhtmltoimage)
    How can I install this?
