- Excel Version
- 365
- 2021
- 2019
- 2016
It is possible to take website snapshots quickly and easily with any chromium-based browser (this includes Chrome (obviously), MS Edge, Brave, and Opera) as they all include headless browser functionality - this means that you are able to use VBA give your browser instructions to generate screenshots (in PNG) and full-page PDF captures. Google set out how to go about it from the command line here, but I have cobbled together the following function to demonstrate how you could make use of this feature with VBA. The code below comprises four procedures:-
1. TestSnapshot - this is a basic subroutine demonstrating how to use the function. Please be sure to adjust the output path as necessary and change the URL to something you'd actually like to take a snapshot of;
2. GetSnapshot - this is the key procedure. It requires as arguments a target URL and an output path. By default, it will produce a PNG snapshot, but you can opt for a PDF capture instead. Also by default, it will use the Chrome browser;
3. DoubleQuote - a helper function to wrap a string in quotation marks, with the option of adding a trailing space; and
4. GetProgramLocation - returns to full location of the browser executable file.
all of which should be placed in a single module.
Note that it may take a few moments for a browser to generate the file.
Improvements
The above function is just a basic demonstration of how to execute the headless browser functionality - once you become more familiar with the code, there are a number of improvements you can make to it:
There are a multitude of other (and better) options, but this one doesn't require making API calls, or installing software or charge fees. Another option you tend to see on this forum is Selenium. On the one hand, it can be tricky to set up, it appears as though no-one is updating the software, and reviews of its user-friendliness are mixed; but on the other hand, it does provide greater control over the browser. If interested, here is a brief guide to installing and setting up Selenium on your computer, and here I reproduced the sample selenium script to generate a website snapshot.
As always, feedback and comments are encouraged and always appreciated. Thank you.
1. TestSnapshot - this is a basic subroutine demonstrating how to use the function. Please be sure to adjust the output path as necessary and change the URL to something you'd actually like to take a snapshot of;
2. GetSnapshot - this is the key procedure. It requires as arguments a target URL and an output path. By default, it will produce a PNG snapshot, but you can opt for a PDF capture instead. Also by default, it will use the Chrome browser;
3. DoubleQuote - a helper function to wrap a string in quotation marks, with the option of adding a trailing space; and
4. GetProgramLocation - returns to full location of the browser executable file.
all of which should be placed in a single module.
VBA Code:
Option Explicit
Enum SnapShotType
ScreenShotPNG
FullPagePDF
End Enum
Enum PreferredBrowser
MSEdge
Chrome
Brave
End Enum
#If VBA7 Then
Private Declare PtrSafe Function MakeSureDirectoryPathExists Lib "imagehlp.dll" (ByVal DirPath As String) As Long
#Else
Private Declare Function MakeSureDirectoryPathExists Lib "imagehlp.dll" (ByVal DirPath As String) As Long
#End If
Sub TestSnapshot()
Dim URL As String, OutputPath As String
URL = "https://news.microsoft.com/2001/04/11/farewell-clippy-whats-happening-to-the-infamous-office-assistant-in-office-xp/"
OutputPath = "D:\TEMP\"
Debug.Print GetSnapShot(URL, OutputPath, ScreenShotPNG, Chrome)
End Sub
Function GetSnapShot(ByVal TargetURL As String, _
ByVal OutputPath As String, _
Optional ByVal Snap As SnapShotType = ScreenShotPNG, _
Optional ByVal BrowserName As PreferredBrowser = Chrome)
Const ARGUMENT = " --headless --disable-gpu --blink-settings=scriptEnabled=true --window-size=768,1280 "
Dim BrowserPath As String, Browser As String, Filename As String, Extension As String, Ret As Long, PID As Long
Dim CommandLine As String, CLArguments As String, AdditionalParameter As String
Browser = Switch(BrowserName = MSEdge, "msedge.exe", BrowserName = Chrome, "chrome.exe", BrowserName = Brave, "brave.exe")
BrowserPath = DoubleQuote(GetProgramLocation(Browser))
If Len(Trim(BrowserPath)) <= 2 Then
MsgBox "Unable to locate the designated browser." & vbNewLine & "Exiting the procedure.", _
vbCritical Or vbOKOnly, "Cannot locate browser."
Exit Function
End If
' The MakeSureDirectoryPathExists API will check whether a given path exists, and if not, will create the necessary directories.
Ret = MakeSureDirectoryPathExists(OutputPath)
Extension = IIf(Snap = FullPagePDF, ".pdf", ".png")
AdditionalParameter = IIf(Snap = FullPagePDF, "--print-to-pdf=", " --screenshot=")
Filename = DoubleQuote(OutputPath & "SnapShot_" & Format(Now, "yyyymmdd-hhmmss") & Extension, True)
CLArguments = ARGUMENT & AdditionalParameter & Filename & TargetURL
CommandLine = BrowserPath & CLArguments
' Execute the instructions in a minimised window - other options include: vbHide, vbNormalFocus
Debug.Print "Executing: " & CommandLine
Shell CommandLine, vbMinimizedNoFocus
GetSnapShot = mid(Trim(Filename), 2, Len(Trim(Filename)) - 2)
End Function
Function DoubleQuote(Optional ByVal SourceText As String, Optional ByVal TrailingSpace As Boolean = False)
DoubleQuote = Chr(34) & SourceText & Chr(34) & IIf(TrailingSpace, Chr(32), vbNullString)
End Function
Function GetProgramLocation(ByVal ExeFilename As String)
' This function will check the registry for any registered applications with the given filename;
' it will return the full path if successful.
Const REGISTRYADDRESS = "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\App Paths\"
On Error Resume Next
GetProgramLocation = CreateObject("WScript.Shell").RegRead(REGISTRYADDRESS & ExeFilename & "\")
End Function
Note that it may take a few moments for a browser to generate the file.
Improvements
The above function is just a basic demonstration of how to execute the headless browser functionality - once you become more familiar with the code, there are a number of improvements you can make to it:
- For starters, you likely won't need to query the registry to find out the full path of your preferred browser every time you run this, so you can perhaps replace this
GetProgramLocation
function with a hardcoded path of the executable file. - Once you learn to trust yourself to provide an output path that actually exists, you can likely stop relying on the API call too - personally, I don't trust myself to do this correctly, so I'll probably keep on using the API
- When I use this routine, it's during research projects, so I usually add the project name to any generated filenames in order to easily and quickly differentiate groups of files.
There are a multitude of other (and better) options, but this one doesn't require making API calls, or installing software or charge fees. Another option you tend to see on this forum is Selenium. On the one hand, it can be tricky to set up, it appears as though no-one is updating the software, and reviews of its user-friendliness are mixed; but on the other hand, it does provide greater control over the browser. If interested, here is a brief guide to installing and setting up Selenium on your computer, and here I reproduced the sample selenium script to generate a website snapshot.
As always, feedback and comments are encouraged and always appreciated. Thank you.