Automate Image Downloading Using Google Apps Script
Previous posts on the topic of automation using using Google Apps Script: "Scripting the Talmud: Automated Talmudic Text Extraction and Formatting"; "Scripting the Talmud Part#2: Automated Rashi Text Extraction and digital layout of tzurat daf". See also: "Data Transformation with the Amazing New ChatGpt4 ‘Code Interpreter’, with Google Colab"
Google Apps Script offers a powerful way to enhance the functionalities of Google Sheets. It’s a powerful and flexible tool that can be used for a wide variety of operations, including web scraping and batch downloading. With the method detailed below, you can easily download (=scrape) a large number of images from URLs listed in a Google Sheet into a downloadable zip file.
To put it another way: Manually downloading a large number of image URLs is impractical. Google Sheets doesn't natively support advanced operations like web scraping or batch downloads. This is where Google Apps Script comes into play.
The script performs the following actions:
Initialize Spreadsheet and Data Retrieval: Accesses the active sheet of the current spreadsheet and identifies the last row with data.
Data Extraction: Extracts a range of data from three columns (in my case, Q to S), starting from row 2 to the last row with data, for processing.
Folder Creation in Drive: Generates a new folder in the root directory of Google Drive, naming it "Image request YYYY-MM-DD" based on the current date.
Data Preparation for Fetching: Filters out any rows from the extracted data that do not contain a URL in the first column (column Q) or if the URL is not a string, preparing them for asynchronous fetching.
Concurrent URL Fetching: Utilizes Google Apps Script's UrlFetchApp.fetchAll method to concurrently fetch all URLs provided in the data, with HTTP exceptions muted.
Blob Renaming and Organization: Processes the fetched URLs' responses by renaming the blobs based on corresponding data from the second and third columns (columns R and S), indicating original and new names, respectively.
Zip File Creation and Saving: Creates a zip file containing all the renamed blobs and saves this zip file in the newly created folder within Google Drive.
Here's the complete script to fetch, rename, and save image files:
function downloadImagesToDrive() {
var sheet = SpreadsheetApp.getActiveSpreadsheet().getActiveSheet();
var lastRow = sheet.getLastRow();
// Fetch URLs from Q, and rename data based on columns R and S
var data = sheet.getRange("Q2:S" + lastRow).getValues();
// Create a folder named "Image request YYYY-MM-DD" in the root directory of Google Drive.
var today = new Date();
var folderName = "Image request " + Utilities.formatDate(today, 'GMT', 'yyyy-MM-dd');
var folder = DriveApp.getRootFolder().createFolder(folderName);
// Filter out rows with empty URLs and prepare for asynchronous fetching.
var requests = data
.map((row, rowIndex) => {
if (row[0] && typeof row[0] === 'string') {
return {
url: row[0].trim(),
method: 'GET',
muteHttpExceptions: true, // To prevent exceptions for failed HTTP operations
rowIndex: rowIndex // Store the original rowIndex
};
}
return null;
})
.filter(request => request); // Remove null values
// Fetch all URLs concurrently.
var responses = UrlFetchApp.fetchAll(requests);
// Rename blobs based on columns R and S, then add to zip
var renamedBlobs = responses.map((response, index) => {
var originalRowIndex = requests[index].rowIndex;
var originalName = data[originalRowIndex][1]; // From column R
var newName = data[originalRowIndex][2]; // From column S
if (originalName && newName) {
var blob = response.getBlob();
blob.setName(newName);
return blob;
}
return null; // For rows where renaming data is not available
}).filter(blob => blob); // Remove null values
var zip = Utilities.zip(renamedBlobs, folderName + '.zip');
folder.createFile(zip);
}
The script completes in less than 5 seconds, for a batch of ~50 image URLs.