Automate Image Downloads with Google Image Crawler

GitHub repo: https://github.com/justin-napolitano/Google-Image-Crawler

Google-Image-Crawler is a Python-based tool designed to automate the process of downloading images from Google Image Search results. It leverages Selenium and BeautifulSoup to scrape images based on user-defined keywords and save them locally.

Features

  • Automated image scraping from Google Image Search
  • Supports keyword-based image queries
  • Configurable download limits and output directories
  • Uses Selenium WebDriver for dynamic page interaction
  • Supports batch processing via JSON configuration

Tech Stack

  • Python 3
  • Selenium WebDriver
  • BeautifulSoup4
  • urllib3

Getting Started

Prerequisites

  • Python 3.x installed
  • Google Chrome browser installed
  • ChromeDriver executable compatible with your Chrome version (download from https://sites.google.com/chromium.org/driver/)

Installation

  1. Clone the repository:

git clone https://github.com/justin-napolitano/Google-Image-Crawler.git
cd Google-Image-Crawler

  2. Install the required Python packages:

pip install -r requirements.txt

Note: If requirements.txt is not present, install the dependencies manually:

pip install selenium beautifulsoup4 urllib3

  3. Place the ChromeDriver executable (chromedriver.exe on Windows) in the project directory, or update the path in config.json.

Usage

  • Modify config.json to set keywords, download limits, ChromeDriver path, and output directories.
  • Run the crawler script (e.g., seleniumCrawler.py or googlecrawler.py) to start downloading images.
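This post does not reproduce the schema of config.json, so as an illustration only, a batch record might look something like the following (every field name here is an assumption, not the project's actual keys; check the file in the repo for the real structure):

```json
[
  {
    "keyword": "golden retriever",
    "limit": 50,
    "output_dir": "images/golden_retriever",
    "chromedriver_path": "./chromedriver.exe"
  },
  {
    "keyword": "siamese cat",
    "limit": 25,
    "output_dir": "images/siamese_cat",
    "chromedriver_path": "./chromedriver.exe"
  }
]
```

Each record drives one crawl, which is what makes batch processing over multiple keywords possible.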

Example running seleniumCrawler.py:

python seleniumCrawler.py

Project Structure

Google-Image-Crawler/
├── chromedriver.exe          # ChromeDriver executable for Selenium
├── config.json              # Configuration file with keywords and settings
├── googlecrawler.py         # Script using BeautifulSoup for scraping
├── seleniumCrawler.py       # Script using Selenium WebDriver for scraping
├── seleniumCrawler.txt      # Possibly notes or logs related to Selenium crawler
└── README.md                # This file

  • seleniumCrawler.py: Uses Selenium to automate Chrome, scroll through image results, and download images.
  • googlecrawler.py: Uses requests and BeautifulSoup to scrape image URLs and download images.
  • config.json: Contains multiple records for batch image crawling with different keywords and settings.
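Neither script is reproduced here, but the parsing step at the heart of googlecrawler.py — pulling image URLs out of a results page — can be sketched with only the standard library (the class and function names below are hypothetical, not the project's; the real script uses BeautifulSoup):

```python
from html.parser import HTMLParser


class ImageSrcParser(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(src)


def extract_image_urls(html):
    """Return the image URLs found in an HTML string, in document order."""
    parser = ImageSrcParser()
    parser.feed(html)
    return parser.urls
```

BeautifulSoup's `soup.find_all("img")` does the same job with less ceremony, which is presumably why the project depends on it.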

Future Work / Roadmap

  • Improve error handling and logging for robustness.
  • Add command-line interface to specify keywords and settings dynamically.
  • Support more flexible output naming conventions.
  • Implement parallel downloads to improve speed.
  • Add support for other search engines or image sources.
  • Package as a Python module for easier integration.
  • Update scraping logic to handle changes in Google Image Search markup.
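The parallel-downloads item above could be approached with `concurrent.futures` from the standard library. A minimal sketch (helper names are hypothetical; the project currently downloads sequentially, per the roadmap):

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlopen


def download_one(url, out_dir):
    """Fetch a single URL and write its bytes into out_dir."""
    name = url.rstrip("/").rsplit("/", 1)[-1] or "image"
    dest = Path(out_dir) / name
    with urlopen(url) as resp:
        dest.write_bytes(resp.read())
    return dest


def download_all(urls, out_dir, workers=4):
    """Download every URL concurrently with a small thread pool."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: download_one(u, out_dir), urls))
```

Threads are a good fit here because the work is I/O-bound; a modest `workers` value also avoids hammering the image hosts.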

Assumptions: The project requires manual setup of ChromeDriver and Python dependencies. The README assumes basic familiarity with Python and Selenium.
