Python Script to download hundreds of images from 'Google Images'. It is a ready-to-run code!

hellogan/google-images-download

Google Images Download

Python Script for 'searching' and 'downloading' hundreds of Google images to the local hard disk!

This is a command line python program to search keywords/key-phrases on Google Images and optionally download images to your computer. You can also invoke this script from another python file.

This is a small, ready-to-run program. No dependencies need to be installed if you only want to download up to 100 images per keyword. To download more than 100 images per keyword, you need to install the Selenium library along with chromedriver. Detailed instructions are in the troubleshooting section.

This program is compatible with both Python 2.x and Python 3.x (recommended). It is a download-and-run program that requires no changes to the file; you only have to specify parameters on the command line.

You can use one of the below methods to download and use this repository.

Using pip

$ pip install google_images_download

Manually using CLI

$ git clone https://github.com/hardikvasa/google-images-download.git
$ cd google-images-download && sudo python setup.py install

Manually using UI

Go to the repo on GitHub, click 'Clone or Download', then click 'Download ZIP' and save the archive to your local disk.

If installed via pip or using CLI, use the following command:

$ googleimagesdownload [Arguments...]

If downloaded via the UI, unzip the downloaded file, go to the 'google_images_download' directory and use one of the commands below:

$ python3 google_images_download.py [Arguments...]
OR
$ python google_images_download.py [Arguments...]

If you want to use this library from another Python file, you can use it as shown below:

from google_images_download import google_images_download

response = google_images_download.googleimagesdownload()
absolute_image_paths = response.download({<Arguments...>})

Note: If single_image or url parameter is not present, then keywords is a mandatory parameter. No other parameters are mandatory.
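
The rule in the note can be checked before calling download. The following is a minimal sketch that models only the rule as stated above; `check_arguments` is a hypothetical helper, not part of the library, and the argument names are assumed to match the command-line options. Other arguments may also lift the keywords requirement, which this sketch does not attempt to cover.

```python
def check_arguments(arguments):
    """Return True if the arguments satisfy the rule in the note above.

    Hypothetical helper for illustration; not part of google_images_download.
    """
    # keywords may be omitted only when single_image or url is given
    has_direct_source = bool(arguments.get("single_image") or arguments.get("url"))
    has_keywords = bool(arguments.get("keywords"))
    return has_direct_source or has_keywords


print(check_arguments({"keywords": "apple", "limit": 5}))  # True
print(check_arguments({"limit": 5}))                       # False
```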

You can either pass the arguments directly on the command line, as in the examples below, or pass them through a config file. Below is a sample of what a config file looks like.

You can pass more than one record through a config file. The sample below consists of two records. The code iterates through each record and downloads images based on the arguments passed.

{
    "Records": [
        {
            "keywords": "apple",
            "limit": 5,
            "color": "green",
            "print_urls": true
        },
        {
            "keywords": "universe",
            "limit": 15,
            "size": "large",
            "print_urls": true
        }
    ]
}
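
The iteration over records can be pictured as follows. This is a minimal sketch using only the standard library, not the script's actual implementation; `load_records` is a hypothetical helper, and the config is written to a temporary file only to keep the example self-contained.

```python
import json
import os
import tempfile

SAMPLE_CONFIG = {
    "Records": [
        {"keywords": "apple", "limit": 5, "color": "green", "print_urls": True},
        {"keywords": "universe", "limit": 15, "size": "large", "print_urls": True},
    ]
}


def load_records(config_path):
    """Read a config file and return its list of argument records."""
    with open(config_path) as config_file:
        return json.load(config_file)["Records"]


# Write the sample to a temporary file so the example runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(SAMPLE_CONFIG, tmp)

records = load_records(tmp.name)
os.remove(tmp.name)

for record in records:
    # Each record is one independent set of arguments.
    print(record["keywords"], record["limit"])
```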
  • If you are calling this library from another Python file, use code like the sample below
from google_images_download import google_images_download   #importing the library

response = google_images_download.googleimagesdownload()   #class instantiation

arguments = {"keywords":"Polar bears,balloons,Beaches","limit":20,"print_urls":True}   #creating a dictionary of arguments
paths = response.download(arguments)   #passing the arguments to the function
print(paths)   #printing absolute paths of the downloaded images
  • If you are passing arguments from a config file, pass the config_file argument with the name of your JSON file
$ googleimagesdownload -cf example.json
  • Simple example of using the keywords and limit arguments
$ googleimagesdownload --keywords "Polar bears, balloons, Beaches" --limit 20
  • Suffix keywords let you specify words to append to the main keywords. For example, if the keyword is car and the suffix keywords are 'red,blue,white', the script searches for car red, then car blue, then car white
$ googleimagesdownload -k "car" -sk 'red,blue,white' -l 10
  • To use the shorthand form of the arguments
$ googleimagesdownload -k "Polar bears, balloons, Beaches" -l 20
  • To download images with specific image extension/format
$ googleimagesdownload --keywords "logo" --format svg
  • To use color filters for the images
$ googleimagesdownload -k "playground" -l 20 -co red
  • To use non-English keywords for image search
$ googleimagesdownload -k "北极熊" -l 5
  • To download images from a Google Images page URL
$ googleimagesdownload -k "sample" -u <google images page URL>
  • To save images in specific main directory (instead of in 'downloads')
$ googleimagesdownload -k "boat" -o "boat_new"
  • To download a single image from its image URL
$ googleimagesdownload --keywords "balloons" --single_image <URL of the image>
  • To download images with size and type constraints
$ googleimagesdownload --keywords "balloons" --size medium --type animated
  • To download images with specific usage rights
$ googleimagesdownload --keywords "universe" --usage_rights labeled-for-reuse
  • To download images with specific color type
$ googleimagesdownload --keywords "flowers" --color_type black-and-white
  • To download images with specific aspect ratio
$ googleimagesdownload --keywords "universe" --aspect_ratio panoramic
  • To download images which are similar to the image in the image URL that you provided (Reverse Image search).
$ googleimagesdownload -si <image url> -l 10
  • To download images from specific website or domain name for a given keyword
$ googleimagesdownload --keywords "universe" --specific_site example.com

Images are downloaded into their own sub-directories inside the main directory (either the one you provided or 'downloads'), which is created in the directory you run the command from.
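
How suffix keywords and output directories combine can be sketched in a few lines. This is an illustration under stated assumptions, not the script's own code: `expand_search_terms` and `target_directory` are hypothetical helpers, and naming each sub-directory after its search term is assumed here rather than taken from the source.

```python
import os


def expand_search_terms(keywords, suffix_keywords=""):
    """Combine each comma-separated keyword with each suffix keyword."""
    main_terms = [k.strip() for k in keywords.split(",") if k.strip()]
    suffixes = [s.strip() for s in suffix_keywords.split(",") if s.strip()]
    if not suffixes:
        return main_terms
    return ["{} {}".format(term, suffix) for term in main_terms for suffix in suffixes]


def target_directory(search_term, output_directory="downloads"):
    """Path of the sub-directory for one search term (assumed naming)."""
    return os.path.join(os.getcwd(), output_directory, search_term)


terms = expand_search_terms("car", "red,blue,white")
print(terms)  # ['car red', 'car blue', 'car white']
for term in terms:
    print(target_directory(term))
```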


SSL Errors

If you see SSL errors on macOS with Python 3, open Finder, go to Applications > Python 3, and double-click 'Install Certificates.command' to run it.

googleimagesdownload: command not found

If you get the error -bash: googleimagesdownload: command not found while using the above commands, you have to set the correct PATH variable.

To find where the package is installed, run the following command:

$ pip show -f google_images_download

The output will look like this:

Location: /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages
Files:
  ../../../bin/googleimagesdownload

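Combined, they give /Library/Frameworks/Python.framework/Versions/2.7/bin, which you need to add to your PATH. Append it to the existing value rather than replacing it, so that other commands remain available:

$ export PATH="$PATH:/Library/Frameworks/Python.framework/Versions/2.7/bin"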
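
The path arithmetic above can be reproduced in Python. This is a minimal sketch using the values from the sample `pip show` output; `script_directory` is a hypothetical helper, and `posixpath` is used so the result is the same on any platform.

```python
import posixpath


def script_directory(location, relative_file):
    """Resolve the directory containing a file listed by `pip show -f`."""
    # Join Location with the relative entry, collapse the ".." parts,
    # then drop the file name to keep only the directory.
    full_path = posixpath.normpath(posixpath.join(location, relative_file))
    return posixpath.dirname(full_path)


location = "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages"
relative_file = "../../../bin/googleimagesdownload"
print(script_directory(location, relative_file))
# /Library/Frameworks/Python.framework/Versions/2.7/bin
```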

[Errno 13] Permission denied creating directory 'downloads'

When you run the command, it downloads the images into the current directory (the directory you are running the command from). If you get a permission denied error when the downloads directory is created, move to a directory in which you have write permission and run the command again.
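
Write access can be checked before starting a download. The following is a minimal sketch with the standard library; `can_create_downloads` is a hypothetical helper and not part of the script.

```python
import os
import tempfile


def can_create_downloads(directory=None):
    """Return True if sub-directories can be created inside `directory`."""
    directory = directory or os.getcwd()
    # Creating entries in a directory needs write and execute permission on it.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)


# The result for the current directory depends on where the script is run.
print(can_create_downloads())
print(can_create_downloads(tempfile.gettempdir()))
```

If the check returns False, change to a writable directory before running the command.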

Permission denied while installing the library

On macOS and Linux, if you get a permission denied error when installing the library using pip, try a user install.

$ pip install google_images_download --user

You can also run pip install as a superuser with sudo pip install google_images_download, but this is generally not a good idea because it can cause issues with your system-level packages.

Installing the chromedriver (with Selenium)

If you want to download more than 100 images per keyword, you need to install the 'selenium' library along with 'chromedriver'.

If you pip-installed the library or ran the setup.py file, Selenium will have been installed automatically on your machine. You will also need the Chrome browser on your machine. For chromedriver:

Download the correct chromedriver for your operating system.

On Windows or macOS, if the chromedriver gives you trouble for some reason, download it into the current directory and run the command from there.

On Windows, however, the path to chromedriver has to be given in the following format:

C:\\complete\\path\\to\\chromedriver.exe

On Linux, if you are having issues installing the Google Chrome browser, refer to the CentOS or Amazon Linux Guide or the Ubuntu Guide.

On all operating systems, you have to use the '--chromedriver' or '-cd' argument to specify the path of the chromedriver that you have downloaded to your machine.

If on a rare occasion the chromedriver does not work for you, try downgrading it to a lower version.
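
When the path is written inside a Python file, one plausible reason for the doubled backslashes in the Windows format is that a single backslash starts an escape sequence in an ordinary Python string. The sketch below only demonstrates that string behaviour. The path is the placeholder from above, and passing it under a `chromedriver` key that mirrors the `--chromedriver` option is an assumption here.

```python
# Both spellings produce the same string: the escaped form doubles each
# backslash, the raw-string form leaves them as typed.
escaped_path = "C:\\complete\\path\\to\\chromedriver.exe"
raw_path = r"C:\complete\path\to\chromedriver.exe"
print(escaped_path == raw_path)  # True
print(escaped_path)

# Assumed argument dictionary; the "chromedriver" key mirrors --chromedriver.
arguments = {"keywords": "universe", "limit": 150, "chromedriver": escaped_path}
print(arguments["chromedriver"])
```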

The diagram below represents the algorithm logic used to download images.

Anyone is welcome to contribute to this script. If you would like to make a change, open a pull request. For issues and discussion, visit the Issue Tracker.

The aim of this repo is to stay simple, stand-alone, backward compatible and not reliant on third-party dependencies.

This program lets you download tons of images from Google. Please do not download or use any image that violates its copyright terms. Google Images is a search engine that merely indexes images and allows you to find them. It does NOT produce its own images and, as such, it doesn't own copyright on any of them. The original creators of the images own the copyrights.

Images published in the United States are automatically copyrighted by their owners, even if they do not explicitly carry a copyright warning. You may not reproduce copyrighted images without their owner's permission, except in "fair use" cases, or you could risk lawyers' warnings, cease-and-desist letters, and copyright suits. Please be very careful before using any image!
