GitHubFileDownloader

class pyhelpers.ops.GitHubFileDownloader(repo_url, flatten_files=False, output_dir=None)

Download files from GitHub, given the URL of a file or directory in a repository.

Parameters:
  • repo_url (str) – URL of a GitHub repository to download from; it can be a blob or tree path

  • flatten_files (bool) – whether to pull the contents of all subdirectories into the root folder, defaults to False

  • output_dir (str | None) – output directory where the downloaded files will be saved, defaults to None

Variables:
  • repo_url (str) – URL of a GitHub repository to download from

  • flatten (bool) – whether to pull the contents of all subdirectories into the root folder, defaults to False

  • output_dir (str | None) – output directory where the downloaded files will be saved, defaults to None

  • api_url (str) – URL of a GitHub repository (compatible with GitHub’s REST API)

  • download_path (str) – pathname for downloading files

  • total_files (int) – total number of files under the given directory

Examples:

>>> from pyhelpers.ops import GitHubFileDownloader

>>> test_output_dir = "tests/temp"

>>> # Download a single file
>>> test_url = "https://github.com/mikeqfu/pyhelpers/blob/master/tests/data/dat.csv"
>>> downloader = GitHubFileDownloader(repo_url=test_url, output_dir=test_output_dir)
>>> downloader.download()
Downloaded to: tests/temp/tests/data/dat.csv
1

>>> # Download a directory
>>> test_url = "https://github.com/mikeqfu/pyhelpers/blob/master/tests/data"
>>> downloader = GitHubFileDownloader(repo_url=test_url, output_dir=test_output_dir)
>>> downloader.download()
Downloaded to: tests/temp/tests/data/csr_mat.npz
Downloaded to: tests/temp/tests/data/dat.csv
Downloaded to: tests/temp/tests/data/dat.feather
Downloaded to: tests/temp/tests/data/dat.joblib
Downloaded to: tests/temp/tests/data/dat.json
Downloaded to: tests/temp/tests/data/dat.pickle
Downloaded to: tests/temp/tests/data/dat.txt
Downloaded to: tests/temp/tests/data/dat.xlsx
Downloaded to: tests/temp/tests/data/zipped.7z
Downloaded to: tests/temp/tests/data/zipped.txt
Downloaded to: tests/temp/tests/data/zipped.zip
Downloaded to: tests/temp/tests/data/zipped/zipped.txt
12

>>> downloader = GitHubFileDownloader(
...     repo_url=test_url, flatten_files=True, output_dir=test_output_dir)
>>> downloader.download()
Downloaded to: tests/temp/csr_mat.npz
Downloaded to: tests/temp/dat.csv
Downloaded to: tests/temp/dat.feather
Downloaded to: tests/temp/dat.joblib
Downloaded to: tests/temp/dat.json
Downloaded to: tests/temp/dat.pickle
Downloaded to: tests/temp/dat.txt
Downloaded to: tests/temp/dat.xlsx
Downloaded to: tests/temp/zipped.7z
Downloaded to: tests/temp/zipped.txt
Downloaded to: tests/temp/zipped.zip
Downloaded to: tests/temp/zipped.txt
12
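
The two runs above differ only in where each file lands. The sketch below illustrates one way the mapping could work; it is not the library's code, and the helper name local_path is hypothetical. It also suggests why tests/temp/zipped.txt is reported twice in the flattened run: two files with the same name in different subdirectories map to the same local path, so the later download presumably overwrites the earlier one.

```python
import posixpath


def local_path(repo_path, output_dir, flatten_files=False):
    """Hypothetical helper: map a path inside the repository to a local path."""
    if flatten_files:
        # Keep only the file name, dropping every parent directory.
        return posixpath.join(output_dir, posixpath.basename(repo_path))
    # Otherwise mirror the repository layout under output_dir.
    return posixpath.join(output_dir, repo_path)


paths = ["tests/data/zipped.txt", "tests/data/zipped/zipped.txt"]
nested = [local_path(p, "tests/temp") for p in paths]
flat = [local_path(p, "tests/temp", flatten_files=True) for p in paths]
print(nested)
print(flat)
```

With flatten_files=True, both entries in flat are the same path, which matches the repeated line in the output above.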

Methods

create_url(url)

From the given url, produce a URL that is compatible with GitHub's REST API.
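
As a rough illustration of what such a conversion might involve, the sketch below rewrites a blob or tree URL into the form used by the repository contents endpoint of GitHub's REST API. It is not the implementation of create_url: the function name to_contents_api_url and the regular expression are assumptions made for this example, and the pattern assumes the branch name contains no slash.

```python
import re

# Hypothetical pattern: owner, repository, "blob" or "tree", branch, then the path.
_GITHUB_URL = re.compile(
    r"https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/(?:blob|tree)/(?P<branch>[^/]+)/(?P<path>.+)"
)


def to_contents_api_url(url):
    """Hypothetical helper: build a contents-endpoint URL from a GitHub page URL."""
    match = _GITHUB_URL.fullmatch(url.rstrip("/"))
    if match is None:
        raise ValueError(f"Not a GitHub blob or tree URL: {url}")
    return (
        "https://api.github.com/repos/{owner}/{repo}/contents/{path}"
        "?ref={branch}".format(**match.groupdict())
    )


api_url = to_contents_api_url(
    "https://github.com/mikeqfu/pyhelpers/blob/master/tests/data/dat.csv"
)
print(api_url)
```

The branch is passed as the ref query parameter, so the same function covers both single files and directories.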

download([api_url])

Download a file or a directory for the given api_url.
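
A directory download has to visit every nested subdirectory. The sketch below shows one possible traversal, assuming the contents endpoint returns a single object for a file and a list of objects for a directory, each with type, path, url and download_url fields. The function collect_files and the injected fetch_json callable are hypothetical and are not part of pyhelpers; canned responses stand in for the network so the example runs offline.

```python
def collect_files(api_url, fetch_json):
    """Hypothetical helper: return (path, download_url) pairs under api_url."""
    payload = fetch_json(api_url)
    # A file URL yields a single object; a directory URL yields a list.
    entries = payload if isinstance(payload, list) else [payload]
    files = []
    for entry in entries:
        if entry["type"] == "dir":
            # Recurse into the subdirectory through its own API URL.
            files.extend(collect_files(entry["url"], fetch_json))
        elif entry["type"] == "file":
            files.append((entry["path"], entry["download_url"]))
    return files


# Canned responses keyed by made-up API URLs (illustrative data only).
fake_api = {
    "api/data": [
        {"type": "file", "path": "data/a.csv",
         "url": "api/data/a.csv", "download_url": "raw/a.csv"},
        {"type": "dir", "path": "data/sub",
         "url": "api/data/sub", "download_url": None},
    ],
    "api/data/sub": [
        {"type": "file", "path": "data/sub/b.txt",
         "url": "api/data/sub/b.txt", "download_url": "raw/sub/b.txt"},
    ],
}

found = collect_files("api/data", fake_api.__getitem__)
print(len(found), found)
```

In the examples above, download() prints one line per file and then a number (1 and 12), which appears to be the count of downloaded files; len(found) plays the same role in this sketch.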

download_single_file(file_url, dir_out)

Download a single file.