GitHubFileDownloader

class pyhelpers.ops.GitHubFileDownloader(repo_url, flatten_files=False, output_dir=None)

Download files from GitHub repositories.

This class downloads either a single file or every file under a directory, given a GitHub repository URL.

Parameters:
  • repo_url (str) – URL of the GitHub repository to download from; it can be a path to a specific blob or tree location.

  • flatten_files (bool) – Whether to flatten the directory structure by pulling all files into the root folder; defaults to False.

  • output_dir (str | None) – Output directory where downloaded files will be saved; defaults to None, meaning files will be saved in the current directory.

Variables:
  • repo_url (str) – URL of the GitHub repository.

  • flatten_files (bool) – Whether to flatten the directory structure (i.e. pull the contents of all subdirectories into the root folder); defaults to False.

  • output_dir (str | None) – Output directory path; defaults to None.

  • api_url (str) – URL of the GitHub repository compatible with GitHub’s REST API.

  • download_path (str) – Pathname for downloading files.

  • total_files (int) – Total number of files under the given directory.

Examples:

>>> from pyhelpers.ops import GitHubFileDownloader
>>> output_dir = "tests/temp"
>>> # Download a single file
>>> repo_url_ = 'https://github.com/mikeqfu/pyhelpers'
>>> repo_url = f'{repo_url_}/blob/master/tests/data/dat.csv'
>>> downloader = GitHubFileDownloader(repo_url, output_dir=output_dir)
>>> downloader.download()
Downloaded to: tests/temp/tests/data/dat.csv
1
>>> # Download a directory
>>> repo_url = f"{repo_url_}/blob/master/tests/data"
>>> downloader = GitHubFileDownloader(repo_url, output_dir=output_dir)
>>> downloader.download()
Downloaded to: tests/temp/tests/data/csr_mat.npz
Downloaded to: tests/temp/tests/data/dat.csv
Downloaded to: tests/temp/tests/data/dat.feather
Downloaded to: tests/temp/tests/data/dat.joblib
Downloaded to: tests/temp/tests/data/dat.json
Downloaded to: tests/temp/tests/data/dat.ods
Downloaded to: tests/temp/tests/data/dat.pickle
Downloaded to: tests/temp/tests/data/dat.txt
Downloaded to: tests/temp/tests/data/dat.xlsx
Downloaded to: tests/temp/tests/data/zipped.7z
Downloaded to: tests/temp/tests/data/zipped.txt
Downloaded to: tests/temp/tests/data/zipped.zip
Downloaded to: tests/temp/tests/data/zipped/zipped.txt
13
>>> # Download the same directory with the directory structure flattened
>>> downloader = GitHubFileDownloader(
...     repo_url, flatten_files=True, output_dir=output_dir)
>>> downloader.download()
Downloaded to: tests/temp/csr_mat.npz
Downloaded to: tests/temp/dat.csv
Downloaded to: tests/temp/dat.feather
Downloaded to: tests/temp/dat.joblib
Downloaded to: tests/temp/dat.json
Downloaded to: tests/temp/dat.ods
Downloaded to: tests/temp/dat.pickle
Downloaded to: tests/temp/dat.txt
Downloaded to: tests/temp/dat.xlsx
Downloaded to: tests/temp/zipped.7z
Downloaded to: tests/temp/zipped.txt
Downloaded to: tests/temp/zipped.zip
Downloaded to: tests/temp/zipped.txt
13
>>> import shutil
>>> shutil.rmtree(output_dir)
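
In the flattened run above, tests/temp/zipped.txt is reported twice because tests/data/zipped.txt and tests/data/zipped/zipped.txt share a file name. The following is a minimal sketch of that path mapping, assuming that flattening keeps only each file's base name. The helper `target_path` is hypothetical and is not part of pyhelpers; it only illustrates why same-named files end up at the same local path.

```python
import posixpath
from collections import Counter


def target_path(repo_path, output_dir, flatten_files=False):
    """Hypothetical helper: map a repository path to a local path."""
    # Assumption: flattening keeps only the file name; otherwise the
    # repository's directory structure is reproduced under output_dir.
    relative = posixpath.basename(repo_path) if flatten_files else repo_path
    return posixpath.join(output_dir, relative)


repo_paths = [
    "tests/data/dat.csv",
    "tests/data/zipped.txt",
    "tests/data/zipped/zipped.txt",
]

nested = [target_path(p, "tests/temp") for p in repo_paths]
flat = [target_path(p, "tests/temp", flatten_files=True) for p in repo_paths]

# Local paths that more than one repository file maps to when flattened
collisions = [path for path, count in Counter(flat).items() if count > 1]
print(collisions)
```

If same-named files in different subdirectories matter, leaving flatten_files at its default of False avoids the collision.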

Methods

create_url(url)

Create a URL compatible with GitHub's REST API from the given URL.
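
As an illustration of what such a conversion can look like, the sketch below turns a blob or tree URL into a URL for GitHub's "repository contents" endpoint (`/repos/{owner}/{repo}/contents/{path}?ref={branch}`). This is not the implementation of create_url; the helper `to_contents_api_url` and its regular expression are assumptions made for the example, and branch names containing "/" are not handled.

```python
import re
from urllib.parse import quote

# Assumed URL shape:
#   https://github.com/<owner>/<repo>/(blob|tree)/<branch>/<path>
_GITHUB_URL = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/(?:blob|tree)/(?P<branch>[^/]+)/(?P<path>.+?)/?$"
)


def to_contents_api_url(url):
    """Hypothetical helper: convert a blob/tree URL to a contents API URL."""
    match = _GITHUB_URL.match(url)
    if match is None:
        raise ValueError(f"Not a recognised GitHub blob/tree URL: {url}")
    parts = match.groupdict()
    path = quote(parts["path"])  # keeps "/" but escapes spaces and the like
    ref = quote(parts["branch"], safe="")
    return (
        f"https://api.github.com/repos/{parts['owner']}/{parts['repo']}"
        f"/contents/{path}?ref={ref}"
    )


api_url = to_contents_api_url(
    "https://github.com/mikeqfu/pyhelpers/blob/master/tests/data/dat.csv"
)
print(api_url)
```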

download([api_url])

Download files from the specified GitHub api_url.
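
For a directory, GitHub's contents endpoint returns a JSON list whose entries carry `type`, `path`, `url` and `download_url` fields; for a file it returns a single object. The sketch below shows one way a recursive walk over such listings could collect every file. It is not the library's implementation: `iter_files` is a hypothetical helper, and the `fetch_json` callable is injected so that the example runs on canned data without network access.

```python
def iter_files(api_url, fetch_json):
    """Hypothetical helper: yield (path, download_url) for each file."""
    listing = fetch_json(api_url)
    if isinstance(listing, dict):
        # A file URL yields a single object rather than a list
        listing = [listing]
    for entry in listing:
        if entry["type"] == "dir":
            # Each directory entry carries its own contents API URL
            yield from iter_files(entry["url"], fetch_json)
        elif entry["type"] == "file":
            yield entry["path"], entry["download_url"]


# Canned responses standing in for the API (illustrative data only)
fake_api = {
    "api/data": [
        {"type": "file", "path": "data/a.csv",
         "url": "api/data/a.csv", "download_url": "raw/data/a.csv"},
        {"type": "dir", "path": "data/sub",
         "url": "api/data/sub", "download_url": None},
    ],
    "api/data/sub": [
        {"type": "file", "path": "data/sub/b.txt",
         "url": "api/data/sub/b.txt", "download_url": "raw/data/sub/b.txt"},
    ],
}

files = list(iter_files("api/data", fake_api.__getitem__))
total_files = len(files)
print(total_files, files)
```

Against the real API, `fetch_json` could be a small function that opens the URL and parses the response with `json.load`.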

download_single_file(file_url, dir_out)

Download a single file from the specified file_url to the dir_out directory.
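
The following sketch shows the general pattern of saving one URL into a target directory using only the standard library. It is an assumption-laden illustration rather than the method's actual code: `fetch_to_dir` is a hypothetical helper, and it assumes the last segment of the URL is a usable file name. A file:// URI is used in the demonstration so that it runs offline.

```python
import os
import posixpath
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import unquote, urlparse


def fetch_to_dir(file_url, dir_out):
    """Hypothetical helper: save the resource at file_url under dir_out."""
    os.makedirs(dir_out, exist_ok=True)
    # Assumption: the last URL segment is a usable file name
    name = posixpath.basename(unquote(urlparse(file_url).path))
    target = os.path.join(dir_out, name)
    # Stream the response to disk instead of reading it all into memory
    with urllib.request.urlopen(file_url) as response, open(target, "wb") as f:
        shutil.copyfileobj(response, f)
    return target


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "dat.csv"
    source.write_text("a,b\n1,2\n")
    # An https:// URL would go through the same urlopen call
    saved = fetch_to_dir(source.as_uri(), os.path.join(tmp, "out"))
    saved_name = os.path.basename(saved)
    saved_text = Path(saved).read_text()

print(saved_name)
```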