GitHubFileDownloader
- class pyhelpers.ops.GitHubFileDownloader(repo_url, flatten_files=False, output_dir=None)
Download files from GitHub repositories.
This class facilitates downloading files from a specified GitHub repository URL.
- Parameters:
repo_url (str) – URL of the GitHub repository to download from; it can be a path to a specific blob or tree location.
flatten_files (bool) – Whether to flatten the directory structure by pulling all files into the root folder; defaults to False.
output_dir (str | None) – Output directory where downloaded files will be saved; defaults to None, meaning files will be saved in the current directory.
- Variables:
repo_url (str) – URL of the GitHub repository.
flatten_files (bool) – Whether to flatten the directory structure (i.e. pull the contents of all subdirectories into the root folder); defaults to False.
output_dir (str | None) – Output directory path; defaults to None.
api_url (str) – URL of the GitHub repository compatible with GitHub’s REST API.
download_path (str) – Pathname for downloading files.
total_files (int) – Total number of files under the given directory.
Examples:
>>> from pyhelpers.ops import GitHubFileDownloader
>>> output_dir = "tests/temp"
>>> # Download a single file
>>> repo_url_ = 'https://github.com/mikeqfu/pyhelpers'
>>> repo_url = f'{repo_url_}/blob/master/tests/data/dat.csv'
>>> downloader = GitHubFileDownloader(repo_url, output_dir=output_dir)
>>> downloader.download()
Downloaded to: tests/temp/tests/data/dat.csv
1
>>> # Download a directory
>>> repo_url = f"{repo_url_}/blob/master/tests/data"
>>> downloader = GitHubFileDownloader(repo_url, output_dir=output_dir)
>>> downloader.download()
Downloaded to: tests/temp/tests/data/csr_mat.npz
Downloaded to: tests/temp/tests/data/dat.csv
Downloaded to: tests/temp/tests/data/dat.feather
Downloaded to: tests/temp/tests/data/dat.joblib
Downloaded to: tests/temp/tests/data/dat.json
Downloaded to: tests/temp/tests/data/dat.ods
Downloaded to: tests/temp/tests/data/dat.pickle
Downloaded to: tests/temp/tests/data/dat.txt
Downloaded to: tests/temp/tests/data/dat.xlsx
Downloaded to: tests/temp/tests/data/zipped.7z
Downloaded to: tests/temp/tests/data/zipped.txt
Downloaded to: tests/temp/tests/data/zipped.zip
Downloaded to: tests/temp/tests/data/zipped/zipped.txt
13
>>> # Download the same directory with the folder structure flattened
>>> downloader = GitHubFileDownloader(
...     repo_url, flatten_files=True, output_dir=output_dir)
>>> downloader.download()
Downloaded to: tests/temp/csr_mat.npz
Downloaded to: tests/temp/dat.csv
Downloaded to: tests/temp/dat.feather
Downloaded to: tests/temp/dat.joblib
Downloaded to: tests/temp/dat.json
Downloaded to: tests/temp/dat.ods
Downloaded to: tests/temp/dat.pickle
Downloaded to: tests/temp/dat.txt
Downloaded to: tests/temp/dat.xlsx
Downloaded to: tests/temp/zipped.7z
Downloaded to: tests/temp/zipped.txt
Downloaded to: tests/temp/zipped.zip
Downloaded to: tests/temp/zipped.txt
13
>>> # Clean up the downloaded files
>>> import shutil
>>> shutil.rmtree(output_dir)
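The effect of flatten_files on destination paths can be illustrated with a small standalone sketch. The helper local_path below is hypothetical and is not part of pyhelpers; it only mirrors the path mapping visible in the output above, and it assumes that output_dir=None means "relative to the current directory". It also shows why the flattened run reports tests/temp/zipped.txt twice: two files with the same name in different subdirectories map to one destination, which suggests that one may overwrite the other.

```python
import posixpath


def local_path(repo_path, output_dir=None, flatten_files=False):
    """Map a file path inside the repository to a local destination.

    Hypothetical helper for illustration only; not the pyhelpers implementation.
    """
    # With flattening, only the file name is kept; otherwise the repository
    # sub-path is reproduced under the output directory.
    relative = posixpath.basename(repo_path) if flatten_files else repo_path
    # Assumption: output_dir=None is treated as the current directory.
    return posixpath.join(output_dir or "", relative)


# Two files that share a name but live in different subdirectories.
nested = ["tests/data/zipped.txt", "tests/data/zipped/zipped.txt"]

kept = [local_path(p, "tests/temp") for p in nested]
flat = [local_path(p, "tests/temp", flatten_files=True) for p in nested]

print(kept)  # two distinct destinations
print(flat)  # both entries are 'tests/temp/zipped.txt'
```

If name collisions matter for a given repository, leaving flatten_files at its default of False avoids them.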
Methods
create_url(url) – Create a URL compatible with GitHub's REST API from the given URL.
download([api_url]) – Download files from the specified GitHub api_url.
download_single_file(file_url, dir_out) – Download a single file from the specified file_url to the dir_out directory.
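To make the idea of a REST-API-compatible URL more concrete, here is a minimal sketch of how a blob or tree URL could be translated into a request URL for GitHub's repository contents endpoint (GET /repos/{owner}/{repo}/contents/{path} with a ref query parameter). The function to_contents_api_url and its regular expression are assumptions made for illustration; the actual create_url method may work differently. The sketch also assumes branch names without slashes.

```python
import re

# Matches https://github.com/<owner>/<repo>/(blob|tree)/<branch>/<path>
# Assumption: the branch name contains no '/' characters.
_GITHUB_URL = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/(?:blob|tree)/(?P<branch>[^/]+)/(?P<path>.+)$"
)


def to_contents_api_url(url):
    """Convert a GitHub blob/tree URL to a Contents API URL.

    Hypothetical helper for illustration only; not the pyhelpers implementation.
    """
    match = _GITHUB_URL.match(url)
    if match is None:
        raise ValueError(f"Not a GitHub blob/tree URL: {url}")
    parts = match.groupdict()
    # Drop any trailing slash so directory URLs produce a clean path.
    path = parts["path"].rstrip("/")
    return (
        f"https://api.github.com/repos/{parts['owner']}/{parts['repo']}"
        f"/contents/{path}?ref={parts['branch']}"
    )


api_url = to_contents_api_url(
    "https://github.com/mikeqfu/pyhelpers/blob/master/tests/data/dat.csv"
)
print(api_url)
# https://api.github.com/repos/mikeqfu/pyhelpers/contents/tests/data/dat.csv?ref=master
```

The same helper accepts a directory URL; the contents endpoint then returns a listing of entries rather than a single file, which is what a recursive download would iterate over.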