xlsx_to_csv

pyhelpers.store.xlsx_to_csv(path_to_xlsx, path_to_csv=None, engine=None, if_exists='replace', vbscript=None, sheet_name='1', ret_null=False, verbose=False, **kwargs)[source]

Convert Microsoft Excel spreadsheet (in the format .xlsx/.xls) to a CSV file.

See also [STORE-XTC-1].

Parameters:
  • path_to_xlsx (str | os.PathLike) – pathname of an Excel spreadsheet (in the format of .xlsx)

  • path_to_csv (str | os.PathLike | None) – pathname of a CSV format file; when csv_pathname=None (default), the target CSV file is generated as a tempfile.NamedTemporaryFile; when csv_pathname="", the target CSV file is generated at the same directory where the source Excel spreadsheet is; otherwise, it could also be a specific pathname

  • engine (str | None) – engine used for converting .xlsx/.xls to .csv; when engine=None (default), a Microsoft VBScript (Visual Basic Script) is used; when engine='xlsx2csv', the function would rely on xlsx2csv

  • if_exists (str) – how to proceed if the target csv_pathname exists, defaults to 'replace'

  • vbscript (str | None) – pathname of a VB script used for converting .xlsx/.xls to .csv, defaults to None

  • sheet_name (str) – name of the target worksheet in the given Excel file, defaults to '1'

  • ret_null – whether to return something depending on the specified engine, defaults to False

  • verbose (bool | int) – whether to print relevant information in console, defaults to False

  • kwargs – [optional] parameters of the function subprocess.run

Returns:

the pathname of the generated CSV file or None, when engine=None; io.StringIO buffer, when engine='xlsx2csv'

Return type:

str | _io.StringIO | None

Examples:

>>> from pyhelpers.store import xlsx_to_csv, load_csv
>>> from pyhelpers.dirs import cd
>>> import os

>>> path_to_test_xlsx = cd("tests\data", "dat.xlsx")

>>> path_to_temp_csv = xlsx_to_csv(path_to_test_xlsx, verbose=True)
Converting "tests\data\dat.xlsx" to a (temporary) CSV file ... Done.
>>> os.path.isfile(path_to_temp_csv)
True
>>> data = load_csv(path_to_temp_csv, index=0)
>>> data
             Longitude    Latitude
City
London      -0.1276474  51.5073219
Birmingham  -1.9026911  52.4796992
Manchester  -2.2451148  53.4794892
Leeds       -1.5437941  53.7974185

>>> # Set `engine='xlsx2csv'`
>>> temp_csv_buffer = xlsx_to_csv(path_to_test_xlsx, engine='xlsx2csv', verbose=True)
Converting "tests\data\dat.xlsx" to a (temporary) CSV file ... Done.
>>> # import pandas as pd; data_ = pandas.read_csv(io_buffer, index_col=0)
>>> data_ = load_csv(temp_csv_buffer, index=0)
>>> data_
            Longitude   Latitude
City
London      -0.127647  51.507322
Birmingham  -1.902691  52.479699
Manchester  -2.245115  53.479489
Leeds       -1.543794  53.797418

>>> data.astype('float16').equals(data_.astype('float16'))
True

>>> # Remove the temporary CSV file
>>> os.remove(path_to_temp_csv)