iblutil.io.binary

Functions

`convert_to_parquet`	Convert a binary file to a Parquet file using a specified NumPy structured data type.
`load_as_dataframe`	Load a binary file into a pandas DataFrame using a specified NumPy structured data type.
`write_array`	Write a structured NumPy array to a binary file.

load_as_dataframe(filepath_bin: PathLike | str, dtype: dtype, count: int = -1, offset: int = 0) → DataFrame

Load a binary file into a pandas DataFrame using a specified NumPy structured data type.

Parameters:

filepath_bin (Path or str) – The path to the binary file to be loaded. Can be a string or a Path object.
dtype (np.dtype) – A NumPy structured data type that defines the format of the data in the binary file. Must be a structured datatype with fields.
count (int, optional) – The number of items to read from the binary file. Default is -1, which means all items.
offset (int, optional) – The number of bytes to skip at the beginning of the file before reading data. Default is 0.

Returns:

A pandas DataFrame containing the data read from the binary file.

Return type:

pd.DataFrame
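
The sketch below illustrates how the parameters interact, using only NumPy and pandas. It is not the iblutil implementation. It assumes that count and offset carry the same meaning as in numpy.fromfile (count in records, offset in bytes); the record layout and file name are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd

# Hypothetical record layout: one float64 timestamp and one int32 value per record.
record_dtype = np.dtype([("timestamp", "<f8"), ("value", "<i4")])
records = np.array([(0.0, 10), (0.5, 20), (1.0, 30), (1.5, 40)], dtype=record_dtype)

with tempfile.TemporaryDirectory() as tmp:
    filepath_bin = Path(tmp) / "example.bin"
    records.tofile(filepath_bin)

    # count=-1 reads all records; offset=0 starts at the first byte of the file.
    df_all = pd.DataFrame(
        np.fromfile(filepath_bin, dtype=record_dtype, count=-1, offset=0)
    )

    # Skip one record (offset is given in bytes) and read the next two records.
    df_part = pd.DataFrame(
        np.fromfile(
            filepath_bin, dtype=record_dtype, count=2, offset=record_dtype.itemsize
        )
    )

print(df_all)
print(df_part)
```

Each field of the structured dtype becomes one DataFrame column, which is why a dtype without fields cannot be used here.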

Raises:

convert_to_parquet(filepath_bin: PathLike | str, dtype: dtype, delete_bin_file: bool = False) → Path

Convert a binary file to a Parquet file using a specified NumPy structured data type.

Parameters:

filepath_bin (Path or str) – The path to the binary file to be converted. Can be a string or a Path object.
dtype (np.dtype) – A NumPy structured data type that defines the format of the data in the binary file. Must be a structured datatype with fields.
delete_bin_file (bool, optional) – If True, the original binary file will be deleted after conversion. Default is False.

Returns:

The path to the newly created Parquet file. The new filename will be constructed from the original filename and a ‘.pqt’ suffix.

Return type:

Path

Raises:

write_array(fid, array, dtype)

Write a structured NumPy array to a binary file.

Parameters:

fid (bytes, str, IO) – The file path or file-like object where the structured array will be written.
array (npt.ArrayLike) – The input array to be written. It must have a maximum of two dimensions, and the last dimension must match the number of fields in the provided dtype.
dtype (np.dtype) – A structured NumPy datatype that defines the fields of the array. It must be a valid structured dtype with fields.

Raises:

ValueError – If dtype is not a structured NumPy datatype, if the input array has more than two dimensions, or if the last dimension of array does not match the number of fields in dtype.
FileExistsError – If fid represents a Path and the respective file already exists.
TypeError – If fid is not a stream and cannot be converted to a Path.
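
The shape and dtype checks listed above can be sketched as follows. This is a minimal stand-in, not the iblutil implementation: the helper name write_array_sketch is hypothetical, it accepts only an already open binary stream (so the FileExistsError and TypeError cases are not covered), and it assumes plain columns are mapped to fields in order via numpy.lib.recfunctions.unstructured_to_structured.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn


def write_array_sketch(stream, array, dtype):
    """Hypothetical stand-in for illustration only; writes to an open binary stream."""
    dtype = np.dtype(dtype)
    if dtype.fields is None:
        raise ValueError("dtype must be a structured NumPy datatype")
    array = np.atleast_1d(np.asarray(array))
    if array.ndim > 2:
        raise ValueError("array must have at most two dimensions")
    if array.shape[-1] != len(dtype.names):
        raise ValueError("last dimension must match the number of fields in dtype")
    # Map each column of the last dimension onto the corresponding field, in order.
    structured = rfn.unstructured_to_structured(array, dtype=dtype)
    stream.write(structured.tobytes())


# Hypothetical record layout: one float64 timestamp and one int32 value per record.
record_dtype = np.dtype([("timestamp", "<f8"), ("value", "<i4")])

buffer = io.BytesIO()
write_array_sketch(buffer, [[0.0, 10], [0.5, 20], [1.0, 30]], record_dtype)

# Reading the bytes back with the same dtype recovers the records.
roundtrip = np.frombuffer(buffer.getvalue(), dtype=record_dtype)
print(roundtrip)
```

Because the file stores raw bytes with no header, the same dtype must be supplied again when the data is read back.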