iblutil.io.binary
Functions
convert_to_parquet – Convert a binary file to a Parquet file using a specified NumPy structured data type.
load_as_dataframe – Load a binary file into a pandas DataFrame using a specified NumPy structured data type.
write_array – Write a structured NumPy array to a binary file.
- load_as_dataframe(filepath_bin: PathLike | str, dtype: dtype, count: int = -1, offset: int = 0) → DataFrame
Load a binary file into a pandas DataFrame using a specified NumPy structured data type.
- Parameters:
filepath_bin (Path or str) – The path to the binary file to load, given as a string or a Path object.
dtype (np.dtype) – A NumPy structured data type that defines the format of the data in the binary file. Must be a structured datatype with fields.
count (int, optional) – The number of records (elements of dtype) to read from the binary file. Default is -1, which reads all records.
offset (int, optional) – The number of bytes to skip at the beginning of the file before reading data. Default is 0.
- Returns:
A pandas DataFrame containing the data read from the binary file.
- Return type:
pd.DataFrame
- Raises:
FileNotFoundError – If the specified binary file does not exist.
IsADirectoryError – If the specified path is a directory instead of a file.
ValueError – If the provided dtype is not a NumPy structured datatype.
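The following is a minimal sketch of the behaviour documented above, not the library's source. It assumes that reading can be expressed with `numpy.fromfile`, whose `count` and `offset` parameters have the same meaning as here (records and bytes, respectively), and that each field of the structured dtype becomes one DataFrame column. The function `load_records` and the record layout `RECORD_DTYPE` are hypothetical names used only for illustration.

```python
from pathlib import Path
import tempfile

import numpy as np
import pandas as pd

# Hypothetical record layout: an int32 sample index and a float64 value
# per record (12 bytes per record, no padding).
RECORD_DTYPE = np.dtype([("sample", np.int32), ("value", np.float64)])


def load_records(filepath_bin, dtype, count=-1, offset=0):
    """Hypothetical sketch of load_as_dataframe's documented behaviour."""
    filepath_bin = Path(filepath_bin)
    dtype = np.dtype(dtype)
    if dtype.names is None:
        raise ValueError("dtype must be a structured NumPy datatype with fields")
    if filepath_bin.is_dir():
        raise IsADirectoryError(filepath_bin)
    if not filepath_bin.exists():
        raise FileNotFoundError(filepath_bin)
    # Skip `offset` bytes, then read `count` records (-1 reads to end of file).
    records = np.fromfile(filepath_bin, dtype=dtype, count=count, offset=offset)
    # Each field of the structured dtype becomes one DataFrame column.
    return pd.DataFrame(records)


# Usage: write three records, then skip the first and read the next two.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "records.bin"
    np.array([(0, 1.5), (1, 2.5), (2, 3.5)], dtype=RECORD_DTYPE).tofile(path)
    df = load_records(path, RECORD_DTYPE, count=2, offset=RECORD_DTYPE.itemsize)

print(df)
```

Because `offset` is counted in bytes, skipping whole records means passing a multiple of `dtype.itemsize`, as in the usage lines.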
- convert_to_parquet(filepath_bin: PathLike | str, dtype: dtype, delete_bin_file: bool = False) → Path
Convert a binary file to a Parquet file using a specified NumPy structured data type.
- Parameters:
filepath_bin (Path or str) – The path to the binary file to convert, given as a string or a Path object.
dtype (np.dtype) – A NumPy structured data type that defines the format of the data in the binary file. Must be a structured datatype with fields.
delete_bin_file (bool, optional) – If True, the original binary file will be deleted after conversion. Default is False.
- Returns:
The path to the newly created Parquet file. The new filename will be constructed from the original filename and a ‘.pqt’ suffix.
- Return type:
Path
- Raises:
FileNotFoundError – If the specified binary file does not exist.
FileExistsError – If the output file already exists.
IsADirectoryError – If the specified path is a directory instead of a file.
ValueError – If the provided dtype is not a NumPy structured datatype.
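Below is a hedged sketch of how the documented steps could fit together; it is not the library's implementation. Two assumptions are made explicit: the output path replaces the final suffix with `.pqt` (so `data.bin` becomes `data.pqt`; the real function may name the file differently), and the Parquet file is written with `pandas.DataFrame.to_parquet`, which requires a Parquet engine such as pyarrow. The function name `convert_records_to_parquet` is hypothetical.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def convert_records_to_parquet(filepath_bin, dtype, delete_bin_file=False):
    """Hypothetical sketch of convert_to_parquet's documented behaviour.

    Assumption: the output name replaces the final suffix with '.pqt'.
    """
    filepath_bin = Path(filepath_bin)
    dtype = np.dtype(dtype)
    if dtype.names is None:
        raise ValueError("dtype must be a structured NumPy datatype with fields")
    if filepath_bin.is_dir():
        raise IsADirectoryError(filepath_bin)
    if not filepath_bin.exists():
        raise FileNotFoundError(filepath_bin)
    filepath_pqt = filepath_bin.with_suffix(".pqt")  # assumed naming scheme
    if filepath_pqt.exists():
        # Refuse to overwrite an existing output file.
        raise FileExistsError(filepath_pqt)
    # Read all records, one column per dtype field, and write them as Parquet.
    df = pd.DataFrame(np.fromfile(filepath_bin, dtype=dtype))
    df.to_parquet(filepath_pqt)  # needs a Parquet engine, e.g. pyarrow
    if delete_bin_file:
        # Only remove the source once the Parquet file has been written.
        filepath_bin.unlink()
    return filepath_pqt
```

Under these assumptions, `convert_records_to_parquet("data.bin", dtype, delete_bin_file=True)` would leave only `data.pqt` on disk. All validation happens before any data is read, so a name clash never touches the source file.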
- write_array(fid: BinaryIO | str | PathLike, array: npt.ArrayLike, dtype: dtype)
Write a structured NumPy array to a binary file.
- Parameters:
fid (BinaryIO, str, or PathLike) – The file path or binary file-like object to which the structured array is written.
array (npt.ArrayLike) – The input array to be written. It must have at most two dimensions, and the size of its last dimension must equal the number of fields in the provided dtype.
dtype (np.dtype) – A structured NumPy datatype that defines the fields of the array. It must be a valid structured dtype with fields.
- Raises:
ValueError – If dtype is not a structured NumPy datatype, if the input array has more than two dimensions, or if the size of the last dimension of array does not match the number of fields in dtype.
FileExistsError – If fid is a path and the file already exists.
TypeError – If fid is not a stream and cannot be converted to a Path.
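A minimal sketch of the documented contract follows; it is not the library's source. It assumes that each column of a 2-D input maps, in order, to one field of the dtype, that a 1-D input is a single record, and that "stream" means any object with a `write` method. The function `write_records` and `RECORD_DTYPE` are hypothetical. Opening paths with mode `"xb"` gives the documented `FileExistsError` for free, and `Path()` itself raises `TypeError` for unconvertible inputs.

```python
import io
from pathlib import Path

import numpy as np

# Hypothetical record layout used for illustration.
RECORD_DTYPE = np.dtype([("sample", np.int32), ("value", np.float64)])


def write_records(fid, array, dtype):
    """Hypothetical sketch of write_array's documented behaviour."""
    dtype = np.dtype(dtype)
    if dtype.names is None:
        raise ValueError("dtype must be a structured NumPy datatype with fields")
    array = np.atleast_1d(np.asarray(array))
    if array.ndim > 2:
        raise ValueError("array must have at most two dimensions")
    if array.shape[-1] != len(dtype.names):
        raise ValueError("last dimension of array must match the number of fields")
    # Treat a 1-D array as one record, then copy each column into its field
    # (values are cast to the field's type, e.g. float -> int32 truncates).
    rows = np.atleast_2d(array)
    records = np.empty(rows.shape[0], dtype=dtype)
    for i, name in enumerate(dtype.names):
        records[name] = rows[:, i]
    data = records.tobytes()
    if hasattr(fid, "write"):
        # Streams (open binary files, io.BytesIO, ...) are written directly.
        fid.write(data)
        return
    # Path() raises TypeError for objects that are not str or PathLike.
    path = Path(fid)
    # Mode "xb" creates the file exclusively: FileExistsError if it exists.
    with open(path, "xb") as f:
        f.write(data)


# Usage: write two records to an in-memory stream and read them back.
buffer = io.BytesIO()
write_records(buffer, [[0, 1.5], [1, 2.5]], RECORD_DTYPE)
restored = np.frombuffer(buffer.getvalue(), dtype=RECORD_DTYPE)
print(restored)
```

The explicit per-field copy keeps the column-to-field mapping visible; `numpy.lib.recfunctions.unstructured_to_structured` is an alternative for the same conversion.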