iblutil.io.binary

Functions

convert_to_parquet

Convert a binary file to a Parquet file using a specified NumPy structured data type.

load_as_dataframe

Load a binary file into a pandas DataFrame using a specified NumPy structured data type.

write_array

Write a structured NumPy array to a binary file.

load_as_dataframe(filepath_bin: PathLike | str, dtype: dtype, count: int = -1, offset: int = 0) → DataFrame

Load a binary file into a pandas DataFrame using a specified NumPy structured data type.

Parameters:
  • filepath_bin (Path or str) – The path to the binary file to be loaded. Can be a string or a Path object.

  • dtype (np.dtype) – A NumPy structured data type that defines the format of the data in the binary file. Must be a structured datatype with fields.

  • count (int, optional) – The number of items to read from the binary file. Default is -1, which means all items.

  • offset (int, optional) – The number of bytes to skip at the beginning of the file before reading data. Default is 0.

Returns:

A pandas DataFrame containing the data read from the binary file.

Return type:

pd.DataFrame

Raises:
  • FileNotFoundError – If the specified binary file does not exist.

  • IsADirectoryError – If the specified path is a directory instead of a file.

  • ValueError – If the provided dtype is not a NumPy structured datatype.
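
The interplay of dtype, count and offset can be illustrated with plain NumPy. This is a minimal sketch, not the library's implementation: it assumes count and offset behave like the parameters of the same name in np.fromfile, and the record layout (record_dtype) and the file name are hypothetical.

```python
import tempfile
from pathlib import Path

import numpy as np

# Hypothetical record layout: a 64-bit timestamp followed by a 32-bit float.
record_dtype = np.dtype([("timestamp", "<u8"), ("value", "<f4")])
records = np.array([(1, 0.5), (2, 1.5), (3, 2.5)], dtype=record_dtype)

with tempfile.TemporaryDirectory() as tmp_dir:
    filepath_bin = Path(tmp_dir) / "example.bin"
    records.tofile(filepath_bin)

    # offset is given in bytes, so skipping one record means skipping
    # record_dtype.itemsize bytes; count is given in items (records).
    subset = np.fromfile(
        filepath_bin, dtype=record_dtype, count=2, offset=record_dtype.itemsize
    )

timestamps = subset["timestamp"].tolist()
```

Under these assumptions, subset holds the second and third records. A structured array like this can be passed to pd.DataFrame, which yields one column per field.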

convert_to_parquet(filepath_bin: PathLike | str, dtype: dtype, delete_bin_file: bool = False) → Path

Convert a binary file to a Parquet file using a specified NumPy structured data type.

Parameters:
  • filepath_bin (Path or str) – The path to the binary file to be converted. Can be a string or a Path object.

  • dtype (np.dtype) – A NumPy structured data type that defines the format of the data in the binary file. Must be a structured datatype with fields.

  • delete_bin_file (bool, optional) – If True, the original binary file will be deleted after conversion. Default is False.

Returns:

The path to the newly created Parquet file. The new filename will be constructed from the original filename and a ‘.pqt’ suffix.

Return type:

Path

Raises:
  • FileNotFoundError – If the specified binary file does not exist.

  • FileExistsError – If the output file already exists.

  • IsADirectoryError – If the specified path is a directory instead of a file.

  • ValueError – If the provided dtype is not a NumPy structured datatype.
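
The path handling described above can be sketched with pathlib alone. This is only an illustration: planned_parquet_path is a hypothetical helper, the order of the checks is a guess, and the sketch assumes that the '.pqt' suffix replaces the existing extension, which the description does not state explicitly.

```python
import tempfile
from pathlib import Path


def planned_parquet_path(filepath_bin):
    """Hypothetical helper: validate the input path and derive the output path."""
    filepath_bin = Path(filepath_bin)
    if filepath_bin.is_dir():
        raise IsADirectoryError(filepath_bin)
    if not filepath_bin.exists():
        raise FileNotFoundError(filepath_bin)
    # Assumption: '.pqt' replaces the existing extension rather than being appended.
    filepath_pqt = filepath_bin.with_suffix(".pqt")
    if filepath_pqt.exists():
        raise FileExistsError(filepath_pqt)
    return filepath_pqt


with tempfile.TemporaryDirectory() as tmp_dir:
    bin_file = Path(tmp_dir) / "example.bin"
    bin_file.write_bytes(b"\x00" * 12)
    pqt_name = planned_parquet_path(bin_file).name

    # A path that does not exist triggers the documented FileNotFoundError.
    try:
        planned_parquet_path(Path(tmp_dir) / "missing.bin")
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True
```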

write_array(fid: BinaryIO | str | PathLike, array: npt.ArrayLike, dtype: dtype)

Write a structured NumPy array to a binary file.

Parameters:
  • fid (BinaryIO, str or PathLike) – The file path or binary file-like object to which the structured array will be written.

  • array (npt.ArrayLike) – The input array to be written. It must have at most two dimensions, and the size of its last dimension must match the number of fields in the provided dtype.

  • dtype (np.dtype) – A structured NumPy datatype that defines the fields of the array. It must be a valid structured dtype with fields.

Raises:
  • ValueError – If dtype is not a structured NumPy datatype, if the input array has more than two dimensions, or if the last dimension of array does not match the number of fields in dtype.

  • FileExistsError – If fid represents a Path and the respective file already exists.

  • TypeError – If fid is not a stream and cannot be converted to a Path.
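
The constraints on array and dtype can be made concrete with a short NumPy sketch. It is not the library's code: to_structured is a hypothetical helper that mirrors the documented ValueError conditions, and the conversion through numpy.lib.recfunctions.unstructured_to_structured is an assumption about how a plain 2-D array might be mapped onto the fields of dtype.

```python
import numpy as np
from numpy.lib import recfunctions as rfn


def to_structured(array, dtype):
    """Hypothetical helper mirroring the documented checks (not the library code)."""
    dtype = np.dtype(dtype)
    if dtype.fields is None:
        raise ValueError("dtype must be a structured datatype with fields")
    array = np.atleast_1d(np.asarray(array))
    if array.ndim > 2:
        raise ValueError("array must have at most two dimensions")
    if array.shape[-1] != len(dtype.names):
        raise ValueError("last dimension must match the number of fields in dtype")
    # Each column of the last dimension is cast to the corresponding field.
    return rfn.unstructured_to_structured(array, dtype=dtype)


# Hypothetical record layout with two fields, so the input needs two columns.
record_dtype = np.dtype([("timestamp", "<u8"), ("value", "<f4")])
structured = to_structured([[1, 0.5], [2, 1.5]], record_dtype)

# A three-column input does not match the two fields and is rejected.
try:
    to_structured([[1, 2, 3]], record_dtype)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

Here each row of the two-column input becomes one record, so structured has one element per row and one field per column.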