
Schema ext

filter_columns_by_datatype(struct_type, data_type)

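Filters a StructType schema and returns a new StructType containing only the StructFields whose data type matches the specified data type. Matching uses DataType equality, so parameterized types such as DecimalType must match exactly, including precision and scale.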

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| struct_type | StructType | The schema of the DataFrame. | required |
| data_type | DataType | The data type to filter by. | required |

Returns:

| Type | Description |
|---|---|
| StructType | A new StructType containing only the StructFields whose data type matches `data_type`. |

Example
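>>> from pyspark.sql import types as T
>>> from pysparky.schema_ext import filter_columns_by_datatype
>>> schema = T.StructType([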
...     T.StructField("id", T.IntegerType(), True),
...     T.StructField("name", T.StringType(), True),
...     T.StructField("age", T.IntegerType(), True)
... ])
>>> filtered_schema = filter_columns_by_datatype(schema, T.IntegerType())
>>> print(filtered_schema)
StructType([StructField('id', IntegerType(), True), StructField('age', IntegerType(), True)])
Source code in pysparky/schema_ext.py
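# Module-level imports this function relies on
from pyspark.sql import types as T

from pysparky import decorator


@decorator.extension_enabler(T.StructType)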
def filter_columns_by_datatype(
    struct_type: T.StructType, data_type: T.DataType
) -> T.StructType:
    """
    Filters and returns a StructType of StructField names from a given StructType schema
    that match the specified data type.

    Args:
        struct_type (T.StructType): The schema of the DataFrame.
        data_type (T.DataType): The data type to filter by.

    Returns:
        T.StructType: A new StructType containing only the StructFields whose
            data type matches `data_type`.

    Example:
        ```python
        >>> schema = T.StructType([
        ...     T.StructField("id", T.IntegerType(), True),
        ...     T.StructField("name", T.StringType(), True),
        ...     T.StructField("age", T.IntegerType(), True)
        ... ])
        >>> filtered_schema = filter_columns_by_datatype(schema, T.IntegerType())
        >>> print(filtered_schema)
        StructType([StructField('id', IntegerType(), True), StructField('age', IntegerType(), True)])
        ```
    """
    # DataType equality is exact: e.g. DecimalType(10, 2) does not match DecimalType(38, 18).
    return T.StructType([field for field in struct_type if field.dataType == data_type])
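The `@decorator.extension_enabler(T.StructType)` line suggests the function is also exposed as a method on `StructType`. The sketch below shows one way such a decorator could work; it is hypothetical and not pysparky's actual implementation. `attach_as_method`, `Schema`, and `filter_columns_by_type` are illustrative names, and a toy `Schema` class stands in for `StructType` so the snippet needs only the standard library:

```python
from typing import Callable, Iterable, Tuple, TypeVar

F = TypeVar("F", bound=Callable)


def attach_as_method(cls: type) -> Callable[[F], F]:
    """Hypothetical decorator factory: register the decorated function as a method of cls."""

    def register(func: F) -> F:
        # A plain function stored on a class becomes a method; the instance is
        # passed as its first positional argument.
        setattr(cls, func.__name__, func)
        # Return the function unchanged so it can still be called directly.
        return func

    return register


class Schema:
    """Toy stand-in for StructType: an iterable of (name, type_name) pairs."""

    def __init__(self, fields: Iterable[Tuple[str, str]]):
        self.fields = list(fields)

    def __iter__(self):
        return iter(self.fields)


@attach_as_method(Schema)
def filter_columns_by_type(schema: Schema, type_name: str) -> Schema:
    # Keep only the fields whose type matches, like filter_columns_by_datatype.
    return Schema(f for f in schema if f[1] == type_name)


schema = Schema([("id", "int"), ("name", "string"), ("age", "int")])

as_function = filter_columns_by_type(schema, "int").fields
as_method = schema.filter_columns_by_type("int").fields
print(as_function)               # [('id', 'int'), ('age', 'int')]
print(as_method == as_function)  # True
```

Returning the undecorated function keeps the module-level call style working, while `setattr` makes the same function callable as a bound method, with the instance passed as the first argument.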