
conditions

cast_string_to_boolean(column_or_name)

Casts a column of string values to boolean values.

This function converts specific string representations of boolean values to their corresponding boolean types. The recognized string values for False are "False", "false", "F", "f", and "0". The recognized string values for True are "True", "true", "T", "t", and "1". Any other values will be converted to None.

Parameters:

Name            Type          Description                                                          Default
column_or_name  ColumnOrName  The input column, or its name, containing string values to be cast.  required

Returns:

Type    Description
Column  A column with boolean values where recognized strings are converted to their corresponding boolean values, and unrecognized strings are converted to None.
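
The mapping described above can be modeled per value in plain Python. This is only a sketch of the documented semantics, not the library's Spark implementation: `model_cast_string_to_boolean` is a hypothetical helper introduced here for illustration, and it assumes that a null input behaves like any other unrecognized value. It shows that matching is exact, so different casing or surrounding whitespace yields None.

```python
from typing import Optional

# Literal sets copied from the documented behavior.
FALSE_STRINGS = {"False", "false", "F", "f", "0"}
TRUE_STRINGS = {"True", "true", "T", "t", "1"}


def model_cast_string_to_boolean(value: Optional[str]) -> Optional[bool]:
    """Plain-Python model of the per-row mapping (hypothetical, not part of pysparky)."""
    if value in FALSE_STRINGS:
        return False
    if value in TRUE_STRINGS:
        return True
    # Anything else (None, other casing such as "TRUE", padded strings) maps to None.
    return None


samples = ["true", "F", "1", "TRUE", " yes", None]
results = [model_cast_string_to_boolean(v) for v in samples]
print(results)  # [True, False, True, None, None, None]
```

If case-insensitive or whitespace-tolerant matching is needed, the input would have to be normalized before the cast, since only the ten listed literals are recognized.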

Source code in pysparky/functions/cast.py
def cast_string_to_boolean(column_or_name: ColumnOrName) -> Column:
    """
    Casts a column of string values to boolean values.

    This function converts specific string representations of boolean values
    to their corresponding boolean types. The recognized string values for
    `False` are "False", "false", "F", "f", and "0". The recognized string
    values for `True` are "True", "true", "T", "t", and "1". Any other values
    will be converted to None.

    Args:
        column_or_name (ColumnOrName): The input column, or its name,
            containing string values to be cast.

    Returns:
        Column: A column with boolean values where recognized strings are
        converted to their corresponding boolean values, and unrecognized
        strings are converted to None.
    """
    (column,) = ensure_column(column_or_name)

    false_string = ["False", "false", "F", "f", "0"]
    true_string = ["True", "true", "T", "t", "1"]

    return (
        F.when(column.isin(false_string), False)
        .when(column.isin(true_string), True)
        .otherwise(None)
    )

to_timestamps(column_or_name, formats)

Converts a column with date/time strings into a timestamp column by trying multiple formats.

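This function tries each date/time format in the given list, in order, and uses the first one that successfully parses the value. If no format succeeds, the row falls back to F.to_timestamp(column) without an explicit format. As the comment in the source notes, that fallback follows the spark.sql.ansi.enabled setting, so a value that still cannot be parsed is NULL when ANSI mode is off and may raise an error when it is on.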

Parameters:

column_or_name : ColumnOrName The input Spark column, or the name of the column, containing date/time strings to be converted to timestamps.

formats : list[str] A list of date/time format strings to try, in order. Formats should follow Spark's datetime pattern conventions (the same ones F.to_timestamp accepts), such as "yyyy-MM-dd", "MM/dd/yyyy", etc.

Returns:

Column A Spark Column of type timestamp. A row that matches none of the formats takes the value of the F.to_timestamp(column) fallback, which is NULL for unparseable input when ANSI mode is off.
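
The "first matching format wins" rule can be sketched for a single value in plain Python. This is a simplified model under stated assumptions, not the library's code: `model_to_timestamps` is a hypothetical helper, it uses Python `strptime` directives (`%Y-%m-%d`) instead of Spark patterns (`yyyy-MM-dd`), and it returns None when nothing matches rather than reproducing the `F.to_timestamp(column)` fallback.

```python
from datetime import datetime
from typing import Optional, Sequence


def model_to_timestamps(value: str, formats: Sequence[str]) -> Optional[datetime]:
    """Return the parse result of the first format that fits `value`, else None.

    Hypothetical single-value model; uses Python strptime directives, not Spark patterns.
    """
    for fmt in formats:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            # A failed parse is suppressed and the next format is tried.
            continue
    return None


formats = ["%Y-%m-%d", "%m/%d/%Y"]
iso = model_to_timestamps("2024-01-31", formats)
us = model_to_timestamps("01/31/2024", formats)
unmatched = model_to_timestamps("31.01.2024", formats)

# Order matters when a value fits more than one format.
month_first = model_to_timestamps("01/02/2024", ["%m/%d/%Y", "%d/%m/%Y"])
day_first = model_to_timestamps("01/02/2024", ["%d/%m/%Y", "%m/%d/%Y"])
print(iso, us, unmatched, month_first, day_first)
```

Because the first successful format is used, ambiguous inputs such as "01/02/2024" resolve according to list order, so the most specific or most trusted format should come first.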

Source code in pysparky/functions/cast.py
def to_timestamps(column_or_name: ColumnOrName, formats: list[str]) -> Column:
    """
    Converts a column with date/time strings into a timestamp column by trying multiple formats.

    This function tries each date/time format in `formats`, in order, and uses the first
    one that successfully parses the value. If no format succeeds, the row falls back to
    `F.to_timestamp(column)` without an explicit format.

    Parameters:
    ----------
    column_or_name : ColumnOrName
        The input Spark column, or the name of the column, containing date/time strings
        to be converted to timestamps.

    formats : list[str]
        A list of date/time format strings to try, in order. Formats should follow
        Spark's datetime pattern conventions, such as "yyyy-MM-dd", "MM/dd/yyyy", etc.

    Returns:
    -------
    Column
        A Spark Column of type timestamp. A row that matches none of the formats takes
        the value of the `F.to_timestamp(column)` fallback.
    """
    (column,) = ensure_column(column_or_name)

    def reducer(acc, fmt):
        # `fmt` avoids shadowing the built-in `format`
        format_col = F.lit(fmt)
        return acc.when(
            # try_to_timestamp returns NULL instead of raising, which suppresses parse errors
            F.try_to_timestamp(column, format_col).isNotNull(),
            F.try_to_timestamp(column, format_col),
        )

    return reduce(reducer, formats, F).otherwise(
        # This follows spark.sql.ansi.enabled behavior
        F.to_timestamp(column)
    )