# general
## chain(self, func, *args, **kwargs)
Applies a given function to the current Column and returns the result.
This method allows for chaining operations on a Column object by applying a custom function with additional arguments. It's particularly useful for creating complex transformations or applying user-defined functions to a Column.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
self | Column | The current Column object. | required |
func | callable | The function to apply to the Column. | required |
*args | | Variable length argument list to pass to the function. | () |
**kwargs | | Arbitrary keyword arguments to pass to the function. | {} |
Returns:

Name | Type | Description |
---|---|---|
Column | Column | A new Column object resulting from applying the function. |
Examples:

```python
>>> df = spark.createDataFrame([("hello",)], ["text"])
>>> def custom_upper(col):
...     return F.upper(col)
>>> result = df.withColumn("upper_text", df.text.chain(custom_upper))
>>> result.show()
+-----+----------+
| text|upper_text|
+-----+----------+
|hello|     HELLO|
+-----+----------+
>>> def add_prefix(col, prefix):
...     return F.concat(F.lit(prefix), col)
>>> result = df.withColumn("prefixed_text", df.text.chain(add_prefix, prefix="Pre: "))
>>> result.show()
+-----+-------------+
| text|prefixed_text|
+-----+-------------+
|hello|   Pre: hello|
+-----+-------------+
```
Note:
The function passed to chain should expect a Column as its first argument, followed by any additional arguments specified in the chain call.
Source code in pysparky/functions/general.py
## get_value_from_map(column_or_name, dict_)
Retrieves a value from a map (dictionary) using a key derived from a specified column in a DataFrame.
This function creates a map from the provided dictionary and then looks up the value in the map corresponding to the key that matches the value in the specified column.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
column_or_name | str | The name of the column in the DataFrame whose value will be used as the key to look up in the map. | required |
dict_ | dict | A dictionary where keys and values are the elements to be used in the map. | required |
Returns:

Name | Type | Description |
---|---|---|
Column | Column | A PySpark Column object representing the value retrieved from the map. |
Examples:

```python
>>> mapping = {1: 'a', 2: 'b'}
>>> column_name = 'key_column'
>>> df = spark.createDataFrame([(1,), (2,)], ['key_column'])
>>> df.withColumn('value', get_value_from_map(column_name, mapping)).show()
+----------+-----+
|key_column|value|
+----------+-----+
|         1|    a|
|         2|    b|
+----------+-----+
```
Source code in pysparky/functions/general.py
## lower_(col)

Converts the characters in a string column to lowercase.
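The source carries no example for this helper; below is a minimal sketch, assuming lower_ is a thin wrapper around pyspark.sql.functions.lower and that the import path follows the "Source code" notes on this page:

```python
from pyspark.sql import SparkSession

from pysparky.functions.general import lower_

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([("HeLLo",)], ["text"])
# Assumption: lower_ accepts a column name (or Column) and lowercases it.
df.withColumn("lower_text", lower_("text")).show()
# Expected: "HeLLo" -> "hello"
```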
## replace_strings_to_none(column_or_name, list_of_null_string, customize_output=None)

Replaces string values in a column with None when they match any entry in a given list of null-like strings.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
column_or_name | ColumnOrName | The column to check for null-like string values. | required |
list_of_null_string | list[str] | The string values to treat as null and replace with None. | required |
customize_output | | A custom output value to use in place of None for matched strings. | None |
Returns:

Name | Type | Description |
---|---|---|
Column | Column | A Spark DataFrame column with the values replaced. |
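No example is given here; a hedged usage sketch, reusing the spark session from the sketch above and assuming values matching any entry in list_of_null_string come back as NULL:

```python
from pysparky.functions.general import replace_strings_to_none

df = spark.createDataFrame([("NA",), ("",), ("hello",)], ["text"])
# Assumption: "NA" and "" are replaced with None; other values pass through.
df.withColumn("cleaned", replace_strings_to_none("text", ["NA", ""])).show()
```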
Source code in pysparky/functions/general.py
## single_space_and_trim(column_or_name)
Replaces multiple white spaces with a single space and trims the column.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
column_or_name | Column | The column to be adjusted. | required |
Returns:

Name | Type | Description |
---|---|---|
Column | Column | A trimmed column with single spaces. |
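A short sketch of the documented behavior (same assumed spark session as above):

```python
from pyspark.sql import functions as F

from pysparky.functions.general import single_space_and_trim

df = spark.createDataFrame([("  hello   world  ",)], ["text"])
# Runs of whitespace collapse to one space and the ends are trimmed:
# "  hello   world  " -> "hello world"
df.withColumn("normalized", single_space_and_trim(F.col("text"))).show()
```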
Source code in pysparky/functions/general.py
## startswiths(column_or_name, list_of_strings)
Creates a PySpark Column expression to check if the given column starts with any string in the list.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
column_or_name | ColumnOrName | The column to check. | required |
list_of_strings | List[str] | A list of strings to check if the column starts with. | required |
Returns:

Name | Type | Description |
---|---|---|
Column | Column | A PySpark Column expression that evaluates to True if the column starts with any string in the list, otherwise False. |
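A hedged usage sketch of the documented check (same assumed spark session):

```python
from pysparky.functions.general import startswiths

df = spark.createDataFrame([("apple",), ("banana",), ("cherry",)], ["fruit"])
# True where the value starts with "ap" or "ba", False otherwise
# ("cherry" matches neither prefix).
df.withColumn("matches", startswiths("fruit", ["ap", "ba"])).show()
```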
Source code in pysparky/functions/general.py
## when_mapping(column_or_name, dict_)
Applies a series of conditional mappings to a PySpark Column based on a dictionary of conditions and values.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
column_or_name | Column | The PySpark Column to which the conditional mappings will be applied. | required |
dict_ | Dict | A dictionary where keys are the conditions and values are the corresponding results. | required |
Returns:

Name | Type | Description |
---|---|---|
Column | Column | A new PySpark Column with the conditional mappings applied. |
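Source code in pysparky/functions/general.py

A hedged sketch (same assumed spark session); the matching semantics are an assumption here, reading each dictionary key as a value compared against the column in turn, like a chained F.when:

```python
from pysparky.functions.general import when_mapping

df = spark.createDataFrame([(1,), (2,), (3,)], ["code"])
# Assumption: each key is compared against the column and, on a match,
# the mapped value is returned; unmatched rows (code 3) fall through to NULL.
df.withColumn("label", when_mapping("code", {1: "one", 2: "two"})).show()
```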