Utils
create_map_from_dict(dict_)
Generates a PySpark map column from a provided dictionary.
Each key-value pair in the dictionary becomes a pair of literals in the resulting map.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dict_ | Dict[str, int] | A dictionary with string keys and integer values. | required |
Returns:
Name | Type | Description |
---|---|---|
Column | Column | A PySpark Column object representing the created map. |
Examples:
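A minimal sketch of how such a map column could be built, not the library's actual source. It assumes the map is assembled with pyspark.sql.functions.create_map and lit. The names flatten_items and create_map_from_dict_sketch are hypothetical, and calling the second one requires PySpark and an active SparkSession.

```python
from itertools import chain
from typing import Any, Dict, List


def flatten_items(dict_: Dict[str, int]) -> List[Any]:
    """Interleave keys and values, e.g. {"a": 1, "b": 2} -> ["a", 1, "b", 2]."""
    return list(chain.from_iterable(dict_.items()))


def create_map_from_dict_sketch(dict_: Dict[str, int]):
    """Hypothetical stand-in: build a map Column whose entries are literals."""
    # Imported lazily so the pure-Python helper above works without Spark.
    from pyspark.sql import functions as F

    # create_map expects alternating key and value columns.
    return F.create_map(*[F.lit(item) for item in flatten_items(dict_)])


# Usage, assuming a DataFrame `df` with a string column "letter" (hypothetical):
# mapping = create_map_from_dict_sketch({"a": 1, "b": 2})
# df.withColumn("code", mapping[df["letter"]])
```

Indexing the map column with another column looks up each row's value, so keys missing from the dictionary produce null.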
Source code in pysparky/utils.py
join_dataframes_on_column(column_name, *dataframes, how='outer')
Joins the given DataFrames on a specified column. The join type defaults to an outer join and can be changed with the how parameter.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column_name | str | The column name to join on. | required |
*dataframes | DataFrame | The DataFrames to join, passed as positional arguments. | () |
how | str | The type of join to perform, passed through to PySpark's DataFrame.join. | 'outer' |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | The resulting DataFrame after performing the joins. |
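The chained join described above can be sketched as a left-to-right fold. This is an illustration under assumptions, not the library's code: the name join_dataframes_on_column_sketch is hypothetical, and raising ValueError on empty input is an assumed behaviour that the source does not state.

```python
from functools import reduce
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed for the type hints
    from pyspark.sql import DataFrame


def join_dataframes_on_column_sketch(
    column_name: str, *dataframes: "DataFrame", how: str = "outer"
) -> "DataFrame":
    """Hypothetical stand-in: join all DataFrames on one shared column."""
    if not dataframes:
        # Assumption: the source does not say what happens with no input.
        raise ValueError("At least one DataFrame is required")
    # Fold left to right: ((df1 join df2) join df3) ...
    return reduce(
        lambda left, right: left.join(right, on=column_name, how=how),
        dataframes,
    )


# Usage, assuming DataFrames df1, df2, df3 that all have an "id" column:
# joined = join_dataframes_on_column_sketch("id", df1, df2, df3, how="inner")
```

Passing the join key as a column name rather than an expression keeps a single copy of that column in the result.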
split_dataframe_by_column(sdf, split_column)
Splits a DataFrame into multiple DataFrames based on distinct values in a specified column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sdf | DataFrame | The input Spark DataFrame. | required |
split_column | str | The column name to split the DataFrame by. | required |
Returns:
Type | Description |
---|---|
dict[str, DataFrame] | A dictionary where keys are distinct column values and values are DataFrames. |
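One way to produce such a dictionary is to collect the distinct values and build one filtered DataFrame per value. The sketch below is an illustration under that assumption, not the library's code, and the name split_dataframe_by_column_sketch is hypothetical.

```python
from typing import TYPE_CHECKING, Any, Dict

if TYPE_CHECKING:  # only needed for the type hints
    from pyspark.sql import DataFrame


def split_dataframe_by_column_sketch(
    sdf: "DataFrame", split_column: str
) -> Dict[Any, "DataFrame"]:
    """Hypothetical stand-in: one filtered DataFrame per distinct value."""
    # collect() brings the distinct values to the driver, so this suits
    # columns with a modest number of distinct values.
    distinct_values = [
        row[0] for row in sdf.select(split_column).distinct().collect()
    ]
    # Each entry is a lazy filter over the original DataFrame.
    return {
        value: sdf.filter(sdf[split_column] == value)
        for value in distinct_values
    }


# Usage, assuming a DataFrame `sdf` with a "country" column (hypothetical):
# parts = split_dataframe_by_column_sketch(sdf, "country")
# parts["DE"].show()
```

In this sketch a null in the split column would map to an empty DataFrame, because an equality comparison with null never evaluates to true in Spark.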
union_dataframes(*dataframes)
Unions a list of DataFrames.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
*dataframes | DataFrame | A list of DataFrames to union. | () |
Returns:
Name | Type | Description |
---|---|---|
DataFrame | DataFrame | The resulting DataFrame after performing the unions. |
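The repeated union can be sketched as a fold over the inputs. This is an illustration under assumptions, not the library's code: the name union_dataframes_sketch is hypothetical, it assumes the position-based DataFrame.union (the source does not say whether columns are matched by position or by name), and the ValueError on empty input is likewise assumed.

```python
from functools import reduce
from typing import TYPE_CHECKING

if TYPE_CHECKING:  # only needed for the type hints
    from pyspark.sql import DataFrame


def union_dataframes_sketch(*dataframes: "DataFrame") -> "DataFrame":
    """Hypothetical stand-in: stack all DataFrames into one."""
    if not dataframes:
        # Assumption: the source does not say what happens with no input.
        raise ValueError("At least one DataFrame is required")
    # DataFrame.union matches columns by position and keeps duplicates.
    return reduce(lambda left, right: left.union(right), dataframes)


# Usage, assuming DataFrames df1, df2, df3 with identical column order:
# combined = union_dataframes_sketch(df1, df2, df3)
```

If the inputs share column names but not column order, swapping union for unionByName in the lambda would match columns by name instead.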