Decorator
extension_enabler(cls)
This enables method chaining on the decorated class.
Source code in pysparky/decorator.py
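The docstring above is terse, so here is one plausible reading as a sketch: a class decorator that wraps each public method so that a call returning `None` hands back the instance instead, which is what makes fluent chaining possible. This is an illustrative assumption, not the actual `pysparky/decorator.py` implementation.

```python
# Illustrative sketch only: assumes "chaining" means methods that return None
# are made to return the instance instead. The actual extension_enabler in
# pysparky/decorator.py may work differently.
import functools


def extension_enabler(cls):
    def chainable(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            result = method(self, *args, **kwargs)
            # Fall back to `self` so calls can be chained fluently.
            return self if result is None else result

        return wrapper

    for name, attr in list(vars(cls).items()):
        if callable(attr) and not name.startswith("_"):
            setattr(cls, name, chainable(attr))
    return cls


@extension_enabler
class Steps:
    def __init__(self):
        self.names = []

    def add(self, name):
        self.names.append(name)  # returns None, so the wrapper returns self


assert Steps().add("a").add("b").names == ["a", "b"]
```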
validate_columns(input_cols=None, required_cols=None, expected_cols=None, output_cols=None, added_cols=None, dropped_cols=None)
Decorator to validate the input and output columns of a PySpark DataFrame transformation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `input_cols` | `list[str]` | The exact list of columns that must be present before transformation. | `None` |
| `required_cols` | `list[str]` | The subset of columns that must be present before transformation. | `None` |
| `expected_cols` | `list[str]` | The subset of columns that must be present after transformation. | `None` |
| `output_cols` | `list[str]` | The exact list of columns that must be present after transformation. | `None` |
| `added_cols` | `list[str]` | The exact list of newly added columns (output - input). | `None` |
| `dropped_cols` | `list[str]` | The exact list of dropped columns (input - output). | `None` |
Returns:
| Name | Type | Description |
|---|---|---|
| `Callable` | `Callable` | The decorated function. |
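The six parameters map onto set comparisons of `df.columns` taken before and after the wrapped call. The following is a minimal sketch of that logic, assuming the decorator raises `ValueError` on a mismatch; the actual implementation in `pysparky/decorator.py` may differ in details such as the exception type.

```python
# Sketch of the checks the parameter table implies; illustrative only.
import functools
from typing import Callable

from pyspark.sql import DataFrame


def validate_columns(input_cols=None, required_cols=None, expected_cols=None,
                     output_cols=None, added_cols=None, dropped_cols=None):
    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(df: DataFrame, *args, **kwargs) -> DataFrame:
            before = set(df.columns)
            # Exact-match and subset checks before the transformation runs.
            if input_cols is not None and before != set(input_cols):
                raise ValueError(f"input_cols mismatch: {sorted(before)}")
            if required_cols is not None and not set(required_cols) <= before:
                raise ValueError(f"missing required: {set(required_cols) - before}")

            result = func(df, *args, **kwargs)
            after = set(result.columns)

            # Subset, exact-match, and diff checks after the transformation runs.
            if expected_cols is not None and not set(expected_cols) <= after:
                raise ValueError(f"missing expected: {set(expected_cols) - after}")
            if output_cols is not None and after != set(output_cols):
                raise ValueError(f"output_cols mismatch: {sorted(after)}")
            if added_cols is not None and after - before != set(added_cols):
                raise ValueError(f"added_cols mismatch: {sorted(after - before)}")
            if dropped_cols is not None and before - after != set(dropped_cols):
                raise ValueError(f"dropped_cols mismatch: {sorted(before - after)}")
            return result

        return wrapper

    return decorator
```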
Examples:
>>> from pyspark.sql import DataFrame, SparkSession
>>> from pysparky.decorator import validate_columns
>>> spark = SparkSession.builder.getOrCreate()
>>> df = spark.createDataFrame([(1, 100)], ["user_id", "raw_revenue"])
>>> @validate_columns(required_cols=["user_id"], added_cols=["net_revenue"])
... def calculate_net_revenue(df: DataFrame) -> DataFrame:
... return df.withColumn("net_revenue", df["raw_revenue"] * 0.8)
>>> result = calculate_net_revenue(df)
>>> result.columns
['user_id', 'raw_revenue', 'net_revenue']
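A complementary usage, exercising `input_cols` and `dropped_cols` on the same DataFrame (illustrative, following the parameter semantics in the table above):

>>> @validate_columns(input_cols=["user_id", "raw_revenue"], dropped_cols=["raw_revenue"])
... def drop_raw_revenue(df: DataFrame) -> DataFrame:
...     return df.drop("raw_revenue")
>>> drop_raw_revenue(df).columns
['user_id']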