Quality
expect_any_to_one(col1, col2)
A decorator function that ensures an N:1 relationship between col1 and col2, meaning each value in col1 corresponds to only one distinct value in col2.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col1 | str \| Sequence[str] | Name of the column or a tuple of column names. | required |
col2 | str \| Sequence[str] | Name of the column or a tuple of column names. | required |
Source code in pysparky/quality.py
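The N:1 rule can be illustrated without Spark. The sketch below is not the code in pysparky/quality.py; it is a plain-Python stand-in that treats rows as dictionaries, and the helper name `violates_any_to_one` and the sample data are hypothetical.

```python
from collections import defaultdict


def violates_any_to_one(rows, col1, col2):
    """Return the col1 values that map to more than one distinct col2 value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[col1]].add(row[col2])
    return sorted(key for key, values in seen.items() if len(values) > 1)


# Many cities may share a country (N:1), and duplicate rows are harmless.
rows = [
    {"city": "Paris", "country": "FR"},
    {"city": "Lyon", "country": "FR"},
    {"city": "Paris", "country": "FR"},
]

# Adding a second country for "Paris" breaks the N:1 relationship.
bad_rows = rows + [{"city": "Paris", "country": "US"}]

print(violates_any_to_one(rows, "city", "country"))
print(violates_any_to_one(bad_rows, "city", "country"))
```

An empty result means every value in col1 maps to exactly one value in col2, which is the condition the decorator is described as enforcing.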
expect_criteria(criteria)
A decorator function that ensures every row of a Spark DataFrame satisfies a given criterion.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
criteria | Column | The filter criterion to be applied to the DataFrame. | required |
Returns:
Type | Description |
---|---|
function | A decorated function that checks the criterion. |
Raises:
Type | Description |
---|---|
AssertionError | If the filtered count and unfiltered count of the DataFrame are not equal. |
Source code in pysparky/quality.py
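The decorator pattern and the count comparison described above can be sketched in plain Python. This is not pysparky's implementation: `expect_criteria_sketch` is a hypothetical stand-in that works on lists of dictionaries instead of Spark DataFrames, and it assumes the check runs on the value the decorated function returns.

```python
import functools


def expect_criteria_sketch(predicate):
    """Hypothetical stand-in: assert every returned row satisfies predicate."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            rows = func(*args, **kwargs)
            kept = [row for row in rows if predicate(row)]
            # Same idea as comparing the filtered and unfiltered counts.
            assert len(kept) == len(rows), "some rows fail the criterion"
            return rows

        return wrapper

    return decorator


@expect_criteria_sketch(lambda row: row["amount"] >= 0)
def load_payments():
    return [{"amount": 10}, {"amount": 0}]


@expect_criteria_sketch(lambda row: row["amount"] >= 0)
def load_bad_payments():
    return [{"amount": 10}, {"amount": -5}]
```

Calling `load_payments()` returns its rows unchanged, while `load_bad_payments()` raises `AssertionError` because one row is dropped by the filter.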
expect_one_to_one(col1, col2)
A decorator function that ensures a 1:1 relationship between col1 and col2, meaning each value in col1 corresponds to only one distinct value in col2 and vice-versa.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col1 | str \| Sequence[str] | Name of the column or a tuple of column names. | required |
col2 | str \| Sequence[str] | Name of the column or a tuple of column names. | required |
Source code in pysparky/quality.py
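One way to test a 1:1 relationship is to compare three distinct counts: the distinct pairs, the distinct left values, and the distinct right values. The sketch below shows that idea on plain Python tuples. It is an illustration only, not the code in pysparky/quality.py, and `is_one_to_one` is a hypothetical name.

```python
def is_one_to_one(pairs):
    """Return True if each left value has one right value and vice versa."""
    distinct_pairs = set(pairs)
    lefts = {left for left, _ in distinct_pairs}
    rights = {right for _, right in distinct_pairs}
    # In a 1:1 mapping, every distinct pair introduces a new left value
    # and a new right value, so all three counts agree.
    return len(lefts) == len(distinct_pairs) == len(rights)


codes = [("FR", "France"), ("DE", "Germany"), ("FR", "France")]
print(is_one_to_one(codes))                            # True
print(is_one_to_one(codes + [("FR", "Frankreich")]))   # False: FR has two names
print(is_one_to_one([("a", 1), ("b", 1)]))             # False: 1 has two keys
```

The last case passes an N:1 check but fails the 1:1 check, which is the difference between this decorator and `expect_any_to_one`.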
expect_type(col_name, col_type)
A decorator function that verifies the data type of a specified column in a Spark DataFrame.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_name | str | The column's name. | required |
col_type | DataType | The expected data type for the column. | required |
Returns:
Type | Description |
---|---|
function | A decorated function that checks the column's data type. |
Raises:
Type | Description |
---|---|
AssertionError | If the column's data type does not match the expected type. |
Source code in pysparky/quality.py
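The type check amounts to looking up one column in the schema and comparing it with the expected type. The sketch below mimics that with a plain dictionary in place of a Spark schema. The helper `assert_column_type` and the type names are hypothetical, and this is not pysparky's implementation.

```python
def assert_column_type(schema, col_name, col_type):
    """Raise AssertionError if schema[col_name] is not col_type."""
    actual = schema[col_name]
    assert actual == col_type, f"{col_name} is {actual}, expected {col_type}"


# Hypothetical schema: column name -> type name, standing in for a Spark schema.
schema = {"id": "bigint", "name": "string"}

assert_column_type(schema, "id", "bigint")  # passes silently
```

A mismatch such as `assert_column_type(schema, "name", "bigint")` raises `AssertionError` with a message naming the column, the actual type, and the expected type.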
expect_unique(col_name)
A decorator function that ensures every value in a column of a Spark DataFrame is unique.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
col_name | str | The column's name. | required |
Returns:
Type | Description |
---|---|
function | A decorated function that checks the column's uniqueness. |
Raises:
Type | Description |
---|---|
AssertionError | If the column's count and distinct count are not equal. |
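The comparison of count against distinct count can be shown on a plain Python list. This is a minimal sketch of the idea rather than pysparky's Spark-based code, and `is_unique` is a hypothetical helper.

```python
def is_unique(values):
    """Return True if the count of values equals the count of distinct values."""
    values = list(values)
    return len(values) == len(set(values))


print(is_unique([101, 102, 103]))  # True
print(is_unique([101, 102, 101]))  # False: 101 appears twice
```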