PySparky - PySpark Helper
Welcome to PySparky, a set of helper functions designed to simplify your work with PySpark. This library provides utilities to make data transformation and analysis with PySpark easier and more efficient.
Introduction
PySparky is a collection of utility functions aimed at streamlining PySpark operations. Whether you're dealing with data transformation, cleaning, or analysis, PySparky offers helper functions that save you time and effort.
Its layout mirrors the structure of PySpark itself, so it should feel familiar to existing PySpark users:
- The functions folder contains all PySpark functions, where both the input and output are Columns.
- The Spark_ext folder houses functions that require a Spark instance, such as creating a DataFrame.
- The transformation_ext folder includes functions associated with DataFrame transformations, where both the input and output are DataFrames.
Features
- Easy Installation: Quickly integrate PySparky into your PySpark projects.
- Utility Functions: A wide range of helper functions for common PySpark tasks.
- Well-Documented: Clear and comprehensive documentation for all functions.
Installation
To install PySparky, download the wheel (.whl) file from the repository and install it with pip:
pip install pysparky-x.y.z.whl
# or
pip install git+https://github.com/PySparky/pysparky-pyspark-helper.git
# to uninstall
pip uninstall pysparky
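
To confirm that the install (or uninstall) took effect, a minimal sketch using only the standard library is shown below. It assumes the distribution is registered under the name pysparky, as the pip uninstall command above suggests; the function name installed_version is made up for this example.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "pysparky") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    The default name "pysparky" is an assumption taken from the
    `pip uninstall pysparky` command.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("pysparky is not installed")
    else:
        print(f"pysparky {version} is installed")
```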