
PySparky - PySpark Helper

Welcome to PySparky, a set of helper functions designed to simplify your work with PySpark. The library provides utilities that make data transformation and analysis with PySpark easier and more efficient. The source code is on GitHub.

Introduction

PySparky is a collection of utility functions aimed at streamlining PySpark operations. Whether you're dealing with data transformation, cleaning, or analysis, PySparky offers helper functions that save you time and effort.

Its layout mirrors the structure of PySpark itself, so it should feel familiar to PySpark users:

  • The functions folder contains all PySpark functions, where both the input and output are Columns.
  • The Spark_ext folder contains functions that require a Spark instance, such as creating a DataFrame.
  • The transformation_ext folder contains functions for DataFrame transformations, where both the input and output are DataFrames.

Features

  • Easy Installation: Quickly integrate PySparky into your PySpark projects.
  • Utility Functions: A wide range of helper functions for common PySpark tasks.
  • Well-Documented: Clear and comprehensive documentation for all functions.

Installation

To install PySparky, download the wheel (.whl) file from the repository and install it with pip, or install directly from GitHub:

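# install from a downloaded wheel (replace x.y.z with the actual version)
pip install pysparky-x.y.z.whl
# or install directly from GitHub
pip install git+https://github.com/PySparky/pysparky-pyspark-helper.git

# uninstall
pip uninstall pysparky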
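
After installing (or uninstalling), it can be useful to confirm what pip actually registered. The following is a minimal sketch using only the Python standard library; the helper name `installed_version` is hypothetical and not part of PySparky, and it assumes the distribution is registered under the name `pysparky`, as the uninstall command above suggests.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "pysparky") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    `installed_version` is an illustrative helper, not a PySparky function.
    The default name "pysparky" is assumed from `pip uninstall pysparky`.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pysparky is not installed")
    else:
        print(f"pysparky {found} is installed")
```

Running this script in the same environment where you ran pip shows whether the install or uninstall took effect.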

See the example for usage:

PySparky Example