PandasAPI: Streamlined ETL and Data Science Workflows

GitHub repo: https://github.com/justin-napolitano/PandasAPI

A collection of common functions designed to streamline ETL, ELT, and data science workflows using pandas. This library provides utility methods for loading, transforming, and writing pandas DataFrames.

Features

  • Load CSV files into pandas DataFrames with a wrapper function.
  • Create DataFrames from lists, Series, or dictionaries.
  • Generate empty DataFrames.
  • Convert DataFrames to dictionaries with multiple orientation options.
  • Write DataFrames to CSV files with customizable parameters.
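For orientation, the plain pandas calls that these features correspond to can be sketched as follows. This is not the library's source: it uses only pandas itself, the sample data is made up, and an in-memory buffer stands in for a CSV file on disk.

```python
import io

import pandas as pd

# Load CSV data (an in-memory buffer stands in for a file path here).
csv_text = "col1,col2\n1,2\n3,4\n"
df = pd.read_csv(io.StringIO(csv_text))

# Create DataFrames from a list of dicts, a Series, or a dict of columns.
from_list = pd.DataFrame([{"col1": 1, "col2": 2}])
from_series = pd.Series([1, 3], name="col1").to_frame()
from_dict = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]})

# Generate an empty DataFrame with predefined column names.
empty = pd.DataFrame(columns=["col1", "col2"])

# Convert to dictionaries using two of the available orientations.
as_records = df.to_dict(orient="records")  # one dict per row
as_lists = df.to_dict(orient="list")       # one list per column

# Write CSV; with no path argument, to_csv returns the text instead.
csv_out = df.to_csv(index=False)
```

The `orient` argument is what drives the "multiple orientation options" mentioned above: `records` yields one dictionary per row, while `list` yields one list per column.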

Tech Stack

  • Python 3
  • pandas

Getting Started

Prerequisites

  • Python 3.x installed
  • pandas library installed (pip install pandas)

Installation

Clone the repository:

git clone https://github.com/justin-napolitano/PandasAPI.git
cd PandasAPI

Usage

Import the PandasFunctions class from PandasFunctions.py and call the static methods grouped under its nested Load, Transform, and Write classes:

from PandasFunctions import PandasFunctions

# Load CSV
df = PandasFunctions.Load.csv_to_df('file.csv')

# Create DataFrame
new_df = PandasFunctions.Load.create_df([{'col1': 1, 'col2': 2}])

# Transform DataFrame
records = PandasFunctions.Transform.df_to_record_dict(new_df, orient='records')

# Write DataFrame
PandasFunctions.Write.data_frame_to_csv(new_df, 'output.csv')
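The dotted call style above (class, then nested class, then static method) can be illustrated with a minimal sketch. `FrameTools` is a hypothetical stand-in written for this example, not the actual PandasFunctions source; the method bodies are assumptions that simply delegate to pandas.

```python
import tempfile
from pathlib import Path

import pandas as pd


class FrameTools:
    """Hypothetical stand-in that mirrors the Load/Transform/Write layout."""

    class Load:
        @staticmethod
        def create_df(data):
            # Assumed behavior: pass the data straight to the DataFrame constructor.
            return pd.DataFrame(data)

    class Transform:
        @staticmethod
        def df_to_record_dict(df, orient="records"):
            # Assumed behavior: delegate to DataFrame.to_dict.
            return df.to_dict(orient=orient)

    class Write:
        @staticmethod
        def data_frame_to_csv(df, path):
            # Assumed behavior: write without the index column.
            df.to_csv(path, index=False)


new_df = FrameTools.Load.create_df([{"col1": 1, "col2": 2}])
records = FrameTools.Transform.df_to_record_dict(new_df, orient="records")

# Write to a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "output.csv"
    FrameTools.Write.data_frame_to_csv(new_df, out_path)
    written = out_path.read_text().splitlines()
```

Because every method is static, no instance is created; the nested classes act purely as namespaces that group related helpers.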

Project Structure

PandasAPI/
├── LICENSE
└── PandasFunctions.py   # Core utility functions for pandas operations

Future Work / Roadmap

  • Expand support for additional file formats (e.g., JSON, Excel).
  • Add more transformation utilities for common data cleaning tasks.
  • Enhance error handling and logging.
  • Provide more parameter flexibility for write operations.
  • Include unit tests and example notebooks.
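As one possible starting point for the unit tests mentioned above, a CSV round-trip check might look like the sketch below. It exercises pandas directly rather than the library's own methods, and `roundtrip_csv` and the test name are hypothetical helpers introduced only for this example.

```python
import tempfile
from pathlib import Path

import pandas as pd


def roundtrip_csv(df: pd.DataFrame) -> pd.DataFrame:
    """Write a DataFrame to a temporary CSV file and read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "output.csv"
        df.to_csv(path, index=False)
        return pd.read_csv(path)


def test_csv_roundtrip_preserves_values():
    original = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]})
    restored = roundtrip_csv(original)
    # Raises AssertionError if values, dtypes, or column labels differ.
    pd.testing.assert_frame_equal(original, restored)
```

A function named this way would be collected by pytest automatically, or it can be called directly as a plain function.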