Data Analysis Projects with Google Cloud and Python

GitHub repository: https://github.com/justin-napolitano/pmc-submission

A collection of data analysis projects and technical interview solutions by Justin Napolitano, primarily using Jupyter Notebooks and Python scripts interfacing with Google Cloud services such as BigQuery and Bigtable.

Features

  • SQL queries and Python scripts for analyzing NYC taxi trip data using BigQuery.
  • Weather data collection and streaming solutions using APIs and Google Cloud Bigtable.
  • Technical interview answers and cost-of-living models, structured as a Jupyter Book.
  • Automated build pipeline for Jupyter Book documentation.
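The taxi-trip analysis above runs SQL against BigQuery. As a hedged sketch only (not code from this repo), a query against BigQuery's public NYC yellow-taxi dataset might look like the following; the table name `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018` and the column names are assumptions based on the public dataset catalog:

```python
# Sketch: hourly trip counts and average distance from the public
# NYC yellow-taxi dataset. Table/column names are assumptions.
QUERY = """
SELECT
  EXTRACT(HOUR FROM pickup_datetime) AS pickup_hour,
  COUNT(*) AS trips,
  AVG(trip_distance) AS avg_distance_miles
FROM `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018`
GROUP BY pickup_hour
ORDER BY pickup_hour
"""

def run_query():
    """Execute the query; requires google-cloud-bigquery and valid credentials."""
    from google.cloud import bigquery  # deferred so the module imports without the SDK
    client = bigquery.Client()
    return [dict(row) for row in client.query(QUERY).result()]
```

The repo's own queries live in test.py; this block only illustrates the shape of such a script.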

Tech Stack

  • Python 3
  • Jupyter Notebook
  • Google Cloud BigQuery and Bigtable
  • Java (sample Bigtable external query)
  • Jupyter Book for documentation

Getting Started

Prerequisites

  • Python 3 installed
  • Google Cloud SDK configured with appropriate credentials
  • pip for Python package management

Installation

  1. Clone the repository:
     git clone https://github.com/justin-napolitano/pmc-submission.git
     cd pmc-submission
  2. Install dependencies:
     pip install -r requirements.txt
  3. Set the Google Cloud credentials environment variable:
     export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/creds.json"
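After step 3, it can help to sanity-check the credentials before running any script. This helper is a hypothetical sketch, not part of the repo:

```python
import os

def credentials_path() -> str:
    """Return the service-account key path, failing fast with a clear error."""
    path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if not path:
        raise RuntimeError(
            "Set GOOGLE_APPLICATION_CREDENTIALS to your service-account JSON key"
        )
    if not os.path.isfile(path):
        raise RuntimeError(f"Credentials file not found: {path}")
    return path
```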

Running

  • To run the main Python script:
    python main.py
  • To run test queries against BigQuery, explore test.py.
  • To build the Jupyter Book documentation:
    python python_build.py
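The build script's contents aren't shown here; assuming python_build.py wraps the standard `jupyter-book` CLI, a minimal version might look like:

```python
import subprocess
import sys
from pathlib import Path

def build_book(book_dir: str = "jupyter-book") -> int:
    """Run `jupyter-book build` on the source directory; return an exit code."""
    if not Path(book_dir).is_dir():
        print(f"Book directory not found: {book_dir}", file=sys.stderr)
        return 1
    return subprocess.call(["jupyter-book", "build", book_dir])
```

The `jupyter-book build <dir>` invocation is the documented CLI; the directory name and error handling are assumptions.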

Project Structure

pmc-submission/
├── ch4-emissions/                 # Folder likely related to methane (CH4) emissions analysis
├── jupyter-book/                  # Jupyter Book source and build files
│   ├── _config.yml                # Jupyter Book configuration
│   ├── _toc.yml                   # Table of contents
│   ├── notebooks/                 # Markdown and notebooks for interview and analysis
│   └── python_build.py            # Build automation for Jupyter Book
├── login.py                       # Google Cloud Bigtable login helper
├── main.py                        # Main entry point
├── propensity_scoring/            # Folder likely containing propensity scoring analysis
├── python_build.py                # Build automation for main project
├── query.py                       # Java-like imports, possibly incomplete BigQuery client code
├── query_gooogle.java             # Java sample for Bigtable external query
├── query_gooogle.json             # Duplicate of the Java file, likely misplaced
├── test.py                        # Python scripts with BigQuery SQL queries
└── documentation.ipynb            # Possibly project documentation notebook
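login.py is described above as a Bigtable login helper, and the weather feature streams rows into Bigtable. As a hedged sketch only (the project, instance, table, and column-family names are placeholders, not the repo's actual code), opening a table and writing a row looks like:

```python
def open_bigtable_table(project_id: str, instance_id: str, table_id: str):
    """Return a Bigtable table handle (requires google-cloud-bigtable)."""
    from google.cloud import bigtable  # deferred so this imports without the SDK
    client = bigtable.Client(project=project_id)
    return client.instance(instance_id).table(table_id)

def write_weather_row(table, row_key: bytes, temp_c: float):
    """Write one temperature reading; 'weather' column family is a placeholder."""
    row = table.direct_row(row_key)
    row.set_cell("weather", b"temp_c", str(temp_c).encode("utf-8"))
    row.commit()
```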

Future Work / Roadmap

  • Complete and clean up Java and Python BigQuery client code.
  • Consolidate or remove duplicate/misplaced files like query_gooogle.json.
  • Expand automated testing and CI/CD for data pipelines.
  • Enhance documentation with more detailed usage examples.
  • Integrate data pipeline automation for weather and taxi data ingestion.
  • Improve error handling and logging in scripts.

Assumptions: The project is a personal portfolio of data analysis and technical interview work using Google Cloud services. Some files appear incomplete or duplicated, suggesting ongoing development.
