A collection of data analysis projects and technical interview solutions by Justin Napolitano, primarily using Jupyter Notebooks and Python scripts interfacing with Google Cloud services such as BigQuery and Bigtable.
Features
- SQL queries and Python scripts for analyzing NYC taxi trip data using BigQuery.
- Weather data collection and streaming solutions using APIs and Google Cloud Bigtable.
- Technical interview answers and cost-of-living models, organized as a Jupyter Book.
- Automated build pipeline for Jupyter Book documentation.
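As a rough illustration of the taxi-trip analysis, here is a minimal sketch of a parameterized BigQuery query run through the `google-cloud-bigquery` client. The README does not name the exact table the project queries. The public table `bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018` and its column names (`pickup_datetime`, `trip_distance`, `fare_amount`) are assumptions to check against `test.py`.

```python
# Sketch: summarize NYC taxi trips per pickup hour with a parameterized query.
# ASSUMPTION: the table and column names below; the README does not confirm them.

DEFAULT_TABLE = "bigquery-public-data.new_york_taxi_trips.tlc_yellow_trips_2018"


def build_trip_summary_sql(table: str = DEFAULT_TABLE) -> str:
    """Return SQL that aggregates trips by pickup hour.

    Table names cannot be query parameters, so the table is interpolated;
    the distance filter is passed as the named parameter @min_distance.
    """
    return f"""
        SELECT
          EXTRACT(HOUR FROM pickup_datetime) AS pickup_hour,
          COUNT(*) AS trips,
          AVG(trip_distance) AS avg_distance,
          AVG(fare_amount) AS avg_fare
        FROM `{table}`
        WHERE trip_distance > @min_distance
        GROUP BY pickup_hour
        ORDER BY pickup_hour
    """


def run_trip_summary(min_distance: float = 0.0):
    """Run the query; needs google-cloud-bigquery and valid credentials."""
    from google.cloud import bigquery  # lazy import: the SQL builder works offline

    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("min_distance", "FLOAT64", min_distance),
        ]
    )
    return list(client.query(build_trip_summary_sql(), job_config=job_config).result())


if __name__ == "__main__":
    for row in run_trip_summary(min_distance=1.0):
        print(row["pickup_hour"], row["trips"], row["avg_distance"], row["avg_fare"])
```

Passing the filter as a named parameter rather than formatting it into the string avoids quoting bugs and SQL injection when values come from user input.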
Tech Stack
- Python 3
- Jupyter Notebook
- Google Cloud BigQuery and Bigtable
- Java (sample Bigtable external query)
- Jupyter Book for documentation
Getting Started
Prerequisites
- Python 3 installed
- Google Cloud SDK configured with appropriate credentials
- pip for Python package management
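A quick way to confirm these prerequisites before installing is a small standard-library check like the sketch below. It is hypothetical (the repository does not ship it), and the tool names `gcloud` and `pip` are assumptions; some systems expose pip only as `pip3`.

```python
# Sketch: report missing prerequisites (hypothetical helper, not part of the repo).
import shutil
import sys


def missing_prerequisites(required_tools=("gcloud", "pip")) -> list:
    """Return a list of problems; an empty list means everything was found."""
    problems = []
    if sys.version_info.major < 3:
        problems.append(f"Python 3 required, found {sys.version.split()[0]}")
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    issues = missing_prerequisites()
    for issue in issues:
        print("Missing:", issue)
    sys.exit(1 if issues else 0)
```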
Installation
- Clone the repository:
git clone https://github.com/justin-napolitano/pmc-submission.git
cd pmc-submission
- Install dependencies:
pip install -r requirements.txt
- Set the Google Cloud credentials environment variable:
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/your/creds.json"
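The Google Cloud client libraries read `GOOGLE_APPLICATION_CREDENTIALS` through Application Default Credentials, so a wrong path usually surfaces later as a confusing authentication error. A minimal sketch of a fail-fast check follows; the function name is hypothetical. If credentials are instead configured with `gcloud auth application-default login`, the variable may legitimately be unset and this check would not apply.

```python
# Sketch: fail fast when the credentials variable is unset or points nowhere.
import os
from pathlib import Path


def resolve_credentials_path(env=None) -> Path:
    """Return the service-account key path from the environment, or raise."""
    env = os.environ if env is None else env
    value = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not value:
        raise RuntimeError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    path = Path(value).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Credentials file not found: {path}")
    return path


if __name__ == "__main__":
    print("Using credentials at", resolve_credentials_path())
```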
Running
- To run the main Python script:
python main.py
- To run test queries against BigQuery, see the queries in test.py.
- To build the Jupyter Book documentation:
python python_build.py
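The contents of `python_build.py` are not described here. The following is a hypothetical sketch of what such a build helper might do, assuming the Jupyter Book 1.x CLI (`jupyter-book clean` / `jupyter-book build`) is installed and the book sources live in `./jupyter-book`, as the project tree suggests.

```python
# Sketch of a Jupyter Book build helper (hypothetical; not the repo's python_build.py).
# ASSUMPTION: Jupyter Book 1.x CLI on PATH, book sources in ./jupyter-book.
import subprocess
import sys
from pathlib import Path


def build_commands(book_dir: str = "jupyter-book", clean: bool = False) -> list:
    """Return the CLI invocations to run, in order."""
    commands = []
    if clean:
        commands.append(["jupyter-book", "clean", book_dir])  # drop stale _build output
    commands.append(["jupyter-book", "build", book_dir])
    return commands


def run_build(book_dir: str = "jupyter-book", clean: bool = False) -> Path:
    """Run each command, stopping on the first failure; return the HTML entry page."""
    for cmd in build_commands(book_dir, clean):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return Path(book_dir) / "_build" / "html" / "index.html"


if __name__ == "__main__":
    print("Built:", run_build(clean="--clean" in sys.argv))
```

Separating command construction from execution keeps the build steps easy to inspect or reuse in a CI job.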
Project Structure
pmc-submission/
├── ch4-emissions/          # Folder likely related to methane (CH4) emissions analysis
├── jupyter-book/           # Jupyter Book source and build files
│   ├── _config.yml         # Jupyter Book configuration
│   ├── _toc.yml            # Table of contents
│   ├── notebooks/          # Markdown files and notebooks for interview answers and analysis
│   └── python_build.py     # Build automation for Jupyter Book
├── login.py                # Google Cloud Bigtable login helper
├── main.py                 # Main entry point
├── propensity_scoring/     # Folder likely containing propensity scoring analysis
├── python_build.py         # Build automation for main project
├── query.py                # Java-like imports; possibly incomplete BigQuery client code
├── query_gooogle.java      # Java sample for Bigtable external query
├── query_gooogle.json      # Duplicate of the Java file, likely misplaced
├── test.py                 # Python script with BigQuery SQL queries
└── documentation.ipynb     # Possibly a project documentation notebook
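The README says only that `login.py` is a Bigtable login helper and that weather data is streamed into Bigtable. A minimal sketch of how those pieces could fit together with the `google-cloud-bigtable` client is below. The project, instance, table, and column-family names, the row-key layout, and the JSON cell format are all hypothetical.

```python
# Sketch: open a Bigtable table and write one weather reading.
# ASSUMPTIONS: column family "weather", key layout "station#YYYYmmddHHMMSS", JSON cells.
import datetime as dt
import json


def weather_row_key(station_id: str, observed_at: dt.datetime) -> bytes:
    """Station prefix keeps each station's readings contiguous and time-ordered,
    and avoids the write hotspotting a leading timestamp would cause."""
    return f"{station_id}#{observed_at.strftime('%Y%m%d%H%M%S')}".encode("utf-8")


def open_table(project_id: str, instance_id: str, table_id: str):
    """Return a Table handle; needs google-cloud-bigtable and credentials."""
    from google.cloud import bigtable  # lazy import: key helper works offline

    client = bigtable.Client(project=project_id)
    return client.instance(instance_id).table(table_id)


def write_reading(table, station_id: str, observed_at: dt.datetime,
                  reading: dict, family: str = "weather") -> None:
    """Store one reading as a JSON-encoded cell in the given column family."""
    row = table.direct_row(weather_row_key(station_id, observed_at))
    row.set_cell(family, b"reading", json.dumps(reading).encode("utf-8"))
    row.commit()  # sends the single-row mutation to Bigtable


if __name__ == "__main__":
    table = open_table("my-project", "my-instance", "weather")  # hypothetical IDs
    write_reading(table, "KNYC", dt.datetime.now(dt.timezone.utc), {"temp_c": 4.5})
```

For higher-volume streaming, batching rows with `table.mutate_rows(rows)` cuts per-request overhead compared with committing one row at a time.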
Future Work / Roadmap
- Complete and clean up Java and Python BigQuery client code.
- Consolidate or remove duplicate/misplaced files like query_gooogle.json.
- Expand automated testing and CI/CD for data pipelines.
- Enhance documentation with more detailed usage examples.
- Integrate data pipeline automation for weather and taxi data ingestion.
- Improve error handling and logging in scripts.
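For the error-handling item, one common pattern is to retry transient failures with exponential backoff and log each attempt. The sketch below is generic and hypothetical. Which exceptions count as retryable is an assumption to tune per API. For calls made through the Google client libraries, their own `google.api_core.retry.Retry` may be the better fit.

```python
# Sketch: retry with exponential backoff and logging (hypothetical helper).
import logging
import time

logger = logging.getLogger("pipeline")


def call_with_retries(func, *, attempts=3, base_delay=1.0,
                      retry_on=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Call func(); on a retryable error wait base_delay * 2**(n-1) and retry.

    Non-retryable exceptions propagate immediately; the last retryable
    error is logged and re-raised once attempts are exhausted.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == attempts:
                logger.error("Giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** (attempt - 1)
            logger.warning("Attempt %d failed (%s); retrying in %.1fs",
                           attempt, exc, delay)
            sleep(delay)


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(call_with_retries(lambda: "ok"))
```

Injecting `sleep` keeps the backoff testable without real waits.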
Assumptions: The project is a personal portfolio of data analysis and technical interview work using Google Cloud services. Some files appear incomplete or duplicated, suggesting ongoing development.