A Python tool that parses RSS feeds and processes new entries, aimed primarily at automating the posting of Hugo blog updates to social media or other platforms. The project includes utilities for interacting with Google Cloud services and supports deployment via Docker and Google Cloud Build.
Features
- Parses RSS feeds to extract and process new blog entries.
- Converts published dates to standardized formats for comparison and processing.
- Updates a backend service or database with new or updated feed entries via HTTP requests.
- Integrates with Google Cloud services including BigQuery, Cloud Storage, and Cloud Logging through reusable client utilities.
- Supports containerized deployment with Docker and automated builds using Google Cloud Build.
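The first two features can be illustrated with a minimal sketch. It is not the code in rss-scraper.py: it swaps feedparser for the standard library's `xml.etree.ElementTree` so it runs without dependencies, and the sample feed, `parse_entries`, and `entries_since` are hypothetical names chosen for illustration. It assumes RSS 2.0 `pubDate` values in RFC 822 form with an explicit UTC offset.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Hypothetical two-item feed standing in for a real index.xml.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Older post</title>
      <link>https://example.com/older/</link>
      <pubDate>Mon, 01 Jan 2024 09:00:00 -0600</pubDate>
    </item>
    <item>
      <title>Newer post</title>
      <link>https://example.com/newer/</link>
      <pubDate>Fri, 01 Mar 2024 12:30:00 +0000</pubDate>
    </item>
  </channel>
</rss>"""


def parse_entries(rss_text):
    """Return one dict per <item>, with the published date converted to UTC."""
    root = ET.fromstring(rss_text)
    entries = []
    for item in root.iter("item"):
        # pubDate is an RFC 822 date; the result is timezone-aware
        # because the sample dates carry an explicit offset.
        published = parsedate_to_datetime(item.findtext("pubDate"))
        entries.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": published.astimezone(timezone.utc),
        })
    return entries


def entries_since(entries, cutoff):
    """Keep only entries published strictly after the cutoff."""
    return [entry for entry in entries if entry["published"] > cutoff]


cutoff = datetime(2024, 2, 1, tzinfo=timezone.utc)
new_entries = entries_since(parse_entries(SAMPLE_RSS), cutoff)
print([entry["title"] for entry in new_entries])
```

Converting every date to UTC before comparing avoids misordering posts published in different offsets.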
Tech Stack
- Python 3
- feedparser (for RSS parsing)
- requests (for HTTP requests)
- Google Cloud SDKs (BigQuery, Storage, Logging)
- Docker
- Google Cloud Build
Getting Started
Prerequisites
- Python 3.7 or higher
- Docker (optional, for containerized deployment)
- Google Cloud account and appropriate permissions
Installation
- Clone the repository:
git clone https://github.com/justin-napolitano/python-rss-reader.git
cd python-rss-reader
- (Optional) Set up a Python virtual environment:
python3 -m venv venv
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Configure environment variables and credentials:
  - Place your Google Cloud service account JSON in secret.json, or set the environment variable GOOGLE_APPLICATION_CREDENTIALS to its path.
  - Configure any other required environment variables as needed.
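As a rough sketch of how the two credential options could be resolved at startup: the helper name `resolve_credentials_path` and the precedence order (environment variable first, then secret.json in the working directory) are assumptions for illustration, not behavior confirmed from the project's code.

```python
import os
from pathlib import Path


def resolve_credentials_path(env=None, default="secret.json"):
    """Return the service account key path, or None if none is configured.

    Assumed precedence: GOOGLE_APPLICATION_CREDENTIALS wins, then a local
    secret.json file if one exists.
    """
    env = os.environ if env is None else env
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit)
    fallback = Path(default)
    return fallback if fallback.is_file() else None


path = resolve_credentials_path()
if path is None:
    print("No credentials configured")
else:
    print(f"Using credentials at {path}")
```

Passing `env` explicitly makes the helper easy to exercise without touching the real environment.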
Running the RSS Scraper
python rss-scraper.py
This will parse the RSS feed (default: https://jnapolitano.com/index.xml) and attempt to update the backend service with new entries.
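The update step might look roughly like the sketch below. The endpoint URL, the payload fields, and the function name `build_update_request` are all hypothetical, since the backend's API is not described here. The project lists requests for HTTP; this sketch uses the standard library's `urllib.request` instead so it has no dependencies, and it only builds the request without sending it.

```python
import json
import urllib.request

# Hypothetical endpoint; replace with the real backend URL.
BACKEND_URL = "https://backend.example.com/feed-entries"


def build_update_request(entry, url=BACKEND_URL):
    """Build a JSON POST request describing one feed entry (assumed payload shape)."""
    body = json.dumps({
        "title": entry["title"],
        "link": entry["link"],
        "published": entry["published"],
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_update_request({
    "title": "Newer post",
    "link": "https://example.com/newer/",
    "published": "2024-03-01T12:30:00+00:00",
})
print(request.get_method(), request.full_url)

# To actually send it (requires a reachable backend):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(response.status)
```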
Using Docker
Build the Docker image:
docker build -t python-rss-reader .
Run the container:
docker run --env GOOGLE_APPLICATION_CREDENTIALS=/path/to/secret.json -v /local/path/to/secret.json:/path/to/secret.json python-rss-reader
Google Cloud Build
The cloudbuild.yaml file defines steps to build and push the Docker image to Google Container Registry. Uncomment and configure additional steps to deploy to Cloud Run or set up Cloud Scheduler jobs.
Run Cloud Build:
gcloud builds submit --config cloudbuild.yaml .
Project Structure
python-rss-reader/
├── cloudbuild.yaml              # Google Cloud Build configuration
├── Dockerfile                   # Docker image definition
├── gcputils/                    # Google Cloud utility submodule
│   ├── BigQueryClient.py        # BigQuery client wrapper
│   ├── GCSClient.py             # Google Cloud Storage client wrapper
│   ├── GoogleCloudLogging.py    # Cloud Logging client wrapper
│   ├── index.md                 # Documentation
│   └── readme.md                # Documentation
├── images/                      # Image assets
├── index.md                     # Project notes and thoughts
├── last_run.txt                 # Stores last run timestamp
├── readme.md                    # Project notes and thoughts (similar to index.md)
├── requirements.txt             # Python dependencies
├── rss-scraper.py               # Main RSS parsing and update script
└── secret.json                  # Google Cloud service account credentials (sensitive)
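The last_run.txt file listed above stores the last run timestamp. One plausible way to read and write such a marker is sketched here; the ISO 8601 format and the helper names `read_last_run` and `write_last_run` are assumptions, and the actual file format used by rss-scraper.py may differ. The demo writes to a temporary directory so it does not overwrite a real last_run.txt.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def read_last_run(path):
    """Return the stored timestamp, or None if the file is missing or empty."""
    path = Path(path)
    if not path.exists():
        return None
    text = path.read_text(encoding="utf-8").strip()
    return datetime.fromisoformat(text) if text else None


def write_last_run(path, when):
    """Store a timezone-aware timestamp as ISO 8601 in UTC (assumed format)."""
    Path(path).write_text(when.astimezone(timezone.utc).isoformat(), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    marker = Path(tmp) / "last_run.txt"
    first = read_last_run(marker)      # no file yet
    stamp = datetime(2024, 3, 1, 12, 30, tzinfo=timezone.utc)
    write_last_run(marker, stamp)
    restored = read_last_run(marker)   # round-trips the stored value

print(first, restored == stamp)
```

Returning None on a missing file lets a first run treat every feed entry as new.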
Future Work / Roadmap
- Implement a dedicated API or batch processor for handling feed updates instead of a monolithic script.
- Add more robust error handling and retry mechanisms.
- Extend support for publishing parsed posts to various social media platforms.
- Enhance configuration management for cloud deployments.
- Add automated tests and CI/CD pipelines.
- Improve documentation and usage examples.
Note: This README is based on available source files and inferred project goals. Some assumptions were made regarding deployment and usage.