Note: Entry dates reflect the development timeline rather than the commit history, because this project was tested and documented before it was uploaded to GitHub.
Established automated ETL pipeline connecting Kaggle to Google BigQuery:
- Implemented Kaggle API integration for dataset retrieval
- Engineered data transformation layer using Pandas
- Developed automated BigQuery table creation and population system
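
The three steps above could be wired together roughly as follows. This is a minimal sketch, not the project's actual code: the function names (`normalize_column`, `extract`, `transform`, `load`) and the specific cleaning rules are assumptions made for illustration. The Kaggle and BigQuery imports are kept inside their functions so the cleaning helper can be used without cloud credentials.

```python
import re
from pathlib import Path


def normalize_column(name: str) -> str:
    """Reduce a column name to lowercase letters, digits, and underscores."""
    cleaned = re.sub(r"[^0-9a-zA-Z]+", "_", name.strip()).strip("_").lower()
    # Prefix names that are empty or would start with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = f"_{cleaned}"
    return cleaned


def extract(dataset: str, download_dir: str) -> Path:
    """Download and unzip a Kaggle dataset, then return its first CSV file."""
    # Requires the `kaggle` package and an API token (~/.kaggle/kaggle.json).
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    api.dataset_download_files(dataset, path=download_dir, unzip=True)
    return sorted(Path(download_dir).glob("*.csv"))[0]


def transform(csv_path: Path):
    """Read the CSV with Pandas, clean the headers, and drop duplicate rows."""
    import pandas as pd

    df = pd.read_csv(csv_path)
    df.columns = [normalize_column(c) for c in df.columns]
    return df.drop_duplicates()


def load(df, table_id: str) -> None:
    """Create or overwrite a BigQuery table from the DataFrame."""
    # Requires `google-cloud-bigquery` and `pyarrow`, plus default credentials.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(write_disposition="WRITE_TRUNCATE")
    client.load_table_from_dataframe(df, table_id, job_config=job_config).result()
```

A full run would then look like `load(transform(extract("owner/dataset-slug", "data")), "my-project.my_dataset.orders")`, where both the dataset slug and the table ID are placeholders.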
Created comprehensive analytics framework in SQL_Queries.py:
- Engineered modular query system for business intelligence
- Developed tables in Google Cloud's BigQuery covering:
  - Order tracking and analysis
  - Product performance metrics
  - Seller performance evaluation
  - Revenue analysis and forecasting
  - Shipping optimization queries
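
One way to structure a modular query system like this is a registry of named SQL templates. The sketch below is illustrative only: the query names, the column names (`order_date`, `sales`, `product_name`), and the `build_query` helper are assumptions, not the actual contents of SQL_Queries.py.

```python
import textwrap

# Named BigQuery SQL templates; {table} and {limit} are filled in at build time.
QUERIES = {
    "revenue_by_month": """
        SELECT DATE_TRUNC(order_date, MONTH) AS order_month,
               SUM(sales) AS revenue
        FROM `{table}`
        GROUP BY order_month
        ORDER BY order_month
    """,
    "top_products": """
        SELECT product_name, SUM(sales) AS revenue
        FROM `{table}`
        GROUP BY product_name
        ORDER BY revenue DESC
        LIMIT {limit}
    """,
}


def build_query(name: str, **params) -> str:
    """Look up a template by name and fill in its parameters."""
    try:
        template = QUERIES[name]
    except KeyError:
        raise ValueError(f"Unknown query: {name!r}") from None
    return textwrap.dedent(template).strip().format(**params)
```

Because the templates are filled with plain string formatting, this approach is only suitable for trusted values such as table names set in code. Anything that comes from user input should go through BigQuery query parameters instead.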
After a significant hiatus, I resumed development of this project to keep learning and sharpening my skills:
- Developed a Date Dimension table in pure SQL. Starting from a series of calendar dates, I used SQL calculations to derive relative dates, time series fields, quarter names, etc.
- Created mock STORE MANAGERS data with Mockaroo, adding a Store Manager name to each Store ID in my Store ID file so that every store in our Superstore has realistic manager data
- Created a new table in SPM in GBQ (Google BigQuery) and joined the Store Manager data to the orders table using SQL
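
As an illustration of the Date Dimension step, the sketch below builds a BigQuery statement that starts from a generated series of calendar dates and derives quarter names and relative offsets. It is an assumed reconstruction, not the project's actual SQL: the helper `build_date_dimension_sql`, the column names, and the table ID are all hypothetical.

```python
from datetime import date


def build_date_dimension_sql(table: str, start: str, end: str) -> str:
    """Return BigQuery SQL that creates a date dimension for [start, end]."""
    # Validate the ISO dates up front so a typo fails here, not in BigQuery.
    start_d, end_d = date.fromisoformat(start), date.fromisoformat(end)
    if end_d < start_d:
        raise ValueError("end must not be before start")
    return f"""
CREATE OR REPLACE TABLE `{table}` AS
SELECT
  d AS calendar_date,
  EXTRACT(YEAR FROM d) AS calendar_year,
  EXTRACT(QUARTER FROM d) AS calendar_quarter,
  CONCAT('Q', CAST(EXTRACT(QUARTER FROM d) AS STRING),
         ' ', CAST(EXTRACT(YEAR FROM d) AS STRING)) AS quarter_name,
  FORMAT_DATE('%B', d) AS month_name,
  FORMAT_DATE('%A', d) AS day_name,
  -- Relative dating: negative values are in the past, 0 is the current period.
  DATE_DIFF(d, CURRENT_DATE(), DAY) AS days_from_today,
  DATE_DIFF(d, CURRENT_DATE(), MONTH) AS months_from_today
FROM UNNEST(
  GENERATE_DATE_ARRAY(DATE '{start_d}', DATE '{end_d}', INTERVAL 1 DAY)
) AS d
""".strip()
```

Note that the relative columns are fixed at the moment the table is created, so a table built this way would need to be rebuilt on a schedule (or the offsets moved into a view) to stay current.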