Main goals:
Engineer a data streaming pipeline on Azure whose main purpose is to ingest and process tweets and satellite image data from the Hurricane Harvey natural disaster and to serve a Power BI report.
This is a meta repository that contains documentation and links to two subfolders, each with a distinct purpose:
- hurricane-proc-send-data: Pre-processing of tweets about the Hurricane Harvey events, combining them with satellite images of buildings with and without damage, and simulating a streaming data source with a Python program that sends requests to an Azure API endpoint (#TODO fire CLI)
- hurricane-streaming-az-funcs: Azure data streaming pipeline that:
  - Ingests tweets from the local source client via Azure API Management, with an Azure Function as the backend
  - Uses Azure Event Hubs as a message queue service
  - Uses an Azure Function that takes messages from Azure Event Hubs and writes them to Azure Cosmos DB
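The client side of this flow (a local program posting tweets to API Management) can be sketched as below. This is a minimal sketch, not the code in hurricane-proc-send-data: the endpoint URL, the subscription key, the tweet field names (`id`, `text`, `created_at`) and the helper names are hypothetical. `Ocp-Apim-Subscription-Key` is the header API Management checks by default for a subscription key.

```python
"""Minimal sketch of a client that replays tweets to an API Management endpoint.

Assumptions (hypothetical, not taken from the project): the endpoint URL,
the subscription key, and the tweet field names `id`, `text`, `created_at`.
"""
import json
import time
from typing import Iterable, Iterator


def build_payloads(tweets: Iterable[dict]) -> Iterator[str]:
    """Serialize each tweet into a JSON request body, keeping only the assumed fields."""
    for tweet in tweets:
        yield json.dumps(
            {
                "id": str(tweet["id"]),
                "text": tweet["text"],
                "created_at": tweet["created_at"],
            }
        )


def send_tweets(tweets, endpoint_url, subscription_key, delay_seconds=0.5):
    """POST tweets one by one to simulate a streaming data source."""
    import requests  # imported lazily so build_payloads works without it

    headers = {
        "Content-Type": "application/json",
        # Default header API Management reads the subscription key from.
        "Ocp-Apim-Subscription-Key": subscription_key,
    }
    with requests.Session() as session:
        for body in build_payloads(tweets):
            response = session.post(endpoint_url, data=body, headers=headers, timeout=10)
            response.raise_for_status()  # fail fast on 4xx/5xx from the gateway
            time.sleep(delay_seconds)  # throttle to mimic a live feed
```

Sending one request per tweet with a small delay keeps the simulation close to a real feed; batching several tweets per request would reduce HTTP overhead at the cost of realism.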
- ⚙️ Data Engineering Project ⚙️
- 🌪️🌪️ Hurricane Harvey Tweets and Satellite Images - Azure Data Pipelines and Data Visualization 🌪️🌪️
- Introduction & Goals
- Data
- Used Tools
- Author: 👤 Kristijan Bakaric
- Follow Me On
Tools:
- Local:
  - as the operating system for local development
  - Visual Studio Code with plugins for Azure services: local development and deployment to Azure (Azure Functions, Azure Web App)
  - Python and its libraries Pandas and Requests: data processing and sending HTTPS requests to Azure API Management
  - Azure SDKs: for the relevant Azure services in the Streamlit app use case (azure-cosmos)
  - Power BI: visualization of data from Azure Cosmos DB
- Azure:
  - Azure Cosmos DB - SQL Core - Document Store
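Reading the stored tweets back with the azure-cosmos SDK (as a Streamlit app would) might look like the sketch below. The database name `hurricane`, the container name `tweets` and the `created_at` field are hypothetical; the date filter assumes ISO-formatted date strings, which compare correctly as plain strings.

```python
"""Minimal sketch: reading tweet documents from Azure Cosmos DB with azure-cosmos.

Assumptions (hypothetical): database "hurricane", container "tweets",
and documents carrying a `created_at` string field in ISO date format.
"""
from typing import Any, Dict, List, Tuple


def build_tweets_query(start: str, end: str) -> Tuple[str, List[Dict[str, Any]]]:
    """Return a parameterized SQL query selecting tweets with start <= created_at <= end."""
    query = (
        "SELECT c.id, c.text, c.created_at FROM c "
        "WHERE c.created_at >= @start AND c.created_at <= @end"
    )
    parameters = [{"name": "@start", "value": start}, {"name": "@end", "value": end}]
    return query, parameters


def fetch_tweets(url: str, key: str, start: str, end: str) -> List[Dict[str, Any]]:
    """Run the query against the (hypothetical) tweets container."""
    from azure.cosmos import CosmosClient  # lazy import: pip install azure-cosmos

    client = CosmosClient(url, credential=key)
    container = client.get_database_client("hurricane").get_container_client("tweets")
    query, parameters = build_tweets_query(start, end)
    return list(
        container.query_items(
            query=query,
            parameters=parameters,
            enable_cross_partition_query=True,  # the filter is not on the partition key
        )
    )
```

Parameterized queries avoid string-concatenating user input into the SQL text.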
- Hurricane Harvey Tweets from Kaggle.
Tweets mentioning Hurricane Harvey from the morning of 8/25/2017. I hope to keep this updated if computer problems do not persist.
8/30 update: This update includes the most recent tweets tagged "Tropical Storm Harvey", which span from 8/20 to 8/30, as well as a properly merged version of the dataset that includes tweets from before Harvey was downgraded back to a tropical storm.
- Satellite Images of Hurricane Damage from Kaggle.
Overview: The data are satellite images from Texas after Hurricane Harvey, divided into two groups (damage and no_damage). The goal is to build a model that can automatically identify whether a given region is likely to contain flooding damage.
Source: Data originally taken from https://ieee-dataport.org/open-access/detecting-damaged-buildings-post-hurricane-satellite-imagery-based-customized; it can be cited with http://dx.doi.org/10.21227/sdad-1e56, and the original paper is here: https://arxiv.org/abs/1807.01688
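Because the images are split into `damage` and `no_damage` groups, a pre-processing step can derive each image's label from its parent folder. The sketch below assumes that folder layout and `.jpg`/`.jpeg` file extensions; the root path and helper names are hypothetical, and it is not necessarily how this project combines images with tweets.

```python
"""Minimal sketch: labelling satellite images by their class folder.

Assumptions (hypothetical): images sit in sub-folders named `damage` and
`no_damage`, with .jpg/.jpeg extensions, under a local dataset root.
"""
from collections import Counter
from pathlib import Path, PurePath
from typing import Dict, Iterable, List

LABELS = {"damage": 1, "no_damage": 0}


def label_images(paths: Iterable[PurePath]) -> List[Dict[str, object]]:
    """Map each image path to a record with its binary damage label."""
    records = []
    for path in paths:
        group = path.parent.name
        if group not in LABELS:
            continue  # skip files outside the two known groups
        records.append({"file": path.name, "group": group, "damaged": LABELS[group]})
    return records


def class_counts(records: Iterable[Dict[str, object]]) -> Counter:
    """Count images per group, e.g. to check class balance."""
    return Counter(record["group"] for record in records)


def scan_dataset(root: str) -> List[Dict[str, object]]:
    """Walk a local copy of the dataset and label every .jpg/.jpeg file."""
    return label_images(sorted(Path(root).rglob("*.jp*g")))
```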
- Azure API Management
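API Management (APIM) is a way to create consistent and modern API gateways for existing back-end services. In this project it is the entry point that receives tweets from the local client and forwards them to an Azure Function backend.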
- Azure Event Hubs
Azure Event Hubs is a big data streaming platform and event ingestion service. It can receive and process millions of events per second. Data sent to an event hub can be transformed and stored by using any real-time analytics provider or batching/storage adapters.
- Azure Functions
Azure Functions is a serverless solution that allows you to write less code, maintain less infrastructure, and save on costs. Instead of worrying about deploying and maintaining servers, the cloud infrastructure provides all the up-to-date resources needed to keep your applications running.
- Azure Blob Storage
Azure Blob storage is Microsoft's object storage solution for the cloud. Blob storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data.
- Azure Cosmos DB - SQL Core - Document Store
Azure Cosmos DB is a fully managed NoSQL database for modern app development. Single-digit millisecond response times, and automatic and instant scalability, guarantee speed at any scale. Business continuity is assured with SLA-backed availability and enterprise-grade security.
- Power BI Desktop Report
Rich, interactive reports with visual analytics.
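The hop from Azure Event Hubs to Azure Cosmos DB can be illustrated with the pure transform an Event Hub-triggered function might apply before writing. This is a sketch under assumptions: each event body is one JSON-encoded tweet, and the tweet's own `id` (if any) is reused as the item id. Inside a Python Azure Function the bytes would come from `EventHubEvent.get_body()`, and the resulting documents could be written through a Cosmos DB output binding or `ContainerProxy.upsert_item`; that wiring is omitted here.

```python
"""Minimal sketch of the transform an Event Hub-triggered function could apply.

Assumptions (hypothetical): each event body is one JSON-encoded tweet; the
tweet's own `id` is reused as the Cosmos DB item id when present.
"""
import json
import uuid
from typing import Any, Dict, Iterable, List


def to_cosmos_documents(bodies: Iterable[bytes]) -> List[Dict[str, Any]]:
    """Decode event bodies and make sure every document has a string `id`.

    Cosmos DB requires every item to carry a string `id` property.
    """
    documents = []
    for raw in bodies:
        doc = json.loads(raw.decode("utf-8"))
        # Reuse the tweet id if present, otherwise generate a random UUID.
        doc["id"] = str(doc["id"]) if "id" in doc else str(uuid.uuid4())
        documents.append(doc)
    return documents
```

Reusing the tweet id makes writes idempotent when combined with an upsert: a replayed event overwrites the same item instead of creating a duplicate.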
- Website: personal-website
- Twitter: @kbakaric1
- Github: @baky0905
- LinkedIn: @kristijanb
Give a ⭐️ if this project helped you!