Skip to content

Latest commit

 

History

History
328 lines (249 loc) · 8.27 KB

data_engineer.md

File metadata and controls

328 lines (249 loc) · 8.27 KB

Data Engineer Journey

This is my Data Eng journy....

Junior

Fundamental SQL concepts

The table that I will be using here:

users_table

id name email
1 John Doe [email protected]
2 Jane Smith [email protected]
3 Robert Brown [email protected]

SELECT

SELECT id, name
FROM users_table
WHERE name like 'J%';

The above query would return

ID Name
1 John Doe
2 Jane Smith

INSERT

INSERT INTO users_table (id, name, email)
VALUES (4, 'Jonny Appleseed', '[email protected]');

The above would make the table as follows

id name email
1 John Doe [email protected]
2 Jane Smith [email protected]
3 Robert Brown [email protected]
4 Jonny Appleseed [email protected]

UPDATE

UPDATE users_table
SET name = 'Joe Bloggs', email = '[email protected]'
WHERE id = 4

This would return the following table as I have changed the name of user id 4

id name email
1 John Doe [email protected]
2 Jane Smith [email protected]
3 Robert Brown [email protected]
4 Joe Bloggs [email protected]

DELETE

DELETE FROM users_table where id = 4

This would return, as we have removed user with id 4 from the table.

id name email
1 John Doe [email protected]
2 Jane Smith [email protected]
3 Robert Brown [email protected]

ALTER

This is going to add a new col to the users_table a col called phone that is type varchar.

ALTER TABLE users_table
ADD phone varchar;

This renames the col phone to telephone

ALTER TABLE users_table
RENAME COLUMN phone to telephone;

This removes the col telephone from the users_table

ALTER TABLE users_table
DROP COLUMN telephone;

Database Basics

Relational Database Concepts: A relational database is a type of database that organizes data into tables, which are composed of rows (also known as records) and columns (also known as fields). Each row represents a unique instance of the entity being modeled, and each column represents a specific attribute or piece of data associated with that entity.

Primary Key: A primary key is a column in a table that uniquely identifies each row in that table. It ensures that each row is distinct and can be uniquely identified. Primary keys are essential for maintaining data integrity and for establishing relationships between tables

Foreign Key: A foreign key is a column in one table that refers to the primary key in another table. It establishes a relationship between two tables, where the values in the foreign key column of one table match the values in the primary key column of another table. This relationship allows data to be linked across tables, enabling the creation of meaningful associations between different entities.

Primary/Foreign Key Relationships: A primary key and a foreign key establish a relationship between two tables. This relationship is often referred to as a parent-child relationship. The table containing the primary key is the parent table, and the table containing the foreign key is the child table. The primary key in the parent table serves as a unique identifier for each record, and the foreign key in the child table references the corresponding record in the parent table.

For example:

database pk/fk

The orders table would be considered our main table in this situation. This table has relationships with the products and customers table via foreign keys. The orders table has an FK of customer_id this allows the orders table to have a relatioship with the customers table enabling the user to join data from customers to orders

The same goes for the FK product_id which enables the relationship with the products table.

Here is a code example using Postgres:

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    first_name VARCHAR(50),
    last_name VARCHAR(50),
    email VARCHAR(100) UNIQUE,
    telphone VARCHAR(50) UNIQUE
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(255),
    price FLOAT,
)

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    product_id INT,
    -- Other columns in the orders table
    CONSTRAINT fk_customer
    FOREIGN KEY (customer_id) REFERENCES customers (customer_id),

    CONSTRAINT fk_products
    FOREIGN KEY (product_id) REFERENCES products (product_id)
);

JOINS

Joins are used to combine data from two or more tables in a database based on a related column between them. Joins allow you to retrieve information from multiple tables in a single query, making it possible to get a more comprehensive view of your data.

Example tables

customers

+----+----------+
| ID | Name     |
+----+----------+
| 1  | Alice    |
| 2  | Bob      |
| 3  | Carol    |
+----+----------+

orders

+----------+------------+
| OrderID  | CustomerID |
+----------+------------+
| 101      | 1          |
| 102      | 2          |
| 103      | 1          |
| 104      | 4          |
+----------+------------+
INNER JOIN

An INNER JOIN returns only the rows where there is a match in both tables based on the specified condition.

SELECT customers.Name, orders.OrderID
FROM customers
INNER JOIN orders ON customers.ID = orders.CustomerID;

Result

+----------+----------+
| Name     | OrderID  |
+----------+----------+
| Alice    | 101      |
| Bob      | 102      |
| Alice    | 103      |
+----------+----------+
LEFT JOIN

A LEFT JOIN returns all rows from the left (first) table and the matching rows from the right (second) table. If there's no match, NULL values are returned for the columns of the right table.

SELECT customers.Name, orders.OrderID
FROM customers
LEFT JOIN orders ON customers.ID = orders.CustomerID;

Result

+----------+----------+
| Name     | OrderID  |
+----------+----------+
| Alice    | 101      |
| Bob      | 102      |
| Alice    | 103      |
| Carol    | NULL     |
+----------+----------+
RIGHT JOIN

A RIGHT JOIN is similar to a LEFT JOIN, but it returns all rows from the right (second) table and the matching rows from the left (first) table. If there's no match, NULL values are returned for the columns of the left table.

SELECT customers.Name, orders.OrderID
FROM customers
RIGHT JOIN orders ON customers.ID = orders.CustomerID;

Result

+----------+----------+
| Name     | OrderID  |
+----------+----------+
| Alice    | 101      |
| Bob      | 102      |
| Alice    | 103      |
| NULL     | 104      |
+----------+----------+

Mid-Level

Lead

Senior Level

ETL Process

Extract, Transform, Load

Projects

A selection of repos where I have built projects to harness my learnings

Videos

A selection of videos I have watched during my learning.