This project was carried out at the University of Fribourg in the context of the course FS2023: 63091 Social Media Analytics.
To use the Influence on Sales application, please follow the steps below for installation and usage.
To install the Influence on Sales application, you will need to:
- First, create a directory on your computer and open a terminal in this directory.
- Go to https://git-lfs.com/ and install the Git LFS tool as explained on the website.
- Alternatively, after cloning the repository (next step), download the dataset amazon-meta.txt from https://github.com/qnater/InfluenceOnSales/blob/master/dataset/origin_dataset/amazon-meta.txt and place it in the folder ./dataset/origin_dataset/
- In the terminal, clone the GitHub repository by copying and pasting this command:
git clone https://github.com/qnater/InfluenceOnSales.git
- Once cloned, enter the cloned directory with the command:
cd InfluenceOnSales
- Install the requirements with the command:
pip install -r requirements.txt
- If an error appears, you may need to install pip first.
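If pip is missing, it can usually be bootstrapped with Python's standard ensurepip module:
python -m ensurepip --upgrade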
- Once installed, the application can be run with one of the following commands, according to your operating system:
python project_launcher.py (for Windows)
python3 project_launcher.py (for Mac)
- To understand how to use the different features and commands, please follow the user guide below.
The Influence on Sales application enables the analysis of datasets of Amazon products that are registered with the Amazon Standard Identification Number (ASIN). The app consists of different modules: Pre-Processing, Enrichment, Analytics, Exploration, Persistence and Visualization. These modules are integrated into the scenarios described below.
To call a scenario, simply run the project_launcher.py file in a terminal and enter the scenario to launch in the console.
This class provides six scenarios that conduct the analysis modules.
In this scenario, the initial dataset is cleaned and sampled into four different graphs. The process displays the number of nodes and the clustering quality of each graph. During the cleaning operation, unnecessary nodes are removed: nodes without outgoing edges, nodes without incoming edges, and isolated nodes (a code sketch follows the outputs below).
Datasets: amazon-meta.txt (700'000 nodes), dataset_off_amazon_enrichment.txt (180'000 nodes), dataset_off_amazon_big.txt (120'000 nodes), dataset_off_amazon_small.txt (60'000 nodes)
Outputs: runtime, clustering coefficient, number of nodes, number of edges, average degree
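As an illustration of the cleaning and metric steps, here is a minimal sketch assuming the graph is loaded as a networkx.DiGraph; the function names are hypothetical, not the project's actual API:

```python
import time
import networkx as nx

def clean_graph(g: nx.DiGraph) -> nx.DiGraph:
    """Single cleaning pass: drop nodes with no outgoing edges,
    no incoming edges, or no edges at all (isolated)."""
    g = g.copy()
    drop = [n for n in g if g.out_degree(n) == 0 or g.in_degree(n) == 0]
    g.remove_nodes_from(drop)
    return g

def graph_metrics(g: nx.DiGraph) -> dict:
    """Compute the figures reported by this scenario."""
    start = time.perf_counter()
    metrics = {
        "nodes": g.number_of_nodes(),
        "edges": g.number_of_edges(),
        "average_degree": sum(d for _, d in g.degree()) / g.number_of_nodes(),
        # Clustering is computed here on the undirected projection.
        "clustering_coefficient": nx.average_clustering(g.to_undirected()),
    }
    metrics["runtime_s"] = time.perf_counter() - start
    return metrics
```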
In this scenario, we compare community detection algorithms across different datasets. On each graph, three different community detection algorithms are executed (a simple homemade algorithm, an enhanced homemade algorithm with edge weights, and the NetworkX library implementation), popular nodes are identified, and the quality of the community partition is evaluated with metrics such as accuracy, precision, recall, Jaccard similarity and silhouette index (see the sketch after the outputs below).
Datasets: dataset_off_amazon_enrichment.txt (180'000 nodes), dataset_off_amazon_big.txt (120'000 nodes)
Outputs: runtime, silhouette index, accuracy, precision, recall, Jaccard similarity, communities detected, popular nodes of each community with centrality value
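A minimal sketch of the NetworkX side of this comparison, using greedy modularity maximization as a stand-in; the homemade algorithms would be swapped in at the same place, and the function name is hypothetical:

```python
import networkx as nx
from networkx.algorithms import community

def detect_communities(g: nx.Graph):
    """Detect communities and the most central (popular) node of each."""
    # NetworkX community detection; the project's homemade variants
    # would replace this call for the comparison.
    communities = community.greedy_modularity_communities(g)
    centrality = nx.degree_centrality(g)
    popular = [max(c, key=centrality.get) for c in communities]
    return communities, popular
```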
In this scenario, a small sample of the dataset is used to visualize the graph. After running the community detection algorithm, the graph is plotted with each community in a different color and the most popular node of each community highlighted (see the sketch below).
Datasets: dataset_off_amazon_test.txt (11'000 nodes)
Outputs: plot image
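A minimal plotting sketch with matplotlib, assuming the communities and popular nodes from the previous scenario; colors and layout are illustrative choices, not the project's exact rendering:

```python
import matplotlib.pyplot as plt
import networkx as nx

def plot_communities(g: nx.Graph, communities, popular):
    """Draw each community in its own color; highlight popular nodes."""
    pos = nx.spring_layout(g, seed=42)  # fixed seed for reproducibility
    for i, comm in enumerate(communities):
        nx.draw_networkx_nodes(g, pos, nodelist=list(comm),
                               node_size=20, node_color=f"C{i % 10}")
    nx.draw_networkx_nodes(g, pos, nodelist=popular,
                           node_size=120, node_color="red")
    nx.draw_networkx_edges(g, pos, alpha=0.2)
    plt.axis("off")
    plt.show()
```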
In this scenario, a small sample of the initial dataset is used to conduct a deep analysis of the quality of the graph, as well as of the connections (paths) between nodes and communities (see the sketch below).
Datasets: dataset_off_amazon_test.txt (11'000 nodes)
Outputs: plot image
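One way to explore the connections between communities is to compute shortest paths between their popular nodes; here is a minimal sketch under that assumption, not the project's actual exploration code:

```python
import networkx as nx

def inter_community_paths(g: nx.Graph, popular):
    """Shortest paths between the popular nodes of each community."""
    paths = {}
    for i, src in enumerate(popular):
        for dst in popular[i + 1:]:
            if nx.has_path(g, src, dst):
                paths[(src, dst)] = nx.shortest_path(g, src, dst)
    return paths
```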
This scenario analyses the relationship between the betweenness centrality of the popular nodes of each community and their actual sales ranks (see the sketch after the outputs below).
Datasets: dataset_off_amazon_enrichment.txt (180'000 nodes), dataset_off_amazon_test.txt (11'000 nodes)
Outputs: ASIN, betweenness centrality value, sales rank
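A minimal sketch of that comparison, assuming sales ranks are available as a dict keyed by ASIN; the sampling parameter k is an illustrative performance trade-off, not a project setting:

```python
import networkx as nx

def centrality_vs_sales_rank(g: nx.Graph, popular, sales_rank: dict):
    """Pair each popular node's betweenness centrality with its sales rank."""
    # Exact betweenness is costly on large graphs; approximate it by
    # sampling k source nodes.
    bc = nx.betweenness_centrality(g, k=min(500, len(g)))
    return [(asin, bc[asin], sales_rank.get(asin)) for asin in popular]
```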
This scenario conducts the four scenarios described above in succession.
Datasets (based on the user's choice): dataset_off_amazon_enrichment.txt (180'000 nodes), dataset_off_amazon_big.txt (120'000 nodes), dataset_off_amazon_small.txt (60'000 nodes), dataset_off_amazon_test.txt (11'000 nodes)
Outputs: runtime, clustering coefficient, number of nodes, number of edges, average degree, silhouette index, accuracy, precision, recall, Jaccard similarity, communities detected, popular nodes of each community with centrality measures, plot images
This file runs the unit tests of every implementation in the CircleCI pipeline connected to GitHub.
This file compares the different algorithms for group-based community detection.
This file merges the main Amazon dataset with the enrichment dataset.
This file populates the online Neo4j database.
- You can find the database at this link: https://workspace-preview.neo4j.io/workspace/query
- To connect, please go to "Query", click on the central button "No connection", then on "Connect".
- Connection URL: 95147e5a.databases.neo4j.io:7687
- Database user: neo4j
- Password: GslPkJDwnmAZC_COZUcHQ1hFymVSQTzS_f6loACAyNY
- To import the queries, please go to "Saved Cypher" and import the file ./docs/neo4j_queries.csv from the project tree.
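For programmatic access, here is a minimal sketch using the official neo4j Python driver and the connection details above; the Product label and CO_PURCHASED relationship are illustrative assumptions, not necessarily the project's actual schema:

```python
from neo4j import GraphDatabase

URI = "neo4j+s://95147e5a.databases.neo4j.io"
AUTH = ("neo4j", "GslPkJDwnmAZC_COZUcHQ1hFymVSQTzS_f6loACAyNY")

def push_edges(edges):
    """Merge (source, target) ASIN pairs into the online database."""
    with GraphDatabase.driver(URI, auth=AUTH) as driver:
        with driver.session() as session:
            for src, dst in edges:
                session.run(
                    "MERGE (a:Product {asin: $src}) "
                    "MERGE (b:Product {asin: $dst}) "
                    "MERGE (a)-[:CO_PURCHASED]->(b)",
                    src=src, dst=dst,
                )
```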