- Download the tar.gz file from Zenodo
- Unzip and untar the tar.gz file from Zenodo into CSVs
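The unzip and untar step can be done in one pass with Python's standard `tarfile` module. This is a minimal sketch: the function name, archive path, and output directory are hypothetical, since the original steps do not name them.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, out_dir):
    """Decompress and unpack a .tar.gz archive; return the extracted CSV paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Mode "r:gz" handles both the gzip layer and the tar layer.
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(out)
    # Collect CSVs wherever the archive placed them (it may contain subfolders).
    return sorted(out.rglob("*.csv"))
```

Only extract archives from a source you trust, because `extractall` writes whatever paths the archive contains.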
- Extract Pypi data from the CSVs
  a. Rename and filter the projects CSV
  b. Rename and filter the dependencies CSV
  c. Rename and filter the versions CSV
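The three rename-and-filter sub-steps share one pattern, sketched below with the standard `csv` module. The `Platform` column, the `Pypi` value, and the rename mapping are assumptions made for illustration; the real column names depend on the Zenodo dump and are not given in these steps.

```python
import csv


def filter_and_rename(src_path, dst_path, rename,
                      platform_column="Platform", platform="Pypi"):
    """Copy the rows for one platform, keeping only the renamed columns.

    `rename` maps source column names to output column names (hypothetical).
    Returns the number of rows written.
    """
    written = 0
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=list(rename.values()))
        writer.writeheader()
        for row in reader:
            if row.get(platform_column) != platform:
                continue  # skip rows that belong to other package managers
            writer.writerow({new: row[old] for old, new in rename.items()})
            written += 1
    return written
```

Streaming row by row keeps memory use flat, which matters if the CSVs are too large to load at once.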
- Start Neo4j and install the Graph Algorithms and APOC plugins
- Run `schema.cypher`
- `CREATE` the Pypi `Platform`, the Python `Language`, and their relationship, `HAS_DEFAULT_LANGUAGE`
- Run `projects_apoc.cypher`
- Run `versions_apoc.cypher`
- Run `dependencies_apoc.cypher`
- Run `create_sqlite_db.py`
- Run `initiate_sqlite_db_with_neo4j_project_names.py`
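The two SQLite setup steps might look roughly like the sketch below. The flag columns `api_has_been_queried` and `api_query_succeeded` come from the later steps; the table name `project_names`, the `contributors` column, and both function names are assumptions. The project names would come from Neo4j in the real script, but here they are passed in as a plain list.

```python
import sqlite3

# Hypothetical schema: one row per project, with two flags that track
# whether the Libraries.io API has been queried and whether that worked.
SCHEMA = """
CREATE TABLE IF NOT EXISTS project_names (
    project_name TEXT PRIMARY KEY,
    api_has_been_queried INTEGER NOT NULL DEFAULT 0,
    api_query_succeeded INTEGER NOT NULL DEFAULT 0,
    contributors TEXT
)
"""


def create_db(path):
    """Create the database file (or open it) and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    conn.commit()
    return conn


def seed_project_names(conn, names):
    """Insert project names with both flags at 0; duplicates are ignored."""
    conn.executemany(
        "INSERT OR IGNORE INTO project_names (project_name) VALUES (?)",
        [(name,) for name in names],
    )
    conn.commit()
```

Using `INSERT OR IGNORE` makes the seeding step safe to re-run without resetting flags on rows that already exist.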
- Run `request_libraries_io_load_sqlite.py`
  a. Retrieve project names from SQLite, `batch_size` at a time, selecting only those that have not been queried at all yet
     i. If there are none left, exit with code 1
     ii. Otherwise, go to step [b]
  b. Request the Libraries.io API project contributors endpoint for each project name
  c. Store the result of part [b] in SQLite, whether successful or not
  d. Return to step [a]
- Run `request_libraries_io_load_sqlite.py` again, but querying for records that have `api_has_been_queried=1 AND api_query_succeeded=0`. Proceed as in [12]
- Use Cypher to run `set_merged_contributors_property.cypher`. This script adds a `merged_contributors` property to every Python `Project` node, with the value -1. N.b.
  - The value -1 indicates that the particular node has not yet attempted to merge its `Contributor`s
  - The value is changed to 1 if the merge operation is successful
  - The value is changed to 0 if the merge operation is not successful
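The batch loop used by the two `request_libraries_io_load_sqlite.py` runs could be sketched as follows. It reuses the flag names from the steps above, but the table name `project_names`, the `contributors` column, and the injected `fetch` callable are hypothetical. The sketch also returns a count where the original script exits with code 1, and it pages by `rowid` so that a retry pass cannot loop forever on a project that keeps failing.

```python
import json
import sqlite3


def next_batch(conn, batch_size, after_rowid, retry_failed):
    """Return up to batch_size (rowid, project_name) rows that need a request."""
    if retry_failed:
        where = "api_has_been_queried = 1 AND api_query_succeeded = 0"
    else:
        where = "api_has_been_queried = 0"
    return conn.execute(
        f"SELECT rowid, project_name FROM project_names "
        f"WHERE {where} AND rowid > ? ORDER BY rowid LIMIT ?",
        (after_rowid, batch_size),
    ).fetchall()


def run(conn, fetch, batch_size=100, retry_failed=False):
    """Request contributors for each pending project once; return the request count.

    `fetch(name)` stands in for the API call and should raise on failure.
    """
    last_rowid, requests_made = 0, 0
    while True:
        batch = next_batch(conn, batch_size, last_rowid, retry_failed)
        if not batch:
            return requests_made  # nothing left to query
        for rowid, name in batch:
            try:
                contributors, ok = json.dumps(fetch(name)), 1
            except Exception:  # broad on purpose: any failure is recorded
                contributors, ok = None, 0
            # Store the outcome whether the request succeeded or not.
            conn.execute(
                "UPDATE project_names SET api_has_been_queried = 1, "
                "api_query_succeeded = ?, contributors = ? WHERE rowid = ?",
                (ok, contributors, rowid),
            )
            requests_made += 1
            last_rowid = rowid
        conn.commit()
```

Calling `run(conn, fetch)` corresponds to the first pass, and `run(conn, fetch, retry_failed=True)` to the second pass over records that were queried but did not succeed.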
- Run `merge_projects.py /path/to/SQLite.db -1`, using SQLite records in which `api_has_been_queried=1 AND api_query_succeeded=1`.
  a. Get all project names and contributors that represent Python `Project`s on Pypi
  b. For each project, make a `py2neo.Node` for each of its contributors
  c. For each contributor, `MERGE` that contributor, then `MERGE` its relationship with the project
- The nodes that failed in (15) have the property `merged_contributors=0`. So, run `merge_projects_with_py2neo.py /path/to/SQLite.db 0` in order to repeat the process from (15) for the `Project` nodes whose `Contributor`s were not `MERGE`d
- Run the Cypher script `remove_merged_contributors_property.cypher` to remove the `merged_contributors` property from all nodes. It was only necessary during the previous operation, so it can safely be unset.
- Finally, run the query for degree centrality to find the most influential contributor on Pypi
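Degree centrality in this setting amounts to counting the relationships attached to each `Contributor` node. The original query runs inside Neo4j and is not shown here, so the sketch below only illustrates the idea in plain Python. It assumes a hypothetical list of (contributor, project) pairs exported from the graph.

```python
from collections import Counter


def degree_centrality(edges):
    """Count distinct projects per contributor from (contributor, project) pairs."""
    degree = Counter()
    for contributor, _project in set(edges):  # set() drops duplicate pairs
        degree[contributor] += 1
    return degree


def most_influential(edges):
    """Return the (contributor, degree) pair with the highest degree, or None."""
    ranked = degree_centrality(edges).most_common(1)
    return ranked[0] if ranked else None
```

This is a raw, unnormalised count, so it ranks contributors by how many projects they touch rather than by the size of their contributions.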