How to use this package to rank Web pages ??? #1

Open
chegmarco1989 opened this issue Oct 31, 2021 · 2 comments

Comments

@chegmarco1989

Hi.

We want to know:

1 - How can this package be used to rank web pages?
What should we put in the $datasource variable to rank our web pages?
Should we just pass the list of URLs as the data source?

2 - Can this package handle a large dataset, on the order of millions or even billions of records?

Thank you in advance for letting us know.

@chegmarco1989 chegmarco1989 changed the title How to use this package to ranking Web pages ??? How to use this package to rank Web pages ??? Oct 31, 2021
@DavidBelicza
Member

DavidBelicza commented Nov 1, 2021

Hi @chegmarco1989

The data source is a nested array of integers (a graph, if you like). The integers are the IDs of the entities being ranked.
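
On the question of passing a list of URLs: because the data source works on integer IDs, URLs would first have to be mapped to IDs and the links between pages recorded by ID. The Python sketch below shows one way to do that, assuming the links are already available as (from_url, to_url) pairs, for example from a crawler. The function `urls_to_id_graph`, the 1-based IDs and the dict-of-lists output are hypothetical; the exact array layout the PHP package expects is the one in its functional test linked in this reply.

```python
from typing import Dict, List, Tuple


def urls_to_id_graph(
    links: List[Tuple[str, str]],
) -> Tuple[Dict[str, int], Dict[int, List[int]]]:
    """Map each distinct URL to an integer ID and return
    (url -> id, id -> list of outgoing ids). Hypothetical helper."""
    ids: Dict[str, int] = {}
    graph: Dict[int, List[int]] = {}

    def id_of(url: str) -> int:
        # The first time a URL is seen it gets the next free ID, starting at 1.
        if url not in ids:
            ids[url] = len(ids) + 1
            graph[ids[url]] = []
        return ids[url]

    for from_url, to_url in links:
        source, target = id_of(from_url), id_of(to_url)
        if target not in graph[source]:  # skip repeated links between the same pages
            graph[source].append(target)
    return ids, graph


links = [
    ("https://a.example/", "https://b.example/"),
    ("https://a.example/", "https://c.example/"),
    ("https://b.example/", "https://c.example/"),
    ("https://c.example/", "https://a.example/"),
]
ids, graph = urls_to_id_graph(links)
print(ids)    # {'https://a.example/': 1, 'https://b.example/': 2, 'https://c.example/': 3}
print(graph)  # {1: [2, 3], 2: [3], 3: [1]}
```

Keeping the `ids` map around makes it possible to translate the ranked IDs back into URLs afterwards.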

Btw, PageRank is not only for ranking web pages: it can rank any set of entities as long as the relationships between them are known. It is called "PageRank" because its inventor is Larry Page.

The functional tests show the usage: https://github.com/PHP-Science/PageRank/blob/master/tests/functional/Service/PageRankAlgorithmTest.php#L86

The array contains a list of entities with their IDs, along with each entity's incoming and outgoing connections.
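
As a rough illustration of that shape, the Python sketch below derives the incoming connections from the outgoing ones so that each entity record carries both directions. The function name `build_entities` and the keys `id`, `in` and `out` are hypothetical; the linked test shows the field names the package actually uses.

```python
from typing import Dict, List


def build_entities(outgoing: Dict[int, List[int]]) -> List[dict]:
    """Turn an id -> outgoing-ids map into one record per entity that lists
    both its incoming and its outgoing connections.

    The keys "id", "in" and "out" are illustrative only."""
    incoming: Dict[int, List[int]] = {node: [] for node in outgoing}
    for source, targets in outgoing.items():
        for target in targets:
            # A link source -> target is an incoming connection of target.
            incoming.setdefault(target, []).append(source)

    # Entities that only ever appear as link targets still get a record.
    nodes = sorted(set(outgoing) | set(incoming))
    return [
        {
            "id": node,
            "in": sorted(incoming.get(node, [])),
            "out": sorted(outgoing.get(node, [])),
        }
        for node in nodes
    ]


for entity in build_entities({1: [2, 3], 2: [3], 3: [1]}):
    print(entity)
# {'id': 1, 'in': [3], 'out': [2, 3]}
# {'id': 2, 'in': [1], 'out': [3]}
# {'id': 3, 'in': [1, 2], 'out': [1]}
```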

The `createPageRankAlgorithm` method in that test shows how to build the object and where to pass the data source, and the `testRun` method shows how to use it.

The more entities there are, and the higher the iteration count, the longer the algorithm takes to run. I believe that in the real world a PageRank algorithm runs in parallel over smaller topics, sometimes for weeks. (Also, the optimised search algorithms weren't as efficient as PHP's built-in search algorithms.)
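
To make the cost point concrete, here is a minimal textbook power-iteration PageRank in Python. It is not the PHP-Science/PageRank implementation; the damping factor of 0.85, the tolerance-based early stop and the even redistribution of rank from entities with no outgoing links are assumptions of this sketch. Each iteration visits every link once, so running time grows roughly with iterations × (entities + links), and at millions or billions of links both memory and per-iteration time become the limiting factors. The example entities are users who follow each other rather than web pages.

```python
from typing import Dict, List, Tuple


def pagerank(
    outgoing: Dict[int, List[int]],
    damping: float = 0.85,
    max_iterations: int = 100,
    tolerance: float = 1e-6,
) -> Tuple[Dict[int, float], int]:
    """Power-iteration PageRank over integer entity IDs.

    Returns (scores, number of iterations actually run). Scores sum to 1."""
    nodes = set(outgoing)
    for targets in outgoing.values():
        nodes.update(targets)
    if not nodes:
        return {}, 0
    n = len(nodes)
    ranks = {node: 1.0 / n for node in nodes}

    for iteration in range(1, max_iterations + 1):
        # Rank held by entities with no outgoing links is spread evenly over all entities.
        dangling = sum(ranks[node] for node in nodes if not outgoing.get(node))
        new_ranks = {node: (1.0 - damping + damping * dangling) / n for node in nodes}
        # Every entity passes an equal share of its rank along each outgoing link.
        # This loop touches every link once, so one iteration costs O(entities + links).
        for source, targets in outgoing.items():
            if targets:
                share = damping * ranks[source] / len(targets)
                for target in targets:
                    new_ranks[target] += share
        change = sum(abs(new_ranks[node] - ranks[node]) for node in nodes)
        ranks = new_ranks
        if change < tolerance:  # stop early once the scores have settled
            return ranks, iteration
    return ranks, max_iterations


# Entities are users and links mean "follows", so nothing here is a web page.
follows = {1: [2, 3], 2: [3], 3: [1]}
scores, iterations = pagerank(follows)
print(sorted(scores, key=scores.get, reverse=True))  # [3, 1, 2]
```

Raising `max_iterations` or tightening `tolerance` buys more precise scores at the price of more passes over every link, which is the trade-off described above.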

@chegmarco1989
Author

chegmarco1989 commented Nov 1, 2021

Thank you very much @DavidBelicza for your answer.

But what do you mean by "Also the optimised search algorithms weren't as efficient as the PHP builtin search algorithms"?

Can you give us some examples of native PHP algorithms that deliver results more efficiently than "PageRank"?

Thank you in advance for your reply.
