Re-implementation of open source methods in another language #16
Comments
Do you have a reference paper to target for the replication?
Yes. This is the paper: http://jmlr.csail.mit.edu/papers/volume15/wager14a/wager14a.pdf, and the previous implementation: https://github.com/swager/randomForestCI
It is okay as long as you do not make a simple "translation" of the R code. The idea of the replication is really to check whether the original article is self-sufficient in describing the method or model (i.e. without the accompanying code), or whether some information is incorrect or missing. In the end, the original article + your article should be sufficient for future replications. @khinsen What do you think?
This seems like a better fit for a project like [...]
Didn't know this. But anyway, you can do both actually (publication and contribution).
@rougier I agree: the main point of ReScience is doing replication in the sense of writing a new implementation that should produce results identical to published ones. If a published implementation already exists, we should ask for a "clean-room reimplementation", although this can of course not be verified. In my personal experience, a second independent implementation is a great way to find mistakes (in both implementations), so I am tempted to suggest that we even encourage that kind of submission for ReScience.
There's an additional issue as well: most code in R is licensed GPL, while [...]
Replication is not a derivative work for me.
I am not a lawyer, but I believe that if you look at GPL code while you [...]
Might be generally of relevance, but not for this particular case. We did [...]
You have of course the right to look at the code, but the idea is to start from the paper and to look at the code only if a piece of information is missing from the paper or something remains obscure. Otherwise, if the original author made a mistake, you could end up just translating that mistake into your code.
When you say mistake, you mean a bug (of whatever type) as opposed to a mistake in the journal article, right?
No, I mean a mistake in the code in the sense that the code does not implement what is advertised in the paper. For example, you can write that you're integrating an equation using the Runge-Kutta numerical method while the code actually uses the explicit Euler method. In some cases this won't make a difference, but in other cases it could lead to different results and hence must be reported in the new article.
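To make that example concrete, here is a small hypothetical sketch (not taken from the paper or code discussed in this issue) that integrates dy/dt = -y with both explicit Euler and classical Runge-Kutta (RK4). With a coarse step size the two methods give noticeably different results, which is exactly the kind of paper/code mismatch a replication should report.

```python
# Hypothetical illustration: explicit Euler vs. classical RK4 on dy/dt = -y.
import math

def f(t, y):
    return -y  # test ODE: dy/dt = -y, exact solution y(t) = exp(-t)

def euler_step(f, t, y, h):
    return y + h * f(t, y)

def rk4_step(f, t, y, h):
    k1 = f(t, y)
    k2 = f(t + h / 2, y + h / 2 * k1)
    k3 = f(t + h / 2, y + h / 2 * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

def integrate(step, y0=1.0, t_end=1.0, h=0.5):
    t, y = 0.0, y0
    while t < t_end - 1e-12:
        y = step(f, t, y, h)
        t += h
    return y

print("Euler:", integrate(euler_step))  # 0.25
print("RK4:  ", integrate(rk4_step))    # ~0.3682
print("exact:", math.exp(-1.0))         # ~0.3679
```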
I think we disagree on terminology, but not on the solution. If the implementation (code) doesn't match the specification (journal article), I would class that both as a bug (a mistake in the code, which can specifically be seen as a logic error) and as a mistake in the journal article.
To clarify, in case it is not clear from the above: I do not mean that the presence of logic errors implies a mistake in the journal article. There might be many logic errors without any mistakes in the article, merely because in those specific cases it does not matter that the logic errors exist. But every mismatch between the reported specification and the implementation directly implies a mistake in the journal article, in the case where the journal article serves as the only spec.
I agree. This is precisely the goal of replication in ReScience: to spot such mistakes (and also missing information) and to report them such that the two articles (original + replication) now constitute a complete spec. For me, the added value of replications in ReScience lies more in the article than in the code. Bugs (or errors) are something different (and worse) because they can invalidate results. For example, let's imagine you're using a fixed seed in your random generator (for debugging) and you forget to remove it before computing statistics over several runs of your model. This may very well invalidate all the results.
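As an illustration of that seed pitfall (a hypothetical toy model, not code from the submission discussed here), the sketch below shows how a forgotten debug seed makes all "independent" runs identical, so any statistics computed across runs are meaningless.

```python
# Hypothetical toy model: a forgotten debug seed removes run-to-run variability.
import random
import statistics

def run_model(debug_seed=None):
    if debug_seed is not None:
        random.seed(debug_seed)  # forgotten debug seed: every run becomes identical
    return statistics.mean(random.gauss(0.0, 1.0) for _ in range(1000))

buggy = [run_model(debug_seed=42) for _ in range(10)]  # ten identical "runs"
valid = [run_model() for _ in range(10)]               # genuinely independent runs

print("stdev with fixed seed:   ", statistics.pstdev(buggy))  # 0.0
print("stdev without fixed seed:", statistics.pstdev(valid))  # ~0.03
```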
I think it's just terminology/jargon that we disagree on. Basically 100% agreed. 😄
[Attached figure: Fig. 3a) and 3b)]
Dear ReScience editors. In the course of our work, we have created a Python implementation of a method that was previously available as open-source R code. Is this implementation within the scope of ReScience? Thanks! cc: @kpolimis, @bhazelton