-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix unsafe updates to current upload #4244
base: master
Are you sure you want to change the base?
Conversation
5f7c0b0
to
a198f1e
Compare
@fogninid BTW, can you tell how did you identify those places where locks are needed? |
I just observed a recurring crash in this special case. I am sure that there are tools that use that |
Are you able to provide us a steps to reprodouce it? |
@@ -711,8 +711,10 @@ public void setItems(OCUpload... items) { | |||
void fixAndSortItems(OCUpload... array) { | |||
FileUploader.FileUploaderBinder binder = parentActivity.getFileUploaderBinder(); | |||
|
|||
for (OCUpload upload : array) { | |||
upload.setDataFixed(binder); | |||
synchronized (binder.uploadStartLock()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pls explain what is the exact cause of the crash and how the comparator is fixing it?
It looks like we're trying to sort a data that is mutated from a background thread via magic wormhole and I cannot see how this can be fixed without a radical refactoring. :-(
TBH I'd prefer to have a complete report on the issue, with a beefy comment, so the future maintainer can safely defuse this bomb, instead of hiding it behind fixes for few edge cases.
As you wrote earlier, there is more of it in other places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here the stacktrace:
04-28 21:55:34.428 3535 3535 E AndroidRuntime: java.lang.RuntimeException: Unable to start activity ComponentInfo{com.nextcloud.client/com.owncloud.android.ui.activity.UploadListActivity}: java.lang.IllegalArgumentException: Comparison method violates its general contract!
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3086)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3229)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:78)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:108)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:68)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1926)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.os.Handler.dispatchMessage(Handler.java:106)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.os.Looper.loop(Looper.java:214)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.ActivityThread.main(ActivityThread.java:6981)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at java.lang.reflect.Method.invoke(Native Method)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1445)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract!
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at java.util.TimSort.mergeHi(TimSort.java:899)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at java.util.TimSort.mergeAt(TimSort.java:516)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at java.util.TimSort.sort(TimSort.java:254)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at java.util.Arrays.sort(Arrays.java:1432)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at com.owncloud.android.ui.adapter.UploadListAdapter$UploadGroup.fixAndSortItems(UploadListAdapter.java:696)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at com.owncloud.android.ui.adapter.UploadListAdapter$2.refresh(UploadListAdapter.java:163)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at com.owncloud.android.ui.adapter.UploadListAdapter.loadUploadItemsFromDb(UploadListAdapter.java:570)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at com.owncloud.android.ui.adapter.UploadListAdapter.<init>(UploadListAdapter.java:174)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at com.owncloud.android.ui.activity.UploadListActivity.setupContent(UploadListActivity.java:160)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at com.owncloud.android.ui.activity.UploadListActivity.onCreate(UploadListActivity.java:137)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.Activity.performCreate(Activity.java:7326)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.Activity.performCreate(Activity.java:7317)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1271)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3066)
04-28 21:55:34.428 3535 3535 E AndroidRuntime: ... 11 more
The problem is that the current comparator tries to sort the data with the assumption that there is exactly one isUploadingNow==true
element.
The implementation already fixes the relevant data before sorting to avoid the background change in the elements while sorting:
for (OCUpload upload : array) {
upload.setDataFixed(binder);
}
See here for the attempt at a solution 06f45bd#diff-24801376b7c7edeb4e3089df542a4de8
Unfortunately this is not sufficient because the binder can switch to the next upload while the iteration is in progress, so it ends up with multiple elements with isUploadingNow==true
(or even with elements with inconsistent state, such as state==FAILED and isUploadingNow==true)
That's why I locked the iteration in the upload list using the uploadStartLock.
By fixing the comparator to avoid making those assumption the java.lang.IllegalArgumentException: Comparison method violates its general contract!
cannot happen anymore, but the internal inconsistency cannot be solved without proper Thread handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please put this comment in a comment for the FileUploader
class?
It looks like you've found a serious design issue that is non-trivial to reproduce and analyze and I find this analysis result is pretty valuable.
Somebody (probably me) will pick it up at some time when we're ready to modernize this part of the code, but in the meantime I don't want to loose valuable out-of-band information.
I'm thinking about something like:
// To future maintainer:
//
// ... rest of your analysis ...
//
@tobiasKaminsky @AndyScherzinger Second opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Permalink to the GitHub comment would also work, if we don't want to pollute the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second opinion?
I think having it within the comment would be nice since it is then a single source of information, when you come by this piece of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, please put it in code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to condense it in a comment and add a link to this discussion. Apart from that what do you think about the code change? Is it worth it or should we just spend more time in a broader fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fogninid this area needs some bold refactoring but we're not ready for it until we sort out diagnostics first, because this is a high-risk endeavour.
Did you spot any crashes happening around this code or maybe it's just a speculative fix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
During the last weeks the app was crashing every time that I opened it, but I looked into the exceptions in logcat only a few times: the only crash I observed directly was that inside the comparator. Did you get any other crash report involving these classes?
The problem in the comparator is mitigated by my other pull request: making no assumption on the consistency of the data to sort avoids the crash even in the absence of proper synchronization.
This PR reduces the amount of inconsistencies, but as we saw it does not solve all problems.
Should I go ahead and bring this PR to a mergeable state solving the small problems that I see (and putting comments in the code as we discussed)? Or would you prefer that I just put comments without changing code and open another issue to track them?
As all race conditions, this happens only under load. I had a very long list of pending uploads (think about 2000 files for about 5Gb) and many of those where not on the device anymore so they failed almost immediately. Opening the UploadListView consistently crashed with the Comparator contract violation. |
Ok, I'm keen on moving forward with the for the time being, but it stil needs a proper handling.
@tobiasKaminsky what do you think?
|
A long outstanding step is to switch to WorkManager so that we can rely on Androids background system. Or am I totally on the wrong side? (outside I have 36°C and inside 28°C…) |
That's seems a very good solution, it would also solve all other synchronization problems when registering the notifiers after checking which file is uploading. |
@tobiasKaminsky @ezaquarii
I am not going to add lock-based synchronization in that loop because the called function is doing a DB lookup, holding a lock while calling it is very dangerous and a recipe for deadlocks (even if at present time probably safe: everywhere else I took care not to hold the lock when doing anything more than trivial memory operations). WDYT? |
Could you pls put this a as a comment too? Those are very valuable insights that we're going to revisit when ready for refactoring of this area. |
Signed-off-by: Daniele Fognini <[email protected]>
Signed-off-by: Daniele Fognini <[email protected]>
… rationale Signed-off-by: Daniele Fognini <[email protected]>
a198f1e
to
b214c1c
Compare
APK file: https://www.kaminsky.me/nc-dev/android-artifacts/10226.apk |
Codacy278Lint
SpotBugs (new)
SpotBugs (master)
|
Codecov Report
@@ Coverage Diff @@
## master #4244 +/- ##
============================================
- Coverage 16.03% 15.98% -0.06%
Complexity 1 1
============================================
Files 340 340
Lines 31181 31206 +25
Branches 4433 4435 +2
============================================
- Hits 5000 4988 -12
- Misses 25318 25352 +34
- Partials 863 866 +3
|
1 similar comment
Codecov Report
@@ Coverage Diff @@
## master #4244 +/- ##
============================================
- Coverage 16.03% 15.98% -0.06%
Complexity 1 1
============================================
Files 340 340
Lines 31181 31206 +25
Branches 4433 4435 +2
============================================
- Hits 5000 4988 -12
- Misses 25318 25352 +34
- Partials 863 866 +3
|
FileUploader is used from multiple thread for the sync service, and from the activity threads.
Synchronize access to mutable fields.