Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix unsafe updates to current upload #4244

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

fogninid
Copy link
Member

FileUploader is used from multiple thread for the sync service, and from the activity threads.
Synchronize access to mutable fields.

@ezaquarii
Copy link
Collaborator

@fogninid BTW, can you tell how did you identify those places where locks are needed?

@fogninid
Copy link
Member Author

@fogninid BTW, can you tell how did you identify those places where locks are needed?

I just observed a recurring crash in this special case. I am sure that there are tools that use that @GuardedBy and similar annotations that I put there, but I don't know how to configure them right now.

@ezaquarii
Copy link
Collaborator

@fogninid BTW, can you tell how did you identify those places where locks are needed?

I just observed a recurring crash in this special case. I am sure that there are tools that use that @GuardedBy and similar annotations that I put there, but I don't know how to configure them right now.

Are you able to provide us a steps to reprodouce it?

@@ -711,8 +711,10 @@ public void setItems(OCUpload... items) {
void fixAndSortItems(OCUpload... array) {
FileUploader.FileUploaderBinder binder = parentActivity.getFileUploaderBinder();

for (OCUpload upload : array) {
upload.setDataFixed(binder);
synchronized (binder.uploadStartLock()) {
Copy link
Collaborator

@ezaquarii ezaquarii Jul 23, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pls explain what is the exact cause of the crash and how the comparator is fixing it?

It looks like we're trying to sort a data that is mutated from a background thread via magic wormhole and I cannot see how this can be fixed without a radical refactoring. :-(

TBH I'd prefer to have a complete report on the issue, with a beefy comment, so the future maintainer can safely defuse this bomb, instead of hiding it behind fixes for few edge cases.

As you wrote earlier, there is more of it in other places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the stacktrace:

04-28 21:55:34.428  3535  3535 E AndroidRuntime: java.lang.RuntimeException: Unable to start activity ComponentInfo{com.nextcloud.client/com.owncloud.android.ui.activity.UploadListActivity}: java.lang.IllegalArgumentException: Comparison method violates its general contract!
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3086)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.ActivityThread.handleLaunchActivity(ActivityThread.java:3229)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.servertransaction.LaunchActivityItem.execute(LaunchActivityItem.java:78)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.servertransaction.TransactionExecutor.executeCallbacks(TransactionExecutor.java:108)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.servertransaction.TransactionExecutor.execute(TransactionExecutor.java:68)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.ActivityThread$H.handleMessage(ActivityThread.java:1926)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.os.Handler.dispatchMessage(Handler.java:106)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.os.Looper.loop(Looper.java:214)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.ActivityThread.main(ActivityThread.java:6981)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at java.lang.reflect.Method.invoke(Native Method)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at com.android.internal.os.RuntimeInit$MethodAndArgsCaller.run(RuntimeInit.java:493)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:1445)
04-28 21:55:34.428  3535  3535 E AndroidRuntime: Caused by: java.lang.IllegalArgumentException: Comparison method violates its general contract!
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at java.util.TimSort.mergeHi(TimSort.java:899)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at java.util.TimSort.mergeAt(TimSort.java:516)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at java.util.TimSort.mergeForceCollapse(TimSort.java:457)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at java.util.TimSort.sort(TimSort.java:254)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at java.util.Arrays.sort(Arrays.java:1432)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at com.owncloud.android.ui.adapter.UploadListAdapter$UploadGroup.fixAndSortItems(UploadListAdapter.java:696)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at com.owncloud.android.ui.adapter.UploadListAdapter$2.refresh(UploadListAdapter.java:163)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at com.owncloud.android.ui.adapter.UploadListAdapter.loadUploadItemsFromDb(UploadListAdapter.java:570)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at com.owncloud.android.ui.adapter.UploadListAdapter.<init>(UploadListAdapter.java:174)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at com.owncloud.android.ui.activity.UploadListActivity.setupContent(UploadListActivity.java:160)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at com.owncloud.android.ui.activity.UploadListActivity.onCreate(UploadListActivity.java:137)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.Activity.performCreate(Activity.java:7326)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.Activity.performCreate(Activity.java:7317)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.Instrumentation.callActivityOnCreate(Instrumentation.java:1271)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        at android.app.ActivityThread.performLaunchActivity(ActivityThread.java:3066)
04-28 21:55:34.428  3535  3535 E AndroidRuntime:        ... 11 more

The problem is that the current comparator tries to sort the data with the assumption that there is exactly one isUploadingNow==true element.

The implementation already fixes the relevant data before sorting to avoid the background change in the elements while sorting:

for (OCUpload upload : array) {
  upload.setDataFixed(binder);
}

See here for the attempt at a solution 06f45bd#diff-24801376b7c7edeb4e3089df542a4de8

Unfortunately this is not sufficient because the binder can switch to the next upload while the iteration is in progress, so it ends up with multiple elements with isUploadingNow==true (or even with elements with inconsistent state, such as state==FAILED and isUploadingNow==true)

That's why I locked the iteration in the upload list using the uploadStartLock.

By fixing the comparator to avoid making those assumption the java.lang.IllegalArgumentException: Comparison method violates its general contract! cannot happen anymore, but the internal inconsistency cannot be solved without proper Thread handling.

Copy link
Collaborator

@ezaquarii ezaquarii Jul 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please put this comment in a comment for the FileUploader class?

It looks like you've found a serious design issue that is non-trivial to reproduce and analyze and I find this analysis result is pretty valuable.

Somebody (probably me) will pick it up at some time when we're ready to modernize this part of the code, but in the meantime I don't want to loose valuable out-of-band information.

I'm thinking about something like:

// To future maintainer:
//
// ... rest of your analysis ...
//

@tobiasKaminsky @AndyScherzinger Second opinion?

Copy link
Collaborator

@ezaquarii ezaquarii Jul 24, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Permalink to the GitHub comment would also work, if we don't want to pollute the code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second opinion?

I think having it within the comment would be nice since it is then a single source of information, when you come by this piece of code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, please put it in code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to condense it in a comment and add a link to this discussion. Apart from that what do you think about the code change? Is it worth it or should we just spend more time in a broader fix?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fogninid this area needs some bold refactoring but we're not ready for it until we sort out diagnostics first, because this is a high-risk endeavour.

Did you spot any crashes happening around this code or maybe it's just a speculative fix?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

During the last weeks the app was crashing every time that I opened it, but I looked into the exceptions in logcat only a few times: the only crash I observed directly was that inside the comparator. Did you get any other crash report involving these classes?

The problem in the comparator is mitigated by my other pull request: making no assumption on the consistency of the data to sort avoids the crash even in the absence of proper synchronization.

This PR reduces the amount of inconsistencies, but as we saw it does not solve all problems.
Should I go ahead and bring this PR to a mergeable state solving the small problems that I see (and putting comments in the code as we discussed)? Or would you prefer that I just put comments without changing code and open another issue to track them?

@fogninid
Copy link
Member Author

@fogninid BTW, can you tell how did you identify those places where locks are needed?

I just observed a recurring crash in this special case. I am sure that there are tools that use that @GuardedBy and similar annotations that I put there, but I don't know how to configure them right now.

Are you able to provide us a steps to reprodouce it?

As all race conditions, this happens only under load. I had a very long list of pending uploads (think about 2000 files for about 5Gb) and many of those where not on the device anymore so they failed almost immediately. Opening the UploadListView consistently crashed with the Comparator contract violation.

@ezaquarii
Copy link
Collaborator

ezaquarii commented Jul 24, 2019 via email

@tobiasKaminsky
Copy link
Member

A long outstanding step is to switch to WorkManager so that we can rely on Androids background system.
This way we can assure that only one upload job is running and it is handled properly by Android.
For viewing pending jobs we then could use LiveData and get notified about changes.

Or am I totally on the wrong side? (outside I have 36°C and inside 28°C…)

@fogninid
Copy link
Member Author

fogninid commented Jul 25, 2019

A long outstanding step is to switch to WorkManager so that we can rely on Androids background system.
This way we can assure that only one upload job is running and it is handled properly by Android.
For viewing pending jobs we then could use LiveData and get notified about changes.

Or am I totally on the wrong side? (outside I have 36°C and inside 28°C…)

That's seems a very good solution, it would also solve all other synchronization problems when registering the notifiers after checking which file is uploading.
Any idea how much work would be to implement it? I'm also not keen on changing code when I know that much more is broken and at the end it remains only slightly less broken.

@fogninid
Copy link
Member Author

fogninid commented Aug 1, 2019

A long outstanding step is to switch to WorkManager so that we can rely on Androids background system.
This way we can assure that only one upload job is running and it is handled properly by Android.
For viewing pending jobs we then could use LiveData and get notified about changes.
Or am I totally on the wrong side? (outside I have 36°C and inside 28°C…)

That's seems a very good solution, it would also solve all other synchronization problems when registering the notifiers after checking which file is uploading.
Any idea how much work would be to implement it? I'm also not keen on changing code when I know that much more is broken and at the end it remains only slightly less broken.

@tobiasKaminsky @ezaquarii
The current multithreading approach is so bad that just reading code looking for the right place to put the comment about synchronization makes me see other bugs:

com.owncloud.android.ui.adapter.UploadListAdapter#loadUploadItemsFromDb updates three groups of uploads without synchronization. If the status of an upload changes while the thread iterating is jumping from a group to the next, the upload could end up in more groups or in no group at all.

I am not going to add lock-based synchronization in that loop because the called function is doing a DB lookup, holding a lock while calling it is very dangerous and a recipe for deadlocks (even if at present time probably safe: everywhere else I took care not to hold the lock when doing anything more than trivial memory operations).

WDYT?

@ezaquarii
Copy link
Collaborator

WDYT?

Could you pls put this a as a comment too? Those are very valuable insights that we're going to revisit when ready for refactoring of this area.

@AndyScherzinger AndyScherzinger mentioned this pull request Aug 2, 2019
68 tasks
@fogninid fogninid force-pushed the fileuploadThreadSafety branch from a198f1e to b214c1c Compare August 2, 2019 12:07
@nextcloud-android-bot
Copy link
Collaborator

APK file: https://www.kaminsky.me/nc-dev/android-artifacts/10226.apk

qrcode

To test this change/fix you can simply download above APK file and install and test it in parallel to your existing Nextcloud app.

@nextcloud-android-bot
Copy link
Collaborator

Codacy

278

Lint

TypemasterPR
Warnings5959
Errors00

SpotBugs (new)

Warning TypeNumber
Bad practice Warnings24
Correctness Warnings69
Internationalization Warnings12
Malicious code vulnerability Warnings4
Multithreaded correctness Warnings9
Performance Warnings112
Security Warnings46
Dodgy code Warnings136
Total412

SpotBugs (master)

Warning TypeNumber
Bad practice Warnings24
Correctness Warnings69
Internationalization Warnings12
Malicious code vulnerability Warnings4
Multithreaded correctness Warnings9
Performance Warnings112
Security Warnings46
Dodgy code Warnings136
Total412

@codecov
Copy link

codecov bot commented Aug 2, 2019

Codecov Report

Merging #4244 into master will decrease coverage by 0.05%.
The diff coverage is 3.61%.

@@             Coverage Diff              @@
##             master    #4244      +/-   ##
============================================
- Coverage     16.03%   15.98%   -0.06%     
  Complexity        1        1              
============================================
  Files           340      340              
  Lines         31181    31206      +25     
  Branches       4433     4435       +2     
============================================
- Hits           5000     4988      -12     
- Misses        25318    25352      +34     
- Partials        863      866       +3
Impacted Files Coverage Δ Complexity Δ
...owncloud/android/ui/adapter/UploadListAdapter.java 0% <0%> (ø) 0 <0> (ø) ⬇️
.../owncloud/android/files/services/FileUploader.java 9.6% <4.16%> (-0.03%) 0 <0> (ø)
...va/com/owncloud/android/ui/SquareLinearLayout.java 0% <0%> (-50%) 0% <0%> (ø)
...in/java/com/owncloud/android/utils/ThemeUtils.java 50.85% <0%> (-1.71%) 0% <0%> (ø)
.../third_parties/daveKoeller/AlphanumComparator.java 80.95% <0%> (-1.2%) 0% <0%> (ø)
...owncloud/android/ui/adapter/OCFileListAdapter.java 30.52% <0%> (-0.85%) 0% <0%> (ø)
...ncloud/android/ui/fragment/OCFileListFragment.java 24.47% <0%> (-0.27%) 0% <0%> (ø)
...cloud/android/ui/activity/FileDisplayActivity.java 20.46% <0%> (-0.17%) 0% <0%> (ø)
...in/java/com/owncloud/android/datamodel/OCFile.java 64.67% <0%> (+1.37%) 0% <0%> (ø) ⬇️

1 similar comment
@codecov
Copy link

codecov bot commented Aug 2, 2019

Codecov Report

Merging #4244 into master will decrease coverage by 0.05%.
The diff coverage is 3.61%.

@@             Coverage Diff              @@
##             master    #4244      +/-   ##
============================================
- Coverage     16.03%   15.98%   -0.06%     
  Complexity        1        1              
============================================
  Files           340      340              
  Lines         31181    31206      +25     
  Branches       4433     4435       +2     
============================================
- Hits           5000     4988      -12     
- Misses        25318    25352      +34     
- Partials        863      866       +3
Impacted Files Coverage Δ Complexity Δ
...owncloud/android/ui/adapter/UploadListAdapter.java 0% <0%> (ø) 0 <0> (ø) ⬇️
.../owncloud/android/files/services/FileUploader.java 9.6% <4.16%> (-0.03%) 0 <0> (ø)
...va/com/owncloud/android/ui/SquareLinearLayout.java 0% <0%> (-50%) 0% <0%> (ø)
...in/java/com/owncloud/android/utils/ThemeUtils.java 50.85% <0%> (-1.71%) 0% <0%> (ø)
.../third_parties/daveKoeller/AlphanumComparator.java 80.95% <0%> (-1.2%) 0% <0%> (ø)
...owncloud/android/ui/adapter/OCFileListAdapter.java 30.52% <0%> (-0.85%) 0% <0%> (ø)
...ncloud/android/ui/fragment/OCFileListFragment.java 24.47% <0%> (-0.27%) 0% <0%> (ø)
...cloud/android/ui/activity/FileDisplayActivity.java 20.46% <0%> (-0.17%) 0% <0%> (ø)
...in/java/com/owncloud/android/datamodel/OCFile.java 64.67% <0%> (+1.37%) 0% <0%> (ø) ⬇️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants