[RESEARCH] Performance of Foam in large projects #1375

Open
pderaaij opened this issue Jul 24, 2024 · 29 comments

Comments

@pderaaij
Collaborator

Describe the bug

Various users reported performance issues with large notes or projects with many notes. This issue serves as a collection of those reports. It acts as the documentation of ongoing research on performance.

Small Reproducible Example

No response

Steps to Reproduce the Bug or Issue

..

Expected behavior

We want to optimise the performance of Foam, even in large projects.

Screenshots or Videos

No response

Operating System Version

All OS

Visual Studio Code Version

Latest at least

Additional context

No response

@pderaaij
Collaborator Author

For now, I am doing some tests and research with https://github.com/github/docs. A relatively large project with many markdown files that I can use for research.

@pderaaij
Collaborator Author

Just came across https://github.com/rcvd/interconnected-markdown/tree/main/Markdown. This contains not only many notes, but also highly linked. Great for researching the performance of Foam

@pderaaij
Collaborator Author

Some initial investigation points to the function listByIdentifier in workspace.ts. This function is used in both find and wikilink-diagnostics.ts.

The function is defined as:

  public listByIdentifier(identifier: string): Resource[] {
    const needle = normalize('/' + identifier);
    const mdNeedle =
      getExtension(needle) !== this.defaultExtension
        ? needle + this.defaultExtension
        : undefined;
    const resources: Resource[] = [];
    for (const key of this._resources.keys()) {
      if (key.endsWith(mdNeedle) || key.endsWith(needle)) {
        resources.push(this._resources.get(normalize(key)));
      }
    }
    return resources.sort(Resource.sortByPath);
  }

For my test repo, this._resources is a Map of 10,000 entries. This function is called every time a wikilink is processed, both on boot and on graph update. I suspect the for loop is too inefficient for large projects, whether they have many notes or many links.

I will do some experiments with optimising the for loop in this area and see if that boosts performance.
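One possible direction for that experiment, as a minimal sketch: index paths by their file name so a lookup only scans paths that share the last segment, instead of every key in the map. `SuffixIndex` and its methods are hypothetical names, not part of Foam, and the path normalization done by `normalize` in the original function is left out here.

```typescript
// Hypothetical helper, not Foam's actual implementation.
// Assumes paths are already normalized and use '/' as separator.
class SuffixIndex {
  // Maps a file name (last path segment) to every full path that has it.
  private byFileName = new Map<string, string[]>();

  constructor(private defaultExtension: string = '.md') {}

  add(path: string): void {
    const name = path.slice(path.lastIndexOf('/') + 1);
    const bucket = this.byFileName.get(name);
    if (bucket) {
      bucket.push(path);
    } else {
      this.byFileName.set(name, [path]);
    }
  }

  // Returns the paths ending with `/identifier` or `/identifier<ext>`, sorted.
  find(identifier: string): string[] {
    const needle = '/' + identifier;
    const mdNeedle = needle + this.defaultExtension;
    const name = needle.slice(needle.lastIndexOf('/') + 1);
    // Only paths whose file name matches can possibly end with the needle.
    const candidates = [
      ...(this.byFileName.get(name) ?? []),
      ...(this.byFileName.get(name + this.defaultExtension) ?? []),
    ];
    return candidates
      .filter(p => p.endsWith(needle) || p.endsWith(mdNeedle))
      .sort();
  }
}
```

With this shape, the cost of a lookup depends on how many notes share a file name rather than on the size of the workspace. The index has to be kept in sync whenever resources are added, renamed, or removed.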

@DrakeWhu

I am interested in this. I have a graph of approximately 3k notes and 8k links. I also have some Python scripts that I use to run community detection over the graph. It results in around 50~100 or more communities. In the visualization, everything shows without problems.

The problem arises if I want to explore only one community or a couple of them. The community of each note is saved as its type, so I can filter communities in the Foam viz. If I only want to show one community of around 50 notes, it shows, but then the physics break: the force-directed approach can't handle it for some reason. If I try to select two communities, the moment I select the second one, the physics break and the links disappear. I don't know what the reason might be, but maybe the nodes that are not shown are still loaded and the physics engine tries to calculate them anyway? Whatever it is, the more types exist in the graph, the worse the performance gets.

I've been thinking about solutions to this for some time and I've thought of a couple:

  • There could be a command where you just plot a set of types.
  • A force-directed approach is not ideal for big networks. Asking the webview to calculate approximately 1600 forces each frame and apply the corresponding displacements is overkill. We could try a better-thought-out approach, such as clustering far-away nodes or something similar. Even getting rid of the physics altogether might be a solution, but they are always nice to have. Something like a statistical approach would make things smoother.
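The first idea could be tested by handing the renderer only the subgraph for the selected types, so hidden nodes never reach the simulation. A minimal sketch follows; the `GraphNode`, `GraphLink`, and `GraphData` shapes and the `filterByTypes` name are assumptions for illustration, not Foam's actual graph model.

```typescript
// Hypothetical data shapes, assumed for this sketch only.
interface GraphNode {
  id: string;
  type: string;
}

interface GraphLink {
  source: string;
  target: string;
}

interface GraphData {
  nodes: GraphNode[];
  links: GraphLink[];
}

// Keeps only nodes whose type is selected, and only links whose
// two endpoints both survive the node filter.
function filterByTypes(graph: GraphData, types: Set<string>): GraphData {
  const nodes = graph.nodes.filter(n => types.has(n.type));
  const kept = new Set(nodes.map(n => n.id));
  const links = graph.links.filter(
    l => kept.has(l.source) && kept.has(l.target)
  );
  return { nodes, links };
}
```

If the physics still break when the simulation only ever sees the filtered data, the hidden nodes are not the cause and the problem lies elsewhere.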

I can share some plots I've made but using python. Not that we should change the viz, but the physics engine of the plots I mention is less realistic and more aimed towards aesthetics not physical realism. I know that D3.js has a lot of options for visualization and that FOAM uses force-graph, which in turn uses D3, but I've never personally used it.

Anyway I will gladly help to research this.

@pderaaij
Collaborator Author

pderaaij commented Sep 4, 2024

I've been looking at the initial workspace loading time. At first sight, there is not much to be gained in this area. Most of the time is spent reading the files from the datastore. Perhaps the parser could be made more efficient eventually, but I don't see an opportunity here in the short term.

@theAkito

theAkito commented Nov 3, 2024

Second this.

Long story short, it is so extremely terrible, I have no choice but to change to Dendron and see, if it works with that framework, at least.

Whenever I want to write any document, the editor has too extreme lags. This extension makes it unusable.
The second Foam is disabled, everything is as smooth as gliding down butter.

@theAkito

theAkito commented Nov 6, 2024

To make this "research" succeed, one has to strip the whole project from all advanced features. I would personally start with only letting it generate files from templates. Then, proceed with enabling single features inside the same huge repository of files and directories. Then, once a specific feature is added and performance starts to degrade, we know the culprit. Maybe it's also due to a set of features, as a whole.

@riccardoferretti
Collaborator

@theAkito - The work that @pderaaij has done to improve performance has just been released (0.26.2), would love to hear how things are for you now

@theAkito

theAkito commented Nov 6, 2024

@riccardoferretti

Nice, good timing. Yesterday, I was trying to figure out how to quickly build the extension locally with the most recent commits, but then my limited time ran out.

I will check out the update today!

@theAkito

theAkito commented Nov 6, 2024

@riccardoferretti

I just used it in production and it is not unbearably slow anymore, it is only very slow now. Lot of lags still continue to bother the author, however, it is manageable to some degree.

@pderaaij
Collaborator Author

pderaaij commented Nov 6, 2024

Could you elaborate on which actions are slow? It would help me dive into the specifics. Is the project publicly available? Perhaps I can open it on my machine and do some research.

@theAkito

theAkito commented Nov 6, 2024

Could you elaborate which actions are slow?

Typing is super slow. Typing anything, anywhere in the project. It does not matter.
It gets especially extremely slow, when pressing Enter for a newline or pressing Backspace for deleting characters. Feels like I'm typing on nano through an SSH tunnel, connected via dial-up modem.

My gut tells me, that it constantly scans all text for links/highlights or whatever feature needs to scan all text.

I'm additionally 95% sure, that it gets worse with very large markdown files, again pointing at the scanning idea.
It was a long time ago, but I had some notes into which I copy-pasted a lot of script dumps, and these were particularly slow. Hence, I stopped doing it back then, as it became unbearable quickly.

That said, I just wish, there were a lot of feature flags. For me, the most important feature is to create notes from templates.
I never use the graph for seeing linked notes, ever, as its performance was always abysmally atrocious, especially back when you weren't able to ignore specific folders.
I never use the right sidebar for selecting chapters.
I set up tags, but do not actively use them.
I rarely link notes between each other.
However, I really need the templating feature. It's the big selling point for me.

So, I'd be already happy, when I could just strip down the extension to this basic feature, of allowing me to generate new notes from a template. With only this feature enabled, there shouldn't be any way to slow down anything, as it's only a one-shot feature, which never requires any kind of consistent or frequent document scanning.

I would go as far as saying, that a stripped down extension like that could easily be published as a separate extension.
Yesterday, I was researching Visual Studio Code extensions that provide just this templating feature and found exactly zero results. The results that came close to it aren't maintained anymore.

Is the project publicly available? Perhaps I can open it on my machine and do some research.

It's my private knowledge base. You would look into my brain, if you would see it.

However, I'm pretty sure, if you continue to test with those huge projects you already had mentioned, you should be able to roughly replicate what I'm going through.

Here is some further data, in case it helps.

  • I have a big PC. Not the freshest one, but 64GB DDR4 high speed RAM with a pretty nice i7 CPU from just over 2 years ago should be enough for some Visual Studio Code note taking extension.
  • I was experimenting with GitDoc a while ago, so the project has almost 4000 commits. Not using it anymore, though, so not the culprit, right now.
  • I have a directory hierarchy in my project. It's not super deep or anything, but I imagine it is probably deeper than what most people do. I guess, 3 levels of directory depth should be about the average. If I count in the top folder, where all my relevant notes are in, you can count it as 5 levels of directory depth on average, roughly.
  • For troubleshooting this madness, I had turned off all editor suggestions, which I thought were the culprit for document scanning. Did not help, performance remained the same. Only disabling the Foam extension helped.
  • I have plenty of extensions, but none of them are document related. They are all related to technologies like Docker, Terraform, Ansible, etc. So, at most, related to YAML documents.
  • My project is based on the official Foam template for Foam projects.

If you have any further questions, that would help you fix this performance hell, I would be glad to be of assistance.

@theAkito

theAkito commented Nov 6, 2024

Typing is super slow. Typing anything, anywhere in the project. It does not matter.

Let me elaborate a tiny bit on what "slow" means.

It's like packet loss.
You type, type, nothing is shown, then 5(!!!) seconds later, all the typed text suddenly appears.
It feels like, you have 900ms ping in Counter Strike or whatever.
This does not happen rarely, or sometimes. It always happens, constantly & reproducibly.
As already mentioned multiple times, disabling the Foam extension fixes all these issues within the blink of an eye.

@pderaaij
Collaborator Author

pderaaij commented Nov 6, 2024

Thanks for the info. I'll use it to have a look at things later this week. To be honest, I am not sure that it is some continuous scanning process. Especially as Foam builds the graph on load and after that it uses file watchers to monitor changes. But, I might be missing something here.

Just wanted to check if you see anything strange in CPU usage? For example, the symptoms mentioned in #1161.

@riccardoferretti
Collaborator

riccardoferretti commented Nov 7, 2024

I just used it in production and it is not unbearably slow anymore, it is only very slow now.

@theAkito I will take that as a win for now :) glad to hear that @pderaaij 's work made quite an improvement.

But you are right that there is also some live parsing we do on the document itself, see document-decorator.ts, hover-provider.ts, navigation-provider.ts and wikilink-diagnostic.ts.
Currently each one of these processes is parsing the file independently; this is inefficient, and we could optimize the process by, in some way, sharing this process.
I haven't thought this through, nor do I know how much it would improve the situation (I guess the best-case scenario is 75%, but there are lots of caveats there), but I would expect the improvement to be quite noticeable for large documents.
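One way the parsing could be shared, as a minimal sketch: cache the parse result per document and reuse it while the document version is unchanged. VS Code's `TextDocument` exposes a `version` number that increases on every change; `VersionedDocument` and `ParseCache` below are hypothetical names, and the `parse` callback stands in for whatever parser Foam actually uses.

```typescript
// Hypothetical minimal document shape. In an extension these values would
// come from a VS Code TextDocument (uri, version, getText()).
interface VersionedDocument {
  uri: string;
  version: number;
  text: string;
}

// Caches one parse result per document URI, valid for a single version.
class ParseCache<T> {
  private entries = new Map<string, { version: number; result: T }>();

  constructor(private parse: (text: string) => T) {}

  get(doc: VersionedDocument): T {
    const hit = this.entries.get(doc.uri);
    if (hit && hit.version === doc.version) {
      return hit.result; // same document state, reuse the earlier parse
    }
    const result = this.parse(doc.text);
    this.entries.set(doc.uri, { version: doc.version, result });
    return result;
  }

  // Call when a document is closed so the cache does not grow unbounded.
  delete(uri: string): void {
    this.entries.delete(uri);
  }
}
```

If the four consumers listed above all went through one such cache, each edit would trigger one parse instead of four, which is where a best-case figure of 75% would come from.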

Regarding your point on feature flags, I understand what you mean, but I am not planning to implement that just yet.
I see how it could be a good workaround, but most of the computation is necessary for Foam regardless, so I believe the performance improvements we are discussing here are the right approach to make things better at this stage.
I would prefer to disable long processes for large files rather than turning off the feature altogether. I want things to work out of the box, I don't want users to tinker just to get Foam to work with their repo. I am not idealistic about it, but this is my current thinking.

Thanks for sharing your thoughts and pointing things out to us!

@pderaaij
Collaborator Author

Diving into the issue brought me to WikilinkCompletionProvider.provideCompletionItems. In this method, we list all resources of the workspace and iterate over each resource.

const resources = this.ws.list().map(resource => {
      const resourceIsDocument =
        ['attachment', 'image'].indexOf(resource.type) === -1;

      const identifier = this.ws.getIdentifier(resource.uri);

This is not very performant in large workspaces and that causes the delay. It seems vscode simply hangs on this function and stalls the editor.

I will tinker with a solution here.

@theAkito

Small update report incoming.
On average, the performance has improved noticeably following the most recent update.
I do not have to necessarily disable the extension, anymore.

Keep going and please reduce more overhead in whatever the extension does!

@theAkito

Diving into the issue brought me to WikilinkCompletionProvider.provideCompletionItems. In this method, we list all resources of the workspace and iterate over each resource.

This sounds like what I initially expected to be the culprit. My initial debugging step was to disable all completion methods, as it seemed like it hung the most and the hardest, whenever it tried to give me a 100 useless suggestions, I neither needed nor wanted in that particular case.

However, after disabling all auto-completions known to me, it did not help. Maybe this type of completion isn't even configurable via settings...

@pderaaij
Collaborator Author

Could you do a little check for me? Just open a fresh vscode session. After Foam is loaded, open a note and hover over a wikilink. I would expect this would show a tooltip quickly.

@theAkito

we list all resources of the workspace and iterate over each resource.

One workaround for this would be to implement some type of cache, so it iterates only on certain occasions, which should be configurable via settings.
For example, it iterates at least on project startup, when launching Visual Studio Code.
Then, it may be configured to index every hour, or whatever interval the user chooses.
Or, its schedule can be disabled altogether and the user has to manually run the Visual Studio Code command "Foam: Re-index workspace" or something like that.
Of which the latter is the one, I would personally prefer, because it is again such a rarely used functionality, I do not even need to have it run every 8 hours. Maybe on project launch, at most.

@theAkito

I would expect this would show a tooltip quickly.

Yes, although I would debate the term quickly in this context. ;)

@pderaaij
Collaborator Author

Thanks for testing. It helps me to validate my hypothesis.

My current hypothesis is that the problem is the collection of resources in WikilinkCompletionProvider.provideCompletionItems. We are never cancelling this operation, for example, after the user types the next character. This overflows the process. Next to that, it seems we are not filtering the resources based on the present input.

I want to see if we can improve this.
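Both points in that hypothesis can be sketched together: filter on what the user has typed, and give up as soon as the request is cancelled. This is only an illustration under stated assumptions. `TokenLike` mirrors the `isCancellationRequested` property of VS Code's `CancellationToken`; `collectCompletions`, `limit`, and `chunkSize` are hypothetical and not taken from Foam or from #1411. The loop yields periodically because a cancellation cannot be observed while synchronous code keeps the thread busy.

```typescript
// Mirrors the one property of VS Code's CancellationToken this sketch needs.
interface TokenLike {
  isCancellationRequested: boolean;
}

// Returns identifiers containing the typed text, or undefined when the
// request was cancelled before the scan finished.
async function collectCompletions(
  identifiers: string[],
  typed: string,
  token: TokenLike,
  limit: number = 50,
  chunkSize: number = 500
): Promise<string[] | undefined> {
  const query = typed.toLowerCase();
  const matches: string[] = [];
  for (let i = 0; i < identifiers.length; i++) {
    if (i > 0 && i % chunkSize === 0) {
      // Yield to the event loop so a pending cancellation can arrive.
      await new Promise<void>(resolve => setTimeout(resolve, 0));
    }
    if (token.isCancellationRequested) {
      return undefined; // stale request, drop the work done so far
    }
    if (identifiers[i].toLowerCase().includes(query)) {
      matches.push(identifiers[i]);
      if (matches.length >= limit) {
        break; // the suggestion list does not need every match
      }
    }
  }
  return matches;
}
```

The cap on results is a separate design choice: a suggestion list with thousands of entries is rarely useful, and stopping early bounds the work done for each keystroke.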

@pderaaij
Collaborator Author

@theAkito I am out on a limb a bit here, but I am quite certain the work of #1411 will improve the performance in your situation. Please let us know after the release of the PR. I am curious!

@theAkito

theAkito commented Nov 14, 2024

@theAkito I am out on a limb a bit here, but I am quite certain the work of #1411 will improve the performance in your situation. Please let us know after the release of the PR. I am curious!

I just used it in production twice, today.

First, a document, which started at 79 lines and grew to 510 lines @ 20535 characters over 3 hours.

Second, a document which grew from ~30 mostly empty lines to 82 lines @ 3068 characters.

Both times, the lags were much worse, right from the beginning. At some point, I wanted to disable the Foam extension to work around the problem, as I needed quick response, however I did not manage to find the extension within the few seconds I had time to do it.

I cannot explain, why the performance got worse. The only thing I can think of, is that, as far as I remember, Visual Studio Code was already on and idling for about 10 hours, before I started to write those documents.
Does not seem related, but I have no other hint or remotely reasonable explanation for why the performance degraded with the new version, suddenly, instead of at least staying the same.

For next time, I can specifically make sure to restart Visual Studio Code, before using it in production, just to make sure.

Maybe, I will find a way to test again soon, however I won't use it in production until next week, so the real life test will have to wait.

The types of lags were very familiar. Pressing Enter and Backspace were especially lagging heavily. Sometimes, I typed a word like "word", but it typed it as "wrod" or something, because some characters are apparently delayed in such laggy situations.

That said, I have only minimal word completion enabled. I can only tab-complete Markdown snippets. No completion for files or anything requiring to traverse through any hierarchy are enabled.

@theAkito

Used it in production with a fresh Visual Studio Code instance, for only ~15 minutes. Performance seemed, as it was before this patch, however the timespan to test was too short, to give a clear view on the situation.
Still, it seems like a fresh instance works better, right away.

@pderaaij
Collaborator Author

After some fiddling, it feels like the problem centers on document-decorator.onDidChangeTextDocument. In this event listener we debounce the document decoration. The decorator requires parsing the document, and that happens quite a lot. I am guessing this is problematic with larger documents.

I am fiddling around a bit with the settings of _debounce. That helps in my test case, but I am not certain enough about the mechanics of _debounce to know whether this makes sense. This is what I have:

const debouncedUpdateDecorations = debounce(
    immediatelyUpdateDecorations,
    500,
    { leading: true, maxWait: 500 }
  );

I need to do some follow-up research, but perhaps @riccardoferretti has ideas or thoughts.
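On the mechanics: assuming `debounce` here is lodash's, a call with `leading: true` and a `maxWait` equal to the wait time behaves like a throttle, since lodash builds its own `throttle` from `debounce` with exactly those options. The function runs at once on the first change, then at most once per 500 ms while typing continues, with a final run for the last change. The sketch below is a simplified, self-contained model of that behavior for illustration, not lodash's implementation.

```typescript
// Simplified leading + trailing throttle. Not lodash's code.
function throttle<A extends unknown[]>(
  fn: (...args: A) => void,
  wait: number
): (...args: A) => void {
  let lastRun = -Infinity;
  let timer: ReturnType<typeof setTimeout> | undefined;
  let pending: A | undefined;

  const run = (args: A): void => {
    lastRun = Date.now();
    fn(...args);
  };

  return (...args: A): void => {
    const remaining = wait - (Date.now() - lastRun);
    if (remaining <= 0 && timer === undefined) {
      run(args); // leading edge: nothing ran recently, so run right away
      return;
    }
    pending = args; // remember only the most recent call
    if (timer === undefined) {
      // Trailing edge: run once more when the current window ends.
      timer = setTimeout(() => {
        timer = undefined;
        const latest = pending;
        pending = undefined;
        if (latest) {
          run(latest);
        }
      }, remaining);
    }
  };
}
```

Seen this way, the setting bounds how often the document is parsed during continuous typing, whereas a plain trailing debounce would postpone every update until typing pauses.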

@pderaaij
Collaborator Author

Interestingly, I was able to recreate the lagging problems in the 10000s repository by disabling the spellright extension. As if that extension throttles some timings or executions. @theAkito perhaps as a little experiment you could enable the spellright extension and see if that changes performance. Just to see if it has an effect.

@riccardoferretti
Collaborator

I need to do some follow-up research, but perhaps @riccardoferretti has ideas or thoughts.

mmmm.. the debouncing should help with perf, unless there is some weird bug in its setting.
If it turns out that the problem lies here, we might want to look at point solutions (e.g. for bigger documents we either increase the debouncing or turn off the feature altogether). Before going to such extremes I would want to make sure that's the case.
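Such a point solution could be as small as a function that picks the delay from the document size. A minimal sketch follows; `decorationDelay` is a hypothetical name and every threshold below is a placeholder that would need measuring, not a value taken from Foam.

```typescript
// Picks a decoration delay in milliseconds from the document length.
// Returns undefined when the document is large enough that live
// decoration should be skipped altogether.
// All thresholds are illustrative placeholders.
function decorationDelay(charCount: number): number | undefined {
  const SKIP_ABOVE = 500000;
  const SLOW_ABOVE = 50000;
  if (charCount > SKIP_ABOVE) {
    return undefined;
  }
  if (charCount > SLOW_ABOVE) {
    return 1000;
  }
  return 200;
}
```

Small documents keep the responsive default, and only documents past the thresholds pay the longer delay or lose the feature.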

Tbh I have also considered using a faster parser, I came across this in the past:
https://github.com/rsms/markdown-wasm

Not suggesting we go there now, but given the task I think it's worth sharing the info

@theAkito

perhaps as a little experiment you could enable the spellright extension and see if that changes performance. Just to see if it has an effect.

So far, I haven't noticed any effect from installing it.
