BuildChecks Dogfooding: Collect info from sample repos #10726

Open
Tracked by #10548
JanKrivanek opened this issue Sep 30, 2024 · 6 comments
Assignees
Labels
Area: BuildCheck Cost:M Work that requires one engineer up to 2 weeks Priority:2 Work that is important, but not critical for the release triaged

Comments

@JanKrivanek
Member

JanKrivanek commented Sep 30, 2024

Context

As part of #10548 we want to invest in dogfooding BuildChecks.
As a first step, we need to proactively run them on some sample repos and collect the issues found and the perf impact.

Goals

  • Prepare a list of sample repos - ideally 4 bigger ones (e.g. msbuild, roslyn, runtime, sdk) and 3 average ones (e.g. templating, debugger-contracts not on .net9, crank is on .net8)
  • Run a build of each repo on a chosen version and make sure it is clean of errors. Collect 'wall clock time' perf stats for the builds
  • Run the same builds on same versions of the repos with buildcheck opted in (/check)
    • Collect and log the issues
    • Collect the perfstats
  • Evaluate the perf impact and the blocking issues
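A minimal sketch of the measurement loop described in the goals above, assuming each repo can be built with a single command line. The helper names `time_command` and `compare_builds` are hypothetical; only the `/check` switch comes from this issue.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd to completion and return (exit_code, wall_clock_seconds)."""
    start = time.perf_counter()
    result = subprocess.run(cmd, cwd=cwd)
    return result.returncode, time.perf_counter() - start


def compare_builds(base_cmd, check_flag="/check", cwd=None, runner=time_command):
    """Time the same build without and then with the BuildCheck switch.

    Assumption: the caller restores a clean state (e.g. a fresh clone)
    between the two runs; this sketch does not clean anything itself.
    """
    baseline = runner(base_cmd, cwd)
    with_check = runner(base_cmd + [check_flag], cwd)
    return {"baseline": baseline, "with_check": with_check}
```

A call could look like `compare_builds(["dotnet", "build", "/bl"], cwd="path/to/repo")`, where both the command line and the path are placeholders for whatever each repo actually uses. A non-zero exit code in either result means the timing is not comparable, because the build did not run to completion.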
@JanKrivanek JanKrivanek added Cost:M Work that requires one engineer up to 2 weeks Area: BuildCheck labels Sep 30, 2024
@JanKrivanek JanKrivanek changed the title BuildChecks Dogfooding: Collect infor from sample repos BuildChecks Dogfooding: Collect info from sample repos Sep 30, 2024
@AR-May AR-May added Priority:2 Work that is important, but not critical for the release triaged labels Oct 1, 2024
@YuliiaKovalova YuliiaKovalova self-assigned this Jan 6, 2025
@YuliiaKovalova
Member

YuliiaKovalova commented Jan 22, 2025

Build Check Dogfooding Results

Related PRs

Summary of Findings

  • Total repositories analyzed: 6
  • Total issues detected: ~442+ (this is a minimum, because for some repos, e.g. roslyn/runtime, the build was unable to finish after enabling BuildCheck)
  • Most common error types: BC0201 and BC0202
  • Common infrastructure issue: "ContextID XX should have been in the ID-to-project file mapping", tracked in #11326 (Enabling BuildCheck causes error MSB4166: Child node "1" exited prematurely)
  • Some issues come from arcade (e.g. 11 BC0201 come from Tools.proj)

Detailed Analysis by Repository

Template Engine

  • Total issues: 80
  • Breakdown:
    • BC0201: 22 issues
    • BC0202: 37 issues
    • BC0102: 20 issues
    • BC0101: 1 issue
  • Performance impact:
    • Without build check: ~1h 58min
    • With build check: ~1h 49min
      No visible perf degradation because CI runs with /bl enabled.

Extensions

  • Total issues: 50
  • Breakdown:
    • BC0202: 40 issues
    • BC0201: 10 issues
  • Notable: Infrastructure errors regarding ContextID mapping
  • Performance impact:
    • Without build check: ~24 min
    • With build check: ~22min
      No visible perf degradation because CI runs with /bl enabled.

SDK

  • Total issues: 64
  • Breakdown:
    • BC0201: 27 issues
    • BC0202: 15 issues
    • BC0102: 20 issues
    • BC0101: 2 issues
  • Infrastructure challenges: ContextID mapping issues present
    It's hard to evaluate the execution time due to the msbuild issue.

Roslyn

Runtime

  • Total issues: 110+ (exact count unavailable due to build issues)
  • Notable: Widespread ContextID xxx mapping issues (in fact, these are everywhere)
    It's hard to evaluate the execution time due to the msbuild issue.

MSBuild

  • Total issues: 71
  • Breakdown:
    • BC0201: 20 issues
    • BC0202: 20 issues
    • BC0102: 20 issues
    • BC0101: 1 issue
    • BC0105: 10 issues
  • Performance impact:
    • Without build check: ~2h 24min
    • With build check: ~2h 16min
    • Similar to Template Engine, showing slightly improved execution time with build check enabled

Common Patterns and Issues

Error Types Distribution

  1. BC0201: Most prevalent across all repositories
  2. BC0202: Second most common error type
  3. BC0102: Consistently present across multiple repositories
  4. BC0101: Less frequent but present in most repositories
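How the counts above were actually gathered is not stated in this thread (binlogs were used for the analysis). As an illustration only, a per-code tally could be sketched from a plain-text build log, assuming diagnostics appear in the usual MSBuild form `file(line,col): warning BC0201: message`. The function name and the de-duplication rule are assumptions.

```python
import re
from collections import Counter

# Matches e.g. "warning BC0201" or "error BC0102" in an MSBuild-style diagnostic line.
DIAGNOSTIC = re.compile(r"\b(?:warning|error)\s+(BC\d{4})\b")


def count_buildcheck_codes(log_text):
    """Count BuildCheck diagnostics per code in a text log.

    Assumption: the same diagnostic may be printed twice (once inline,
    once in the end-of-build summary), so identical lines are counted once.
    """
    unique_lines = {line.strip() for line in log_text.splitlines()}
    counts = Counter()
    for line in unique_lines:
        match = DIAGNOSTIC.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Calling `count_buildcheck_codes(log_text).most_common()` gives the codes ordered by frequency, which is the shape of the per-repository breakdowns listed earlier.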

Performance Impact

  • Contrary to expectations, repositories with complete timing data (Template Engine and MSBuild) showed slight improvements in build times with build check enabled
  • Template Engine: ~9 minute reduction (1h 58min → 1h 49min)
  • MSBuild: ~8 minute reduction (2h 24min → 2h 16min)
    No visible perf degradation because CI runs with /bl enabled.
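The differences quoted above can be re-derived from the reported durations. The sketch below is only arithmetic on those numbers; `to_minutes` and `delta` are hypothetical helper names, and the parser handles only the `~1h 58min` style used in this comment.

```python
import re


def to_minutes(text):
    """Convert a duration such as '~1h 58min' or '~24 min' to whole minutes."""
    match = re.fullmatch(r"~?\s*(?:(\d+)h)?\s*(?:(\d+)\s*min)?", text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def delta(without_check, with_check):
    """Return (difference in minutes, difference in percent of the baseline)."""
    before, after = to_minutes(without_check), to_minutes(with_check)
    return after - before, 100.0 * (after - before) / before


print(delta("~1h 58min", "~1h 49min"))  # Template Engine
print(delta("~2h 24min", "~2h 16min"))  # MSBuild
```

A negative first element means the run with build check finished sooner. This says nothing about why it finished sooner.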

Infrastructure Challenges

  • Consistent ContextID mapping issues across multiple repositories
  • Larger size repositories experiencing build stability issues with the feature enabled
  • Need to investigate possible false positives.

Impact Assessment

Enabling build check has successfully identified numerous potential issues across all repositories. While this has led to an increase in reported problems and longer build times, it represents an important step toward improving build quality and catching issues earlier in the development process.

Future actions

I would definitely suggest tackling the problems in the mid-size repos first (arcade, Extensions, Template Engine) and then deciding, based on the results, how to proceed with the larger ones.

@YuliiaKovalova
Member

cc: @baronfel

@YuliiaKovalova
Member

The binlog examples used for analysis.
BuildCheckLogsExamples.zip

@SimaTian
Member

Could you elaborate on how you measured the performance impact please?

I'm curious about how the performance gain came to be as it is rather counter-intuitive:

  • I understand that the BuildChecks are piggy-backing on the Binlog as they use the same events. Is that accurate?
    • if yes, then I would expect the overall speed to be ~same.
    • now the difference in speed isn't too large, but it is consistently better for the Buildcheck run.
  • was there some sort of refactor/overall improvement done as a part of buildchecks?
    • if yes, why isn't it speeding up the Binlog only build? Can we maybe achieve that?
  • across how many runs was the performance measured?
    • if it was only one run, I'd chalk it up to variance and be perfectly happy
    • However you're getting this result consistently across multiple repositories which makes this argument harder to make.
  • was there any delay between the buildcheck build and non-build check build?
    • I would assume the pipeline starts completely clean. Is that always the case?
  • what was the order of runs?

If there is a performance upgrade, I'm happy. Even then, I would like to know why it is happening, if for nothing else than as a learning opportunity.

@YuliiaKovalova
Member

@SimaTian, the numbers were taken from CI runs.

  • Yes, BuildCheck uses the same events as binlogs.
  • BuildCheck was built on top of the existing event model. I can't recall any optimizations being applied (do you, @JanKrivanek?)
  • The idea was to show that, for repos where binlog collection was already enabled, adding the /check switch won't be very noticeable. I took the numbers from 2-3 runs.
  • I believe the pipeline starts completely clean.
    It's not a perf upgrade: the numbers were sometimes better because the build was interrupted when build checks reported errors.
    It needs to be remeasured once the blocking issue is addressed: Enabling BuildCheck causes error MSB4166: Child node "1" exited prematurely #11326
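For the remeasurement, one simple way to judge whether a difference is more than run-to-run variance is to compare the medians of several runs against the spread within each configuration. This is a rough sketch, not a statistical test; the function name and the noise rule are assumptions, and the durations in the usage note are made up.

```python
from statistics import median


def compare_runs(baseline_minutes, check_minutes):
    """Compare two sets of build durations (in minutes).

    The difference between medians is treated as noise when it is no
    larger than the widest min-to-max range seen in either configuration.
    """
    diff = median(check_minutes) - median(baseline_minutes)
    spread = max(
        max(baseline_minutes) - min(baseline_minutes),
        max(check_minutes) - min(check_minutes),
    )
    return {"median_diff": diff, "spread": spread, "within_noise": abs(diff) <= spread}
```

For example, `compare_runs([118, 121, 116], [109, 120, 117])` uses invented durations for three runs of each configuration. Only runs that completed successfully should be fed in, since an interrupted build finishes early and skews the result.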

@SimaTian
Member

SimaTian commented Jan 27, 2025

Thank you for the clarification. I was confused by the

Similar to Template Engine, showing slightly improved execution time with build check enabled

statement.
