ref(build): Run repo-level builds in parallel #5094

lobsterkatie · 2022-05-12T21:25:49Z

This parallelizes our repo-level build yarn scripts where possible, in order to make them run a bit faster. In the end, the time savings aren't enormous, but they're still worthwhile: in a series of 10 time trials on GHA, the old build averaged ~9 minutes, while the new one averaged a bit over 8 minutes. (The savings aren't more dramatic because the current lack of parallelism at the repo level isn't the biggest driver of our slow build speed; that would be the lack of parallelism at the individual package level, a problem which will be tackled in future PRs. Doing this now will let us take full advantage of that future work.)

The overarching goal here was to make as few tasks as possible wait on other tasks. Simply because of computing power, it isn't realistic to run all tasks, for the entire repo, simultaneously. But the closer we get to the theoretical ideal of infinite parallelization, the better off we'll be, because then the only constraint becomes the build runner itself. To achieve this maximum-possible parallelization, a few things had to be considered:

- Is there any interdependency between packages which means that a particular task can't be done in all packages simultaneously?
- Is there any interdependency between tasks within a package that means that they can't be done simultaneously?
- How do we make sure that, at both the repo and package level, `yarn build:dev` builds only what's needed for local development and testing and `yarn build` builds everything?
- Is there a way to organize things such that it works for every package, even ones with non-standard build setups?

After investigation, it turned out that the key constraints were:

- Types can't be built entirely in parallel across packages, because `tsc` checks that imports are being used correctly, which it can't do if the types for the packages exporting those imports haven't been built yet.
- Rollup and bundle builds can happen in parallel across packages, because each individual package's tasks are independent of the same tasks in other packages. Further, type-, rollup-, and bundle-building as overall processes are also independent (so, for example, rollup builds don't have to wait on type builds). [Update: The last sentence turns out to be almost, but not quite, true. Bundle building does depend on a subset of the rollup build results. See fix(build): Ensure bundle builds have needed dependencies #5119.]
- Some packages have build tasks in addition to types, rollup, and bundles, and in some cases those tasks can't happen until after the rest of the build completes.
- Some packages (angular and ember) have their own build commands, and don't have types, rollup, or bundle builds.

(This is starting to feel like taking the LSAT all over again...)
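For the types constraint in particular, here's a minimal sketch of the ordering we want, assuming a hand-written map from each package to the monorepo packages it imports from. The package names and the `buildTypes` callback are hypothetical, and this is not lerna's implementation, just an illustration of the dependency-respecting ordering the repo-level build relies on lerna for: each package's types build starts as soon as its dependencies' types builds finish, and anything whose dependencies are done runs concurrently, with no cap.

```typescript
// Hypothetical sketch, not lerna's implementation: build each package's types
// as soon as the types builds of its monorepo dependencies have finished.
// Packages whose dependencies are done run concurrently, with no concurrency cap.

// package name -> names of the monorepo packages it imports from
type DepGraph = Record<string, string[]>;

async function buildTypesInDependencyOrder(
  graph: DepGraph,
  buildTypes: (pkg: string) => Promise<void>,
): Promise<void> {
  const scheduled = new Map<string, Promise<void>>();
  const visiting = new Set<string>(); // guards against dependency cycles

  const schedule = (pkg: string): Promise<void> => {
    const existing = scheduled.get(pkg);
    if (existing) return existing;
    if (visiting.has(pkg)) throw new Error(`Dependency cycle involving ${pkg}`);
    visiting.add(pkg);
    // Only monorepo packages (keys of the graph) need their types built first.
    const deps = (graph[pkg] ?? []).filter((dep) => dep in graph).map(schedule);
    visiting.delete(pkg);
    // tsc needs the dependencies' types to exist, so wait for all of them.
    const done = Promise.all(deps).then(() => buildTypes(pkg));
    scheduled.set(pkg, done);
    return done;
  };

  await Promise.all(Object.keys(graph).map((pkg) => schedule(pkg)));
}
```

With this shape, a leaf package like a shared types package builds first, and two siblings that both depend only on the same parent start together as soon as that parent is done.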

To solve these constraints, the build system now follows these principles:

- Every build task, in every package, is now represented in its package in exactly one of four scripts: `yarn build:types`, `yarn build:rollup`, `yarn build:bundle`, and a new `yarn build:extras`. Tasks can be parts of other package-level scripts (types and rollup builds together form each package's `yarn build:dev`, for example), but as long as every task is in one of the canonical four, running those canonical scripts in every package that has them guarantees we won't miss anything.
- Types are built using lerna's `--stream` option, now with no limit on concurrency, in order to parallelize where possible without breaking dependency ordering.
- Types are built independently of the other three kinds of tasks, since everything else can be fully parallelized. This means not using package-level `yarn build` or `yarn build:dev` commands in the repo-level build, since they tie other build tasks to types build tasks.
- To make things as easy as possible to reason about, all four task kinds, not just types, are therefore addressed separately by the repo-level build.
- Angular and ember's build tasks are each aliased to a `yarn build:extras` script, so they aren't left out.
- Because some "extras" tasks can't be done until types, rollup, and/or bundle tasks are done, all "extras" tasks are held until the other three kinds have finished. This does push a few later than need be, but it's worth it in order to standardize the logic.
- All of these principles are duplicated at the package level.
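Roughly, the repo-level ordering described above (as of this PR; see the #5119 caveat about bundles needing some rollup output) boils down to the sketch below. `runPhase` is a hypothetical stand-in for "run `build:<phase>` in every package that defines it", and `packagesWithScript` is a hypothetical helper; neither is the repo's actual script wiring, which lives in yarn/lerna scripts rather than TypeScript.

```typescript
// Hypothetical sketch of the repo-level `yarn build` ordering.
// `runPhase` stands in for running `build:<phase>` in every package that
// defines it; it is an assumption, not the repo's actual script wiring.

type Phase = "types" | "rollup" | "bundle" | "extras";

// Only packages that define a given canonical script take part in that phase
// (e.g. angular and ember only define `build:extras`).
function packagesWithScript(
  packages: Record<string, Record<string, string>>, // package name -> package.json "scripts"
  phase: Phase,
): string[] {
  return Object.keys(packages).filter((name) => `build:${phase}` in packages[name]);
}

async function repoBuild(runPhase: (phase: Phase) => Promise<void>): Promise<void> {
  // Types, rollup and bundle builds don't wait on each other.
  await Promise.all([runPhase("types"), runPhase("rollup"), runPhase("bundle")]);
  // Some extras tasks need the other three, so all extras are held until the end.
  await runPhase("extras");
}
```

Within this shape, `runPhase("types")` would itself still need to respect package dependency order (the types constraint above), while the rollup and bundle phases can start every package at once.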

Or, for the visual learners among us (forgive the slightly janky, random-free-tool-on-the-internet diagram):

[Diagram: repo-level build task ordering (image not preserved)]

(Everything except the grey grouping is part of `yarn build`.)

Oh, and while I was in there, I got rid of `yarn build:dev:filter`. It doesn't fit the new system, and no one uses it (not even me, and I made it up).

github-actions · 2022-05-12T21:38:30Z

size-limit report 📦

| Path | Size |
| --- | --- |
| @sentry/browser - ES5 CDN Bundle (gzipped + minified) | 18.75 KB (-6.91% 🔽) |
| @sentry/browser - ES5 CDN Bundle (minified) | 58.19 KB (-9.95% 🔽) |
| @sentry/browser - ES6 CDN Bundle (gzipped + minified) | 17.49 KB (-7.25% 🔽) |
| @sentry/browser - ES6 CDN Bundle (minified) | 52.43 KB (-9.56% 🔽) |
| @sentry/browser - Webpack (gzipped + minified) | 19.33 KB (-16.84% 🔽) |
| @sentry/browser - Webpack (minified) | 61.44 KB (-24.82% 🔽) |
| @sentry/react - Webpack (gzipped + minified) | 19.35 KB (-16.88% 🔽) |
| @sentry/nextjs Client - Webpack (gzipped + minified) | 42.81 KB (-10.92% 🔽) |
| @sentry/browser + @sentry/tracing - ES5 CDN Bundle (gzipped + minified) | 24.41 KB (-6.4% 🔽) |
| @sentry/browser + @sentry/tracing - ES6 CDN Bundle (gzipped + minified) | 22.96 KB (-6.24% 🔽) |

Lms24

Now this is a great PR description! LGTM, and even if it doesn't speed up building by a ton, having a bit more standardization in the build commands should make things clearer.

Appreciated the diagram btw. This makes me think of something else:
Should we include this diagram and some more information about building in a README file or some other MD file which explains how we build our stuff? Might be a good thing to have when more people are gonna work on the repo in the future. WDYT?

lobsterkatie · 2022-05-13T13:24:43Z

> Now this is a great PR description!

Thanks! 🙂

> Should we include this diagram and some more information about building in a README file or some other MD file which explains how we build our stuff? Might be a good thing to have when more people are gonna work on the repo in the future. WDYT?

Yeah, I actually had a similar thought. I'll work on it.

In #5094, a change was made to parallelize our repo-level build commands as much as possible. In that PR, it was stated that `build:bundle` could run independently of, and therefore in parallel with, the types and rollup builds, and at the time that was true. When TS (through rollup) builds a bundle, it creates a giant AST out of the bundled package and all of its monorepo dependencies, based on the original source code, then transpiles the whole thing, with no prework needed.

But in #5111 we switched our ES6 bundles to use sucrase for transpilation (still through rollup), and that changed things. Sucrase (along with every other non-TS transpilation tool) only considers files one by one, not as part of a larger whole, and won't reach across package boundaries, even within the monorepo. As a result, rollup needs all dependencies to already exist in transpiled form, since sucrase doesn't touch them, which becomes a problem if both processes are happening at once.

(_But how has CI even been passing since that second PR, then?_, you may ask. The answer is, sucrase is very fast, and lerna can only start so many things at once. It ends up being a race condition between sucrase finishing with the dependencies and lerna kicking off the bundle builds, and almost all the time, sucrase wins. And the other situations in which this is broken all involve using something other than the top-level `build` script to create bundles in a repo with no existing build artifacts, and CI just never does that.)

So, TL;DR: we need to have already transpiled a package's monorepo dependencies before that package can be turned into a bundle. For `build:bundle` at both the repo and package level, and for `build` at the package level, this means that if they're not there, we have to build them. For `build` at the repo level, where transpilation of all packages does in fact already happen, we have two options:

1) Push bundle builds to happen alongside `build:extras`, after rollup builds have finished.
2) Mimic what happens when the race condition is successful, but in a way that isn't flaky. In other words, continue to run `build:bundle` in parallel with the types and npm package builds, but guarantee that the needed dependencies have finished building before starting the bundle build.

Of the two options, the first is certainly simpler, but it also forces the two longest parts of the build (bundle and types builds) to be sequential, which is the exact _opposite_ of what we want, given that the goal all along has been to make the builds noticeably faster. Choosing the second solution also gives us an excuse to add the few extra lines of code needed to fix the repo-level-`build:bundle`/package-level-`build`/package-level-`build:bundle` problem, which we otherwise probably wouldn't fix.

This implements that second strategy, by polling at 5-second intervals for the existence of the needed files, for up to 60 seconds, before beginning the bundle builds. (In practice, it fairly reliably seems to take two retries, or ten seconds, before the bundle build can begin.) It also handles the other three situations above, by building the missing files when necessary. Finally, it fixes a type import to rely on the main types package, rather than a built version of it. This prevents our ES5 bundles (which still use TS for transpilation, in order to also take advantage of its ability to down-compile) from running into the same problem the sucrase bundles are having.
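A minimal sketch of that wait-or-build logic, assuming the "needed files" are each dependency's transpiled output (the exact paths aren't given here, so callers pass them in). `waitForFiles`, `ensureBundleDeps`, `buildMissingDeps`, and the injectable `exists` hook are hypothetical names for illustration, not the actual implementation: when the repo-level `build` is already transpiling dependencies, poll every 5 seconds for up to 60; everywhere else, build whatever is missing.

```typescript
import { existsSync } from "node:fs";

interface WaitOptions {
  intervalMs?: number; // how long to sleep between checks
  timeoutMs?: number; // give up after this long
  exists?: (path: string) => boolean; // injectable so it can be tested without real files
}

// Poll until every file in `paths` exists. Resolves true once they all do,
// or false if `timeoutMs` passes first. Defaults mirror the 5s/60s above.
async function waitForFiles(
  paths: string[],
  { intervalMs = 5000, timeoutMs = 60000, exists = existsSync }: WaitOptions = {},
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    if (paths.every((p) => exists(p))) return true;
    if (Date.now() >= deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Repo-level `build`: something else is already transpiling the dependencies,
// so wait for its output. Everywhere else: build whatever is missing.
// `buildMissingDeps` is a hypothetical callback, not a real script.
async function ensureBundleDeps(
  requiredFiles: string[],
  depsAreBeingBuilt: boolean,
  buildMissingDeps: () => Promise<void>,
  options: WaitOptions = {},
): Promise<void> {
  if (depsAreBeingBuilt) {
    if (!(await waitForFiles(requiredFiles, options))) {
      throw new Error("Timed out waiting for transpiled dependencies");
    }
    return;
  }
  const exists = options.exists ?? existsSync;
  if (!requiredFiles.every((p) => exists(p))) await buildMissingDeps();
}
```

Failing loudly on timeout (rather than starting the bundle anyway) keeps a slow dependency build from turning back into the flaky race described above.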



lobsterkatie added 5 commits May 12, 2022 11:56

- don't build types for tracing bundle index file (649a04d)
- remove build:dev:filter (384eef5)
- parallelize top-level rollup and types builds (81e8599)
- add build:extras script to packages with extra build processes (0196e54)
- split up and parallelize top-level build and build:dev scripts (4970dae)

Lms24 approved these changes May 13, 2022


lobsterkatie merged commit ce665e2 into 7.x May 13, 2022

lobsterkatie deleted the kmclb-run-builds-in-parallel branch May 13, 2022 13:28

AbhiPrasad added this to the 7.0.0 milestone May 13, 2022

lobsterkatie mentioned this pull request May 17, 2022

fix(build): Ensure bundle builds have needed dependencies #5119

Merged

AbhiPrasad mentioned this pull request Aug 2, 2022

Update @sentry/ember dependencies #5494

Closed

2 tasks
